How to split a huge log file and keep only what you want in Linux?

October 02, 2015

Today I ran into the following task:

I needed to take a 7.3 GB Apache access log and extract ONLY the entries from 30 August up to today.

Needless to say, it makes no sense to open a 7.3 GB file in a text editor. Even vi on Linux uses a huge amount of RAM on a file that size, so that approach just doesn't work.

To solve this problem, here are the steps I took, with thanks to the following reference:

- http://stackoverflow.com/questions/3066948/how-to-file-split-at-a-line-number

First, find where the date string first appears in the log file, together with its line number. Printing only the first match would be enough, but I like to see 5 lines:

grep -n "30/Aug/2015" access_log | head -n 5

The first line of output is:

61445828:203.129.95.51 - - [30/Aug/2015:00:00:01 +0800] "GET <somewebsite>/index.htm HTTP/1.1" 200 10824
The first field, 61445828, is the line number.
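If you want the line number in a variable instead of reading it off the screen, something like the sketch below could work. It assumes a grep that supports `-m 1` (GNU and BSD grep both do, but POSIX does not require it), and `first_match_line` is just a name I made up for illustration, not a standard command.

```shell
#!/bin/sh
# first_match_line FILE PATTERN  (hypothetical helper, not a standard tool)
# Prints the line number of the first line in FILE that contains PATTERN
# as a literal string (-F), or prints nothing if there is no match.
# -m 1 makes grep stop reading after the first match.
# cut keeps only the text before the first ':', i.e. the number added by -n.
first_match_line() {
    grep -n -F -m 1 -- "$2" "$1" | cut -d: -f1
}
```

Usage: `start=$(first_match_line access_log "30/Aug/2015")` stores the starting line number in `$start`.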

Next, count the total number of lines in the access log:

wc -l access_log

The output is: 64328208 access_log
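`wc -l access_log` prints the file name after the count. A small sketch to get only the number, e.g. for use in shell arithmetic (`count_lines` is a hypothetical name of my own):

```shell
#!/bin/sh
# count_lines FILE  (hypothetical helper)
# Feeding the file on stdin stops wc from printing the file name, and tr
# strips the leading spaces some wc implementations (e.g. BSD/macOS) add.
# Note: wc -l counts newline characters, so a last line without a trailing
# newline is not counted.
count_lines() {
    wc -l < "$1" | tr -d ' '
}
```

Usage: `total=$(count_lines access_log)`.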
Now do the calculation. The wanted entries run from the starting line to the last line of the file, so the number of lines to keep is (total lines - starting line + 1), where the +1 keeps the starting line itself: 64328208 - 61445828 + 1 = 2882381.
We export those last 2882381 lines to a new file:

tail -n 2882381 access_log > custom_log_file_for_analysis.log
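To double-check the line arithmetic: the lines from the start line to the last line inclusive number total - start + 1. The sketch below plugs in the numbers from the grep and wc output in this post, then checks the same formula on a tiny 5-line sample file (made up for the test), where counting from the end with `tail -n N` and starting at a line with `tail -n +K` should give identical output.

```shell
#!/bin/sh
# Numbers taken from the grep and wc output in this post.
total=64328208   # total lines reported by wc -l
start=61445828   # line number of the first 30/Aug/2015 entry
# Lines start..total inclusive; the +1 keeps the starting line itself.
keep=$((total - start + 1))
echo "lines to keep: $keep"

# Same formula on a tiny sample whose wanted entries start on line 3.
sample=$(mktemp)
printf 'line1\nline2\nline3\nline4\nline5\n' > "$sample"
small_keep=$((5 - 3 + 1))
by_count=$(tail -n "$small_keep" "$sample")   # last 3 lines
by_start=$(tail -n +3 "$sample")              # from line 3 to the end
[ "$by_count" = "$by_start" ] && echo "formula matches tail -n +3"
rm -f "$sample"
```

As the sample shows, `tail -n +61445828 access_log > custom_log_file_for_analysis.log` should produce the same file without the `wc -l` step, and it still starts at the right line if Apache appends to the live log between commands. The trade-off is that `tail -n +K` has to read through the first K-1 lines, whereas `tail -n N` on a regular file can typically seek close to the end.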

The new file is just 414 MB, less than 6% of the size of the original.
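If you'd rather not work with line numbers at all, a single awk pass can print everything from the first matching line to the end of the file. This is a sketch of an alternative, not what I did above; like the steps above, it assumes the log is in chronological order, and `extract_from_first_match` is a hypothetical name.

```shell
#!/bin/sh
# extract_from_first_match FILE PATTERN  (hypothetical helper)
# Prints every line from the first line containing PATTERN (a literal
# substring, matched with index()) to the end of FILE, in one pass.
# Note: awk -v interprets backslash escapes in PATTERN; "30/Aug/2015" has none.
extract_from_first_match() {
    awk -v pat="$2" 'found || index($0, pat) { found = 1; print }' "$1"
}
```

Usage: `extract_from_first_match access_log "30/Aug/2015" > custom_log_file_for_analysis.log`. Unlike `tail`, this reads the whole file once, but it needs no separate grep or wc step.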

You can now do whatever analysis you want with this file.

Hope it helps someone!
