Session 07 - Analysis of Tarballed Message Logs

Requirements are found here

Extracting a tar Archive

Assuming you're logged into your Linux machine and you're in your home directory, create your analysis directory if you don't have it any more (mkdir -p is harmless if the directory already exists), then change into it.

$ mkdir -p ~/analysis/logs

$ cd ~/analysis/logs


Ensure that you've transferred the logs.tar.gz file over to your machine and that the file lives in ~/analysis/logs.


Use the tar command to work with tarballs. A tar archive bundles many files into one, and a .tar.gz archive has additionally been compressed with gzip, so the result is a bit like a ZIP file.

The tar command has a number of flags, such as:

  • t - list archive contents
  • z - filter the archive through gzip (this decompresses it when listing or extracting)
  • v - verbose output
  • f - use the archive file named next (e.g., logs.tar.gz)

Let's first examine the content of the archive. Here we chain the flags behind a single - prefix instead of giving each its own; this only works with some commands, such as tar:

$ tar -tzvf logs.tar.gz

You should get back a listing of 5 log files (messages*), numbered by log rotation.


Does anything look dangerous? Once you've satisfied yourself, you can proceed to actually "untar" the archive:

$ tar -xzvf logs.tar.gz (as you might have already guessed, x stands for extract)


The 5 files you saw earlier should now be extracted in your current directory.
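If eyeballing the listing for "anything dangerous" feels too informal, part of that check can be scripted. What follows is only a sketch under assumptions: it builds its own throwaway archive (demo.tar.gz, a made-up name) in a temporary directory, and it looks for just two common warning signs, absolute paths and .. components. It is not a complete safety check.

```shell
# Build a tiny throwaway archive so the check can be tried without touching logs.tar.gz.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'entry one\n' > messages
printf 'entry two\n' > messages.1
tar -czf demo.tar.gz messages messages.1

# List member names only (no -v) and keep those that start with "/"
# or contain a ".." path component. "|| true" keeps an empty result from
# counting as a failure.
suspicious=$(tar -tzf demo.tar.gz | grep -E '^/|(^|/)\.\.(/|$)' || true)

if [ -z "$suspicious" ]; then
    echo "no suspicious paths found"
else
    echo "suspicious entries:"
    echo "$suspicious"
fi
```

To try it on the real archive, swap demo.tar.gz for logs.tar.gz and run it from ~/analysis/logs.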

Look at one log entry from one of the files:

$ cat messages | head -n 1 (remember, if none of these commands mean anything to you, look them up with man)

Look at the format of the log files:

  • Date
  • Time
  • Hostname
  • Generating application
  • Message
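To see how those parts map onto whitespace-separated fields, here is a small sketch. The log line is made up for illustration (192.0.2.10 is a documentation-only address, and the PID is invented); your real entries will differ. Note that the date takes up two fields, the month and the day.

```shell
# Hypothetical sample line in the same layout as the messages files.
line='Nov 22 23:48:47 hostname123 sshd[1234]: Did not receive identification string from 192.0.2.10'

# awk splits on whitespace: $1 and $2 form the date, $3 is the time,
# $4 the hostname, $5 the generating application; the rest is the message.
fields=$(printf '%s\n' "$line" | awk '{
    print "date=" $1 " " $2
    print "time=" $3
    print "host=" $4
    print "app=" $5
}')
echo "$fields"
```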

Analysing Messages

We want useful information, and the logs all have the same structure. So, we can concatenate them:

$ cat messages* | less (press q to exit less; use the up and down arrow keys to scroll through the buffer).


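There's a problem, though. The dates ascend, then jump to an earlier date, then ascend again. This is because new log entries are added to the bottom of each file, while the shell lists the newest file (messages) before the older, rotated ones (messages.1, messages.2, and so on). So as the files are concatenated, the dates appear out of order.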

We can fix that with the power of command pipes:

$ tac messages* | less (not sure what tac does? Think about it, and look it up using man)
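If you want to convince yourself of the ordering before reading the man page, the sketch below builds two tiny stand-in log files in a temporary directory and prints only the day of each entry. The file contents are invented for the demonstration; it assumes GNU tac, which handles each file in turn.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical rotated logs: "messages" is the newest file, "messages.1" the older one.
printf 'Nov 3 older-a\nNov 4 older-b\n' > messages.1
printf 'Nov 5 newer-a\nNov 6 newer-b\n' > messages

# Collect the day column (field 2) on one line for easy comparison.
cat_days=$(cat messages* | awk '{print $2}' | paste -s -d ' ' -)
tac_days=$(tac messages* | awk '{print $2}' | paste -s -d ' ' -)

echo "cat: $cat_days"   # days jump backwards in the middle
echo "tac: $tac_days"   # days descend steadily
```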

Let's Get Funky

We want to manipulate specific fields using awk. This command uses whitespace as its default field separator:

$ tac messages* | awk '{print $1" "$2}' | less


Sweet! Let's clean it up a bit, so we only have one entry per date:

$ tac messages* | awk '{print $1" "$2}' | uniq | less (not sure what uniq does? Again, think about it, and look it up using man)
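One thing worth knowing about uniq is that it only collapses duplicates that sit next to each other. That is fine here because the log entries are already in date order, but the sketch below (with an invented, deliberately out-of-order date column) shows the difference from sort -u, which removes all duplicates at the cost of reordering the lines.

```shell
# Hypothetical date column with a repeated date that is NOT adjacent.
sample=$(printf 'Nov 22\nNov 22\nNov 20\nNov 22\n')

# uniq merges only neighbouring duplicates, so the last "Nov 22" survives.
adjacent_only=$(printf '%s\n' "$sample" | uniq)

# sort -u sorts first, so every duplicate is removed.
fully_unique=$(printf '%s\n' "$sample" | sort -u)

echo "uniq:"
echo "$adjacent_only"
echo "sort -u:"
echo "$fully_unique"
```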


Nice. What about if we're interested in a particular date?

$ tac messages* | grep "Nov 4"

$ tac messages* | grep "^Nov 4"

$ tac messages* | grep "^Nov[ ]*4"


The more complex-looking search strings are called "RegEx", short for "Regular Expressions", and they are very powerful ways of searching for data. Here, ^ anchors the match to the start of the line, and [ ]* matches any number of spaces.
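Why bother with three variants? The sketch below counts how many lines each pattern matches on three invented lines. It assumes the traditional syslog timestamp, which pads single-digit days with an extra space ("Nov  4"); if your logs use a different date format, the counts will come out differently.

```shell
# Hypothetical lines: a padded single-digit day, a different day,
# and a line that merely mentions "Nov 4" inside its message text.
sample=$(printf 'Nov  4 10:00:00 host app: padded day\nNov 14 10:00:00 host app: other day\nOct 31 10:00:00 host app: mentions Nov 4 in text\n')

# grep -c prints the number of matching lines; "|| true" covers the
# case where grep finds nothing and returns a non-zero status.
count_plain=$(printf '%s\n' "$sample" | grep -c "Nov 4" || true)
count_anchored=$(printf '%s\n' "$sample" | grep -c "^Nov 4" || true)
count_flexible=$(printf '%s\n' "$sample" | grep -c "^Nov[ ]*4" || true)

echo "plain:    $count_plain"      # matches only the Oct 31 line
echo "anchored: $count_anchored"   # matches nothing, the padding gets in the way
echo "flexible: $count_flexible"   # matches the real Nov 4 entry
```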

Now, let's check some suspect entries, e.g., "Did not receive identification string from ".

$ tac messages* | grep "identification string" | less


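Let's say we just want the date (fields 1 and 2), the time (field 3) and the remote IP, which is the last field. awk keeps the number of fields on the current line in NF, so $NF always refers to the last one: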

$ tac messages* | grep "identification string" | awk '{print $1" "$2" "$3" "$NF}' | less


We can make that even neater by introducing the tab escape character \t.

$ tac messages* | grep "identification string" | awk '{print $1" "$2"\t"$3"\t"$NF}' | less
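The reason for using $NF rather than a fixed field number is that messages differ in length. The following sketch prints the field count and the last field for two made-up lines (the addresses are documentation-only examples, and the PIDs are invented):

```shell
# Two hypothetical entries whose messages have different word counts.
long_line='Nov 22 23:48:47 hostname123 sshd[1234]: Did not receive identification string from 192.0.2.10'
short_line='Nov 22 23:50:02 hostname123 sshd[1235]: Connection closed by 192.0.2.20'

# NF is the number of fields on the line; $NF is the content of the last field.
rows=$(printf '%s\n%s\n' "$long_line" "$short_line" | awk '{print NF "\t" $NF}')
echo "$rows"
```

The field count changes from line to line, but $NF still picks out the address at the end of each.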

Report Time

We can use all of this to easily create a delicious looking report.

First, let's write a header:

$ echo "hostname123: Log entries from /var/log/messages" > report.txt


Now let's add what this part of the report is about:

$ echo "\"Did not receive identification string\":" >> report.txt (the >> appends to the already existing file; if you typed > then you've overwritten the previous message, so make sure you use >> to append!)


Now let's add the actual information to the report:

$ tac messages* | grep "identification string" | awk '{print $1" "$2"\t"$3"\t"$NF}' >> report.txt


Let's continue by adding a sorted list of unique IP addresses. We start with the header information so we know what this part is about:

$ echo "Unique IP addresses:" >> report.txt


Now let's add the actual unique information:

$ tac messages* | grep "identification string" | awk '{print $NF}' | sort -u >> report.txt

BAM! Awesome. As you can probably already tell, this lends itself really well to being completely automated in, say, a script. You could feed this information into a script that then continues with some automation tasks, such as taking the IP addresses and putting them through nslookup and whois queries to get more details about them.
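As a taste of what such a script might look like, here is a minimal sketch that wraps the report steps in a shell function. The function name make_report and the sample log lines are invented for illustration, and the addresses are documentation-only examples. It runs in a temporary directory so it cannot clobber your real report.txt, and it leaves out the nslookup and whois lookups because those need network access.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in logs: "messages" is newest, "messages.1" is older.
printf '%s\n' \
    'Nov 17 10:57:11 hostname123 sshd[201]: Did not receive identification string from 192.0.2.7' \
    'Nov 18 18:55:06 hostname123 sshd[202]: Did not receive identification string from 192.0.2.3' \
    > messages.1
printf '%s\n' \
    'Nov 20 14:13:11 hostname123 sshd[203]: Did not receive identification string from 192.0.2.7' \
    'Nov 21 09:00:00 hostname123 cron[204]: unrelated entry' \
    > messages

# make_report OUTPUT_FILE LOG_FILE...
# Writes the header, the matching entries and the unique IP list.
make_report() {
    out=$1
    shift
    echo "hostname123: Log entries from /var/log/messages" > "$out"
    echo "\"Did not receive identification string\":" >> "$out"
    tac "$@" | grep "identification string" \
        | awk '{print $1" "$2"\t"$3"\t"$NF}' >> "$out"
    echo "Unique IP addresses:" >> "$out"
    tac "$@" | grep "identification string" \
        | awk '{print $NF}' | sort -u >> "$out"
}

make_report report.txt messages*
cat report.txt
```

Keeping the steps in a function means the same commands can be rerun against a fresh set of logs just by changing the arguments.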


Anyway, now have a look at your report, which should look like this:

$ less report.txt

hostname123: Log entries from /var/log/messages
"Did not receive identification string":
Nov     22      23:48:47        19x.xx9.220.35
Nov     22      23:48:47        19x.xx9.220.35
Nov     20      14:13:11        200.xx.114.131
Nov     18      18:55:06        6x.x2.248.243
Nov     17      19:26:43        200.xx.72.129
Nov     17      10:57:11        2xx.71.188.192
Nov     17      10:57:11        2xx.71.188.192
Nov     17      10:57:11        2xx.71.188.192
Nov     11      16:37:29        6x.x44.180.27
Nov     11      16:37:29        6x.x44.180.27
Nov     11      16:37:24        6x.x44.180.27
Nov     5       18:56:00        212.xx.13.130
Nov     5       18:56:00        212.xx.13.130
Nov     5       18:56:00        212.xx.13.130
Nov     3       01:52:31        2xx.54.67.197
Nov     3       01:52:01        2xx.54.67.197
Nov     3       01:52:00        2xx.54.67.197
Oct     31      16:11:17        xx.192.39.131
Oct     31      16:11:17        xx.192.39.131
Oct     31      16:11:17        xx.192.39.131
Oct     31      16:10:51        xx.192.39.131
Oct     31      16:10:51        xx.192.39.131
Oct     31      16:10:51        xx.192.39.131
Oct     27      12:40:26        2xx.x48.210.129
Oct     27      12:40:26        2xx.x48.210.129
Oct     27      12:40:26        2xx.x48.210.129
Oct     27      12:39:38        2xx.x48.210.129
Oct     27      12:39:37        2xx.x48.210.129
Oct     27      12:39:37        2xx.x48.210.129
Oct     27      12:39:14        2xx.x48.210.129
Oct     27      12:39:13        2xx.x48.210.129
Oct     27      12:39:13        2xx.x48.210.129
Oct     27      12:39:02        2xx.x48.210.129
Oct     27      12:39:01        2xx.x48.210.129
Oct     27      12:39:01        2xx.x48.210.129
Unique IP addresses:
19x.xx9.220.35
200.xx.114.131
200.xx.72.129
212.xx.13.130
2xx.54.67.197
2xx.71.188.192
2xx.x48.210.129
6x.x2.248.243
6x.x44.180.27
xx.192.39.131

Neat.


Technically, you could even save your command history up until now to record what you did and how you did it:

$ history -w history.txt


Which you could then feed into the beginnings of a script (by default tail copies only the last 10 lines, so use -n if you want more):

$ tail history.txt > reportmaker.sh

But that's for another time.