Session 07 - Analysis of Tarballed Message Logs
Extracting a tar Archive
Assuming you're logged into your Linux machine, and you're in your home directory, change directory into your analysis directory (if you don't have it any more, create it with mkdir).
$ mkdir -p ~/analysis/logs
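Then change into it (the path below assumes the directory created by the command above):
$ cd ~/analysis/logs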
Ensure that you've transferred the logs.tar.gz file over to your machine and that the file lives in ~/analysis/logs.
Use the tar command to interact with tarballed files (they are archives that have had compression added to them, a bit like ZIP).
The tar command has a number of flags, such as:
- t - list archive contents
- z - decompress using gunzip
- v - verbose output
- f - apply actions to the file in question (e.g., logs.tar.gz)
Let's first examine the content of the archive (here, we'll also be chaining the flags without each having a separate - prefix; this only works on some commands, such as tar):
$ tar -tzvf logs.tar.gz
What you should get back is a listing of 5 log files (messages*), numbered from log rotation.
Does anything look dangerous? Once you've satisfied yourself, you can proceed to actually "untar" the archive:
$ tar xzvf logs.tar.gz
(as you might have already guessed, x stands for extract)
The 5 files you saw earlier should now be extracted in your current directory.
Look at one log entry from one of the files:
$ cat messages | head -n 1
(remember, if none of these commands mean anything to you, look them up with man)
Look at the format of the log files:
- Date
- Time
- Hostname
- Generating application
- Message
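To make that concrete, a line in this format might look something like the following (the generating application and PID here are assumptions made up for illustration; your actual entries will differ):
Nov 22 23:48:47 hostname123 sshd[2817]: Did not receive identification string from 19x.xx9.220.35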
Analysing Messages
We want useful information, and the logs all have the same structure. So, we can concatenate them:
$ cat messages* | less
(press q if you want to exit less; you can use your up and down arrow keys to page/scroll through the buffer).
There's a problem, though. The dates ascend, then jump to an earlier date, then ascend again. This is because newer entries are added to the bottom of each file, and log rotation means the current messages file holds the newest entries while the rotated files hold progressively older ones; so as the files are concatenated, the dates appear out of order.
We can fix that with the power of command pipes:
$ tac messages* | less
(not sure what tac does? Think about it, and look it up using man)
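Once you've had a guess, a quick way to check your understanding on something harmless (just a throwaway demonstration) is:
$ printf "1\n2\n3\n" | tac
3
2
1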
Let's Get Funky
We want to manipulate specific fields using awk. This command uses whitespace as its default field separator:
$ tac messages* | awk '{print $1" "$2}' | less
Sweet! Let's clean it up a bit, so we only have one entry per date:
$ tac messages* | awk '{print $1" "$2}' | uniq | less
(not sure what uniq does? Again, think about it, and look it up using man)
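One thing worth knowing: uniq only collapses duplicate lines that are adjacent to each other, which is exactly why it works here on the already-grouped dates. A throwaway demonstration:
$ printf "a\na\nb\na\n" | uniq
a
b
a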
Nice. What about if we're interested in a particular date?
$ tac messages* | grep "Nov 4"
$ tac messages* | grep ^"Nov 4"
$ tac messages* | grep ^"Nov[ ]*4
The more complex-looking search strings are "RegEx", short for "Regular Expressions", and they are a very powerful way of searching for data: here, ^ anchors the match to the start of the line, and [ ]* matches any number of spaces.
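One reason the [ ]* version is worth having: the traditional syslog timestamp pads single-digit days with an extra space (so the raw lines may read "Nov  4" rather than "Nov 4"), and a search for a single literal space can then come up empty. If your grep supports extended regular expressions (most do, via -E), an equivalent search would be:
$ tac messages* | grep -E "^Nov +4 " | less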
Now, let's check some suspect entries, e.g., "Did not receive identification string from":
$ tac messages* | grep "identification string" | less
Let's say we just want the date (fields 1 and 2), the time (field 3) and the remote IP (the last field, which awk gives us as $NF, for "number of fields"):
$ tac messages* | grep "identification string" | awk '{print $1" "$2" "$3" "$NF}' | less
We can make that even neater by introducing the tab escape character \t.
$ tac messages* | grep "identification string" | awk '{print $1" "$2"\t"$3"\t"$NF}' | less
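If all that string concatenation looks fiddly, awk's printf is an alternative way of writing the same thing (purely a matter of taste):
$ tac messages* | grep "identification string" | awk '{printf "%s %s\t%s\t%s\n", $1, $2, $3, $NF}' | less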
Report Time
We can use all of this to easily create a delicious-looking report.
First, let's write a header:
$ echo "hostname123: Log entries from /var/log/messages" > report.txt
Now let's add what this part of the report is about:
$ echo "\"Did not receive identification string\":" >> report.txt
(the >> appends to the already existing file; if you typed > then you've overwritten the previous content, so make sure you use >> to append!)
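If the difference between > and >> is new to you, a quick experiment with a throwaway file (demo.txt here is just an example name) makes it clear:
$ echo "first" > demo.txt
$ echo "second" >> demo.txt
$ cat demo.txt
first
second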
Now let's add the actual information to the report:
$ tac messages* | grep "identification string" | awk '{print $1" "$2"\t"$3"\t"$NF}' >> report.txt
Let's continue by adding a sorted list of unique IP addresses. We start with the header information so we know what this part is about:
$ echo "Unique IP addresses:" >> report.txt
Now let's add the actual unique information:
$ tac messages* | grep "identification string" | awk '{print $NF}' | sort -u >> report.txt
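As a side note (not needed for this report), if you also wanted to know how many times each address turns up, a small variation using uniq -c would count them for you:
$ tac messages* | grep "identification string" | awk '{print $NF}' | sort | uniq -c | sort -rn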
BAM! Awesome. As you can probably already tell, this lends itself really well to being completely automated in, say, a script. You could feed this information into a script that then continues with some automation tasks, such as taking the IP addresses and putting them through nslookup and whois queries to get more details about them.
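As a very rough sketch of what that could look like (treat it as a starting point, and note it assumes nslookup and whois are installed on your machine):
$ tac messages* | grep "identification string" | awk '{print $NF}' | sort -u | while read -r ip; do echo "== $ip =="; nslookup "$ip"; whois "$ip"; done | less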
Anyway, now have a look at your report, which should look like this:
$ less report.txt
hostname123: Log entries from /var/log/messages
"Did not receive identification string":
Nov 22 23:48:47 19x.xx9.220.35
Nov 22 23:48:47 19x.xx9.220.35
Nov 20 14:13:11 200.xx.114.131
Nov 18 18:55:06 6x.x2.248.243
Nov 17 19:26:43 200.xx.72.129
Nov 17 10:57:11 2xx.71.188.192
Nov 17 10:57:11 2xx.71.188.192
Nov 17 10:57:11 2xx.71.188.192
Nov 11 16:37:29 6x.x44.180.27
Nov 11 16:37:29 6x.x44.180.27
Nov 11 16:37:24 6x.x44.180.27
Nov 5 18:56:00 212.xx.13.130
Nov 5 18:56:00 212.xx.13.130
Nov 5 18:56:00 212.xx.13.130
Nov 3 01:52:31 2xx.54.67.197
Nov 3 01:52:01 2xx.54.67.197
Nov 3 01:52:00 2xx.54.67.197
Oct 31 16:11:17 xx.192.39.131
Oct 31 16:11:17 xx.192.39.131
Oct 31 16:11:17 xx.192.39.131
Oct 31 16:10:51 xx.192.39.131
Oct 31 16:10:51 xx.192.39.131
Oct 31 16:10:51 xx.192.39.131
Oct 27 12:40:26 2xx.x48.210.129
Oct 27 12:40:26 2xx.x48.210.129
Oct 27 12:40:26 2xx.x48.210.129
Oct 27 12:39:38 2xx.x48.210.129
Oct 27 12:39:37 2xx.x48.210.129
Oct 27 12:39:37 2xx.x48.210.129
Oct 27 12:39:14 2xx.x48.210.129
Oct 27 12:39:13 2xx.x48.210.129
Oct 27 12:39:13 2xx.x48.210.129
Oct 27 12:39:02 2xx.x48.210.129
Oct 27 12:39:01 2xx.x48.210.129
Oct 27 12:39:01 2xx.x48.210.129
Unique IP addresses:
19x.xx9.220.35
200.xx.114.131
200.xx.72.129
212.xx.13.130
2xx.54.67.197
2xx.71.188.192
2xx.x48.210.129
6x.x2.248.243
6x.x44.180.27
xx.192.39.131
Neat.
Technically, you could even save your command history up until now to record what you did and how you did it:
$ history -w history.txt
Which you could then feed into the beginnings of a script:
$ tail history.txt > reportmaker.sh
But that's for another time.