Session 09 - Standard Linux Tools - Gathering System Information - EXERCISE
Requirements: access_log.tgz (on Blackboard)
- Get the file access_log.tgz
- access_log.tgz is a tarred and compressed file. Use the file command to verify this.
Click to reveal answer
$ file access_log.tgz
access_log.tgz: gzip compressed data, last modified: Thu Apr 26 07:50:09 2007, from Unix, original size modulo 2^32 768000
- Use tar to extract and decompress the file.
Click to reveal answer
$ tar tzf access_log.tgz (list the archive contents first)
access_log
$ tar xzf access_log.tgz (extract)
$ ls -l
-rw-r--r-- 1 user users 756899 Apr 26 2007 access_log
-rw-r--r-- 1 user users 36991 Nov 28 12:41 access_log.tgz
- access_log is a log file from an Apache web server. Where would this file normally be located?
Click to reveal answer
Typically in /var/log/apache2/ (Debian/Ubuntu) or /var/log/httpd/ (Red Hat/CentOS).
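The tar flags used above are t/x (list/extract), z (gzip compression), and f (archive file name); c creates an archive. A quick round trip on a throwaway file (file names here are made up for the demo) shows them together:

```shell
echo hello > demo.txt
tar czf demo.tgz demo.txt   # c: create a gzip-compressed archive
tar tzf demo.tgz            # t: list the contents (prints demo.txt)
rm demo.txt
tar xzf demo.tgz            # x: extract demo.txt back from the archive
cat demo.txt                # prints: hello
rm demo.txt demo.tgz
```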
- Is it text or binary? Use file to find out.
Click to reveal answer
$ file access_log
- What is the size of the file in bytes?
Click to reveal answer
756899
- How many lines are in this file?
Click to reveal answer
$ wc -l access_log
4224 access_log
- Look in the file to get some idea of the contents. Use at least three different commands.
Click to reveal answer
$ less access_log (exit less by pressing 'q')
$ cat access_log
$ head access_log
$ tail access_log
- Over what period of time did the web server use this file?
- Look at the beginning of the file (head)
- Look at the end of the file (tail)
Click to reveal answer
$ head access_log (note the top-most date and time)
$ tail access_log (note the bottom-most date and time)
- What is the first column of the log file? How are the columns separated?
Click to reveal answer
The first column is the IP address of the client; the columns are separated by spaces.
- Display the first column on your screen using cut
Click to reveal answer
$ cut -d " " -f1 access_log
145.99.163.83
145.99.163.83
62.234.138.228
213.84.219.101
...
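To see how cut's -d (delimiter) and -f (field) flags behave, here is a small self-contained example using invented log-style lines rather than the real access_log:

```shell
# Two made-up lines; fields are separated by single spaces
printf '%s\n' \
  '10.0.0.1 - - [10/Apr/2006:19:09:11 +0200] "GET / HTTP/1.1" 200 512' \
  '10.0.0.2 - - [10/Apr/2006:19:09:12 +0200] "GET /a HTTP/1.1" 200 256' \
  | cut -d ' ' -f1
# prints:
# 10.0.0.1
# 10.0.0.2
```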
- Display and sort the first column
Click to reveal answer
$ cut -d " " -f1 access_log | sort
128.30.52.13
128.30.52.13
128.30.52.13
128.30.52.34
128.30.52.34
...
- Reuse the previous command and add less to the command chain
Click to reveal answer
$ cut -d " " -f1 access_log | sort | less (exit less by pressing 'q')
- Use uniq to filter out duplicates in the first column
Click to reveal answer
$ cut -d " " -f1 access_log | sort | uniq
128.30.52.13
128.30.52.34
131.211.84.18
133.27.228.132
...
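Note that uniq only collapses *adjacent* duplicate lines, which is why the sort step comes first in the pipeline above. A minimal demonstration on sample data:

```shell
printf 'a\nb\na\n' | uniq          # prints three lines: a, b, a (the two a's are not adjacent)
printf 'a\nb\na\n' | sort | uniq   # prints two lines: a, b
```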
- Reuse the previous command and add less to the command chain
Click to reveal answer
$ cut -d " " -f1 access_log | sort | uniq | less (exit less by pressing 'q')
- How many unique IP addresses are there in the first column?
Click to reveal answer
$ cut -d " " -f1 access_log | sort | uniq | wc -l
117
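As an aside, sort -u combines the sort and uniq steps, so the same count can be obtained with one fewer command in the pipe. A self-contained demonstration on invented data:

```shell
# sort -u sorts and removes duplicates in one step
printf '%s\n' b a b c a | sort -u | wc -l   # 3 unique lines: a, b, c
```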
- Use less again to look at the access_log file, and see if you can spot entries that include a mention of 'Googlebot'.
Click to reveal answer
Example:
66.249.65.115 - - [10/Apr/2006:19:09:11 +0200] "GET /robots.txt HTTP/1.1" 404 296 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.115 - - [10/Apr/2006:19:09:11 +0200] "HEAD / HTTP/1.1" 200 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
- This web server has been regularly visited by the Googlebot. Use grep to find the lines which contain the string 'Googlebot'
Click to reveal answer
$ grep Googlebot access_log
or:
$ grep -i googlebot access_log (what does -i do?)
- When did the Googlebot first visit?
Click to reveal answer
$ grep -i googlebot access_log | head
- When did the Googlebot last visit?
Click to reveal answer
$ grep -i googlebot access_log | tail
- From how many different IP addresses did the Googlebot visit the web server? Solve it in one command using pipes, or use temporary files. Use sort, grep, cut, wc, and uniq. The cut command needs the flags -f and -d.
Click to reveal answer
$ grep -i googlebot access_log | cut -d " " -f1 | sort | uniq | wc -l
73
- BONUS: Create a command chain (aka a "one-liner") that displays the top 10 most used Googlebot IP addresses.
Click to reveal answer
$ grep -i googlebot access_log | cut -d " " -f1 | sort | uniq -c | sort -n -r | head
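To make the uniq -c / sort -n -r combination concrete, here is the same counting pipeline run on a few invented lines (sample data, not from access_log): uniq -c prefixes each unique line with its occurrence count, and sort -n -r orders those counts numerically, highest first.

```shell
printf '%s\n' \
  '1.2.3.4 - - "GET / HTTP/1.1"' \
  '1.2.3.4 - - "GET /a HTTP/1.1"' \
  '5.6.7.8 - - "GET /b HTTP/1.1"' \
  | cut -d ' ' -f1 | sort | uniq -c | sort -n -r | head
# prints the count followed by the address, most frequent first:
#   2 1.2.3.4
#   1 5.6.7.8
```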