Standard Linux Tools - Working with hashes
Introduction
These notes cover some new commands and try to combine them using pipes (|). We'll generate a list of hashes and use this list to match files. Advanced topics in this session:
- Command substitution
- Return value of commands
- Conditional execution of commands
Working with Hashes
A hash is basically a (very big) number which can be used to identify files. It is sometimes referred to as the fingerprint of a file. There are far more possible files than possible hash values, so in theory different files must exist which map to the same hash (a collision). We'll use two types of hashes, although both are outdated and shouldn't be used for anything serious in forensics: Message Digest 5 (MD5), which is 128 bits long, and Secure Hash Algorithm 1 (SHA1), which is 160 bits long.
md5sum
To calculate the MD5 hash of a file, use:
$ md5sum myfile
To calculate the MD5 hashes of all files in a directory:
$ md5sum *
To keep the results:
$ md5sum * > hashes.txt
To keep results without filenames:
$ md5sum * | cut -c1-32 > hashes.txt
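As a small sketch of what these commands produce, the following creates three throwaway files in a temporary directory (the names a.txt, b.txt and c.txt are made up for illustration). It shows that identical content gives identical hashes, and that the hash occupies the first 32 characters of each output line, which is why cut -c1-32 drops the filename.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'hello\n' > a.txt
printf 'hello\n' > b.txt      # same content as a.txt
printf 'world\n' > c.txt      # different content

# Full output: one "<hash>  <filename>" line per file.
md5sum a.txt b.txt c.txt

# Keep only the hash (characters 1-32 of each line).
hash_a=$(md5sum a.txt | cut -c1-32)
hash_b=$(md5sum b.txt | cut -c1-32)
hash_c=$(md5sum c.txt | cut -c1-32)

echo "a.txt: $hash_a"
echo "b.txt: $hash_b"
echo "c.txt: $hash_c"
echo "hex characters in an MD5 hash: ${#hash_a}"
```

An MD5 hash is 128 bits, which is 32 hexadecimal characters; that is where the 32 in cut -c1-32 comes from.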
sha1sum
To calculate the SHA1 hash of a file:
$ sha1sum myfile
To calculate the SHA1 hashes of all files in a directory:
$ sha1sum *
To keep the results:
$ sha1sum * > hashes.txt
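The same trick for dropping filenames works with sha1sum, but the hash is longer. A minimal sketch, assuming a made-up file sample.txt in a temporary directory: a SHA1 hash is 160 bits, so it takes 40 hexadecimal characters and the cut range has to change accordingly.

```shell
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/sample.txt"

# MD5: 32 hex characters, SHA1: 40 hex characters.
md5_hash=$(md5sum "$workdir/sample.txt" | cut -c1-32)
sha1_hash=$(sha1sum "$workdir/sample.txt" | cut -c1-40)

echo "MD5  (${#md5_hash} characters): $md5_hash"
echo "SHA1 (${#sha1_hash} characters): $sha1_hash"

rm -r "$workdir"
```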
Matching Files using MD5
We will try to match some files based on hash. There are many more programs which can do this. However, we'll see many tricks which can be done in the shell. Also, it's a nice opportunity to show how to combine some of the commands we already learned.
Intermezzo - find
When investigating many files, the find command may come in handy. We've already seen some examples of using find. Now we look at the -printf action of find (available in GNU find). -printf takes a format string and prints information about every file it encounters. The format string determines what information about the file will be printed; for example, %p stands for the file's path and %f for its name without leading directories. The format string may also contain arbitrary characters, which will be printed verbatim.
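To make the format string concrete, here is a short sketch, assuming GNU find and two made-up files in a temporary directory. The words name=, path= and size= are arbitrary text printed verbatim; %f, %p and %s are replaced per file.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/sub"
printf 'abc'    > "$workdir/one.txt"       # 3 bytes
printf 'abcdef' > "$workdir/sub/two.txt"   # 6 bytes

# %f = name without leading directories, %p = path as find sees it,
# %s = size in bytes, \n = newline (find does not add one itself).
listing=$(cd "$workdir" && find . -type f -printf 'name=%f path=%p size=%s\n' | sort)
echo "$listing"

rm -r "$workdir"
```

The sort is only there to make the order of the lines predictable; find itself does not promise any particular order.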
Intermezzo - command substitution
Command substitution replaces a command with its output (we saw this earlier in the course with the use of back ticks (``)). The syntax looks as follows:
$(command)
Substitution is used to execute a command "in place" and reuse the output at the position of the original command. For example:
Check output of: $ which md5sum
Then run: $ ls -l $(which md5sum)
The shell first expands the above to ls -l /usr/bin/md5sum (the path may differ on your system), and then executes it.
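A small sketch comparing the two spellings of command substitution. It uses basename and dirname on a fixed example path, so it does not depend on where md5sum is installed; the variable names are made up for illustration.

```shell
# $(...) and back ticks both put the command's output in its place.
name_new=$(basename /usr/bin/md5sum)
name_old=`basename /usr/bin/md5sum`
echo "new style: $name_new, old style: $name_old"

# $(...) can be nested without any escaping, which is awkward with back ticks.
parent=$(basename "$(dirname /usr/bin/md5sum)")
echo "parent directory: $parent"
```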
Intermezzo - return values of commands
Every command returns a value to the shell when it finishes execution. For example, grep returns 0 when it finds a match and 1 when it does not find a match. Consult the man page to find out what a command returns under which condition.
You can't "see" the return value. It's not printed on the terminal. You can, however, ask the shell what the return value of the last command was. It is stored in the variable $?. To see the value of this variable, use the following echo command:
$ echo $?
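As a sketch of how $? behaves, the following runs grep three times against a made-up file users.txt (a stand-in for /etc/passwd) and saves the return value each time. Note that $? is overwritten by every command, so it has to be read immediately after the command of interest.

```shell
set +e   # assumption: make sure a non-zero return value does not abort the script

workdir=$(mktemp -d)
printf 'root:x:0:0\nalice:x:1000:1000\n' > "$workdir/users.txt"

grep -q root "$workdir/users.txt"
status_match=$?        # 0: a match was found

grep -q nosuchuser "$workdir/users.txt"
status_nomatch=$?      # 1: no match

grep -q root "$workdir/does-not-exist" 2>/dev/null
status_error=$?        # greater than 1: an error occurred (2 with GNU grep)

echo "match=$status_match nomatch=$status_nomatch error=$status_error"
rm -r "$workdir"
```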
Intermezzo - conditional execution
The return value of a command can be used to conditionally execute a subsequent command:
$ cmd1 && cmd2
cmd2 will only be executed if cmd1 returns 0.
Example using grep and echo:
$ grep -q root /etc/passwd && echo Match
If grep finds a match, it returns 0. echo Match only executes when the previous command returns 0. Therefore, echo Match only runs when grep finds a match.
Consult the man page to find out what -q does.
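To see both outcomes side by side, here is a minimal sketch using a made-up file users.txt instead of /etc/passwd. Each echo appends to a log file (also an assumption of this sketch) so the result can be inspected afterwards: only the search that succeeds leaves a line behind.

```shell
workdir=$(mktemp -d)
users="$workdir/users.txt"    # hypothetical stand-in for /etc/passwd
log="$workdir/log.txt"
printf 'root:x:0:0\nalice:x:1000:1000\n' > "$users"
touch "$log"

# grep returns 0 here, so echo runs.
grep -q root "$users" && echo "Match root" >> "$log"

# grep returns 1 here, so echo is skipped.
grep -q nosuchuser "$users" && echo "Match nosuchuser" >> "$log"

log_contents=$(cat "$log")
echo "$log_contents"
rm -r "$workdir"
```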
Combining what we've learned
Consider the following:
$ grep -q $(md5sum img001.jpg | cut -c1-32) hashes.txt && echo "Match"
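If you want to try this command without real images, the following sketch sets up a temporary directory with two stand-in files (their contents and the name img002.jpg are made up) and a hashes.txt that lists only the first one. It then runs the command once per file and collects what was printed in results.txt, a file name chosen just for this sketch.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in files; real images are not needed to try the command.
printf 'known content\n'   > img001.jpg
printf 'unknown content\n' > img002.jpg

# hashes.txt contains only the hash of img001.jpg.
md5sum img001.jpg | cut -c1-32 > hashes.txt

# The same command, once for a file that is in the list and once for one that is not.
grep -q $(md5sum img001.jpg | cut -c1-32) hashes.txt && echo "Match img001.jpg" >> results.txt
grep -q $(md5sum img002.jpg | cut -c1-32) hashes.txt && echo "Match img002.jpg" >> results.txt

cat results.txt
```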
What does it do?
- md5sum img001.jpg: calculates the hash
- | cut -c1-32: keeps only the first 32 characters of the output of md5sum (the hash)
- So, this expands to "grep -q <hash_of_img001.jpg> hashes.txt && echo Match" (we've already seen what this does)
Combining with find
The previous example works fine for one file. But maybe we have thousands of files scattered all around the place. Let's combine it with find.
Consider the following:
$ find . -type f -printf 'grep -q $(md5sum %p | cut -c1-32) hashes.txt && echo Match %p \n'
This prints commands like the one we've seen above, only with different filenames. Note the use of single quotes, as opposed to double quotes, around the -printf format string. They prevent the shell from interpreting the $() construct too early.
So, now we have a bunch of printed commands on our screen. But they don't do anything; they just sit there. We have to make the commands run.
Combining commands
We have seen that the output of one command can be used as input to another command using a pipe (|). Remember cmd1 | cmd2. Of course, what find prints is the output of find. Could we send this output to a shell and make it more useful?
Yes, we can, and the shell will do what it normally does if you type a command and press enter. So, we can just generate a bunch of commands with find and send them to bash. Remember, bash is just a program interpreting strings.
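The idea that a command is only text until a shell reads it can be sketched in a few lines. The variable names here are made up for illustration.

```shell
# A command stored as plain text.
generated='echo hello from a generated command'

# Printing it just shows the text; nothing is executed.
echo "$generated"

# Piping the same text into bash makes bash read and run it.
result=$(echo "$generated" | bash)
echo "$result"
```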
So, let's try to pipe the output of find to a shell (in this case, bash):
$ find . -type f -printf 'grep -q $(md5sum %p | cut -c1-32) hashes.txt && echo Match %p \n' | bash
Of course, this is much slower than all the other tools which you could use to match files with hashes. The point here is to show you the power of combining commands into command chains.
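To watch both stages on a small scale, here is a sketch that assumes GNU find and a made-up directory tree called photos inside a temporary directory. The first find only prints the generated commands; the second pipes them into bash. The list of known hashes is kept outside the searched tree, and the result is written to matches.txt, a name chosen for this sketch.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir -p photos/2023
printf 'cat picture\n' > photos/cat.jpg
printf 'dog picture\n' > photos/2023/dog.jpg
printf 'notes\n'       > photos/notes.txt

# Known hashes: the two pictures, but not notes.txt.
md5sum photos/cat.jpg photos/2023/dog.jpg | cut -c1-32 > hashes.txt

# Stage 1: find only prints one command line per file.
find photos -type f -printf 'grep -q $(md5sum %p | cut -c1-32) hashes.txt && echo Match %p \n'

# Stage 2: the same lines are piped into bash, which runs them.
find photos -type f -printf 'grep -q $(md5sum %p | cut -c1-32) hashes.txt && echo Match %p \n' | bash | sort > matches.txt

cat matches.txt
```

Because the filenames are pasted into the generated commands as plain text, names containing spaces or shell metacharacters would break this approach or be interpreted by bash, so it is best tried on files with simple names.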
Taking it further
We can take the previous example even further (you need to ensure that ~/matchedfiles/ exists):
$ find . -type f -printf '(grep -q $(md5sum %p | cut -c1-32) hashes.txt && mv %p ~/matchedfiles/%f) || echo No Match %p \n' | bash
- Note the use of ||: only execute the next command if the previous command returned a non-zero value (failed)
- Note the use of %f: print the filename without the leading pathname
- Note the use of mv: move the matched files to the directory ~/matchedfiles/
- Note the use of (): make the 2 commands act as one
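The following sketch tries the same idea in a temporary directory, so nothing in your home directory is touched. It assumes GNU find, a made-up tree called photos, and a local matchedfiles directory in place of ~/matchedfiles/. Only cat.jpg is in the hash list, so it should be moved, while the other two files are reported as not matching.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir -p photos/2023 matchedfiles
printf 'cat picture\n' > photos/cat.jpg
printf 'dog picture\n' > photos/2023/dog.jpg
printf 'notes\n'       > photos/notes.txt

# Only cat.jpg is a known file.
md5sum photos/cat.jpg | cut -c1-32 > hashes.txt

# (grep && mv) acts as one command; echo runs only if that group fails.
find photos -type f -printf '(grep -q $(md5sum %p | cut -c1-32) hashes.txt && mv %p matchedfiles/%f) || echo No Match %p \n' | bash | sort > report.txt

cat report.txt      # files that did not match
ls matchedfiles     # files that were moved
```

Note that the group also counts as failed when grep finds a match but mv fails, so "No Match" would be printed in that case as well.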