Standard Linux Tools - Working with hashes

Introduction

These notes introduce some new commands and combine them using pipes (|). We'll generate a list of hashes and use this list to match files. Advanced topics in this session:

  • Command substitution
  • Return value of commands
  • Conditional execution of commands

Working with Hashes

A hash is basically a (very big) number which can be used to identify a file; it is sometimes referred to as the fingerprint of a file. There are many more possible files than possible hash values, so in theory there have to be different files which map to the same hash (a collision). We'll use two types of hashes, Message Digest 5 (MD5) and Secure Hash Algorithm 1 (SHA1), though both are outdated and shouldn't be used for anything serious in forensics.

md5sum

To calculate the MD5 hash of a file, use:

$ md5sum myfile


To calculate the MD5 hashes of all files in a directory:

$ md5sum *


To keep the results:

$ md5sum * > hashes.txt


To keep only the hashes, without filenames (md5sum prints each hash as 32 hexadecimal characters at the start of the line):

$ md5sum * | cut -c1-32 > hashes.txt
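To make the output format concrete, here is a small sketch that can be run safely in a throwaway directory. The file name sample.txt and its content are made up for illustration.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample file containing the 6 bytes "hello\n".
printf 'hello\n' > sample.txt

# Full md5sum output: 32 hex digits, two spaces, then the filename.
full_line=$(md5sum sample.txt)
echo "$full_line"

# cut -c1-32 keeps characters 1 to 32 of each line: the hash itself.
hash_only=$(md5sum sample.txt | cut -c1-32)
echo "$hash_only"
```

The second line printed is the first one with the filename part removed, which is the form stored in hashes.txt above.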

sha1sum

To calculate the SHA1 hash of a file:

$ sha1sum myfile


To calculate the SHA1 hashes of all files in a directory:

$ sha1sum *


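To keep the results (note that this overwrites hashes.txt if it already holds the MD5 hashes from the previous section):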

$ sha1sum * > hashes.txt
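The two tools differ in the length of the hash they print, which matters if you want to cut the filename off. A minimal sketch, again using a made-up sample.txt in a temporary directory:

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'hello\n' > sample.txt   # hypothetical sample file

# MD5 is 128 bits = 32 hex digits; SHA1 is 160 bits = 40 hex digits.
md5_hash=$(md5sum sample.txt | cut -c1-32)
sha1_hash=$(sha1sum sample.txt | cut -c1-40)

# ${#var} expands to the number of characters in var.
echo "MD5  (${#md5_hash} hex digits): $md5_hash"
echo "SHA1 (${#sha1_hash} hex digits): $sha1_hash"
```

So the SHA1 counterpart of the "without filenames" command uses cut -c1-40 instead of cut -c1-32.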

Matching Files using MD5

We will try to match some files based on hash. There are many more programs which can do this. However, we'll see many tricks which can be done in the shell. Also, it's a nice opportunity to show how to combine some of the commands we already learned.

Intermezzo - find

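When investigating many files, the find command may come in handy. We've already seen some examples of using find. Now we look at the -printf action of find (available in GNU find, the version found on Linux). -printf takes a format string and prints information about every file find encounters. Directives in the format string, such as %p (the path of the file) and %f (the filename without leading directories), determine what information about the file is printed. The format string may also contain arbitrary characters, which are printed verbatim. Unlike -print, -printf does not add a newline by itself, so the format usually ends in \n.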
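As an illustration of -printf, the sketch below builds a tiny directory tree and prints three pieces of information per file. The directory and file names are invented for the example; %s (size in bytes) is another GNU find directive.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical test tree with two small files.
mkdir -p photos
printf 'abc' > photos/a.txt
printf 'hello\n' > b.txt

# %p = path as find sees it, %f = name without leading directories,
# %s = size in bytes. The words "path=", "name=" and "size=" are
# printed verbatim; \n ends each output line.
listing=$(find . -type f -printf 'path=%p name=%f size=%s\n' | sort)
echo "$listing"
```

Each file produces one line, for example path=./photos/a.txt name=a.txt size=3.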

Intermezzo - command substitution

Command substitution replaces a command by its output (we saw this earlier in the course in the form of back ticks (``)). The syntax looks as follows:

$(command)

Substitution is used to execute a command "in place" and reuse the output at the position of the original command. For example:

Check output of: $ which md5sum


Then run: $ ls -l $(which md5sum)


The shell first expands the above to ls -l /usr/bin/md5sum (or wherever md5sum is installed on your system) and then executes the resulting command.
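Command substitution is also commonly used to store output in a variable. The following is a small sketch of that, with a made-up sample.txt; it also shows the back tick form next to $( ) and one nested substitution.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one\ntwo\nthree\n' > sample.txt   # hypothetical sample file

# The inner command runs first; its output replaces the $( ) expression.
line_count=$(wc -l < sample.txt)
echo "sample.txt has $line_count lines"

# The older back tick form gives the same result in simple cases.
line_count_bt=`wc -l < sample.txt`

# $( ) can be nested without extra escaping: the inner dirname runs
# first, then basename works on its output.
parent_name=$(basename "$(dirname "$workdir/sample.txt")")
echo "parent directory: $parent_name"
```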

Intermezzo - return values of commands

Every command returns a value to the shell when it finishes execution (its exit status). By convention, 0 means success and any other value means failure. For example, grep returns 0 when it finds a match, 1 when it does not find a match, and 2 when an error occurred. Consult the man page to find out what a command returns under which condition.

You can't "see" the return value. It's not printed on the terminal. You can, however, ask the shell what the return value of the last command was. It is stored in the variable $?. To see the value of this variable use the following echo command:

$ echo $?
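A short sketch of reading $? after grep, using an invented words.txt. It assumes GNU grep, which documents the return values 0, 1 and 2 used here.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'alpha\nbeta\n' > words.txt   # hypothetical input file

grep beta words.txt > /dev/null
status_match=$?      # save it right away: the next command overwrites $?

grep gamma words.txt > /dev/null
status_nomatch=$?

# Searching a file that does not exist is an error for grep.
grep beta no_such_file.txt > /dev/null 2>&1
status_error=$?

echo "match: $status_match, no match: $status_nomatch, error: $status_error"
```

Note that $? always refers to the most recent command, so even an echo in between replaces the value you were interested in.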

Intermezzo - conditional execution

The return value of a command can be used to conditionally execute a subsequent command:

$ cmd1 && cmd2

cmd2 will only be executed if cmd1 returns 0 (success).


Example using grep and echo:

$ grep -q root /etc/passwd && echo Match


If grep finds a match it returns 0. The command after &&, echo Match, only executes when the previous command returns 0. Therefore, echo Match only runs when grep finds a match.

Consult the man page to find out what -q does.
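The same pattern can be tried on a file you control, so that both outcomes are visible. This is a sketch with an invented words.txt; the output of each command chain is captured in a variable only to make the difference easy to inspect.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'alpha\nbeta\n' > words.txt   # hypothetical input file

# "beta" is in the file: grep returns 0, so echo runs.
found=$(grep -q beta words.txt && echo Match)

# "gamma" is not in the file: grep returns 1, so echo is skipped.
missing=$(grep -q gamma words.txt && echo Match)

echo "beta:  '$found'"
echo "gamma: '$missing'"
```

The first variable holds the word Match, the second stays empty.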

Combining what we've learned

Consider the following:

$ grep -q $(md5sum img001.jpg | cut -c1-32) hashes.txt && echo "Match"

What does it do?

  • md5sum img001.jpg: calculates the hash of img001.jpg
  • | cut -c1-32: keeps only the first 32 characters of the output of md5sum (the hash) and drops the filename
  • So, this expands to "grep -q <hash_of_img001.jpg> hashes.txt && echo Match" (we've already seen what this does)
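The steps above can be tried end to end in a sandbox. In this sketch the two "images" are just small text files with invented content, and hashes.txt is built so that it contains the hash of only one of them.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-ins for real image files.
printf 'picture one\n' > img001.jpg
printf 'picture two\n' > img002.jpg

# The hash list contains only the hash of img001.jpg.
md5sum img001.jpg | cut -c1-32 > hashes.txt

# img001.jpg is in the list, so grep returns 0 and echo runs.
result_known=$(grep -q "$(md5sum img001.jpg | cut -c1-32)" hashes.txt && echo "Match")

# img002.jpg is not in the list, so nothing is printed.
result_unknown=$(grep -q "$(md5sum img002.jpg | cut -c1-32)" hashes.txt && echo "Match")

echo "img001.jpg: '$result_known'"
echo "img002.jpg: '$result_unknown'"
```

The inner $( ) is wrapped in double quotes here, which is a good habit but not required for a plain 32-character hash.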

Combining with find

The previous example works fine for one file. But maybe we have thousands of files all scattered around the place. Let's combine it with find.

Consider the following:

$ find . -type f -printf 'grep -q $(md5sum %p | cut -c1-32) hashes.txt && echo Match %p \n'

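This prints commands like the one we've seen above, one per file, each with a different filename in place of %p. Note the use of single quotes, as opposed to double quotes, around the -printf format string. They prevent the shell from interpreting the $() construct too early: with double quotes, the shell would run md5sum on the literal name %p before find even starts.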

So, now we have a bunch of printed commands on our screen. But it doesn't do anything, it just sits there. We have to make the commands run.

Combining commands

We have seen that output of one command can be used as input to another command using a pipe (|). Remember cmd1 | cmd2. Of course, what find prints is the output of find. Could we send this output to a shell and make it more useful?

Yes, we can, and the shell will do what it normally does if you type a command and press enter. So, we can just generate a bunch of commands with find and send them to bash. Remember, bash is just a program interpreting strings.

So, let's try to pipe the output of find to a shell (in this case, bash):

$ find . -type f -printf 'grep -q $(md5sum %p | cut -c1-32) hashes.txt && echo Match %p \n' | bash

Of course, this is much slower than the dedicated tools which you could use to match files with hashes, since it starts md5sum, cut and grep separately for every file. The point here is to show you the power of combining commands into command chains.
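To see the difference between printing the commands and running them, the pipeline can be tried on a small test tree. This is a sketch under a few assumptions: GNU find is available, all file and directory names are invented, and the names contain no spaces or special characters.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical test tree: two files with identical content, one different.
mkdir -p files/sub
printf 'known content\n' > files/known.bin
printf 'other content\n' > files/sub/other.bin
printf 'known content\n' > files/sub/copy_of_known.bin

# The hash list lives outside the searched tree and holds one hash.
md5sum files/known.bin | cut -c1-32 > hashes.txt

# Step 1: find only prints the generated commands, nothing is run yet.
find files -type f -printf 'grep -q $(md5sum %p | cut -c1-32) hashes.txt && echo Match %p \n'

# Step 2: the same output piped to bash, which runs each line.
matches=$(find files -type f -printf 'grep -q $(md5sum %p | cut -c1-32) hashes.txt && echo Match %p \n' | bash | sort)
echo "$matches"
```

Both files with the known content are reported, the third is not. Because the filenames are pasted into the command text unquoted, a name containing spaces or shell metacharacters would be misread by bash, so this approach is best kept to trees whose filenames you trust.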

Taking it further

We can take the previous example even further (you need to ensure that ~/matchedfiles/ exists):

$ find . -type f -printf '(grep -q $(md5sum %p | cut -c1-32) hashes.txt && mv %p ~/matchedfiles/%f) || echo No Match %p \n' | bash

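  • Note the use of ||: Only execute the command after it if the previous command returned a non-zero value (here: grep found no match, or mv failed)
  • Note use of %f: Print filename without the leading pathname
  • Note use of mv: Move the matched files to directory ~/matchedfiles/
  • Note the use of (): Group the 2 commands (they run in a subshell) so that they act as one and || applies to the result of the whole group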
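The interplay of (), && and || can be checked without moving any files. In the sketch below, the shell builtins true and false stand in for a grep that matches or does not match, and echo moved stands in for mv; these stand-ins are assumptions made purely for illustration.

```shell
# Pattern: ( A && B ) || C
# C runs whenever the group as a whole returns non-zero.

# A succeeds, B succeeds: the group returns 0, C is skipped.
r1=$( (true && echo moved) || echo "No Match" )

# A fails: B is skipped, the group returns non-zero, C runs.
r2=$( (false && echo moved) || echo "No Match" )

# A succeeds but B fails: the group still returns non-zero, so C runs
# as well. In the find example this is the "mv failed" case.
r3=$( (true && false) || echo "No Match" )

echo "r1=$r1"
echo "r2=$r2"
echo "r3=$r3"
```

The third case shows that "No Match" can also be printed when grep did match but the move itself failed, for example because ~/matchedfiles/ does not exist.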