Session 10 - How to create your own "junkfile" for data carving practice
Requirements: Any Linux distribution will do.
Introduction
At the end of the Session 07 lab, there's a mention of how to create your own file to practice carving. These instructions will show you how to do it.
Concept
You can go back to Session 07 and have a look through it again, but here's a brief refresher.
Consider the following mental image of a "junkfile". This might be a file in its own right, or it might be part of a data dump from a damaged device where, depending on the filesystem, the information about where the file starts and ends has been destroyed. Without that information, you need to rely on your understanding of hex, file type headers and footers, filesystems, etc. to carve the data manually out of the dump.
So, here the green represents random, uninteresting data, and the yellow represents the data we're after. In this case, we're after a JPEG picture. As you already know, JPEGs have their own specific header and footer, ffd8 and ffd9 respectively, represented in this picture by the smaller red blocks.
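To see these markers on a real file, here's a minimal sketch that prints the first two and the last two bytes of a file as hex. `show_markers` is a hypothetical helper name, and it uses `od` (present on every Linux system) rather than `xxd`; `xxd -p` would show the same bytes. Most JPEGs end with ffd9, but some carry extra bytes after the end marker, so treat the second line as a check, not a guarantee.

```shell
# show_markers FILE: print the first 2 and the last 2 bytes of FILE as hex.
# For a typical JPEG this prints "ffd8" and then "ffd9".
show_markers() {
    head -c 2 "$1" | od -An -tx1 | tr -d ' \n'; echo   # header bytes
    tail -c 2 "$1" | od -An -tx1 | tr -d ' \n'; echo   # footer bytes
}

# Usage, assuming the lab's picture is in the current directory:
# show_markers mypicture.jpg
```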
How to create this marvel?
We've also already talked about the concept of Linux's special devices. The two most prominent ones are:
- /dev/zero (generates an endless stream of zero bytes)
- /dev/urandom (pseudo-random number generator)
There's also /dev/random, but it works slightly differently (on older Linux kernels it could block until enough entropy had been gathered).
Think of them like a tap in the kitchen or bathroom: you open the tap and water runs until you close it. These devices do the same thing, but instead of water, it's random bytes or zero bytes that flow. You can redirect this flow anywhere using pipes (|) and redirection arrows (> and <).
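Here's a small sketch of opening the "tap" briefly: `head -c` takes a fixed number of bytes from the flow, a pipe hands them to a viewer, and `>` / `<` send them into and back out of a file. The file name `sample.bin` is just an illustrative choice, and `od` stands in for `xxd` here; both show the same bytes.

```shell
# Take 16 bytes from each device and print them as hex bytes.
head -c 16 /dev/zero | od -An -tx1        # always sixteen 00 bytes
head -c 16 /dev/urandom | od -An -tx1     # different bytes on every run

# Redirect the flow into a file instead of the terminal...
head -c 16 /dev/urandom > sample.bin
# ...and feed the file back into a command with the < arrow.
od -An -tx1 < sample.bin
```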
The simplest and most straightforward way to generate a junkfile is to take the actual data, put a portion of random data in front of it, and attach another portion of random data at the end. A bit like a hamburger.
Let's assume we have a picture called mypicture.jpg.
$ ls -l
-rw-r--r-- danny users 2041 Tue Nov 15 20:57:28 2022 mypicture.jpg
We can see that the file is around 2 kilobytes in size.
Let's now use dd to create two portions of random data. Staying with our hamburger analogy, we'll call them top_bun.dd and bottom_bun.dd.
$ dd if=/dev/urandom of=top_bun.dd bs=1 count=100
$ dd if=/dev/urandom of=bottom_bun.dd bs=1 count=100
Let's break this dd command down:
- if= - input file; in this case we're using whatever /dev/urandom generates
- of= - output file; the name of the file we want to dump the data to
- bs= - block size; the number of bytes dd processes at a time (i.e. one block). By default it's 512 bytes, but here we override that and set it to 1 byte
- count= - how many blocks (each bs bytes in size) should be copied. Without it, dd keeps copying until the input runs out, which for /dev/urandom is never. Here we're telling dd to copy only 100 blocks of 1 byte each, which gives us a file that is exactly 100 bytes long
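The same 100 bytes can be produced in other ways; this sketch shows two and then checks the sizes with `wc -c`, which counts bytes. One block of 100 bytes needs far fewer read/write calls than 100 blocks of 1 byte, which starts to matter for larger buns. For small reads from /dev/urandom a single block comes back complete; when reading from a pipe, a large bs can come up short (GNU dd's `iflag=fullblock` addresses that).

```shell
# 1) One block of 100 bytes instead of 100 blocks of 1 byte (same result).
dd if=/dev/urandom of=top_bun.dd bs=100 count=1

# 2) head -c counts bytes directly, no block arithmetic needed.
head -c 100 /dev/urandom > bottom_bun.dd

# Confirm both files are 100 bytes (wc -c counts bytes).
wc -c top_bun.dd bottom_bun.dd
```

dd reports its "records in/out" summary on stderr; that's informational, not an error.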
You're welcome to look at the generated files with xxd or whatever tool you prefer.
The tool cat is very useful for concatenating data, in addition to printing file contents to your terminal. The data can be anything, so let's use it to sandwich the picture between two portions of random data.
Let's first check that everything is in place.
$ ls -l
-rw-r--r-- danny users 100 Sun Dec 3 17:14:41 2023 bottom_bun.dd
-rw-r--r-- danny users 2041 Tue Nov 15 20:57:28 2022 mypicture.jpg
-rw-r--r-- danny users 100 Sun Dec 3 17:14:42 2023 top_bun.dd
Looks good. Let's concatenate them together.
$ cat top_bun.dd mypicture.jpg bottom_bun.dd > junkburger01
Here we're telling cat to read each file in sequence and, without inserting anything between them, redirect (>) the combined output to the file junkburger01 instead of your terminal screen.
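A quick sanity check: since cat adds nothing between the files, junkburger01 should be exactly as large as its three ingredients combined, 100 + 2041 + 100 = 2241 bytes in this lab. `total_size` below is a hypothetical helper name, sketched in plain sh.

```shell
# total_size FILE...: print the combined size in bytes of all FILEs.
# Arithmetic expansion also swallows any padding spaces some wc versions add.
total_size() {
    total=0
    for f in "$@"; do
        total=$(( total + $(wc -c < "$f") ))
    done
    echo "$total"
}

# Usage (both numbers should match; 2241 with the lab's files):
# total_size top_bun.dd mypicture.jpg bottom_bun.dd
# total_size junkburger01
```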
And that's it. You've successfully created your first junkfile for practicing data carving. You can now analyse junkburger01 with xxd as you usually would.
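Because you built the file yourself, you already know the answer key: the picture starts right after the top bun, at byte offset 100 (counting from 0), and runs for 2041 bytes. The sketch below wraps the same dd options used earlier in a hypothetical `carve` helper; carving with the known offsets and comparing the result with `cmp` confirms the junkfile is sound, and later lets you check your own manual carves.

```shell
# carve JUNKFILE OFFSET LENGTH OUTFILE
# Copy LENGTH bytes starting at byte OFFSET (counted from 0) of JUNKFILE
# into OUTFILE. bs=1 makes skip= and count= work in single bytes.
carve() {
    dd if="$1" of="$4" bs=1 skip="$2" count="$3"
}

# Usage with the lab's layout (100-byte top bun, 2041-byte picture):
# carve junkburger01 100 2041 carved.jpg
# cmp carved.jpg mypicture.jpg && echo "carve matches"
```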
A few things to note:
- You might have to change xxd's column width (check the man page) in case the ffd8 and/or ffd9 portions are split across a line break. The same goes for xxd's default two-byte grouping: a marker at an odd offset shows up split across two groups, and -g 1 shows one byte per group.
- For maximum effect, experiment with different sizes for top_bun.dd and bottom_bun.dd. They don't have to be 100 bytes each, but I wouldn't go too large.
- You may encounter false positives. The random data can contain ffd8 and ffd9 (the larger the buns, the more likely this becomes), even though those are not the beginning and end of your picture. This is to be expected. In such cases, you'll have to experiment until you identify the correct ffd8 and ffd9. It's not hard, just annoying, and it requires you to iterate on your carving attempts until you get it right.
- Yes, that means practice.
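To put numbers on the false positives: each pair of adjacent random bytes has a 1 in 65,536 chance of being ff d8 (and the same for ff d9), so a bun of N bytes holds about (N - 1)/65,536 of each on average, practically none at 100 bytes and roughly one per 64 KiB. The sketch below lists every marker with its byte offset and picks random bun sizes. `find_markers` and `random_size` are hypothetical helper names, and the 50 to 549 byte range is an arbitrary choice. awk remembers the previous byte across od's output lines, so a marker straddling a line break is still found.

```shell
# find_markers FILE: print the byte offset (counted from 0) of every
# ff d8 and ff d9 pair in FILE, false positives included.
# od -v prints every byte (no "*" shortcut for repeated lines).
find_markers() {
    od -An -v -tx1 "$1" | awk '
        {
            for (i = 1; i <= NF; i++) {
                if (prev == "ff" && $i == "d8") printf "ffd8 at offset %d\n", n - 1
                if (prev == "ff" && $i == "d9") printf "ffd9 at offset %d\n", n - 1
                prev = $i      # carried over to the next od line
                n++            # number of bytes seen so far
            }
        }'
}

# random_size MIN SPAN: print a random number from MIN to MIN+SPAN-1,
# built from two bytes of /dev/urandom (works in plain sh, unlike $RANDOM).
random_size() {
    echo $(( $(od -An -N2 -tu2 /dev/urandom) % $2 + $1 ))
}

# Example session (needs mypicture.jpg):
# dd if=/dev/urandom of=top_bun.dd bs=1 count="$(random_size 50 500)"
# dd if=/dev/urandom of=bottom_bun.dd bs=1 count="$(random_size 50 500)"
# cat top_bun.dd mypicture.jpg bottom_bun.dd > junkburger02
# find_markers junkburger02
# xxd -c 32 -g 1 junkburger02 | less   # 32 bytes per line, 1 byte per group
```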