Session 10 - How to create your own "junkfile" for data carving practice

Requirements: Any Linux distribution will do.

Introduction

At the end of the Session 07 lab, there's a mention of how to create your own file to practice carving. These instructions will show you how to do it.

Concept

You can go back to Session 07 and have a look through it again, but here's a brief refresher.

Consider the following mental image of a "junkfile". This might be a file in its own right, or it might be part of a data dump from a damaged device where, depending on the filesystem, the information about where the file starts and ends has been destroyed somehow. And when you don't have that information any more, you need to rely on your understanding of hex, file type header/footer information, filesystems, etc. to try and carve the data manually from the data dump.

In the picture, green represents random, uninteresting data, and yellow represents the data we're after, in this case a JPEG picture. As you already know, JPEGs have their own specific header and footer bytes, ffd8 and ffd9 respectively, which are shown in the picture as the smaller red blocks.
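If you'd like to see those header and footer bytes for yourself, here is a minimal sketch. It assumes a tiny stand-in file called fake.jpg (a made-up name, and not a real image) and uses od from coreutils to print the hex; xxd works just as well.

```shell
# Build a tiny stand-in "picture": JPEG header, some filler, JPEG footer.
# Octal escapes are used for portability: \377\330 = ff d8, \377\331 = ff d9.
printf '\377\330not-a-real-image\377\331' > fake.jpg

# The first two bytes are the header ...
head -c 2 fake.jpg | od -An -tx1
# ... and the last two bytes are the footer.
tail -c 2 fake.jpg | od -An -tx1
```

The two od commands print ff d8 and ff d9. Try the same on a real JPEG by swapping in its filename.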

How to create this marvel?

We've also already talked about the concept of Linux's special devices. The two most prominent ones are:

/dev/zero (generates infinite amounts of zeroes)
/dev/urandom (pseudo-random number generator)

There's also /dev/random, but it works slightly differently.

Think about them like a tap in the kitchen or bathroom. You open the tap and water runs until you close the tap. These devices do the same thing, but instead of water, it is random numbers or zeroes that flow. You can redirect this flow to anywhere using pipes | and redirection arrows > and <.
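To make the tap analogy concrete, here is a small sketch. It uses head -c to "close the tap" after 16 bytes; the filename sample.bin is just an example name, not something from the lab.

```shell
# Open the tap on /dev/zero, let 16 bytes through a pipe, and show them as hex.
head -c 16 /dev/zero | od -An -tx1

# Same idea with /dev/urandom, but redirect the flow into a file instead.
head -c 16 /dev/urandom > sample.bin

# Confirm that exactly 16 bytes ended up in the file.
wc -c < sample.bin
```

Without head (or some other limit) the flow would never stop, which is why the dd commands later on set a count.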

The simplest and most straightforward way to generate a junkfile is to combine a portion of random data with the actual data, and attach another portion of random data at the end. A bit like a hamburger.

Let's assume we have a picture called mypicture.jpg.

$ ls -l
-rw-r--r-- danny users 2041 Tue Nov 15 20:57:28 2022 mypicture.jpg

We can see that the file is around 2 kilobytes in size.

Let's now use dd to create two portions of random data. Staying with our hamburger analogy, we'll call them top_bun.dd and bottom_bun.dd.

$ dd if=/dev/urandom of=top_bun.dd bs=1 count=100
$ dd if=/dev/urandom of=bottom_bun.dd bs=1 count=100

Let's break this dd command down:

if= - input file, in this case we're using whatever /dev/urandom generates
of= - output file, the name of the file we want to dump the data to
bs= - block size, the number of bytes dd should process at a time (i.e. one block); by default it's 512 bytes, but here we'll override that and set it to 1 byte
count= - the number of blocks (each bs bytes in size) to copy. By default, dd keeps copying until the input ends, which for /dev/urandom is never, but here we're telling dd to copy only 100 blocks of 1 byte each, which will give us a file that is 100 bytes long
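The relationship between bs and count can be checked quickly: the output size is simply bs multiplied by count. The sketch below is only an illustration, and one_block.dd is a made-up filename.

```shell
# 100 blocks of 1 byte each ...
dd if=/dev/urandom of=top_bun.dd bs=1 count=100
# ... and 1 block of 100 bytes both give bs x count = 100 bytes.
dd if=/dev/urandom of=one_block.dd bs=100 count=1

# Print the size of each file in bytes.
wc -c top_bun.dd one_block.dd
```

Both files come out at 100 bytes. For small files like these the choice makes no practical difference, so bs=1 is the easiest to reason about.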

You're welcome to look at the generated files with xxd or whatever tool you want to use.

The cat tool is very useful for concatenating data, in addition to outputting file contents to your terminal. The data can be anything. So, let's use it to sandwich the picture between two portions of random data.

Let's first check that everything is in place.

$ ls -l
-rw-r--r-- danny users 100 Sun Dec 3 17:14:41 2023 bottom_bun.dd
-rw-r--r-- danny users 2041 Tue Nov 15 20:57:28 2022 mypicture.jpg
-rw-r--r-- danny users 100 Sun Dec 3 17:14:42 2023 top_bun.dd

Looks good. Let's concatenate them together.

$ cat top_bun.dd mypicture.jpg bottom_bun.dd > junkburger01

Here we're telling cat to read each file in sequence and, without putting any break between them, redirect (>) the whole dump to the file junkburger01 instead of your terminal screen.
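A quick sanity check is to confirm that the junkfile is exactly as large as its three parts put together. The sketch below repeats the steps so that it runs on its own; if there is no mypicture.jpg in the current directory, it creates a tiny stand-in (an assumption for illustration, not a real image).

```shell
# Create a stand-in picture only if no real mypicture.jpg exists here.
[ -f mypicture.jpg ] || printf '\377\330stand-in image data\377\331' > mypicture.jpg

# Bake the buns and assemble the burger, as described above.
dd if=/dev/urandom of=top_bun.dd bs=1 count=100 2>/dev/null
dd if=/dev/urandom of=bottom_bun.dd bs=1 count=100 2>/dev/null
cat top_bun.dd mypicture.jpg bottom_bun.dd > junkburger01

# Add up the sizes of the parts and compare with the junkfile.
top=$(wc -c < top_bun.dd)
pic=$(wc -c < mypicture.jpg)
bottom=$(wc -c < bottom_bun.dd)
echo "expected: $((top + pic + bottom)) bytes"
echo "actual:   $(wc -c < junkburger01) bytes"
```

With the 2041-byte picture from the listing above, both numbers should be 2241.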

And that's it. You've successfully created your first junkfile you can use to practice data carving. You can analyse junkburger01 now with xxd as you usually would.

A few things to note:

You might have to change xxd's column width (the -c option, check the man-page) in case the ffd8 and/or ffd9 bytes are split across a line break.

For maximum effect, experiment with different sizes for top_bun.dd and bottom_bun.dd. They don't have to be 100 bytes each, but I wouldn't go too large.

You may encounter false positives. Random data can contain ffd8 and ffd9 by chance, and the more junk you generate, the more likely that becomes, even though those are not the beginning and end of your picture. This is to be expected. In such cases, you'll have to experiment until you identify the correct ffd8 and ffd9. It's not hard, just annoying, and requires you to iterate your carving attempts until you have it right.

Yes, that means practice.
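Because you built the junkfile yourself, you already know the right answer, which makes it easy to check a carving attempt. The sketch below is one possible way to do that, not part of the original lab: it carves with dd using the known offset (the 100-byte top bun) and compares the result with cmp. The name carved.jpg is made up, and a tiny stand-in picture is created if mypicture.jpg is missing.

```shell
# Create a stand-in picture only if no real mypicture.jpg exists here.
[ -f mypicture.jpg ] || printf '\377\330stand-in image data\377\331' > mypicture.jpg

# Rebuild the junkfile so this example runs on its own.
dd if=/dev/urandom of=top_bun.dd bs=1 count=100 2>/dev/null
dd if=/dev/urandom of=bottom_bun.dd bs=1 count=100 2>/dev/null
cat top_bun.dd mypicture.jpg bottom_bun.dd > junkburger01

# Carve: skip the 100-byte top bun, then copy as many bytes as the picture has.
size=$(wc -c < mypicture.jpg)
dd if=junkburger01 of=carved.jpg bs=1 skip=100 count="$size" 2>/dev/null

# cmp prints nothing and succeeds when both files are byte-for-byte identical.
cmp carved.jpg mypicture.jpg && echo "carve matches the original"
```

In a real exercise you would find the skip and count values yourself from the hex dump, and only use cmp at the end to confirm that your carved file is identical to the original.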

U19960 - Data Recovery and Analysis
