Importing NSRL hash sets from NIST

You can import the National Software Reference Library (NSRL) data set into OSForensics as a hash set.

The NSRL is a joint project of the U.S. Department of Justice's National Institute of Justice (NIJ), federal, state, and local law enforcement, and the National Institute of Standards and Technology (NIST). It collects software profiles into a Reference Data Set (RDS), which allows you to review and identify files by their digital signatures.

The RDS data, however, is distributed in text format, which makes it slow to search directly. Fortunately, the whole set can be imported into OSForensics for fast searching. Once imported, the entire set can be searched within a second. You can also browse the entire set as a tree view and extract portions of it into new hash set databases.
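To illustrate why the raw text is slow to search, here is a minimal Python sketch of a linear scan over one RDS text file. It is not part of OSForensics. It assumes the file is comma-separated text with a header row, and that the hash column is named "SHA-1"; the function name and the column name are assumptions made for illustration, so adjust them to match the files you actually download.

```python
import csv

def find_sha1_linear(csv_path, sha1_hex, column="SHA-1"):
    """Return the first row whose hash column matches sha1_hex, or None.

    Every lookup reads the file from the start, so the cost grows with
    the number of rows. An indexed database avoids this full scan.
    """
    wanted = sha1_hex.strip().upper()
    # The encoding is an assumption; errors="replace" keeps unusual bytes
    # in file names from aborting the scan.
    with open(csv_path, newline="", encoding="utf-8", errors="replace") as handle:
        for row in csv.DictReader(handle):
            # Short or malformed rows yield None for missing columns.
            if (row.get(column) or "").upper() == wanted:
                return row
    return None
```

With tens of millions of rows, each such lookup has to read through gigabytes of text, which is the cost the import into an indexed hash set database removes.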

Downloading and Importing

  1. Download the data set from http://www.nsrl.nist.gov/ (check the "Downloads" section). Currently the data set is distributed as a set of four ISO (.iso) files. To access the contents of these files, you will need to either burn them to DVD or mount them using a virtual disk manager such as OSFMount. We will assume the latter in this tutorial (skip to step 3 if you have burned the ISO files to DVD).
  2. Start up OSFMount and click "File"->"Mount new virtual disk". You will see the window below. Click the "..." button next to the image file path and select the ISO file you downloaded, as shown. Leave the default settings and click "OK".
    Mounting the ISO file downloaded from NSRL in OSFMount
  3. Browse to the drive containing the RDS disc (either your DVD drive or the drive letter assigned in OSFMount in the step above). Each RDS disc contains a ZIP (.zip) file, which you will need to unzip to a folder on your hard disk. This can be any temporary folder; in our example, we've created a folder named "NSRLData" to unzip/copy these files to. IMPORTANT: Keep the files from each ZIP file in a separate folder. If you download more than one disc, create a sub-folder for each, such as "Disc 1", "Disc 2", etc.
    Unzip files
  4. Create a new empty database in OSForensics. You may import into a non-empty database, but this is not recommended.
  5. Make the new database active.
  6. Finally, click the "NSRL Import.." button and then select the root folder that contains all the unzipped sub-folders (in the example above, this would be the "NSRLData" folder, not "Disc 1").
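
If you prefer to script the unzipping in step 3, the following Python sketch extracts each disc's ZIP file into its own "Disc N" sub-folder under a common root. It is only an illustration and not an OSForensics feature: the function name is hypothetical, and the ZIP paths you pass in depend on where your discs are mounted.

```python
import zipfile
from pathlib import Path

def extract_discs(zip_paths, root="NSRLData"):
    """Extract each ZIP into its own "Disc N" sub-folder under root.

    Returns the list of folders created, in the same order as zip_paths.
    """
    root_dir = Path(root)
    targets = []
    for number, zip_path in enumerate(zip_paths, start=1):
        # One folder per disc, so files from different discs never mix.
        target = root_dir / f"Disc {number}"
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(target)
        targets.append(target)
    return targets
```

Pass the ZIP files in disc order. The root folder given here ("NSRLData" by default) is the folder you would then select in step 6.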

Import duration

Note that due to the large amount of data in these hash sets, this process can take a very long time to complete. On some slower systems, it can take up to several days. One way to make the process more manageable is to import only one disc at a time. In step 3 above, you would extract only one of the ZIP files, import it, remove the extracted files, then extract the next ZIP file and repeat the process, importing into the same database each time. This is the one scenario where importing into a non-empty database is recommended. It takes more time in total but breaks the task up into shorter steps. It also lets you back up the database between imports in case an error occurs.

To speed up the process itself, make sure the database is on a solid state drive or a RAM drive. Import time is highly dependent on the random seek read/write performance of the drive. On an average system with a normal hard drive, the process takes about 50 hours. On a RAM drive, the process has been seen to take as little as 10-15 hours. A solid state drive will likely have an import time somewhere between these two figures.

Data set size

The December 2010 release is made up of 4 CDs (4 x 378MB = 1.5GB), each of which contains compressed data. When uncompressed, the data set is 7.2GB of comma-separated text. This data set contains information and hashes for over 62 million files and almost 19 million SHA-1 values.

Once the 4 discs are imported and indexes have been added to the data, the uncompressed size is approximately 9.8GB. This includes both the MD5 and SHA-1 hash values. Not all tools import both hash values, but OSForensics does.