EDRM offers “Micro Datasets” designed for eDiscovery testing and process validation. Software vendors, litigation support organizations, law firms and others may use these smaller sets to qualify support, test speed and accuracy in indexing and search, and conduct more forensically oriented analytics exercises throughout the eDiscovery workflow.
The EDRM community thanks these members for their active participation in this important initiative:
The Public EDRM Micro Dataset is a zip file of approximately 136.9 MB containing the latest versions of file types ranging from Microsoft Office and Adobe Acrobat files to image files. The EDRM Dataset group scoured the internet and found usable, freely available data at universities, government sites and elsewhere; a selection of that data is included in the zip file. The members' EDRM Micro Dataset (initially available only to EDRM members) is similar to the public dataset but much larger, at approximately 5.5 GB.
The full dataset is sourced from publicly available data and is free from copyright restrictions. It was assembled by the Digital Forensics Research Laboratories at the Auckland University of Technology, in collaboration with the EDRM Dataset team.
The EDRM Micro Dataset is valued for its large variety of file types and other challenges characteristic of ESI collected in discovery cases. The files have various levels of corruption, and the dataset contains a duplicate set of files that are encrypted, to support exception handling exercises and advanced testing.
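As a rough illustration of a first-pass exercise on such a collection, the Python sketch below tallies file extensions, groups byte-identical files by SHA-256 digest, and records any file that cannot be read. It assumes the zip file has already been extracted to a local folder. The function name `inventory` and its return shape are hypothetical, chosen for this example and not part of any EDRM tooling.

```python
import hashlib
from collections import Counter
from pathlib import Path


def inventory(root):
    """Tally extensions, find byte-identical files, and log unreadable files."""
    by_extension = Counter()
    by_digest = {}
    exceptions = []  # (path, error message) for files that could not be read

    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Normalize case so ".PDF" and ".pdf" are counted together.
        by_extension[path.suffix.lower() or "(none)"] += 1
        try:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
        except OSError as err:
            exceptions.append((path, str(err)))
            continue
        by_digest.setdefault(digest, []).append(path)

    # Keep only digests shared by more than one file.
    duplicates = {d: paths for d, paths in by_digest.items() if len(paths) > 1}
    return by_extension, duplicates, exceptions
```

Hashing finds only byte-identical copies. An encrypted duplicate will not share a digest with its unencrypted original, and a corrupted file can still be read and hashed without error, so detecting those cases requires format-aware parsing that this sketch does not attempt.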
The EDRM Micro Dataset mix of file types includes:
The Dataset team includes: