The EDRM Data Set Project provides industry-standard reference data sets of electronically stored information (ESI) and software files that can be used to test various aspects of e-discovery software and services.
These files may contain viruses, as can be the case with any set of files collected during discovery. Appropriate caution should be used when handling the files.
These files may contain personally identifiable information, in spite of efforts to remove that information. If you find PII that you think should be removed, please notify us at firstname.lastname@example.org.
EDRM ESI Reference Data Sets
This initiative collects, evaluates, and publishes ESI data sets for use in testing e-discovery software and services. There are currently four data sets available:
EDRM Enron Email v1 Data Set: An updated set of Enron email messages and attachments.
EDRM File Format Data Set: 381 files covering 200 file formats.
EDRM Internationalization Data Set: A snapshot of selected Ubuntu localization mailing list archives covering 23 languages in 724 MB of email.