New EDRM Enron Email Data Set

The EDRM Enron v1 Data Set Cleansed of Private, Health and Financial Information

The Enron v1 data set previously hosted by EDRM has served for many years as an industry-standard collection of email data for electronic discovery training and testing. Ever since FERC first made this data set available, it has been an open secret that it contained many instances of private, health and financial data about the company’s former employees.

Cleansing the data

Nuix specialists cleansed the EDRM Enron data set of private information. We identified and removed more than 10,000 items of information including:

  • 60 containing credit card numbers, including departmental contact lists that each contained hundreds of individual credit cards
  • 572 containing Social Security or other national identity numbers—thousands of individuals’ identity numbers in total
  • 292 containing individuals’ dates of birth
  • 532 containing information of a highly personal nature such as medical or legal matters.

Many items contained multiple instances and types of information. This included departmental contact list spreadsheets with dates of birth, credit card numbers, Social Security numbers, home addresses and other private details of dozens of staff members.
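As a rough illustration of how candidate items of this kind can be flagged in text, the Python sketch below combines simple pattern matching with a Luhn checksum. It is not the Nuix methodology, which is documented in the case study; the function names, the patterns and the US-style Social Security number format are assumptions made for this example only, and a real review would need far broader patterns plus manual checking.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
# US Social Security numbers written as NNN-NN-NNNN (assumed format).
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(text: str) -> dict:
    """Return candidate credit card numbers and SSNs found in the text."""
    cards = []
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):  # discard digit runs that fail the checksum
            cards.append(digits)
    return {"credit_cards": cards, "ssns": SSN_RE.findall(text)}


if __name__ == "__main__":
    sample = "Card 4111 1111 1111 1111, SSN 123-45-6789, phone 555-1234"
    print(find_pii(sample))
```

The checksum step matters because long digit runs are common in business email (invoice numbers, tracking numbers), and pattern matching alone would flag many of them as cards.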

In removing these items and making the cleansed data set available to the community, we hope to protect the privacy of hundreds of individuals.

Nuix is also pleased to offer the legal and investigator community the methodology we used for identifying personal and financial data in corporate data sets.

  • Download our case study, “Removing PII from the EDRM Enron Data Set: Investigating the prevalence of unsecured financial, health and personally identifiable information in corporate data” for a detailed methodology. Download here

Download the cleansed EDRM Enron v1 data set

What risks lie in your data?

Although the EDRM Enron data set is more than 10 years old, most organizations still face significant risks relating to private information stored in their systems.

  • Using Nuix Investigator tools and the methodology outlined in our case study, you can identify inappropriately stored private, health and financial data and take immediate steps to remediate the risks involved.
  • Nuix also offers information governance products and solutions to locate and remediate these risks in email, file shares, and archives.


These files may contain personally identifiable information, in spite of efforts to remove that information. If you find PII that you think should be removed, please notify us at
