De-Duplication

Definition(s)

  • De-duplication (“de-duping”) is the process of comparing electronic records based on their characteristics and removing duplicate records from the data set.  1  2
  • The process of providing one instance of an item where there were once two or more identical copies. This process usually involves loading all files into a database and then searching for duplicate files.  3
  • The process of identifying (or, for some vendors, actually removing) additional copies of identical documents in a document collection. There are three types of de-duplication: case, custodian, and production.  4  5
  • The process of identifying (and/or removing) additional copies of identical documents in a document collection. There are three types of de-duplication: case, custodian, and production.  6
  • The method of data reduction that excludes duplicate messages (with their attachments) and files from further processing.  7
  • The process of removing duplicate records from a collection of data.  8
  • The process of determining which documents are duplicates. File systems can contain many copies of the same document, which need to be identified for efficiency’s sake. Every time an email is sent, it typically creates two additional copies of the email and its attachments: one in the sender’s sent-items folder and one in the recipient’s inbox. An email may also be sent to multiple recipients, thereby creating more copies.
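
The definitions above describe comparing records and keeping one instance of each identical item. A minimal sketch of that idea follows, under several assumptions not taken from the cited glossaries: records are compared by a SHA-256 hash of their raw content; "case" scope is read as removing duplicates across the whole collection, and "custodian" scope as removing them only within each custodian's own files; production-level de-duplication is not shown. The function name `deduplicate` and the sample records are hypothetical.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a hex digest that serves as the record's fingerprint (assumed: SHA-256)."""
    return hashlib.sha256(data).hexdigest()


def deduplicate(records, scope="case"):
    """Split records into kept items and duplicates.

    records: iterable of (custodian, name, data) tuples (hypothetical layout).
    scope:   "case" compares across all custodians;
             "custodian" compares only within each custodian's files.
    Returns (kept, duplicates), where each duplicate entry also names
    the first-seen record it duplicates.
    """
    if scope not in ("case", "custodian"):
        raise ValueError("scope must be 'case' or 'custodian'")

    seen = {}  # fingerprint key -> (custodian, name) of the first copy
    kept, duplicates = [], []
    for custodian, name, data in records:
        digest = content_hash(data)
        # The scope decides whether the custodian is part of the comparison key.
        key = digest if scope == "case" else (custodian, digest)
        if key in seen:
            duplicates.append((custodian, name, seen[key]))
        else:
            seen[key] = (custodian, name)
            kept.append((custodian, name))
    return kept, duplicates


# Hypothetical sample data: the same memo held three times by two custodians.
records = [
    ("alice", "memo.doc", b"Q3 budget"),
    ("alice", "memo_copy.doc", b"Q3 budget"),
    ("bob", "memo.doc", b"Q3 budget"),
    ("bob", "notes.txt", b"meeting notes"),
]

case_kept, case_dupes = deduplicate(records, scope="case")
cust_kept, cust_dupes = deduplicate(records, scope="custodian")
```

With this sample data, case scope keeps one copy of the memo for the whole collection, while custodian scope keeps one copy per custodian. This sketch only flags duplicates and leaves the input untouched, which matches the definitions that treat de-duplication as identification, with removal as an optional further step.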

Notes

  1. Merrill Corporation, Electronic Discovery Glossary.
  2. Kroll Ontrack, Glossary of Terms, http://www.krollontrack.com/glossaryterms
  3. RenewData, Glossary (10/5/2005).
  4. Fios, E-Discovery Glossary, http://discoveryresources.org/01_electronic_discovery_glossary.html
  5. RSI, Glossary.
  6. Vinson & Elkins LLP Practice Support, EDD Glossary.
  7. Ibis Consulting, Glossary.
  8. Legal Electronic Document Institute, Basic Principles of Automated Litigation Support (2005).