EDRM Production Standards, Version 1

Lead author: Julie Brown (Vorys, Sater, Seymour and Pease LLP); Updated February 10, 2011

The purpose of this document is to outline standards for the production of electronically stored information in discovery. The intent is for attorneys to be able to communicate these standards easily at a meet and confer by referring to the category of production. The following definitions are provided regarding the forms of production (see the EDRM Production Guide for further clarification on the forms of production):

  • Native Format – Files are produced in the format in which they were originally created (Example: .docx produced in .docx; .pdf produced in .pdf, etc.)
  • Near-Native Format – Files are extracted or converted into another searchable format (Example: e-mails produced in .htm, .mht, or .rtf; Databases produced in .txt or .csv format)
  • Image (Near Paper) Format – Electronic files are converted to image format or paper is scanned to image format
  • Paper – Electronic files are printed to paper or paper files remain in paper format

The categories of production identified below include A1, A2, B1, B2, C1, C2, D and E. The descriptions of the standards are followed by a Quick Guide to Components of Productions A-D, a chart containing the Characteristics of Productions A-D and a chart containing the required metadata and other information fields. In addition to agreeing to one of these standards, the requesting party should tell the producing party which review tool they will be using. This information is needed to properly identify the components and formats required to successfully load the information into a review tool.

A. Native/Near-Native Production

E-mail, databases and proprietary files are produced in near-native format. Attachments and loose files are produced in native format. Only files requiring redaction are converted to .tif images.

  1. Includes searchable text for redacted files:
    1. Each native/near-native file name matches the DocID. (E.g., DocID = ABC0000123; Filename = ABC0000123.doc for an MS Word document.)
    2. Each searchable native/near-native file has an extracted text file in .txt format named with the DocID of the corresponding file. Each non-searchable file containing text has a multipage OCR text file named with the DocID of the corresponding file. (E.g., DocID = ABC0000123; Filename = ABC0000123.txt.)
    3. Each file requiring redaction is converted to Group IV single-page .tif images. Each such file has a unique Bates number applied to its images, matching the DocID or Bates number. The same number may be applied to every page within a document, or the numbers may increment by page.
    4. OCR text for redacted files is provided in multipage .txt format, with each file named the same as the DocID/Bates number of the corresponding document. (E.g., Image Filename = ABC0000123.tif; OCR Filename = ABC0000123.txt.)
    5. Load file(s) for native/near-native files, images, extracted text and OCR files, in EDRM XML or a common format such as that required by Concordance or Summation.
    6. Data file including, at a minimum, the standard EDRM extracted metadata and other information fields to the extent they exist (see chart below). This data may be included in the load file or produced as a separate delimited text file.
  2. Does not include searchable text for redacted files:
    1. Each native/near-native file name matches the DocID. (E.g., DocID = ABC0000123; Filename = ABC0000123.doc for an MS Word document.)
    2. Each searchable native/near-native file has an extracted text file in .txt format named with the DocID of the corresponding file. Each non-searchable file containing text has a multipage OCR text file named with the DocID of the corresponding file. (E.g., DocID = ABC0000123; Filename = ABC0000123.txt.)
    3. Each file requiring redaction is converted to Group IV single-page .tif images. Each such file has a unique Bates number applied to its images, matching the DocID or Bates number. The same number may be applied to every page within a document, or the numbers may increment by page.
    4. Load file(s) for native/near-native files, images, extracted text and OCR files, in EDRM XML or a common format such as that required by Concordance or Summation.
    5. Data file including, at a minimum, the standard EDRM extracted metadata and other information fields to the extent they exist (see chart below). This data may be included in the load file or produced as a separate delimited text file.
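
The two Bates numbering options described for redacted files (one number repeated on every page, or numbers that increment by page) can be sketched as follows. This is a minimal illustration in Python, assuming a Bates number is a letter prefix followed by a zero-padded counter, as in the ABC0000123 example. The function name page_bates_numbers and its parameters are hypothetical and are not part of the EDRM standard.

```python
import re

# Assumed shape of a Bates number: letters followed by digits, e.g. ABC0000123.
_BATES_PATTERN = re.compile(r"^([A-Za-z]+)(\d+)$")


def page_bates_numbers(first_number, page_count, increment=True):
    """Return the Bates number to apply to each page of one document.

    increment=True  -> numbers increase by one per page.
    increment=False -> the same number is applied to every page.
    """
    match = _BATES_PATTERN.match(first_number)
    if match is None:
        raise ValueError(f"unexpected Bates number format: {first_number!r}")
    if page_count < 1:
        raise ValueError("page_count must be at least 1")

    if not increment:
        return [first_number] * page_count

    prefix, digits = match.groups()
    start = int(digits)
    width = len(digits)  # keep the original zero padding
    return [f"{prefix}{start + offset:0{width}d}" for offset in range(page_count)]
```

With incrementing numbers, the next document in the production would start at the number following the last page of the previous one, so the parties may want to agree on the option before numbering begins.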

B. Image (Near-Paper)/Native/Near-Native Production

Most files are converted to image format (tif, pdf, etc.), and/or paper is scanned to image format and OCR’d. The exception is files that are not usable in image format, such as MS Excel spreadsheets.

  1. Includes searchable text for redacted files:
    1. Most native/near-native files are converted to Group IV single-page .tif images. Each file has a unique Bates number applied to its images, matching the DocID or Bates number.
    2. Each searchable native/near-native file has an extracted text file in .txt format named with the DocID of the corresponding file. Each non-searchable file containing text has a multipage OCR text file named with the DocID of the corresponding file. (E.g., DocID = ABC0000123; Filename = ABC0000123.txt.)
    3. Spreadsheets and other files that are not usable in .tif format are produced in native or near-native format and named the same as the DocID. (E.g., DocID = ABC0000123; Filename = ABC0000123.xls for an MS Excel document.)
    4. OCR text for redacted files is provided in multipage .txt format, with each file named the same as the DocID/Bates number of the corresponding document. (E.g., Image Filename = ABC0000123.tif; OCR Filename = ABC0000123.txt.)
    5. Load file(s) for native/near-native files, images, extracted text and OCR files, in EDRM XML or a common format such as that required by Concordance or Summation.
    6. Data file including, at a minimum, the standard EDRM extracted metadata and other information fields to the extent they exist (see chart below). This data may be included in the load file or produced as a separate delimited text file.
  2. Does not include searchable text for redacted files:
    1. Most native/near-native files are converted to Group IV single-page .tif images. Each file has a unique Bates number applied to its images, matching the DocID or Bates number.
    2. Each searchable native/near-native file has an extracted text file in .txt format named with the DocID of the corresponding file. Each non-searchable file containing text has a multipage OCR text file named with the DocID of the corresponding file. (E.g., DocID = ABC0000123; Filename = ABC0000123.txt.)
    3. Spreadsheets and other files that are not usable in .tif format are produced in native or near-native format and named the same as the DocID. (E.g., DocID = ABC0000123; Filename = ABC0000123.xls for an MS Excel document.)
    4. Load file(s) for native/near-native files, images, extracted text and OCR files, in EDRM XML or a common format such as that required by Concordance or Summation.
    5. Data file including, at a minimum, the standard EDRM extracted metadata and other information fields to the extent they exist (see chart below). This data may be included in the load file or produced as a separate delimited text file.
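
The naming rules above (each produced file named after its DocID, with a companion .txt file carrying the extracted or OCR text) lend themselves to a simple pre-delivery check. The following Python sketch is one possible way to do that, not a prescribed procedure. The function name find_naming_problems is hypothetical, and the check assumes that each document has one produced file whose base name equals the DocID and that the text file is the only .txt file carrying that DocID.

```python
from pathlib import PurePosixPath


def find_naming_problems(doc_ids, filenames):
    """Compare produced file names against the DocID naming convention.

    doc_ids   -- DocIDs listed in the load or data file
    filenames -- names of the files actually present in the production
    Returns a list of human-readable problem descriptions.
    """
    # Group the extensions found for each base name, e.g.
    # "ABC0000123" -> {".doc", ".txt"}
    extensions_by_stem = {}
    for name in filenames:
        path = PurePosixPath(name)
        extensions_by_stem.setdefault(path.stem, set()).add(path.suffix.lower())

    problems = []
    for doc_id in doc_ids:
        extensions = extensions_by_stem.get(doc_id, set())
        # Anything other than the .txt companion counts as the produced file
        # (native, near-native or image).
        if not (extensions - {".txt"}):
            problems.append(f"{doc_id}: no produced file named after the DocID")
        if ".txt" not in extensions:
            problems.append(f"{doc_id}: missing text file {doc_id}.txt")
    return problems
```

A near-native file that is itself delivered as .txt (for example, a database export) would need special handling that this sketch leaves out.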

C. Image Production

All files are converted to image format (tif, pdf, etc.), and/or paper is scanned to image format and OCR’d.

  1. Includes searchable text for redacted files:
    1. All native/near-native files are converted to Group IV single-page .tif images. Each file has a unique Bates number applied to its images, matching the DocID or Bates number.
    2. All images are black & white except for those that require color for interpretation. Color images are produced in .jpg format unless otherwise agreed.
    3. Container files such as .zip or .rar may be converted to .tif format with a table of contents, or referenced in the “folder” field, which contains the path to the original native file as it existed at the time of collection.
    4. Each searchable native/near-native file has an extracted text file in .txt format named with the DocID of the corresponding file. Each non-searchable file containing text has a multipage OCR text file named with the DocID of the corresponding file. (E.g., DocID = ABC0000123; Filename = ABC0000123.txt.)
    5. OCR text for redacted files is provided in multipage .txt format, with each file named the same as the DocID/Bates number of the corresponding document. (E.g., Image Filename = ABC0000123.tif; OCR Filename = ABC0000123.txt.)
    6. Load file(s) for image files, extracted text and OCR, in EDRM XML or a common format such as that required by Concordance or Summation.
    7. Data file including, at a minimum, the standard EDRM extracted metadata and other information fields to the extent they exist (see chart below). This data may be included in the load file or produced as a separate delimited text file.
  2. Does not include searchable text for redacted files:
    1. All native/near-native files are converted to Group IV single-page .tif images. Each file has a unique Bates number applied to its images, matching the DocID or Bates number.
    2. Each searchable native/near-native file has an extracted text file in .txt format named with the DocID of the corresponding file. Each non-searchable file containing text has a multipage OCR text file named with the DocID of the corresponding file. (E.g., DocID = ABC0000123; Filename = ABC0000123.txt.)
    3. Load file(s) for image files, extracted text and OCR, in EDRM XML or a common format such as that required by Concordance or Summation.
    4. Data file including, at a minimum, the standard EDRM extracted metadata and other information fields to the extent they exist (see chart below). This data may be included in the load file or produced as a separate delimited text file.
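
Where the data file is produced as a separate delimited text file, it could be generated along the following lines. This Python sketch is illustrative only: the pipe delimiter, the selection of fields and the function name write_data_file are assumptions, and the sample values in the usage example are hypothetical. The delimiter and field list actually required depend on the receiving party's review tool. Fields that do not exist for a document are written as empty values, reflecting the "to the extent they exist" qualification.

```python
import csv
import io

# A subset of the metadata fields listed in the chart; the order is an assumption.
FIELDS = ["DOCID", "CUSTODIAN", "FILENAME", "DOCEXT", "HASH", "DOCLINK"]


def write_data_file(records, out, delimiter="|"):
    """Write one header row and one row per document to the open text stream `out`."""
    writer = csv.DictWriter(
        out,
        fieldnames=FIELDS,
        delimiter=delimiter,
        restval="",             # fields that do not exist are left empty
        extrasaction="ignore",  # fields outside FIELDS are skipped
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)


if __name__ == "__main__":
    # Hypothetical sample record for illustration.
    sample = [{"DOCID": "ABC0000123", "CUSTODIAN": "Jane Doe",
               "FILENAME": "budget.doc", "DOCEXT": "doc"}]
    buffer = io.StringIO()
    write_data_file(sample, buffer)
    print(buffer.getvalue())
```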

D. Custom

  1. Images, Load File, Data file and no searchable text
  2. Images only
  3. Paper
  4. Other

E. On-line Production

Files are presented for production via an online review tool. Formats, fields, loads and exports are to be negotiated on a case-by-case basis.

Quick Guide to Components of Productions A-D

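Production | Native | Near Native | Images | Extracted Text | OCR Text | Searchable Text for Redacted Files | Load File | Data File
A1         | x      | x           | x      | x              | x        | x                                  | x         | x
A2         | x      | x           | x      | x              | x        |                                    | x         | x
B1         | x      | x           | x      | x              | x        | x                                  | x         | x
B2         | x      | x           | x      | x              | x        |                                    | x         | x
C1         |        |             | x      | x              | x        | x                                  | x         | x
C2         |        |             | x      | x              | x        |                                    | x         | x
D1         |        |             | x      |                |          |                                    | x         | x
D2         |        |             | x      |                |          |                                    |           |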
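
For parties that want to check a delivery against the agreed category, the Quick Guide can be expressed as a lookup table. The Python sketch below encodes the components per category as read from the category descriptions in sections A through D. That mapping is an interpretation and should be confirmed against the agreed production specification. The names PRODUCTIONS and missing_components are hypothetical.

```python
COMPONENTS = (
    "Native",
    "Near Native",
    "Images",
    "Extracted Text",
    "OCR Text",
    "Searchable Text for Redacted Files",
    "Load File",
    "Data File",
)

_ALL = set(COMPONENTS)
_REDACTED_TEXT = {"Searchable Text for Redacted Files"}
_IMAGE_ONLY = _ALL - {"Native", "Near Native"}

# Components expected for each production category (an interpretation of the guide).
PRODUCTIONS = {
    "A1": _ALL,
    "A2": _ALL - _REDACTED_TEXT,
    "B1": _ALL,
    "B2": _ALL - _REDACTED_TEXT,
    "C1": _IMAGE_ONLY,
    "C2": _IMAGE_ONLY - _REDACTED_TEXT,
    "D1": {"Images", "Load File", "Data File"},
    "D2": {"Images"},
}


def missing_components(category, delivered):
    """List the components expected for `category` that are absent from `delivered`."""
    required = PRODUCTIONS[category]
    return [name for name in COMPONENTS if name in required and name not in delivered]
```

For example, a C2 delivery containing only images and a load file would be reported as missing the extracted text, the OCR text and the data file.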

Characteristics of Productions A-D

Characteristics | A1 | A2 | B1 | B2 | C1 | C2 | D1 | D2 | D3
Increase costs for image conversion | xxxxxxx
Increase turn around time for image conversion of majority of data set | xxxxxxx
Increase cost and turn around time for OCRing redacted files | xxx
Files are not searchable | xxx
Files such as spreadsheets and small databases are not in a format conducive for review | xxxxx
Cannot individually number or endorse pages for document control | xxxx
Cannot brand pages with confidentiality endorsements | xxxx
Risk of accidental alteration is greater than with image format | xxxx
Metadata may be hidden and not fully reviewed prior to production | xxxx
May require native application or provision of client’s proprietary software to open files | xxxx
Cost of conversion and printing | x
No link back to native file | xx
No database or text for searching | xx

Metadata and Other Information Fields

Fields for email (Not All Inclusive) | Description
ATTACHMENTIDS | DocIDs of attachment(s) to email/edoc
BATES RANGE | Begin and end Bates number of a document if it differs from DocID; this can be provided in one Bates range field or 2 separate fields for the beginning and ending number
BCC | Names of persons blind copied on an email
CC | Names of persons copied on an email
CUSTODIAN | Name of person from whom the file was obtained
DATERECEIVED | Date email was received
DATESENT | Date email was sent
DOCEXT | Extension of native document
DOCID | Unique number assigned to each file or first page
DOCLINK | Full relative path to the current location of the native or near-native document; used to link metadata to the native or near-native file
FILENAME | Name of the original native file as it existed at the time of collection
FOLDER | File path/folder structure for the original native file as it existed at the time of collection
FROM | Name of person sending an email
HASH | Identifying value of an electronic record, used for deduplication and authentication; hash value is typically MD5 or SHA-1
PARENTID | DocID of the parent document
RCRDTYPE | Indicates document type, e.g., email; attachment; edoc; scanned; etc.
SUBJECT | Subject line of an email
TIMERECEIVED | Time email was received in user’s mailbox
TIMESENT | Time email was sent
TO | Name(s) of person(s) receiving email

Fields for edocs & Attachments (Not All Inclusive) | Description
ATTACHMENTIDS | DocIDs of attachment(s) to email/edoc
AUTHORS | Name of person creating document
BATES RANGE | Begin and end Bates number of a document if it differs from DocID; this can be provided in one Bates range field or 2 separate fields for the beginning and ending number
CUSTODIAN | Name of person from whom the file was obtained
DATECREATED | Date document was created
DATESAVED | Date document was last saved
DOCEXT | Extension of native document
DOCID | Unique number assigned to each file or first page
DOCLINK | Full relative path to the current location of the native or near-native document; used to link metadata to the native or near-native file
DOCTITLE | Title given to native file
FILENAME | Name of the original native file as it existed at the time of collection
FOLDER | File path/folder structure for the original native file as it existed at the time of collection
HASH | Identifying value of an electronic record, used for deduplication and authentication; hash value is typically MD5 or SHA-1
PARENTID | DocID of the parent document
RCRDTYPE | Indicates document type, e.g., email; attachment; email attachment (email); edoc; scanned; etc.
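
The HASH field is described as a value used for deduplication and authentication, typically MD5 or SHA-1. The Python sketch below shows one way such a value might be computed and used to group duplicate documents, using the standard hashlib module. The function names hash_stream and group_duplicates are hypothetical, and hashing the raw file bytes is an assumption; some tools hash selected email fields instead, so the parties may want to confirm the method used.

```python
import hashlib
import io
from collections import defaultdict


def hash_stream(stream, algorithm="md5", chunk_size=65536):
    """Return the hex digest of a binary stream, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(hash_by_docid):
    """Group DocIDs that share a HASH value; returns only groups of two or more."""
    groups = defaultdict(list)
    for doc_id, value in hash_by_docid.items():
        groups[value].append(doc_id)
    return [doc_ids for doc_ids in groups.values() if len(doc_ids) > 1]


if __name__ == "__main__":
    # In practice the stream would come from open(path, "rb").
    print(hash_stream(io.BytesIO(b"abc")))
    print(hash_stream(io.BytesIO(b"abc"), algorithm="sha1"))
```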