Records Management - The Complexity of Metadata

From EDRM

The Review and Produce steps of the discovery process often focus on the format(s) in which the parties exchange information. Some of the most frequently litigated issues have been: whether paper production suffices;[1] discoverability of metadata and/or embedded data; and whether to request/produce in native format or in "uniform image format" (TIFF or PDF).[2] Other oft-disputed issues are: the (in)sufficiency of unorganized sets of electronic information; and whether the responding party must produce databases and/or proprietary software.[3]

It remains unsettled, however, whether a potential litigant must be clairvoyant enough to anticipate the ultimate discoverability of metadata.

Metadata is "data about data."[4] In electronic discovery, the three principal kinds of such data are: E-mail; File System; and Document (imbedded/embedded). Each requires slightly different review and production strategies. Thus, each may require different retention/destruction/preservation strategies.

In data processing systems, metadata provides information about all files and/or other data managed within an application or environment. Most programs create at least six basic types of "file system" metadata. Many systems create much more extensive metadata, including prior versions of files.[5]

File System Metadata

File System metadata - created by, e.g., Word and Excel - encompasses what a user sees in Windows Explorer, including:

  • File name;
  • Original author;
  • Information regarding by whom and when revisions were made;
  • Number of pages;
  • Number of characters;
  • File size;
  • Template used to create it;
  • Date created;
  • Date modified; and
  • Date printed.

Note that some categories, such as original author, may be very misleading whenever the current file started out as a "File...Save As" version of a predecessor file.
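The split between metadata kept by the operating system and properties stored inside the document itself can be seen in a short sketch. The Python below assumes a modern Office Open XML file (.docx or .xlsx), which is a ZIP package whose docProps/core.xml and docProps/app.xml parts hold fields such as author, template, page count and the created, modified and printed dates; older binary .doc and .xls files store these fields differently and are not handled. The helper name describe_file is hypothetical. Because Word carries the creator field forward when a file is saved under a new name, the original_author value reported here can be exactly the misleading kind noted above.

```python
import os
import zipfile
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespaces used by the Office Open XML property parts (assumes .docx/.xlsx).
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

def _text(root, path):
    """Return the text of the first child matching path, or None if absent."""
    el = root.find(path, NS)
    return el.text if el is not None else None

def describe_file(path):
    """Hypothetical helper: collect OS-level and embedded document metadata."""
    st = os.stat(path)
    info = {
        # These three come from the file system, not from Word or Excel.
        "file_name": os.path.basename(path),
        "file_size": st.st_size,
        "date_modified_fs": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
    }
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as z:
            names = set(z.namelist())
            if "docProps/core.xml" in names:
                core = ET.fromstring(z.read("docProps/core.xml"))
                info["original_author"] = _text(core, "dc:creator")
                info["last_modified_by"] = _text(core, "cp:lastModifiedBy")
                info["date_created"] = _text(core, "dcterms:created")
                info["date_modified"] = _text(core, "dcterms:modified")
                info["date_printed"] = _text(core, "cp:lastPrinted")
            if "docProps/app.xml" in names:
                app = ET.fromstring(z.read("docProps/app.xml"))
                info["template"] = _text(app, "ep:Template")
                info["pages"] = _text(app, "ep:Pages")
                info["characters"] = _text(app, "ep:Characters")
    return info
```

Fields a given file does not record come back as None.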

E-mail Metadata

E-mail Metadata is a broad category, in part because it includes specialized file metadata, such as From, To, cc, Subject, Date and Time Sent and the like. Other, less transparent e-mail file metadata can provide additional information, including the sender's domain, the route a message has traveled over the Internet, and where delays may have occurred between sending and receipt.
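The route and delay information described above comes from the Received header fields that each mail server adds as it relays a message. A minimal Python sketch, assuming a raw RFC 5322 message whose Received timestamps all carry a time-zone offset, lists the hops in sending order together with the time elapsed at each; trace_route is a hypothetical name.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

def trace_route(raw_message):
    """List each Received hop, oldest first, with seconds elapsed since the prior hop."""
    msg = message_from_string(raw_message)
    report = []
    previous = None
    # Each relay adds its Received field above the earlier ones, so reverse
    # the header order to walk the route from sender toward recipient.
    for header in reversed(msg.get_all("Received", [])):
        # The timestamp follows the final semicolon in a Received field.
        route, _, stamp = header.rpartition(";")
        when = parsedate_to_datetime(stamp.strip())
        delay = 0.0 if previous is None else (when - previous).total_seconds()
        report.append({"hop": " ".join(route.split()), "time": when, "delay_seconds": delay})
        previous = when
    return report
```

Received fields added by servers outside one's own organization can be forged, and server clocks can drift, so computed delays are indications rather than proof.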

Understanding what types of metadata are available, and what is required to collect and retain them, is critical to effective records retention. For example, a complete picture of all recipients of a given e-mail often cannot be obtained by simply looking at the e-mail on an e-mail server:

  • If the e-mail was sent to a distribution list, knowing who was on that list when it was expanded requires capturing the list's membership at the time of transmission. Technologies such as Microsoft Exchange envelope journaling can be enabled to capture this information in Exchange 2003 or higher, but this functionality is not available in previous versions of Exchange Server.
  • If the e-mail was Blind Carbon Copied (Bcc-ed) to any recipients, this information will not be retained in any stored e-mail. Technologies such as Bcc Journaling in Exchange 2003 or higher can be enabled to capture this information, but are unavailable in previous versions of Exchange.

These examples underscore the need to understand what is going on technically in order to ensure that e-mail records are retained in a manner that meets expectations.
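The gap these examples describe amounts to a comparison between the recipients visible in a stored message's headers and the addresses the message was actually delivered to. In the Python sketch below, delivered_to is assumed to come from some outside record, such as an envelope journal report; that report's exact format varies by product and is not modeled here, and hidden_recipients is a hypothetical name.

```python
from email import message_from_string
from email.utils import getaddresses

def hidden_recipients(raw_message, delivered_to):
    """Return delivered addresses that do not appear in the To/Cc headers.

    delivered_to is assumed to come from an outside source (for example,
    a journal report); the stored message alone cannot supply it.
    """
    msg = message_from_string(raw_message)
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    visible = {addr.lower() for _, addr in getaddresses(fields) if addr}
    # Anything delivered but not visible was Bcc-ed or reached via a list.
    return sorted({addr.lower() for addr in delivered_to} - visible)
```

The sketch cannot tell a Bcc recipient from a member of an expanded distribution list; both simply appear as delivered addresses missing from the To and Cc fields.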

Document Metadata (a/k/a Embedded/Imbedded Data)

Embedded data can generally yield more surprises. For example, in Word or Excel, embedded data can track/capture:

  • Changes made;
  • Reviewer name(s); and
  • The sequence in which changes were made.

Embedded data is not necessarily revealed when a file's creator/modifier opens the file. However, it can be revealed if either:

  • Deleted text is still present in the bowels of the file; or
  • The Track Changes feature was used improperly, for example when revisions were hidden from view rather than accepted or rejected.

Tracked changes are only the tip of the iceberg, whether a file is in native format or has been converted to .pdf. Because this feature is so commonly used and so familiar to many users, it is a useful illustration. If the creator or modifier merely hid the tracked changes from view, rather than accepting or rejecting them, a recipient of the file can turn on Word's Track Changes or Markup display and reveal the revisions.
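Hidden revisions of this kind remain in the file's markup whether or not they are displayed. For a .docx file (an assumption; the older binary .doc format is structured differently), tracked insertions and deletions are stored in word/document.xml as w:ins and w:del elements whose w:author and w:date attributes supply the reviewer names and the sequence of changes. The following Python sketch, using the hypothetical helper name tracked_changes, lists them.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def tracked_changes(docx_path):
    """Return insertions and deletions still stored in a .docx, in date order."""
    with zipfile.ZipFile(docx_path) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    changes = []
    # Inserted text sits in w:t elements; deleted text in w:delText elements.
    for kind, tag, text_tag in (("insert", "ins", "t"), ("delete", "del", "delText")):
        for el in root.iter(W + tag):
            changes.append({
                "type": kind,
                "author": el.get(W + "author"),
                "date": el.get(W + "date") or "",
                "text": "".join(t.text or "" for t in el.iter(W + text_tag)),
            })
    # ISO 8601 timestamps in one consistent format sort chronologically.
    changes.sort(key=lambda c: c["date"])
    return changes
```

Entries with empty text can occur where only a paragraph mark was inserted or deleted.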

Upon conversion to .pdf via Acrobat's PDFMaker, improperly handled tracked changes may migrate to the .pdf file in unusual circumstances - namely when:

  • The Word file itself is incorporated in native format into the .pdf file;
  • Tracked changes are visible before and after the PDFMaker conversion; or
  • One's Word printing configuration is set to also print tracked changes.[6]

In the Word scenario, prior revisions can reside in multiple places in the file. Even in the .pdf scenario, some file metadata does migrate. In a converted file whose metadata has not been scrubbed,[7] pressing "Ctrl+D" in Adobe Acrobat or Reader opens the Document Properties dialog, which identifies the Title and original Author recorded when the file was first created in its original format.

Native vs. Electronic Document Discovery (EDD) [Platforms]

A "native data" file is one "[i]n the original file format in which [it was] created (i.e., in the specific software applications used to create each individual document)."[8] Examples are Microsoft Word, Microsoft Excel and WordPerfect.

In contrast, "uniform" or "standard" image format is an agreed-upon file format into which all of the different types of native files are converted solely for review and/or production in civil litigation.[9] Often the uniform format is Tagged Image File Format (TIFF) plus a searchable index; sometimes it is Portable Document Format (PDF).[10]

"Searchable TIFF" is an oxymoron, because a TIFF file is an image and contains no text to search. The term is a litigation fiction describing the exchange of a set of imaged electronic files accompanied by searchable text associated with those files.
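The point that searching happens on the companion text rather than on the images can be illustrated with a toy model of a production set. The class and field names below are hypothetical; real productions pair images with text through load files whose formats vary by vendor and are not modeled here.

```python
from dataclasses import dataclass

@dataclass
class ProducedPage:
    """One imaged page plus the text delivered alongside it (hypothetical model)."""
    bates: str       # production number stamped on the image
    image_path: str  # the TIFF itself: pixels only, nothing to search
    text: str        # extracted or OCR text supplied as a separate file

def search(production, term):
    """Search the companion text, then report which images match."""
    needle = term.lower()
    return [page.bates for page in production if needle in page.text.lower()]

production = [
    ProducedPage("ABC000001", "images/ABC000001.tif", "Draft merger agreement"),
    ProducedPage("ABC000002", "images/ABC000002.tif", "Lunch order"),
]
print(search(production, "merger"))  # ['ABC000001']
```

If the text files are missing or the OCR is poor, the images themselves remain unsearchable, which is why the quality of the accompanying text matters in any "searchable TIFF" production.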

Many strategies and cost issues determine whether to review, produce and/or seek files in their native format(s).[11] The relative technological and financial resources of the parties are likely to play a big role. So is the significance, or lack thereof, of metadata - such as spreadsheet formulas, tracked changes, creation date, e-mail fields, cross-file links, etc.[12] In some instances, it may be better for a party to review and/or produce in native format. In other instances, an EDD platform/database may be preferable.

Metadata - When Is It Discoverable?

Metadata/imbedded data is discoverable when needed for or relevant to the matter at hand.[13] The proposed amendment to Federal Rule 26(b)(2)(C) intentionally avoids specific reference to metadata; yet the associated Advisory Committee Note evinces a desire to keep metadata from being produced absent an affirmative showing of need.[14] A stark example of a context in which metadata would be relevant is a contention that a file was back-dated.[15]

Privilege

An important consideration militating against mandated retention of metadata is that buried therein may be material protected by the attorney-client privilege and/or the attorney work-product doctrine.

The larger the amount of electronic material produced in native format, the greater the odds that privileged content and/or metadata will get disclosed.[16] Ethical obligations and case law exist to mitigate the ramifications of an inadvertent disclosure. However, as a practical matter, once privileged matter has been disclosed to an opponent, the recipient will not be able to erase it from his/her memory. One cannot "un-ring" the bell.[17]

Most of the case law to date has dealt with hardcopy documents whose content contains the privileged material.[18] Yet the principles from those cases are equally applicable to the brave new world of privileged material residing in metadata and embedded data. Electronic information is protected by the same traditional legal privileges applicable to paper, including the attorney/client privilege and the work product doctrine.[19] Courts have taken three approaches to privilege waiver: strict (intent is irrelevant); lenient (no waiver absent intentional conduct); and case-by-case, multi-factor balancing tests.

The majority view is the case-by-case approach,[20] typical factors being: the reasonableness of the precautions taken relative to the production's size; the number of inadvertent disclosures; the extent of the disclosure(s); and whether remedial measures were taken (and, whether the producing party exhibited delay in effectuating them).[21]

The most ominous concern is that the privilege waiver may extend beyond the file(s)/document(s) in question to encompass the entire covered subject matter.[22]

On the other hand, if litigation does ensue, a litigant can still guard against a subsequent privilege waiver. Through counsel - perhaps in conjunction with the Fed. R. Civ. P. 26(f) conference - the litigant can enter into a "quick peek"[23] or "claw-back"[24] stipulation with its litigation opponent. It has become "[i]ncreasingly popular for a stipulation to have 'explicit provisions as to how [it]...will deal with documents inadvertently produced.'"[25]

The pending proposed amendments to Federal Rules 26(b)(5)(B) and 45(d)(2)(B) establish a procedure to guard against inadvertent privilege waivers.[26] The new provision would state in pertinent part that:

When a party produces information without intending to waive a claim of privilege it may, within a reasonable time, notify any party that received the information of its claim of privilege. After being notified, a party must promptly return or destroy the specified information and any copies.[27]

These provisions are likely to take effect on December 1, 2006.

Save Now; Scrub and/or Extract Later?

The average employee (and even some attorneys) is unaware of the potential liability lurking in metadata. Educating employees on document creation and archiving methods that minimize that liability may limit the damage if the organization is later forced to produce the documents in litigation. A comprehensive document management program may include a proactive metadata removal or "scrubbing" initiative within the firm. Many software programs are available that minimize or remove metadata in the normal course of business operations and document management.

Any such policy should also take into account the data revealed by word processing features such as "track changes" and "undo/redo," as well as the use of PDFs or other document types. Many scrubbing programs can be configured to remove that information automatically when a document is saved; alternatively, a separate redline analysis program can be used to monitor document revisions. Another area of concern is the recycling of documents or the use of templates: without precautions in place, it is possible to inadvertently forward one client's information to another. The internal document management policy should guard against this.
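As an illustration of the narrowest kind of scrubbing, the Python sketch below copies a .docx file and blanks only its author and last-modified-by properties in docProps/core.xml. It is a hypothetical helper, not a substitute for the commercial tools mentioned here: tracked changes, comments and other hidden data are left untouched, and nothing like it should be run on files subject to a litigation hold or discovery request.

```python
import zipfile
import xml.etree.ElementTree as ET

CORE_PART = "docProps/core.xml"
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "dcmitype": "http://purl.org/dc/dcmitype/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
# Keep the conventional prefixes when the part is written back out.
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)

# Properties that identify people; every other property is left alone.
PERSONAL_FIELDS = (
    "{%s}creator" % NAMESPACES["dc"],
    "{%s}lastModifiedBy" % NAMESPACES["cp"],
)

def scrub_core_properties(src, dst):
    """Hypothetical helper: copy a .docx from src to dst with author fields blanked."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == CORE_PART:
                root = ET.fromstring(data)
                for tag in PERSONAL_FIELDS:
                    for el in root.iter(tag):
                        el.text = ""
                data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
            # Reusing the original ZipInfo keeps each part's name and compression.
            zout.writestr(item, data)
```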

Once under a litigation hold or discovery request, it is too late to "scrub" or remove the metadata without the approval of the requesting party. Such a unilateral decision could lead to charges of spoliation and possible sanctions. The most protective and cost-effective procedure during the collection phase is to keep the metadata intact within the native documents and then negotiate with the requesting party regarding the overall need for metadata and what metadata must be included. This procedure protects parties whether it is determined that the metadata must be produced or that it can be removed or omitted from the produced data.
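Keeping metadata intact during collection means, at a minimum, copying native files without disturbing them and being able to show later that the copies match the originals. The Python sketch below assumes that recording a SHA-256 hash alongside each copy is an acceptable way to make that showing; collect and sha256_of are hypothetical names, and shutil.copy2 is used because it attempts to preserve the modification time, which a plain copy would reset to the time of copying.

```python
import hashlib
import os
import shutil

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def collect(src, dest_dir):
    """Hypothetical helper: copy a native file unaltered and record verification facts."""
    os.makedirs(dest_dir, exist_ok=True)
    before = os.stat(src)
    # copy2 copies the contents and also tries to carry over the timestamps.
    dst = shutil.copy2(src, dest_dir)
    record = {
        "source": src,
        "copy": dst,
        "size": before.st_size,
        "mtime": before.st_mtime,
        "sha256": sha256_of(src),
    }
    # The copy's hash must equal the original's if nothing was altered.
    if sha256_of(dst) != record["sha256"]:
        raise OSError(f"copy of {src} does not match the original")
    return record
```

shutil.copy2 does not preserve every timestamp on every platform (a Windows creation date, for example), and reading a file can itself update its last-access time on some systems, which is one reason dedicated forensic collection tools are often used instead.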

Early meet-and-confer sessions to review electronic discovery plans are required by the proposed amendment to Federal Rule of Civil Procedure 26(f). The new rule will give parties the opportunity to negotiate/stipulate as to how much data can and cannot be scrubbed.

Footnotes

  1. ^  Electronic production was ordered in "In re Honeywell Int'l, Inc. Securities Litig.", 2003 U.S. Dist. LEXIS 20602, 2003 WL 22722961 (S.D.N.Y. Nov. 18, 2003) (in putative securities class action, third-party accounting firm's previous production of hardcopies of its work papers had been insufficient under Fed. R. Civ. P. 34(b) because information "not produced as kept in the usual course of business"), available at http://www.nysd.uscourts.gov/courtweb/pdf/D02NYSC/03-09294.PDF. For a rare modern case in which paper production was sufficient, see "Northern Crossarm Co., Inc. v. Chemical Specialties, Inc.", 2004 U.S. Dist. LEXIS 5381 (W.D. Wis. Mar. 3, 2004) (unique set of circumstances in which both production request and meet-and-confer correspondence failed to specify "electronic" and prior costly production of hardcopies of 65,000 e-mails had "mimic[ed] manner in which that information [wa]s stored electronically").
  2. ^  "See generally" Kristin M. Nimsger & Michele C.S. Lange, "E-Document Conversion & Native Document Review" (LJN Legal Tech. News Dec. 2003) ("Nimsger"), available by paid subscription at http://www.lawjournalnewsletters.com/issues/ljn_legaltech/21_9/news/141567-1.html; "E-Evidence Thought Leadership Luncheon: Rowe v. Zubulake: A Perspective From the Bench" (Kroll Ontrack Sep. 23, 2003) (hereafter "Judges"), at http://www.krollontrack.com/upcomingevents/documents/zubulake.pdf; Kenneth Shear, "Retaining Computer Data in Original Format v. Conversion of Data into Images" (Electronic Evidence Discovery 2003) http://NativeShear.notlong.com. See also Mary Mack, "Native File Review: Simplifying Electronic Discovery?" (LJN's Legal Tech News, May 1, 2005); Mark Reber, "Native File Review: What Problem Are We Solving?" Technolawyer (Mar. 8, 2005), available at http://www.fiosinc.com/resources/pdfFiles/20050308_TechnoFeature_NativeReview_Reber.pdf.
  3. ^  "Jinks-Umstead v. England", 2005 WL 775780 (D.D.C. Apr. 7, 2005) (in discrimination case, granting new trial to allow Plaintiff to present its case using new electronic evidence that Defendant had initially claimed it no longer possessed but which turned out to be retrievable from database), available at http://www.dcd.uscourts.gov/opinions/2005/Facciola/1999-CV-2691~16:33:24~4-7-2005-a.pdf; "In re Plastics Additives Antitrust Litig.", 2004 U.S. Dist. LEXIS 23989, 2004-2 Trade Cas. (CCH) ¶ 74,620 (E.D. Pa. Nov. 29, 2004) (ordering parties to provide all transactional data in electronic format, to extent reasonably feasible; not requiring Defendant to provide technical assistance to help plaintiffs understand and make use of electronic data), available at http://www.paed.uscourts.gov/documents/opinions/04D0537P.pdf.
  4. ^  Among the many online definitions is the one found in Applied Discovery's Glossary at http://www.lexisnexis.com/applieddiscovery/clientResources/glossary_M.asp. "See also" Brownstone, Collaborative Navigation of the Stormy e-Discovery Seas, 10 Rich. J.L. & Tech. 53, ¶¶ 2, 23 & nn. 5, 68-70 (2004), ¶¶ 3, 19, 31 & nn.5-7, 56, 95-96, at http://law.richmond.edu/jolt/v10i5/article53.pdf#page=2.
  5. ^  "See generally" Workshare, "Dangers of Document Metadata" (2004), available by free registration at http://www.workshare.com/collateral/misc/Dangers_of_Document_Metadata.pdf.
  6. ^  "See" E. Svenson, "Overstating the threat of metadata in PDF documents" http://www.planetpdf.com/enterprise/article.asp?ContentID=6877 (rebutting D. Payne & B.Lewis, "Metadata: Are You Protected?" (2004), available upon free registration at http://www.lawtechnews.com/r5/showkiosk.asp?listing_id=430591).
  7. ^  Metadata cleaning software includes PCG's Metadata Assistant and Workshare's Professional 4's "Hidden Data". See Benjamin Rosenbaum, "Evaluation of the Top 5 Metadata Removal Utilities" (TechnoLawyer post 1/28/05), purchasable at http://www.technolawyer.com/member/archivehome.asp. Note: in "eDiscovery", only scrub metadata if you are sure no Court Order or Stipulation forbids it.
  8. ^  Nimsger, supra note 2, at 2. Each of the pieces cited in note 2 supra does an excellent job on this topic.
  9. ^  "Id." at 1-2.
  10. ^  "Id."
  11. ^  Robert D. Brownstone, "Collaborative Navigation of the Stormy e-Discovery Seas", 10 Rich. J.L. & Tech. 53, ¶¶ 2, 23 & nn. 5, 68-70 (2004) (hereafter "Brownstone"), available at http://law.richmond.edu/jolt/v10i5/article53.pdf#page=2.

  12. ^  Case law addressing these issues is still developing. "See, e.g., Medtronic Sofamor Danek, Inc. v. Michelson", 2003 WL 21468573 (W.D. Tenn. 2003) (in trade secrets and patents case as to spinal fusion medical technology, ordering non-privileged files produced to Defendant in their native electronic formats (rather than as image files); appointing special master - technology or computer expert - to oversee discovery and setting forth detailed protocol).
  13. ^  S.D.N.Y. Magistrate Judge Francis has informally stated:
    [T]he touchstone...is the purpose or...relevance of the particular document at issue. [Whether] the metadata or the embedded data is going to be highly relevant...dictates [the] form of production.... [I]n any large document case these days, it's probably irresponsible for the requesting party not to ask for it in searchable form in any event.... I think the days of producing large volumes of paper documents are pretty close to over but that doesn't solve the situation about what form of searchable data.
    "Judges", supra note 2, at 25, at http://Judgesat25.notlong.com.
  14. ^  The pertinent Advisory Committee Note quotes the Manual for Complex Litigation (4th) § 11.446 to the effect that "production of word-processing files with all associated metadata...should be conditioned upon a showing of need or sharing expenses." Cf. D. Del. Default Standard for Discovery of Electronic Documents, available at http://www.ded.uscourts.gov/Announce/HotPage21.htm or http://www.ded.uscourts.gov/SLR/Misc/EDiscov.pdf, providing that:
    If, during...Rule 26(f) conference, the parties cannot agree to the format..., electronic documents shall be produced...as image files (e.g., PDF or TIFF).... [T]he producing party must preserve the integrity of the electronic document's contents, i.e., the original formatting of the document, its metadata and, where applicable, its revision history. After initial production in image file format is complete, a party must demonstrate particularized need for production of electronic documents in their native format.
