EDRM Search Glossary

The EDRM Search Glossary is a list of terms related to searching electronically stored information (ESI).

Ad Hoc Search

Single logical query or the progression of single logical queries performed interactively in an effort to accumulate intelligence.

Bayesian Classifier

A Bayesian classifier identifies concepts by learning from a set of representative documents in a particular category. The classifier can then discern other responsive documents in the larger collection and place them in that category. Typically, a category is represented by a collection of words and their frequency of occurrence within its documents. The probability that a document belongs to a category is based on the product of the probabilities of each word in the document appearing in that category across all documents. Thus, the learning classifier is able to take the words present in a sample category and apply that knowledge to new documents. In the e-discovery context, a Bayesian classifier can quickly place documents into confidential, privileged, responsive, and other well-known categories.
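
The word-probability product described above can be illustrated with a minimal naive Bayes sketch. Everything in it is an assumption made for illustration: the sample documents, the category names, the whitespace tokenizer, and the add-one smoothing are not taken from any particular e-discovery product. Log probabilities are summed instead of multiplying raw probabilities, which is equivalent and avoids numeric underflow.

```python
import math
from collections import Counter

def train(labeled_docs):
    """Build per-category word counts from (category, text) pairs."""
    word_counts = {}          # category -> Counter of word frequencies
    doc_counts = Counter()    # category -> number of training documents
    vocab = set()             # every distinct word seen in training
    for category, text in labeled_docs:
        words = text.lower().split()
        word_counts.setdefault(category, Counter()).update(words)
        doc_counts[category] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab

def classify(text, word_counts, doc_counts, vocab):
    """Return the category with the highest (log) probability for text."""
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, float("-inf")
    for category, counts in word_counts.items():
        # Prior: share of training documents that fall in this category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing so an unseen word does not zero the product.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Hypothetical training sample; real collections are far larger.
SAMPLES = [
    ("privileged", "attorney client advice regarding litigation strategy"),
    ("privileged", "legal advice from counsel on settlement"),
    ("responsive", "quarterly sales report for widget product"),
    ("responsive", "widget pricing and sales forecast"),
]
MODEL = train(SAMPLES)
```

Calling `classify("advice from our attorney", *MODEL)` places the new text in the hypothetical "privileged" category because its words are more frequent in that category's sample documents.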

Bibliographic / Objective Coding

  • Objective information, often manually recorded from documents, such as the document date, the authors or recipients of the documents, or the title of a document. Bibliographic coding is usually performed on documents originating as paper with no electronically stored information.
  • The entering of objective information such as date, document number, and document type into data fields. 1
  • Extracting information from electronic documents, such as date created, author, recipient, and CC, and linking each image to the information in pre-defined objective fields. This stands in direct opposition to subjective coding, where legal interpretations of data in a document are linked to individual documents. Also called objective coding. 2

Boolean Search

  • A search technique that utilizes Boolean logic to connect individual keywords or phrases within a single query, using operators such as AND, OR, NOT, within N (w/5), and NOT within N (not w/5). 1 2
  • A Keyword Search in which the Keywords are combined using operators such as “AND,” “OR,” and “[BUT] NOT.” The result of a Boolean Search is precisely determined by the words contained in the Documents. (See also Bag of Words method.) 3
  • The term “Boolean” refers to a system of logic developed by the nineteenth-century English mathematician George Boole. In Boolean searching, an “and” operator between two words results in a search for documents containing both of the words. An “or” operator between two words creates a search for documents containing either of the target words. A “not” operator between two words creates a search result containing the first word but excluding the second. 4 5 6
  • A search type using Boolean logic operators between search terms that indicate a relationship between them. An “AND” operator between two words or other values (for example, “pear AND apple”) means one is searching for documents containing both of the words or values, not just one of them. An “OR” operator between two words or other values (for example, “pear OR apple”) means one is searching for documents containing either of the words. 7
  • Mathematical query language developed by English mathematician George Boole in the 19th century. Boolean searching of text is based on the underlying logic functions of various true/false statements. Common Boolean operators are “and,” “but not,” and “within.” 8
  • A search for information using “AND,” “OR” and “NOT” commands, such as “Tom but not Jones” or “bankruptcy and trustee.” 9
  • The use of the terms “AND,” “OR” and “NOT” in conducting searches. Used to widen or narrow the scope of a search. 10
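
The operators in the definitions above can be sketched in a few lines of code. This is an illustrative sketch only: the documents, the helper names `contains`, `within`, and `search`, and the reading of w/5 as "no more than five word positions apart" are assumptions, and search platforms differ in how they count proximity.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains(doc, term):
    """True if the term appears as a whole word in the document."""
    return term.lower() in tokenize(doc)

def within(doc, a, b, n):
    """True if a and b occur no more than n word positions apart (a w/n b)."""
    tokens = tokenize(doc)
    pos_a = [i for i, t in enumerate(tokens) if t == a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == b.lower()]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

# Hypothetical three-document collection.
DOCS = {
    1: "The bankruptcy trustee filed a report.",
    2: "Tom met Jones about the bankruptcy.",
    3: "Tom reviewed the pear and apple shipment.",
}

def search(predicate):
    """Return the sorted ids of documents that satisfy the predicate."""
    return sorted(doc_id for doc_id, text in DOCS.items() if predicate(text))

# bankruptcy AND trustee
both = search(lambda d: contains(d, "bankruptcy") and contains(d, "trustee"))
# pear OR trustee
either = search(lambda d: contains(d, "pear") or contains(d, "trustee"))
# Tom BUT NOT Jones
but_not = search(lambda d: contains(d, "Tom") and not contains(d, "Jones"))
# Tom w/5 bankruptcy
near = search(lambda d: within(d, "Tom", "bankruptcy", 5))
```

Because each result depends only on which words a document contains and where they sit, the outcome of a Boolean search is fully determined by the text, as the definitions note.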

Case Search / Specifying Case

Specifying that the search must be case sensitive means that a match requires the exact case of every letter in the keyword to appear in the document. For example, a case-sensitive search on Rose will match the name “Rose Jones” but will not match the phrase “rose garden”.
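
The Rose example can be reproduced with a short sketch. The function name `keyword_match` and its `case_sensitive` parameter are hypothetical and are used only to show the difference between the two modes.

```python
import re

def keyword_match(keyword, text, case_sensitive=False):
    """Return True if keyword occurs in text as a whole word."""
    # re.IGNORECASE is applied only when case does not matter.
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, flags) is not None
```

With `case_sensitive=True`, `keyword_match("Rose", "Rose Jones", True)` matches while `keyword_match("Rose", "rose garden", True)` does not. With the default case-insensitive setting, both match.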

Character Encoding

Electronic data is represented as sequences of bits, or numbers. Each character of an alphabet or script used in a language is mapped to a unique numeric value. This mapping is referred to as character encoding. See also Unicode.
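
As a small illustration, the sketch below shows the numeric value assigned to one character and the bytes that two common encodings produce for it. The helper name `describe` is hypothetical, and the character é is chosen only as an example.

```python
def describe(ch):
    """Show the Unicode code point of ch and its bytes in two encodings."""
    return {
        "code_point": f"U+{ord(ch):04X}",   # the character's numeric value
        "utf8": ch.encode("utf-8"),         # variable-length byte sequence
        "utf16": ch.encode("utf-16-be"),    # 16-bit units, big-endian
    }

# The letter e with an acute accent, written as an escape for portability.
info = describe("\u00e9")

# Decoding bytes with the wrong encoding yields garbled text, which is
# why a search index needs to know how each document is encoded.
garbled = info["utf8"].decode("latin-1")
```

The same character has one code point but different byte sequences under different encodings, so a keyword search run over text decoded with the wrong encoding can miss it.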

Classify / Classification

  • To arrange or designate according to categorization such as potentially responsive or privileged versus non-responsive or not-privileged.
  • An Algorithm that Labels items as to whether or not they have a particular property; the act of Labeling items as to whether or not they have a particular property. In Technology-Assisted Review, Classifiers are commonly used to Label Documents as Responsive or Non-Responsive. 1

Compliance Search

Searching for the purpose of identifying specified relevant information in response to a discovery request. A compliance search should be paired with a search methodology such as Ad Hoc or Iterative searching.

Concept Search

  • A search technique that provides words which are similar in concept to a query word. A concept search will return documents that relate to the same concept as the query word, regardless of whether the query word appears in the returned documents. Concept searches can be implemented as a simple thesaurus match, or by using sophisticated statistical analysis methods. The effectiveness of a concept search in an e-discovery project depends greatly on the type of algorithm used and its implementation.
  • An industry-specific term generally used to describe Keyword Expansion techniques, which allow search methods to return Documents beyond those that would be returned by a simple Keyword or Boolean Search. Methods range from simple techniques such as Stemming, Thesaurus Expansion, and Ontology search, through statistical Algorithms such as Latent Semantic Indexing.
  • Maps relationships between each word and every other word in large sets of documents and then associates words based on the context in which they are used. Two techniques can be used to perform concept searches: the use of a manually constructed thesaurus which relates certain words to others or semantic indexing, a fully automated method to show associations among words based, in part, on statistical analysis of the occurrence of proximity of certain words to others. 1
  • Also called “thesaurus” or “related” searching; sometimes called “synonym searching.” Searches that provide other words similar or close in meaning to the primary word. 2
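
The simplest implementation mentioned above, a thesaurus match, can be sketched as follows. The thesaurus entries, the sample documents, and the function names are hypothetical; statistical methods such as Latent Semantic Indexing work quite differently and are not shown here.

```python
import re

# Hypothetical, manually constructed thesaurus relating words to others.
THESAURUS = {
    "car": {"automobile", "vehicle"},
    "lawyer": {"attorney", "counsel"},
}

def expand(query_word):
    """Return the query word together with its related words."""
    word = query_word.lower()
    return {word} | THESAURUS.get(word, set())

def concept_search(query_word, docs):
    """Return sorted ids of documents containing any expanded term."""
    terms = expand(query_word)
    hits = []
    for doc_id, text in docs.items():
        words = set(re.findall(r"[a-z]+", text.lower()))
        if terms & words:
            hits.append(doc_id)
    return sorted(hits)

DOCS = {
    1: "The automobile was recalled.",
    2: "Quarterly earnings rose.",
    3: "Car sales fell.",
}
```

Here `concept_search("car", DOCS)` returns document 1 even though the word "car" never appears in it, which is the behavior that distinguishes a concept search from a plain keyword search.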

Coverage Bias

Coverage bias can occur if the samples are not representative of the population due to the methodology used. In e-discovery, such coverage bias occurs when large portions of ESI are excluded from sampling based on metadata or type of ESI. As an example, patent litigation may require sampling technical documents in their source form, and care should be taken to include these documents in the sample selection process.
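
One common way to reduce this risk is to sample within each type of ESI rather than across the collection as a whole. The sketch below assumes a hypothetical collection in which each item carries a `type` field; stratified sampling is offered as one possible approach, not as a prescribed method.

```python
import random
from collections import defaultdict

def stratified_sample(items, key, per_stratum, seed=0):
    """Draw up to per_stratum items from every group defined by key."""
    rng = random.Random(seed)   # fixed seed keeps the sample reproducible
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    sample = []
    for group in strata.values():
        # Small groups contribute everything they have.
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample

# Hypothetical collection: many emails, few technical source files.
COLLECTION = (
    [{"id": i, "type": "email"} for i in range(95)]
    + [{"id": 95 + i, "type": "cad"} for i in range(5)]
)
SAMPLE = stratified_sample(COLLECTION, key=lambda d: d["type"], per_stratum=3)
```

A simple random draw of six items from this collection could easily contain no technical files at all, whereas the stratified draw always includes some from each type.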