Latent Semantic Indexing / Latent Semantic Analysis


  • Latent Semantic Indexing (sometimes also referred to as Latent Semantic Analysis) is a technique that analyzes the co-occurrence of keyword terms across a document collection. In textual documents, keywords exhibit polysemy (one word with several meanings) as well as synonymy (several words with the same meaning). Latent Semantic Indexing exploits the additional observation that keywords related to the same concept tend to appear together. These relationships can be “is-a” relationships, such as “a motorcycle is a vehicle”, or containment relationships, such as “wheels of a motorcycle”. Related approaches include Support Vector Machines, Probabilistic Latent Semantic Analysis, Latent Dirichlet Allocation, and others.
  • A Feature Engineering Algorithm that uses linear algebra to group together correlated Features. For example, “Windows, Gates, Ballmer” might be one group, while “Windows, Gates, Doors” might be another. Latent Semantic Indexing underlies many Concept Search tools. While Latent Semantic Indexing is used for Feature Engineering in some Technology-Assisted Review tools, it is not, per se, a Technology-Assisted Review method. Also referred to as Latent Semantic Analysis. 1
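The linear-algebra grouping described above can be illustrated with a small sketch. It assumes a truncated singular value decomposition (SVD) of a raw term-document count matrix, which is one common way to realize Latent Semantic Indexing; the six-document toy corpus, the choice of k = 2, and the helper names `cosine` and `term_similarity` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical toy corpus: three "software" documents and three "building"
# documents that share the ambiguous terms "windows" and "gates".
docs = [
    "windows gates ballmer microsoft",
    "gates ballmer microsoft software",
    "windows software microsoft ballmer",
    "windows gates doors house",
    "doors house gates fence",
    "windows doors house fence",
]

# Build the vocabulary and a term-to-row lookup.
vocab = sorted({word for doc in docs for word in doc.split()})
index = {term: i for i, term in enumerate(vocab)}

# Term-document matrix: rows are terms, columns are documents, entries are counts.
A = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for word in doc.split():
        A[index[word], j] += 1.0

# Thin SVD, then keep only the k largest singular values (the "latent" dimensions).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # assumed number of latent concepts for this toy example
term_vectors = U[:, :k] * s[:k]  # each row places one term in concept space


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def term_similarity(t1, t2):
    """Similarity of two vocabulary terms in the reduced concept space."""
    return cosine(term_vectors[index[t1]], term_vectors[index[t2]])


print("ballmer ~ microsoft:", term_similarity("ballmer", "microsoft"))
print("ballmer ~ doors:    ", term_similarity("ballmer", "doors"))
```

In this toy corpus, “ballmer” and “microsoft” occur in exactly the same documents, so they land on the same point in concept space, while “ballmer” and “doors” never co-occur and end up less similar. Production tools typically apply term weighting (for example TF-IDF) before the decomposition and use far more than two dimensions; neither is shown here.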

See Also

  1. Maura R. Grossman and Gordon V. Cormack, EDRM page & The Grossman-Cormack Glossary of Technology-Assisted Review, with Foreword by John M. Facciola, U.S. Magistrate Judge, 2013 Fed. Cts. L. Rev. 7 (January 2013).