Normal Distribution


  • The “bell curve” of classical statistics. The number of Relevant Documents in a Sample tends to obey a Normal (Gaussian) Distribution, provided the Sample size is large enough to capture a substantial number of Relevant and Non-Relevant Documents. In this situation, Gaussian Estimation is reasonably accurate. If the Sample size is insufficiently large to capture a substantial number of both Relevant and Non-Relevant Documents (as a rule of thumb, at least 12 of each), the Binomial Distribution better characterizes the number of Relevant Documents in the Sample, and Binomial Estimation is more appropriate. Also referred to as a Gaussian Distribution. 1
  • In probability theory, the normal (or Gaussian) distribution is a continuous probability distribution. It has a bell-shaped probability density function, known as the Gaussian function or, informally, as the bell curve. The height of the curve shows the relative likelihood of various values. The total area under the curve is 1.0, so the area under any section of the curve represents a probability. The importance of the normal distribution derives from the central limit theorem, which says that the average of a large number of independent random variables with finite variance is approximately normally distributed, however the variables were originally distributed. The normal distribution has wide application in statistics, for example, in sampling.
    A graph of the normal distribution. The confidence interval is shown in white in the middle, and the “tails” are shown in yellow. The 95% confidence interval covers 95% of the area under the curve. In the two-tailed case, this 95% area is centered symmetrically on the average of the distribution.
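
The contrast between Gaussian Estimation and Binomial Estimation described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names (`gaussian_interval`, `binomial_interval`) are hypothetical, "Gaussian Estimation" is taken to mean the standard normal-approximation (Wald) interval for a proportion, and "Binomial Estimation" is taken to mean the exact (Clopper-Pearson) interval, found here by bisection. Neither source in this entry prescribes these particular formulas.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def gaussian_interval(relevant, sample_size, confidence=0.95):
    """Normal-approximation (Wald) interval for the proportion of relevant documents."""
    p = relevant / sample_size
    # Two-tailed critical value: about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(p * (1 - p) / sample_size)
    return max(0.0, p - margin), min(1.0, p + margin)


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Each term is computed in log space so large samples do not overflow.
    """
    total = 0.0
    for i in range(k + 1):
        log_coeff = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        total += exp(log_coeff + i * log(p) + (n - i) * log(1 - p))
    return total


def _solve_decreasing(f, target, iterations=50):
    """Bisection on (0, 1) for f(p) = target, where f decreases as p grows."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def binomial_interval(relevant, sample_size, confidence=0.95):
    """Exact (Clopper-Pearson) interval for the proportion of relevant documents."""
    alpha = 1 - confidence
    k, n = relevant, sample_size
    # Lower bound: the p at which seeing k or more relevant has probability alpha/2.
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binomial_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: the p at which seeing k or fewer relevant has probability alpha/2.
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binomial_cdf(k, n, p), alpha / 2)
    return lower, upper


# A sample of 100 documents with only 5 relevant (below the rule-of-thumb 12).
wald = gaussian_interval(5, 100)
exact = binomial_interval(5, 100)
```

For a sample like this one, with fewer than 12 Relevant Documents, the exact interval comes out wider than the Gaussian one and is not symmetric around the observed proportion of 0.05. That difference is why Binomial Estimation is the more appropriate choice when the counts are small.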


  1. Maura R. Grossman and Gordon V. Cormack, EDRM page & The Grossman-Cormack Glossary of Technology-Assisted Review, with Foreword by John M. Facciola, U.S. Magistrate Judge, 2013 Fed. Cts. L. Rev. 7 (January 2013).
  2. Herb Roitblat, Predictive Coding Glossary.