Why is the denominator of sample variance n-1?

The denominator of the sample variance being n-1, rather than n, is a direct consequence of statistical theory aimed at producing an unbiased estimator of the population variance. When we calculate variance from a sample, we are using the sample mean as an estimate of the true population mean. This sample mean is itself calculated from the same data, which introduces a constraint: the deviations of the data points from this sample mean are not completely free to vary. Specifically, if you know the sample mean and n-1 of the data points, you can always determine the nth data point. This loss of one degree of freedom means that using the sample mean in the calculation causes the sum of squared deviations around it to be systematically smaller, on average, than the sum of squared deviations around the true population mean. Using a denominator of n would therefore produce an estimate that is biased downward, consistently underestimating the true population variance. The n-1 correction, known as Bessel's correction, precisely compensates for this expected underestimation, ensuring that the long-run average of the sample variance, when calculated over many random samples, equals the true population variance.
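This downward bias is easy to see in a simulation. The sketch below (variable names are illustrative) draws many small samples from a population with known variance, computes the sum of squared deviations from each sample mean, and averages the two competing estimators; only stdlib `random` is assumed.

```python
import random

random.seed(0)

POP_MEAN, POP_SD = 0.0, 2.0   # true population variance = 4.0
n, trials = 5, 200_000

sum_biased = sum_unbiased = 0.0
for _ in range(trials):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)  # squared deviations from the sample mean
    sum_biased += ss / n          # denominator n: biased low
    sum_unbiased += ss / (n - 1)  # Bessel's correction

print(sum_biased / trials)    # close to (n-1)/n * 4.0 = 3.2
print(sum_unbiased / trials)  # close to 4.0
```

Averaged over many samples, the n-denominator estimator settles near (n-1)/n times the true variance, while the n-1 version settles near the true variance itself, which is exactly what "unbiased" means here.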

The mechanism is rooted in the mathematical expectation of the sum of squared deviations. Statistically, the expected value of the sum of squared deviations from the sample mean is equal to (n-1) times the population variance. In contrast, the expected value of the sum of squared deviations from the true population mean would be n times the population variance. The shortfall of exactly one times the variance arises because the sample mean is the point that minimizes the sum of squares for that specific dataset; it is the "best fit" from the data's own perspective. Consequently, the sum of squared deviations from the sample mean can never exceed the sum of squared deviations from any other point, including the true population mean, so the observed deviations are systematically too small. Dividing by n-1 instead of n inflates the resulting statistic just enough to correct for this inherent shrinkage. This correction makes the sample variance, denoted as s², an unbiased estimator, which is a critical property for inference, especially when comparing variances across studies or when the sample variance serves as a foundational component for other statistics like the t-test or F-test.
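The identity behind this shortfall can be derived in a few lines. Writing μ for the population mean, σ² for the population variance, and x̄ for the sample mean, a standard decomposition gives:

```latex
% Decompose deviations from the population mean \mu:
\sum_{i=1}^{n}(x_i-\mu)^2
  = \sum_{i=1}^{n}(x_i-\bar{x})^2 + n(\bar{x}-\mu)^2
% (the cross term vanishes because \sum_{i}(x_i-\bar{x}) = 0).
% Take expectations, using E\!\left[(x_i-\mu)^2\right]=\sigma^2
% and \operatorname{Var}(\bar{x})=\sigma^2/n:
n\sigma^2 = E\!\left[\sum_{i=1}^{n}(x_i-\bar{x})^2\right] + \sigma^2
\quad\Longrightarrow\quad
E\!\left[\sum_{i=1}^{n}(x_i-\bar{x})^2\right] = (n-1)\,\sigma^2
```

Dividing the sum of squared deviations by n-1 therefore produces an estimator whose expectation is exactly σ², which is the formal content of Bessel's correction.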

The implications of this choice are foundational for statistical practice. Using n-1 is non-negotiable when the goal is inference—that is, using the sample to draw conclusions about a larger population. The unbiased estimator ensures that confidence intervals for the variance and hypothesis tests rest on properly calibrated theoretical foundations. However, it is crucial to note that the choice of denominator depends on the analytical objective. In descriptive statistics, where the sole purpose is to summarize the dispersion within the specific dataset at hand with no intent to generalize, dividing by n can be perfectly valid and is sometimes done. Furthermore, in the context of modern computational statistics and machine learning, the distinction remains relevant. For instance, when fitting models, the loss functions and error calculations often implicitly account for degrees of freedom to prevent overfitting, a conceptual relative of this correction. The n-1 denominator is thus not an arbitrary rule but a specific tool for a specific task: achieving unbiased estimation in the context of sampling from a population. Its universal adoption in introductory statistics underscores its fundamental role in bridging sample data to population parameters.
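Both conventions are exposed directly in Python's standard library `statistics` module, which makes the descriptive-versus-inferential distinction concrete: `pvariance` divides by n (data treated as the entire population), while `variance` divides by n-1 (data treated as a sample from a larger population).

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5.0, sum of squared deviations 32.0

descriptive = statistics.pvariance(data)  # denominator n = 8  -> 32/8 = 4.0
inferential = statistics.variance(data)   # denominator n-1 = 7 -> 32/7 ≈ 4.571

print(descriptive)  # 4.0
print(inferential)  # ~4.571
```

The same choice appears in numerical libraries as a degrees-of-freedom parameter (for example, NumPy's `np.var` defaults to the n denominator and takes `ddof=1` to apply Bessel's correction), so it is worth checking which convention a given tool uses before comparing results.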