Clojure for Data Science

Standard error

While the standard deviation measures the amount of variation there is within a sample, the standard error measures the amount of variation there is between the means of samples taken from the same population.

Note

The standard error is the standard deviation of the distribution of the sample means.

We have calculated the standard error of dwell time empirically by looking at the previous 6 months of data. But there is an equation that allows us to calculate it from only a single sample:

$$SE = \frac{\sigma_x}{\sqrt{n}}$$

Here, σx is the standard deviation and n is the sample size. This is unlike the descriptive statistics that we studied in the previous chapter. While they described a single sample, the standard error attempts to describe a property of samples in general, namely the amount of variation in the sample means that can be expected for samples of a given size:

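;; `variance` is not defined in this section; it is assumed to be
;; available from earlier in the book.
(defn standard-deviation [xs]
  (Math/sqrt (variance xs)))

;; Standard error of the mean: standard deviation / sqrt(sample size).
(defn standard-error [xs]
  (/ (standard-deviation xs)
     (Math/sqrt (count xs))))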
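To make the calculation concrete, here is a minimal worked sketch. It repeats the two definitions above so that it can run on its own, and it supplies stand-in `mean` and `variance` helpers, which are assumptions (this section does not show the chapter's own versions; the stand-in uses the population variance). The `dwell-times` values are made up for illustration and are not the book's data.

```clojure
;; Stand-in helpers (assumptions): this section does not show the
;; chapter's own `mean` and `variance`. Population variance is used here.
(defn mean [xs]
  (/ (reduce + xs) (double (count xs))))

(defn variance [xs]
  (let [m (mean xs)]
    (mean (map #(let [d (- % m)] (* d d)) xs))))

(defn standard-deviation [xs]
  (Math/sqrt (variance xs)))

(defn standard-error [xs]
  (/ (standard-deviation xs)
     (Math/sqrt (count xs))))

;; Hypothetical dwell times, invented for this example.
(def dwell-times [2 4 4 4 5 5 7 9])

(standard-deviation dwell-times) ;; => 2.0
(standard-error dwell-times)     ;; 2.0 / sqrt(8), roughly 0.707
```

The mean of these eight values is 5, the squared deviations sum to 32, so the variance is 4 and the standard deviation is 2. Dividing by the square root of the sample size, 8, gives a standard error of about 0.707.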

The standard error of the mean is thus related to two factors:

  • The size of the sample
  • The population standard deviation

The size of the sample has the largest impact on the standard error. Since we divide by the square root of the sample size, we have to increase the size of the sample by a factor of four to halve the standard error.
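The square-root relationship can be checked with a few lines of arithmetic. The sketch below holds the standard deviation fixed and varies only the sample size. Both the helper `standard-error-for` and the value of `sd` are hypothetical, introduced purely for illustration.

```clojure
;; Hypothetical helper: standard error from a known standard
;; deviation `sd` and a sample size `n`, that is, sd / sqrt(n).
(defn standard-error-for [sd n]
  (/ sd (Math/sqrt n)))

(def sd 10.0) ;; assumed value, for illustration only

(standard-error-for sd 25)  ;; => 2.0
(standard-error-for sd 100) ;; => 1.0
(standard-error-for sd 400) ;; => 0.5
```

Each fourfold increase in the sample size halves the standard error, so the gain from collecting more data diminishes quickly.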

It may seem curious that the proportion of the population sampled has no effect on the size of the standard error. This is just as well, since some populations could be infinite in size.
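One way to see this is to simulate it. The sketch below draws samples directly from a random number generator, which behaves like an effectively infinite population, and compares the spread of the sample means with the value predicted by the formula. Every name and parameter here (`draw-sample`, the seed 42, a mean of 90, a standard deviation of 30, 2,000 samples of size 50) is an assumption chosen for illustration, and `standard-deviation` is a stand-in that uses the population variance.

```clojure
(import 'java.util.Random)

;; Stand-in helpers (assumptions), defined so the sketch runs on its own.
(defn mean [xs]
  (/ (reduce + xs) (double (count xs))))

(defn standard-deviation [xs]
  (let [m (mean xs)]
    (Math/sqrt (mean (map #(let [d (- % m)] (* d d)) xs)))))

;; Hypothetical helper: draw n values from a normal distribution
;; with mean `mu` and standard deviation `sigma`.
(defn draw-sample [^Random rng n mu sigma]
  (vec (repeatedly n #(+ mu (* sigma (.nextGaussian rng))))))

(def rng (Random. 42)) ;; fixed seed so the run is repeatable

;; The means of 2,000 independent samples of size 50.
(def sample-means
  (vec (repeatedly 2000 #(mean (draw-sample rng 50 90.0 30.0)))))

;; Empirical standard error: the standard deviation of the sample means.
(def empirical-se (standard-deviation sample-means))

;; Formula: sigma / sqrt(n), using the generator's known sigma.
(def formula-se (/ 30.0 (Math/sqrt 50)))

(println "empirical:" empirical-se "formula:" formula-se)
```

The two values should agree closely, even though no population size appears anywhere in the calculation. Only the standard deviation and the sample size enter the formula.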