
Performing the t-test
The difference in the way the t-test works stems from the probability distribution from which our p-value is calculated. Having calculated our t-statistic, we need to look up the value in the t-distribution parameterized by the degrees of freedom of our data:
(defn t-test [a b]
  (let [df (+ (count a) (count b) -2)]
    (- 1 (s/cdf-t (i/abs (t-stat a b)) :df df))))
The degrees of freedom are two less than the sizes of the samples combined, which is 298 for our samples.
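As a quick check of this arithmetic, we can evaluate the sum at the REPL, taking the sample sizes of 284 and 16 from the output Incanter reports later in this section:

;; Degrees of freedom for samples of size 284 and 16
(+ 284 16 -2)
;; => 298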

Recall that we are performing a hypothesis test. So, let's state our null and alternate hypotheses (a helper making the decision rule explicit is sketched just after the list):
- H0: The two samples are drawn from populations with the same mean
- H1: The sample for the new site is drawn from a population with a greater mean
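To make the decision rule explicit, we could wrap the comparison against our significance level in a small helper; significant? is a hypothetical name introduced here purely for illustration:

;; Hypothetical helper: reject the null hypothesis only when the
;; p-value falls below our chosen significance level.
(defn significant? [p-value alpha]
  (< p-value alpha))

;; e.g. (significant? (t-test a b) 0.05)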
Let's run the example:
(defn ex-2-16 []
  (let [data (->> (load-data "new-site.tsv")
                  (:rows)
                  (group-by :site)
                  (map-vals (partial map :dwell-time)))
        a (get data 0)
        b (get data 1)]
    (t-test a b)))

;; 0.0503
This returns a p-value of just over 0.05. Since this is greater than the α of 5% we set for our hypothesis test, we are not able to reject the null hypothesis: the t-test has not found a significant difference between the means. The marginally significant result we obtained from the z-test was therefore partly an artifact of the small sample size.
Two-tailed tests
There has been an implicit assumption in our alternate hypothesis that the new site would perform better than the previous site. The process of hypothesis testing goes to great lengths to ensure that we don't encode hidden assumptions while looking for statistical significance.
Tests where we look only for a significant increase or decrease in quantity are called one-tailed tests and are generally frowned upon, except in the case where a change in the opposite direction would be impossible. The name comes from the fact that a one-tailed test allocates all of the α to a single tail of the distribution. By not testing in the other direction, the test has more power to reject the null hypothesis in a particular direction and, in essence, lowers the threshold by which we would judge a result as significant.
While higher statistical power sounds desirable, it comes at the cost of a greater probability of making a Type I error. A more correct approach is to entertain the possibility that the new site could realistically be worse than the existing site. This allocates our α equally to both tails of the distribution and ensures that a significant outcome is not biased by a prior assumption of improvement.
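To see what this means for our own calculation, here is a minimal sketch of a two-tailed variant of the t-test function we defined earlier; t-test-two-tailed is a name introduced here for illustration, and it simply doubles the probability in the upper tail beyond the absolute value of the t-statistic:

;; Sketch of a two-tailed variant of our earlier t-test function:
;; the one-tailed p-value is doubled to allow for equally extreme
;; results in either direction.
(defn t-test-two-tailed [a b]
  (let [df (+ (count a) (count b) -2)]
    (* 2 (- 1 (s/cdf-t (i/abs (t-stat a b)) :df df)))))

For our samples this doubles the 0.0503 we calculated above, moving the result clearly outside the 5% significance level.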

In fact, Incanter already provides a two-sample t-test via the s/t-test function. We pass one sample as the first argument and the sample to compare against with the :y keyword argument. Incanter will assume that we want to perform a two-tailed test unless we pass the :alternative keyword with a value of :greater or :lower, in which case a one-tailed test will be performed.
(defn ex-2-17 []
  (let [data (->> (load-data "new-site.tsv")
                  (:rows)
                  (group-by :site)
                  (map-vals (partial map :dwell-time)))
        a (get data 0)
        b (get data 1)]
    (clojure.pprint/pprint (s/t-test a :y b))))

;; {:p-value 0.12756432502462456,
;;  :df 17.7613823496861,
;;  :n2 16,
;;  :x-mean 87.95070422535211,
;;  :y-mean 122.0,
;;  :x-var 10463.941024237305,
;;  :conf-int [-78.9894629402365 10.890871390940724],
;;  :y-var 6669.866666666667,
;;  :t-stat -1.5985205593851322,
;;  :n1 284}
Incanter's t-test returns a lot of information, including the p-value. The p-value is around twice what we calculated for the one-tailed test. In fact, the only reason it isn't exactly double is that Incanter implements a slight variant of the t-test called Welch's t-test, which is more robust when the two samples have different variances. Since we know that, for exponentially distributed data, the mean and the variance are intimately related, this is the more rigorous test to apply here, and it reports an even less significant result.
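If we did want Incanter to perform a one-tailed version of the same test, we could pass the :alternative keyword described above. The following sketch assumes that a and b are bound as in ex-2-17, and that :lower expresses the alternative that the first sample's population mean is lower than that of the :y sample (that is, that the new site's mean is greater):

;; Sketch: one-tailed test that the new site's mean dwell time is
;; greater than the existing site's. Assumes a and b are bound as in
;; ex-2-17 and that :lower is the appropriate direction here.
(:p-value (s/t-test a :y b :alternative :lower))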