Lecture 36: Statistics

Lecture 37: Conclusion

Bayes' Rule

(070616)

Bayes' Rule

A rule for updating probabilities based on new evidence.

Provided with P(A), P(B), and P(B|A), compute P(A|B):

P(A|B) = P(B|A) * P(A) / P(B)

P(A) : Prior probability of some event before evidence

P(B|A) : Likelihood of evidence given this event

P(B) : Probability of evidence in general

P(A|B) : Posterior probability of event after evidence 
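
The rule above can be checked with a small worked example. This is only a sketch: the function name and all of the numbers (a 1% prior, a 95% likelihood, a 5% false-positive rate) are hypothetical, and P(B) is obtained from the law of total probability, which these notes do not state explicitly.

```python
def bayes_posterior(prior_a, likelihood_b_given_a, prob_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if prob_b <= 0:
        raise ValueError("P(B) must be positive")
    return likelihood_b_given_a * prior_a / prob_b


# Hypothetical numbers, chosen only for illustration.
prior = 0.01                  # P(A): event is true before seeing evidence
likelihood = 0.95             # P(B|A): evidence appears when the event is true
false_positive_rate = 0.05    # P(B|not A): evidence appears anyway

# P(B) via the law of total probability (an assumption added here).
evidence = likelihood * prior + false_positive_rate * (1 - prior)

posterior = bayes_posterior(prior, likelihood, evidence)
print(round(posterior, 3))  # about 0.161
```

Even with strong evidence, the posterior stays well below the likelihood because the prior is small.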

Main Uses of Confidence Intervals

Confidence Intervals

● To estimate a parameter:

   ○ You can take the middle 95% of the resampled estimates as the interval, instead of computing it as the estimate plus or minus a multiple of the standard deviation

● Regression prediction: predict y based on a new x

● To test the null hypothesis that the parameter is equal to a specified value:

    ○ In the regression model, test whether the slope of the true line is 0
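
A minimal sketch of the "middle 95%" approach, assuming the estimates come from bootstrap resampling of one sample. The function name, the data, and the specified value 10 are hypothetical; the same interval is then used to test whether the parameter equals that value.

```python
import random
import statistics


def bootstrap_percentile_interval(sample, statistic, repetitions=2000,
                                  level=0.95, seed=0):
    """Return the middle `level` share of bootstrap estimates of `statistic`."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement, recompute the statistic, and sort the results.
    estimates = sorted(
        statistic(rng.choices(sample, k=n)) for _ in range(repetitions)
    )
    tail = (1 - level) / 2
    lower = estimates[round(tail * repetitions)]
    upper = estimates[round((1 - tail) * repetitions) - 1]
    return lower, upper


# Hypothetical sample; the parameter of interest is the population mean.
sample = [12, 15, 9, 20, 14, 11, 18, 16, 13, 17]
lower, upper = bootstrap_percentile_interval(sample, statistics.mean)
print(lower, upper)

# Testing a null: reject "mean equals 10" if 10 falls outside the interval.
specified_value = 10
reject_null = not (lower <= specified_value <= upper)
print(reject_null)
```

For the regression case, the same idea would apply with the slope of the fitted line as the statistic and 0 as the specified value.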

Tests of Hypotheses

Hypothesis

Null:

   The observation is “just due to chance”. The null must say exactly what is due to chance and what the hypothesis specifies.
Alternative:

   The null isn’t true; something other than chance is going on.

P-value

P-value

● The chance, under the null hypothesis, that the test statistic comes out like the one in the sample or more extreme.

● If this chance is small, then:

   ○ If the null is true, something very unlikely has happened.

   ○ Conclude that the data support the alternative hypothesis better than they support the null. 
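
The definition above can be sketched as an empirical P-value: simulate the test statistic under the null many times and count how often it is at least as extreme as the observed one. The example is hypothetical (a fair-coin null, 60 heads observed in 100 tosses) and assumes that larger values of the statistic are "more extreme".

```python
import random


def empirical_p_value(simulated_statistics, observed_statistic):
    """Share of simulated statistics at least as large as the observed one."""
    at_least_as_extreme = sum(
        1 for s in simulated_statistics if s >= observed_statistic
    )
    return at_least_as_extreme / len(simulated_statistics)


rng = random.Random(0)


def simulate_fair_coin_statistic(tosses=100):
    """One simulated statistic under the null: |heads - expected heads|."""
    heads = sum(1 for _ in range(tosses) if rng.random() < 0.5)
    return abs(heads - tosses / 2)


simulated = [simulate_fair_coin_statistic() for _ in range(5000)]
observed = abs(60 - 100 / 2)  # hypothetical observation: 60 heads in 100 tosses
p_value = empirical_p_value(simulated, observed)
print(p_value)
```

A small printed value would mean that, if the coin were fair, a result this far from 50 heads would be unusual.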

Comparing Two Numerical Samples

Numerical Samples

● Null:

   The two samples come from the same underlying distribution in the population.
● Method:

   ○ Bootstrap A/B test

   ○ Test statistic: difference between means
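
One possible reading of the method above, as a sketch: under the null both samples come from the same distribution, so pool them, resample from the pool with replacement, split the resample into two groups of the original sizes, and recompute the difference between means. The function names and data are hypothetical, and taking the absolute difference (a two-sided test) is an assumption.

```python
import random
import statistics


def difference_of_means(a, b):
    return statistics.mean(a) - statistics.mean(b)


def bootstrap_ab_test(sample_a, sample_b, repetitions=2000, seed=0):
    """Return an empirical P-value for the absolute difference between means."""
    rng = random.Random(seed)
    observed = abs(difference_of_means(sample_a, sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    count = 0
    for _ in range(repetitions):
        # Under the null, group labels carry no information, so resample
        # the pooled values and split them into groups of the original sizes.
        resample = rng.choices(pooled, k=len(pooled))
        simulated = abs(difference_of_means(resample[:n_a], resample[n_a:]))
        if simulated >= observed:
            count += 1
    return count / repetitions


# Hypothetical measurements for groups A and B.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5, 5.7]
group_b = [6.9, 7.1, 6.5, 7.4, 6.8, 7.0, 7.3, 6.6]
p_value = bootstrap_ab_test(group_a, group_b)
print(p_value)
```

Shuffling the pooled values without replacement instead of resampling them would give the permutation version of the same test.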

Comparing Two Categorical Samples

Categorical Samples

● Null:

   The two samples come from the same underlying distribution in the population

   (e.g. “distribution of employment status is the same for married and unmarried men”).

● Method:

   ○ Permutation test

   ○ Test statistic: total variation distance between the distributions of the two samples 
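
A minimal sketch of the permutation test above: shuffle the pooled labels, split them into two groups of the original sizes, and compare the simulated total variation distances with the observed one. The function names and the employment counts are hypothetical and are not taken from real data.

```python
import random
from collections import Counter


def total_variation_distance(p, q):
    """Half the sum of absolute differences between two distributions."""
    return sum(abs(x - y) for x, y in zip(p, q)) / 2


def proportions(labels, categories):
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in categories]


def permutation_test_tvd(sample_a, sample_b, repetitions=2000, seed=0):
    """Return (observed TVD, empirical P-value) for two categorical samples."""
    rng = random.Random(seed)
    categories = sorted(set(sample_a) | set(sample_b))

    def tvd(a, b):
        return total_variation_distance(
            proportions(a, categories), proportions(b, categories)
        )

    observed = tvd(sample_a, sample_b)
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    count = 0
    for _ in range(repetitions):
        rng.shuffle(pooled)  # reassign group membership at random
        if tvd(pooled[:n_a], pooled[n_a:]) >= observed:
            count += 1
    return observed, count / repetitions


# Hypothetical samples of employment status.
married = ["employed"] * 40 + ["unemployed"] * 5 + ["not in labor force"] * 5
unmarried = ["employed"] * 30 + ["unemployed"] * 12 + ["not in labor force"] * 8
observed_tvd, p_value = permutation_test_tvd(married, unmarried)
print(observed_tvd, p_value)
```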

One Categorical Sample

One Categorical Sample

● Null:

   The sample was drawn at random from a specified distribution

    (e.g. known from Census data, “the coin is fair”, etc.).

● Method:

   ○ Simulation: Generate samples from the distribution specified in the null.

   ○ Test statistic: total variation distance between the distribution in the sample and the distribution specified in the null.
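
The simulation above can be sketched as follows: draw samples of the same size from the distribution specified in the null, and compare each simulated total variation distance with the observed one. The function names and the coin example (60 heads and 40 tails in 100 tosses) are hypothetical.

```python
import random
from collections import Counter


def total_variation_distance(p, q):
    """Half the sum of absolute differences between two distributions."""
    return sum(abs(x - y) for x, y in zip(p, q)) / 2


def simulation_test_tvd(observed_counts, null_distribution,
                        repetitions=2000, seed=0):
    """Return (observed TVD, empirical P-value) against a specified null."""
    rng = random.Random(seed)
    categories = list(null_distribution)
    null_probs = [null_distribution[c] for c in categories]
    n = sum(observed_counts.values())
    observed_props = [observed_counts.get(c, 0) / n for c in categories]
    observed = total_variation_distance(observed_props, null_probs)
    count = 0
    for _ in range(repetitions):
        # Generate one sample of size n from the null distribution.
        draws = Counter(rng.choices(categories, weights=null_probs, k=n))
        simulated_props = [draws[c] / n for c in categories]
        if total_variation_distance(simulated_props, null_probs) >= observed:
            count += 1
    return observed, count / repetitions


# Hypothetical data: null says "the coin is fair".
observed_tvd, p_value = simulation_test_tvd(
    {"heads": 60, "tails": 40}, {"heads": 0.5, "tails": 0.5}
)
print(observed_tvd, p_value)
```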