class: center, middle, inverse, title-slide

# Confidence Interval for a Single Proportion
### Dr. Dogucu

---
layout: true

<div class="my-header"></div>

<div class="my-footer"> Copyright © <a href="https://mdogucu.ics.uci.edu">Dr. Mine Dogucu</a>. <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/">CC BY-NC-SA 4.0</a></div>

---
class: middle

## Remembering CLT

<img src="slide-4-prop-ci_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" />

Let `\(\pi\)` represent the proportion of bike owners on campus; then `\(\pi =\)` 0.15.

---

## Getting to sampling distribution of single proportion

`\(p_1\)` - Proportion of first sample (n = 100)

```
## [1] 0.17
```

`\(p_2\)` - Proportion of second sample (n = 100)

```
## [1] 0.12
```

`\(p_3\)` - Proportion of third sample (n = 100)

```
## [1] 0.14
```

....

---

### Sampling Distribution of Single Proportion

<img src="slide-4-prop-ci_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

---

If certain conditions are met then

`$$p \sim \text{approximately } N(\pi, \frac{\pi(1-\pi)}{n})$$`

---
class: middle

## In Reality

- We only have one sample and thus one point estimate of the population parameter. How can we make use of it?

--

- First, we will assume the sample proportion is the best thing we have at hand and use it as a point estimate of the population proportion.

--

- Second, even though we embrace the sample proportion as a point estimate of the population proportion, we will need to acknowledge that it has some error.

---
class: middle

## Standard Error

`\(p \sim \text{approximately } N(\text{mean} = \pi, \text{sd} = \sqrt{\frac{\pi(1-\pi)}{n}})\)`

--

We call the standard deviation of the sampling distribution the __standard error__ of the estimate. Because `\(\pi\)` is unknown, we estimate the standard error of a single proportion with `\(\sqrt{\frac{p(1-p)}{n}}\)`.
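---

A minimal simulation sketch of the idea that the standard error is the standard deviation of the sampling distribution. It assumes each sample of 100 can be modeled as independent draws with `\(\pi = 0.15\)`; the seed, the number of repetitions, and the variable names are illustrative choices, not part of the original slides.

```r
set.seed(123)        # assumed seed, only for reproducibility
pi_true <- 0.15      # population proportion of bike owners from the slides
n <- 100             # size of each sample
reps <- 10000        # assumed number of simulated samples

# Each simulated sample yields a count of bike owners; dividing by n gives p
sample_props <- rbinom(reps, size = n, prob = pi_true) / n

sd_simulated <- sd(sample_props)                  # spread of the simulated p values
sd_theory <- sqrt(pi_true * (1 - pi_true) / n)    # sqrt(pi(1 - pi)/n)

c(simulated = sd_simulated, theory = sd_theory)
```

The two numbers should be close to each other, which is what the normal approximation above suggests.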
---

## Confidence Interval

CI = `\(\text{point estimate} \pm \text { margin of error}\)`

--

CI = `\(\text{point estimate} \pm \text { critical value} \times \text{standard error}\)`

--

CI for single proportion = `\(p \pm \text {critical value} \times \text{standard error}\)`

--

CI for single proportion = `\(p \pm \text {critical value} \times \sqrt{\frac{p(1-p)}{n}}\)`

--

95% CI for single proportion = `\(p \pm 1.96 \times \sqrt{\frac{p(1-p)}{n}}\)` because ...

---

In a normal distribution, 95% of the data falls within 1.96 standard deviations of the mean.

<img src="slide-4-prop-ci_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

---

## How do we know that?

```r
qnorm(0.025, mean = 0, sd = 1)
```

```
## [1] -1.959964
```

```r
qnorm(0.975, mean = 0, sd = 1)
```

```
## [1] 1.959964
```

---

## 95% CI for the first sample

Recall `\(p = 0.17\)` and `\(n = 100\)`

--

95% CI for single proportion = `\(p \pm 1.96 \times \sqrt{\frac{p(1-p)}{n}}\)`

--

95% CI = `\(0.17 \pm 1.96 \times \sqrt{\frac{0.17(1-0.17)}{100}}\)`

--

95% CI = `\(0.17 \pm 1.96 \times 0.03756328\)`

--

95% CI = `\(0.17 \pm 0.07362403\)`

--

95% CI = (0.09637597, 0.243624)

---

## 95% CI for the first sample

95% CI = (0.09637597, 0.243624)

We are 95% confident that the true population proportion of bike owners is in this confidence interval.
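---

The arithmetic for the first sample can be reproduced in R. This is only a sketch that mirrors the hand calculation with the rounded critical value 1.96; the variable names `se`, `me`, and `ci` are illustrative.

```r
p <- 0.17                      # sample proportion of the first sample
n <- 100                       # sample size
se <- sqrt(p * (1 - p) / n)    # standard error
me <- 1.96 * se                # margin of error with the rounded critical value
ci <- c(lower = p - me, upper = p + me)
ci
```

Using `qnorm(0.975)` instead of 1.96 would change the bounds only in the later decimal places.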
---
class: middle center

95% CI = (0.09637597, 0.243624)

<img src="slide-4-prop-ci_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

---

## Understanding Confidence Intervals

I have taken 100 samples with `\(n = 100\)` and calculated the sample proportion, standard error, and 95% CI for each sample.

```
## # A tibble: 100 x 4
##        p     SE lower_bound upper_bound
##    <dbl>  <dbl>       <dbl>       <dbl>
##  1  0.19 0.0392      0.113        0.267
##  2  0.21 0.0407      0.130        0.290
##  3  0.15 0.0357      0.0800       0.220
##  4  0.15 0.0357      0.0800       0.220
##  5  0.13 0.0336      0.0641       0.196
##  6  0.11 0.0313      0.0487       0.171
##  7  0.16 0.0367      0.0881       0.232
##  8  0.11 0.0313      0.0487       0.171
##  9  0.19 0.0392      0.113        0.267
## 10  0.16 0.0367      0.0881       0.232
## # ... with 90 more rows
```

---

## Understanding Confidence Intervals

<img src="slide-4-prop-ci_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" />

---

## Understanding Confidence Intervals

<img src="slide-4-prop-ci_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" />

---
class: middle

## Confidence Interval Width

Which of the following confidence intervals would be the widest? Why?

- 90% CI
- 95% CI
- 99% CI

---
class: middle

CI = `\(\text{point estimate} \pm \text { critical value} \times \text{standard error}\)`

```r
qnorm(0.05, mean = 0, sd = 1) # critical value for 90% CI
```

```
## [1] -1.644854
```

<img src="slide-4-prop-ci_files/figure-html/unnamed-chunk-16-1.png" style="display: block; margin: auto;" />

---
class: middle

CI = `\(\text{point estimate} \pm \text { critical value} \times \text{standard error}\)`

```r
qnorm(0.025, mean = 0, sd = 1) # critical value for 95% CI
```

```
## [1] -1.959964
```

<img src="slide-4-prop-ci_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" />

---
class: middle

CI = `\(\text{point estimate} \pm \text { critical value} \times \text{standard error}\)`

```r
qnorm(0.005, mean = 0, sd = 1) # critical value for 99% CI
```

```
## [1] -2.575829
```

<img src="slide-4-prop-ci_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" />

---
class: middle

CI = `\(\text{point estimate} \pm \text { critical value} \times \text{standard error}\)`

- The 99% CI has the highest critical value (in absolute value).
- A higher critical value means a larger margin of error.
- A larger margin of error means a wider CI.

Thus the 99% CI would be the widest.

---
class: center middle

[Garfield](https://www.gocomics.com/garfield/1999/03/12)

---
class: middle

## Effect of Sample Size on Confidence Interval

Researchers A, B, and C are interested in the proportion of bike ownership. They each take separate samples of size 100, 500, and 1000, respectively. They each have a sample proportion of 0.18. What a surprise! Which of the researchers will find the narrowest 95% CI?

---
class: middle

**Researcher A:** 95% CI for single proportion = `\(0.18 \pm 1.96 \times \sqrt{\frac{0.18(1-0.18)}{100}}\)`

**Researcher B:** 95% CI for single proportion = `\(0.18 \pm 1.96 \times \sqrt{\frac{0.18(1-0.18)}{500}}\)`

**Researcher C:** 95% CI for single proportion = `\(0.18 \pm 1.96 \times \sqrt{\frac{0.18(1-0.18)}{1000}}\)`

--

As the sample size increases, the standard error decreases and the margin of error also decreases; thus the confidence interval gets narrower. Researcher C would have the narrowest CI.

---
class: middle

## CLT

If these conditions are met then

`\(p \sim \text{approximately } N(\pi, \frac{\pi(1-\pi)}{n})\)`

---
class: middle

## Conditions

1. The sample data are independent.
2. There need to be at least 10 successes and 10 failures in the sample.
3. The sample size is smaller than 10% of the population.

---
class: middle

__Example__

According to a Gallup survey of 1017 adults living in the US, 66% of Americans favor legalizing marijuana. Compute the 95% confidence interval for the population proportion of those who favor legalizing marijuana.

.footnote[Information on the survey can be found [here](https://news.gallup.com/poll/267698/support-legal-marijuana-steady-past-year.aspx)]

---

## Checking Conditions

`\(n = 1017\)` and `\(p = 0.66\)`

1) The survey link indicates that the respondents were selected as a random sample. We would expect such a sample to be independent.

--

2) We need at least 10 people favoring legalizing marijuana and 10 people opposing it.

`\(np = 1017 \cdot 0.66 = 671.22\)`. There are more than 10 people favoring legalizing marijuana.

`\(n(1-p) = 1017 \cdot (1-0.66) = 345.78\)`. There are more than 10 people opposing legalizing marijuana.

--

3) 1017 is less than 10% of the US population.
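---

The success-failure check (condition 2) can also be done in R. This is a small sketch using the survey's `n` and `p`; the names `successes` and `failures` are illustrative, and conditions 1 and 3 still have to be judged from how the survey was conducted.

```r
n <- 1017                   # sample size from the Gallup survey
p <- 0.66                   # sample proportion in favor

successes <- n * p          # expected count in favor
failures <- n * (1 - p)     # expected count opposed

c(successes = successes, failures = failures)

# TRUE when both counts are at least 10
successes >= 10 && failures >= 10
```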
---

## Confidence Interval

CI = `\(\text{point estimate} \pm \text { margin of error}\)`

--

CI = `\(\text{point estimate} \pm \text { critical value} \times \text{standard error}\)`

--

95% CI for single proportion = `\(p \pm 1.96 \times \sqrt{\frac{p(1-p)}{n}}\)`

--

95% CI = `\(0.66 \pm 1.96 \times \sqrt{\frac{0.66(1-0.66)}{1017}}\)`

--

95% CI = (0.6308857, 0.6891143)

We are 95% confident that the true proportion of Americans who support legalizing marijuana falls between 0.6308857 and 0.6891143.

---

## Confidence Interval Using R

```r
p <- 0.66                     # sample proportion
n <- 1017                     # sample size
se <- sqrt(p * (1 - p) / n)   # standard error
cv <- qnorm(0.975)            # critical value
p - cv * se                   # lower bound of the CI
```

```
## [1] 0.6308862
```

```r
p + cv * se                   # upper bound of the CI
```

```
## [1] 0.6891138
```
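---

The same steps could be wrapped in a small function so that the confidence level can be changed. This is only a sketch: `prop_ci` and its `level` argument are hypothetical names introduced here, not part of base R or of the original slides, and it assumes the conditions for the normal approximation hold.

```r
# Hypothetical helper: normal-approximation CI for a single proportion
prop_ci <- function(p, n, level = 0.95) {
  se <- sqrt(p * (1 - p) / n)         # standard error
  cv <- qnorm(1 - (1 - level) / 2)    # two-sided critical value
  c(lower = p - cv * se, upper = p + cv * se)
}

prop_ci(0.66, 1017, level = 0.90)
prop_ci(0.66, 1017, level = 0.95)
prop_ci(0.66, 1017, level = 0.99)
```

With the default `level = 0.95` the bounds should agree with the ones computed above, and the intervals should get wider as the confidence level increases.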