Additional office hours: Thursdays 3 - 4 pm.
Do not trust Canvas for your grades. Grading policies are on the course website.
I respect your preferred name, but Gradescope can only detect the official name that is on the course roster.
If you do not attend lecture or discussion, you will be lost.
This is NOT a class where you can show up only for the midterm and the final and still pass.
ggplot
Patients with a certain disease at a state hospital participate in a research study. They are randomly split into two groups and receive either a newly discovered drug or a placebo, depending on their group. Researchers observe that the patients who took the drug have been cured of the disease. Which of the following is true?
a) Researchers can conclude that this drug cures the disease.
b) This is an observational study since it says "observe". Researchers cannot conclude that this drug cures the disease.
c) Researchers can conclude that this drug cures the disease only for this sample.
titanic_train %>% select(PassengerId, Name) %>% glimpse()
## Observations: 891
## Variables: 2
## $ PassengerId <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,...
## $ Name        <chr> "Braund, Mr. Owen Harris", "Cumings, Mrs. John Bra...
Note that both PassengerId and Name are nominal variables. There is no meaningful ordering or grouping for these variables.
Mohammed claims that he can tell the difference between tap water and bottled water. His friend Phoebe does not believe him, so they want to test his claim. They invite their friend Xuan to help out. Xuan pours bottled or tap water into a cup and gives it to Phoebe, who then gives it to Mohammed. They repeat this twenty times. The study design is
a. a confounding design.
b. a single-blind design.
c. a double-blind design.
If two variables are related/associated, they are called dependent variables (e.g. having a college degree and income).
If two variables are not related/associated, they are called independent variables (e.g. your hair color and whether it will rain in Gaborone).
glimpse(titanic_train)
## Observations: 891
## Variables: 12
## $ PassengerId <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,...
## $ Survived    <int> 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0,...
## $ Pclass      <int> 3, 1, 3, 1, 3, 3, 1, 3, 3, 2, 3, 1, 3, 3, 3, 2, 3,...
## $ Name        <chr> "Braund, Mr. Owen Harris", "Cumings, Mrs. John Bra...
## $ Sex         <chr> "male", "female", "female", "female", "male", "mal...
## $ Age         <dbl> 22, 38, 26, 35, 35, NA, 54, 2, 27, 14, 4, 58, 20, ...
## $ SibSp       <int> 1, 1, 0, 1, 0, 0, 0, 3, 0, 1, 1, 0, 0, 1, 0, 0, 4,...
## $ Parch       <int> 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 5, 0, 0, 1,...
## $ Ticket      <chr> "A/5 21171", "PC 17599", "STON/O2. 3101282", "1138...
## $ Fare        <dbl> 7.2500, 71.2833, 7.9250, 53.1000, 8.0500, 8.4583, ...
## $ Cabin       <chr> "", "C85", "", "C123", "", "", "E46", "", "", "", ...
## $ Embarked    <chr> "S", "C", "S", "S", "S", "Q", "S", "S", "S", "C", ...
R is a functional language. In this case glimpse() is a function and titanic_train is the argument.
Since R is functional, let's recall function composition from algebra. If you want to evaluate (f ∘ g)(x), that is, apply g first and then f, you have three choices in R:
f(g(x))
g(x) %>% f()
x %>% g() %>% f()
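A minimal sketch to check that the three forms give the same result. The functions f and g and the value x are made up for illustration, and the %>% pipe is assumed to come from the magrittr package (it is also available once dplyr is loaded).

```r
library(magrittr)  # provides the %>% pipe

# Hypothetical functions, chosen only for illustration
f <- function(x) x + 1   # add one
g <- function(x) x * 2   # double

x <- 3

nested   <- f(g(x))            # apply g first, then f
one_pipe <- g(x) %>% f()       # pipe the result of g(x) into f()
two_pipe <- x %>% g() %>% f()  # pipe x into g(), then into f()

print(c(nested, one_pipe, two_pipe))
```

Reading the piped versions left to right matches the order in which the functions are applied, which is why pipes are used for the rest of the examples.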
glimpse(titanic_train)
titanic_train %>% glimpse()
count(titanic_train, Survived)
## # A tibble: 2 x 2
##   Survived     n
##      <int> <int>
## 1        0   549
## 2        1   342
titanic_train %>% count(Survived)
## # A tibble: 2 x 2
##   Survived     n
##      <int> <int>
## 1        0   549
## 2        1   342
count()
count() can take more than one variable.
count(titanic_train, Survived, Pclass)
## # A tibble: 6 x 3
##   Survived Pclass     n
##      <int>  <int> <int>
## 1        0      1    80
## 2        0      2    97
## 3        0      3   372
## 4        1      1   136
## 5        1      2    87
## 6        1      3   119
You can break code over multiple lines.
count(titanic_train,
      Survived,
      Pclass)
## # A tibble: 6 x 3
##   Survived Pclass     n
##      <int>  <int> <int>
## 1        0      1    80
## 2        0      2    97
## 3        0      3   372
## 4        1      1   136
## 5        1      2    87
## 6        1      3   119
##  [1] "7"                        "8"
##  [3] "8"                        "8"
##  [5] "8"                        "7"
##  [7] "8"                        "4"
##  [9] "7 (very rare occurrence)" "7.5"
Calculate the 1) mean, 2) median, 3) mode, 4) first quartile, 5) third quartile, and 6) interquartile range, and then draw a boxplot.
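One way to carry out these calculations in R, as a sketch. It assumes the ten responses printed above are stored in a character vector called responses (a hypothetical name) and that "7 (very rare occurrence)" should be read as 7. R has no built-in function for the statistical mode, so the most frequent value is taken from a frequency table.

```r
# The ten responses, copied from the output above
responses <- c("7", "8", "8", "8", "8", "7", "8", "4",
               "7 (very rare occurrence)", "7.5")

# Assumption: keep only the text before the first space, then convert to numbers
values <- as.numeric(sub(" .*", "", responses))

mean_value   <- mean(values)
median_value <- median(values)

# Mode: the most frequent value in the frequency table
counts     <- table(values)
mode_value <- as.numeric(names(counts)[which.max(counts)])

q1        <- unname(quantile(values, 0.25))  # first quartile
q3        <- unname(quantile(values, 0.75))  # third quartile
iqr_value <- IQR(values)                     # q3 - q1

# Draw the boxplot
boxplot(values, horizontal = TRUE)
```

quantile() uses R's default interpolation rule, so a quartile computed by hand with a different rule can differ slightly.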
titanic_train %>% summarize(mean(Fare), median(Fare), sd(Fare), var(Fare), min(Fare), max(Fare))
##   mean(Fare) median(Fare) sd(Fare) var(Fare) min(Fare) max(Fare)
## 1   32.20421      14.4542 49.69343  2469.437         0  512.3292
titanic_train %>% summarize(median(Fare), quantile(Fare, 0.50))
##   median(Fare) quantile(Fare, 0.5)
## 1      14.4542             14.4542
titanic_train %>% summarize(q1 = quantile(Fare, 0.25), q3 = quantile(Fare, 0.75))
##       q1 q3
## 1 7.9104 31
group_by()
titanic_train %>% group_by(Pclass) %>% summarize(mean_fare = mean(Fare))
## # A tibble: 3 x 2
##   Pclass mean_fare
##    <int>     <dbl>
## 1      1      84.2
## 2      2      20.7
## 3      3      13.7
group_by() will accept any variable, but grouping is only meaningful for a categorical variable.
Imagine doing this by hand. You would first take the Fare amounts, then divide them into three groups based on Pclass, and then calculate the mean for each group.
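The by-hand steps can be sketched in base R. The data frame toy and its values are hypothetical, made up only to keep the arithmetic easy to check. This is not how group_by() works internally, only an illustration of the same idea.

```r
# Hypothetical toy data: six passengers, two per class
toy <- data.frame(
  Pclass = c(1, 1, 2, 2, 3, 3),
  Fare   = c(80, 90, 20, 22, 7, 9)
)

# Step 1: divide the Fare amounts into groups based on Pclass
fare_groups <- split(toy$Fare, toy$Pclass)

# Step 2: calculate the mean within each group
group_means <- sapply(fare_groups, mean)
print(group_means)
```

The result has one mean per class, which is the same shape of answer that group_by() followed by summarize() returns.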
Let's assume that the heights in our classroom have a bell-shaped, symmetric distribution. In such distributions, mean = median = mode.
If Michael Jordan (6 feet 6 inches, 198.12 cm) were to walk into our classroom, how would the distribution of our heights change? Describe the shape and spread. Order the mean, median, and mode in descending order.
If Tyrion Lannister (4 feet 5 inches, 134.62 cm) were to walk into our classroom, how would the distribution of our heights change? Describe the shape and spread. Order the mean, median, and mode in descending order.
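To experiment with these questions, here is a small sketch. The heights vector is hypothetical (nine made-up values in cm, symmetric around 170), and the two extra heights are the inch values above converted at 2.54 cm per inch.

```r
# Hypothetical classroom heights in cm, symmetric around 170
heights <- c(160, 165, 168, 170, 170, 170, 172, 175, 180)

# Add one very tall or one very short person
with_tall  <- c(heights, 78 * 2.54)  # 6 ft 6 in = 78 in
with_short <- c(heights, 53 * 2.54)  # 4 ft 5 in = 53 in

# Compare the mean and median before and after
summary_table <- data.frame(
  group  = c("class", "class + tall", "class + short"),
  mean   = c(mean(heights), mean(with_tall), mean(with_short)),
  median = c(median(heights), median(with_tall), median(with_short))
)
print(summary_table)
```

Compare how far the mean moves with how far the median moves in each row, then relate that to the direction of the skew.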