shup2 <-- "Hello to you!"
Clicker Questions
to go along with
Modern Data Science with R, 3rd edition by Baumer, Kaplan, and Horton
R for Data Science, 2nd edition by Wickham, Çetinkaya-Rundel, and Grolemund
- R / R Studio / Quarto1
- all good
- started, progress is slow and steady
- started, very stuck
- haven’t started yet
- what do you mean by “R”?
- Git / GitHub2
- all good
- started, progress is slow and steady
- started, very stuck
- haven’t started yet
- what do you mean by “Git”?
- Where can I get feedback on my HW assignments / quizzes?3
- prof will return paper versions
- on Gradescope
- on Canvas
- on GitHub
- Which of the following includes talking to the remote version of GitHub?4
- changing your name (updating the YAML)
- committing the file(s)
- pushing the file(s)
- some of the above
- all of the above
- What is the error?5
- poor assignment operator
- unmatched quotes
- improper syntax for function argument
- invalid object name
- no mistake
- What is the error?6
- poor assignment operator
- unmatched quotes
- improper syntax for function argument
- invalid object name
- no mistake
3shup <- "Hello to you!"
- What is the error?7
- poor assignment operator
- unmatched quotes
- improper syntax for function argument
- invalid object name
- no mistake
shup4 <- "Hello to you!
- What is the error?8
- poor assignment operator
- unmatched quotes
- improper syntax for function argument
- invalid object name
- no mistake
shup5 <- date()
- What is the error?9
- poor assignment operator
- unmatched quotes
- improper syntax for function argument
- invalid object name
- no mistake
shup6 <- sqrt 10
- Do you keep a calendar / schedule / planner?10
- Yes
- No
- Do you keep a calendar / schedule / planner? If you answered “Yes” …11
- Yes, on Google Calendar
- Yes, on Calendar for macOS
- Yes, on Outlook for Windows
- Yes, in some other app
- Yes, by hand
- Where should I put things I’ve created for the HW (e.g., data, .ics file, etc.)?12
- Upload into remote GitHub directory
- In the local folder which also has the R project
- In my Downloads
- Somewhere on my Desktop
- In my Home directory
- The goal of making a figure is…13
- To draw attention to your work.
- To facilitate comparisons.
- To provide as much information as possible.
- A good reason to make a particular choice of a graph is:14
- Because the journal / field has particular expectations for how the data are presented.
- Because some variables naturally fit better on some graphs (e.g., numbers on scatter plots).
- Because that graphic displays the message you want as optimally as possible.
- Why are the points orange?15
- R translates “navy” into orange.
- color must be specified in geom_point()
- color must be specified outside the aes() function
- the default plot color is orange
ggplot(data = Births78,
aes(x = date, y = births, color = "navy")) +
geom_point() +
labs(title = "US Births in 1978")
- Why are the dots blue and the lines colored?16
- dot color is given as “navy”, line color is given as wday.
- both colors are specified in the ggplot() function.
- dot coloring takes precedence over line coloring.
- line coloring takes precedence over dot coloring.
- Setting vs. Mapping. If I want information to be passed to all data points (not variable):17
- map the information inside the aes() function.
- set the information outside the aes() function
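The difference between mapping and setting can be sketched with a small example. This is an illustration only: it assumes ggplot2 is installed and uses the built-in `mtcars` data rather than the `Births78` data from the question.

```r
library(ggplot2)

# Mapping: "navy" inside aes() is treated as a one-level variable,
# so ggplot2 assigns a colour from its default palette.
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg, color = "navy")) +
  geom_point()

# Setting: "navy" outside aes() is applied as a constant to every point.
p_set <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "navy")

# layer_data() exposes the colours that would actually be drawn
unique(layer_data(p_mapped)$colour)
unique(layer_data(p_set)$colour)
```

Only the second plot draws navy points; the first also gains a legend with the single entry "navy".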
- The Snow figure was most successful at:18
- making the data stand out
- facilitating comparison
- putting the work in context
- simplifying the story
- The Challenger figure(s) was(were) least successful at:19
- making the data stand out
- facilitating comparison
- putting the work in context
- simplifying the story
- The biggest difference between Snow and the Challenger was:20
- The amount of information portrayed.
- One was better at displaying cause.
- One showed the relevant comparison better.
- One was more artistic.
- Caffeine and Calories. What was the biggest concern over the average value axes?21
- It isn’t at the origin.
- They should have used all the data possible to find averages.
- There wasn’t a random sample.
- There wasn’t a label explaining why the axes were where they were.
- What is wrong with the following code?22
- should only be one =
- Sydney should be lower case
- name should not be in quotes
- use mutate instead of filter
- babynames in wrong place
Result <- |> filter(babynames,
                    name == "Sydney")
- Which data represents the ideal format for ggplot2 and dplyr?23
year | Algeria | Brazil | Columbia |
---|---|---|---|
2000 | 7 | 12 | 16 |
2001 | 9 | 14 | 18 |

country | Y2000 | Y2001 |
---|---|---|
Algeria | 7 | 9 |
Brazil | 12 | 14 |
Columbia | 16 | 18 |

country | year | value |
---|---|---|
Algeria | 2000 | 7 |
Algeria | 2001 | 9 |
Brazil | 2000 | 12 |
Brazil | 2001 | 14 |
Columbia | 2000 | 16 |
Columbia | 2001 | 18 |
- Each of the statements except one will accomplish the same calculation. Which one does not match?24
#(a)
babynames |>
  group_by(year, sex) |>
  summarize(totalBirths = sum(num))
#(b)
group_by(babynames, year, sex) |>
summarize(totalBirths = sum(num))
#(c)
group_by(babynames, year, sex) |>
summarize(totalBirths = mean(num))
#(d)
temp <- group_by(babynames, year, sex)
summarize(temp, totalBirths = sum(num))
#(e)
summarize(group_by(babynames, year, sex),
totalBirths = sum(num))
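One way to check which forms agree is to run them on a tiny data set. The sketch below assumes dplyr is installed; the `toy` tibble and its values are hypothetical and only borrow the column names from the question.

```r
library(dplyr)

# Hypothetical toy data with the same column names as the question
toy <- tibble(
  year = c(2000, 2000, 2000, 2001),
  sex  = c("F", "F", "M", "F"),
  num  = c(10, 20, 5, 7)
)

# Piped form
piped <- toy |>
  group_by(year, sex) |>
  summarize(totalBirths = sum(num), .groups = "drop")

# Nested form: same calculation, written inside out
nested <- summarize(group_by(toy, year, sex),
                    totalBirths = sum(num), .groups = "drop")

# Same shape of code, but mean() instead of sum()
averaged <- toy |>
  group_by(year, sex) |>
  summarize(totalBirths = mean(num), .groups = "drop")

identical(piped$totalBirths, nested$totalBirths)
identical(piped$totalBirths, averaged$totalBirths)
```

The piped and nested versions give the same totals; the version using `mean()` does not.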
- Fill in Q1.25
filter()
arrange()
select()
mutate()
group_by()
- Fill in Q2.26
(year, sex)
(year, name)
(year, num)
(sex, name)
(sex, num)
- Fill in Q3.27
n_distinct(name)
n_distinct(n)
sum(name)
sum(num)
mean(num)
- Running the code.28
babynames <- babynames::babynames |>
  rename(num = n)

babynames |>
  filter(name %in% c("Jane", "Mary")) |> # just the Janes and Marys
  group_by(name, year) |> # for each year for each name
  summarize(total = sum(num))
# A tibble: 276 × 3
# Groups: name [2]
name year total
<chr> <dbl> <int>
1 Jane 1880 215
2 Jane 1881 216
3 Jane 1882 254
4 Jane 1883 247
5 Jane 1884 295
6 Jane 1885 330
7 Jane 1886 306
8 Jane 1887 288
9 Jane 1888 446
10 Jane 1889 374
# ℹ 266 more rows
babynames |>
  filter(name %in% c("Jane", "Mary")) |>
  group_by(name, year) |>
  summarize(number = sum(num))
# A tibble: 276 × 3
# Groups: name [2]
name year number
<chr> <dbl> <int>
1 Jane 1880 215
2 Jane 1881 216
3 Jane 1882 254
4 Jane 1883 247
5 Jane 1884 295
6 Jane 1885 330
7 Jane 1886 306
8 Jane 1887 288
9 Jane 1888 446
10 Jane 1889 374
# ℹ 266 more rows
babynames |>
  filter(name %in% c("Jane", "Mary")) |>
  group_by(name, year) |>
  summarize(n_distinct(name))
# A tibble: 276 × 3
# Groups: name [2]
name year `n_distinct(name)`
<chr> <dbl> <int>
1 Jane 1880 1
2 Jane 1881 1
3 Jane 1882 1
4 Jane 1883 1
5 Jane 1884 1
6 Jane 1885 1
7 Jane 1886 1
8 Jane 1887 1
9 Jane 1888 1
10 Jane 1889 1
# ℹ 266 more rows
babynames |>
  filter(name %in% c("Jane", "Mary")) |>
  group_by(name, year) |>
  summarize(n_distinct(num))
# A tibble: 276 × 3
# Groups: name [2]
name year `n_distinct(num)`
<chr> <dbl> <int>
1 Jane 1880 1
2 Jane 1881 1
3 Jane 1882 1
4 Jane 1883 1
5 Jane 1884 1
6 Jane 1885 1
7 Jane 1886 1
8 Jane 1887 1
9 Jane 1888 1
10 Jane 1889 1
# ℹ 266 more rows
babynames |>
  filter(name %in% c("Jane", "Mary")) |>
  group_by(name, year) |>
  summarize(sum(name))
Error in `summarize()`:
ℹ In argument: `sum(name)`.
ℹ In group 1: `name = "Jane"` and `year = 1880`.
Caused by error in `base::sum()`:
! invalid 'type' (character) of argument
babynames |>
  filter(name %in% c("Jane", "Mary")) |>
  group_by(name, year) |>
  summarize(mean(num))
# A tibble: 276 × 3
# Groups: name [2]
name year `mean(num)`
<chr> <dbl> <dbl>
1 Jane 1880 215
2 Jane 1881 216
3 Jane 1882 254
4 Jane 1883 247
5 Jane 1884 295
6 Jane 1885 330
7 Jane 1886 306
8 Jane 1887 288
9 Jane 1888 446
10 Jane 1889 374
# ℹ 266 more rows
babynames |>
  filter(name %in% c("Jane", "Mary")) |>
  group_by(name, year) |>
  summarize(median(num))
# A tibble: 276 × 3
# Groups: name [2]
name year `median(num)`
<chr> <dbl> <dbl>
1 Jane 1880 215
2 Jane 1881 216
3 Jane 1882 254
4 Jane 1883 247
5 Jane 1884 295
6 Jane 1885 330
7 Jane 1886 306
8 Jane 1887 288
9 Jane 1888 446
10 Jane 1889 374
# ℹ 266 more rows
- Where can I get feedback on my HW assignments / quizzes?29
- prof will return paper versions
- on Gradescope
- on Canvas
- on GitHub
- Where can I get feedback on my projects?30
- prof will return paper versions
- on Gradescope
- on Canvas
- on GitHub
- Fill in Q1.31
gdp
year
gdpval
country
-country
- Fill in Q2.32
gdp
year
gdpval
country
-country
- Fill in Q3.33
gdp
year
gdpval
country
-country
- Response to stimulus (in ms) after only 3 hrs of sleep for 9 days. You want to make a plot with the subject’s reaction time (y-axis) vs the number of days of sleep restriction (x-axis) using the following ggplot() code. Which data frame should you use?34
- use raw data
- use pivot_wider() on raw data
- use pivot_longer() on raw data
ggplot(___, aes(x = ___, y = ___, color = ___)) +
geom_line()
# A tibble: 18 × 11
Subject day_0 day_1 day_2 day_3 day_4 day_5 day_6 day_7 day_8 day_9
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 308 250. 259. 251. 321. 357. 415. 382. 290. 431. 466.
2 309 223. 205. 203. 205. 208. 216. 214. 218. 224. 237.
3 310 199. 194. 234. 233. 229. 220. 235. 256. 261. 248.
4 330 322. 300. 284. 285. 286. 298. 280. 318. 305. 354.
5 331 288. 285 302. 320. 316. 293. 290. 335. 294. 372.
6 332 235. 243. 273. 310. 317. 310 454. 347. 330. 254.
7 333 284. 290. 277. 300. 297. 338. 332. 349. 333. 362.
8 334 265. 276. 243. 255. 279. 284. 306. 332. 336. 377.
9 335 242. 274. 254. 271. 251. 255. 245. 235. 236. 237.
10 337 312. 314. 292. 346. 366. 392. 404. 417. 456. 459.
11 349 236. 230. 239. 255. 251. 270. 282. 308. 336. 352.
12 350 256. 243. 256. 256. 269. 330. 379. 363. 394. 389.
13 351 251. 300. 270. 281. 272. 305. 288. 267. 322. 348.
14 352 222. 298. 327. 347. 349. 353. 354. 360. 376. 389.
15 369 272. 268. 257. 278. 315. 317. 298. 348. 340. 367.
16 370 225. 235. 239. 240. 268. 344. 281. 348. 365. 372.
17 371 270. 272. 278. 282. 279. 285. 259. 305. 351. 369.
18 372 269. 273. 298. 311. 287. 330. 334. 343. 369. 364.
sleep_long <- sleep_wide |>
  pivot_longer(cols = -Subject,
               names_to = "day",
               names_prefix = "day_",
               values_to = "reaction_time")
sleep_long
# A tibble: 180 × 3
Subject day reaction_time
<dbl> <chr> <dbl>
1 308 0 250.
2 308 1 259.
3 308 2 251.
4 308 3 321.
5 308 4 357.
6 308 5 415.
7 308 6 382.
8 308 7 290.
9 308 8 431.
10 308 9 466.
# ℹ 170 more rows
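The same reshaping can be tried on a cut-down table. This sketch assumes tidyr and tibble are installed; the two subjects and their values are illustrative stand-ins, not the full sleep data. Converting `day` to an integer is an extra step added here so that the x-axis is treated as numeric.

```r
library(tidyr)
library(tibble)

# Illustrative wide table: one row per subject, one column per day
wide <- tibble(
  Subject = c(308, 309),
  day_0   = c(250, 223),
  day_1   = c(259, 205)
)

# One row per subject-day pair, which is what ggplot() expects
long <- wide |>
  pivot_longer(cols = -Subject,
               names_to = "day",
               names_prefix = "day_",
               values_to = "reaction_time")

# names_to produces character values; convert for a numeric axis
long$day <- as.integer(long$day)
long
```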
- Consider band members from the Beatles and the Rolling Stones. Who is removed in a right_join()?35
- Mick
- John
- Paul
- Keith
- Impossible to know
band_members |>
  right_join(band_instruments, by = "name")
band_members
# A tibble: 3 × 2
name band
<chr> <chr>
1 Mick Stones
2 John Beatles
3 Paul Beatles
band_instruments
# A tibble: 3 × 2
name plays
<chr> <chr>
1 John guitar
2 Paul bass
3 Keith guitar
- Consider band members from the Beatles and the Rolling Stones. Which variables are removed in a right_join()?36
- name
- band
- plays
- none of them
band_members |>
  right_join(band_instruments, by = "name")
band_members
# A tibble: 3 × 2
name band
<chr> <chr>
1 Mick Stones
2 John Beatles
3 Paul Beatles
band_instruments
# A tibble: 3 × 2
name plays
<chr> <chr>
1 John guitar
2 Paul bass
3 Keith guitar
- What happens to Mick’s plays variable in a full_join()?37
- Mick is removed
- changes to guitar
- changes to bass
- NA
- NULL
band_members |>
  full_join(band_instruments, by = "name")
band_members
# A tibble: 3 × 2
name band
<chr> <chr>
1 Mick Stones
2 John Beatles
3 Paul Beatles
band_instruments
# A tibble: 3 × 2
name plays
<chr> <chr>
1 John guitar
2 Paul bass
3 Keith guitar
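The two joins can be compared directly. This sketch assumes dplyr is installed; `band_members` and `band_instruments` ship with dplyr and match the tables printed above.

```r
library(dplyr)

# right_join() keeps every row of the second table (band_instruments)
right <- band_members |>
  right_join(band_instruments, by = "name")

# full_join() keeps every row of both tables, filling gaps with NA
full <- band_members |>
  full_join(band_instruments, by = "name")

right
full
```

In the right join no columns are dropped, but any name that appears only in `band_members` is. In the full join that name is kept, with a missing value for `plays`.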
- What is the output of the following R code?38
- TRUE
- TRUE TRUE TRUE TRUE
- TRUE FALSE FALSE FALSE
- FALSE
fruit <- c("apple", "banana", "pear", "pineapple")
str_detect(fruit, "a")
- What is the output of the following R code?39
- “one -pple” “two p-ars” “three bananas”
- “on- -ppl-” “two p–rs” “thr– b-n-n-s”
- “on- apple” “two p-ars” “thr-e bananas”
fruits <- c("one apple", "two pears", "three bananas")
str_replace(fruits, c("a", "e", "i"), "-")
- What is the output of the following R code?40
- “abc” “hifg”
- “ab” “hifg”
- “ab” “ifg”
- “abc” “ifg”
x <- c("abcde", "ghifgh")
str_sub(x, start = c(1, 3), end = c(2, 5))
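The three stringr questions above all turn on vectorization: the functions work element by element, and vector-valued arguments are paired with the strings by position. A short sketch, assuming stringr is installed:

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pineapple")
# One TRUE/FALSE per element of fruit
detected <- str_detect(fruit, "a")

fruits <- c("one apple", "two pears", "three bananas")
# Pattern i is applied to string i; only the first match is replaced
replaced <- str_replace(fruits, c("a", "e", "i"), "-")

x <- c("abcde", "ghifgh")
# start[i] and end[i] are applied to x[i]
pieces <- str_sub(x, start = c(1, 3), end = c(2, 5))

detected
replaced
pieces
```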
- What is January 31 + one month?41
- February 31
- March 4
- February 28 (assuming no leap year)
- I don’t want to answer the question
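How lubridate handles this case can be checked directly. The sketch below assumes lubridate is installed and uses 2021, a non-leap year. `%m+%` is lubridate's operator for month arithmetic that rolls back to the last valid day of the month.

```r
library(lubridate)

jan31 <- ymd("2021-01-31")

# Plain period arithmetic: February 31 does not exist, so the result is NA
plain <- jan31 + months(1)

# %m+% rolls back to the last day of the target month instead
rolled <- jan31 %m+% months(1)

plain
rolled
```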
- What is the difference between these two lines of code?42
- same thing
- different months
- different output formatting
- different input
- different calculation
library(lubridate)
today <- ymd("2024-09-25")
month(today)
month(today, label = TRUE)
- What does this number mean?43
- Today is the 39th day of the month.
- Today is the 39th day of the year.
- Today is the 39th week of the month.
- Today is the 39th week of the year.
today <- ymd("2024-09-25")
week(today)
[1] 39
- What is the difference in these two functions?44
- Day of month and day of year.
- Day of month and day of week.
- Day of week and day of year.
- Day of weekend and day of month.
mday(today)
[1] 25
yday(today)
[1] 269
- What is the result of the code?45
- TRUE
- FALSE
- “2025-09-01”
- “2025-09-25”
today > ymd("2025-09-01")
- grep("q[^u]", very.large.word.list) would not match which of the following?46
- Iraqi
- Iraqian
- Iraq
- zaqqun (tree that “springs out of the bottom of Hell”, in the Quran)
- Qantas (the Australian airline)
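The pattern can be tested on the five candidate words with base R. `grepl()` is used here instead of `grep()` so that each word gets its own TRUE or FALSE; the `words` vector is a stand-in for the large word list.

```r
# Stand-in for very.large.word.list
words <- c("Iraqi", "Iraqian", "Iraq", "zaqqun", "Qantas")

# "q[^u]" needs a lowercase q followed by one more character that is not u
matched <- grepl("q[^u]", words)
setNames(matched, words)
```

A word fails to match if its only `q` is the last character (nothing follows it) or if it contains only an uppercase `Q`, since the pattern is case sensitive.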
- Which of the following regex would match to both “grey” and “gray”?47
- “gr[ae]y”
- “gr(a|e)y”
- “gray | grey”
- “gr[a|e]y”
- some / all of the above – which ones?
- What will the result be for the following code?48
- 10
- 1
- 0
- NA
str_extract("My dog is 10 years old", "\\d")
- What will the result be for the following code?49
- 10
- 1
- 0
- NA
str_extract("My dog is 10 years old", "\\d+")
- What will the result be for the following code?50
- .
- Episode 2: The pie whisperer. (4 August 2015)
- Episode
- E
str_extract("Episode 2: The pie whisperer. (4 August 2015)", ".")
- What will the result be for the following code?51
- .
- Episode 2: The pie whisperer. (4 August 2015)
- Episode
- E
str_extract("Episode 2: The pie whisperer. (4 August 2015)", ".+")
- What will the result be for the following code?52
- .
- Episode 2: The pie whisperer. (4 August 2015)
- Episode
- E
str_extract("Episode 2: The pie whisperer. (4 August 2015)", "\\.")
- What is the difference between the output for the two regular expressions below?53
- They give the same result.
- The first is not case sensitive.
- The second allows for all the variants.
- The first includes Jane.
string <- c("Mary", "Mar", "Janet", "jane", "Susan", "Sue")
str_extract(string, "\\bMary|Jane|Sue\\b")
str_extract(string, "\\b(Mary|Jane|Sue)\\b")
- How can I pull out just the numerical information in “$47”?54
"(?<=\\$)\\d"
"(?<=\\$)\\d+"
"\\d(?=\\$)"
"\\d+(?=\\$)"
- You want to know all the types of pies in the text strings. They are written as, for example “apple pie”.55
"\\w+(?!pie)"
"\\w+(?! pie)"
"\\w+(?=pie)"
"\\w+(?= pie)"
str_extract(c("apple pie", "chocolate pie", "peach pie"), "\\w+(?= pie)")
[1] "apple" "chocolate" "peach"
str_extract(c("apple pie", "chocolate pie", "peach pie"), "\\w+(?=pie)")
[1] NA NA NA
- We say that lookarounds are “zero-length assertions”. What does that mean?56
- we return the string in the lookaround
- we replace the string in the lookaround
- we return the string at the lookaround
- we replace the string at the lookaround
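A lookaround checks what sits next to the match without becoming part of it. The sketch below, assuming stringr is installed, runs lookbehind and lookahead patterns from the “$47” question on that string.

```r
library(stringr)

price <- "$47"

# Lookbehind: digits that come right after a dollar sign
after_dollar_all <- str_extract(price, "(?<=\\$)\\d+")

# Same, but \\d without + takes a single digit only
after_dollar_one <- str_extract(price, "(?<=\\$)\\d")

# Lookahead: digits that come right before a dollar sign (none here)
before_dollar <- str_extract(price, "\\d+(?=\\$)")

after_dollar_all
after_dollar_one
before_dollar
```

The dollar sign is required for the match but is never returned, which is the sense in which the assertion has zero length.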
- What will happen when I run the following code?57
- 0
- 3
- 9
- NA
- error (code will fail)
my_power <- function(x, y){
  return(x^y)
}
my_power(3)
- What will happen when I run the following code?58
- 0
- 3
- 9
- NA
- error (code will fail)
my_power <- function(x, y = 2){
  return(x^y)
}
my_power(3)
- What will happen when I run the following code?59
- 4
- 8
- 9
- NA
- error (code will fail)
my_power <- function(x, y = 2){
  return(x^y)
}
my_power(2, 3)
- What will happen when I run the following code?60
- 4
- 8
- 9
- NA
- error (code will fail)
my_power <- function(x = 2, y = 3){
  return(x^y)
}
my_power( )
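The four variants differ only in which arguments have defaults. A base R sketch that runs them side by side; the function names `power_no_default` and `power_default` are made up here so that both definitions can coexist, and `tryCatch()` is used so the failing call does not stop the script.

```r
# No defaults: both arguments must be supplied
power_no_default <- function(x, y) {
  return(x^y)
}

# y defaults to 2 when it is not supplied
power_default <- function(x, y = 2) {
  return(x^y)
}

# Missing y with no default: the call fails when y is first used
missing_y <- tryCatch(power_no_default(3), error = function(e) "error")

default_used     <- power_default(3)     # y falls back to 2
default_replaced <- power_default(2, 3)  # supplied y overrides the default

missing_y
default_used
default_replaced
```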
- Consider the addTen() function. The following output is a result of which map_*() call?61
map(c(1,4,7), addTen)
map_dbl(c(1,4,7), addTen)
map_chr(c(1,4,7), addTen)
map_lgl(c(1,4,7), addTen)
addTen <- function(wow) {
  return(wow + 10)
}
[1] "11.000000" "14.000000" "17.000000"
- Which of the following input is allowed?62
map(c(1, 4, 7), addTen)
map(list(1, 4, 7), addTen)
map(data.frame(a=1, b=4, c=7), addTen)
- some of the above
- all of the above
- Which of the following produces a different output?63
map(c(1, 4, 7), addTen)
map(c(1, 4, 7), ~addTen(.x))
map(c(1, 4, 7), ~addTen)
map(c(1, 4, 7), function(hi) (hi + 10))
map(c(1, 4, 7), ~(.x + 10))
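The calls above can be compared by inspecting what each one returns. This sketch assumes purrr is installed. `map_chr()` is left out because recent purrr versions warn when it is asked to convert doubles to character.

```r
library(purrr)

addTen <- function(wow) {
  return(wow + 10)
}

as_list    <- map(c(1, 4, 7), addTen)          # list of three numbers
as_numbers <- map_dbl(c(1, 4, 7), addTen)      # plain numeric vector
with_call  <- map(c(1, 4, 7), ~ addTen(.x))    # formula that calls addTen
no_call    <- map(c(1, 4, 7), ~ addTen)        # formula that only names addTen

as_numbers
identical(as_list, with_call)
class(no_call[[1]])
```

The formula `~addTen` never calls the function, so each element of the result is the function itself rather than a number.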
- What will the following code output?64
- 3 random normals
- 6 random normals
- 18 random normals
input
# A tibble: 3 × 3
n mean sd
<dbl> <dbl> <dbl>
1 1 1 3
2 2 3 1
3 3 47 10
input |>
  pmap(rnorm)
- What is the following error telling me?65
- I haven’t loaded lubridate.
- I can’t add months and days.
- There is no object called jan31.
- months() is not a function.
- There is no error
jan31 + months(0:11) + days(31)
#> Error in eval(expr, envir, enclos): object 'jan31' not found
- What is the following error telling me?66
- I haven’t loaded lubridate.
- I can’t add months and days.
- There is no object called jan31.
- ymd() is not a function.
- There is no error.
jan31 <- ymd("2021-01-31")
#> Error in ymd("2021-01-31"): could not find function "ymd"
jan31 + months(0:11) + days(31)
#> Error in eval(expr, envir, enclos): object 'jan31' not found
- What is the following error telling me?67
- I haven’t loaded lubridate.
- I can’t add months and days.
- There is no object called jan31.
- ymd() is not a function.
- There is no error.
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
jan31 <- ymd("2021-01-31")
jan31 + months(0:11) + days(31)
#>  [1] "2021-03-03" NA           "2021-05-01" NA           "2021-07-01"
#>  [6] NA           "2021-08-31" "2021-10-01" NA           "2021-12-01"
#> [11] NA           "2022-01-31"
- In R the ifelse() function takes the arguments:68
- question, yes, no
- question, no, yes
- statement, yes, no
- statement, no, yes
- option1, option2, option3
- What is the output of the following:69
- “cat”, 30, “cat”, “cat”, 6
- “cat”, “30”, “cat”, “cat”, “6”
- 1, “cat”, 5, “cat”, “cat”
- 1, “cat”, 5, NA, “cat”
- “1”, “cat”, “5”, NA, “cat”
data <- c(1, 30, 5, NA, 6)
ifelse(data > 5, "cat", data)
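Two details decide the output: a missing test value stays missing, and mixing text with numbers forces the whole result to character. A base R sketch of the same call, with the result stored in the illustrative name `result`:

```r
data <- c(1, 30, 5, NA, 6)

# test is data > 5, yes is "cat", no is the original value
result <- ifelse(data > 5, "cat", data)

result
class(result)
is.na(result)
```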
- Where can I get feedback on my HW assignments / quizzes?70
- prof will return paper versions
- on Gradescope
- on Canvas
- on GitHub
- Where can I get feedback on my projects?71
- prof will return paper versions
- on Gradescope
- on Canvas
- on GitHub
- In R, the set.seed() function72
- makes your computations go faster
- keeps track of your computation time
- provides an important parameter
- repeats the function
- makes your results reproducible
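The effect of `set.seed()` is easy to see by drawing random numbers twice. A base R sketch; the seed value 47 is arbitrary.

```r
# Same seed, same draws
set.seed(47)
first <- rnorm(3)

set.seed(47)
second <- rnorm(3)

# No reset in between: the generator keeps moving, so the draws differ
third <- rnorm(3)

identical(first, second)
identical(first, third)
```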
- What does the following give us?73
- the number of hats that match
- the number of hats that don’t match
- the proportion of hats that match
- the proportion of hats that don’t match
- whether or not at least one hat matches
sum(hats == random_hats)
[1] 1
- What does the following give us?74
- the number of hats that match
- the number of hats that don’t match
- the proportion of hats that match
- the proportion of hats that don’t match
- whether or not at least one hat matches
mean(hats == random_hats)
[1] 0.1
- What does the following give us?75
- the number of hats that match
- the number of hats that don’t match
- the proportion of hats that match
- the proportion of hats that don’t match
- whether or not at least one hat matches
sum(hats == random_hats) > 0
[1] TRUE
- In the SAT example, we ran a single iteration and found that the false positive and false negative rates were problematic. What should we do next?76
- Repeat for many iterations.
- Change the initial settings.
- Bring this analysis to the people with power.
- Always use two models.
- Always use only one model.
- In the SAT example, what types of things might we vary?77
- proportion to red vs blue
- how variable the values are: N(talent, 15)
- different number of times blues get to take the test
- how close grades and SAT are to talent (bias?)
- What would you want to know from the investment allocation plots?78
- What is the average rate of return?
- What is the maximum rate of return?
- What is the minimum rate of return?
- How often do I lose money?
- If 16 infants with no genuine preference choose 16 toys, what is the most likely number of “helping” toys that will be chosen?79
- 4
- 7
- 8
- 9
- 10
- How likely is it that exactly 8 helpers will be chosen (if there is no preference)?80
- 0-15%
- 16-30%
- 31-49%
- 50%
- 51-100%
- What if we flipped a coin 160 times? What percent of the time will the simulation flip exactly 80 heads?81
- 0-15%
- 16-30%
- 31-49%
- 50%
- 51-100%
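The exact probabilities behind these two questions come from the binomial distribution, so they can be computed with base R's `dbinom()` instead of being simulated. The sketch treats each choice as a fair coin flip, as the no-preference model does.

```r
# P(exactly 8 of 16) under a fair-coin model
p_8_of_16 <- dbinom(8, size = 16, prob = 0.5)

# P(exactly 80 of 160): the most likely count, yet each exact count is rarer
p_80_of_160 <- dbinom(80, size = 160, prob = 0.5)

# The most likely count of helpers out of 16
most_likely <- which.max(dbinom(0:16, size = 16, prob = 0.5)) - 1

round(c(p_8_of_16, p_80_of_160), 3)
most_likely
```

With more flips the probability is spread over more possible counts, so the chance of hitting the exact middle value shrinks.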
- Is our actual result of 14 (under the coin model)…82
- very surprising?
- somewhat surprising?
- not very surprising?
- Hypothesis: the number of hours that grade-school children spend doing homework predicts their future success on standardized tests.83
- null, one sided
- null, two sided
- alternative, one sided
- alternative, two sided
- Hypothesis: king cheetahs on average run the same speed as standard spotted cheetahs.84
- null, one sided
- null, two sided
- alternative, one sided
- alternative, two sided
- Hypothesis: the mean length of African elephant tusks has changed over the last 100 years.85
- null, one sided
- null, two sided
- alternative, one sided
- alternative, two sided
- Hypothesis: the risk of facial clefts is equal for babies born to mothers who take folic acid supplements compared with those from mothers who do not.86
- null, one sided
- null, two sided
- alternative, one sided
- alternative, two sided
- Hypothesis: caffeine intake during pregnancy affects mean birth weight.87
- null, one sided
- null, two sided
- alternative, one sided
- alternative, two sided
- In this class, the word parameter means:88
- The values in a model
- Numbers that need to be tuned
- A number which is calculated from a sample of data.
- A number which (is almost always unknown and) describes a population.
- To run a two-sample permutation test, should you permute the variable with or without replacement?89
- with replacement (replace = TRUE)
- without replacement (replace = FALSE)
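A two-sample permutation test can be sketched in base R as below. The data values, group labels, seed, and number of shuffles are all hypothetical choices made for illustration. Shuffling the labels without replacement keeps the group sizes fixed, so each shuffle is a relabeling of the same observations.

```r
set.seed(47)

# Hypothetical data: two groups of five observations
group <- rep(c("A", "B"), each = 5)
value <- c(12, 15, 11, 14, 13, 18, 17, 20, 16, 19)

# Observed difference in means
obs_diff <- mean(value[group == "B"]) - mean(value[group == "A"])

# Null distribution: shuffle the labels, recompute the difference
perm_diffs <- replicate(1000, {
  shuffled <- sample(group, replace = FALSE)
  mean(value[shuffled == "B"]) - mean(value[shuffled == "A"])
})

# Two-sided p-value: share of shuffles at least as extreme as observed
p_value <- mean(abs(perm_diffs) >= abs(obs_diff))
p_value
```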
- The histogram is a null sampling distribution for the difference in two means. The red line is the observed value from the data. To compute the p-value, which area should be considered?90
- The area to the left.
- The area to the right.
- Double the area to the left.
- It depends.
- The histogram is a null sampling distribution for the difference in two means. The red line is the observed value from the data. The alternative hypothesis is \(H_A: \mu_1 - \mu_2 \ne 0\). To compute the p-value, which area should be considered?91
- The area to the left.
- The area to the right.
- Double the area to the left.
- It depends.
- How often do you read The Student Life?92
- Every day
- 3-5 times a week
- Once a week
- Rarely
- What do you think is the most common word in the titles of the Student Life opinion articles?93
- stop
- health
- Pomona
- CMC
- students
- How can you tell the difference between an element and an attribute?94
- the elements have .
- the elements have #
- the elements have < >
- the elements have [ ]
- How do I find all the instances of the <img> (image) element?95
- use selector: <img>
- use selector: .img
- use selector: #img
- use selector: [img]
- use selector: img
- How do I find all the instances of the href= (URL) attribute?96
- use selector: <href>
- use selector: href
- use selector: #href
- use selector: [href]
- use selector: href
- What is the difference between an attribute and an element?97
- an attribute describes an element
- an element describes an attribute
- an attribute is the parent of an element
- an element is the parent of an attribute
- What is a SQL server?98
- A relational database management system.
- A software program whose main purpose is to store and retrieve data.
- A highly secure server that does not allow any database file manipulation during execution.
- All of the above.
- When was SQL created?99
- 1960s
- 1970s
- 1980s
- 1990s
- 2000s
- What type of databases is SQL designed for?100
- hierarchical database management systems.
- network database management systems.
- object-oriented database management systems.
- relational database management systems.
- Which is bigger:101
- computer’s hard drive / storage
- computer’s memory / RAM
- Where is each stored?102
- SQL tbl and R tibble both in storage
- SQL tbl and R tibble both in memory
- SQL tbl in storage and R tibble in memory
- SQL tbl in memory and R tibble in storage
- Which SQL clause is used to extract data from a database?103
- OPEN
- EXTRACT
- SELECT
- GET
- With SQL, how do you retrieve a column named “FirstName” from a table named “Persons”?104
- SELECT Persons.FirstName
- EXTRACT FIRSTNAME FROM Persons
- SELECT FirstName FROM Persons
- SELECT “FirstName” FROM “Persons”
- With SQL, how do you select all the columns from a table named “Persons”?105
- SELECT Persons
- SELECT * FROM Persons
- SELECT [all] FROM Persons
- SELECT *.Persons
- With SQL, how can you return the number of records in the “Persons” table?106
- SELECT COLUMNS(*) FROM Persons
- SELECT COUNT(*) FROM Persons
- SELECT NO(*) FROM Persons
- SELECT LEN(*) FROM Persons
- With SQL, how do you select all the records from a table named “Persons” where the value of the column “FirstName” is “Peter”?107
- SELECT * FROM Persons WHERE FirstName <> 'Peter'
- SELECT * FROM Persons WHERE FirstName = 'Peter'
- SELECT * FROM Persons WHERE FirstName == 'Peter'
- SELECT * FROM Persons WHERE FirstName LIKE 'Peter'
- SELECT [all] FROM Persons WHERE FirstName = 'Peter'
- With SQL, how do you select all the records from a table named “Persons” where the “FirstName” is “Peter” and the “LastName” is “Jackson”?108
- SELECT FirstName = 'Peter', LastName = 'Jackson' FROM Persons
- SELECT * FROM Persons WHERE FirstName = 'Peter' & LastName = 'Jackson'
- SELECT * FROM Persons WHERE FirstName = 'Peter' AND LastName = 'Jackson'
- SELECT * FROM Persons WHERE FirstName = 'Peter' | LastName = 'Jackson'
- Which operator selects values within a range?109
BETWEEN
WITHIN
RANGE
- With SQL, how do you select all the records from a table named “Persons” where the “LastName” is alphabetically between (and including) “Hansen” and “Pettersen”?110
- SELECT LastName > 'Hansen' AND LastName < 'Pettersen' FROM Persons
- SELECT * FROM Persons WHERE LastName BETWEEN 'Hansen' AND 'Pettersen'
- SELECT * FROM Persons WHERE LastName > 'Hansen' AND LastName < 'Pettersen'
- Which SQL statement returns only different values?111
- SELECT UNIQUE
- SELECT DISTINCT
- SELECT DIFFERENT
- Which SQL keyword is used to sort the result-set?112
ORDER BY
ORDER
SORT
SORT BY
- With SQL, how can you return all the records from a table named “Persons” sorted descending by “FirstName”?113
- SELECT * FROM Persons ORDER FirstName DESC
- SELECT * FROM Persons SORT 'FirstName' DESC
- SELECT * FROM Persons ORDER BY FirstName DESC
- SELECT * FROM Persons SORT BY 'FirstName' DESC
- The OR operator displays a record if ANY conditions listed are true. The AND operator displays a record if ALL of the conditions listed are true.114
- TRUE
- FALSE
- In order to SELECT the records with foods that are either green or yellow fruit:115
- … WHERE type = 'fruit' AND color = 'yellow' OR color = 'green'
- … WHERE (type = 'fruit' AND color = 'yellow') OR color = 'green'
- … WHERE type = 'fruit' AND (color = 'yellow' OR color = 'green')
- … WHERE type = 'fruit' AND color = 'yellow' AND color = 'green'
- … WHERE type = 'fruit' AND (color = 'yellow' AND color = 'green')
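The effect of the parentheses can be checked against a small in-memory database. This is a sketch under several assumptions: the DBI and RSQLite packages are installed, SQLite stands in for whatever database the course uses, and the `foods` table and its rows are made up for illustration.

```r
library(DBI)

# Temporary in-memory SQLite database (an assumption; any SQL backend would do)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table of foods
foods <- data.frame(
  name  = c("banana", "lime", "spinach", "cherry"),
  type  = c("fruit", "fruit", "vegetable", "fruit"),
  color = c("yellow", "green", "green", "red")
)
dbWriteTable(con, "foods", foods)

# AND binds more tightly than OR, so this keeps every green food
no_parens <- dbGetQuery(con,
  "SELECT name FROM foods
   WHERE type = 'fruit' AND color = 'yellow' OR color = 'green'")

# Parentheses force the color test to be evaluated first
with_parens <- dbGetQuery(con,
  "SELECT name FROM foods
   WHERE type = 'fruit' AND (color = 'yellow' OR color = 'green')")

dbDisconnect(con)

no_parens$name
with_parens$name
```

Without parentheses the green vegetable slips through; with them only the yellow and green fruits are returned.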
- What is the purpose of a JOIN?116
- it filters the rows returned by the SELECT statement.
- it specifies the columns to be retrieved.
- it combines rows from two or more tables based on a related column.
- it orders the results in ascending or descending order.
- What is the purpose of the UNION operator in SQL?117
- it combines the results of two or more SELECT statements.
- it performs a pattern match on a string.
- it retrieves the maximum value in a column.
- it filters the rows returned by the SELECT statement.
- What is the purpose of the INNER JOIN in SQL?118
- it retrieves the maximum value in a column.
- it combines rows from two or more tables based on a related column.
- it filters the rows returned by the SELECT statement.
- it performs a pattern match on a string.
- What is the purpose of the LEFT JOIN in SQL?119
- it combines rows from two or more tables based on a related column.
- it retrieves the maximum value in a column.
- it filters the rows returned by the SELECT statement.
- it performs a pattern match on a string.
- RIGHT JOIN keeps all the rows in …?120
- the first table.
- the second table.
- both tables.
- neither table
- Who is removed in a RIGHT JOIN?121
- Mick
- John
- Paul
- Keith
- Which variable(s) are removed in a RIGHT JOIN?122
- name
- band
- plays
- none of them
- In SQL, what happens to Mick’s “plays” variable in a FULL JOIN?123
- Mick is removed
- guitar
- bass
- NA
- NULL
- With SQL, how do you select all the records from a table named “Persons” where the value of the column “FirstName” starts with an “a”?124
- SELECT * FROM Persons WHERE FirstName = 'a.*'
- SELECT * FROM Persons WHERE FirstName = 'a*'
- SELECT * FROM Persons WHERE FirstName REGEXP 'a.*'
- SELECT * FROM Persons WHERE FirstName REGEXP 'a*'
- SELECT * FROM Persons WHERE FirstName REGEXP '(?i)a.*'
- What is the main way to absolutely recognize a record within a database?125
- Foreign key
- Primary key
- Unique key
- Natural key
- Alternate key
- What does a foreign key do?126
- Directly identifies another table
- Directly identifies another column
- Gives access to another entire database
- Translates the database into another language
- Which of these would likely be used as a foreign key between a table on student enrollment and student grades?127
- grades
- tuition
- student_name
- student_hometown
- For the student records (for two tables: enrollment and grades), which is the most likely combination?128
- name as primary key to both
- name as foreign to both
- name as primary in enrollment and foreign in grades
- name as foreign in enrollment and primary in grades
- Which of the following is the primary function used to create a Shiny app?129
shinyApp()
createApp()
runApp()
startShinyApp()
- Which of the following Shiny components contains the code for handling user inputs and generating outputs?130
ui
server
runApp()
shinyApp()
- Which of the following Shiny UI elements is used to allow users to select a single option from a list of choices?131
selectInput()
radioButtons()
checkboxGroupInput()
textInput()
- In Shiny, what is the purpose of the renderText() function?132
- To display a plot as text
- To generate text output based on reactive inputs
- To create a text input field
- To render HTML elements
- What does the ui component in a Shiny app represent?133
- The logic of the application
- The server-side calculations
- The user interface elements
- The global settings for the app
- Which Shiny function is used to handle reactive expressions in the server function?134
reactive()
render()
observe()
updateInput()
- What is the default output type for renderPlot() in Shiny?135
- plotly chart
- ggplot2 plot
- base R plot
- HTML table
- Which of the following is the correct way to create a slider input in Shiny?136
sliderInput("slider", "Slider", min = 1, max = 100, value = 50)
inputSlider("slider", min = 1, max = 100)
sliderControl("slider", 1, 100)
input_slider("slider", 1, 100)
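The pieces named in the Shiny questions fit together as in the minimal sketch below. It assumes the shiny package is installed; the output id `chosen` and the displayed text are made up for illustration. The app object is built but not launched, so the script runs to completion without opening a browser.

```r
library(shiny)

# ui: what the user sees (inputs and output placeholders)
ui <- fluidPage(
  sliderInput("slider", "Slider", min = 1, max = 100, value = 50),
  textOutput("chosen")
)

# server: reacts to inputs and fills in the outputs
server <- function(input, output) {
  # reactive() recomputes only when input$slider changes
  doubled <- reactive(input$slider * 2)

  # renderText() turns the reactive value into text for textOutput("chosen")
  output$chosen <- renderText(paste("Twice the slider value:", doubled()))
}

# shinyApp() ties ui and server together into an app object
app <- shinyApp(ui = ui, server = server)

# To launch interactively, uncomment:
# runApp(app)
```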
Footnotes
wherever you are, make sure you are communicating with me when you have questions!↩︎
wherever you are, make sure you are communicating with me when you have questions!↩︎
- on Gradescope
- pushing the file(s)
- poor assignment operator
- invalid object name
- unmatched quotes
- no mistake
- improper syntax for a function argument
- I mean, the right answer has to be Yes, right!??!
no right answer here!↩︎
- In the local folder which also has the R project. It could be on the Desktop or the Home directory, but it must be in the same place as the R project. Do not upload files to the remote GitHub directory or you will find yourself with two different copies of the files.
Yes! All the responses are reasons to make a figure.↩︎
- Because that graphic displays the message you want as optimally as possible.
- color must be specified outside the aes() function
- dot color is specified as “navy”, line color is specified as wday.
- set the information outside the aes() function
answers may vary. I’d say c. putting the work in context. Others might say b. facilitating comparison or d. simplifying the story. However, I don’t think a correct answer is a. making the data stand out.↩︎
- making the data stand out
- One showed the relevant comparison better.
- It isn’t at the origin (in combination with d. There wasn’t a label explaining why the axes were where they were). The story associated with the average value axes is not clear to the reader.
- babynames in wrong place
- Table c is best because the columns allow us to work with each of the variables separately.
- does something different because it takes the mean() (average) instead of the sum(). The other commands compute the total number of births broken down by year and sex.
- filter()
- (year, name)
- sum(num)
running the different code chunks with relevant output.↩︎
- on Gradescope
- on Canvas
- -country, year, gdpval (if possible, good idea to name variables something different from the name of the data frame)
- use pivot_longer() on raw data. The reference to the study is: Gregory Belenky, Nancy J. Wesensten, David R. Thorne, Maria L. Thomas, Helen C. Sing, Daniel P. Redmond, Michael B. Russo and Thomas J. Balkin (2003) Patterns of performance degradation and restoration during sleep restriction and subsequent recovery: a sleep dose-response study. Journal of Sleep Research 12, 1–12.
- Mick
- none of them (the default is to retain all the variables)
- NA (it would be NULL in SQL)
- TRUE TRUE TRUE TRUE
- “one -pple” “two p-ars” “three bananas” (because str_replace() is vectorized)
- “ab” “ifg” Again, str_sub() is vectorized. So the subset of string one is from 1 to 2. The subset of string two is from 3 to 5.
I don’t know what the answer is. Ill-defined question.↩︎
- different output formatting (the first produces 9, the second produces Sep)
- Today is the 39th week of the year.
- Day of month and day of year. (Day of year is often called the “Julian Day”.)
- FALSE
neither c. nor e. would match. Inside the bracket “[^u]” matches anything other than a “u”, but it has to match something.↩︎
- all of the above. Inside a character class | is a normal character and would therefore match “grey” and “gray” and “gr|y”. Which is not what we want, but would work to match both “grey” and “gray”.
- 1 (because \d matches only a single digit).
- 10 (because \d+ matches at least one digit).
- E (because . matches anything, and returns only a single character).
- Episode 2: The pie whisperer. (4 August 2015) (because . matches anything, and with the + it returns multiple characters).
- . (because \. matches the period, .).
- The first includes Jane.
"(?<=\\$)\\d+"
"\\w+(?= pie)"
- we return the string at the lookaround
- error (code will fail)
- 9
- 8
- 8
- map_chr(c(1,4,7), addTen) because the output is in quotes, the values are strings, not numbers.
- all of the above. The map() function allows vectors, lists, and data frames as input.
- map(c(1, 4, 7), ~addTen). The ~ acts on functions that do not have their own name or that are defined by function(...). By adding the argument (.x) we’ve expanded the addTen() function, and so it needs a ~. The addTen() function all alone does not use a ~.
- 6 random normals (1 with mean 1, sd 3; 2 with mean 3, sd 1; 3 with mean 47, sd 10)
- There is no object called jan31.
- I haven’t loaded lubridate (which is why it doesn’t recognize ymd() as a function).
- There is no error.
- question, yes, no
- “1”, “cat”, “5”, NA, “cat” (Note that the numbers were converted to character strings!)
- on Gradescope
- on Canvas
- makes your results reproducible
- the number of hats that match
- the proportion of hats that match
- whether or not at least one hat matches
- Repeat for many iterations. (The next step needs to gather information on how the FP and FN results hold, it might have just been something odd in my simulation… )
all of the above↩︎
It totally depends on your personality and your finances. b. doesn’t make much sense. But a., c., and d. are all very reasonable questions to ask about your investments.↩︎
- 8
- 0.196 (19.6% of the time)
- 0.063 (6.3% of the time)
- very surprising (prob of 14 or more is 0.0021)
- alternative, one sided (because probably we are studying that it increases their success rate)
- null, two sided (because I have no idea which cheetah might run faster)
- alternative, two sided (because I have no idea whether they’ve increased or decreased)
- null, one sided (because I happen to know that folic acid is thought to prevent facial clefts)
- alternative, one sided (because I happen to know that caffeine is thought to decrease baby’s birth weight)
- A number which (is almost always unknown and) describes a population.
- without replacement (replace = FALSE)
- It depends. a. would be the correct answer if the alternative hypothesis is \(\mu_1 - \mu_2 < 0\), b. would be the right answer if the alternative hypothesis is \(\mu_1 - \mu_2 > 0\) and c. would be the right answer if the alternative hypothesis is \(\mu_1 - \mu_2 \ne 0\).
- Double the area to the left.
there can’t possibly be a right answer here.↩︎
- students (is the top word over the last 500 opinion articles)
- the elements have < >
- use selector: img
- use selector: [href]
- an attribute describes an element
- A relational database management system.
- The first versions were created in the 1970s and called SEQUEL (Structured English QUEry Language). c. SQL came about in particular systems in the 1980s.
- relational database management systems.
- computer’s hard drive / storage
- SQL tbl in storage and R tibble in memory
- SELECT
- SELECT FirstName FROM Persons
- SELECT * FROM Persons
- SELECT COUNT(*) FROM Persons
- SELECT * FROM Persons WHERE FirstName = 'Peter' (d. would also work.)
- SELECT * FROM Persons WHERE FirstName = 'Peter' AND LastName = 'Jackson'
- BETWEEN
- SELECT * FROM Persons WHERE LastName BETWEEN 'Hansen' AND 'Pettersen'
- SELECT DISTINCT
- ORDER BY
- SELECT * FROM Persons ORDER BY FirstName DESC
- TRUE
- … WHERE type = 'fruit' AND (color = 'yellow' OR color = 'green')
- it combines rows from two or more tables based on a related column.
- it combines the results of two or more SELECT statements.
- it combines rows from two or more tables based on a related column.
- it combines rows from two or more tables based on a related column.
- the first table
- Mick
- none of them (all variables are kept in all joins)
- NULL (it would be NA in R)
- SELECT * FROM Persons WHERE FirstName REGEXP '(?i)a.*' (n.b., the LIKE operator will give you a similar result, with % as a wildcard: SELECT * FROM Persons WHERE FirstName LIKE 'a%')
- Primary key
- Directly identifies another column
- student_name
- name as primary in enrollment and foreign in grades (the primary key must uniquely identify the records, and name is unlikely to do that in a grades database.)
- shinyApp()
- server
- selectInput()
- To generate text output based on reactive inputs
- The user interface elements
- reactive()
- base R plot
- sliderInput("slider", "Slider", min = 1, max = 100, value = 50)
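
The Shiny answers above can be tied together in one small app. What follows is only a minimal sketch, assuming the shiny package is installed; the output id doubled_text and the doubling calculation are hypothetical choices made for illustration and do not come from the course materials.

```r
# Minimal sketch; assumes the shiny package is installed.
library(shiny)

# ui: the user interface elements (inputs and output placeholders)
ui <- fluidPage(
  sliderInput("slider", "Slider", min = 1, max = 100, value = 50),
  textOutput("doubled_text")  # hypothetical output id
)

# server: handles user inputs and generates outputs
server <- function(input, output, session) {
  # reactive() builds a value that updates whenever input$slider changes
  doubled <- reactive({
    input$slider * 2
  })

  # renderText() generates text output based on reactive inputs
  output$doubled_text <- renderText({
    paste("Twice the slider value is", doubled())
  })
}

# shinyApp() combines ui and server into a single app object
app <- shinyApp(ui = ui, server = server)

# Launch only in an interactive session (for example, the RStudio console)
if (interactive()) {
  runApp(app)
}
```

Building the app object and launching it are kept as separate steps here so the script can be sourced without opening a browser window; in a typical app.R file, ending the file with the shinyApp() call is enough.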