Clicker Questions

to go along with

Modern Data Science with R, 3rd edition by Baumer, Kaplan, and Horton

R for Data Science, 2nd edition by Wickham, Çetinkaya-Rundel, and Grolemund

R / RStudio / Quarto¹
1. all good
2. started, progress is slow and steady
3. started, very stuck
4. haven’t started yet
5. what do you mean by “R”?

Git / GitHub²
1. all good
2. started, progress is slow and steady
3. started, very stuck
4. haven’t started yet
5. what do you mean by “Git”?

Where can I get feedback on my HW assignments / quizzes?³
1. prof will return paper versions
2. on Gradescope
3. on Canvas
4. on GitHub

Which of the following includes talking to the remote version of GitHub?⁴
1. changing your name (updating the YAML)
2. committing the file(s)
3. pushing the file(s)
4. some of the above
5. all of the above

What is the error?⁵
1. poor assignment operator
2. unmatched quotes
3. improper syntax for function argument
4. invalid object name
5. no mistake

shup2 <-- "Hello to you!"

What is the error?⁶
1. poor assignment operator
2. unmatched quotes
3. improper syntax for function argument
4. invalid object name
5. no mistake

3shup <-  "Hello to you!"

What is the error?⁷
1. poor assignment operator
2. unmatched quotes
3. improper syntax for function argument
4. invalid object name
5. no mistake

shup4 <-  "Hello to you!

What is the error?⁸
1. poor assignment operator
2. unmatched quotes
3. improper syntax for function argument
4. invalid object name
5. no mistake

shup5 <-  date()

What is the error?⁹
1. poor assignment operator
2. unmatched quotes
3. improper syntax for function argument
4. invalid object name
5. no mistake

shup6 <-  sqrt 10
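For reference, a minimal sketch of how the five assignments above could be written so that each one runs. The name shup3 is an assumption (the original 3shup cannot be used as written); the other names are reused from the questions.

```r
# Versions of the assignments above that run without error.
shup2 <- "Hello to you!"   # a single `<-`; `<--` is parsed as `<-` followed by a unary minus
shup3 <- "Hello to you!"   # renamed (assumption): object names cannot start with a digit
shup4 <- "Hello to you!"   # opening and closing quotes are matched
shup5 <- date()            # already valid: date() needs no arguments
shup6 <- sqrt(10)          # function arguments go inside parentheses
```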

Do you keep a calendar / schedule / planner?¹⁰
1. Yes
2. No

Do you keep a calendar / schedule / planner? If you answered “Yes” …¹¹
1. Yes, on Google Calendar
2. Yes, on Calendar for macOS
3. Yes, on Outlook for Windows
4. Yes, in some other app
5. Yes, by hand

Where should I put things I’ve created for the HW (e.g., data, .ics file, etc.)?¹²
1. Upload into remote GitHub directory
2. In the local folder which also has the R project
3. In my Downloads
4. Somewhere on my Desktop
5. In my Home directory

The goal of making a figure is…¹³
1. To draw attention to your work.
2. To facilitate comparisons.
3. To provide as much information as possible.

A good reason to make a particular choice of a graph is:¹⁴
1. Because the journal / field has particular expectations for how the data are presented.
2. Because some variables naturally fit better on some graphs (e.g., numbers on scatter plots).
3. Because that graphic displays the message you want as effectively as possible.

Why are the points orange?¹⁵
1. R translates “navy” into orange.
2. color must be specified in geom_point()
3. color must be specified outside the aes() function
4. the default plot color is orange

ggplot(data = Births78, 
       aes(x = date, y = births, color = "navy")) + 
  geom_point() +          
  labs(title = "US Births in 1978")

Why are the dots blue and the lines colored?¹⁶
1. dot color is given as “navy”, line color is given as wday.
2. both colors are specified in the ggplot() function.
3. dot coloring takes precedence over line coloring.
4. line coloring takes precedence over dot coloring.

Setting vs. Mapping. If I want the same value applied to all data points (not tied to a variable):¹⁷
1. map the information inside the aes() function.
2. set the information outside the aes() function
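A small sketch of the setting vs. mapping distinction, assuming ggplot2 is installed. The data frame births_demo is made up for illustration and does not contain the real Births78 values.

```r
library(ggplot2)

# Made-up demo data (an assumption, not the real Births78 values).
births_demo <- data.frame(
  date   = as.Date("1978-01-01") + 0:4,
  births = c(100, 120, 110, 130, 125)
)

# Mapping: "navy" inside aes() is treated as a one-level variable,
# so the colour scale chooses its own default colour.
p_mapped <- ggplot(births_demo, aes(x = date, y = births, color = "navy")) +
  geom_point()

# Setting: a constant outside aes() is applied to every point as given.
p_set <- ggplot(births_demo, aes(x = date, y = births)) +
  geom_point(color = "navy")

# layer_data() shows the values each layer is actually drawn with.
unique(layer_data(p_mapped)$colour)  # a default scale colour, not "navy"
unique(layer_data(p_set)$colour)     # "navy"
```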

The Snow figure was most successful at:¹⁸
1. making the data stand out
2. facilitating comparison
3. putting the work in context
4. simplifying the story

The Challenger figure(s) was(were) least successful at:¹⁹
1. making the data stand out
2. facilitating comparison
3. putting the work in context
4. simplifying the story

The biggest difference between Snow and the Challenger was:²⁰
1. The amount of information portrayed.
2. One was better at displaying cause.
3. One showed the relevant comparison better.
4. One was more artistic.

Caffeine and Calories. What was the biggest concern over the average value axes?²¹
1. It isn’t at the origin.
2. They should have used all the data possible to find averages.
3. There wasn’t a random sample.
4. There wasn’t a label explaining why the axes were where they were.

What is wrong with the following code?²²
1. should only be one =
2. Sydney should be lower case
3. name should not be in quotes
4. use mutate instead of filter
5. babynames in wrong place

Result <- |> filter(babynames,
        name== "Sydney")
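One way the call above could be written so that it runs, sketched on a tiny made-up tibble (names_demo is an assumption standing in for babynames): the data frame comes before the pipe, not inside filter().

```r
library(dplyr)

# Made-up stand-in for babynames (assumption, for illustration only).
names_demo <- tibble(
  name = c("Sydney", "Mary", "Sydney"),
  year = c(2000, 2000, 2001),
  num  = c(10, 20, 12)
)

# The data frame is piped in; filter() then only needs the condition.
result <- names_demo |>
  filter(name == "Sydney")
```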

Which data represents the ideal format for ggplot2 and dplyr?²³

table a
year	Algeria	Brazil	Colombia
2000	7	12	16
2001	9	14	18

table b
country	Y2000	Y2001
Algeria	7	9
Brazil	12	14
Colombia	16	18

table c
country	year	value
Algeria	2000	7
Algeria	2001	9
Brazil	2000	12
Brazil	2001	14
Colombia	2000	16
Columbia	2001	18
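As a hedged illustration of how the wide layout of table b relates to the long layout of table c, the sketch below reshapes the first two countries with tidyr::pivot_longer(). The object names table_b and table_c are assumptions chosen to match the labels above.

```r
library(tidyr)
library(tibble)

# First two rows of "table b" above: one column per year.
table_b <- tibble(
  country = c("Algeria", "Brazil"),
  Y2000   = c(7, 12),
  Y2001   = c(9, 14)
)

# One row per country-year pair, as in "table c".
table_c <- table_b |>
  pivot_longer(cols = -country,
               names_to = "year",
               names_prefix = "Y",
               values_to = "value")
```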

Each of the statements except one will accomplish the same calculation. Which one does not match?²⁴

#(a) 
babynames |> 
  group_by(year, sex) |> 
  summarize(totalBirths = sum(num))

#(b) 
group_by(babynames, year, sex) |> 
  summarize(totalBirths = sum(num))

#(c)
group_by(babynames, year, sex) |> 
  summarize(totalBirths = mean(num))

#(d)
temp <- group_by(babynames, year, sex)

summarize(temp, totalBirths = sum(num))

#(e)
summarize(group_by(babynames, year, sex), 
          totalBirths = sum(num))
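A minimal check of the comparison above on made-up data (births_demo is an assumption, not babynames): the piped and nested forms agree, while swapping sum() for mean() changes the result.

```r
library(dplyr)

# Made-up data with two rows in the (2000, "F") group.
births_demo <- tibble(
  year = c(2000, 2000, 2000, 2001),
  sex  = c("F", "F", "M", "F"),
  num  = c(10, 20, 5, 7)
)

piped <- births_demo |>
  group_by(year, sex) |>
  summarize(totalBirths = sum(num), .groups = "drop")

nested <- summarize(group_by(births_demo, year, sex),
                    totalBirths = sum(num), .groups = "drop")

averaged <- births_demo |>
  group_by(year, sex) |>
  summarize(totalBirths = mean(num), .groups = "drop")

isTRUE(all.equal(piped, nested))                   # same calculation
identical(piped$totalBirths, averaged$totalBirths) # different calculation
```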

Fill in Q1.²⁵
1. filter()
2. arrange()
3. select()
4. mutate()
5. group_by()

result <- babynames |>
  Q1(name %in% c("Jane", "Mary")) |> 
  # just the Janes and Marys
  group_by(Q2, Q2) |> 
  summarize(total = Q3)

Fill in Q2.²⁶
1. (year, sex)
2. (year, name)
3. (year, num)
4. (sex, name)
5. (sex, num)

result <- babynames |>
  Q1(name %in% c("Jane", "Mary")) |> 
  group_by(Q2, Q2) |> 
  # for each year for each name
  summarize(total = Q3)

Fill in Q3.²⁷
1. n_distinct(name)
2. n_distinct(n)
3. sum(name)
4. sum(num)
5. mean(num)

result <- babynames |>
  Q1(name %in% c("Jane", "Mary")) |> 
  group_by(Q2, Q2) |> 
  summarize(total = Q3)
  # number of babies (each year, each name)

Running the code.²⁸

babynames <- babynames::babynames |> 
  rename(num = n)

babynames |>
  filter(name %in% c("Jane", "Mary")) |> 
  # just the Janes and Marys
  group_by(name, year) |> 
  # for each year for each name
  summarize(total = sum(num))

# A tibble: 276 × 3
# Groups:   name [2]
   name   year total
   <chr> <dbl> <int>
 1 Jane   1880   215
 2 Jane   1881   216
 3 Jane   1882   254
 4 Jane   1883   247
 5 Jane   1884   295
 6 Jane   1885   330
 7 Jane   1886   306
 8 Jane   1887   288
 9 Jane   1888   446
10 Jane   1889   374
# ℹ 266 more rows

babynames |>
  filter(name %in% c("Jane", "Mary")) |> 
  group_by(name, year) |> 
  summarize(number = sum(num))

# A tibble: 276 × 3
# Groups:   name [2]
   name   year number
   <chr> <dbl>  <int>
 1 Jane   1880    215
 2 Jane   1881    216
 3 Jane   1882    254
 4 Jane   1883    247
 5 Jane   1884    295
 6 Jane   1885    330
 7 Jane   1886    306
 8 Jane   1887    288
 9 Jane   1888    446
10 Jane   1889    374
# ℹ 266 more rows

babynames |>
  filter(name %in% c("Jane", "Mary")) |> 
  group_by(name, year) |> 
  summarize(n_distinct(name))

# A tibble: 276 × 3
# Groups:   name [2]
   name   year `n_distinct(name)`
   <chr> <dbl>              <int>
 1 Jane   1880                  1
 2 Jane   1881                  1
 3 Jane   1882                  1
 4 Jane   1883                  1
 5 Jane   1884                  1
 6 Jane   1885                  1
 7 Jane   1886                  1
 8 Jane   1887                  1
 9 Jane   1888                  1
10 Jane   1889                  1
# ℹ 266 more rows

babynames |>
  filter(name %in% c("Jane", "Mary")) |> 
  group_by(name, year) |> 
  summarize(n_distinct(num))

# A tibble: 276 × 3
# Groups:   name [2]
   name   year `n_distinct(num)`
   <chr> <dbl>             <int>
 1 Jane   1880                 1
 2 Jane   1881                 1
 3 Jane   1882                 1
 4 Jane   1883                 1
 5 Jane   1884                 1
 6 Jane   1885                 1
 7 Jane   1886                 1
 8 Jane   1887                 1
 9 Jane   1888                 1
10 Jane   1889                 1
# ℹ 266 more rows

babynames |>
  filter(name %in% c("Jane", "Mary")) |> 
  group_by(name, year) |> 
  summarize(sum(name))

Error in `summarize()`:
ℹ In argument: `sum(name)`.
ℹ In group 1: `name = "Jane"` and `year = 1880`.
Caused by error in `base::sum()`:
! invalid 'type' (character) of argument

babynames |>
  filter(name %in% c("Jane", "Mary")) |> 
  group_by(name, year) |> 
  summarize(mean(num))

# A tibble: 276 × 3
# Groups:   name [2]
   name   year `mean(num)`
   <chr> <dbl>       <dbl>
 1 Jane   1880         215
 2 Jane   1881         216
 3 Jane   1882         254
 4 Jane   1883         247
 5 Jane   1884         295
 6 Jane   1885         330
 7 Jane   1886         306
 8 Jane   1887         288
 9 Jane   1888         446
10 Jane   1889         374
# ℹ 266 more rows

babynames |>
  filter(name %in% c("Jane", "Mary")) |> 
  group_by(name, year) |> 
  summarize(median(num))

# A tibble: 276 × 3
# Groups:   name [2]
   name   year `median(num)`
   <chr> <dbl>         <dbl>
 1 Jane   1880           215
 2 Jane   1881           216
 3 Jane   1882           254
 4 Jane   1883           247
 5 Jane   1884           295
 6 Jane   1885           330
 7 Jane   1886           306
 8 Jane   1887           288
 9 Jane   1888           446
10 Jane   1889           374
# ℹ 266 more rows

Where can I get feedback on my HW assignments / quizzes?²⁹
1. prof will return paper versions
2. on Gradescope
3. on Canvas
4. on GitHub

Where can I get feedback on my projects?³⁰
1. prof will return paper versions
2. on Gradescope
3. on Canvas
4. on GitHub

Fill in Q1.³¹
1. gdp
2. year
3. gdpval
4. country
5. -country

GDP |>  
  select(country = starts_with("Income"), everything()) |> 
       pivot_longer(cols = Q1, 
                    names_to = Q2, 
                    values_to = Q3)

Fill in Q2.³²
1. gdp
2. year
3. gdpval
4. country
5. -country

GDP |>  
  select(country = starts_with("Income"), everything()) |> 
       pivot_longer(cols = Q1, 
                    names_to = Q2, 
                    values_to = Q3)

Fill in Q3.³³
1. gdp
2. year
3. gdpval
4. country
5. -country

GDP |>  
  select(country = starts_with("Income"), everything()) |> 
       pivot_longer(cols = Q1, 
                    names_to = Q2, 
                    values_to = Q3)

Reaction time to a stimulus (in ms) after only 3 hrs of sleep per night for 9 days. You want to make a plot with the subject’s reaction time (y-axis) vs the number of days of sleep restriction (x-axis) using the following ggplot() code. Which data frame should you use?³⁴
1. use raw data
2. use pivot_wider() on raw data
3. use pivot_longer() on raw data

ggplot(___, aes(x = ___, y = ___, color = ___)) + 
  geom_line()

# A tibble: 18 × 11
   Subject day_0 day_1 day_2 day_3 day_4 day_5 day_6 day_7 day_8 day_9
     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1     308  250.  259.  251.  321.  357.  415.  382.  290.  431.  466.
 2     309  223.  205.  203.  205.  208.  216.  214.  218.  224.  237.
 3     310  199.  194.  234.  233.  229.  220.  235.  256.  261.  248.
 4     330  322.  300.  284.  285.  286.  298.  280.  318.  305.  354.
 5     331  288.  285   302.  320.  316.  293.  290.  335.  294.  372.
 6     332  235.  243.  273.  310.  317.  310   454.  347.  330.  254.
 7     333  284.  290.  277.  300.  297.  338.  332.  349.  333.  362.
 8     334  265.  276.  243.  255.  279.  284.  306.  332.  336.  377.
 9     335  242.  274.  254.  271.  251.  255.  245.  235.  236.  237.
10     337  312.  314.  292.  346.  366.  392.  404.  417.  456.  459.
11     349  236.  230.  239.  255.  251.  270.  282.  308.  336.  352.
12     350  256.  243.  256.  256.  269.  330.  379.  363.  394.  389.
13     351  251.  300.  270.  281.  272.  305.  288.  267.  322.  348.
14     352  222.  298.  327.  347.  349.  353.  354.  360.  376.  389.
15     369  272.  268.  257.  278.  315.  317.  298.  348.  340.  367.
16     370  225.  235.  239.  240.  268.  344.  281.  348.  365.  372.
17     371  270.  272.  278.  282.  279.  285.  259.  305.  351.  369.
18     372  269.  273.  298.  311.  287.  330.  334.  343.  369.  364.

sleep_long <- sleep_wide |>
  pivot_longer(cols = -Subject,
               names_to = "day",
               names_prefix = "day_",
               values_to = "reaction_time")

sleep_long

# A tibble: 180 × 3
   Subject day   reaction_time
     <dbl> <chr>         <dbl>
 1     308 0              250.
 2     308 1              259.
 3     308 2              251.
 4     308 3              321.
 5     308 4              357.
 6     308 5              415.
 7     308 6              382.
 8     308 7              290.
 9     308 8              431.
10     308 9              466.
# ℹ 170 more rows
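A sketch of how the blanks in the ggplot() call might be filled once the data are long. It uses a two-subject, three-day subset with values rounded as printed above; the names sleep_wide_demo and sleep_long_demo are assumptions so the block runs on its own.

```r
library(tidyr)
library(tibble)
library(ggplot2)

# Rounded values for subjects 308 and 309, days 0 to 2, as printed above.
sleep_wide_demo <- tibble(
  Subject = c(308, 309),
  day_0   = c(250, 223),
  day_1   = c(259, 205),
  day_2   = c(251, 203)
)

sleep_long_demo <- sleep_wide_demo |>
  pivot_longer(cols = -Subject,
               names_to = "day",
               names_prefix = "day_",
               values_to = "reaction_time")

# day is stored as text after pivoting, so convert it for a numeric x-axis;
# Subject is converted to a factor so each subject gets its own line colour.
p <- ggplot(sleep_long_demo,
            aes(x = as.numeric(day), y = reaction_time,
                color = factor(Subject))) +
  geom_line()
```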

Consider band members from the Beatles and the Rolling Stones. Who is removed in a right_join()?³⁵

Mick
John
Paul
Keith
Impossible to know

band_members |> 
  right_join(band_instruments, by = "name")

band_members

# A tibble: 3 × 2
  name  band   
  <chr> <chr>  
1 Mick  Stones 
2 John  Beatles
3 Paul  Beatles

band_instruments

# A tibble: 3 × 2
  name  plays 
  <chr> <chr> 
1 John  guitar
2 Paul  bass  
3 Keith guitar

Consider band members from the Beatles and the Rolling Stones. Which variables are removed in a right_join()?³⁶

name
band
plays
none of them

band_members |> 
  right_join(band_instruments, by = "name")

band_members

# A tibble: 3 × 2
  name  band   
  <chr> <chr>  
1 Mick  Stones 
2 John  Beatles
3 Paul  Beatles

band_instruments

# A tibble: 3 × 2
  name  plays 
  <chr> <chr> 
1 John  guitar
2 Paul  bass  
3 Keith guitar

What happens to Mick’s plays variable in a full_join()?³⁷

Mick is removed
changes to guitar
changes to bass
NA
NULL

band_members |> 
  full_join(band_instruments, by = "name")

band_members

# A tibble: 3 × 2
  name  band   
  <chr> <chr>  
1 Mick  Stones 
2 John  Beatles
3 Paul  Beatles

band_instruments

# A tibble: 3 × 2
  name  plays 
  <chr> <chr> 
1 John  guitar
2 Paul  bass  
3 Keith guitar
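The three join questions above can be checked directly, since band_members and band_instruments ship with dplyr. The object names right and full are assumptions used only to hold the results.

```r
library(dplyr)

# right_join() keeps every row of the second table (band_instruments).
right <- band_members |>
  right_join(band_instruments, by = "name")

# full_join() keeps every row of both tables, filling gaps with NA.
full <- band_members |>
  full_join(band_instruments, by = "name")

right$name                         # Mick is not among these
names(right)                       # no variable is dropped
full$plays[full$name == "Mick"]    # NA
```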

What is the output of the following R code?³⁸

TRUE
TRUE TRUE TRUE TRUE
TRUE FALSE FALSE FALSE
FALSE

fruit <- c("apple", "banana", "pear", "pineapple")
str_detect(fruit, "a")

What is the output of the following R code?³⁹

“one -pple” “two p-ars” “three bananas”
“on- -ppl-” “two p--rs” “thr-- b-n-n-s”
“on- apple” “two p-ars” “thr-e bananas”

fruits <- c("one apple", "two pears", "three bananas")
str_replace(fruits, c("a", "e", "i"), "-")

What is the output of the following R code?⁴⁰
1. “abc” “hifg”
2. “ab” “hifg”
3. “ab” “ifg”
4. “abc” “ifg”

x <- c("abcde", "ghifgh")
str_sub(x, start = c(1, 3), end = c(2, 5))
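The three stringr questions above can be run as written. The sketch below stores each result (the names detected, replaced, and pieces are assumptions) and notes how vectorized arguments are paired with the input strings.

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pineapple")
detected <- str_detect(fruit, "a")   # one logical value per string

# A vector of patterns is matched element by element: "a" with the first
# string, "e" with the second, "i" with the third. Only the first match in
# each string is replaced.
fruits <- c("one apple", "two pears", "three bananas")
replaced <- str_replace(fruits, c("a", "e", "i"), "-")

# start and end are also paired with the strings element by element.
x <- c("abcde", "ghifgh")
pieces <- str_sub(x, start = c(1, 3), end = c(2, 5))
```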

What is January 31 + one month?⁴¹
1. February 31
2. March 4
3. February 28 (assuming no leap year)
4. I don’t want to answer the question
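A short lubridate sketch of the month-arithmetic question, assuming the non-leap year 2021 for illustration. It contrasts plain `+` with the `%m+%` operator, which rolls back to the last valid day of the month.

```r
library(lubridate)

jan31 <- ymd("2021-01-31")

plus_month   <- jan31 + months(1)      # NA: "February 31" does not exist
rolled_month <- jan31 %m+% months(1)   # rolls back to the end of February
```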

What is the difference between the last two lines of code?⁴²
1. same thing
2. different months
3. different output formatting
4. different input
5. different calculation

library(lubridate)
today <- ymd("2024-09-25")
month(today)
month(today, label = TRUE)

What does this number mean?⁴³
1. Today is the 39th day of the month.
2. Today is the 39th day of the year.
3. Today is the 39th week of the month.
4. Today is the 39th week of the year.

today <- ymd("2024-09-25")
week(today)

[1] 39

What is the difference between these two functions?⁴⁴

Day of month and day of year.
Day of month and day of week.
Day of week and day of year.
Day of weekend and day of month.

mday(today)

[1] 25

yday(today)

[1] 269

What is the result of the code?⁴⁵
1. TRUE
2. FALSE
3. “2025-09-01”
4. “2025-09-25”

today > ymd("2025-09-01")
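The date-component questions above can be collected into one runnable sketch. The result names (m_num, m_lab, and so on) are assumptions; the calls themselves are the ones used in the questions.

```r
library(lubridate)

today <- ymd("2024-09-25")

m_num <- month(today)                 # month as a number
m_lab <- month(today, label = TRUE)   # same month, shown as a labelled factor
wk    <- week(today)                  # week of the year
md    <- mday(today)                  # day of the month
yd    <- yday(today)                  # day of the year
later <- today > ymd("2025-09-01")    # comparing dates gives a single logical
```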

grep("q[^u]", very.large.word.list) would not match which of the following?⁴⁶
1. Iraqi
2. Iraqian
3. Iraq
4. zaqqun (tree that “springs out of the bottom of Hell”, in the Quran)
5. Qantas (the Australian airline)

Which of the following regexes would match both “grey” and “gray”?⁴⁷
1. “gr[ae]y”
2. “gr(a|e)y”
3. “gray | grey”
4. “gr[a|e]y”
5. some / all of the above – which ones?
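A base R sketch for trying the two pattern questions above. The vector words is an assumption standing in for very.large.word.list, and grepl() is used instead of grep() so that each word gets its own TRUE or FALSE.

```r
# "q[^u]" needs a lowercase q followed by some character other than u.
words <- c("Iraqi", "Iraqian", "Iraq", "zaqqun", "Qantas")
q_not_u <- grepl("q[^u]", words)

# Candidate patterns for matching both spellings.
spellings <- c("grey", "gray")
bracket   <- grepl("gr[ae]y", spellings)      # character class
group     <- grepl("gr(a|e)y", spellings)     # alternation inside a group
spaced    <- grepl("gray | grey", spellings)  # the spaces are part of the pattern
pipe_in   <- grepl("gr[a|e]y", spellings)     # inside [ ], | is a literal character
```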

What will the result be for the following code?⁴⁸
1. 10
2. 1
3. 0
4. NA

str_extract("My dog is 10 years old", "\\d")

What will the result be for the following code?⁴⁹
1. 10
2. 1
3. 0
4. NA

str_extract("My dog is 10 years old", "\\d+")

What will the result be for the following code?⁵⁰
1. .
2. Episode 2: The pie whisperer. (4 August 2015)
3. Episode
4. E

str_extract("Episode 2: The pie whisperer. (4 August 2015)", ".")

What will the result be for the following code?⁵¹
1. .
2. Episode 2: The pie whisperer. (4 August 2015)
3. Episode
4. E

str_extract("Episode 2: The pie whisperer. (4 August 2015)", ".+")

What will the result be for the following code?⁵²
1. .
2. Episode 2: The pie whisperer. (4 August 2015)
3. Episode
4. E

str_extract("Episode 2: The pie whisperer. (4 August 2015)", "\\.")
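The five str_extract() calls above differ only in the pattern, so they can be compared side by side. The result names below are assumptions.

```r
library(stringr)

dog     <- "My dog is 10 years old"
episode <- "Episode 2: The pie whisperer. (4 August 2015)"

one_digit  <- str_extract(dog, "\\d")       # first single digit
all_digits <- str_extract(dog, "\\d+")      # first run of one or more digits
any_char   <- str_extract(episode, ".")     # . matches any one character
any_run    <- str_extract(episode, ".+")    # one or more of any character
period     <- str_extract(episode, "\\.")   # escaped: a literal period
```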

What is the difference between the output for the two regular expressions below?⁵³
1. They give the same result.
2. The first is not case sensitive.
3. The second allows for all the variants.
4. The first includes Jane.

string <- c("Mary", "Mar", "Janet", "jane", "Susan", "Sue")
str_extract(string, "\\bMary|Jane|Sue\\b")
str_extract(string, "\\b(Mary|Jane|Sue)\\b")
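A sketch that stores both results for comparison (ungrouped and grouped are assumed names). Without parentheses, each \\b attaches only to the alternative next to it; with parentheses, both boundaries apply to every alternative.

```r
library(stringr)

string <- c("Mary", "Mar", "Janet", "jane", "Susan", "Sue")

# Alternatives here are: "\\bMary", "Jane", and "Sue\\b".
ungrouped <- str_extract(string, "\\bMary|Jane|Sue\\b")

# Both word boundaries now surround whichever name matches.
grouped <- str_extract(string, "\\b(Mary|Jane|Sue)\\b")
```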

How can I pull out just the numerical information in “$47”?⁵⁴
1. "(?<=\\$)\\d"
2. "(?<=\\$)\\d+"
3. "\\d(?=\\$)"
4. "\\d+(?=\\$)"

You want to know all the types of pies in the text strings. They are written as, for example “apple pie”.⁵⁵
1. "\\w+(?!pie)"
2. "\\w+(?! pie)"
3. "\\w+(?=pie)"
4. "\\w+(?= pie)"

str_extract(c("apple pie", "chocolate pie", "peach pie"), "\\w+(?= pie)")

[1] "apple"     "chocolate" "peach"

str_extract(c("apple pie", "chocolate pie", "peach pie"), "\\w+(?=pie)")

[1] NA NA NA

We say that lookarounds are “zero-length assertions”. What does that mean?⁵⁶
1. we return the string in the lookaround
2. we replace the string in the lookaround
3. we return the string at the lookaround
4. we replace the string at the lookaround
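A small sketch of lookbehind and lookahead on the “$47” example, with assumed result names. The lookaround text has to be present, but it is not part of what is returned.

```r
library(stringr)

price <- "$47"

# Lookbehind: digits that come right after a dollar sign.
first_digit <- str_extract(price, "(?<=\\$)\\d")
all_digits  <- str_extract(price, "(?<=\\$)\\d+")

# Lookahead: digits that come right before a dollar sign (none here).
before_sign <- str_extract(price, "\\d+(?=\\$)")
```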

What will happen when I run the following code?⁵⁷
1. 0
2. 3
3. 9
4. NA
5. error (code will fail)

my_power <- function(x, y){
  return(x^y)
}
my_power(3)

What will happen when I run the following code?⁵⁸
1. 0
2. 3
3. 9
4. NA
5. error (code will fail)

my_power <- function(x, y = 2){
  return(x^y)
}
my_power(3)

What will happen when I run the following code?⁵⁹
1. 4
2. 8
3. 9
4. NA
5. error (code will fail)

my_power <- function(x, y = 2){
  return(x^y)
}
my_power(2, 3)

What will happen when I run the following code?⁶⁰
1. 4
2. 8
3. 9
4. NA
5. error (code will fail)

my_power <- function(x = 2, y = 3){
  return(x^y)
}
my_power( )
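The four versions of my_power() above reuse one name, so the sketch below gives each variant its own (assumed) name and wraps the call that fails in tryCatch() so the whole block still runs.

```r
# No default for y: calling with one argument fails when y is needed.
power_no_default <- function(x, y) {
  return(x^y)
}
no_default <- tryCatch(power_no_default(3),
                       error = function(e) "error")

# Default for y only.
power_y_default <- function(x, y = 2) {
  return(x^y)
}
uses_default     <- power_y_default(3)      # y falls back to 2
override_default <- power_y_default(2, 3)   # y is supplied, default ignored

# Defaults for both arguments.
power_both_default <- function(x = 2, y = 3) {
  return(x^y)
}
all_defaults <- power_both_default()
```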

Consider the addTen() function. The following output is a result of which map_*() call?⁶¹

map(c(1,4,7), addTen)
map_dbl(c(1,4,7), addTen)
map_chr(c(1,4,7), addTen)
map_lgl(c(1,4,7), addTen)

addTen <- function(wow) {
  return(wow + 10)
}

[1] "11.000000" "14.000000" "17.000000"

Which of the following inputs are allowed?⁶²
1. map(c(1, 4, 7), addTen)
2. map(list(1, 4, 7), addTen)
3. map(data.frame(a=1, b=4, c=7), addTen)
4. some of the above
5. all of the above

Which of the following produces a different output?⁶³
1. map(c(1, 4, 7), addTen)
2. map(c(1, 4, 7), ~addTen(.x))
3. map(c(1, 4, 7), ~addTen)
4. map(c(1, 4, 7), function(hi) (hi + 10))
5. map(c(1, 4, 7), ~(.x + 10))
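A purrr sketch for experimenting with the map() questions above. The result names are assumptions; map_chr() is left out because recent purrr versions warn when it converts numbers to text.

```r
library(purrr)

addTen <- function(wow) {
  return(wow + 10)
}

as_list   <- map(c(1, 4, 7), addTen)         # always returns a list
as_double <- map_dbl(c(1, 4, 7), addTen)     # returns a numeric vector

# A vector, a list, and a data frame (a list of columns) are all accepted.
from_list <- map(list(1, 4, 7), addTen)
from_df   <- map(data.frame(a = 1, b = 4, c = 7), addTen)

# ~addTen never calls the function; each element is the function itself.
not_called <- map(c(1, 4, 7), ~addTen)
```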

What will the following code output?⁶⁴
1. 3 random normals
2. 6 random normals
3. 18 random normals

input

# A tibble: 3 × 3
      n  mean    sd
  <dbl> <dbl> <dbl>
1     1     1     3
2     2     3     1
3     3    47    10

input |> 
  pmap(rnorm)
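A runnable version of the pmap() question, rebuilding input from the printed tibble. The seed and the name draws are assumptions; each row supplies n, mean, and sd to one rnorm() call.

```r
library(purrr)
library(tibble)

input <- tibble(
  n    = c(1, 2, 3),
  mean = c(1, 3, 47),
  sd   = c(3, 1, 10)
)

set.seed(47)  # arbitrary seed, only so the draws can be repeated
draws <- input |>
  pmap(rnorm)

lengths(draws)       # number of values drawn for each row
sum(lengths(draws))  # total number of random normals
```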

What is the following error telling me?⁶⁵

I haven’t loaded lubridate.
I can’t add months and days.
There is no object called jan31.
months() is not a function.
There is no error.

jan31 + months(0:11) + days(31)
#> Error in eval(expr, envir, enclos): object 'jan31' not found

What is the following error telling me?⁶⁶

I haven’t loaded lubridate.
I can’t add months and days.
There is no object called jan31.
ymd() is not a function.
There is no error.

  jan31 <- ymd("2021-01-31")
#> Error in ymd("2021-01-31"): could not find function "ymd"
  jan31 + months(0:11) + days(31)
#> Error in eval(expr, envir, enclos): object 'jan31' not found

What is the following error telling me?⁶⁷

I haven’t loaded lubridate.
I can’t add months and days.
There is no object called jan31.
ymd() is not a function.
There is no error.

  library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
  jan31 <- ymd("2021-01-31")
  jan31 + months(0:11) + days(31)
#>  [1] "2021-03-03" NA           "2021-05-01" NA           "2021-07-01"
#>  [6] NA           "2021-08-31" "2021-10-01" NA           "2021-12-01"
#> [11] NA           "2022-01-31"

In R the ifelse() function takes the arguments:⁶⁸

question, yes, no
question, no, yes
statement, yes, no
statement, no, yes
option1, option2, option3

What is the output of the following:⁶⁹
1. “cat”, 30, “cat”, “cat”, 6
2. “cat”, “30”, “cat”, “cat”, “6”
3. 1, “cat”, 5, “cat”, “cat”
4. 1, “cat”, 5, NA, “cat”
5. “1”, “cat”, “5”, NA, “cat”

data <- c(1, 30, 5, NA, 6)

ifelse(data > 5, "cat", data)
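The ifelse() call above can be inspected piece by piece. In this sketch the input is renamed values (an assumption, to avoid masking the base function data()), and result holds the output.

```r
values <- c(1, 30, 5, NA, 6)

# The test is evaluated element by element; NA in the test stays NA.
test <- values > 5

# Mixing "cat" with numbers forces the whole result to character.
result <- ifelse(test, "cat", values)

class(result)
```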

Where can I get feedback on my HW assignments / quizzes?⁷⁰
1. prof will return paper versions
2. on Gradescope
3. on Canvas
4. on GitHub

Where can I get feedback on my projects?⁷¹
1. prof will return paper versions
2. on Gradescope
3. on Canvas
4. on GitHub

In R, the set.seed() function⁷²

makes your computations go faster
keeps track of your computation time
provides an important parameter
repeats the function
makes your results reproducible
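A minimal sketch of what set.seed() does, using an arbitrary seed value (47 is an assumption): resetting the seed before a random call repeats the same draw.

```r
set.seed(47)
first_draw <- sample(1:10)

set.seed(47)
second_draw <- sample(1:10)   # same seed, same shuffle

third_draw <- sample(1:10)    # no reset: continues the random stream

identical(first_draw, second_draw)
```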

What does the following give us?⁷³

the number of hats that match
the number of hats that don’t match
the proportion of hats that match
the proportion of hats that don’t match
whether or not at least one hat matches

sum(hats == random_hats)

[1] 1

What does the following give us?⁷⁴

the number of hats that match
the number of hats that don’t match
the proportion of hats that match
the proportion of hats that don’t match
whether or not at least one hat matches

mean(hats == random_hats)

[1] 0.1

What does the following give us?⁷⁵

the number of hats that match
the number of hats that don’t match
the proportion of hats that match
the proportion of hats that don’t match
whether or not at least one hat matches

sum(hats == random_hats) > 0

[1] TRUE
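A sketch of one iteration of the hat-matching simulation, assuming 10 hats and an arbitrary seed. The names n_match, prop_match, and any_match are assumptions; the three summaries are the ones asked about above.

```r
set.seed(47)  # arbitrary seed (assumption)

hats <- 1:10
random_hats <- sample(hats)   # hats handed back in random order

# TRUE counts as 1 and FALSE as 0 in sum() and mean().
n_match    <- sum(hats == random_hats)       # how many people got their own hat
prop_match <- mean(hats == random_hats)      # what fraction did
any_match  <- sum(hats == random_hats) > 0   # did anyone?
```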

In the SAT example, we ran a single iteration and found that the false positive and false negative rates were problematic. What should we do next?⁷⁶

Repeat for many iterations.
Change the initial settings.
Bring this analysis to the people with power.
Always use two models.
Always use only one model.

In the SAT example, what types of things might we vary?⁷⁷

proportion of red vs blue
how variable the values are: N(talent, 15)
different number of times blues get to take the test
how close grades and SAT are to talent (bias?)

What would you want to know from the investment allocation plots?⁷⁸
1. What is the average rate of return?
2. What is the maximum rate of return?
3. What is the minimum rate of return?
4. How often do I lose money?

If 16 infants with no genuine preference choose 16 toys, what is the most likely number of “helping” toys that will be chosen?⁷⁹

How likely is it that exactly 8 helpers will be chosen (if there is no preference)?⁸⁰

0-15%
16-30%
31-49%
50%
51-100%

What if we flipped a coin 160 times? What percent of the time will the simulation flip exactly 80 heads?⁸¹

0-15%
16-30%
31-49%
50%
51-100%

Is our actual result of 14 (under the coin model)…⁸²

very surprising?
somewhat surprising?
not very surprising?
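The coin-model questions above can be explored with the binomial distribution, sketched here with base R. The seed, the number of simulated repetitions, and the result names are assumptions.

```r
# Exact probabilities under the coin model (no preference, p = 0.5).
p_exactly_8  <- dbinom(8, size = 16, prob = 0.5)
p_exactly_80 <- dbinom(80, size = 160, prob = 0.5)
p_14_or_more <- pbinom(13, size = 16, prob = 0.5, lower.tail = FALSE)

# The same quantities approximated by simulation.
set.seed(47)
sims <- rbinom(10000, size = 16, prob = 0.5)
sim_exactly_8  <- mean(sims == 8)
sim_14_or_more <- mean(sims >= 14)
```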

Hypothesis: the number of hours that grade-school children spend doing homework predicts their future success on standardized tests.⁸³
1. null, one sided
2. null, two sided
3. alternative, one sided
4. alternative, two sided

Hypothesis: king cheetahs on average run the same speed as standard spotted cheetahs.⁸⁴
1. null, one sided
2. null, two sided
3. alternative, one sided
4. alternative, two sided

Hypothesis: the mean length of African elephant tusks has changed over the last 100 years.⁸⁵
1. null, one sided
2. null, two sided
3. alternative, one sided
4. alternative, two sided

Hypothesis: the risk of facial clefts is equal for babies born to mothers who take folic acid supplements compared with those from mothers who do not.⁸⁶
1. null, one sided
2. null, two sided
3. alternative, one sided
4. alternative, two sided

Hypothesis: caffeine intake during pregnancy affects mean birth weight.⁸⁷
1. null, one sided
2. null, two sided
3. alternative, one sided
4. alternative, two sided

In this class, the word parameter means:⁸⁸
1. The values in a model
2. Numbers that need to be tuned
3. A number which is calculated from a sample of data.
4. A number which (is almost always unknown and) describes a population.

To run a two-sample permutation test, should you permute the variable with or without replacement?⁸⁹
1. with replacement (replace = TRUE)
2. without replacement (replace = FALSE)

The histogram is a null sampling distribution for the difference in two means. The red line is the observed value from the data. To compute the p-value, which area should be considered?⁹⁰

The area to the left.
The area to the right.
Double the area to the left.
It depends.

The histogram is a null sampling distribution for the difference in two means. The red line is the observed value from the data. The alternative hypothesis is $H_A: \mu_1 - \mu_2 \ne 0$. To compute the p-value, which area should be considered?⁹¹

The area to the left.
The area to the right.
Double the area to the left.
It depends.
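A sketch of a two-sample permutation test on made-up data, assuming 1000 shuffles and an arbitrary seed. The group labels are shuffled without replacement so each shuffle keeps the original group sizes; the two p-values show the one-sided and two-sided calculations.

```r
set.seed(4747)  # arbitrary seed (assumption)

# Made-up responses for two groups of six.
group    <- rep(c("treatment", "control"), each = 6)
response <- c(12, 15, 14, 10, 13, 16, 9, 11, 8, 12, 10, 7)

diff_means <- function(labels) {
  mean(response[labels == "treatment"]) - mean(response[labels == "control"])
}

observed <- diff_means(group)

# Shuffle the labels without replacement to build the null distribution.
null_diffs <- replicate(1000, diff_means(sample(group, replace = FALSE)))

p_right     <- mean(null_diffs >= observed)            # one-sided (right tail)
p_two_sided <- mean(abs(null_diffs) >= abs(observed))  # both tails
```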

How often do you read The Student Life?⁹²
1. Every day
2. 3-5 times a week
3. Once a week
4. Rarely

What do you think is the most common word in the titles of the Student Life opinion articles?⁹³
1. stop
2. health
3. Pomona
4. CMC
5. students

How can you tell the difference between an element and an attribute?⁹⁴
1. the elements have .
2. the elements have #
3. the elements have < >
4. the elements have [ ]

How do I find all the instances of the <img> (image) element?⁹⁵
1. use selector: <img>
2. use selector: .img
3. use selector: #img
4. use selector: [img]
5. use selector: img

How do I find all the instances of the href= (URL) attribute?⁹⁶
1. use selector: <href>
2. use selector: .href
3. use selector: #href
4. use selector: [href]
5. use selector: href

What is the difference between an attribute and an element?⁹⁷

an attribute describes an element
an element describes an attribute
an attribute is the parent of an element
an element is the parent of an attribute
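A sketch of element and attribute selectors with rvest, assuming the package is installed. The HTML snippet and the URL in it are made up for illustration.

```r
library(rvest)

# Made-up page: one link element and two image elements.
page <- minimal_html('
  <p><a href="https://example.com">a link</a></p>
  <img src="a.png"><img src="b.png">
')

# An element is selected by its bare name.
imgs <- page |> html_elements("img")

# An attribute is selected with square brackets, then read with html_attr().
links <- page |>
  html_elements("[href]") |>
  html_attr("href")
```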

What is a SQL server?⁹⁸

A relational database management system.
A software program whose main purpose is to store and retrieve data.
A highly secure server that does not allow any database file manipulation during execution.
All of the above.

When was SQL created?⁹⁹
1. 1960s
2. 1970s
3. 1980s
4. 1990s
5. 2000s

What type of databases is SQL designed for?¹⁰⁰

hierarchical database management systems.
network database management systems.
object-oriented database management systems.
relational database management systems.

Which is bigger:¹⁰¹
1. computer’s hard drive / storage
2. computer’s memory / RAM

Where is each stored?¹⁰²

SQL tbl and R tibble both in storage
SQL tbl and R tibble both in memory
SQL tbl in storage and R tibble in memory
SQL tbl in memory and R tibble in storage

Which SQL clause is used to extract data from a database?¹⁰³

OPEN
EXTRACT
SELECT
GET

With SQL, how do you retrieve a column named “FirstName” from a table named “Persons”?¹⁰⁴

SELECT Persons.FirstName
EXTRACT FIRSTNAME FROM Persons
SELECT FirstName FROM Persons
SELECT “FirstName” FROM “Persons”

With SQL, how do you select all the columns from a table named “Persons”?¹⁰⁵

SELECT Persons
SELECT * FROM Persons
SELECT [all] FROM Persons
SELECT *.Persons

With SQL, how can you return the number of records in the “Persons” table?¹⁰⁶

SELECT COLUMNS(*) FROM Persons
SELECT COUNT(*) FROM Persons
SELECT NO(*) FROM Persons
SELECT LEN(*) FROM Persons

With SQL, how do you select all the records from a table named “Persons” where the value of the column “FirstName” is “Peter”?¹⁰⁷

SELECT * FROM Persons WHERE FirstName <> ‘Peter’
SELECT * FROM Persons WHERE FirstName = ‘Peter’
SELECT * FROM Persons WHERE FirstName == ‘Peter’
SELECT * FROM Persons WHERE FirstName LIKE ‘Peter’
SELECT [all] FROM Persons WHERE FirstName = ‘Peter’

With SQL, how do you select all the records from a table named “Persons” where the “FirstName” is “Peter” and the “LastName” is “Jackson”?¹⁰⁸

SELECT FirstName = ‘Peter’, LastName = ‘Jackson’ FROM Persons
SELECT * FROM Persons WHERE FirstName = ‘Peter’ & LastName = ‘Jackson’
SELECT * FROM Persons WHERE FirstName = ‘Peter’ AND LastName = ‘Jackson’
SELECT * FROM Persons WHERE FirstName = ‘Peter’ | LastName = ‘Jackson’

Which operator selects values within a range?¹⁰⁹
1. BETWEEN
2. WITHIN
3. RANGE

With SQL, how do you select all the records from a table named “Persons” where the “LastName” is alphabetically between (and including) “Hansen” and “Pettersen”?¹¹⁰
1. SELECT LastName > ‘Hansen’ AND LastName < ‘Pettersen’ FROM Persons
2. SELECT * FROM Persons WHERE LastName BETWEEN ‘Hansen’ AND ‘Pettersen’
3. SELECT * FROM Persons WHERE LastName > ‘Hansen’ AND LastName < ‘Pettersen’

Which SQL statement returns only different values?¹¹¹
1. SELECT UNIQUE
2. SELECT DISTINCT
3. SELECT DIFFERENT

Which SQL keyword is used to sort the result-set?¹¹²
1. ORDER BY
2. ORDER
3. SORT
4. SORT BY

With SQL, how can you return all the records from a table named “Persons” sorted descending by “FirstName”?¹¹³
1. SELECT * FROM Persons ORDER FirstName DESC
2. SELECT * FROM Persons SORT ‘FirstName’ DESC
3. SELECT * FROM Persons ORDER BY FirstName DESC
4. SELECT * FROM Persons SORT BY ‘FirstName’ DESC
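The SQL statements in the questions above can be tried from R. The sketch below assumes the DBI and RSQLite packages and an in-memory SQLite database; the Persons rows are made up, and other database systems may differ in details.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Made-up Persons table (assumption, for illustration only).
persons <- data.frame(
  FirstName = c("Peter", "Anna", "Peter", "Kari", "Ola"),
  LastName  = c("Jackson", "Hansen", "Pettersen", "Svendson", "Andersen")
)
dbWriteTable(con, "Persons", persons)

first_names <- dbGetQuery(con, "SELECT FirstName FROM Persons")
n_records   <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM Persons")
one_person  <- dbGetQuery(con,
  "SELECT * FROM Persons WHERE FirstName = 'Peter' AND LastName = 'Jackson'")
in_range    <- dbGetQuery(con,
  "SELECT * FROM Persons WHERE LastName BETWEEN 'Hansen' AND 'Pettersen'")
sorted      <- dbGetQuery(con,
  "SELECT DISTINCT FirstName FROM Persons ORDER BY FirstName DESC")

dbDisconnect(con)
```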

The OR operator displays a record if ANY conditions listed are true. The AND operator displays a record if ALL of the conditions listed are true.¹¹⁴
1. TRUE
2. FALSE

In order to SELECT the records with foods that are either green or yellow fruit:¹¹⁵
1. … WHERE type = ‘fruit’ AND color = ‘yellow’ OR color = ‘green’
2. … WHERE (type = ‘fruit’ AND color = ‘yellow’) OR color = ‘green’
3. … WHERE type = ‘fruit’ AND (color = ‘yellow’ OR color = ‘green’)
4. … WHERE type = ‘fruit’ AND color = ‘yellow’ AND color = ‘green’
5. … WHERE type = ‘fruit’ AND (color = ‘yellow’ AND color = ‘green’)
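A sketch of how parentheses change a WHERE clause that mixes AND and OR, again assuming DBI with an in-memory SQLite database. The foods table is made up; AND is evaluated before OR when no parentheses are given.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Made-up foods table (assumption, for illustration only).
foods <- data.frame(
  food  = c("banana", "apple", "spinach", "cherry"),
  type  = c("fruit", "fruit", "vegetable", "fruit"),
  color = c("yellow", "green", "green", "red")
)
dbWriteTable(con, "foods", foods)

# Without parentheses: (fruit AND yellow) OR green, so spinach is included.
no_parens <- dbGetQuery(con,
  "SELECT food FROM foods
   WHERE type = 'fruit' AND color = 'yellow' OR color = 'green'")

# With parentheses: fruit AND (yellow OR green).
with_parens <- dbGetQuery(con,
  "SELECT food FROM foods
   WHERE type = 'fruit' AND (color = 'yellow' OR color = 'green')")

dbDisconnect(con)
```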

What is the purpose of a JOIN?¹¹⁶
1. it filters the rows returned by the SELECT statement.
2. it specifies the columns to be retrieved.
3. it combines rows from two or more tables based on a related column.
4. it orders the results in ascending or descending order.

What is the purpose of the UNION operator in SQL?¹¹⁷
1. it combines the results of two or more SELECT statements.
2. it performs a pattern match on a string.
3. it retrieves the maximum value in a column.
4. it filters the rows returned by the SELECT statement.

What is the purpose of the INNER JOIN in SQL?¹¹⁸
1. it retrieves the maximum value in a column.
2. it combines rows from two or more tables based on a related column.
3. it filters the rows returned by the SELECT statement.
4. it performs a pattern match on a string.

What is the purpose of the LEFT JOIN in SQL?¹¹⁹
1. it combines rows from two or more tables based on a related column.
2. it retrieves the maximum value in a column.
3. it filters the rows returned by the SELECT statement.
4. it performs a pattern match on a string.

RIGHT JOIN keeps all the rows in …?¹²⁰
1. the first table.
2. the second table.
3. both tables.
4. neither table

Who is removed in a RIGHT JOIN?¹²¹
1. Mick
2. John
3. Paul
4. Keith

Which variable(s) are removed in a RIGHT JOIN?¹²²
1. name
2. band
3. plays
4. none of them

In SQL, what happens to Mick’s “plays” variable in a FULL JOIN?¹²³
1. Mick is removed
2. guitar
3. bass
4. NA
5. NULL
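A sketch of the band example in SQL, assuming DBI with an in-memory SQLite database. Because RIGHT JOIN and FULL JOIN are not available in every SQLite version, the sketch uses LEFT JOIN in both directions: swapping the table order gives the same rows a RIGHT JOIN would keep. Missing values are NULL in SQL and arrive in R as NA.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

dbWriteTable(con, "band_members", data.frame(
  name = c("Mick", "John", "Paul"),
  band = c("Stones", "Beatles", "Beatles")
))
dbWriteTable(con, "band_instruments", data.frame(
  name  = c("John", "Paul", "Keith"),
  plays = c("guitar", "bass", "guitar")
))

# Keeps every member; Mick has no instrument row, so plays is NULL (NA in R).
keep_members <- dbGetQuery(con,
  "SELECT m.name, m.band, i.plays
   FROM band_members AS m
   LEFT JOIN band_instruments AS i ON m.name = i.name")

# Keeps every instrument row, like members RIGHT JOIN instruments.
keep_instruments <- dbGetQuery(con,
  "SELECT i.name, m.band, i.plays
   FROM band_instruments AS i
   LEFT JOIN band_members AS m ON i.name = m.name")

dbDisconnect(con)
```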

With SQL, how do you select all the records from a table named “Persons” where the value of the column “FirstName” starts with an “a”?¹²⁴
1. SELECT * FROM Persons WHERE FirstName = ‘a.*’
2. SELECT * FROM Persons WHERE FirstName = ‘a*’
3. SELECT * FROM Persons WHERE FirstName REGEXP ‘a.*’
4. SELECT * FROM Persons WHERE FirstName REGEXP ‘a*’
5. SELECT * FROM Persons WHERE FirstName REGEXP ‘(?i)a.*’
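
A hedged aside on the patterns above, sketched with base R's `grepl()` rather than a SQL engine (so the details of a particular database's REGEXP may differ). A regular expression that is not anchored can match anywhere in the string, so `a.*` also matches names that merely contain an “a”. Anchoring with `^` restricts the match to the start. The names are made up for illustration.

```r
# Hypothetical first names
first_names <- c("Anna", "alex", "Maria", "Bob")

# Unanchored, case-insensitive: TRUE whenever an "a" appears anywhere
contains_a <- grepl("(?i)a.*", first_names, perl = TRUE)
first_names[contains_a]

# Anchored with ^: TRUE only when the name starts with "a" or "A"
starts_with_a <- grepl("(?i)^a", first_names, perl = TRUE)
first_names[starts_with_a]
```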

What is the main way to uniquely identify a record within a database?¹²⁵
1. Foreign key
2. Primary key
3. Unique key
4. Natural key
5. Alternate key

What does a foreign key do?¹²⁶
1. Directly identifies another table
2. Directly identifies another column
3. Gives access to another entire database
4. Translates the database into another language

Which of these would most likely be used as a foreign key linking a table of student enrollment to a table of student grades?¹²⁷
1. grades
2. tuition
3. student_name
4. student_hometown

For the student records (for two tables: enrollment and grades), which is the most likely combination?¹²⁸
1. name as primary key to both
2. name as foreign to both
3. name as primary in enrollment and foreign in grades
4. name as foreign in enrollment and primary in grades
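
The two key questions above can be made concrete with a small sketch in base R. The tables and their values are hypothetical. A primary key must be unique within its table; a foreign key only needs to point to an existing primary-key value, so it may repeat.

```r
# Hypothetical tables: one row per student in enrollment,
# one row per student-course pair in grades
enrollment <- data.frame(
  student_name = c("Ana", "Ben", "Chi"),
  tuition      = c(1000, 1200, 1100),
  stringsAsFactors = FALSE
)
grades <- data.frame(
  student_name = c("Ana", "Ana", "Ben", "Chi", "Chi"),
  course       = c("Math", "Stats", "Math", "Stats", "CS"),
  grade        = c("A", "B", "B", "A", "A"),
  stringsAsFactors = FALSE
)

# Primary key check: no duplicates, so student_name can identify
# each row of enrollment
is_primary_in_enrollment <- anyDuplicated(enrollment$student_name) == 0

# The same column repeats in grades, so it cannot be the primary key there
is_primary_in_grades <- anyDuplicated(grades$student_name) == 0

# Foreign key check: every student_name in grades refers to a row in enrollment
is_valid_foreign_key <- all(grades$student_name %in% enrollment$student_name)

c(is_primary_in_enrollment, is_primary_in_grades, is_valid_foreign_key)
```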

Which of the following is the primary function used to create a Shiny app?¹²⁹
1. shinyApp()
2. createApp()
3. runApp()
4. startShinyApp()

Which of the following Shiny components contains the code for handling user inputs and generating outputs?¹³⁰
1. ui
2. server
3. runApp()
4. shinyApp()

Which of the following Shiny UI elements is used to allow users to select a single option from a list of choices?¹³¹
1. selectInput()
2. radioButtons()
3. checkboxGroupInput()
4. textInput()

In Shiny, what is the purpose of the renderText() function?¹³²
1. To display a plot as text
2. To generate text output based on reactive inputs
3. To create a text input field
4. To render HTML elements

What does the ui component in a Shiny app represent?¹³³
1. The logic of the application
2. The server-side calculations
3. The user interface elements
4. The global settings for the app

Which Shiny function is used to handle reactive expressions in the server function?¹³⁴
1. reactive()
2. render()
3. observe()
4. updateInput()

What is the default output type for renderPlot() in Shiny?¹³⁵
1. plotly chart
2. ggplot2 plot
3. base R plot
4. HTML table

Which of the following is the correct way to create a slider input in Shiny?¹³⁶
1. sliderInput("slider", "Slider", min = 1, max = 100, value = 50)
2. inputSlider("slider", min = 1, max = 100)
3. sliderControl("slider", 1, 100)
4. input_slider("slider", 1, 100)
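
The Shiny questions above can be tied together in one minimal sketch. It assumes the shiny package is installed; the output id `message` and the reactive `doubled` are hypothetical names chosen for illustration.

```r
library(shiny)

# ui: the user interface elements (inputs and output placeholders)
ui <- fluidPage(
  sliderInput("slider", "Slider", min = 1, max = 100, value = 50),
  textOutput("message")
)

# server: reads input$..., computes, and fills output$...
server <- function(input, output) {
  # reactive() holds a value that is recomputed when input$slider changes
  doubled <- reactive({
    input$slider * 2
  })

  # renderText() turns the reactive value into text for textOutput("message")
  output$message <- renderText({
    paste("Twice the slider value is", doubled())
  })
}

# shinyApp() combines ui and server into an app object
app <- shinyApp(ui = ui, server = server)

# Launch interactively with: runApp(app)
```

Assigning the result of `shinyApp()` to `app` builds the app object without launching it, which keeps the sketch safe to run in a script.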

Footnotes

wherever you are, make sure you are communicating with me when you have questions!↩︎
wherever you are, make sure you are communicating with me when you have questions!↩︎
2. on Gradescope
↩︎
3. pushing the file(s)
↩︎
1. poor assignment operator
↩︎
4. invalid object name
↩︎
2. unmatched quotes
↩︎
5. no mistake
↩︎
3. improper syntax for function argument
↩︎
1. I mean, the right answer has to be Yes, right!??!
↩︎
no right answer here!↩︎
1. In the local folder which also has the R project. It could be on the Desktop or the Home directory, but it must be in the same place as the R project. Do not upload files to the remote GitHub directory or you will find yourself with two different copies of the files.
↩︎
Yes! All the responses are reasons to make a figure.↩︎
1. Because that graphic displays the message you want as optimally as possible.
↩︎
1. color must be specified outside the aes() function
↩︎
1. dot color is specified as “navy”, line color is specified as wday.
↩︎
1. set the information outside the aes() function
↩︎
answers may vary. I’d say c. putting the work in context. Others might say b. facilitating comparison or d. simplifying the story. However, I don’t think a correct answer is a. making the data stand out.↩︎
1. making the data stand out
↩︎
1. One showed the relevant comparison better.
↩︎
1. It isn’t at the origin, in combination with d. There wasn’t a label explaining why the axes were where they were. The story associated with the average-value axes is not clear to the reader.
↩︎
1. babynames in wrong place
↩︎
1. Table c is best because the columns allow us to work with each of the variables separately.
↩︎
1. does something different because it takes the mean() (average) instead of the sum(). The other commands compute the total number of births broken down by year and sex.
↩︎
1. filter()
↩︎
1. (year, name)
↩︎
1. sum(num)
↩︎
running the different code chunks with relevant output.↩︎
1. on Gradescope
↩︎
1. on Canvas
↩︎
1. -country
↩︎
1. year
↩︎
1. gdpval (if possible, good idea to name variables something different from the name of the data frame)
↩︎
1. use pivot_longer() on raw data. The reference to the study is: Gregory Belenky, Nancy J. Wesensten, David R. Thorne, Maria L. Thomas, Helen C. Sing, Daniel P. Redmond, Michael B. Russo and Thomas J. Balkin (2003) Patterns of performance degradation and restoration during sleep restriction and subsequent recovery: a sleep dose-response study. Journal of Sleep Research 12, 1–12.
↩︎
1. Mick
↩︎
1. none of them (the default is to retain all the variables)
↩︎
1. NA (it would be NULL in SQL)
↩︎
1. TRUE TRUE TRUE TRUE
↩︎
1. “one -pple” “two p-ars” “three bananas” (because str_replace() is vectorized)
↩︎
1. “ab” “ifg” Again, str_sub() is vectorized. So the subset of string one is from 1 to 2. The subset of string two is from 3 to 5.
↩︎
I don’t know what the answer is. Ill-defined question.↩︎
1. different output formatting (the first produces 9 the second produces Sep)
↩︎
1. Today is the 39th week of the year.
↩︎
1. Day of month and day of year. (Day of year is often called the “Julian Day”.)
↩︎
1. FALSE
↩︎
neither c. nor e. would match. Inside the bracket “[^u]” matches anything other than a “u”, but it has to match something.↩︎
1. all of the above. Inside a character class | is a normal character and would therefore match “grey” and “gray” and “gr|y”. Which is not what we want, but would work to match both “grey” and “gray”.
↩︎
1. 1 (because \d matches only a single digit).
↩︎
1. 10 (because \d+ matches at least one digit).
↩︎
1. E (because . matches anything, and returns only a single character).
↩︎
1. Episode 2: The pie whisperer. (4 August 2015) (because . matches anything, and with the + it returns multiple characters).
↩︎
1. . (because \. matches the period, .).
↩︎
1. The first includes Jane.
↩︎
1. "(?<=\\$)\\d+"
↩︎
1. "\\w+(?= pie)"
↩︎
1. we return the string at the lookaround
↩︎
1. error (code will fail)
↩︎
1. 9
↩︎
1. 8
↩︎
1. 8
↩︎
1. map_chr(c(1,4,7), addTen) because the output is in quotes, the values are strings, not numbers.
↩︎
1. all of the above. The map() function allows vectors, lists, and data frames as input.
↩︎
1. map(c(1, 4, 7), ~addTen). The ~ acts on functions that do not have their own name or that are defined by function(...). By adding the argument (.x) we’ve expanded the addTen() function, and so it needs a ~. The addTen() function all alone does not use a ~.
↩︎
1. 6 random normals (1 with mean 1, sd 3; 2 with mean 3, sd 1; 3 with mean 47, sd 10)
↩︎
1. There is no object called jan31.
↩︎
1. I haven’t loaded lubridate (which is why R doesn’t recognize ymd() as a function).
↩︎
1. There is no error.
↩︎
1. question, yes, no
↩︎
1. “1”, “cat”, “5”, NA, “cat” (Note that the numbers were converted to character strings!)
↩︎
1. on Gradescope
↩︎
1. on Canvas
↩︎
1. makes your results reproducible
↩︎
1. the number of hats that match
↩︎
1. the proportion of hats that match
↩︎
1. whether or not at least one hat matches
↩︎
1. Repeat for many iterations. (The next step needs to gather information on how the FP and FN results hold; it might have just been something odd in my simulation…)
↩︎
all of the above↩︎
It totally depends on your personality and your finances. b. doesn’t make much sense. But a., c., and d. are all very reasonable questions to ask about your investments.↩︎
1. 8
↩︎
1. 0.196 (19.6% of the time)
↩︎
1. 0.063 (6.3% of the time)
↩︎
1. very surprising (prob of 14 or more is 0.0021)
↩︎
1. alternative, one sided (because probably we are studying that it increases their success rate)
↩︎
1. null, two sided (because I have no idea which cheetah might run faster)
↩︎
1. alternative, two sided (because I have no idea whether they’ve increased or decreased)
↩︎
1. null, one sided (because I happen to know that folic acid is thought to prevent facial clefts)
↩︎
1. alternative, one sided (because I happen to know that caffeine is thought to decrease baby’s birth weight)
↩︎
1. A number which (is almost always unknown and) describes a population.
↩︎
1. without replacement (replace = FALSE)
↩︎
1. It depends. a. would be the correct answer if the alternative hypothesis is $\mu_1 - \mu_2 < 0$, b. would be the right answer if the alternative hypothesis is $\mu_1 - \mu_2 > 0$ and c. would be the right answer if the alternative hypothesis is $\mu_1 - \mu_2 \ne 0$.
↩︎
1. Double the area to the left.
↩︎
there can’t possibly be a right answer here.↩︎
1. students (is the top word over the last 500 opinion articles)
↩︎
1. the elements have < >
↩︎
1. use selector: img
↩︎
1. use selector: [href]
↩︎
1. an attribute describes an element
↩︎
1. A relational database management system.
↩︎
1. The first versions were created in the 1970s and called SEQUEL (Structured English QUEry Language). c. SQL came about in particular systems in the 1980s.
↩︎
1. relational database management systems.
↩︎
1. computer’s hard drive / storage
↩︎
1. SQL tbl in storage and R tibble in memory
↩︎
1. SELECT
↩︎
1. SELECT FirstName FROM Persons
↩︎
1. SELECT * FROM Persons
↩︎
1. SELECT COUNT(*) FROM Persons
↩︎
1. SELECT * FROM Persons WHERE FirstName = ‘Peter’ (d. would also work.)
↩︎
1. SELECT * FROM Persons WHERE FirstName = ‘Peter’ AND LastName = ‘Jackson’
↩︎
1. BETWEEN
↩︎
1. SELECT * FROM Persons WHERE LastName BETWEEN ‘Hansen’ AND ‘Pettersen’
↩︎
1. SELECT DISTINCT
↩︎
1. ORDER BY
↩︎
1. SELECT * FROM Persons ORDER BY FirstName DESC
↩︎
1. TRUE
↩︎
3. … WHERE type = ‘fruit’ AND (color = ‘yellow’ OR color = ‘green’)
↩︎
3. it combines rows from two or more tables based on a related column.
↩︎
1. it combines the results of two or more SELECT statements.
↩︎
2. it combines rows from two or more tables based on a related column.
↩︎
1. it combines rows from two or more tables based on a related column.
↩︎
2. the second table
↩︎
1. Mick
↩︎
4. none of them (all variables are kept in all joins)
↩︎
5. NULL (it would be NA in R)
↩︎
5. SELECT * FROM Persons WHERE FirstName REGEXP ‘(?i)a.*’ (n.b., the LIKE operator will give you a similar result, with % as a wildcard: SELECT * FROM Persons WHERE FirstName LIKE ‘a%’)
↩︎
2. Primary key
↩︎
2. Directly identifies another column
↩︎
3. student_name
↩︎
3. name as primary in enrollment and foreign in grades (the primary key must uniquely identify the records, and name is unlikely to do that in a grades database.)
↩︎
1. shinyApp()
↩︎
2. server
↩︎
1. selectInput()
↩︎
2. To generate text output based on reactive inputs
↩︎
3. The user interface elements
↩︎
1. reactive()
↩︎
3. base R plot
↩︎
1. sliderInput("slider", "Slider", min = 1, max = 100, value = 50)
↩︎

Reuse

CC BY 4.0