Clicker Questions for DS2 - Foundations of Data Science

to go along with

Modern Data Science with R, 3rd edition by Baumer, Kaplan, and Horton

R for Data Science, 2nd edition by Wickham, Çetinkaya-Rundel, and Grolemund

R / R Studio / Quarto¹
1. all good
2. started, progress is slow and steady
3. started, very stuck
4. haven’t started yet
5. what do you mean by “R”?

Git / GitHub²
1. all good
2. started, progress is slow and steady
3. started, very stuck
4. haven’t started yet
5. what do you mean by “Git”?

Where can I get feedback on my HW assignments / quizzes?³
1. prof will return paper versions
2. on Gradescope
3. on Canvas
4. on GitHub

Which of the following involves communicating with the remote repository on GitHub?⁴
1. changing your name (updating the YAML)
2. committing the file(s)
3. pushing the file(s)
4. some of the above
5. all of the above

What is the error?⁵
1. poor assignment operator
2. unmatched quotes
3. improper syntax for function argument
4. invalid object name
5. no mistake

shup2 <-- "Hello to you!"

What is the error?⁶
1. poor assignment operator
2. unmatched quotes
3. improper syntax for function argument
4. invalid object name
5. no mistake

3shup <-  "Hello to you!"

What is the error?⁷
1. poor assignment operator
2. unmatched quotes
3. improper syntax for function argument
4. invalid object name
5. no mistake

shup4 <-  "Hello to you!

What is the error?⁸
1. poor assignment operator
2. unmatched quotes
3. improper syntax for function argument
4. invalid object name
5. no mistake

shup5 <-  date()

What is the error?⁹
1. poor assignment operator
2. unmatched quotes
3. improper syntax for function argument
4. invalid object name
5. no mistake

shup6 <-  sqrt 10
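
The five snippets above can be pasted into the console one at a time to see what R reports. As a minimal sketch, here are versions that parse and run; the object names are reused from the questions, except `shup3`, which is a hypothetical replacement for the invalid name `3shup`.

```r
# `<--` is parsed as `<-` followed by a unary minus, which fails on a string.
shup2 <- "Hello to you!"

# Object names cannot start with a digit (hypothetical replacement name).
shup3 <- "Hello to you!"

# Every opening quote needs a matching closing quote.
shup4 <- "Hello to you!"

# Calling a function with no arguments is fine when none are required.
shup5 <- date()

# Function arguments go inside parentheses.
shup6 <- sqrt(10)
```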

Do you keep a calendar / schedule / planner?¹⁰
1. Yes
2. No

Do you keep a calendar / schedule / planner? If you answered “Yes” …¹¹
1. Yes, on Google Calendar
2. Yes, on Calendar for macOS
3. Yes, on Outlook for Windows
4. Yes, in some other app
5. Yes, by hand

Where should I put things I’ve created for the HW (e.g., data, .ics file, etc.)?¹²
1. Upload into remote GitHub directory
2. In the local folder which also has the R project
3. In my Downloads
4. Somewhere on my Desktop
5. In my Home directory

The goal of making a figure is…¹³
1. To draw attention to your work.
2. To facilitate comparisons.
3. To provide as much information as possible.

A good reason to make a particular choice of a graph is:¹⁴
1. Because the journal / field has particular expectations for how the data are presented.
2. Because some variables naturally fit better on some graphs (e.g., numbers on scatter plots).
3. Because that graphic displays the message you want as effectively as possible.

What are the visual cues on this plot?¹⁵

position
length
shape
area/volume
shade/color

What are the visual cues on this plot?¹⁶

position
length
shape
area/volume
shade/color

What are the visual cues on this plot?¹⁷

position
length
shape
area/volume
shade/color

Why are the points orange?¹⁸
1. R translates “navy” into orange.
2. color must be specified in geom_point()
3. color must be specified outside the aes() function
4. the default plot color is orange

ggplot(data = Births78, 
       aes(x = date, y = births, color = "navy")) + 
  geom_point() +          
  labs(title = "US Births in 1978")

Why are the dots blue and the lines colored?¹⁹
1. dot color is given as “navy”, line color is given as wday.
2. both colors are specified in the ggplot() function.
3. dot coloring takes precedence over line coloring.
4. line coloring takes precedence over dot coloring.

Setting vs. Mapping. If I want information to be passed to all data points (not variable):²⁰
1. map the information inside the aes() function.
2. set the information outside the aes() function

The Snow figure was most successful at:²¹
1. making the data stand out
2. facilitating comparison
3. putting the work in context
4. simplifying the story

The Challenger figure(s) was(were) least successful at:²²
1. making the data stand out
2. facilitating comparison
3. putting the work in context
4. simplifying the story

The biggest difference between Snow and the Challenger was:²³
1. The amount of information portrayed.
2. One was better at displaying cause.
3. One showed the relevant comparison better.
4. One was more artistic.

Caffeine and Calories. What was the biggest concern over the average value axes?²⁴
1. It isn’t at the origin.
2. They should have used all the data possible to find averages.
3. There wasn’t a random sample.
4. There wasn’t a label explaining why the axes were where they were.

Why is there no y designation in the aes() function for the geom_bar() geometry?²⁵
1. It is outside the aes() function.
2. There is a default value for what y should be when not specified.
3. y is specified in ggplot().
4. The job of a bar plot is to count the number of instances.
5. The y variable is the same as the x variable.

What is the difference between fill = children and position = "fill"?²⁶
1. fill = children colors and position = "fill" changes the y-axis
2. fill = children changes the y-axis and position = "fill" colors
3. fill = children goes in the aes and position = "fill" goes outside the aes
4. fill = children goes outside the aes and position = "fill" goes inside the aes
5. fill = children and position = "fill" are two different ways to write the same thing.

What is the difference between geom_bar() and geom_histogram()?²⁷
1. They are the different names for the same function.
2. geom_bar() is for numbers and geom_histogram() is for categorical variables.
3. geom_bar() is for categorical variables and geom_histogram() is for numbers.
4. geom_bar() produces counts and geom_histogram() produces percentages.
5. geom_bar() produces percentages and geom_histogram() produces counts.

Which data represents the ideal format for ggplot2 and dplyr?²⁸

table a
year	Algeria	Brazil	Colombia
2000	7	12	16
2001	9	14	18

table b
country	Y2000	Y2001
Algeria	7	9
Brazil	12	14
Colombia	16	18

table c
country	year	value
Algeria	2000	7
Algeria	2001	9
Brazil	2000	12
Brazil	2001	14
Colombia	2000	16
Colombia	2001	18
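
The three layouts hold the same numbers, so one can be reshaped into another. As a sketch, assuming the first two countries of table b are typed in by hand as a hypothetical object `table_b`, `pivot_longer()` produces the one-row-per-country-per-year layout of table c.

```r
library(tidyr)
library(tibble)

# Excerpt of table b (first two countries only), entered by hand.
table_b <- tribble(
  ~country,  ~Y2000, ~Y2001,
  "Algeria",      7,      9,
  "Brazil",      12,     14
)

# One row per country-year pair; the "Y" prefix is dropped from the year.
table_c <- table_b |>
  pivot_longer(
    cols = c(Y2000, Y2001),
    names_to = "year",
    names_prefix = "Y",
    values_to = "value"
  )

table_c
```

Note that `year` comes back as a character column here; it would need converting before being used as a number.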

What is wrong with the following code?²⁹
1. should only be one =
2. Bakery should be upper case
3. type should not be in quotes
4. use mutate instead of filter
5. starbucks in wrong place

Result <- |> filter(starbucks,
        type == "bakery")

Each of the statements except one will accomplish the same calculation. Which one does not match?³⁰

#(a) 
starbucks |> 
  group_by(type) |> 
  summarize(average_fat = mean(fat))

#(b) 
group_by(starbucks, type) |> 
  summarize(average_fat = mean(fat))

#(c)
group_by(starbucks, type) |> 
  summarize(average_fat = sum(fat))

#(d)
temp <- group_by(starbucks, type)

summarize(temp, average_fat = mean(fat))

#(e)
summarize(group_by(starbucks, type), 
          average_fat = mean(fat))

Fill in Q1.³¹
1. filter()
2. arrange()
3. select()
4. mutate()
5. group_by()

result <- lego_sample |>
  Q1(!is.na(minifigures)) |> 
  # keep only those with minifigures
  group_by(Q2, Q2) |> 
  summarize(total = Q3)

Fill in Q2.³²
1. (theme, price)
2. (theme, year)
3. (year, price)
4. (pieces, year)
5. (pieces, price)

result <- lego_sample |>
  Q1(!is.na(minifigures)) |> 
  group_by(Q2, Q2) |> 
  # for each theme and year
  summarize(total = Q3)

Fill in Q3.³³
1. n_distinct(pieces)
2. n_distinct(price)
3. sum(pieces)
4. sum(pages)
5. mean(pieces)

result <- lego_sample |>
  Q1(!is.na(minifigures)) |> 
  group_by(Q2, Q2) |> 
  summarize(ave_pieces = Q3)
  # average number of pieces (each theme, each year)

Running the code.³⁴

library(openintro)
lego_sample |>
  filter(!is.na(minifigures)) |> 
  # keep only those with minifigures
  group_by(theme, year) |> 
  # for each theme for each year
  summarize(ave_pieces = mean(pieces))

# A tibble: 9 × 3
# Groups:   theme [3]
  theme    year ave_pieces
  <chr>   <dbl>      <dbl>
1 City     2018      189. 
2 City     2019      257. 
3 City     2020      349  
4 DUPLO®   2018       50.5
5 DUPLO®   2019       32.5
6 DUPLO®   2020       45.8
7 Friends  2018      354. 
8 Friends  2019      259. 
9 Friends  2020      250.

#(a)
starbucks |> 
  group_by(type) |> 
  summarize(average_fat = mean(fat))

# A tibble: 7 × 2
  type          average_fat
  <fct>               <dbl>
1 bakery              14.6 
2 bistro box          18.4 
3 hot breakfast       13.7 
4 parfait              6.5 
5 petite               9.33
6 salad                0   
7 sandwich            14.7

#(b) 
group_by(starbucks, type) |> 
  summarize(average_fat = mean(fat))

# A tibble: 7 × 2
  type          average_fat
  <fct>               <dbl>
1 bakery              14.6 
2 bistro box          18.4 
3 hot breakfast       13.7 
4 parfait              6.5 
5 petite               9.33
6 salad                0   
7 sandwich            14.7

#(c)
group_by(starbucks, type) |> 
  summarize(average_fat = sum(fat))

# A tibble: 7 × 2
  type          average_fat
  <fct>               <dbl>
1 bakery              597  
2 bistro box          147  
3 hot breakfast       110. 
4 parfait              19.5
5 petite               84  
6 salad                 0  
7 sandwich            103

#(d)
temp <- group_by(starbucks, type)

summarize(temp, average_fat = mean(fat))

# A tibble: 7 × 2
  type          average_fat
  <fct>               <dbl>
1 bakery              14.6 
2 bistro box          18.4 
3 hot breakfast       13.7 
4 parfait              6.5 
5 petite               9.33
6 salad                0   
7 sandwich            14.7

#(e)
summarize(group_by(starbucks, type), 
          average_fat = mean(fat))

# A tibble: 7 × 2
  type          average_fat
  <fct>               <dbl>
1 bakery              14.6 
2 bistro box          18.4 
3 hot breakfast       13.7 
4 parfait              6.5 
5 petite               9.33
6 salad                0   
7 sandwich            14.7

Where can I get feedback on my HW assignments / quizzes?³⁵
1. prof will return paper versions
2. on Gradescope
3. on Canvas
4. on GitHub

Where can I get feedback on my projects?³⁶
1. prof will return paper versions
2. on Gradescope
3. on Canvas
4. on GitHub

Fill in Q1.³⁷
1. gdp
2. year
3. gdpval
4. country
5. -country

GDP |>  
  select(country = starts_with("Income"), everything()) |> 
       pivot_longer(cols = Q1, 
                    names_to = Q2, 
                    values_to = Q3)

Fill in Q2.³⁸
1. gdp
2. year
3. gdpval
4. country
5. -country

GDP |>  
  select(country = starts_with("Income"), everything()) |> 
       pivot_longer(cols = Q1, 
                    names_to = Q2, 
                    values_to = Q3)

Fill in Q3.³⁹
1. gdp
2. year
3. gdpval
4. country
5. -country

GDP |>  
  select(country = starts_with("Income"), everything()) |> 
       pivot_longer(cols = Q1, 
                    names_to = Q2, 
                    values_to = Q3)

You’d like to use the data to make a plot with Midterm score on the x-axis and Final score on the y-axis using the following ggplot() code. Which data frame should you use?⁴⁰
1. use raw data
2. use pivot_wider() on raw data
3. use pivot_longer() on raw data

ggplot(___, aes(x = ___, y = ___, color = ___)) + 
  geom_point()

# A tibble: 4 × 3
  student test    score
  <chr>   <chr>   <dbl>
1 Alice   Midterm    85
2 Alice   Final      90
3 Bob     Midterm    78
4 Bob     Final      82

grades |> 
  pivot_wider(names_from = test, values_from = score)

# A tibble: 2 × 3
  student Midterm Final
  <chr>     <dbl> <dbl>
1 Alice        85    90
2 Bob          78    82

grades |> 
  pivot_wider(names_from = test, values_from = score) |> 
  ggplot(aes(x = Midterm, y = Final, color = student)) + 
  geom_point()

Response to stimulus (in ms) after only 3 hrs of sleep for 9 days. You want to make a plot with the subject’s reaction time (y-axis) vs the number of days of sleep restriction (x-axis) using the following ggplot() code. Which data frame should you use?⁴¹
1. use raw data
2. use pivot_wider() on raw data
3. use pivot_longer() on raw data

ggplot(___, aes(x = ___, y = ___, color = ___)) + 
  geom_line()

# A tibble: 18 × 11
   Subject day_0 day_1 day_2 day_3 day_4 day_5 day_6 day_7 day_8 day_9
     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1     308  250.  259.  251.  321.  357.  415.  382.  290.  431.  466.
 2     309  223.  205.  203.  205.  208.  216.  214.  218.  224.  237.
 3     310  199.  194.  234.  233.  229.  220.  235.  256.  261.  248.
 4     330  322.  300.  284.  285.  286.  298.  280.  318.  305.  354.
 5     331  288.  285   302.  320.  316.  293.  290.  335.  294.  372.
 6     332  235.  243.  273.  310.  317.  310   454.  347.  330.  254.
 7     333  284.  290.  277.  300.  297.  338.  332.  349.  333.  362.
 8     334  265.  276.  243.  255.  279.  284.  306.  332.  336.  377.
 9     335  242.  274.  254.  271.  251.  255.  245.  235.  236.  237.
10     337  312.  314.  292.  346.  366.  392.  404.  417.  456.  459.
11     349  236.  230.  239.  255.  251.  270.  282.  308.  336.  352.
12     350  256.  243.  256.  256.  269.  330.  379.  363.  394.  389.
13     351  251.  300.  270.  281.  272.  305.  288.  267.  322.  348.
14     352  222.  298.  327.  347.  349.  353.  354.  360.  376.  389.
15     369  272.  268.  257.  278.  315.  317.  298.  348.  340.  367.
16     370  225.  235.  239.  240.  268.  344.  281.  348.  365.  372.
17     371  270.  272.  278.  282.  279.  285.  259.  305.  351.  369.
18     372  269.  273.  298.  311.  287.  330.  334.  343.  369.  364.

sleep_long <- sleep_wide |>
  pivot_longer(cols = -Subject,
               names_to = "day",
               names_prefix = "day_",
               values_to = "reaction_time")

sleep_long

# A tibble: 180 × 3
   Subject day   reaction_time
     <dbl> <chr>         <dbl>
 1     308 0              250.
 2     308 1              259.
 3     308 2              251.
 4     308 3              321.
 5     308 4              357.
 6     308 5              415.
 7     308 6              382.
 8     308 7              290.
 9     308 8              431.
10     308 9              466.
# ℹ 170 more rows

sleep_wide |>
  pivot_longer(cols = -Subject,
               names_to = "day",
               names_prefix = "day_",
               values_to = "reaction_time") |>
  ggplot(aes(x = day, y = reaction_time, color = as.factor(Subject), group = as.factor(Subject))) + 
  geom_line()

Consider band members from the Beatles and the Rolling Stones. Who is removed in a right_join()?⁴²

Mick
John
Paul
Keith
Impossible to know

band_members |> 
  right_join(band_instruments, by = "name")

Consider band members from the Beatles and the Rolling Stones. Which variables are removed in a right_join()?⁴³

name
band
plays
none of them

band_members

# A tibble: 3 × 2
  name  band   
  <chr> <chr>  
1 Mick  Stones 
2 John  Beatles
3 Paul  Beatles

band_instruments

# A tibble: 3 × 2
  name  plays 
  <chr> <chr> 
1 John  guitar
2 Paul  bass  
3 Keith guitar

band_members |> 
  right_join(band_instruments, by = "name")

What happens to Mick’s plays variable in a full_join()?⁴⁴

Mick is removed
changes to guitar
changes to bass
NA
NULL

band_members

# A tibble: 3 × 2
  name  band   
  <chr> <chr>  
1 Mick  Stones 
2 John  Beatles
3 Paul  Beatles

band_instruments

# A tibble: 3 × 2
  name  plays 
  <chr> <chr> 
1 John  guitar
2 Paul  bass  
3 Keith guitar

band_members |> 
  full_join(band_instruments, by = "name")
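
Both joins can be run as written, because `band_members` and `band_instruments` ship with dplyr. The sketch below stores the results in two hypothetical objects, `right` and `full`, so they can be compared side by side.

```r
library(dplyr)

# right_join() keeps every row of the second table (band_instruments).
right <- band_members |>
  right_join(band_instruments, by = "name")

# full_join() keeps every name from either table; cells with no match are NA.
full <- band_members |>
  full_join(band_instruments, by = "name")

right
full
```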

Students on the roster but not in any class.⁴⁵
1. roster |> inner_join(classes, by = "student_id") |> filter(major != subject)
2. classes |> anti_join(roster, by = "student_id")
3. roster |> anti_join(classes, by = "student_id")
4. roster |> full_join(classes, by = "student_id")
5. roster |> semi_join(classes, by = "student_id")

roster

# A tibble: 5 × 3
  student_id name  major  
       <dbl> <chr> <chr>  
1          1 Alice Math   
2          2 Ben   CS     
3          3 Carla History
4          4 David CS     
5          5 Eva   Math

classes

# A tibble: 5 × 3
  student_id class       subject  
       <dbl> <chr>       <chr>    
1          1 Calc I      Math     
2          2 Intro CS    CS       
3          2 Data Struct CS       
4          4 Intro CS    CS       
5          6 Chemistry   Chemistry

All students on the roster and all enrollments.⁴⁶
1. roster |> inner_join(classes, by = "student_id") |> filter(major != subject)
2. classes |> anti_join(roster, by = "student_id")
3. roster |> anti_join(classes, by = "student_id")
4. roster |> full_join(classes, by = "student_id")
5. roster |> semi_join(classes, by = "student_id")

roster

# A tibble: 5 × 3
  student_id name  major  
       <dbl> <chr> <chr>  
1          1 Alice Math   
2          2 Ben   CS     
3          3 Carla History
4          4 David CS     
5          5 Eva   Math

classes

# A tibble: 5 × 3
  student_id class       subject  
       <dbl> <chr>       <chr>    
1          1 Calc I      Math     
2          2 Intro CS    CS       
3          2 Data Struct CS       
4          4 Intro CS    CS       
5          6 Chemistry   Chemistry

Students from the roster who are enrolled in at least one class.⁴⁷
1. roster |> inner_join(classes, by = "student_id") |> filter(major != subject)
2. classes |> anti_join(roster, by = "student_id")
3. roster |> anti_join(classes, by = "student_id")
4. roster |> full_join(classes, by = "student_id")
5. roster |> semi_join(classes, by = "student_id")

roster

# A tibble: 5 × 3
  student_id name  major  
       <dbl> <chr> <chr>  
1          1 Alice Math   
2          2 Ben   CS     
3          3 Carla History
4          4 David CS     
5          5 Eva   Math

classes

# A tibble: 5 × 3
  student_id class       subject  
       <dbl> <chr>       <chr>    
1          1 Calc I      Math     
2          2 Intro CS    CS       
3          2 Data Struct CS       
4          4 Intro CS    CS       
5          6 Chemistry   Chemistry

Students in a class but not on the roster.⁴⁸
1. roster |> inner_join(classes, by = "student_id") |> filter(major != subject)
2. classes |> anti_join(roster, by = "student_id")
3. roster |> anti_join(classes, by = "student_id")
4. roster |> full_join(classes, by = "student_id")
5. roster |> semi_join(classes, by = "student_id")

roster

# A tibble: 5 × 3
  student_id name  major  
       <dbl> <chr> <chr>  
1          1 Alice Math   
2          2 Ben   CS     
3          3 Carla History
4          4 David CS     
5          5 Eva   Math

classes

# A tibble: 5 × 3
  student_id class       subject  
       <dbl> <chr>       <chr>    
1          1 Calc I      Math     
2          2 Intro CS    CS       
3          2 Data Struct CS       
4          4 Intro CS    CS       
5          6 Chemistry   Chemistry

Students on the roster taking at least one class outside their major.⁴⁹
1. roster |> inner_join(classes, by = "student_id") |> filter(major != subject)
2. classes |> anti_join(roster, by = "student_id")
3. roster |> anti_join(classes, by = "student_id")
4. roster |> full_join(classes, by = "student_id")
5. roster |> semi_join(classes, by = "student_id")

roster

# A tibble: 5 × 3
  student_id name  major  
       <dbl> <chr> <chr>  
1          1 Alice Math   
2          2 Ben   CS     
3          3 Carla History
4          4 David CS     
5          5 Eva   Math

classes

# A tibble: 5 × 3
  student_id class       subject  
       <dbl> <chr>       <chr>    
1          1 Calc I      Math     
2          2 Intro CS    CS       
3          2 Data Struct CS       
4          4 Intro CS    CS       
5          6 Chemistry   Chemistry
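
To try the five candidate pipelines, the two tables can be rebuilt by hand. This is a sketch under that assumption; the result names (`not_enrolled`, `not_on_roster`, `enrolled`, `all_rows`, `outside_major`) are hypothetical labels, not part of the original questions.

```r
library(dplyr)
library(tibble)

roster <- tribble(
  ~student_id, ~name,   ~major,
  1,           "Alice", "Math",
  2,           "Ben",   "CS",
  3,           "Carla", "History",
  4,           "David", "CS",
  5,           "Eva",   "Math"
)

classes <- tribble(
  ~student_id, ~class,        ~subject,
  1,           "Calc I",      "Math",
  2,           "Intro CS",    "CS",
  2,           "Data Struct", "CS",
  4,           "Intro CS",    "CS",
  6,           "Chemistry",   "Chemistry"
)

# Filtering joins keep or drop rows of the first table; they add no columns.
not_enrolled  <- roster |> anti_join(classes, by = "student_id")
not_on_roster <- classes |> anti_join(roster, by = "student_id")
enrolled      <- roster |> semi_join(classes, by = "student_id")

# Mutating joins add columns from the second table.
all_rows      <- roster |> full_join(classes, by = "student_id")
outside_major <- roster |>
  inner_join(classes, by = "student_id") |>
  filter(major != subject)  # empty for this particular data
```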

What is the output of the following R code?⁵⁰

“a 1” “b 2” “c 3”
“a, 1” “b, 2” “c, 3”
“a1” “b2” “c3”
“a b c” “1 2 3”
“abc” “123”

str_c(letters = c("a", "b", "c"),
      numbers = c(1, 2, 3))

What is the output of the following R code?⁵¹

“a 1” “b 2” “c 3”
“a, 1” “b, 2” “c, 3”
“a1” “b2” “c3”
“a b c” “1 2 3”
“abc” “123”

str_c(letters = c("a", "b", "c"),
      numbers = c(1, 2, 3), sep = " ")

What is the output of the following R code?⁵²
1. “abc” “hifg”
2. “ab” “hifg”
3. “ab” “ifg”
4. “abc” “ifg”

x <- c("abcde", "ghifgh")
str_sub(x, start = c(1, 3), end = c(2, 5))
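
The string questions above turn on stringr working element by element. A minimal sketch, with hypothetical result names, that can be run to check each answer:

```r
library(stringr)

# str_c() pastes position by position; the default separator is "".
no_sep   <- str_c(c("a", "b", "c"), c(1, 2, 3))
with_sep <- str_c(c("a", "b", "c"), c(1, 2, 3), sep = " ")

# start and end are paired with the strings position by position:
# characters 1-2 of the first string, characters 3-5 of the second.
x <- c("abcde", "ghifgh")
pieces <- str_sub(x, start = c(1, 3), end = c(2, 5))

no_sep
with_sep
pieces
```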

What is the output of the following R code?⁵³

“one -pple” “two p-ars” “three bananas”
“on- -ppl-” “two p--rs” “thr-- b-n-n-s”
“on- apple” “two p-ars” “thr-e bananas”

fruits <- c("one apple", "two pears", "three bananas")
str_replace(fruits, pattern = c("a", "e", "i"), replacement = "-")

What is the output of the following R code?⁵⁴

TRUE
TRUE TRUE TRUE TRUE
TRUE FALSE FALSE FALSE
FALSE

fruit <- c("apple", "banana", "pear", "pineapple")
str_detect(fruit, pattern = "a")

If unspecified, the levels of a factor variable will be ordered:⁵⁵
1. in the order that they first show up in the dataset
2. from shortest to longest in terms of characters
3. from longest to shortest in terms of characters
4. alphabetically, from a to z
5. alphabetically, from z to a
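
The default ordering can be inspected with `levels()`. The sketch below uses a made-up vector of sizes (not from the course data) and then shows how the `levels` argument overrides the default.

```r
# Hypothetical example data.
sizes <- factor(c("small", "large", "medium", "small"))
levels(sizes)

# Supplying levels explicitly fixes the order.
sizes_ordered <- factor(
  c("small", "large", "medium", "small"),
  levels = c("small", "medium", "large")
)
levels(sizes_ordered)
```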

What does this code do?⁵⁶
1. New variable which is the average of the calories
2. New variable which is the average of the type
3. Changes the values of type
4. Changes the levels of type
5. Changes the order of the levels of type

starbucks |> 
  mutate(type = fct_reorder(type, calories, .fun = "mean", .desc = TRUE))

What does fct_recode() do here?⁵⁷
1. Creates a new variable
2. Changes the values of x
3. Changes the levels of x
4. Changes the order of the levels of x
5. Some of the above
6. All of the above

x <- factor(c("apple", "bear", "dear", "banana"))
x
fct_recode(x, fruit = "apple", fruit = "banana")

What is January 31 + one month?⁵⁸
1. February 31
2. March 3
3. February 28 (assuming no leap year)
4. I don’t want to answer the question
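
lubridate offers more than one answer to this question, depending on the operator. A minimal sketch, assuming the non-leap year 2021 and the hypothetical names `plain` and `rolled`:

```r
library(lubridate)

jan31 <- ymd("2021-01-31")

# Plain period arithmetic lands on the nonexistent February 31 and gives NA.
plain <- jan31 + months(1)

# %m+% rolls back to the last real day of the target month.
rolled <- jan31 %m+% months(1)

plain
rolled
```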

What is the difference between code lines 3 and 4 below?⁵⁹
1. same thing
2. different months
3. different output formatting
4. different input
5. different calculation

library(lubridate)
today <- now()
month(today)
month(today, label = TRUE)

What does this number mean?⁶⁰
1. Today is the 47th day of the month.
2. Today is the 47th day of the year.
3. Today is the 47th week of the month.
4. Today is the 47th week of the year.

today <- now()
week(today)

[1] 47

What is the difference between these two functions?⁶¹

Day of month and day of year.
Day of month and day of week.
Day of week and day of year.
Day of weekend and day of month.

mday(today)

[1] 24

yday(today)

[1] 328

What is the result of the code?⁶²
1. TRUE
2. FALSE
3. “2025-09-01”
4. “2025-02-19”

today() > ymd("2025-09-01")

str_subset(very.large.word.list, "q[^u]") would not match which of the following?⁶³
1. Iraqi
2. Iraqian
3. Iraq
4. zaqqun (tree that “springs out of the bottom of Hell”, in the Quran)
5. Qantas (the Australian airline)

Which of the following regex would match to both “grey” and “gray”?⁶⁴
1. “gr[ae]y”
2. “gr(a|e)y”
3. “gray | grey”
4. “gr[a|e]y”
5. some / all of the above – which ones?
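
Each candidate pattern can be tested against both spellings at once. This sketch builds a small logical table, with the hypothetical name `matches`, holding one row per pattern and one column per word.

```r
library(stringr)

words <- c("grey", "gray")
patterns <- c("gr[ae]y", "gr(a|e)y", "gray | grey", "gr[a|e]y")

# For each word, test all four patterns; the result is a 4 x 2 matrix.
matches <- sapply(words, function(w) str_detect(w, patterns))
rownames(matches) <- patterns
matches
```

Inside square brackets `|` is a literal character, so `gr[a|e]y` would also match the string `gr|y`. The spaces in `gray | grey` are part of the pattern as well.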

What will the result be for the following code?⁶⁵
1. 10
2. 1
3. 0
4. NA

str_extract("My dog is 10 years old", "\\d")

What will the result be for the following code?⁶⁶
1. 10
2. 1
3. 0
4. NA

str_extract("My dog is 10 years old", "\\d+")

What will the result be for the following code?⁶⁷
1. .
2. Episode 2: The pie whisperer. (4 August 2015)
3. Episode
4. E

str_extract("Episode 2: The pie whisperer. (4 August 2015)", ".")

What will the result be for the following code?⁶⁸
1. .
2. Episode 2: The pie whisperer. (4 August 2015)
3. Episode
4. E

str_extract("Episode 2: The pie whisperer. (4 August 2015)", ".+")

What will the result be for the following code?⁶⁹
1. .
2. Episode 2: The pie whisperer. (4 August 2015)
3. Episode
4. E

str_extract("Episode 2: The pie whisperer. (4 August 2015)", "\\.")
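
The five `str_extract()` calls above differ only in the pattern, so they are easy to compare in one place. A sketch with hypothetical result names:

```r
library(stringr)

dog <- "My dog is 10 years old"
episode <- "Episode 2: The pie whisperer. (4 August 2015)"

first_digit <- str_extract(dog, "\\d")       # a single digit
all_digits  <- str_extract(dog, "\\d+")      # one or more digits in a row
any_char    <- str_extract(episode, ".")     # any single character
whole       <- str_extract(episode, ".+")    # greedy run of any characters
literal_dot <- str_extract(episode, "\\.")   # an escaped, literal period
```

`str_extract()` returns only the first match in each string; `str_extract_all()` would return every match.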

What is the difference between the output for the two regular expressions below?⁷⁰
1. They give the same result.
2. The first is not case sensitive.
3. The second allows for all the variants.
4. The first includes Jane.

string <- c("Mary", "Mar", "Janet", "jane", "Susan", "Sue")
str_extract(string, "\\bMary|Jane|Sue\\b")
str_extract(string, "\\b(Mary|Jane|Sue)\\b")

How can I pull out just the numerical information in “$47”?⁷¹
1. "(?<=\\$)\\d"
2. "(?<=\\$)\\d+"
3. "\\d(?=\\$)"
4. "\\d+(?=\\$)"
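
A lookbehind checks what comes before the match without returning it. The sketch below tries three of the candidate patterns on the string from the question; `price` and the result names are hypothetical.

```r
library(stringr)

price <- "$47"

# Lookbehind: digits that are preceded by a dollar sign.
behind_all <- str_extract(price, "(?<=\\$)\\d+")

# Same lookbehind, but only one digit is requested.
behind_one <- str_extract(price, "(?<=\\$)\\d")

# Lookahead: digits that are followed by a dollar sign (none here).
ahead_all <- str_extract(price, "\\d+(?=\\$)")
```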

You want to know all the types of pies in the text strings. They are written as, for example “apple pie”.⁷²
1. "\\w+(?!pie)"
2. "\\w+(?! pie)"
3. "\\w+(?=pie)"
4. "\\w+(?= pie)"

str_extract(c("apple pie", "chocolate pie", "peach pie"), "\\w+(?= pie)")

[1] "apple"     "chocolate" "peach"

str_extract(c("apple pie", "chocolate pie", "peach pie"), "\\w+(?=pie)")

[1] NA NA NA

We say that lookarounds are “zero-length assertions”. What does that mean?⁷³
1. we return the string in the lookaround
2. we replace the string in the lookaround
3. we return the string at the lookaround
4. we replace the string at the lookaround

What will happen when I run the following code?⁷⁴
1. 0
2. 3
3. 9
4. NA
5. error (code will fail)

my_power <- function(x, y){
  return(x^y)
}
my_power(3)

What will happen when I run the following code?⁷⁵
1. 0
2. 3
3. 9
4. NA
5. error (code will fail)

my_power <- function(x, y = 2){
  return(x^y)
}
my_power(3)

What will happen when I run the following code?⁷⁶
1. 4
2. 8
3. 9
4. NA
5. error (code will fail)

my_power <- function(x, y = 2){
  return(x^y)
}
my_power(2, 3)

What will happen when I run the following code?⁷⁷
1. 4
2. 8
3. 9
4. NA
5. error (code will fail)

my_power <- function(x = 2, y = 3){
  return(x^y)
}
my_power( )
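
The four versions of `my_power()` differ only in which arguments have defaults. A sketch that puts the cases next to each other; `no_default` and `result` are hypothetical names, and `tryCatch()` is used only so that the failing call does not stop the script.

```r
my_power <- function(x, y = 2) {
  x^y
}

my_power(3)             # y falls back to its default of 2
my_power(2, 3)          # a supplied y overrides the default
my_power(y = 3, x = 2)  # named arguments can come in any order

# Without a default, a missing argument is an error once it is used.
no_default <- function(x, y) {
  x^y
}
result <- tryCatch(no_default(3), error = function(e) "error")
result
```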

Consider the addTen() function. The following output is a result of which map_*() call?⁷⁸

map(c(1,4,7), addTen)
map_dbl(c(1,4,7), addTen)
map_chr(c(1,4,7), addTen)
map_lgl(c(1,4,7), addTen)

addTen <- function(wow) {
  return(wow + 10)
}

[1] "11.000000" "14.000000" "17.000000"

Which of the following input is allowed?⁷⁹
1. map(c(1, 4, 7), addTen)
2. map(list(1, 4, 7), addTen)
3. map(data.frame(a=1, b=4, c=7), addTen)
4. some of the above
5. all of the above

Which of the following produces a different output?⁸⁰
1. map(c(1, 4, 7), addTen)
2. map(c(1, 4, 7), ~addTen(.x))
3. map(c(1, 4, 7), ~addTen)
4. map(c(1, 4, 7), function(hi) (hi + 10))
5. map(c(1, 4, 7), ~(.x + 10))

What will the following code output?⁸¹
1. 3 random normals
2. 6 random normals
3. 18 random normals

input

# A tibble: 3 × 3
      n  mean    sd
  <dbl> <dbl> <dbl>
1     1     1     3
2     2     3     1
3     3    47    10

input |> 
  pmap(rnorm)
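
`pmap()` calls the function once per row, matching column names to argument names. As a sketch, the `input` table is rebuilt by hand and `lengths()` counts how many values each call returned; the seed is arbitrary and `draws` is a hypothetical name.

```r
library(purrr)
library(tibble)

input <- tribble(
  ~n, ~mean, ~sd,
   1,     1,   3,
   2,     3,   1,
   3,    47,  10
)

set.seed(47)  # arbitrary seed, only for reproducibility

# Row 1 becomes rnorm(n = 1, mean = 1, sd = 3), and so on.
draws <- input |>
  pmap(rnorm)

lengths(draws)       # how many normals each row produced
sum(lengths(draws))  # total number of normals
```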

What is the following error telling me?⁸²

I haven’t loaded lubridate.
I can’t add months and days.
There is no object called jan31.
months() is not a function.
There is no error.

jan31 + months(0:11) + days(31)
#> Error in eval(expr, envir, enclos): object 'jan31' not found

What is the following error telling me?⁸³

I haven’t loaded lubridate.
I can’t add months and days.
There is no object called jan31.
ymd() is not a function.
There is no error.

  jan31 <- ymd("2021-01-31")
#> Error in ymd("2021-01-31"): could not find function "ymd"
  jan31 + months(0:11) + days(31)
#> Error in eval(expr, envir, enclos): object 'jan31' not found

What is the following error telling me?⁸⁴

I haven’t loaded lubridate.
I can’t add months and days.
There is no object called jan31.
ymd() is not a function.
There is no error.

  library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
  jan31 <- ymd("2021-01-31")
  jan31 + months(0:11) + days(31)
#>  [1] "2021-03-03" NA           "2021-05-01" NA           "2021-07-01"
#>  [6] NA           "2021-08-31" "2021-10-01" NA           "2021-12-01"
#> [11] NA           "2022-01-31"

Rank these in order from worst to best.⁸⁵
1. 1, 2, 3
2. 1, 3, 2
3. 2, 1, 3
4. 3, 2, 1
5. 3, 1, 2

# 1
day_one
day_1

# 2
DayOne
dayone

# 3
T <- FALSE
c <- 10
mean <- function(x) sum(x)

Which is better?⁸⁶
1. 1
2. 2

# 1
mean(x, na.rm = TRUE)

# 2
mean (x, na.rm = TRUE)
mean( x, na.rm = TRUE )

Which is better?⁸⁷
1. 1
2. 2

# 1
height<-feet*12+inches
mean(x, na.rm=TRUE)

# 2
height <- (feet * 12) + inches
mean(x, na.rm = TRUE)

Which is better?⁸⁸
1. 1
2. 2

# 1
do_something_very_complicated(
  something = "that",
  requires = many,
  arguments = "some of which may be long"
)

# 2
do_something_very_complicated("that", requires, many, arguments,
                              "some of which may be long"
                              )

Which is better?⁸⁹
1. 1
2. 2

# 1
iris |>
  summarise(Sepal.Length = mean(Sepal.Length), Sepal.Width = mean(Sepal.Width), .by = Species)

# 2
iris |>
  summarise(
    Sepal.Length = mean(Sepal.Length),
    Sepal.Width = mean(Sepal.Width),
    .by = Species
  )

Which one do you like best?⁹⁰
1. 1
2. 2
3. 3

# 1
x |>
  semi_join(y |> filter(is_valid))

# 2
x |>
  select(a, b, w) |>
  left_join(y |> select(a, b, v), join_by(a, b))

# 3
x_join <- x |> select(a, b, w)
y_join <- y |> select(a, b, v)
left_join(x_join, y_join, join_by(a, b))

In R the ifelse() function takes the arguments:⁹¹

question, yes, no
question, no, yes
statement, yes, no
statement, no, yes
option1, option2, option3

What is the output of the following:⁹²
1. “cat”, 30, “cat”, “cat”, 6
2. “cat”, “30”, “cat”, “cat”, “6”
3. 1, “cat”, 5, “cat”, “cat”
4. 1, “cat”, 5, NA, “cat”
5. “1”, “cat”, “5”, NA, “cat”

data <- c(1, 30, 5, NA, 6)

ifelse(data > 5, "cat", data)
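
Running the call and then checking `class()` shows both the values and the type of the result. A minimal sketch, with `result` as a hypothetical name:

```r
data <- c(1, 30, 5, NA, 6)

# A vector holds one type, so mixing "cat" with numbers gives character.
# A missing value in the test stays missing in the result.
result <- ifelse(data > 5, "cat", data)

result
class(result)
```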

Where can I get feedback on my HW assignments / quizzes?⁹³
1. prof will return paper versions
2. on Gradescope
3. on Canvas
4. on GitHub

Where can I get feedback on my projects?⁹⁴
1. prof will return paper versions
2. on Gradescope
3. on Canvas
4. on GitHub

In R, the set.seed() function⁹⁵

makes your computations go faster
keeps track of your computation time
provides an important parameter
repeats the function
makes your results reproducible

What could the following code give us?⁹⁶

[1] “a” “b” “c” “d” “e” “f” “g” “h” “i” “j”
[1] “i” “b” “g” “d” “a”
[1] “j” “g” “f” “i” “f”
[1] “f” “h” “i” “e” “g” “d” “c” “j” “b” “a”
[1] “e” “j” “e” “b” “e” “c” “f” “a” “e” “a”

# shuffle
set.seed(47)
alph <- letters[1:10]
sample(alph, 10, replace = FALSE)

What could the following code give us?⁹⁷

[1] “a” “b” “c” “d” “e” “f” “g” “h” “i” “j”
[1] “i” “b” “g” “d” “a”
[1] “j” “g” “f” “i” “f”
[1] “f” “h” “i” “e” “g” “d” “c” “j” “b” “a”
[1] “e” “j” “e” “b” “e” “c” “f” “a” “e” “a”

# resample
set.seed(47)
alph <- letters[1:10]
sample(alph, 10, replace = TRUE)

What could the following code give us?⁹⁸

[1] “a” “b” “c” “d” “e” “f” “g” “h” “i” “j”
[1] “i” “b” “g” “d” “a”
[1] “j” “g” “f” “i” “f”
[1] “f” “h” “i” “e” “g” “d” “c” “j” “b” “a”
[1] “e” “j” “e” “b” “e” “c” “f” “a” “e” “a”

# sample from an infinite population
set.seed(47)
alph <- letters[1:10]
sample(alph, 10, replace = TRUE)

What could the following code give us?⁹⁹

[1] “a” “b” “c” “d” “e” “f” “g” “h” “i” “j”
[1] “i” “b” “g” “d” “a”
[1] “j” “g” “f” “i” “f”
[1] “f” “h” “i” “e” “g” “d” “c” “j” “b” “a”
[1] “e” “j” “e” “b” “e” “c” “f” “a” “e” “a”

# sample from a finite population
set.seed(47)
alph <- letters[1:10]
sample(alph, 10, replace = FALSE)
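
The effect of `set.seed()` can be seen by repeating a draw with the same seed. The sketch below uses hypothetical names and makes no claim about which letters appear, only about the properties of the draws.

```r
alph <- letters[1:10]

# Same seed, same call: the two shuffles are identical.
set.seed(47)
first_shuffle <- sample(alph, 10, replace = FALSE)
set.seed(47)
second_shuffle <- sample(alph, 10, replace = FALSE)
identical(first_shuffle, second_shuffle)

# Without replacement, every letter appears exactly once.
anyDuplicated(first_shuffle) == 0

# With replacement, repeats are allowed, so some letters may be missing.
set.seed(47)
resample <- sample(alph, 10, replace = TRUE)
resample
```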

What does the following give us?¹⁰⁰

the number of hats that match
the number of hats that don’t match
the proportion of hats that match
the proportion of hats that don’t match
whether or not at least one hat matches

sum(hats == random_hats)

[1] 2

What does the following give us?¹⁰¹

the number of hats that match
the number of hats that don’t match
the proportion of hats that match
the proportion of hats that don’t match
whether or not at least one hat matches

mean(hats == random_hats)

[1] 0.2

What does the following give us?¹⁰²

the number of hats that match
the number of hats that don’t match
the proportion of hats that match
the proportion of hats that don’t match
whether or not at least one hat matches

sum(hats == random_hats) > 0

[1] TRUE

What is the magic number?¹⁰³

10
10, the number of overall hats
10, the number of hats we select
0
0, the number of matching hats

hats <- c(1:10)
random_hats <- sample(hats, size = 10, replace = FALSE)
sum(hats == random_hats) > 0
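
A single shuffle gives one TRUE or FALSE; repeating it many times estimates how often at least one hat matches. This is a sketch, assuming 10 hats and 10,000 repetitions, with an arbitrary seed and a hypothetical helper `one_round()`.

```r
set.seed(47)  # arbitrary seed, only for reproducibility

# One round: shuffle the hats and report whether any hat is in its own place.
one_round <- function(n_hats = 10) {
  hats <- 1:n_hats
  random_hats <- sample(hats, size = n_hats, replace = FALSE)
  sum(hats == random_hats) > 0
}

# Repeat the round many times; the mean of TRUE/FALSE is a proportion.
any_match <- replicate(10000, one_round())
mean(any_match)
```

For 10 hats the exact probability of at least one match is about 0.632, so the estimate should land near that value.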

In the SAT example, we ran a single iteration and found that the false positive and false negative rates were problematic. What should we do next?¹⁰⁴

Repeat for many iterations.
Change the initial settings.
Bring this analysis to the people with power.
Always use two models.
Always use only one model.

In the SAT example, what types of things might we vary?¹⁰⁵

proportion of red vs blue
how variable the values are: N(talent, 15)
different number of times blues get to take the test
how close grades and SAT are to talent (bias?)

What would you want to know from the investment allocation plots?¹⁰⁶
1. What is the average rate of return?
2. What is the maximum rate of return?
3. What is the minimum rate of return?
4. How often do I lose money?

If 16 infants with no genuine preference choose 16 toys, what is the most likely number of “helping” toys that will be chosen?¹⁰⁷

How likely is it that exactly 8 helpers will be chosen (if there is no preference)?¹⁰⁸

0-15%
16-30%
31-49%
50%
51-100%

What if we flipped a coin 160 times? What percent of the time will the simulation flip exactly 80 heads?¹⁰⁹

0-15%
16-30%
31-49%
50%
51-100%

Is our actual result of 14 (under the coin model)…¹¹⁰

very surprising?
somewhat surprising?
not very surprising?
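
The coin model in the last few questions is a binomial model, so the probabilities can be computed directly as well as simulated. This is a minimal sketch; the variable names are mine, and the choice of 100,000 simulated studies is arbitrary.

```r
# Exactly 8 helpers out of 16 when each choice is a fair coin flip
p_8_of_16 <- dbinom(8, size = 16, prob = 0.5)

# Exactly 80 heads out of 160 flips
p_80_of_160 <- dbinom(80, size = 160, prob = 0.5)

# 14 or more helpers out of 16: the upper tail of the same model
p_14_or_more <- pbinom(13, size = 16, prob = 0.5, lower.tail = FALSE)

# Simulation version of the tail probability:
# each simulated study is 16 coin flips, recorded as a count of heads
set.seed(47)
sim_helpers <- rbinom(100000, size = 16, prob = 0.5)
sim_p_14_or_more <- mean(sim_helpers >= 14)
```

Half is still the most likely outcome with 160 flips, but it is a smaller share of the total because the probability is spread over many more possible counts.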

Hypothesis: the number of hours that grade-school children spend doing homework predicts their future success on standardized tests.¹¹¹
1. null, one sided
2. null, two sided
3. alternative, one sided
4. alternative, two sided

Hypothesis: king cheetahs on average run the same speed as standard spotted cheetahs.¹¹²
1. null, one sided
2. null, two sided
3. alternative, one sided
4. alternative, two sided

Hypothesis: the mean length of African elephant tusks has changed over the last 100 years.¹¹³
1. null, one sided
2. null, two sided
3. alternative, one sided
4. alternative, two sided

Hypothesis: the risk of facial clefts is equal for babies born to mothers who take folic acid supplements compared with those from mothers who do not.¹¹⁴
1. null, one sided
2. null, two sided
3. alternative, one sided
4. alternative, two sided

Hypothesis: caffeine intake during pregnancy affects mean birth weight.¹¹⁵
1. null, one sided
2. null, two sided
3. alternative, one sided
4. alternative, two sided

In this class, the word parameter means:¹¹⁶
1. The values in a model
2. Numbers that need to be tuned
3. A number which is calculated from a sample of data.
4. A number which (is almost always unknown and) describes a population.

What does a sampling distribution describe?¹¹⁷
1. The data
2. The statistic
3. The parameter

In hypothesis testing, why do we need a null sampling distribution?¹¹⁸
1. To understand the variability of the data
2. To understand the variability of the statistic
3. To understand the variability of the statistic when $H_0$ is true.
4. To understand the variability of the statistic when $H_0$ is false.
5. To understand the variability of the parameter.

To run a two-sample permutation test, should you permute the variable with or without replacement?¹¹⁹
1. with replacement (replace = TRUE)
2. without replacement (replace = FALSE)
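
A minimal sketch of a two-sample permutation test in base R. The data values, the group labels, and the helper `diff_means()` are all hypothetical and chosen only for illustration.

```r
set.seed(47)

# Hypothetical data: a numeric response and a two-level group label
response <- c(12.1, 9.8, 11.4, 10.9, 13.2, 8.7, 9.1, 10.2, 8.9, 9.6)
group <- rep(c("A", "B"), each = 5)

# Statistic of interest: difference in group means
diff_means <- function(y, g) mean(y[g == "A"]) - mean(y[g == "B"])
obs_diff <- diff_means(response, group)

# Shuffle the labels WITHOUT replacement, so each shuffle keeps
# exactly five A's and five B's and only the pairing changes
null_diffs <- replicate(5000, {
  shuffled <- sample(group, replace = FALSE)
  diff_means(response, shuffled)
})

# Two-sided p-value: how often is a shuffled difference at least
# as far from zero as the observed one?
p_value <- mean(abs(null_diffs) >= abs(obs_diff))
```

Sampling the labels with replacement would change the group sizes from one shuffle to the next, which is not what the null hypothesis of "the labels do not matter" describes.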

The histogram is a null sampling distribution for the difference in two means. The red line is the observed value from the data. To compute the p-value, which area should be considered?¹²⁰

The area to the left.
The area to the right.
Double the area to the left.
It depends.

The histogram is a null sampling distribution for the difference in two means. The red line is the observed value from the data. The alternative hypothesis is $H_A: \mu_1 - \mu_2 \ne 0$. To compute the p-value, which area should be considered?¹²¹

The area to the left.
The area to the right.
Double the area to the left.
It depends.
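
The link between the alternative hypothesis and the area can be sketched as follows. The null distribution here is a stand-in drawn from `rnorm()`, and the observed value of -1.5 is hypothetical (placed in the left tail); in practice both would come from the data.

```r
set.seed(47)

# Stand-in for a null sampling distribution of a difference in means
null_diffs <- rnorm(10000, mean = 0, sd = 1)
obs_diff <- -1.5   # hypothetical observed difference, in the left tail

# One-sided alternatives use a single tail
p_left  <- mean(null_diffs <= obs_diff)   # H_A: mu_1 - mu_2 < 0
p_right <- mean(null_diffs >= obs_diff)   # H_A: mu_1 - mu_2 > 0

# A two-sided alternative doubles the smaller tail (capped at 1)
p_two_sided <- min(1, 2 * min(p_left, p_right))   # H_A: mu_1 - mu_2 != 0
```

With the observed value on the left, the two-sided p-value is double the area to the left.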

What is misleading here?¹²²
1. Very low data-to-ink ratio?¹²³
2. Non-intuitive sorting?
3. Wrong / changing scales?
4. 1-dim information in 2-D or 3-d?
5. Inconsistent labeling

Reproduction of a data graphic reporting the number of gun deaths in Florida over time. The original image was published by Reuters. [@MDSR]

What is misleading here?¹²⁴
1. Very low data-to-ink ratio?¹²⁵
2. Non-intuitive sorting?
3. Wrong / changing scales?
4. 1-dim information in 2-D or 3-d?
5. Inconsistent labeling

A tweet by *National Review* on December 14, 2015 showing the change in global temperature over time. [@MDSR]

What is misleading here?¹²⁶
1. Very low data-to-ink ratio?¹²⁷
2. Non-intuitive sorting?
3. Wrong / changing scales?
4. 1-dim information in 2-D or 3-d?
5. Inconsistent labeling

May 10, 2020, Georgia Department of Health, COVID-19 cases for 5 counties across time. https://dph.georgia.gov/covid-19-daily-status-report

What is misleading here?¹²⁸
1. Very low data-to-ink ratio?¹²⁹
2. Non-intuitive sorting?
3. Wrong / changing scales?
4. 1-dim information in 2-D or 3-d?
5. Inconsistent labeling

July 2, 2020, Georgia Department of Health, COVID-19 cases per 100K

July 17, 2020, https://dph.georgia.gov/covid-19-daily-status-report

How often do you read “The Student Life”?¹³⁰
1. Every day
2. 3-5 times a week
3. Once a week
4. Rarely
5. What is “The Student Life”?

What do you think is the most common word in the titles of the Student Life opinion articles?¹³¹
1. stop
2. health
3. Pomona
4. CMC
5. students

How can you tell the difference between an element and an attribute (when looking at HTML code)?¹³²
1. the elements have .
2. the elements have #
3. the elements have < >
4. the elements have [ ]

How do I find (using html_elements()) all the instances of the <img> (image) element?¹³³
1. use selector: <img>
2. use selector: .img
3. use selector: #img
4. use selector: [img]
5. use selector: img

How do I find (using html_elements()) all the instances of the href= (URL) attribute?¹³⁴
1. use selector: <href>
2. use selector: .href
3. use selector: #href
4. use selector: [href]
5. use selector: href

What is the difference between an attribute and an element?¹³⁵

an attribute describes an element
an element describes an attribute
an attribute is the parent of an element
an element is the parent of an attribute
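
The selector rules in the last few questions can be tried on a tiny page built in memory. This sketch assumes the rvest package (version 1.0 or later) is installed; the HTML snippet, file names, and URL are made up.

```r
library(rvest)

# A small hypothetical page, so nothing is downloaded
page <- minimal_html('
  <p class="intro" id="top">Hello</p>
  <a href="https://example.com">a link</a>
  <img src="cat.png" alt="a cat">
  <img src="dog.png" alt="a dog">
')

imgs  <- html_elements(page, "img")      # bare name: selects elements
links <- html_elements(page, "[href]")   # brackets: any element with that attribute
intro <- html_elements(page, ".intro")   # dot: selects by class
top   <- html_elements(page, "#top")     # hash: selects by id

# Attributes describe elements, so they are read off the selected elements
img_sources <- html_attr(imgs, "src")
link_urls   <- html_attr(links, "href")
```

Note that `html_elements()` always returns elements; `html_attr()` is the step that pulls an attribute's value out of them.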

What is a SQL server?¹³⁶

A relational database management system.
A software program whose main purpose is to store and retrieve data.
A highly secure server that does not allow any database file manipulation during execution.
All of the above.

When was SQL created?¹³⁷
1. 1960s
2. 1970s
3. 1980s
4. 1990s
5. 2000s

What type of databases is SQL designed for?¹³⁸

hierarchical database management systems.
network database management systems.
object-oriented database management systems.
relational database management systems.

Which is bigger:¹³⁹
1. computer’s hard drive / storage
2. computer’s memory / RAM

Where is each stored?¹⁴⁰

SQL tbl and R tibble both in storage
SQL tbl and R tibble both in memory
SQL tbl in storage and R tibble in memory
SQL tbl in memory and R tibble in storage

Which SQL keyword is used to extract data from a database?¹⁴¹

OPEN
EXTRACT
SELECT
GET

With SQL, how do you retrieve a column named “FirstName” from a table named “Persons”?¹⁴²

SELECT Persons.FirstName
EXTRACT FIRSTNAME FROM Persons
SELECT FirstName FROM Persons
SELECT “FirstName” FROM “Persons”

With SQL, how do you select all the columns from a table named “Persons”?¹⁴³

SELECT Persons
SELECT * FROM Persons
SELECT [all] FROM Persons
SELECT *.Persons

With SQL, how can you return the number of records in the “Persons” table?¹⁴⁴

SELECT COLUMNS(*) FROM Persons
SELECT COUNT(*) FROM Persons
SELECT NO(*) FROM Persons
SELECT LEN(*) FROM Persons

With SQL, how do you select all the records from a table named “Persons” where the value of the column “FirstName” is “Peter”?¹⁴⁵

SELECT * FROM Persons WHERE FirstName <> ‘Peter’
SELECT * FROM Persons WHERE FirstName = ‘Peter’
SELECT * FROM Persons WHERE FirstName == ‘Peter’
SELECT * FROM Persons WHERE FirstName LIKE ‘Peter’
SELECT [all] FROM Persons WHERE FirstName = ‘Peter’

With SQL, how do you select all the records from a table named “Persons” where the “FirstName” is “Peter” and the “LastName” is “Jackson”?¹⁴⁶

SELECT FirstName = ‘Peter’, LastName = ‘Jackson’ FROM Persons
SELECT * FROM Persons WHERE FirstName = ‘Peter’ & LastName = ‘Jackson’
SELECT * FROM Persons WHERE FirstName = ‘Peter’ AND LastName = ‘Jackson’
SELECT * FROM Persons WHERE FirstName = ‘Peter’ | LastName = ‘Jackson’

Which keyword selects values within a range?¹⁴⁷
1. BETWEEN
2. WITHIN
3. RANGE

With SQL, how do you select all the records from a table named “Persons” where the “LastName” is alphabetically between (and including) “Hansen” and “Pettersen”?¹⁴⁸
1. SELECT LastName > ‘Hansen’ AND LastName < ‘Pettersen’ FROM Persons
2. SELECT * FROM Persons WHERE LastName BETWEEN ‘Hansen’ AND ‘Pettersen’
3. SELECT * FROM Persons WHERE LastName > ‘Hansen’ AND LastName < ‘Pettersen’

Which SQL keyword returns only different values?¹⁴⁹
1. SELECT UNIQUE
2. SELECT DISTINCT
3. SELECT DIFFERENT

Which SQL keyword is used to sort the result-set?¹⁵⁰
1. ORDER BY
2. ORDER
3. SORT
4. SORT BY

What is the difference between the original data and the results set?¹⁵¹
1. original comes after SELECT and results comes after FROM
2. original comes after FROM and results comes after WHERE
3. original comes after WHERE and results comes after GROUP BY
4. original is the stored data and results comes after SELECT
5. original is the stored data and results comes after WHERE

With SQL, how can you return all the records from a table named “Persons” sorted descending by “FirstName”?¹⁵²
1. SELECT * FROM Persons ORDER FirstName DESC
2. SELECT * FROM Persons SORT ‘FirstName’ DESC
3. SELECT * FROM Persons ORDER BY FirstName DESC
4. SELECT * FROM Persons SORT BY ‘FirstName’ DESC

The OR operator displays a record if ANY conditions listed are true. The AND operator displays a record if ALL of the conditions listed are true.¹⁵³
1. TRUE
2. FALSE

In order to SELECT the records with foods that are either green or yellow fruit:¹⁵⁴
1. … WHERE type = ‘fruit’ AND color = ‘yellow’ OR color = ‘green’
2. … WHERE (type = ‘fruit’ AND color = ‘yellow’) OR color = ‘green’
3. … WHERE type = ‘fruit’ AND (color = ‘yellow’ OR color = ‘green’)
4. … WHERE type = ‘fruit’ AND color = ‘yellow’ AND color = ‘green’
5. … WHERE type = ‘fruit’ AND (color = ‘yellow’ AND color = ‘green’)
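
SQL evaluates AND before OR, and R treats `&` and `|` the same way, so the effect of the parentheses can be sketched in R. This is an analogue rather than SQL itself, and the `foods` table is hypothetical.

```r
# Hypothetical foods table
foods <- data.frame(
  name  = c("banana", "lime", "spinach", "corn", "cherry"),
  type  = c("fruit", "fruit", "vegetable", "vegetable", "fruit"),
  color = c("yellow", "green", "green", "yellow", "red"),
  stringsAsFactors = FALSE
)

# No parentheses: & binds tighter than |, so this reads as
# (fruit AND yellow) OR green, which lets spinach through
no_parens <- with(foods, type == "fruit" & color == "yellow" | color == "green")

# Parentheses around the OR: fruit AND (yellow OR green)
with_parens <- with(foods, type == "fruit" & (color == "yellow" | color == "green"))

foods$name[no_parens]     # "banana" "lime" "spinach"
foods$name[with_parens]   # "banana" "lime"
```

The same reasoning explains why requiring a single color to be both yellow AND green returns no rows at all.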

What is the purpose of a JOIN?¹⁵⁵
1. it filters the rows returned by the SELECT statement.
2. it specifies the columns to be retrieved.
3. it combines rows from two or more tables based on a related column.
4. it orders the results in ascending or descending order.

What is the purpose of the UNION operator in SQL?¹⁵⁶
1. it combines the results of two or more SELECT statements.
2. it performs a pattern match on a string.
3. it retrieves the maximum value in a column.
4. it filters the rows returned by the SELECT statement.

What is the purpose of the INNER JOIN in SQL?¹⁵⁷
1. it retrieves the maximum value in a column.
2. it combines rows from two or more tables based on a related column.
3. it filters the rows returned by the SELECT statement.
4. it performs a pattern match on a string.

What is the purpose of the LEFT JOIN in SQL?¹⁵⁸
1. it combines rows from two or more tables based on a related column.
2. it retrieves the maximum value in a column.
3. it filters the rows returned by the SELECT statement.
4. it performs a pattern match on a string.

RIGHT JOIN keeps all the rows in …?¹⁵⁹
1. the first table.
2. the second table.
3. both tables.
4. neither table

Who is removed in a RIGHT JOIN?¹⁶⁰
1. Mick
2. John
3. Paul
4. Keith

Which variable(s) are removed in a RIGHT JOIN?¹⁶¹
1. name
2. band
3. plays
4. none of them

In SQL, what happens to Mick’s “plays” variables in a FULL JOIN?¹⁶²
1. Mick is removed
2. guitar
3. bass
4. NA
5. NULL
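
The join questions above can be checked with base R's `merge()`, which is an R stand-in for the SQL joins, not SQL itself. The two tables below are my assumption about what the figure shows (the figure is not reproduced here), so treat the values as hypothetical.

```r
# Assumed contents of the two tables in the figure
band_members <- data.frame(
  name = c("Mick", "John", "Paul"),
  band = c("Stones", "Beatles", "Beatles"),
  stringsAsFactors = FALSE
)
band_instruments <- data.frame(
  name  = c("John", "Paul", "Keith"),
  plays = c("guitar", "bass", "guitar"),
  stringsAsFactors = FALSE
)

# RIGHT JOIN analogue: keep every row of the second table
right <- merge(band_members, band_instruments, by = "name", all.y = TRUE)

# FULL JOIN analogue: keep every row of both tables
full <- merge(band_members, band_instruments, by = "name", all = TRUE)

names(right)                      # "name" "band" "plays"
full$plays[full$name == "Mick"]   # NA
```

R marks the unmatched cell as `NA`; a SQL result would show `NULL` in the same position.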

What is wrong with this SQL clause?¹⁶³
1. No comma between selected columns
2. Can’t have a column and a summary of that column
3. Use MEAN() instead of AVG()
4. Variables aren’t in flights
5. Need a LIMIT

SELECT cancelled, AVG(cancelled)
FROM flights;

What is wrong with this SQL clause?¹⁶⁴
1. Need a WHERE
2. Need a FROM
3. Need a LIMIT
4. Need a GROUP BY
5. Need a SELECT

SELECT flights;

What is wrong with this SQL clause?¹⁶⁵
1. Need a double equals: ==
2. Need quotes around 2014: "2014"
3. Need quotes around dep_delay: "dep_delay"
4. year = 2014 should go in WHERE
5. Need GROUP BY year

SELECT dep_delay, year = 2014
FROM flights
LIMIT 10;

What is wrong with this SQL clause?¹⁶⁶
1. Can’t SELECT a variable from the results set
2. Can’t SELECT a variable from the original data
3. SUM() is not a function in SQL
4. No commas

SELECT SUM(cancelled) AS num_cancelled,
       num_cancelled / SUM(1) AS pct_cancelled
FROM flights
LIMIT 10;

With SQL, how do you select all the records from a table named “Persons” where the value of the column “FirstName” starts with an “a”?¹⁶⁷
1. SELECT * FROM Persons WHERE FirstName = ’a.*’
2. SELECT * FROM Persons WHERE FirstName = ’a*’
3. SELECT * FROM Persons WHERE FirstName REGEXP ’a.*’
4. SELECT * FROM Persons WHERE FirstName REGEXP ’a*’
5. SELECT * FROM Persons WHERE FirstName REGEXP ’(?i)a.*’

What is the main way to absolutely recognize a record within a database?¹⁶⁸
1. Foreign key
2. Primary key
3. Unique key
4. Natural key
5. Alternate key

What does a foreign key do?¹⁶⁹
1. Directly identifies another table
2. Directly identifies another column
3. Gives access to another entire database
4. Translates the database into another language

Which of these would likely be used as a foreign key between a table on student enrollment and student grades?¹⁷⁰
1. grades
2. tuition
3. student_name
4. student_hometown

For the student records (for two tables: enrollment and grades), which is the most likely combination?¹⁷¹
1. name as primary key to both
2. name as foreign to both
3. name as primary in enrollment and foreign in grades
4. name as foreign in enrollment and primary in grades
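
The two key properties can be checked mechanically. In this sketch the tables, student names, courses, and grades are all made up; the point is only what "primary" and "foreign" require of a column.

```r
# Hypothetical tables
enrollment <- data.frame(
  student_name = c("Ada", "Grace", "Alan"),
  class_year   = c(2026, 2027, 2026),
  stringsAsFactors = FALSE
)
grades <- data.frame(
  student_name = c("Ada", "Ada", "Grace", "Alan", "Alan"),
  course       = c("DS2", "Calc", "DS2", "DS2", "Stats"),
  grade        = c("A", "B+", "A-", "B", "A"),
  stringsAsFactors = FALSE
)

# A primary key must identify each row uniquely: no repeated values
is_primary_in_enrollment <- anyDuplicated(enrollment$student_name) == 0
is_primary_in_grades     <- anyDuplicated(grades$student_name) == 0

# A foreign key must point at existing rows of the other table:
# every name in grades has to appear in enrollment
is_valid_foreign_key <- all(grades$student_name %in% enrollment$student_name)
```

Here the name is unique in `enrollment` but repeats in `grades` (one row per course), so it can be primary only in `enrollment`.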

Which of the following is the primary function used to create a Shiny app?¹⁷²
1. shinyApp()
2. createApp()
3. runApp()
4. startShinyApp()

Which of the following Shiny components contains the code for handling user inputs and realizing outputs?¹⁷³
1. ui
2. server
3. runApp()
4. shinyApp()

Which of the following Shiny UI elements is used to allow users to select a single option from a list of choices?¹⁷⁴

selectInput()
radioButtons()
checkboxGroupInput()
textInput()

In Shiny, what is the purpose of the renderText() function?¹⁷⁵

To display a plot as text
To generate text output based on reactive inputs
To create a text input field
To render HTML elements

What does the ui component in a Shiny app represent?¹⁷⁶

The logic of the application
The server-side calculations
The user interface elements
The global settings for the app

Which Shiny function is used to handle reactive expressions in the server function?¹⁷⁷

reactive()
render()
observe()
updateInput()

What is the default output type for renderPlot() in Shiny?¹⁷⁸
1. plotly chart
2. ggplot2 plot
3. base R plot
4. HTML table

Which of the following is the correct way to create a slider input in Shiny?¹⁷⁹
1. sliderInput("slider", "Slider", min = 1, max = 100, value = 50)
2. inputSlider("slider", min = 1, max = 100)
3. sliderControl("slider", 1, 100)
4. input_slider("slider", 1, 100)
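
The pieces named in the Shiny questions fit together roughly as follows. This is a minimal sketch that assumes the shiny package is installed; the output id `doubled_text` and the doubling calculation are placeholders of my own.

```r
library(shiny)

# ui: what the user sees (inputs plus placeholders for outputs)
ui <- fluidPage(
  sliderInput("slider", "Slider", min = 1, max = 100, value = 50),
  textOutput("doubled_text")
)

# server: reads the inputs, computes, and fills in the outputs
server <- function(input, output) {
  # reactive(): a value that is recomputed when input$slider changes
  doubled <- reactive(input$slider * 2)

  # renderText(): turns a reactive result into text for the ui
  output$doubled_text <- renderText({
    paste("Twice the slider value is", doubled())
  })
}

# shinyApp() combines ui and server into an app object;
# printing the object (or passing it to runApp()) launches the app
app <- shinyApp(ui = ui, server = server)
```

Assigning the result of `shinyApp()` to `app` does not start anything, which makes the object convenient to build in a script and launch later.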

Footnotes

wherever you are, make sure you are communicating with me when you have questions!
wherever you are, make sure you are communicating with me when you have questions!
1. on Gradescope
1. pushing the file(s)
1. poor assignment operator
1. invalid object name
1. unmatched quotes
1. no mistake
1. improper syntax for a function argument
1. I mean, the right answer has to be Yes, right!??!
no right answer here!
1. In the local folder which also has the R project. It could be on the Desktop or the Home directory, but it must be in the same place as the R project. Do not upload files to the remote GitHub directory or you will find yourself with two different copies of the files.
Yes! All the responses are reasons to make a figure.
1. Because that graphic displays the message you want as optimally as possible.
1. length, definitely. Maybe also a. position.
1. position, definitely, also e. color. Maybe also b. length.
1. color, definitely. Also probably d. area.
1. color must be specified outside the aes() function
1. dot color is specified as “navy”, line color is specified as wday.
1. set the information outside the aes() function
answers may vary. I’d say c. putting the work in context. Others might say b. facilitating comparison or d. simplifying the story. However, I don’t think a correct answer is a. making the data stand out.
1. making the data stand out
1. One showed the relevant comparison better.
1. It isn’t at the origin. in combination with d. There wasn’t a label explaining why the axes were where they were. The story associated with the average value axes is not clear to the reader.
1. The job of a bar plot is to count the number of instances.
1. fill = children colors and position = "fill" changes the y-axis. AND c. fill = children goes in the aes and position = "fill" goes outside the aes
1. geom_bar() is for categorical variables and geom_histogram() is for numbers.
1. Table c is best because the columns allow us to work with each of the variables separately.
1. starbucks in wrong place
1. does something different because it takes the sum() instead of the mean(). The other commands compute the average fat broken down by type of Starbucks item
1. filter()
1. (theme, year)
1. mean(pieces)
running the different code chunks with relevant output.
1. on Gradescope
1. on Canvas
1. -country
1. year
1. gdpval (if possible, good idea to name variables something different from the name of the data frame)
1. use pivot_wider() on raw data
1. use pivot_longer() on raw data. The reference to the study is: Gregory Belenky, Nancy J. Wesensten, David R. Thorne, Maria L. Thomas, Helen C. Sing, Daniel P. Redmond, Michael B. Russo and Thomas J. Balkin (2003) Patterns of performance degradation and restoration during sleep restriction and subsequent recovery: a sleep dose-response study. Journal of Sleep Research 12, 1–12.
1. Mick
1. none of them (the default is to retain all the variables)
1. NA (it would be NULL in SQL)
1. roster |> anti_join(classes, by = "student_id")
1. roster |> full_join(classes, by = "student_id")
e. roster |> semi_join(classes, by = "student_id")
1. classes |> anti_join(roster, by = "student_id")
1. roster |> inner_join(classes, by = "student_id") |> filter(major != subject)
1. “a1” “b2” “c3”
1. “a 1” “b 2” “c 3”
1. “ab” “ifg” Again, str_sub() is vectorized. So the subset of string one is from 1 to 2. The subset of string two is from 3 to 5.
1. “one -pple” “two p-ars” “three bananas” (because str_replace() is vectorized)
1. TRUE TRUE TRUE TRUE
1. alphabetically, from a to z
1. Changes the order of the levels of type
1. Some of the above (b. Changes the values of x and c. Changes the levels of x and sort of d. Changes the order of the levels of x, just by the nature of c.)
I don’t know what the answer is. Ill-defined question.
1. different output formatting (the first produces 9 the second produces Sep)
1. Today is the 47th week of the year.
1. Day of month and day of year. (Day of year is often called the “Julian Day”.)
1. TRUE
neither c. nor e. would match. Inside the bracket “[^u]” matches anything other than a “u”, but it has to match something.
1. some of the above. d. Inside a character class | is a normal character and would therefore match “grey” and “gray” and “gr|y”. Which is not what we want, but would work to match both “grey” and “gray”. c. would not match with str_extract() (but might match in other parsers that ignored spaces).
1. 1 (because \d matches only a single digit).
1. 10 (because \d+ matches at least one digit).
1. E (because . matches anything, and returns only a single character).
1. Episode 2: The pie whisperer. (4 August 2015) (because . matches anything, and with the + it returns multiple characters).
1. . (because \. matches the period, .).
1. The first includes Jane.
1. "(?<=\\$)\\d+"
1. "\\w+(?= pie)"
1. we return the string at the lookaround
1. error (code will fail)
1. 9
1. 8
1. 8
1. map_chr(c(1,4,7), addTen) because the output is in quotes, the values are strings, not numbers.
1. all of the above. The map() function allows vectors, lists, and data frames as input.
1. map(c(1, 4, 7), ~addTen). The ~ acts on functions that do not have their own name or that are defined by function(...). By adding the argument (.x) we’ve expanded the addTen() function, and so it needs a ~. The addTen() function all alone does not use a ~.
1. 6 random normals (1 with mean 1, sd 3; 2 with mean 3, sd 1; 3 with mean 47, sd 10)
1. There is no object called jan31.
1. I haven’t loaded lubridate (which is why R doesn’t recognize ymd() as a function).
1. There is no error.
1. 3, 2, 1
1. 1
1. 2
1. 1
1. 2
Maybe c. 3? a. 1? They are all okay, but you have to think carefully to read any of them!
1. question, yes, no
1. “1”, “cat”, “5”, NA, “cat” (Note that the numbers were converted to character strings!)
1. on Gradescope
1. on Canvas
1. makes your results reproducible
1. [1] “f” “h” “i” “e” “g” “d” “c” “j” “b” “a”
1. [1] “e” “j” “e” “b” “e” “c” “f” “a” “e” “a”
1. [1] “j” “g” “f” “i” “f” (could have also been b.)
1. [1] “i” “b” “g” “d” “a”
1. the number of hats that match
1. the proportion of hats that match
1. whether or not at least one hat matches
1. 10 (it represents both the number of overall hats and the number of hats we select)
1. Repeat for many iterations. (The next step needs to gather information on how the FP and FN results hold, it might have just been something odd in my simulation… )
all of the above
It totally depends on your personality and your finances. b. doesn’t make much sense. But a., c., and d. are all very reasonable questions to ask about your investments.
1. 8
1. 0.196 (19.6% of the time)
1. 0.063 (6.3% of the time)
1. very surprising (prob of 14 or more is 0.0021)
1. alternative, one sided (because probably we are studying that it increases their success rate)
1. null, two sided (because I have no idea which cheetah might run faster)
1. alternative, two sided (because I have no idea whether they’ve increased or decreased)
1. null, one sided (because I happen to know that folic acid is thought to prevent facial clefts)
1. alternative, one sided (because I happen to know that caffeine is thought to decrease baby’s birth weight)
1. A number which (is almost always unknown and) describes a population.
1. The statistic
1. To understand the variability of the statistic when $H_0$ is true.
1. without replacement (replace = FALSE)
1. It depends. a. would be the correct answer if the alternative hypothesis is $\mu_1 - \mu_2 < 0$, b. would be the right answer if the alternative hypothesis is $\mu_1 - \mu_2 > 0$ and c. would be the right answer if the alternative hypothesis is $\mu_1 - \mu_2 \ne 0$.
1. Double the area to the left.
1. The scale of the y-axis is upside down.
data-to-ink ratio measures how much ink is being used for the data (data-ink) as compared to how much ink is being used on the entire plot (labels, etc.)
1. The scale of the y-axis is way too zoomed in.
data-to-ink ratio measures how much ink is being used for the data (data-ink) as compared to how much ink is being used on the entire plot (labels, etc.)
1. Non-intuitive sorting (sorted in decreasing order instead of by time)
data-to-ink ratio measures how much ink is being used for the data (data-ink) as compared to how much ink is being used on the entire plot (labels, etc.)
1. Inconsistent labeling (the colors mean different things for the two plots)
data-to-ink ratio measures how much ink is being used for the data (data-ink) as compared to how much ink is being used on the entire plot (labels, etc.)
there can’t possibly be a right answer here.
1. students (is the top word over the last 500 opinion articles)
1. the elements have < >
1. use selector: img
1. use selector: [href]
1. an attribute describes an element
1. A relational database management system.
1. The first versions were created in the 1970s and called SEQUEL (Structured English QUEry Language). c. SQL came about in particular systems in the 1980s.
1. relational database management systems.
1. computer’s hard drive / storage
1. SQL tbl in storage and R tibble in memory
1. SELECT
1. SELECT FirstName FROM Persons
1. SELECT * FROM Persons
1. SELECT COUNT(*) FROM Persons
1. SELECT * FROM Persons WHERE FirstName = ‘Peter’ (d. would also work.)
1. SELECT * FROM Persons WHERE FirstName = ‘Peter’ AND LastName = ‘Jackson’
1. BETWEEN
1. SELECT * FROM Persons WHERE LastName BETWEEN ‘Hansen’ AND ‘Pettersen’
1. SELECT DISTINCT
1. ORDER BY
1. original is the stored data and results comes after SELECT
1. SELECT * FROM Persons ORDER BY FirstName DESC
1. TRUE
1. … WHERE type = ‘fruit’ AND (color = ‘yellow’ OR color = ‘green’)
1. it combines rows from two or more tables based on a related column.
1. it combines the results of two or more SELECT statements.
1. it combines rows from two or more tables based on a related column.
1. it combines rows from two or more tables based on a related column.
1. the second table
1. Mick
1. none of them (all variables are kept in all joins)
1. NULL (it would be NA in R)
1. Can’t have a column and a summary of that column.
1. Need a FROM
1. year = 2014 should go in WHERE
1. Can’t SELECT a variable from the results set
1. SELECT * FROM Persons WHERE FirstName REGEXP ’(?i)a.*’ (n.b., the LIKE operator will give you a similar result, with % as a wildcard: SELECT * FROM Persons WHERE FirstName LIKE ‘a%’)
1. Primary key
1. Directly identifies another column
1. student_name
1. name as primary in enrollment and foreign in grades (the primary key must uniquely identify the records, and name is unlikely to do that in a grades database.)
1. shinyApp()
1. server
1. selectInput()
1. To generate text output based on reactive inputs
1. The user interface elements
1. reactive()
1. base R plot
1. sliderInput("slider", "Slider", min = 1, max = 100, value = 50)