to go along with
Modern Data Science with R, 3rd edition by Baumer, Kaplan, and Horton
R for Data Science, 2nd edition by Wickham, Çetinkaya-Rundel, and Grolemund
geom_point()
aes() function
wday.
ggplot() function.
aes() function.
aes() function
y designation in the aes() function for the geom_bar() geometry?25

aes() function.
y should be when not specified.
y is specified in ggplot().
y variable is the same as the x variable.

fill = children and position = "fill"?26
- fill = children colors and position = "fill" changes the y-axis
- fill = children changes the y-axis and position = "fill" colors
- fill = children goes in the aes and position = "fill" goes outside the aes
- fill = children goes outside the aes and position = "fill" goes inside the aes
- fill = children and position = "fill" are two different ways to write the same thing.

geom_bar() and geom_histogram()?27

- geom_bar() is for numbers and geom_histogram() is for categorical variables.
- geom_bar() is for categorical variables and geom_histogram() is for numbers.
- geom_bar() produces counts and geom_histogram() produces percentages.
- geom_bar() produces percentages and geom_histogram() produces counts.

| year | Algeria | Brazil | Columbia |
|---|---|---|---|
| 2000 | 7 | 12 | 16 |
| 2001 | 9 | 14 | 18 |

| country | Y2000 | Y2001 |
|---|---|---|
| Algeria | 7 | 9 |
| Brazil | 12 | 14 |
| Columbia | 16 | 18 |

| country | year | value |
|---|---|---|
| Algeria | 2000 | 7 |
| Algeria | 2001 | 9 |
| Brazil | 2000 | 12 |
| Brazil | 2001 | 14 |
| Columbia | 2000 | 16 |
| Columbia | 2001 | 18 |
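
The three layouts above hold the same six values. As a minimal sketch (the object names `wide` and `long` are made up for illustration; the values are copied from the tables), `pivot_longer()` from tidyr turns the one-row-per-country layout into the one-row-per-country-and-year layout:

```r
library(tidyr)
library(tibble)

# Wide layout copied from the second table above
wide <- tibble(
  country = c("Algeria", "Brazil", "Columbia"),
  Y2000 = c(7, 12, 16),
  Y2001 = c(9, 14, 18)
)

# Stack the year columns into (year, value) pairs, dropping the "Y" prefix
long <- wide |>
  pivot_longer(
    cols = -country,
    names_to = "year",
    names_prefix = "Y",
    values_to = "value"
  )

long
```

Note that `year` comes out as a character column here, because it is built from column names.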
- Bakery should be upper case
- type should not be in quotes
- starbucks in wrong place

#(a)
starbucks |>
group_by(type) |>
summarize(average_fat = mean(fat))
#(b)
group_by(starbucks, type) |>
summarize(average_fat = mean(fat))
#(c)
group_by(starbucks, type) |>
summarize(average_fat = sum(fat))
#(d)
temp <- group_by(starbucks, type)
summarize(temp, average_fat = mean(fat))
#(e)
summarize(group_by(starbucks, type),
average_fat = mean(fat))

- filter()
- arrange()
- select()
- mutate()
- group_by()

- (theme, price)
- (theme, year)
- (year, price)
- (pieces, year)
- (pieces, price)

- n_distinct(pieces)
- n_distinct(price)
- sum(pieces)
- sum(pages)
- mean(pieces)

library(openintro)
lego_sample |>
filter(!is.na(minifigures)) |>
# keep only those with minifigures
group_by(theme, year) |>
# for each theme for each year
summarize(ave_pieces = mean(pieces))

# A tibble: 9 × 3
# Groups: theme [3]
theme year ave_pieces
<chr> <dbl> <dbl>
1 City 2018 189.
2 City 2019 257.
3 City 2020 349
4 DUPLO® 2018 50.5
5 DUPLO® 2019 32.5
6 DUPLO® 2020 45.8
7 Friends 2018 354.
8 Friends 2019 259.
9 Friends 2020 250.
# A tibble: 7 × 2
type average_fat
<fct> <dbl>
1 bakery 14.6
2 bistro box 18.4
3 hot breakfast 13.7
4 parfait 6.5
5 petite 9.33
6 salad 0
7 sandwich 14.7
# A tibble: 7 × 2
type average_fat
<fct> <dbl>
1 bakery 14.6
2 bistro box 18.4
3 hot breakfast 13.7
4 parfait 6.5
5 petite 9.33
6 salad 0
7 sandwich 14.7
# A tibble: 7 × 2
type average_fat
<fct> <dbl>
1 bakery 597
2 bistro box 147
3 hot breakfast 110.
4 parfait 19.5
5 petite 84
6 salad 0
7 sandwich 103
# A tibble: 7 × 2
type average_fat
<fct> <dbl>
1 bakery 14.6
2 bistro box 18.4
3 hot breakfast 13.7
4 parfait 6.5
5 petite 9.33
6 salad 0
7 sandwich 14.7
# A tibble: 7 × 2
type average_fat
<fct> <dbl>
1 bakery 14.6
2 bistro box 18.4
3 hot breakfast 13.7
4 parfait 6.5
5 petite 9.33
6 salad 0
7 sandwich 14.7
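
The five outputs above differ only in the chunk where `sum()` replaces `mean()`. A small sketch with a made-up data frame (`toy` is hypothetical and is not the `starbucks` data) suggests why: the piped and nested forms are just different ways of writing the same computation.

```r
library(dplyr)

# Hypothetical stand-in for the starbucks data
toy <- tibble(
  type = c("bakery", "bakery", "salad", "salad"),
  fat  = c(10, 20, 0, 2)
)

# Same computation written with the pipe and as nested calls
piped  <- toy |> group_by(type) |> summarize(average_fat = mean(fat))
nested <- summarize(group_by(toy, type), average_fat = mean(fat))

# Same structure, but sum() gives totals that are mislabelled as averages
total  <- toy |> group_by(type) |> summarize(average_fat = sum(fat))

all.equal(piped, nested)
piped$average_fat
total$average_fat
```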
- gdp
- year
- gdpval
- country
- -country

- gdp
- year
- gdpval
- country
- -country

- gdp
- year
- gdpval
- country
- -country

Midterm score on the x-axis and Final score on the y-axis using the following ggplot() code. Which data frame should you use?40
- pivot_wider() on raw data
- pivot_longer() on raw data

# A tibble: 4 × 3
student test score
<chr> <chr> <dbl>
1 Alice Midterm 85
2 Alice Final 90
3 Bob Midterm 78
4 Bob Final 82
# A tibble: 2 × 3
student Midterm Final
<chr> <dbl> <dbl>
1 Alice 85 90
2 Bob 78 82
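
As a sketch of how the second tibble can be produced from the first (the object names `scores_long` and `scores_wide` are made up; the values are copied from the output above), `pivot_wider()` spreads the `test` column into one column per test:

```r
library(tidyr)
library(tibble)

# Long layout copied from the first tibble above
scores_long <- tibble(
  student = c("Alice", "Alice", "Bob", "Bob"),
  test    = c("Midterm", "Final", "Midterm", "Final"),
  score   = c(85, 90, 78, 82)
)

# One row per student, one column per test
scores_wide <- scores_long |>
  pivot_wider(names_from = test, values_from = score)

scores_wide
```

With this layout, `aes(x = Midterm, y = Final)` can refer to the two score columns directly.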
ggplot() code. Which data frame should you use?41
- pivot_wider() on raw data
- pivot_longer() on raw data

# A tibble: 18 × 11
Subject day_0 day_1 day_2 day_3 day_4 day_5 day_6 day_7 day_8 day_9
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 308 250. 259. 251. 321. 357. 415. 382. 290. 431. 466.
2 309 223. 205. 203. 205. 208. 216. 214. 218. 224. 237.
3 310 199. 194. 234. 233. 229. 220. 235. 256. 261. 248.
4 330 322. 300. 284. 285. 286. 298. 280. 318. 305. 354.
5 331 288. 285 302. 320. 316. 293. 290. 335. 294. 372.
6 332 235. 243. 273. 310. 317. 310 454. 347. 330. 254.
7 333 284. 290. 277. 300. 297. 338. 332. 349. 333. 362.
8 334 265. 276. 243. 255. 279. 284. 306. 332. 336. 377.
9 335 242. 274. 254. 271. 251. 255. 245. 235. 236. 237.
10 337 312. 314. 292. 346. 366. 392. 404. 417. 456. 459.
11 349 236. 230. 239. 255. 251. 270. 282. 308. 336. 352.
12 350 256. 243. 256. 256. 269. 330. 379. 363. 394. 389.
13 351 251. 300. 270. 281. 272. 305. 288. 267. 322. 348.
14 352 222. 298. 327. 347. 349. 353. 354. 360. 376. 389.
15 369 272. 268. 257. 278. 315. 317. 298. 348. 340. 367.
16 370 225. 235. 239. 240. 268. 344. 281. 348. 365. 372.
17 371 270. 272. 278. 282. 279. 285. 259. 305. 351. 369.
18 372 269. 273. 298. 311. 287. 330. 334. 343. 369. 364.
sleep_long <- sleep_wide |>
pivot_longer(cols = -Subject,
names_to = "day",
names_prefix = "day_",
values_to = "reaction_time")
sleep_long

# A tibble: 180 × 3
Subject day reaction_time
<dbl> <chr> <dbl>
1 308 0 250.
2 308 1 259.
3 308 2 251.
4 308 3 321.
5 308 4 357.
6 308 5 415.
7 308 6 382.
8 308 7 290.
9 308 8 431.
10 308 9 466.
# ℹ 170 more rows
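
In the output above, `day` is a character column (`<chr>`) because it was built from column names. If a numeric day is wanted, for example for a time axis, one option is the `names_transform` argument of `pivot_longer()`. The sketch below uses a hypothetical two-subject subset (`sleep_small`), with values rounded as printed above:

```r
library(tidyr)
library(tibble)

# Hypothetical subset: two subjects, three days, values rounded as printed above
sleep_small <- tibble(
  Subject = c(308, 309),
  day_0 = c(250, 223),
  day_1 = c(259, 205),
  day_2 = c(251, 203)
)

sleep_small_long <- sleep_small |>
  pivot_longer(
    cols = -Subject,
    names_to = "day",
    names_prefix = "day_",
    names_transform = list(day = as.integer),  # store day as a number, not text
    values_to = "reaction_time"
  )

sleep_small_long
```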
right_join()?42

right_join()?43

- name
- band
- plays

plays variable in a full_join()?44

- NA
- NULL

- roster |> inner_join(classes, by = "student_id") |> filter(major != subject)
- classes |> anti_join(roster, by = "student_id")
- roster |> anti_join(classes, by = "student_id")
- roster |> full_join(classes, by = "student_id")
- roster |> semi_join(classes, by = "student_id")

- roster |> inner_join(classes, by = "student_id") |> filter(major != subject)
- classes |> anti_join(roster, by = "student_id")
- roster |> anti_join(classes, by = "student_id")
- roster |> full_join(classes, by = "student_id")
- roster |> semi_join(classes, by = "student_id")

- roster |> inner_join(classes, by = "student_id") |> filter(major != subject)
- classes |> anti_join(roster, by = "student_id")
- roster |> anti_join(classes, by = "student_id")
- roster |> full_join(classes, by = "student_id")
- roster |> semi_join(classes, by = "student_id")

- roster |> inner_join(classes, by = "student_id") |> filter(major != subject)
- classes |> anti_join(roster, by = "student_id")
- roster |> anti_join(classes, by = "student_id")
- roster |> full_join(classes, by = "student_id")
- roster |> semi_join(classes, by = "student_id")

- roster |> inner_join(classes, by = "student_id") |> filter(major != subject)
- classes |> anti_join(roster, by = "student_id")
- roster |> anti_join(classes, by = "student_id")
- roster |> full_join(classes, by = "student_id")
- roster |> semi_join(classes, by = "student_id")

calories
type
type
type
type

fct_recode() do here?57
x
x
x

str_subset(very.large.word.list, "q[^u]") would not match which of the following?63

- "(?<=\\$)\\d"
- "(?<=\\$)\\d+"
- "\\d(?=\\$)"
- "\\d+(?=\\$)"

- "\\w+(?!pie)"
- "\\w+(?! pie)"
- "\\w+(?=pie)"
- "\\w+(?= pie)"

[1] "apple" "chocolate" "peach"
addTen() function. The following output is a result of which map_*() call?78

- map(c(1,4,7), addTen)
- map_dbl(c(1,4,7), addTen)
- map_chr(c(1,4,7), addTen)
- map_lgl(c(1,4,7), addTen)

[1] "11.000000" "14.000000" "17.000000"
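
The quoted values in the output above are the clue that a character-returning variant was used. As a rough sketch of the other variants, assuming `addTen()` simply adds 10 to its argument (the source does not show its definition):

```r
library(purrr)

# Assumed definition: the source does not show the body of addTen()
addTen <- function(x) x + 10

as_list   <- map(c(1, 4, 7), addTen)            # list of three numbers
as_double <- map_dbl(c(1, 4, 7), addTen)        # plain numeric vector
as_lambda <- map_dbl(c(1, 4, 7), ~ addTen(.x))  # same call written as a formula

as_double
```

Recent purrr releases warn when `map_chr()` is asked to convert numbers to strings, so the exact quoted output shown above may depend on the purrr version.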
- map(c(1, 4, 7), addTen)
- map(list(1, 4, 7), addTen)
- map(data.frame(a=1, b=4, c=7), addTen)

- map(c(1, 4, 7), addTen)
- map(c(1, 4, 7), ~addTen(.x))
- map(c(1, 4, 7), ~addTen)
- map(c(1, 4, 7), function(hi) (hi + 10))
- map(c(1, 4, 7), ~(.x + 10))

jan31.months() is not a function.
jan31.ymd() is not a function.
jan31.ymd() is not a function.

library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
jan31 <- ymd("2021-01-31")
jan31 + months(0:11) + days(31)
#> [1] "2021-03-03" NA "2021-05-01" NA "2021-07-01"
#> [6] NA "2021-08-31" "2021-10-01" NA "2021-12-01"
#> [11] NA "2022-01-31"

ifelse() function takes the arguments:91

set.seed() function95

N(talent, 15)

grades and SAT are to talent (bias?)

replace = TRUE)
replace = FALSE)

- .
- #
- < >
- [ ]

html_elements()) all the instances of the <img> (image) element?133
- <img>
- .img
- #img
- [img]
- img

html_elements()) all the instances of the href= (URL) attribute?134
- <href>
- .href
- #href
- [href]
- href

- tbl and R tibble both in storage
- tbl and R tibble both in memory
- tbl in storage and R tibble in memory
- tbl in memory and R tibble in storage

- SELECT Persons.FirstName
- FROM Persons
- SELECT FirstName FROM Persons
- SELECT "FirstName" FROM "Persons"

- SELECT Persons
- SELECT * FROM Persons
- SELECT [all] FROM Persons
- SELECT *.Persons

- SELECT COLUMNS(*) FROM Persons
- SELECT COUNT(*) FROM Persons
- SELECT NO(*) FROM Persons
- SELECT LEN(*) FROM Persons

- SELECT * FROM Persons WHERE FirstName <> 'Peter'
- SELECT * FROM Persons WHERE FirstName = 'Peter'
- SELECT * FROM Persons WHERE FirstName == 'Peter'
- SELECT * FROM Persons WHERE FirstName LIKE 'Peter'
- SELECT [all] FROM Persons WHERE FirstName = 'Peter'

- SELECT FirstName = 'Peter', LastName = 'Jackson' FROM Persons
- SELECT * FROM Persons WHERE FirstName = 'Peter' & LastName = 'Jackson'
- SELECT * FROM Persons WHERE FirstName = 'Peter' AND LastName = 'Jackson'
- SELECT * FROM Persons WHERE FirstName = 'Peter' | LastName = 'Jackson'

- BETWEEN
- WITHIN
- RANGE

- SELECT LastName > 'Hansen' AND LastName < 'Pettersen' FROM Persons
- SELECT * FROM Persons WHERE LastName BETWEEN 'Hansen' AND 'Pettersen'
- SELECT * FROM Persons WHERE LastName > 'Hansen' AND LastName < 'Pettersen'

- SELECT UNIQUE
- SELECT DISTINCT
- SELECT DIFFERENT

- ORDER BY
- ORDER
- SORT
- SORT BY

- SELECT and results comes after FROM
- FROM and results comes after WHERE
- WHERE and results comes after GROUP BY

- SELECT
- WHERE

- SELECT * FROM Persons ORDER FirstName DESC
- SELECT * FROM Persons SORT 'FirstName' DESC
- SELECT * FROM Persons ORDER BY FirstName DESC
- SELECT * FROM Persons SORT BY 'FirstName' DESC

SELECT the records with foods that are either green or yellow fruit:154
- WHERE type = 'fruit' AND color = 'yellow' OR color = 'green'
- WHERE (type = 'fruit' AND color = 'yellow') OR color = 'green'
- WHERE type = 'fruit' AND (color = 'yellow' OR color = 'green')
- WHERE type = 'fruit' AND color = 'yellow' AND color = 'green'
- WHERE type = 'fruit' AND (color = 'yellow' AND color = 'green')

JOIN?155
SELECT statement.

UNION operator in SQL?156

SELECT statements.
SELECT statement.

INNER JOIN in SQL?157

SELECT statement.

LEFT JOIN in SQL?158

SELECT statement.

RIGHT JOIN keeps all the rows in …?159
RIGHT JOIN?160
RIGHT JOIN?161
FULL JOIN?162
NULL
MEAN() instead of AVG()
flights
LIMIT
WHERE
FROM
LIMIT
GROUP BY
SELECT
==
"2014"
dep_delay: "dep_delay"
year = 2014 should go in WHERE
GROUP BY year
SELECT a variable from the results set
SELECT a variable from the original data
SUM() is not a function in SQL

- SELECT * FROM Persons WHERE FirstName = 'a.*'
- SELECT * FROM Persons WHERE FirstName = 'a*'
- SELECT * FROM Persons WHERE FirstName REGEXP 'a.*'
- SELECT * FROM Persons WHERE FirstName REGEXP 'a*'
- SELECT * FROM Persons WHERE FirstName REGEXP '(?i)a.*'

- shinyApp()
- createApp()
- runApp()
- startShinyApp()

- ui
- server
- runApp()
- shinyApp()

- selectInput()
- radioButtons()
- checkboxGroupInput()
- textInput()

renderText() function?175

ui component in a Shiny app represent?176

- reactive()
- render()
- observe()
- updateInput()

renderPlot() in Shiny?178
- sliderInput("slider", "Slider", min = 1, max = 100, value = 50)
- inputSlider("slider", min = 1, max = 100)
- sliderControl("slider", 1, 100)
- input_slider("slider", 1, 100)

wherever you are, make sure you are communicating with me when you have questions!
no right answer here!
Yes! All the responses are reasons to make a figure.
aes() function
wday.
aes() function
answers may vary. I’d say c. putting the work in context. Others might say b. facilitating comparison or d. simplifying the story. However, I don’t think a correct answer is a. making the data stand out.
fill = children colors and position = "fill" changes the y-axis. AND c. fill = children goes in the aes and position = "fill" goes outside the aes
geom_bar() is for categorical variables and geom_histogram() is for numbers.
starbucks in wrong place
sum() instead of the mean(). The other commands compute the average fat broken down by type of Starbucks item
filter()
(theme, year)
mean(pieces)
running the different code chunks with relevant output.
-country
year
gdpval (if possible, good idea to name variables something different from the name of the data frame)
pivot_wider() on raw data
pivot_longer() on raw data. The reference to the study is: Gregory Belenky, Nancy J. Wesensten, David R. Thorne, Maria L. Thomas, Helen C. Sing, Daniel P. Redmond, Michael B. Russo and Thomas J. Balkin (2003) Patterns of performance degradation and restoration during sleep restriction and subsequent recovery: a sleep dose-response study. Journal of Sleep Research 12, 1–12.
NA (it would be NULL in SQL)
roster |> anti_join(classes, by = "student_id")
roster |> full_join(classes, by = "student_id")
e. roster |> semi_join(classes, by = "student_id")
classes |> anti_join(roster, by = "student_id")
roster |> inner_join(classes, by = "student_id") |> filter(major != subject)
str_sub() is vectorized. So the subset of string one is from 1 to 2. The subset of string two is from 3 to 5.
str_replace() is vectorized)
type
x and c. Changes the levels of x and sort of d. Changes the order of the levels of x, just by the nature of c.)
I don’t know what the answer is. Ill-defined question.
9 the second produces Sep)
neither c. nor e. would match. Inside the bracket “[^u]” matches anything other than a “u”, but it has to match something.
| is a normal character and would therefore match “grey” and “gray” and “gr|y”. Which is not what we want, but would work to match both “grey” and “gray”. c. would not match with str_extract() (but might match in other parsers that ignored spaces).
\d matches only a single digit).
\d+ matches at least one digit).
. matches anything, and returns only a single character).
. matches anything, and with the + it returns multiple characters).
\. matches the period, .).
"(?<=\\$)\\d+"
"\\w+(?= pie)"
map_chr(c(1,4,7), addTen) because the output is in quotes, the values are strings, not numbers.
map() function allows vectors, lists, and data frames as input.
map(c(1, 4, 7), ~addTen). The ~ acts on functions that do not have their own name or that are defined by function(...). By adding the argument (.x) we’ve expanded the addTen() function, and so it needs a ~. The addTen() function all alone does not use a ~.
jan31.ymd() is not a function).
Maybe c. 3? a. 1? They are all okay, but you have to think carefully to read any of them!
all of the above
It totally depends on your personality and your finances. b. doesn’t make much sense. But a., c., and d. are all very reasonable questions to ask about your investments.
replace = FALSE)
data-to-ink ratio measures how much ink is being used for the data (data-ink) as compared to how much ink is being used on the entire plot (labels, etc.)
there can’t possibly be a right answer here.
< >
img
[href]
tbl in storage and R tibble in memory
SELECT FirstName FROM Persons
SELECT * FROM Persons
SELECT COUNT(*) FROM Persons
SELECT * FROM Persons WHERE FirstName = 'Peter' (d. would also work.)
SELECT * FROM Persons WHERE FirstName = 'Peter' AND LastName = 'Jackson'
BETWEEN
SELECT * FROM Persons WHERE LastName BETWEEN 'Hansen' AND 'Pettersen'
SELECT DISTINCT
ORDER BY
SELECT
SELECT * FROM Persons ORDER BY FirstName DESC
WHERE type = 'fruit' AND (color = 'yellow' OR color = 'green')
SELECT statements.
NULL (it would be NA in R)
FROM
year = 2014 should go in WHERE
SELECT a variable from the results set
SELECT * FROM Persons WHERE FirstName REGEXP '(?i)a.*' (n.b., the LIKE operator will give you a similar result, with % as a wildcard: SELECT * FROM Persons WHERE FirstName LIKE 'a%')
shinyApp()
server
selectInput()
sliderInput("slider", "Slider", min = 1, max = 100, value = 50)
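
The parenthesized `WHERE` clause in the answers above can be checked from R. This is only a sketch: it assumes the DBI and RSQLite packages are installed, and the `foods` table and its rows are made up for illustration. It runs the clause with and without parentheses against an in-memory SQLite database:

```r
library(DBI)

# Hypothetical table; names and rows are made up for illustration
foods <- data.frame(
  name  = c("banana", "lime", "spinach", "cherry"),
  type  = c("fruit", "fruit", "vegetable", "fruit"),
  color = c("yellow", "green", "green", "red")
)

# Assumes RSQLite is available; any DBI backend would do
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "foods", foods)

# AND binds more tightly than OR, so the parentheses change the result
with_parens <- dbGetQuery(con, "
  SELECT name FROM foods
  WHERE type = 'fruit' AND (color = 'yellow' OR color = 'green')")

without_parens <- dbGetQuery(con, "
  SELECT name FROM foods
  WHERE type = 'fruit' AND color = 'yellow' OR color = 'green'")

dbDisconnect(con)

with_parens$name      # yellow or green fruit only
without_parens$name   # also lets the green vegetable through
```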