library(tidyverse)
superfluous material examples
An okay way to look at a data frame
Some datasets are stored in a format (such as a tibble) that prints just the first few rows. This is totally okay in your rendered doc. Indeed, printing the data to the screen is often a great way to communicate aspects of the data (variable names, variable types, etc.).
library(nycflights13)
flights
# A tibble: 336,776 × 19
year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
<int> <int> <int> <int> <int> <dbl> <int> <int>
1 2013 1 1 517 515 2 830 819
2 2013 1 1 533 529 4 850 830
3 2013 1 1 542 540 2 923 850
4 2013 1 1 544 545 -1 1004 1022
5 2013 1 1 554 600 -6 812 837
6 2013 1 1 554 558 -4 740 728
7 2013 1 1 555 600 -5 913 854
8 2013 1 1 557 600 -3 709 723
9 2013 1 1 557 600 -3 838 846
10 2013 1 1 558 600 -2 753 745
# ℹ 336,766 more rows
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
# tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
# hour <dbl>, minute <dbl>, time_hour <dttm>
A not okay way to look at a data frame
But sometimes the formatting is such that the entire dataset is printed in the output! Printing all of the data isn’t helpful for your reader, who will be overwhelmed by pages of numbers.
smokecancer <- data.frame(action = c(rep("non-smoker", 51), rep("smoker", 69)),
                          outcome = c(rep("lung_cancer", 19), rep("healthy", 32),
                                      rep("lung_cancer", 41), rep("healthy", 28)))
smokecancer
action outcome
1 non-smoker lung_cancer
2 non-smoker lung_cancer
3 non-smoker lung_cancer
4 non-smoker lung_cancer
5 non-smoker lung_cancer
6 non-smoker lung_cancer
7 non-smoker lung_cancer
8 non-smoker lung_cancer
9 non-smoker lung_cancer
10 non-smoker lung_cancer
11 non-smoker lung_cancer
12 non-smoker lung_cancer
13 non-smoker lung_cancer
14 non-smoker lung_cancer
15 non-smoker lung_cancer
16 non-smoker lung_cancer
17 non-smoker lung_cancer
18 non-smoker lung_cancer
19 non-smoker lung_cancer
20 non-smoker healthy
21 non-smoker healthy
22 non-smoker healthy
23 non-smoker healthy
24 non-smoker healthy
25 non-smoker healthy
26 non-smoker healthy
27 non-smoker healthy
28 non-smoker healthy
29 non-smoker healthy
30 non-smoker healthy
31 non-smoker healthy
32 non-smoker healthy
33 non-smoker healthy
34 non-smoker healthy
35 non-smoker healthy
36 non-smoker healthy
37 non-smoker healthy
38 non-smoker healthy
39 non-smoker healthy
40 non-smoker healthy
41 non-smoker healthy
42 non-smoker healthy
43 non-smoker healthy
44 non-smoker healthy
45 non-smoker healthy
46 non-smoker healthy
47 non-smoker healthy
48 non-smoker healthy
49 non-smoker healthy
50 non-smoker healthy
51 non-smoker healthy
52 smoker lung_cancer
53 smoker lung_cancer
54 smoker lung_cancer
55 smoker lung_cancer
56 smoker lung_cancer
57 smoker lung_cancer
58 smoker lung_cancer
59 smoker lung_cancer
60 smoker lung_cancer
61 smoker lung_cancer
62 smoker lung_cancer
63 smoker lung_cancer
64 smoker lung_cancer
65 smoker lung_cancer
66 smoker lung_cancer
67 smoker lung_cancer
68 smoker lung_cancer
69 smoker lung_cancer
70 smoker lung_cancer
71 smoker lung_cancer
72 smoker lung_cancer
73 smoker lung_cancer
74 smoker lung_cancer
75 smoker lung_cancer
76 smoker lung_cancer
77 smoker lung_cancer
78 smoker lung_cancer
79 smoker lung_cancer
80 smoker lung_cancer
81 smoker lung_cancer
82 smoker lung_cancer
83 smoker lung_cancer
84 smoker lung_cancer
85 smoker lung_cancer
86 smoker lung_cancer
87 smoker lung_cancer
88 smoker lung_cancer
89 smoker lung_cancer
90 smoker lung_cancer
91 smoker lung_cancer
92 smoker lung_cancer
93 smoker healthy
94 smoker healthy
95 smoker healthy
96 smoker healthy
97 smoker healthy
98 smoker healthy
99 smoker healthy
100 smoker healthy
101 smoker healthy
102 smoker healthy
103 smoker healthy
104 smoker healthy
105 smoker healthy
106 smoker healthy
107 smoker healthy
108 smoker healthy
109 smoker healthy
110 smoker healthy
111 smoker healthy
112 smoker healthy
113 smoker healthy
114 smoker healthy
115 smoker healthy
116 smoker healthy
117 smoker healthy
118 smoker healthy
119 smoker healthy
120 smoker healthy
Solutions
One way to print only the top of the data frame is to use the head() function:
smokecancer |>
  head()
action outcome
1 non-smoker lung_cancer
2 non-smoker lung_cancer
3 non-smoker lung_cancer
4 non-smoker lung_cancer
5 non-smoker lung_cancer
6 non-smoker lung_cancer
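Because head() shows only the rows themselves, the reader cannot tell how large the full data frame is. As a minimal sketch (rebuilding the same smokecancer data frame so the example runs on its own), you could choose the number of rows with the n argument and report the size separately with dim():

```r
# Rebuild the example data frame so this sketch is self-contained
smokecancer <- data.frame(
  action = c(rep("non-smoker", 51), rep("smoker", 69)),
  outcome = c(rep("lung_cancer", 19), rep("healthy", 32),
              rep("lung_cancer", 41), rep("healthy", 28))
)

# Show only the first 3 rows instead of the default 6
head(smokecancer, n = 3)

# Report the number of rows and columns, which head() does not show
dim(smokecancer)
```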
Another way to print only the top few rows is to change the object from a data.frame into a tibble (a tibble is like a data.frame but it has nice printing options). Note that one benefit of tibble() over head() is that the reader can see the size of the data frame (number of rows and columns).
smokecancer |>
  tibble()
# A tibble: 120 × 2
action outcome
<chr> <chr>
1 non-smoker lung_cancer
2 non-smoker lung_cancer
3 non-smoker lung_cancer
4 non-smoker lung_cancer
5 non-smoker lung_cancer
6 non-smoker lung_cancer
7 non-smoker lung_cancer
8 non-smoker lung_cancer
9 non-smoker lung_cancer
10 non-smoker lung_cancer
# ℹ 110 more rows
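If ten rows is still more than you want, a tibble also lets you choose how many rows to print. The sketch below assumes the tibble package is installed; smokecancer_tbl is a name introduced here only for illustration, and as_tibble() is used as one common way to convert an existing data frame.

```r
library(tibble)

# Rebuild the example data frame so this sketch is self-contained
smokecancer <- data.frame(
  action = c(rep("non-smoker", 51), rep("smoker", 69)),
  outcome = c(rep("lung_cancer", 19), rep("healthy", 32),
              rep("lung_cancer", 41), rep("healthy", 28))
)

# Convert the data frame to a tibble (smokecancer_tbl is a hypothetical name)
smokecancer_tbl <- as_tibble(smokecancer)

# Print only the first 5 rows; the header still reports the full size
print(smokecancer_tbl, n = 5)
```

The header line of the printed tibble still reports the full dimensions, so the reader keeps the size information even with fewer rows shown.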
Messages and warnings that the reader doesn’t need
Usually, at the top (in the YAML), I’ll scaffold the assignment to set the messages and warnings to false. That is, I’ll instruct the computer not to print messages and warnings. But for your project (where you don’t have a template), you’ll have to remember to do this! (Look at the HW YAML.)
This is what it looks like if you have set the messages to true. Again, not very helpful for your reader (unless you are trying to communicate something specific about the versioning of the R package).
library(tidyverse)
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(openintro)
Loading required package: airports
Loading required package: cherryblossom
Loading required package: usdata
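If you would rather quiet a single chunk than change the whole document, here is a minimal sketch of two options. It assumes the document is rendered with Quarto (or a recent version of knitr), where lines starting with #| at the top of a chunk are read as chunk options, and it assumes the tidyverse package is installed. Neither option is necessarily how the HW template does it.

```r
#| message: false
#| warning: false

# The two lines above are chunk options (an assumption about your setup):
# they ask the renderer not to show messages or warnings from this chunk.

# Alternatively, silence the startup messages for this one call only
suppressPackageStartupMessages(library(tidyverse))
```

The chunk options affect only what appears in the rendered document, while suppressPackageStartupMessages() also keeps the console quiet when you run the line interactively.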