Superfluous material examples

library(tidyverse)

An okay way to look at a data frame

Sometimes the dataset is stored in a way (for example, as a tibble) that makes it print just the first few rows. This is totally okay in your rendered doc. Indeed, printing the data to the screen is often a great way to communicate aspects of the data (variable names, variable types, etc.).

library(nycflights13)
flights
# A tibble: 336,776 × 19
    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
 1  2013     1     1      517            515         2      830            819
 2  2013     1     1      533            529         4      850            830
 3  2013     1     1      542            540         2      923            850
 4  2013     1     1      544            545        -1     1004           1022
 5  2013     1     1      554            600        -6      812            837
 6  2013     1     1      554            558        -4      740            728
 7  2013     1     1      555            600        -5      913            854
 8  2013     1     1      557            600        -3      709            723
 9  2013     1     1      557            600        -3      838            846
10  2013     1     1      558            600        -2      753            745
# ℹ 336,766 more rows
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>

A not okay way to look at a data frame

But sometimes the entire dataset is printed in the output! This happens, for example, when the object is a plain data.frame, as below, where all 120 rows are printed. Having all the data print out isn’t helpful for your reader, who is overwhelmed by pages of numbers.

smokecancer <- data.frame(action = c(rep("non-smoker", 51), rep("smoker", 69)),
                          outcome = c(rep("lung_cancer", 19), rep("healthy", 32),
                                      rep("lung_cancer", 41), rep("healthy", 28)))

smokecancer
        action     outcome
1   non-smoker lung_cancer
2   non-smoker lung_cancer
3   non-smoker lung_cancer
4   non-smoker lung_cancer
5   non-smoker lung_cancer
6   non-smoker lung_cancer
7   non-smoker lung_cancer
8   non-smoker lung_cancer
9   non-smoker lung_cancer
10  non-smoker lung_cancer
11  non-smoker lung_cancer
12  non-smoker lung_cancer
13  non-smoker lung_cancer
14  non-smoker lung_cancer
15  non-smoker lung_cancer
16  non-smoker lung_cancer
17  non-smoker lung_cancer
18  non-smoker lung_cancer
19  non-smoker lung_cancer
20  non-smoker     healthy
21  non-smoker     healthy
22  non-smoker     healthy
23  non-smoker     healthy
24  non-smoker     healthy
25  non-smoker     healthy
26  non-smoker     healthy
27  non-smoker     healthy
28  non-smoker     healthy
29  non-smoker     healthy
30  non-smoker     healthy
31  non-smoker     healthy
32  non-smoker     healthy
33  non-smoker     healthy
34  non-smoker     healthy
35  non-smoker     healthy
36  non-smoker     healthy
37  non-smoker     healthy
38  non-smoker     healthy
39  non-smoker     healthy
40  non-smoker     healthy
41  non-smoker     healthy
42  non-smoker     healthy
43  non-smoker     healthy
44  non-smoker     healthy
45  non-smoker     healthy
46  non-smoker     healthy
47  non-smoker     healthy
48  non-smoker     healthy
49  non-smoker     healthy
50  non-smoker     healthy
51  non-smoker     healthy
52      smoker lung_cancer
53      smoker lung_cancer
54      smoker lung_cancer
55      smoker lung_cancer
56      smoker lung_cancer
57      smoker lung_cancer
58      smoker lung_cancer
59      smoker lung_cancer
60      smoker lung_cancer
61      smoker lung_cancer
62      smoker lung_cancer
63      smoker lung_cancer
64      smoker lung_cancer
65      smoker lung_cancer
66      smoker lung_cancer
67      smoker lung_cancer
68      smoker lung_cancer
69      smoker lung_cancer
70      smoker lung_cancer
71      smoker lung_cancer
72      smoker lung_cancer
73      smoker lung_cancer
74      smoker lung_cancer
75      smoker lung_cancer
76      smoker lung_cancer
77      smoker lung_cancer
78      smoker lung_cancer
79      smoker lung_cancer
80      smoker lung_cancer
81      smoker lung_cancer
82      smoker lung_cancer
83      smoker lung_cancer
84      smoker lung_cancer
85      smoker lung_cancer
86      smoker lung_cancer
87      smoker lung_cancer
88      smoker lung_cancer
89      smoker lung_cancer
90      smoker lung_cancer
91      smoker lung_cancer
92      smoker lung_cancer
93      smoker     healthy
94      smoker     healthy
95      smoker     healthy
96      smoker     healthy
97      smoker     healthy
98      smoker     healthy
99      smoker     healthy
100     smoker     healthy
101     smoker     healthy
102     smoker     healthy
103     smoker     healthy
104     smoker     healthy
105     smoker     healthy
106     smoker     healthy
107     smoker     healthy
108     smoker     healthy
109     smoker     healthy
110     smoker     healthy
111     smoker     healthy
112     smoker     healthy
113     smoker     healthy
114     smoker     healthy
115     smoker     healthy
116     smoker     healthy
117     smoker     healthy
118     smoker     healthy
119     smoker     healthy
120     smoker     healthy

Solutions

One way to print only the top of the data frame is to use the head() function:

smokecancer |> 
  head()
      action     outcome
1 non-smoker lung_cancer
2 non-smoker lung_cancer
3 non-smoker lung_cancer
4 non-smoker lung_cancer
5 non-smoker lung_cancer
6 non-smoker lung_cancer
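
If the reader should also know how large the full data frame is, head() can be paired with dim(). The sketch below uses only base R; the three-row cutoff and the name top_rows are illustrative choices, not part of the original example.

```r
# Rebuild the example data so this block runs on its own
smokecancer <- data.frame(
  action  = c(rep("non-smoker", 51), rep("smoker", 69)),
  outcome = c(rep("lung_cancer", 19), rep("healthy", 32),
              rep("lung_cancer", 41), rep("healthy", 28))
)

# Show only the first three rows instead of the default six
top_rows <- head(smokecancer, n = 3)
top_rows

# head() does not report the size of the full data frame,
# so print the number of rows and columns separately
dim(smokecancer)
```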

Another way to print only the top few rows is to convert the object from a data.frame into a tibble (a tibble is like a data.frame, but with nicer printing defaults). One benefit of tibble() over head() is that the reader can also see the size of the data frame (number of rows and columns).

smokecancer |> 
  tibble()
# A tibble: 120 × 2
   action     outcome    
   <chr>      <chr>      
 1 non-smoker lung_cancer
 2 non-smoker lung_cancer
 3 non-smoker lung_cancer
 4 non-smoker lung_cancer
 5 non-smoker lung_cancer
 6 non-smoker lung_cancer
 7 non-smoker lung_cancer
 8 non-smoker lung_cancer
 9 non-smoker lung_cancer
10 non-smoker lung_cancer
# ℹ 110 more rows
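
The conversion can also be written with as_tibble(), and the number of rows shown can be adjusted through the n argument of print(). This is a minimal sketch, assuming the tibble package is installed (it comes with the tidyverse); the name smoke_tbl and the five-row cutoff are illustrative.

```r
library(tibble)

# Rebuild the example data so this block runs on its own
smokecancer <- data.frame(
  action  = c(rep("non-smoker", 51), rep("smoker", 69)),
  outcome = c(rep("lung_cancer", 19), rep("healthy", 32),
              rep("lung_cancer", 41), rep("healthy", 28))
)

# Convert the data.frame to a tibble so it prints compactly
smoke_tbl <- as_tibble(smokecancer)

# Print only the first five rows; the header still reports the full size
print(smoke_tbl, n = 5)
```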

Messages and warnings that the reader doesn’t need

Usually, at the top of an assignment (in the YAML), I’ll set up the template so that messages and warnings are set to false. That is, I’ll instruct the computer not to print messages or warnings. But for your project (where you don’t have a template), you’ll have to remember to do this yourself! (Look at the HW YAML.)

This is what it looks like if messages are set to true. Again, this is not very helpful for your reader (unless you are trying to communicate something specific about the packages being loaded, such as their versions or conflicts).

library(tidyverse)
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(openintro)
Loading required package: airports
Loading required package: cherryblossom
Loading required package: usdata
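
One way to keep this output out of a rendered document is sketched below, under the assumption that the code sits in an executable R chunk of a Quarto or R Markdown file. The #| lines are chunk options that switch off messages and warnings for that chunk, and suppressPackageStartupMessages() is a base R function that silences startup messages wherever the code runs. Either approach alone should be enough; both are shown for comparison.

```r
#| message: false
#| warning: false

# The #| lines above only take effect inside an executable chunk;
# in a plain R script they are ordinary comments.

# Load the packages without printing their startup messages
suppressPackageStartupMessages(library(tidyverse))
```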

Reuse

CC-BY-SA-4.0