Variable Types

September 22 + 24, 2025

Jo Hardin

Variable Types

Some variable types:

  • character strings
  • factor variables
  • dates
  • numeric (decimal)
  • integer
  • logical (Boolean)

A variable’s type determines the values that the variable can take on and the operations that can be performed on it. Specifying variable types helps protect the data’s integrity and can improve performance.
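A quick base R sketch of that idea: one made-up value of each type listed above, what class() reports for it, and how the type changes what an operation does. The variable names (x_chr, x_fct, and so on) are hypothetical, chosen only for this illustration.

```r
# one made-up value of each type (base R only)
x_chr  <- "banana"
x_fct  <- factor("banana")
x_date <- as.Date("2025-09-22")
x_num  <- 3.5
x_int  <- 3L
x_lgl  <- TRUE

# class() reports the type R uses to decide which operations apply
sapply(list(x_chr, x_fct, x_date, x_num, x_int, x_lgl), class)
# "character" "factor" "Date" "numeric" "integer" "logical"

typeof(x_fct)   # "integer": a factor is stored as integer codes
x_date + 1      # "2025-09-23": adding 1 to a Date moves it forward one day
x_int + x_num   # 6.5: arithmetic works on numbers
# "3" + 1       # would error: non-numeric argument to binary operator
```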

Agenda 9/22/25

  1. Character strings
  2. str_*() functions

Character strings

When working with character strings, we might want to detect, replace, or extract certain patterns.

Strings are objects of the character class (abbreviated as <chr> in tibbles). When you print out strings, they display with double quotes:

some_string <- "banana"
some_string
[1] "banana"

Creating strings

Creating strings by hand is useful for testing out regular expressions.

To create a string, type any text in either double quotes " or single quotes '. Using double or single quotes doesn’t matter unless your string itself has single or double quotes.

string1 <- "This is a string"
string2 <- 'If I want to include a "quote" inside a string, I use single quotes'

string1
[1] "This is a string"
string2
[1] "If I want to include a \"quote\" inside a string, I use single quotes"

str_view()

We can view these strings “naturally” (without the opening and closing quotes) with str_view():

str_view(string1)
[1] │ This is a string
str_view(string2)
[1] │ If I want to include a "quote" inside a string, I use single quotes

Working with strings

  • yes, lots of code to learn
  • no, the code doesn’t really matter
  • learning goal for today:

what can you even do with string variables???

str_c()

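Similar to paste0() (gluing strings together, with no separator by default), but designed to work well in a tidy pipeline. Unlike paste0(), str_c() returns NA when any input is missing, as in the last row of the output.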

df <- tibble(name = c("Flora", "David", "Terra", NA))
df |> mutate(greeting = str_c("Hi ", name, "!"))
# A tibble: 4 × 2
  name  greeting 
  <chr> <chr>    
1 Flora Hi Flora!
2 David Hi David!
3 Terra Hi Terra!
4 <NA>  <NA>     
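A small sketch contrasting str_c() with base R paste0() on a missing value, and showing the sep and collapse arguments listed in the stringr table later on. The name vector here is made up for illustration.

```r
library(stringr)

name <- c("Flora", NA)
str_c("Hi ", name, "!")    # "Hi Flora!" NA: a missing input gives NA
paste0("Hi ", name, "!")   # "Hi Flora!" "Hi NA!": paste0() pastes the text "NA"

# sep goes between the inputs; collapse joins a whole vector into one string
str_c("Flora", "David", sep = " & ")                  # "Flora & David"
str_c(c("Flora", "David", "Terra"), collapse = ", ")  # "Flora, David, Terra"
```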

str_sub()

str_sub(string, start, end) extracts part of each string, where start and end are the positions where the substring starts and ends. Negative positions count back from the end of the string.

fruits <- c("Apple", "Banana", "Pear")
str_sub(fruits, 1, 3)
[1] "App" "Ban" "Pea"
str_sub(fruits, -3, -1)
[1] "ple" "ana" "ear"

str_sub() won’t fail if the string is too short; it returns as much of the string as there is.

str_sub(fruits, 1, 5)
[1] "Apple" "Banan" "Pear" 
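str_sub() can also appear on the left of the assignment arrow to overwrite part of each string. A minimal sketch with lowercase fruit names (made up for this example) that capitalizes the first letter:

```r
library(stringr)

fruits <- c("apple", "banana", "pear")
# replace character 1 of each string with its uppercase version
str_sub(fruits, 1, 1) <- str_to_upper(str_sub(fruits, 1, 1))
fruits   # "Apple" "Banana" "Pear"
```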

str_sub() in a pipeline

We can use the str_*() functions inside the mutate() function.

titanic <- as.data.frame(Titanic)  # assumes titanic is the base R Titanic table as a data frame
titanic |> 
  mutate(class1 = str_sub(Class, 1, 1))
   Class    Sex   Age Survived Freq class1
1    1st   Male Child       No    0      1
2    2nd   Male Child       No    0      2
3    3rd   Male Child       No   35      3
4   Crew   Male Child       No    0      C
5    1st Female Child       No    0      1
6    2nd Female Child       No    0      2
7    3rd Female Child       No   17      3
8   Crew Female Child       No    0      C
9    1st   Male Adult       No  118      1
10   2nd   Male Adult       No  154      2
11   3rd   Male Adult       No  387      3
12  Crew   Male Adult       No  670      C
13   1st Female Adult       No    4      1
14   2nd Female Adult       No   13      2
15   3rd Female Adult       No   89      3
16  Crew Female Adult       No    3      C
17   1st   Male Child      Yes    5      1
18   2nd   Male Child      Yes   11      2
19   3rd   Male Child      Yes   13      3
20  Crew   Male Child      Yes    0      C
21   1st Female Child      Yes    1      1
22   2nd Female Child      Yes   13      2
23   3rd Female Child      Yes   14      3
24  Crew Female Child      Yes    0      C
25   1st   Male Adult      Yes   57      1
26   2nd   Male Adult      Yes   14      2
27   3rd   Male Adult      Yes   75      3
28  Crew   Male Adult      Yes  192      C
29   1st Female Adult      Yes  140      1
30   2nd Female Adult      Yes   80      2
31   3rd Female Adult      Yes   76      3
32  Crew Female Adult      Yes   20      C

str_replace*()

str_replace() replaces the first match of a pattern. str_replace_all() replaces all the matches of a pattern.

fruits
[1] "Apple"  "Banana" "Pear"  
str_replace(fruits, "a", "x")
[1] "Apple"  "Bxnana" "Pexr"  
str_replace_all(fruits, "a", "x")
[1] "Apple"  "Bxnxnx" "Pexr"  
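Two more ways to use str_replace_all(), sketched on the same fruits vector: a named vector applies several pattern = replacement pairs in order, and replacing a match with the empty string "" deletes it (str_remove_all() is stringr's shorthand for that).

```r
library(stringr)

fruits <- c("Apple", "Banana", "Pear")
# each pattern = replacement pair is applied in turn
str_replace_all(fruits, c("a" = "x", "e" = "y"))   # "Apply" "Bxnxnx" "Pyxr"
# replacing with "" removes every match
str_replace_all(fruits, "a", "")                   # "Apple" "Bnn" "Per"
str_remove_all(fruits, "a")                        # same result
```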

str_detect()

fruits
[1] "Apple"  "Banana" "Pear"  
str_detect(fruits, "a")
[1] FALSE  TRUE  TRUE
str_detect(fruits, "A")
[1]  TRUE FALSE FALSE
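The two calls above show that matching is case sensitive. A short sketch of two options for str_detect(): wrapping the pattern in regex(ignore_case = TRUE), and negate = TRUE to flag the strings that do not contain the pattern.

```r
library(stringr)

fruits <- c("Apple", "Banana", "Pear")
# ignore upper/lower case when matching
str_detect(fruits, regex("a", ignore_case = TRUE))  # TRUE TRUE TRUE
# TRUE for the strings that do NOT contain "a"
str_detect(fruits, "a", negate = TRUE)              # TRUE FALSE FALSE
```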

str_detect() in pipeline

str_detect() used in a filter() pipeline.

starwars |> 
  select(name, films) 
# A tibble: 87 × 2
   name               films    
   <chr>              <list>   
 1 Luke Skywalker     <chr [5]>
 2 C-3PO              <chr [6]>
 3 R2-D2              <chr [7]>
 4 Darth Vader        <chr [4]>
 5 Leia Organa        <chr [5]>
 6 Owen Lars          <chr [3]>
 7 Beru Whitesun Lars <chr [3]>
 8 R5-D4              <chr [1]>
 9 Biggs Darklighter  <chr [1]>
10 Obi-Wan Kenobi     <chr [6]>
# ℹ 77 more rows
starwars |> 
  select(name, films) |> 
  unnest_wider(films, names_sep = "")
# A tibble: 87 × 8
   name               films1 films2 films3 films4 films5 films6 films7
   <chr>              <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr> 
 1 Luke Skywalker     A New… The E… Retur… Reven… The F… <NA>   <NA>  
 2 C-3PO              A New… The E… Retur… The P… Attac… Reven… <NA>  
 3 R2-D2              A New… The E… Retur… The P… Attac… Reven… The F…
 4 Darth Vader        A New… The E… Retur… Reven… <NA>   <NA>   <NA>  
 5 Leia Organa        A New… The E… Retur… Reven… The F… <NA>   <NA>  
 6 Owen Lars          A New… Attac… Reven… <NA>   <NA>   <NA>   <NA>  
 7 Beru Whitesun Lars A New… Attac… Reven… <NA>   <NA>   <NA>   <NA>  
 8 R5-D4              A New… <NA>   <NA>   <NA>   <NA>   <NA>   <NA>  
 9 Biggs Darklighter  A New… <NA>   <NA>   <NA>   <NA>   <NA>   <NA>  
10 Obi-Wan Kenobi     A New… The E… Retur… The P… Attac… Reven… <NA>  
# ℹ 77 more rows
starwars |> 
  filter(str_detect(films, "Empire")) |> 
  select(name, films) |> 
  unnest_wider(films, names_sep = "")
# A tibble: 16 × 8
   name             films1   films2 films3 films4 films5 films6 films7
   <chr>            <chr>    <chr>  <chr>  <chr>  <chr>  <chr>  <chr> 
 1 Luke Skywalker   A New H… The E… Retur… Reven… The F… <NA>   <NA>  
 2 C-3PO            A New H… The E… Retur… The P… Attac… Reven… <NA>  
 3 R2-D2            A New H… The E… Retur… The P… Attac… Reven… The F…
 4 Darth Vader      A New H… The E… Retur… Reven… <NA>   <NA>   <NA>  
 5 Leia Organa      A New H… The E… Retur… Reven… The F… <NA>   <NA>  
 6 Obi-Wan Kenobi   A New H… The E… Retur… The P… Attac… Reven… <NA>  
 7 Chewbacca        A New H… The E… Retur… Reven… The F… <NA>   <NA>  
 8 Han Solo         A New H… The E… Retur… The F… <NA>   <NA>   <NA>  
 9 Wedge Antilles   A New H… The E… Retur… <NA>   <NA>   <NA>   <NA>  
10 Yoda             The Emp… Retur… The P… Attac… Reven… <NA>   <NA>  
11 Palpatine        The Emp… Retur… The P… Attac… Reven… <NA>   <NA>  
12 Boba Fett        The Emp… Retur… Attac… <NA>   <NA>   <NA>   <NA>  
13 IG-88            The Emp… <NA>   <NA>   <NA>   <NA>   <NA>   <NA>  
14 Bossk            The Emp… <NA>   <NA>   <NA>   <NA>   <NA>   <NA>  
15 Lando Calrissian The Emp… Retur… <NA>   <NA>   <NA>   <NA>   <NA>  
16 Lobot            The Emp… <NA>   <NA>   <NA>   <NA>   <NA>   <NA>  
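The filter above applies str_detect() directly to films, which is a list column (each row holds a character vector of titles). A more explicit alternative, sketched here as one option rather than the approach used above, checks each row’s vector with purrr::map_lgl() and any(); the result name empire is hypothetical.

```r
library(dplyr)
library(purrr)
library(stringr)

# map_lgl() visits each row's vector of film titles;
# any() asks whether at least one title contains "Empire"
empire <- starwars |>
  filter(map_lgl(films, \(f) any(str_detect(f, "Empire")))) |>
  select(name)

nrow(empire)   # 16, matching the filtered output above
```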

str_length()

Returns the number of characters in each string (each letter, digit, space, or punctuation mark counts as one character).

x <- c("apple", "banana", "cherry pie", "")

str_length(x)
[1]  5  6 10  0
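A common mix-up is between length(), which counts the strings in a vector, and str_length(), which counts the characters inside each string. A short sketch using the same x:

```r
library(stringr)

x <- c("apple", "banana", "cherry pie", "")
length(x)       # 4: how many strings are in the vector
str_length(x)   # 5 6 10 0: how many characters are in each string
str_length(NA)  # NA: a missing string has no known length
```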

str_count()

Counts the number of times a pattern is found within each element of a character vector.

x <- c("apple pie apple tart", "banana apple", "cherry")

str_count(x, "apple")
[1] 2 1 0
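The pattern is a regular expression, so it can describe a set of characters rather than a fixed word. A sketch on the same x: the character class [aeiou] counts every vowel, and matches never overlap.

```r
library(stringr)

x <- c("apple pie apple tart", "banana apple", "cherry")
# [aeiou] matches any single vowel
str_count(x, "[aeiou]")   # 7 5 1
# matches do not overlap: "aa" fits only once into "aaa"
str_count("aaa", "aa")    # 1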

str_extract*()

str_extract() extracts the first complete match from each string.

str_extract_all() extracts all matches from each string.

x <- c("apple pie apple tart", "banana apple", "cherry")


str_extract(x, "a")
[1] "a" "a" NA 
str_extract_all(x, "a")
[[1]]
[1] "a" "a" "a"

[[2]]
[1] "a" "a" "a" "a"

[[3]]
character(0)
str_extract(x, "apple")
[1] "apple" "apple" NA     
str_extract_all(x, "apple")
[[1]]
[1] "apple" "apple"

[[2]]
[1] "apple"

[[3]]
character(0)
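str_extract() is most useful with a regular expression that describes the piece you want. A sketch on a made-up vector gates, pulling out runs of digits with the pattern "[0-9]+" (one or more digits in a row):

```r
library(stringr)

gates <- c("Gate 12B", "Flight 1545 to Gate 7", "no number")
str_extract(gates, "[0-9]+")      # "12" "1545" NA
str_extract_all(gates, "[0-9]+")  # list("12", c("1545", "7"), character(0))
```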

stringr functions

The stringr package within the tidyverse contains many functions to help process strings. Letting x be a string variable…

function            arguments                  returns
str_view()          x                          the string
str_c()             …, sep, collapse           a new concatenated string
str_sub()           x, start, end              a modified string
str_replace()       x, pattern, replacement    a modified string
str_replace_all()   x, pattern, replacement    a modified string
str_detect()        x, pattern                 TRUE/FALSE
str_to_lower()      x                          a modified string
str_to_upper()      x                          a modified string
str_length()        x                          a number
str_count()         x, pattern                 an integer vector
str_extract()       x, pattern                 a character vector
str_extract_all()   x, pattern                 a list of character vectors

Use the stringr cheat sheet.

str_*() functions on non-strings?

Do the functions that were built to handle strings work if the variable is not a string?

Starbucks data

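The str_*() functions work: they first convert a factor or a number to a character string, so the results below are character (or logical) vectors.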

set.seed(47)
starbucks |> 
  sample_n(10) |> 
  select(item, calories, fat, type)
# A tibble: 10 × 4
   item                          calories   fat type    
   <chr>                            <int> <dbl> <fct>   
 1 "Morning Bun"                      350    16 bakery  
 2 "Red Velvet Whoopie Pie"           190    11 petite  
 3 "Chonga Bagel"                     310     5 bakery  
 4 "8-Grain Roll"                     350     8 bakery  
 5 "Marshmallow Dream Bar"            210     4 bakery  
 6 "Chocolate Croissant"              300    17 bakery  
 7 "Mallorca Sweet Bread"             420    25 bakery  
 8 "Ham & Swiss Panini"               360     9 sandwich
 9 "Butter Croissant "                310    18 bakery  
10 "Chocolate Creme Whoopie Pie"      190    11 petite  
set.seed(47)
starbucks |> 
  sample_n(10) |> 
  select(item, calories, fat, type) |> 
  mutate(TYPE = str_to_upper(type))
# A tibble: 10 × 5
   item                          calories   fat type     TYPE    
   <chr>                            <int> <dbl> <fct>    <chr>   
 1 "Morning Bun"                      350    16 bakery   BAKERY  
 2 "Red Velvet Whoopie Pie"           190    11 petite   PETITE  
 3 "Chonga Bagel"                     310     5 bakery   BAKERY  
 4 "8-Grain Roll"                     350     8 bakery   BAKERY  
 5 "Marshmallow Dream Bar"            210     4 bakery   BAKERY  
 6 "Chocolate Croissant"              300    17 bakery   BAKERY  
 7 "Mallorca Sweet Bread"             420    25 bakery   BAKERY  
 8 "Ham & Swiss Panini"               360     9 sandwich SANDWICH
 9 "Butter Croissant "                310    18 bakery   BAKERY  
10 "Chocolate Creme Whoopie Pie"      190    11 petite   PETITE  
set.seed(47)
starbucks |> 
  sample_n(10) |> 
  select(item, calories, fat, type) |> 
  filter(str_detect(fat, "1"))
# A tibble: 5 × 4
  item                          calories   fat type  
  <chr>                            <int> <dbl> <fct> 
1 "Morning Bun"                      350    16 bakery
2 "Red Velvet Whoopie Pie"           190    11 petite
3 "Chocolate Croissant"              300    17 bakery
4 "Butter Croissant "                310    18 bakery
5 "Chocolate Creme Whoopie Pie"      190    11 petite
set.seed(47)
starbucks |> 
  sample_n(10) |> 
  select(item, calories, fat, type) |> 
  mutate(newcal = str_replace(calories, "0", "&"))
# A tibble: 10 × 5
   item                          calories   fat type     newcal
   <chr>                            <int> <dbl> <fct>    <chr> 
 1 "Morning Bun"                      350    16 bakery   35&   
 2 "Red Velvet Whoopie Pie"           190    11 petite   19&   
 3 "Chonga Bagel"                     310     5 bakery   31&   
 4 "8-Grain Roll"                     350     8 bakery   35&   
 5 "Marshmallow Dream Bar"            210     4 bakery   21&   
 6 "Chocolate Croissant"              300    17 bakery   3&0   
 7 "Mallorca Sweet Bread"             420    25 bakery   42&   
 8 "Ham & Swiss Panini"               360     9 sandwich 36&   
 9 "Butter Croissant "                310    18 bakery   31&   
10 "Chocolate Creme Whoopie Pie"      190    11 petite   19&   
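A small sketch of the consequence, using made-up type and fat vectors: a str_*() function applied to a factor returns a plain character vector, and numbers are matched as text. If the goal is new labels on a factor that stays a factor, forcats::fct_relabel() applies a function to the levels instead.

```r
library(stringr)
library(forcats)

type <- factor(c("bakery", "petite", "bakery"))
fat  <- c(16, 11, 5)

class(str_to_upper(type))   # "character": the factor structure is gone
str_detect(fat, "1")        # TRUE TRUE FALSE: 16 and 11 contain a "1"

# change the labels but keep a factor
upper_type <- fct_relabel(type, str_to_upper)
levels(upper_type)          # "BAKERY" "PETITE"
```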

Agenda 9/24/25

  1. Factor variables, fct_*() functions
  2. Time and date objects, lubridate package

Factor variables

Factor variables look like character strings, but the computer actually stores them as integers (?!?!!?) with labels (abbreviated as <fct> in tibbles).

  • categorical variable
  • represented in discrete levels with an ordering

Where do we order?

The ordering of the factor variable is important in:

  • plots (e.g., barplots)
  • tables (e.g., group_by())
  • modeling (e.g., the baseline level in a linear regression)

Order matters

SurveyUSA poll from 2012 on views of the DREAM Act.

What is off about the data viz part of the report?

openintro::dream
# A tibble: 910 × 2
   ideology     stance
   <fct>        <fct> 
 1 Conservative Yes   
 2 Conservative Yes   
 3 Conservative Yes   
 4 Conservative Yes   
 5 Conservative Yes   
 6 Conservative Yes   
 7 Conservative Yes   
 8 Conservative Yes   
 9 Conservative Yes   
10 Conservative Yes   
# ℹ 900 more rows
dream |> 
  ggplot(aes(x = ideology, fill = stance)) + 
  geom_bar()

The levels() function reports the levels and their order.

dream |> 
  select(ideology) |> 
  pull() |>  # because levels() works only on vectors, not data frames
  levels()
[1] "Conservative" "Liberal"      "Moderate"    

Change the order

We can fix the order of the ideology variable.

dream |> 
  mutate(ideology = fct_relevel(ideology, 
                                c("Liberal", "Moderate", "Conservative"))) |> 
  ggplot(aes(x = ideology, fill = stance)) + 
  geom_bar()
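To see what fct_relevel() changes without drawing a plot, here is a sketch on a small made-up ideology vector: only the order of the levels moves, while the data values stay the same. fct_rev() (from the forcats list below) flips whatever order is current.

```r
library(forcats)

ideology <- factor(c("Conservative", "Liberal", "Moderate", "Liberal"))
levels(ideology)            # "Conservative" "Liberal" "Moderate" (alphabetical)

reordered <- fct_relevel(ideology, "Liberal", "Moderate", "Conservative")
levels(reordered)           # "Liberal" "Moderate" "Conservative"
levels(fct_rev(reordered))  # "Conservative" "Moderate" "Liberal"

# naming just one level moves it to the front and keeps the rest in order
levels(fct_relevel(ideology, "Moderate"))   # "Moderate" "Conservative" "Liberal"

# the values themselves are unchanged
identical(as.character(reordered), as.character(ideology))   # TRUE
```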

Factor and character variables

starbucks |> 
  select(item, type, calories)
# A tibble: 77 × 3
   item                          type   calories
   <chr>                         <fct>     <int>
 1 "8-Grain Roll"                bakery      350
 2 "Apple Bran Muffin"           bakery      350
 3 "Apple Fritter"               bakery      420
 4 "Banana Nut Loaf"             bakery      490
 5 "Birthday Cake Mini Doughnut" bakery      130
 6 "Blueberry Oat Bar"           bakery      370
 7 "Blueberry Scone"             bakery      460
 8 "Bountiful Blueberry Muffin"  bakery      370
 9 "Butter Croissant "           bakery      310
10 "Cheese Danish"               bakery      420
# ℹ 67 more rows

Reorder according to another variable

Let’s say that we want to order the type of food item based on the average number of calories for each type.

starbucks |> 
  mutate(type = fct_reorder(type, calories, .fun = "mean", .desc = TRUE)) |> 
  ggplot(aes(x = type, y = calories)) + 
  geom_point() + 
  labs(x = "type of food",
       y = "",
       title = "Calories for food items at Starbucks")

The first argument to fct_reorder() must be the factor (or character) variable whose levels are being reordered; passing a numeric variable such as fiber gives an error:

starbucks |> 
  mutate(type = fct_reorder(fiber, calories, .fun = "mean", .desc = TRUE)) 
Error in `mutate()`:
ℹ In argument: `type = fct_reorder(fiber, calories, .fun = "mean", .desc
  = TRUE)`.
Caused by error in `fct_reorder()`:
! `.f` must be a factor or character vector, not an integer vector.
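To see what fct_reorder() computes, here is a sketch on small made-up vectors (not the Starbucks data): it summarizes the second argument within each level using .fun, then sorts the levels by that summary.

```r
library(forcats)

# made-up values for illustration only
type     <- factor(c("bakery", "petite", "bakery", "sandwich"))
calories <- c(350, 190, 310, 360)

# mean calories by type: bakery 330, petite 190, sandwich 360
levels(fct_reorder(type, calories, .fun = mean, .desc = TRUE))
# "sandwich" "bakery" "petite"
```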

What really are levels?

x <- factor(c("apple", "bear", "dear", "banana", "apple", "apple", "dear"))
x
[1] apple  bear   dear   banana apple  apple  dear  
Levels: apple banana bear dear
unclass(x)
[1] 1 3 4 2 1 1 4
attr(,"levels")
[1] "apple"  "banana" "bear"   "dear"  
x <- fct_recode(x, fruit = "apple", fruit = "banana")
x
[1] fruit bear  dear  fruit fruit fruit dear 
Levels: fruit bear dear
unclass(x)
[1] 1 2 3 1 1 1 3
attr(,"levels")
[1] "fruit" "bear"  "dear" 
new_fruit <- data.frame(words = x) |> 
  mutate(x_recode = fct_recode(words, fruit = "apple", fruit = "banana"),
         x_rewrite = ifelse(words == "apple", "fruit",
                            ifelse(words == "banana", "fruit", words))) 

new_fruit
  words x_recode x_rewrite
1 fruit    fruit         1
2  bear     bear         2
3  dear     dear         3
4 fruit    fruit         1
5 fruit    fruit         1
6 fruit    fruit         1
7  dear     dear         3
new_fruit |> 
  str()
'data.frame':   7 obs. of  3 variables:
 $ words    : Factor w/ 3 levels "fruit","bear",..: 1 2 3 1 1 1 3
 $ x_recode : Factor w/ 3 levels "fruit","bear",..: 1 2 3 1 1 1 3
 $ x_rewrite: int  1 2 3 1 1 1 3
# If you make a mistake you'll get a warning
fct_recode(x, fruit = "apple", fruit = "bananana")
Warning: Unknown levels in `f`: apple, bananana
[1] fruit bear  dear  fruit fruit fruit dear 
Levels: fruit bear dear
# If you name the level NULL it will be removed
fct_recode(x, NULL = "apple", fruit = "banana")
[1] fruit bear  dear  fruit fruit fruit dear 
Levels: fruit bear dear
# Wrap the left hand side in quotes if it contains special characters, such as spaces
fct_recode(x, "an apple" = "apple", "a bear" = "bear")
[1] fruit  a bear dear   fruit  fruit  fruit  dear  
Levels: fruit a bear dear
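Because x had already been recoded, the apple level no longer existed in the last few calls, so they left x unchanged. A sketch on a fresh made-up factor y shows the NULL rule doing its job: the apple value becomes NA and its level disappears.

```r
library(forcats)

y <- factor(c("apple", "bear", "dear", "banana"))
levels(y)   # "apple" "banana" "bear" "dear"

y2 <- fct_recode(y, NULL = "apple", fruit = "banana")
y2          # <NA> bear dear fruit
levels(y2)  # "fruit" "bear" "dear"
```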

Change factor to character

f <- factor(c("cat", "dog", "cat"))
f
[1] cat dog cat
Levels: cat dog
unclass(f)
[1] 1 2 1
attr(,"levels")
[1] "cat" "dog"

… produces the labels because that is how the function is defined.

as.character(f)
[1] "cat" "dog" "cat"

… produces the integer codes because that is how the function is defined.

as.numeric(f)
[1] 1 2 1

… coerces everything to the lowest common denominator. That is, it strips the labels off the factor variable, uses the integers, and converts them to character strings.

ifelse(f == "cat", "meow", f)
[1] "meow" "2"    "meow"
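One way around the ifelse() surprise, sketched on the same f: convert the factor to character first so that ifelse() works with labels rather than integer codes. Alternatively, stay within forcats and keep a factor.

```r
library(forcats)

f <- factor(c("cat", "dog", "cat"))

# as.character() hands ifelse() the labels, not the codes
ifelse(f == "cat", "meow", as.character(f))   # "meow" "dog" "meow"

# or recode the level and keep a factor
fct_recode(f, meow = "cat")                   # meow dog meow, Levels: meow dog
```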

Change character to factor

starbucks |> 
  select(item, calories, type)
# A tibble: 77 × 3
   item                          calories type  
   <chr>                            <int> <fct> 
 1 "8-Grain Roll"                     350 bakery
 2 "Apple Bran Muffin"                350 bakery
 3 "Apple Fritter"                    420 bakery
 4 "Banana Nut Loaf"                  490 bakery
 5 "Birthday Cake Mini Doughnut"      130 bakery
 6 "Blueberry Oat Bar"                370 bakery
 7 "Blueberry Scone"                  460 bakery
 8 "Bountiful Blueberry Muffin"       370 bakery
 9 "Butter Croissant "                310 bakery
10 "Cheese Danish"                    420 bakery
# ℹ 67 more rows
starbucks |> 
  mutate(item_fac = as.factor(item)) |> 
  select(item, calories, type, item_fac)
# A tibble: 77 × 4
   item                          calories type   item_fac             
   <chr>                            <int> <fct>  <fct>                
 1 "8-Grain Roll"                     350 bakery "8-Grain Roll"       
 2 "Apple Bran Muffin"                350 bakery "Apple Bran Muffin"  
 3 "Apple Fritter"                    420 bakery "Apple Fritter"      
 4 "Banana Nut Loaf"                  490 bakery "Banana Nut Loaf"    
 5 "Birthday Cake Mini Doughnut"      130 bakery "Birthday Cake Mini …
 6 "Blueberry Oat Bar"                370 bakery "Blueberry Oat Bar"  
 7 "Blueberry Scone"                  460 bakery "Blueberry Scone"    
 8 "Bountiful Blueberry Muffin"       370 bakery "Bountiful Blueberry…
 9 "Butter Croissant "                310 bakery "Butter Croissant "  
10 "Cheese Danish"                    420 bakery "Cheese Danish"      
# ℹ 67 more rows
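as.factor() sorts the levels alphabetically. A base R sketch on a made-up sizes vector showing how factor() with a levels argument sets the order yourself, and what happens to values that are not listed:

```r
sizes <- c("small", "large", "medium", "small")

levels(factor(sizes))    # "large" "medium" "small": alphabetical by default

# supply levels to choose the order
sizes_f <- factor(sizes, levels = c("small", "medium", "large"))
levels(sizes_f)          # "small" "medium" "large"

# values not listed in levels quietly become NA
factor(c("small", "huge"), levels = c("small", "medium", "large"))
# small <NA>
```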

forcats functions

The forcats package within the tidyverse contains many functions to help process factor variables. Use the forcats cheat sheet. We’ll focus on the most common functions.

  • functions for changing the order of factor levels
    • fct_relevel() = manually reorder levels
    • fct_reorder() = reorder levels according to values of another variable
    • fct_infreq() = order levels from highest to lowest frequency
    • fct_rev() = reverse the current order
  • functions for changing the labels or values of factor levels
    • fct_recode() = manually change levels
    • fct_lump() = group together least common levels
    • fct_other() = manually replace some levels with “other”
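A sketch of several of these functions on a made-up type vector in which salad appears 3 times, petite 2 times, and bakery once. fct_lump_n() is used here as the count-based member of the fct_lump() family (it keeps the n most common levels); that choice is ours, not the list’s.

```r
library(forcats)

type <- factor(c("salad", "petite", "salad", "bakery", "petite", "salad"))

levels(type)                             # "bakery" "petite" "salad" (alphabetical)
levels(fct_infreq(type))                 # "salad" "petite" "bakery" (most frequent first)
levels(fct_rev(fct_infreq(type)))        # "bakery" "petite" "salad"
levels(fct_other(type, keep = "salad"))  # "salad" "Other"
levels(fct_lump_n(type, n = 1))          # "salad" "Other": keep the single most common level
```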

Time and date objects

If the variable is formatted as a time or date object, you will find that there are very convenient ways to access, wrangle, and plot the information.

Types of time and date objects

There are 3 types of date/time data that refer to an instant in time:

A date. Tibbles print this as <date>.

A time within a day. Tibbles print this as <time>.

A date-time is a date plus a time: it uniquely identifies an instant in time (typically to the nearest second). Tibbles print this as <dttm>. Base R calls these POSIXct, but that doesn’t exactly flow off the tongue.

Formatting time variables

xkcd comic listing the many different ways that date objects can be written.

image credit: https://xkcd.com/1179/

What time is it?

today()
[1] "2025-11-18"
now()
[1] "2025-11-18 22:37:38 PST"

Creating dates

ymd() and friends create dates

ymd("2025-02-19")
[1] "2025-02-19"
mdy("February 19th, 2025")
[1] "2025-02-19"
dmy("19-Feb-2025")
[1] "2025-02-19"
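Like a factor, a date is a number underneath: R stores a Date as the number of days since January 1, 1970. A short sketch of what that means in practice:

```r
library(lubridate)

d <- ymd("2025-02-19")
class(d)                       # "Date"
as.numeric(ymd("1970-01-02"))  # 1: one day after the origin
# so subtracting two dates gives a difference in days
d - ymd("2025-01-31")          # Time difference of 19 days
```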

… with times

To create a date-time, add an underscore and one or more of “h”, “m”, and “s” to the name of the parsing function:

ymd_hms("2025-02-19 11:45:59", tz = "America/Los_Angeles")
[1] "2025-02-19 11:45:59 PST"
mdy_hm("02/19/2025 15:01")  # default is UTC = GMT
[1] "2025-02-19 15:01:00 UTC"

More information about time zones in R.
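Two lubridate functions are handy once a time zone is attached. A sketch using the date-time above: with_tz() shows the same instant on a different clock, while force_tz() keeps the clock reading and changes which instant it refers to. (In February, Pacific time is UTC minus 8 hours.)

```r
library(lubridate)

x <- ymd_hms("2025-02-19 11:45:59", tz = "America/Los_Angeles")

with_tz(x, "UTC")    # "2025-02-19 19:45:59 UTC": same instant, different clock
force_tz(x, "UTC")   # "2025-02-19 11:45:59 UTC": same clock reading, different instant
```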

lubridate

lubridate is another R package meant for data wrangling!

In particular, lubridate makes it very easy to work with days, times, and dates. The basic idea is to start with dates in a ymd (year month day) format and transform the information into whatever form you need.

Examples from the lubridate vignette.

If anyone drove a time machine, they would crash

Months and years vary in length, so doing arithmetic with them can be unintuitive.

Consider a simple operation: January 31st + one month.

Should the answer be:

  1. February 31st (which doesn’t exist)?
  2. March 3rd (31 days after January 31)?
  3. February 28th (assuming it’s not a leap year)?

A basic property of arithmetic is that a + b - b = a. Only solution 1 obeys the mathematical property, but it is an invalid date. Wickham wants to make lubridate as consistent as possible by invoking the following rule: if adding or subtracting a month or a year creates an invalid date, lubridate will return an NA.

If you thought solution 2 or 3 was more useful, no problem. You can still get those results with clever arithmetic, or by using the special %m+% and %m-% operators. %m+% and %m-% automatically roll dates back to the last day of the month, should that be necessary.
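A sketch of the three answers to January 31st + one month, plus the price of the %m+% rollback: going forward a month and back a month no longer returns to January 31, so a + b - b = a fails.

```r
library(lubridate)

jan31 <- ymd("2025-01-31")
jan31 + months(1)                        # NA: February 31st does not exist
jan31 %m+% months(1)                     # "2025-02-28": rolled back to the end of February
(jan31 %m+% months(1)) %m-% months(1)    # "2025-01-28", not "2025-01-31"
jan31 + days(31)                         # "2025-03-03": adding days never gives NA
```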

basics in lubridate

library(lubridate)
rightnow <- now()
rightnow
[1] "2025-11-18 22:37:38 PST"
day(rightnow)
[1] 18
week(rightnow)
[1] 46
month(rightnow, label=FALSE)
[1] 11
month(rightnow, label=TRUE)
[1] Nov
12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
year(rightnow)
[1] 2025

basics in lubridate

minute(rightnow)
[1] 37
hour(rightnow)
[1] 22
yday(rightnow)
[1] 322
mday(rightnow)
[1] 18
wday(rightnow, label=FALSE)
[1] 3
wday(rightnow, label=TRUE)
[1] Tue
Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
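Because now() changes every time the code runs, here is a sketch with a fixed date-time (the value shown for rightnow above). It also shows the week_start argument of wday(): by default the week starts on Sunday (= 1), so Tuesday is 3, but you can count from Monday instead.

```r
library(lubridate)

# fixed date-time, so the results do not depend on when the code runs
d <- ymd_hms("2025-11-18 22:37:38", tz = "America/Los_Angeles")
wday(d)                  # 3: Sunday = 1, so Tuesday = 3
wday(d, week_start = 1)  # 2: counting from Monday instead
quarter(d)               # 4
hour(d)                  # 22
```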

Working with a date object

jan31 <- ymd("2025-01-31")
jan31 + months(0:11)
 [1] "2025-01-31" NA           "2025-03-31" NA           "2025-05-31"
 [6] NA           "2025-07-31" "2025-08-31" NA           "2025-10-31"
[11] NA           "2025-12-31"
floor_date(jan31, "month")
[1] "2025-01-01"
floor_date(jan31, "month") + months(0:11) + days(31)
 [1] "2025-02-01" "2025-03-04" "2025-04-01" "2025-05-02" "2025-06-01"
 [6] "2025-07-02" "2025-08-01" "2025-09-01" "2025-10-02" "2025-11-01"
[11] "2025-12-02" "2026-01-01"
jan31 + months(0:11) + days(31)
 [1] "2025-03-03" NA           "2025-05-01" NA           "2025-07-01"
 [6] NA           "2025-08-31" "2025-10-01" NA           "2025-12-01"
[11] NA           "2026-01-31"
jan31 %m+% months(0:11)
 [1] "2025-01-31" "2025-02-28" "2025-03-31" "2025-04-30" "2025-05-31"
 [6] "2025-06-30" "2025-07-31" "2025-08-31" "2025-09-30" "2025-10-31"
[11] "2025-11-30" "2025-12-31"

NYC flights

library(nycflights13)
names(flights)
 [1] "year"           "month"          "day"            "dep_time"      
 [5] "sched_dep_time" "dep_delay"      "arr_time"       "sched_arr_time"
 [9] "arr_delay"      "carrier"        "flight"         "tailnum"       
[13] "origin"         "dest"           "air_time"       "distance"      
[17] "hour"           "minute"         "time_hour"     

NYC flights

Creating a date object from variables.

flightsWK <- flights |>  
   mutate(ymdday = ymd(str_c(year, month, day, sep="-"))) |> 
   mutate(weekdy = wday(ymdday, label=TRUE), 
          whichweek = week(ymdday)) 

flightsWK |>  select(year, month, day, ymdday, weekdy, whichweek, 
                     dep_time, arr_time) 
# A tibble: 336,776 × 8
    year month   day ymdday     weekdy whichweek dep_time arr_time
   <int> <int> <int> <date>     <ord>      <dbl>    <int>    <int>
 1  2013     1     1 2013-01-01 Tue            1      517      830
 2  2013     1     1 2013-01-01 Tue            1      533      850
 3  2013     1     1 2013-01-01 Tue            1      542      923
 4  2013     1     1 2013-01-01 Tue            1      544     1004
 5  2013     1     1 2013-01-01 Tue            1      554      812
 6  2013     1     1 2013-01-01 Tue            1      554      740
 7  2013     1     1 2013-01-01 Tue            1      555      913
 8  2013     1     1 2013-01-01 Tue            1      557      709
 9  2013     1     1 2013-01-01 Tue            1      557      838
10  2013     1     1 2013-01-01 Tue            1      558      753
# ℹ 336,766 more rows
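lubridate also has make_date() and make_datetime(), which build dates straight from numeric columns, so you can skip gluing a string together first. A sketch on values copied from the first rows above. It assumes dep_time stores clock times as HHMM integers (517 = 5:17), and it should give the same ymdday column if used as flights |> mutate(ymdday = make_date(year, month, day)).

```r
library(lubridate)

make_date(2013, 1, 1)         # "2013-01-01"
wday(make_date(2013, 1, 1))   # 3 (Tuesday)

# split HHMM into hours (integer division) and minutes (remainder)
dep_time <- c(517, 533, 1004)
dep <- make_datetime(2013, 1, 1, dep_time %/% 100, dep_time %% 100)
dep   # "2013-01-01 05:17:00 UTC" "2013-01-01 05:33:00 UTC" "2013-01-01 10:04:00 UTC"
```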