Variable Types

September 22 + 24, 2025

Jo Hardin

Variable Types

Some variable types:

character strings
factor variables
dates

numeric (decimal)
integer
logical (Boolean)

A variable’s type determines the values that the variable can take on and the operations that can be performed on it. Specifying variable types helps protect the data’s integrity and can improve performance.
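To make the list concrete, here is a small base-R sketch. The variable names are made up for illustration. It creates one value of each type, asks R for its class, and shows that the type controls which operations work.

```r
# Hypothetical example values, one per variable type listed above
chr <- "banana"               # character string
fct <- factor("bakery")       # factor
dt  <- as.Date("2025-09-22")  # date
num <- 3.14                   # numeric (double)
int <- 5L                     # integer (the L suffix makes it an integer)
lgl <- TRUE                   # logical

sapply(list(chr, fct, dt, num, int, lgl), function(v) class(v)[1])
# "character" "factor" "Date" "numeric" "integer" "logical"

# The type decides which operations make sense
num + 1                # arithmetic works on numbers
try("2" + 1)           # error: non-numeric argument to binary operator
as.numeric("2") + 1    # convert first, then it works: 3
dt + 7                 # adding to a Date moves it forward 7 days
```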

Agenda 9/22/25

Character strings
str_*() functions

Character strings

When working with character strings, we might want to detect, replace, or extract certain patterns.

Strings are objects of the character class (abbreviated as <chr> in tibbles). When you print out strings, they display with double quotes:

some_string <- "banana"
some_string

[1] "banana"

Creating strings

Creating strings by hand is useful for testing out regular expressions.

To create a string, type any text in either double quotes " or single quotes '. The choice only matters when the string itself contains quotes; in that case, wrap it in the other kind.

string1 <- "This is a string"
string2 <- 'If I want to include a "quote" inside a string, I use single quotes'

string1

[1] "This is a string"

string2

[1] "If I want to include a \"quote\" inside a string, I use single quotes"
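The backslashes in the printed output are only how R displays an escaped quote; the string itself contains plain quotes. A short sketch (the names `s1`, `s2`, `s3` are made up) of three equivalent ways to write such a string, assuming R 4.0 or later for the raw-string form:

```r
library(stringr)

# Three ways to put double quotes inside a string
s1 <- 'a "quote" inside'       # single quotes outside
s2 <- "a \"quote\" inside"     # escape the inner quotes with a backslash
s3 <- r"(a "quote" inside)"    # raw string (R >= 4.0): no escaping needed

identical(s1, s2)  # TRUE
identical(s1, s3)  # TRUE

# print() shows the escapes; str_view() and writeLines() show the text itself
print(s2)
str_view(s2)
writeLines(s2)
```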

`str_view()`

We can view these strings “naturally” (without the opening and closing quotes) with str_view():

str_view(string1)

[1] │ This is a string

str_view(string2)

[1] │ If I want to include a "quote" inside a string, I use single quotes

Working with strings

yes, lots of code to learn
no, the code doesn’t really matter
learning goal for today:

what can you even do with string variables???

`str_c`

Similar to paste() (gluing strings together), but works well in a tidy pipeline. Unlike paste(), str_c() returns NA whenever one of its inputs is NA (see row 4).

df <- tibble(name = c("Flora", "David", "Terra", NA))
df |> mutate(greeting = str_c("Hi ", name, "!"))

# A tibble: 4 × 2
  name  greeting 
  <chr> <chr>    
1 Flora Hi Flora!
2 David Hi David!
3 Terra Hi Terra!
4 <NA>  <NA>
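A side-by-side sketch of how paste0() and str_c() treat a missing value, plus the `collapse` argument listed in the stringr table. The vector `name` is a made-up two-element example.

```r
library(stringr)

name <- c("Flora", NA)

paste0("Hi ", name, "!")  # "Hi Flora!" "Hi NA!"  (NA becomes the text "NA")
str_c("Hi ", name, "!")   # "Hi Flora!" NA        (NA stays missing)

# collapse glues the elements of one vector into a single string
str_c(c("Flora", "David", "Terra"), collapse = ", ")  # "Flora, David, Terra"
```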

`str_sub()`

str_sub(string, start, end) will extract parts of a string where start and end are the positions where the substring starts and ends.

fruits <- c("Apple", "Banana", "Pear")
str_sub(fruits, 1, 3)

[1] "App" "Ban" "Pea"

str_sub(fruits, -3, -1)

[1] "ple" "ana" "ear"

str_sub() won’t fail if the string is too short; it returns whatever characters exist.

str_sub(fruits, 1, 5)

[1] "Apple" "Banan" "Pear"
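str_sub() can also sit on the left of `<-` to overwrite part of each string. A minimal sketch that lower-cases just the first letter of each fruit:

```r
library(stringr)

fruits <- c("Apple", "Banana", "Pear")

# Replace characters 1 through 1 of each string with their lower-case version
str_sub(fruits, 1, 1) <- str_to_lower(str_sub(fruits, 1, 1))
fruits  # "apple" "banana" "pear"
```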

`str_sub()` in a pipeline

We can use the str_*() functions inside the mutate() function.

titanic |> 
  mutate(class1 = str_sub(Class, 1, 1))

   Class    Sex   Age Survived Freq class1
1    1st   Male Child       No    0      1
2    2nd   Male Child       No    0      2
3    3rd   Male Child       No   35      3
4   Crew   Male Child       No    0      C
5    1st Female Child       No    0      1
6    2nd Female Child       No    0      2
7    3rd Female Child       No   17      3
8   Crew Female Child       No    0      C
9    1st   Male Adult       No  118      1
10   2nd   Male Adult       No  154      2
11   3rd   Male Adult       No  387      3
12  Crew   Male Adult       No  670      C
13   1st Female Adult       No    4      1
14   2nd Female Adult       No   13      2
15   3rd Female Adult       No   89      3
16  Crew Female Adult       No    3      C
17   1st   Male Child      Yes    5      1
18   2nd   Male Child      Yes   11      2
19   3rd   Male Child      Yes   13      3
20  Crew   Male Child      Yes    0      C
21   1st Female Child      Yes    1      1
22   2nd Female Child      Yes   13      2
23   3rd Female Child      Yes   14      3
24  Crew Female Child      Yes    0      C
25   1st   Male Adult      Yes   57      1
26   2nd   Male Adult      Yes   14      2
27   3rd   Male Adult      Yes   75      3
28  Crew   Male Adult      Yes  192      C
29   1st Female Adult      Yes  140      1
30   2nd Female Adult      Yes   80      2
31   3rd Female Adult      Yes   76      3
32  Crew Female Adult      Yes   20      C
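The slide does not show where `titanic` comes from. Its columns and rows match base R's built-in `Titanic` table flattened with `as.data.frame()`, so the sketch below assumes that source (the name `class_totals` is made up). Because `Class` is a factor there, it also previews that `str_*()` functions work on factors. Here the new `class1` column is used to total passengers and crew by class:

```r
library(dplyr)
library(stringr)

# Assumption: titanic is base R's Titanic table, one row per cell
titanic <- as.data.frame(Titanic)

class_totals <- titanic |>
  mutate(class1 = str_sub(Class, 1, 1)) |>
  count(class1, wt = Freq)   # sum Freq within each first letter

class_totals
```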

`str_replace*()`

str_replace() replaces the first match of a pattern. str_replace_all() replaces all the matches of a pattern.

fruits

[1] "Apple"  "Banana" "Pear"

str_replace(fruits, "a", "x")

[1] "Apple"  "Bxnana" "Pexr"

str_replace_all(fruits, "a", "x")

[1] "Apple"  "Bxnxnx" "Pexr"
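The pattern is a regular expression, so "a" never matches the capital A in "Apple". Two sketches that match either case: a character class, and stringr's `regex()` with `ignore_case = TRUE`.

```r
library(stringr)

fruits <- c("Apple", "Banana", "Pear")

# [aA] matches either a lower-case or an upper-case a
str_replace_all(fruits, "[aA]", "x")                          # "xpple" "Bxnxnx" "Pexr"

# Same result using regex()'s ignore_case option
str_replace_all(fruits, regex("a", ignore_case = TRUE), "x")  # "xpple" "Bxnxnx" "Pexr"
```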

`str_detect()`

fruits

[1] "Apple"  "Banana" "Pear"

str_detect(fruits, "a")

[1] FALSE  TRUE  TRUE

str_detect(fruits, "A")

[1]  TRUE FALSE FALSE
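Because str_detect() returns a logical vector, it combines naturally with `negate` and with sum() or mean(). A small sketch:

```r
library(stringr)

fruits <- c("Apple", "Banana", "Pear")

str_detect(fruits, "a", negate = TRUE)  # TRUE FALSE FALSE: strings WITHOUT "a"
sum(str_detect(fruits, "a"))            # 2: how many strings contain "a"
mean(str_detect(fruits, "a"))           # proportion containing "a" (2 of 3)
```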

`str_detect()` in pipeline

str_detect() used in a filter() pipeline.

original data
unnested data
filtered unnested data

starwars |> 
  select(name, films)

# A tibble: 87 × 2
   name               films    
   <chr>              <list>   
 1 Luke Skywalker     <chr [5]>
 2 C-3PO              <chr [6]>
 3 R2-D2              <chr [7]>
 4 Darth Vader        <chr [4]>
 5 Leia Organa        <chr [5]>
 6 Owen Lars          <chr [3]>
 7 Beru Whitesun Lars <chr [3]>
 8 R5-D4              <chr [1]>
 9 Biggs Darklighter  <chr [1]>
10 Obi-Wan Kenobi     <chr [6]>
# ℹ 77 more rows

starwars |> 
  select(name, films) |> 
  unnest_wider(films, names_sep = "")

# A tibble: 87 × 8
   name               films1 films2 films3 films4 films5 films6 films7
   <chr>              <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr> 
 1 Luke Skywalker     A New… The E… Retur… Reven… The F… <NA>   <NA>  
 2 C-3PO              A New… The E… Retur… The P… Attac… Reven… <NA>  
 3 R2-D2              A New… The E… Retur… The P… Attac… Reven… The F…
 4 Darth Vader        A New… The E… Retur… Reven… <NA>   <NA>   <NA>  
 5 Leia Organa        A New… The E… Retur… Reven… The F… <NA>   <NA>  
 6 Owen Lars          A New… Attac… Reven… <NA>   <NA>   <NA>   <NA>  
 7 Beru Whitesun Lars A New… Attac… Reven… <NA>   <NA>   <NA>   <NA>  
 8 R5-D4              A New… <NA>   <NA>   <NA>   <NA>   <NA>   <NA>  
 9 Biggs Darklighter  A New… <NA>   <NA>   <NA>   <NA>   <NA>   <NA>  
10 Obi-Wan Kenobi     A New… The E… Retur… The P… Attac… Reven… <NA>  
# ℹ 77 more rows

starwars |> 
  filter(str_detect(films, "Empire")) |> 
  select(name, films) |> 
  unnest_wider(films, names_sep = "")

# A tibble: 16 × 8
   name             films1   films2 films3 films4 films5 films6 films7
   <chr>            <chr>    <chr>  <chr>  <chr>  <chr>  <chr>  <chr> 
 1 Luke Skywalker   A New H… The E… Retur… Reven… The F… <NA>   <NA>  
 2 C-3PO            A New H… The E… Retur… The P… Attac… Reven… <NA>  
 3 R2-D2            A New H… The E… Retur… The P… Attac… Reven… The F…
 4 Darth Vader      A New H… The E… Retur… Reven… <NA>   <NA>   <NA>  
 5 Leia Organa      A New H… The E… Retur… Reven… The F… <NA>   <NA>  
 6 Obi-Wan Kenobi   A New H… The E… Retur… The P… Attac… Reven… <NA>  
 7 Chewbacca        A New H… The E… Retur… Reven… The F… <NA>   <NA>  
 8 Han Solo         A New H… The E… Retur… The F… <NA>   <NA>   <NA>  
 9 Wedge Antilles   A New H… The E… Retur… <NA>   <NA>   <NA>   <NA>  
10 Yoda             The Emp… Retur… The P… Attac… Reven… <NA>   <NA>  
11 Palpatine        The Emp… Retur… The P… Attac… Reven… <NA>   <NA>  
12 Boba Fett        The Emp… Retur… Attac… <NA>   <NA>   <NA>   <NA>  
13 IG-88            The Emp… <NA>   <NA>   <NA>   <NA>   <NA>   <NA>  
14 Bossk            The Emp… <NA>   <NA>   <NA>   <NA>   <NA>   <NA>  
15 Lando Calrissian The Emp… Retur… <NA>   <NA>   <NA>   <NA>   <NA>  
16 Lobot            The Emp… <NA>   <NA>   <NA>   <NA>   <NA>   <NA>
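Here `films` is a list-column (each row holds a character vector), while str_detect() is designed for character vectors, so the filter above appears to rely on the list being coerced to one string per row. A more explicit sketch, assuming purrr is available, checks each row's vector of titles directly (the name `empire` is made up):

```r
library(dplyr)
library(purrr)
library(stringr)

# For each row, check whether ANY of that character's films mentions "Empire"
empire <- starwars |>
  filter(map_lgl(films, function(f) any(str_detect(f, "Empire")))) |>
  select(name, films)

empire
```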

`str_length()`

Returns the number of characters in each string. Every letter, digit, space, or punctuation mark counts as one.

x <- c("apple", "banana", "cherry pie", "")

str_length(x)

[1]  5  6 10  0
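Spaces count, which matters for messy data like the Starbucks item "Butter Croissant " (note its trailing space). A sketch using stringr's str_trim(), which is not in the table below but is part of stringr:

```r
library(stringr)

item <- "Butter Croissant "   # trailing space, as in the Starbucks data
str_length(item)              # 17: the trailing space counts
str_length(str_trim(item))    # 16 after str_trim() removes it
str_length(NA)                # NA: a missing string has no length
```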

`str_count()`

Counts the number of times the pattern is found within each element of the string vector.

x <- c("apple pie apple tart", "banana apple", "cherry")

str_count(x, "apple")

[1] 2 1 0
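The pattern is a regular expression, which is handy (counting any vowel) but can surprise you (a period matches any character). A sketch of both, using stringr's `fixed()` to match a literal period:

```r
library(stringr)

x <- c("apple pie apple tart", "banana apple", "cherry")

# [aeiou] matches any one vowel
str_count(x, "[aeiou]")          # 7 5 1

# "." is a regex wildcard, so it matches every character...
str_count("a.b.c", ".")          # 5
# ...wrap the pattern in fixed() to count a literal period
str_count("a.b.c", fixed("."))   # 2
```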

`str_extract*()`

str_extract() extracts the first complete match from each string; str_extract_all() extracts all matches from each string.

x <- c("apple pie apple tart", "banana apple", "cherry")


str_extract(x, "a")

[1] "a" "a" NA

str_extract_all(x, "a")

[[1]]
[1] "a" "a" "a"

[[2]]
[1] "a" "a" "a" "a"

[[3]]
character(0)

str_extract(x, "apple")

[1] "apple" "apple" NA

str_extract_all(x, "apple")

[[1]]
[1] "apple" "apple"

[[2]]
[1] "apple"

[[3]]
character(0)
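A common use is pulling numbers out of text. In the sketch below the strings are made up (the calorie values are borrowed from the Starbucks sample later in these notes); `\\d+` means "one or more digits", and the result is still character until you convert it.

```r
library(stringr)

# Hypothetical menu descriptions
desc <- c("Morning Bun: 350 calories", "Red Velvet Whoopie Pie: 190 calories", "Water")

cal_chr <- str_extract(desc, "\\d+")  # "350" "190" NA
as.numeric(cal_chr)                   # 350 190 NA
```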

stringr functions

The stringr package within tidyverse contains lots of functions to help process strings. Letting x be a string variable…

str function	arguments	returns
`str_view()`	`x`	the string, displayed without quotes
`str_c()`	…, `sep`, `collapse`	a new concatenated string
`str_sub()`	`x`, `start`, `end`	a modified string
`str_replace()`	`x`, `pattern`, `replacement`	a modified string
`str_replace_all()`	`x`, `pattern`, `replacement`	a modified string
`str_detect()`	`x`, `pattern`	a logical vector (TRUE/FALSE)
`str_to_lower()`	`x`	a modified string
`str_to_upper()`	`x`	a modified string
`str_length()`	`x`	an integer vector
`str_count()`	`x`, `pattern`	an integer vector
`str_extract()`	`x`, `pattern`	a character vector
`str_extract_all()`	`x`, `pattern`	a list of character vectors

Use the stringr cheat sheet.

`str_*()` functions on non-strings?

Do the functions that were built to handle strings work if the variable is not a string?

Starbucks data

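Yes. The str_*() functions first coerce their input to character, so they also work on factor, numeric (double), and integer variables. Below: the data, then one example each on a factor (type), a numeric variable (fat), and an integer variable (calories).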

set.seed(47)
starbucks |> 
  sample_n(10) |> 
  select(item, calories, fat, type)

# A tibble: 10 × 4
   item                          calories   fat type    
   <chr>                            <int> <dbl> <fct>   
 1 "Morning Bun"                      350    16 bakery  
 2 "Red Velvet Whoopie Pie"           190    11 petite  
 3 "Chonga Bagel"                     310     5 bakery  
 4 "8-Grain Roll"                     350     8 bakery  
 5 "Marshmallow Dream Bar"            210     4 bakery  
 6 "Chocolate Croissant"              300    17 bakery  
 7 "Mallorca Sweet Bread"             420    25 bakery  
 8 "Ham & Swiss Panini"               360     9 sandwich
 9 "Butter Croissant "                310    18 bakery  
10 "Chocolate Creme Whoopie Pie"      190    11 petite

set.seed(47)
starbucks |> 
  sample_n(10) |> 
  select(item, calories, fat, type) |> 
  mutate(TYPE = str_to_upper(type))

# A tibble: 10 × 5
   item                          calories   fat type     TYPE    
   <chr>                            <int> <dbl> <fct>    <chr>   
 1 "Morning Bun"                      350    16 bakery   BAKERY  
 2 "Red Velvet Whoopie Pie"           190    11 petite   PETITE  
 3 "Chonga Bagel"                     310     5 bakery   BAKERY  
 4 "8-Grain Roll"                     350     8 bakery   BAKERY  
 5 "Marshmallow Dream Bar"            210     4 bakery   BAKERY  
 6 "Chocolate Croissant"              300    17 bakery   BAKERY  
 7 "Mallorca Sweet Bread"             420    25 bakery   BAKERY  
 8 "Ham & Swiss Panini"               360     9 sandwich SANDWICH
 9 "Butter Croissant "                310    18 bakery   BAKERY  
10 "Chocolate Creme Whoopie Pie"      190    11 petite   PETITE

set.seed(47)
starbucks |> 
  sample_n(10) |> 
  select(item, calories, fat, type) |> 
  filter(str_detect(fat, "1"))

# A tibble: 5 × 4
  item                          calories   fat type  
  <chr>                            <int> <dbl> <fct> 
1 "Morning Bun"                      350    16 bakery
2 "Red Velvet Whoopie Pie"           190    11 petite
3 "Chocolate Croissant"              300    17 bakery
4 "Butter Croissant "                310    18 bakery
5 "Chocolate Creme Whoopie Pie"      190    11 petite

set.seed(47)
starbucks |> 
  sample_n(10) |> 
  select(item, calories, fat, type) |> 
  mutate(newcal = str_replace(calories, "0", "&"))

# A tibble: 10 × 5
   item                          calories   fat type     newcal
   <chr>                            <int> <dbl> <fct>    <chr> 
 1 "Morning Bun"                      350    16 bakery   35&   
 2 "Red Velvet Whoopie Pie"           190    11 petite   19&   
 3 "Chonga Bagel"                     310     5 bakery   31&   
 4 "8-Grain Roll"                     350     8 bakery   35&   
 5 "Marshmallow Dream Bar"            210     4 bakery   21&   
 6 "Chocolate Croissant"              300    17 bakery   3&0   
 7 "Mallorca Sweet Bread"             420    25 bakery   42&   
 8 "Ham & Swiss Panini"               360     9 sandwich 36&   
 9 "Butter Croissant "                310    18 bakery   31&   
10 "Chocolate Creme Whoopie Pie"      190    11 petite   19&
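Because the number is coerced to text first, pattern matching on digits is not the same as a numeric comparison, and the result of str_replace() is character. A sketch with made-up fat values:

```r
library(stringr)

fat <- c(16, 5, 21, 1.5)   # hypothetical values

as.character(fat)          # "16" "5" "21" "1.5": what str_detect() actually sees
str_detect(fat, "1")       # TRUE FALSE TRUE TRUE: any "1" character anywhere
fat >= 10 & fat < 20       # TRUE FALSE FALSE FALSE: the numeric "teens" question

class(str_replace(350L, "0", "&"))  # "character", no longer an integer
```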

Agenda 9/24/25

Factor variables, fct_*() functions
Time and date objects, lubridate package

Factor variables

Factor variables look like character strings, but R actually stores them as integers (?!?!!?), each paired with a label called a level. Tibbles abbreviate them as <fct>.

categorical variable
represented in discrete levels with an ordering
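A base-R sketch of "discrete levels with an ordering", using made-up survey answers: the `levels` argument sets the order, the stored values are integer codes, and `ordered = TRUE` lets R compare levels.

```r
# Hypothetical survey answers with an explicit level order
answer <- factor(c("low", "high", "medium", "low"),
                 levels = c("low", "medium", "high"))
levels(answer)       # "low" "medium" "high" (the order we gave, not alphabetical)
as.integer(answer)   # 1 3 2 1: the stored integer codes
table(answer)        # counts are listed in level order

# ordered = TRUE lets R compare levels with < and >
size <- factor(c("low", "high"), levels = c("low", "medium", "high"), ordered = TRUE)
size < "medium"      # TRUE FALSE
```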

Where do we order?

The ordering of the factor variable is important in:

plots (e.g., barplots)
tables (e.g., group_by())
modeling (e.g., the baseline level in a linear regression)

Order matters

SurveyUSA poll from 2012 on views of the DREAM Act.

What is off about the data viz part of the report?

Data
Plot
Levels

openintro::dream

# A tibble: 910 × 2
   ideology     stance
   <fct>        <fct> 
 1 Conservative Yes   
 2 Conservative Yes   
 3 Conservative Yes   
 4 Conservative Yes   
 5 Conservative Yes   
 6 Conservative Yes   
 7 Conservative Yes   
 8 Conservative Yes   
 9 Conservative Yes   
10 Conservative Yes   
# ℹ 900 more rows

dream |> 
  ggplot(aes(x = ideology, fill = stance)) + 
  geom_bar()

The levels() function reports the levels and their order.

dream |> 
  select(ideology) |> 
  pull() |>  # because levels() works only on vectors, not data frames
  levels()

[1] "Conservative" "Liberal"      "Moderate"

Change the order

We can fix the order of the ideology variable.

Code
Plot

dream |> 
  mutate(ideology = fct_relevel(ideology, 
                                c("Liberal", "Moderate", "Conservative"))) |> 
  ggplot(aes(x = ideology, fill = stance)) + 
  geom_bar()
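The same fix on a tiny made-up vector, so you can see the levels change without needing the dream data. The default order is alphabetical; fct_relevel() sets it by hand, and fct_rev() flips it.

```r
library(forcats)

# Hypothetical mini version of the ideology column
ideology <- factor(c("Moderate", "Liberal", "Conservative", "Liberal"))
levels(ideology)   # "Conservative" "Liberal" "Moderate": alphabetical by default

ideology2 <- fct_relevel(ideology, "Liberal", "Moderate", "Conservative")
levels(ideology2)  # "Liberal" "Moderate" "Conservative"
ideology2          # the data values themselves are unchanged

levels(fct_rev(ideology2))  # "Conservative" "Moderate" "Liberal"
```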

Factor and character variables

starbucks |> 
  select(item, type, calories)

# A tibble: 77 × 3
   item                          type   calories
   <chr>                         <fct>     <int>
 1 "8-Grain Roll"                bakery      350
 2 "Apple Bran Muffin"           bakery      350
 3 "Apple Fritter"               bakery      420
 4 "Banana Nut Loaf"             bakery      490
 5 "Birthday Cake Mini Doughnut" bakery      130
 6 "Blueberry Oat Bar"           bakery      370
 7 "Blueberry Scone"             bakery      460
 8 "Bountiful Blueberry Muffin"  bakery      370
 9 "Butter Croissant "           bakery      310
10 "Cheese Danish"               bakery      420
# ℹ 67 more rows

Reorder according to another variable

Let’s say that we want to order the type of food item based on the average number of calories in that type.

Code
Plot
For funzies

starbucks |> 
  mutate(type = fct_reorder(type, calories, .fun = "mean", .desc = TRUE)) |> 
  ggplot(aes(x = type, y = calories)) + 
  geom_point() + 
  labs(x = "type of food",
       y = "",
       title = "Calories for food items at Starbucks")

starbucks |> 
  mutate(type = fct_reorder(fiber, calories, .fun = "mean", .desc = TRUE))

Error in `mutate()`:
ℹ In argument: `type = fct_reorder(fiber, calories, .fun = "mean", .desc
  = TRUE)`.
Caused by error in `fct_reorder()`:
! `.f` must be a factor or character vector, not an integer vector.
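What fct_reorder() computes, on a tiny made-up dataset: it summarizes `calories` within each level (here with mean) and sorts the levels by that summary. The error above arises only because `fiber` is an integer, not a factor or character vector.

```r
library(forcats)

# Hypothetical mini data
type     <- factor(c("bakery", "petite", "bakery", "sandwich", "petite"))
calories <- c(350, 190, 310, 360, 210)

# Means by type: bakery 330, petite 200, sandwich 360
levels(fct_reorder(type, calories, .fun = mean, .desc = TRUE))
# "sandwich" "bakery" "petite"
```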

What really are levels?

recode
rewrite
mistake
NULL
special characters

x <- factor(c("apple", "bear", "dear", "banana", "apple", "apple", "dear"))
x

[1] apple  bear   dear   banana apple  apple  dear  
Levels: apple banana bear dear

unclass(x)

[1] 1 3 4 2 1 1 4
attr(,"levels")
[1] "apple"  "banana" "bear"   "dear"

x <- fct_recode(x, fruit = "apple", fruit = "banana")
x

[1] fruit bear  dear  fruit fruit fruit dear 
Levels: fruit bear dear

unclass(x)

[1] 1 2 3 1 1 1 3
attr(,"levels")
[1] "fruit" "bear"  "dear"

new_fruit <- data.frame(words = x) |> 
  mutate(x_recode = fct_recode(words, fruit = "apple", fruit = "banana"),
         x_rewrite = ifelse(words == "apple", "fruit",
                            ifelse(words == "banana", "fruit", words))) 

new_fruit

  words x_recode x_rewrite
1 fruit    fruit         1
2  bear     bear         2
3  dear     dear         3
4 fruit    fruit         1
5 fruit    fruit         1
6 fruit    fruit         1
7  dear     dear         3

new_fruit |> 
  str()

'data.frame':   7 obs. of  3 variables:
 $ words    : Factor w/ 3 levels "fruit","bear",..: 1 2 3 1 1 1 3
 $ x_recode : Factor w/ 3 levels "fruit","bear",..: 1 2 3 1 1 1 3
 $ x_rewrite: int  1 2 3 1 1 1 3

# If you make a mistake you'll get a warning
fct_recode(x, fruit = "apple", fruit = "bananana")

Warning: Unknown levels in `f`: apple, bananana

[1] fruit bear  dear  fruit fruit fruit dear 
Levels: fruit bear dear

# If you name the level NULL it will be removed
fct_recode(x, NULL = "apple", fruit = "banana")

[1] fruit bear  dear  fruit fruit fruit dear 
Levels: fruit bear dear

# Wrap the left-hand side in quotes if it contains special characters (such as spaces)
fct_recode(x, "an apple" = "apple", "a bear" = "bear")

[1] fruit  a bear dear   fruit  fruit  fruit  dear  
Levels: fruit a bear dear
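In the examples above, `x` had already been recoded, so "apple" and "banana" no longer exist and the NULL example leaves `x` unchanged. A sketch on a fresh copy of the original factor (the names `y` and `z` are made up) shows what naming a level NULL actually does:

```r
library(forcats)

y <- factor(c("apple", "bear", "dear", "banana", "apple"))

z <- fct_recode(y, NULL = "apple", fruit = "banana")
z          # the two "apple" values become NA
levels(z)  # "apple" is gone from the levels
```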

Change factor to character

data
as.character()
as.numeric()
ifelse()

f <- factor(c("cat", "dog", "cat"))
f

[1] cat dog cat
Levels: cat dog

unclass(f)

[1] 1 2 1
attr(,"levels")
[1] "cat" "dog"

… produces the labels because that is how the function is defined.

as.character(f)

[1] "cat" "dog" "cat"

… produces the integer codes because that is how the function is defined.

as.numeric(f)

[1] 1 2 1

… coerces everything to the lowest common denominator. That is, it strips the labels off the factor variable, uses the integers, and converts them to character strings.

ifelse(f == "cat", "meow", f)

[1] "meow" "2"    "meow"
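Two safer patterns, sketched in base R: convert to character before ifelse(), and when a factor holds numbers, go through as.character() before as.numeric(). The factor `g` is a made-up example.

```r
f <- factor(c("cat", "dog", "cat"))

# Convert to character first so ifelse() keeps the labels
ifelse(f == "cat", "meow", as.character(f))  # "meow" "dog" "meow"

# Same trap with numbers stored as a factor: as.numeric() returns the codes
g <- factor(c("10", "5", "20"))
levels(g)                    # "10" "20" "5" (sorted as text)
as.numeric(g)                # 1 3 2: the codes, not the numbers
as.numeric(as.character(g))  # 10 5 20: labels first, then numbers
```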

Change character to factor

OG data
New factor

starbucks |> 
  select(item, calories, type)

# A tibble: 77 × 3
   item                          calories type  
   <chr>                            <int> <fct> 
 1 "8-Grain Roll"                     350 bakery
 2 "Apple Bran Muffin"                350 bakery
 3 "Apple Fritter"                    420 bakery
 4 "Banana Nut Loaf"                  490 bakery
 5 "Birthday Cake Mini Doughnut"      130 bakery
 6 "Blueberry Oat Bar"                370 bakery
 7 "Blueberry Scone"                  460 bakery
 8 "Bountiful Blueberry Muffin"       370 bakery
 9 "Butter Croissant "                310 bakery
10 "Cheese Danish"                    420 bakery
# ℹ 67 more rows

starbucks |> 
  mutate(item_fac = as.factor(item)) |> 
  select(item, calories, type, item_fac)

# A tibble: 77 × 4
   item                          calories type   item_fac             
   <chr>                            <int> <fct>  <fct>                
 1 "8-Grain Roll"                     350 bakery "8-Grain Roll"       
 2 "Apple Bran Muffin"                350 bakery "Apple Bran Muffin"  
 3 "Apple Fritter"                    420 bakery "Apple Fritter"      
 4 "Banana Nut Loaf"                  490 bakery "Banana Nut Loaf"    
 5 "Birthday Cake Mini Doughnut"      130 bakery "Birthday Cake Mini …
 6 "Blueberry Oat Bar"                370 bakery "Blueberry Oat Bar"  
 7 "Blueberry Scone"                  460 bakery "Blueberry Scone"    
 8 "Bountiful Blueberry Muffin"       370 bakery "Bountiful Blueberry…
 9 "Butter Croissant "                310 bakery "Butter Croissant "  
10 "Cheese Danish"                    420 bakery "Cheese Danish"      
# ℹ 67 more rows
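Level order is the one thing to watch when converting: base R's as.factor() sorts the levels alphabetically, while forcats' as_factor() keeps them in order of first appearance. A sketch with three item names from the Starbucks sample:

```r
library(forcats)

items <- c("Morning Bun", "Chonga Bagel", "8-Grain Roll")

levels(as.factor(items))  # base R: levels sorted alphabetically
levels(as_factor(items))  # forcats: "Morning Bun" "Chonga Bagel" "8-Grain Roll"
```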

forcats functions

The forcats package within tidyverse contains lots of functions to help process factor variables. Use the forcats cheat sheet. We’ll focus on the most common functions.

functions for changing the order of factor levels
- fct_relevel() = manually reorder levels
- fct_reorder() = reorder levels according to values of another variable
- fct_infreq() = order levels from highest to lowest frequency
- fct_rev() = reverse the current order
functions for changing the labels or values of factor levels
- fct_recode() = manually change levels
- fct_lump() = group together least common levels
- fct_other() = manually replace some levels with “other”
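A quick sketch of four of these on a made-up factor where "d" is the most common level and "a" the least. fct_lump_n() is the current forcats name for lumping to the n most common levels.

```r
library(forcats)

# Hypothetical factor: a appears once, b twice, c three times, d four times
f <- factor(c("a", "b", "b", "c", "c", "c", "d", "d", "d", "d"))

levels(fct_infreq(f))                    # "d" "c" "b" "a": most to least frequent
levels(fct_rev(fct_infreq(f)))           # "a" "b" "c" "d"
table(fct_lump_n(f, n = 2))              # keeps c and d; a and b become "Other"
table(fct_other(f, keep = c("a", "b")))  # keeps a and b; c and d become "Other"
```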

Time and date objects

If the variable is formatted as a time or date object, you will find that there are very convenient ways to access, wrangle, and plot the information.

Types of time and date objects

There are 3 types of date/time data that refer to an instant in time:

A date. Tibbles print this as <date>.

A time within a day. Tibbles print this as <time>.

A date-time is a date plus a time: it uniquely identifies an instant in time (typically to the nearest second). Tibbles print this as <dttm>. Base R calls these POSIXct, but that doesn’t exactly flow off the tongue.
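Like factors, dates and date-times are numbers underneath, with a class that tells R how to print them. A base-R sketch (variable names are made up):

```r
d <- as.Date("2025-09-22")
class(d)                         # "Date"
unclass(as.Date("1970-01-02"))   # 1: days since 1970-01-01

tm <- as.POSIXct("1970-01-01 00:01:00", tz = "UTC")
class(tm)                        # "POSIXct" "POSIXt" (a <dttm> in a tibble)
as.numeric(tm)                   # 60: seconds since 1970-01-01 00:00:00 UTC

d + 1                            # "2025-09-23": arithmetic on a Date counts days
```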

Formatting time variables

xkcd comic listing the many different ways that date objects can be written.

image credit: https://xkcd.com/1179/

What time is it?

today()

[1] "2025-11-18"

now()

[1] "2025-11-18 22:37:38 PST"

Creating dates

ymd() and friends create dates

ymd("2025-02-19")

[1] "2025-02-19"

mdy("February 19th, 2025")

[1] "2025-02-19"

dmy("19-Feb-2025")

[1] "2025-02-19"
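The function name only describes the order of year, month, and day; the separators can vary. A sketch:

```r
library(lubridate)

# ymd() cares about the order of the pieces, not the separators
d <- ymd(c("2025-02-19", "2025/02/20", "20250221"))
d         # "2025-02-19" "2025-02-20" "2025-02-21"
class(d)  # "Date"
```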

… with times

To create a date-time, add an underscore and one or more of “h”, “m”, and “s” to the name of the parsing function:

ymd_hms("2025-02-19 11:45:59", tz = "America/Los_Angeles")

[1] "2025-02-19 11:45:59 PST"

mdy_hm("02/19/2025 15:01")  # default is UTC = GMT

[1] "2025-02-19 15:01:00 UTC"
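Once a date-time has a time zone, lubridate offers two different "change the time zone" operations: with_tz() keeps the instant and changes the clock display, while force_tz() keeps the clock time and changes the instant. A sketch (names like `t_la` are made up; Los Angeles is UTC-8 in February):

```r
library(lubridate)

t_la <- ymd_hms("2025-02-19 11:45:59", tz = "America/Los_Angeles")

t_utc <- with_tz(t_la, "UTC")      # same instant, shown on a UTC clock
t_utc                              # "2025-02-19 19:45:59 UTC"
t_la == t_utc                      # TRUE: it is the same moment

t_forced <- force_tz(t_la, "UTC")  # same clock time, relabeled as UTC
t_forced                           # "2025-02-19 11:45:59 UTC"
t_la == t_forced                   # FALSE: 8 hours apart
```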

More information about time zones in R.

lubridate

lubridate is another R package meant for data wrangling!

In particular, lubridate makes it very easy to work with days, times, and dates. The basic idea is to start with dates in a ymd (year month day) format and transform the information into whatever you want.

Examples from the lubridate vignette.

If anyone drove a time machine, they would crash

The lengths of months and years vary so much that doing arithmetic with them can be unintuitive.

Consider a simple operation: January 31st + one month.

Should the answer be:

February 31st (which doesn’t exist)?
March 3rd (31 days after January 31)?
February 28th (assuming it’s not a leap year)?

A basic property of arithmetic is that a + b - b = a. Only solution 1 obeys the mathematical property, but it is an invalid date. Wickham wants to make lubridate as consistent as possible by invoking the following rule: if adding or subtracting a month or a year creates an invalid date, lubridate will return an NA.

If you thought solution 2 or 3 was more useful, no problem. You can still get those results with clever arithmetic, or by using the special %m+% and %m-% operators. %m+% and %m-% automatically roll dates back to the last day of the month (solution 3), should that be necessary.
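A sketch of all three answers for January 31st + one month, and of why rolling back gives up a + b - b = a:

```r
library(lubridate)

jan31 <- ymd("2025-01-31")

jan31 + months(1)     # NA: February 31 does not exist (solution 1)
jan31 + days(31)      # "2025-03-03": 31 days later (solution 2)
jan31 %m+% months(1)  # "2025-02-28": rolled back to the last day of Feb (solution 3)

# Rolling back breaks a + b - b = a
(jan31 %m+% months(1)) %m-% months(1)  # "2025-01-28", not January 31
```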

basics in `lubridate`

library(lubridate)
rightnow <- now()
rightnow

[1] "2025-11-18 22:37:38 PST"

day(rightnow)

[1] 18

week(rightnow)

[1] 46

month(rightnow, label=FALSE)

[1] 11

month(rightnow, label=TRUE)

[1] Nov
12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec

year(rightnow)

[1] 2025

basics in `lubridate`

minute(rightnow)

[1] 37

hour(rightnow)

[1] 22

yday(rightnow)

[1] 322

mday(rightnow)

[1] 18

wday(rightnow, label=FALSE)

[1] 3

wday(rightnow, label=TRUE)

[1] Tue
Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat

Working with a date object

jan31 <- ymd("2025-01-31")
jan31 + months(0:11)

 [1] "2025-01-31" NA           "2025-03-31" NA           "2025-05-31"
 [6] NA           "2025-07-31" "2025-08-31" NA           "2025-10-31"
[11] NA           "2025-12-31"

floor_date(jan31, "month")

[1] "2025-01-01"

floor_date(jan31, "month") + months(0:11) + days(31)

 [1] "2025-02-01" "2025-03-04" "2025-04-01" "2025-05-02" "2025-06-01"
 [6] "2025-07-02" "2025-08-01" "2025-09-01" "2025-10-02" "2025-11-01"
[11] "2025-12-02" "2026-01-01"

jan31 + months(0:11) + days(31)

 [1] "2025-03-03" NA           "2025-05-01" NA           "2025-07-01"
 [6] NA           "2025-08-31" "2025-10-01" NA           "2025-12-01"
[11] NA           "2026-01-31"

jan31 %m+% months(0:11)

 [1] "2025-01-31" "2025-02-28" "2025-03-31" "2025-04-30" "2025-05-31"
 [6] "2025-06-30" "2025-07-31" "2025-08-31" "2025-09-30" "2025-10-31"
[11] "2025-11-30" "2025-12-31"

NYC flights

library(nycflights13)
names(flights)

 [1] "year"           "month"          "day"            "dep_time"      
 [5] "sched_dep_time" "dep_delay"      "arr_time"       "sched_arr_time"
 [9] "arr_delay"      "carrier"        "flight"         "tailnum"       
[13] "origin"         "dest"           "air_time"       "distance"      
[17] "hour"           "minute"         "time_hour"

NYC flights

Creating a date object from variables.

flightsWK <- flights |>  
   mutate(ymdday = ymd(str_c(year, month, day, sep="-"))) |> 
   mutate(weekdy = wday(ymdday, label=TRUE), 
          whichweek = week(ymdday)) 

flightsWK |>  select(year, month, day, ymdday, weekdy, whichweek, 
                     dep_time, arr_time)

# A tibble: 336,776 × 8
    year month   day ymdday     weekdy whichweek dep_time arr_time
   <int> <int> <int> <date>     <ord>      <dbl>    <int>    <int>
 1  2013     1     1 2013-01-01 Tue            1      517      830
 2  2013     1     1 2013-01-01 Tue            1      533      850
 3  2013     1     1 2013-01-01 Tue            1      542      923
 4  2013     1     1 2013-01-01 Tue            1      544     1004
 5  2013     1     1 2013-01-01 Tue            1      554      812
 6  2013     1     1 2013-01-01 Tue            1      554      740
 7  2013     1     1 2013-01-01 Tue            1      555      913
 8  2013     1     1 2013-01-01 Tue            1      557      709
 9  2013     1     1 2013-01-01 Tue            1      557      838
10  2013     1     1 2013-01-01 Tue            1      558      753
# ℹ 336,766 more rows
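An alternative to gluing the pieces into a string and re-parsing it: lubridate's make_date() builds a Date directly from numeric year, month, and day. A sketch on a single date (the nycflights13 pipeline version is shown only as a comment):

```r
library(lubridate)

d <- make_date(2013, 1, 1)
d        # "2013-01-01"
wday(d)  # 3 (Sunday = 1, so Tuesday)
week(d)  # 1

# In the flights pipeline this would read (not run here):
# flights |> mutate(ymdday = make_date(year, month, day))
```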
