Better Data Visualizations

January 27 + 29, 2025

Author

Jo Hardin

Agenda 1/27/25

GitHub
NSSD
grammar of graphics

Important

Before Wednesday, read: Tufte. 1997. Visual and Statistical Thinking: Displays of Evidence for Making Decisions. (Use Google to find it.)

NSSD:

What was Hilary trying to answer in her data collection?
Name two of Hilary’s main hurdles in gathering accurate data.
Which is better: high touch (manual) or low touch (automatic) data collection? Why?
What additional covariates are needed / desired? Any problems with them?
How much data does she need?
Are there any ethical considerations to think about?

Data Visualization

Based on https://www.effectivedatastorytelling.com/post/a-deeper-dive-into-lego-bricks-and-data-stories, original source: https://www.linkedin.com/learning/instructors/bill-shander

Graphics

Grammar of graphics

Yau (2013) gives us nine visual cues, and Wickham (2014) translates them into a language using ggplot2.

Visual Cues: the aspects of the figure where we should focus.
Position (numerical) where in relation to other things?
Length (numerical) how big (in one dimension)?
Angle (numerical) how wide? parallel to something else?
Direction (numerical) at what slope? In a time series, going up or down?
Shape (categorical) belonging to what group?
Area (numerical) how big (in two dimensions)? Beware of improper scaling!
Volume (numerical) how big (in three dimensions)? Beware of improper scaling!
Shade (either) to what extent? how severely?
Color (either) to what extent? how severely? Beware of red/green color blindness.
Coordinate System: rectangular, polar, geographic, etc.
Scale: numeric (linear? logarithmic?), categorical (ordered?), time
Context: in comparison to what (think back to ideas from Tufte)
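
The warning about improper scaling for area can be made concrete with a small sketch. It assumes only that ggplot2 is installed; the two-row data frame `toy` is hypothetical and exists just for illustration. One plot ties point area to the value, the other ties point radius to the value.

```r
library(ggplot2)

# Hypothetical toy data: the second value is four times the first.
toy <- data.frame(x = c(1, 2), y = c(1, 1), value = c(1, 4))

# scale_size_area(): the AREA of each point is proportional to the value.
p_area <- ggplot(toy, aes(x = x, y = y, size = value)) +
  geom_point() +
  scale_size_area(max_size = 10)

# scale_radius(): the RADIUS is proportional to the value (limits start at 0),
# so the drawn area grows with the square of the value.
p_radius <- ggplot(toy, aes(x = x, y = y, size = value)) +
  geom_point() +
  scale_radius(range = c(0, 10), limits = c(0, 4))

# Point sizes actually used by each plot (size tracks the point's diameter).
sizes_area   <- layer_data(p_area)$size
sizes_radius <- layer_data(p_radius)$size

sizes_area[2] / sizes_area[1]
sizes_radius[2] / sizes_radius[1]
```

Under radius scaling the larger point looks far more than four times as big, which is the distortion the list warns about.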

Pieces of the Graph

Visual Cues of Yau (2013):
Position (numerical)
Length (numerical)
Angle (numerical)
Direction (numerical)
Shape (categorical)
Area (numerical)
Volume (numerical)
Shade (either)
Color (either)

Order Matters

Cues Together

Attributes

Attributes can focus your reader’s attention.¹

Agenda 1/29/25

thoughts on plotting
Tufte
ggplot

Advice for Plotting

Basic plotting

Avoid having other graph elements interfere with data
Use visually prominent symbols
Avoid over-plotting (One way to avoid over plotting: jitter the values)
Different values of data may obscure each other
Include all or nearly all of the data
Fill data region

Advice for Plotting

Basic plotting
Eliminate superfluous material

Chart junk & stuff that adds no meaning, e.g. butterflies on top of barplots, background images
Extra tick marks and grid lines
Unnecessary text and arrows
Decimal places beyond the measurement error or the level of difference

Advice for Plotting

Basic plotting
Eliminate superfluous material
Facilitate comparisons

Put juxtaposed plots on same scale
Make it easy to distinguish elements of superposed plots (e.g. color)
Emphasize the important difference
Comparison: volume, area, height (be careful, volume can seem bigger than you mean it to)

Advice for Plotting

Basic plotting
Eliminate superfluous material
Facilitate comparisons
Choosing the scale

Keep scales on x and y axes the same for both plots to facilitate the comparison
Zoom in to focus on the region that contains the bulk of the data
Keep the scale the same throughout the plot (i.e. don’t change it mid-axis)
Origin need not be on the scale
Choose a scale that improves resolution
Avoid jiggling the baseline
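
A hedged sketch of one scale pitfall in ggplot2: zooming with the coordinate system keeps all of the data, while "zooming" with scale limits discards rows before any statistics are computed. The built-in `mpg` data from ggplot2 is used here as a stand-in, and the limits are arbitrary.

```r
library(ggplot2)

# Base plot: highway mileage against engine displacement, with a linear fit.
base <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Zoom with the coordinate system: every row still feeds the fitted line.
zoom_coord <- base + coord_cartesian(xlim = c(1.5, 4))

# Zoom with scale limits: x values outside the limits become NA, so the
# fitted line is computed from a subset (ggplot2 warns about removed rows).
zoom_scale <- base + scale_x_continuous(limits = c(1.5, 4))

# Count the usable x values in the point layer of each plot.
n_coord <- sum(!is.na(layer_data(zoom_coord, 1)$x))
n_scale <- sum(!is.na(layer_data(zoom_scale, 1)$x))
c(coord = n_coord, scale = n_scale)
```

Both versions show the same region, but only `coord_cartesian()` leaves the fitted line unchanged.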

Advice for Plotting

Basic plotting
Eliminate superfluous material
Facilitate comparisons
Choosing the scale
How to make a plot information rich

Describe what you see in the caption
Add context with reference markers (lines and points) including text
Add legends and labels
Use color and plotting symbols to add more information
Plot the same thing more than once in different ways/scales
Reduce clutter
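
One possible way to apply several of these ideas in ggplot2 (a reference line, a text marker, axis labels, and a caption) is sketched below. It uses the `economics` data that ships with ggplot2 as a stand-in; the wording of the title and caption is illustrative only.

```r
library(ggplot2)

# Reference value: the median of the series.
ref <- median(economics$unemploy)

p <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line() +
  # Reference marker: a dashed horizontal line at the median.
  geom_hline(yintercept = ref, linetype = "dashed", color = "grey40") +
  # Text that explains the reference marker, placed at the left edge.
  annotate("text", x = min(economics$date), y = ref,
           label = "median", hjust = 0, vjust = -0.5) +
  # Labels and a caption that says what to look at.
  labs(title = "US unemployment over time",
       x = NULL,
       y = "Unemployed (thousands)",
       caption = "The dashed line marks the median of the series.")

p
```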

Advice for Plotting

Basic plotting
Eliminate superfluous material
Facilitate comparisons
Choosing the scale
How to make a plot information rich
Captions should

Be comprehensive
Be self-contained
Describe what has been graphed
Draw attention to important features
Describe conclusions drawn from graph

Advice for Plotting

Basic plotting
Eliminate superfluous material
Facilitate comparisons
Choosing the scale
How to make a plot information rich
Captions should
Good Plot Making Practice

Put major conclusions in graphical form
Provide reference information
Proofread for clarity and consistency
Graphing is an iterative process
Multiplicity is OK, i.e. two plots of the same variable may provide different messages
Make plots data rich

Examples in the wild

Tufte – Cholera & Challenger
Fonts
NYT often does data viz quite well
W.E.B. Du Bois

Preliminaries

Make the data stand out
Facilitate comparison
Add information

(Nolan & Perrett, 2016)

Preliminaries

Tufte lists two main motivational steps to working with graphics as part of an argument.

“An essential analytic task in making decisions based on evidence is to understand how things work.”
“Making decisions based on evidence requires the appropriate display of that evidence.”

Tufte

Tufte (1997) Visual and Statistical Thinking: Displays of Evidence for Making Decisions. (Use Google to find it.)

Cholera - a picture tells 1000 words

How many aspects of this graph can you point out which are relevant to figuring out that cholera infection was coming from a single pump? Are there any distracting aspects?

Cholera - difficult to interpret

Why would the outbreak already have begun to decline before the pump handle was removed?

Challenger - Problematic

This graphic was particularly unconvincing at explaining that O-rings fail in the cold.

Challenger - Better????

A different graph of the Challenger information, now sorted by temperature

Challenger - Improved

The graphic the engineers should have led with in trying to persuade the administrators not to launch. It is evident that the number of O-ring failures is quite highly associated with the ambient temperature. Note the *vital* information on the x-axis associated with the large number of launches at warm temperatures that had *zero* O-ring failures.

Note that the “improved” Challenger graphic was made by Tufte, not by the engineers working on the problem at the time.

Fonts matter

image credit: Will Chase RStudio::conf 2020

Advice on plotting, specific

Avoid having other graph elements interfere with data
Use visually prominent symbols
Avoid over-plotting (One way to avoid over plotting: jitter the values)
Different values of data may obscure each other
Include all or nearly all of the data
Fill data region

Advice on plotting, general

Eliminate superfluous material
Facilitate comparisons
Choose the best scale
Make the plot data / information rich
Use good captions, alt text, conclusions

Simplify

Simplified

Before-and-after images showing the process of simplifying a barplot.

image credit: https://www.darkhorseanalytics.com/portfolio-data-looks-better-naked

A scatterplot showing that states with higher vaccination rates have lower COVID case rates. A few states are highlighted in stronger font: NY, CA, MA have low COVID rates and high vaccination rates; SC, GA, ID have high COVID rates and low vaccination rates; TX and USA are in the middle with medium vaccination and medium COVID rates. — One in 5,000, NYT, D. Leonhardt 9/7/21; image credit: https://www.nytimes.com/2021/09/07/briefing/risk-breakthrough-infections-delta.html

lighter grid lines
no extra information
good caption
regression line to give context to the trend
y axes labels horizontal, not vertical
a few states (and the US) are highlighted to draw the reader’s eye
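
The design choices in this list can be imitated in ggplot2. The sketch below is not the NYT figure; it uses the built-in `mtcars` data as a stand-in, and the two highlighted car models are an arbitrary, hypothetical choice.

```r
library(ggplot2)

cars <- mtcars
cars$model <- rownames(mtcars)

# Hypothetical choice of points to emphasize.
highlight <- subset(cars, model %in% c("Toyota Corolla", "Cadillac Fleetwood"))

p <- ggplot(cars, aes(x = wt, y = mpg)) +
  # Regression line gives context to the trend.
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "grey60") +
  # Most points are muted; a few are drawn darker and labeled.
  geom_point(color = "grey70") +
  geom_point(data = highlight, color = "black") +
  geom_text(data = highlight, aes(label = model),
            vjust = -0.8, fontface = "bold") +
  labs(x = "Weight (1000 lbs)", y = "Miles per\ngallon",
       caption = "Heavier cars tend to get fewer miles per gallon.") +
  theme_minimal() +
  theme(panel.grid.major = element_line(color = "grey92"),  # lighter grid
        panel.grid.minor = element_blank(),                 # no extras
        axis.title.y = element_text(angle = 0, vjust = 0.5)) # horizontal label

p
```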

W.E.B. Du Bois

One of the great early data viz pioneers. Remarkable ability to convey information.

Worth a Mention

W.E.B. Du Bois (1868-1963)

sociologist
data scientist

image of WEB Du Bois — image credit: wikipedia

In 1900 Du Bois contributed approximately 60 data visualizations to an exhibit at the Exposition Universelle in Paris, an exhibit designed to illustrate the progress made by African Americans since the end of slavery (only 37 years prior, in 1863).

Beautiful & Informative Graphics

https://drawingmatter.org/w-e-b-du-bois-visionary-infographics/

Goals of `ggplot2`

What I will try to do

give a tour of ggplot2
explain how to think about plots the ggplot2 way
prepare/encourage you to learn more later

What I can’t do in one session

show every bell and whistle
make you an expert at using ggplot2

Getting help

One of the best ways to get started with ggplot2 is to search for what you want to do along with the word ggplot, then look through the images that come up. More often than not, the associated code is there. There are also ggplot galleries of images; one of them is here: https://plot.ly/ggplot2/
Look at the end of this presentation and at the syllabus for more help options.

What are the visual cues on this plot?

position
length
shape
area/volume
shade/color

What are the visual cues on this plot?

position
length
shape
area/volume
shade/color

What are the visual cues on this plot?

position
length
shape
area/volume
shade/color

The grammar of graphics `ggplot`

geom: the geometric “shape” used to display data

bar, point, line, ribbon, text, etc.

aesthetic: an attribute controlling how geom is displayed with respect to variables

x position, y position, color, fill, shape, size, etc.

guide: helps user convert visual data back into raw data (legends, axes)

stat: a transformation applied to data before geom gets it

example: histograms work on binned data
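
To see the four pieces in one place, here is a minimal sketch using the `faithful` data that ships with base R (an assumption made only so the example is self-contained). It also checks that the histogram's stat really hands binned data, not raw rows, to the geom.

```r
library(ggplot2)

p <- ggplot(faithful, aes(x = waiting)) +  # aesthetic: waiting -> x position
  geom_histogram(binwidth = 5)             # geom: bars; default stat: bin
# guide: the x and y axes are drawn automatically

# The data the geom actually receives: one row per bin.
binned <- layer_data(p)

nrow(faithful)      # number of raw observations
nrow(binned)        # number of bins
sum(binned$count)   # the bin counts add back up to the raw observations
```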

Set up

library(mosaic)
data(Births2015)

head(Births2015)

date	births	wday	year	month	day_of_year	day_of_month	day_of_week
2015-01-01	8068	Thu	2015	1	1	1	5
2015-01-02	10850	Fri	2015	1	2	2	6
2015-01-03	8328	Sat	2015	1	3	3	7
2015-01-04	7065	Sun	2015	1	4	4	1
2015-01-05	11892	Mon	2015	1	5	5	2
2015-01-06	12425	Tue	2015	1	6	6	3

Obtained from the National Center for Health Statistics, National Vital Statistics System, Natality, 2015 data.

How do we make this plot?

Two Questions:

What do we want R to do? (What is the goal?)
What does R need to know?

How do we make this plot?

Goal: scatterplot = a plot with points
What does R need to know?
- data source: Births2015
- aesthetics:
  - date -> x
  - births -> y
  - points (!)

How do we make this plot?

ggplot(data = Births2015, 
       aes(x = date, y = births)) + 
  geom_point() +
  labs(title = "US Births in 2015")

ggplot() +
  geom_point(data = Births2015, 
             aes(x = date, y = births)) +
  labs(title = "US Births in 2015")

Layers

Layer 1

ggplot(data = Births2015, 
       aes(x = date, y = births))

Layers

Layer 2

ggplot(data = Births2015, 
       aes(x = date, y = births)) + 
  geom_point()

Layers

Layer 3

ggplot(data = Births2015, 
       aes(x = date, y = births)) + 
  geom_point() +
  labs(title = "US Births in 2015")
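
Because a ggplot is an ordinary R object, the same layering can be done step by step by saving intermediate plots. This is a sketch under the assumption that only ggplot2 is loaded; it swaps in the built-in `economics` data, and the object names `p1`, `p2`, `p3` are illustrative.

```r
library(ggplot2)

# Step 1: data and aesthetic mappings only; printing this draws empty axes.
p1 <- ggplot(data = economics, aes(x = date, y = psavert))

# Step 2: add a geom, which creates the first drawing layer.
p2 <- p1 + geom_point()

# Step 3: add labels; this changes annotations, not the number of layers.
p3 <- p2 + labs(title = "US personal savings rate")

# Number of drawing layers at each step.
c(length(p1$layers), length(p2$layers), length(p3$layers))
```

Saving intermediate objects makes it easy to try a different geom on the same base plot without retyping the data and mappings.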

How do we make this plot?

What has changed?

new aesthetic: mapping color to day of week

How do we make this plot?

ggplot(data = Births2015,
       aes(x = date,
           y = births, 
           color = wday)) +
  geom_point() +
  labs(title = "US Births in 2015")

How do we make this plot?

lines instead of dots!

ggplot(data = Births2015,
         aes(x = date, 
             y = births,
             color = wday)) +
  geom_line() +
  labs(title = "US Births in 2015")

How do we make this plot?

Now there are two layers: one with points and one with lines

ggplot(data = Births2015,
       aes(x = date,
           y = births,
           color = wday)) + 
  geom_point() +  
  geom_line() +
  labs(title = "US Births in 2015")

The layers are placed one on top of the other: the points are below and the lines are above.
data and aes specified in ggplot() affect all geoms

What does this code do?

ggplot(data = Births2015,
       aes(x = date, y = births, color = "navy")) + 
  geom_point() +
  labs(title = "US Births in 2015")

This is mapping the color aesthetic to a new variable with only one value (“navy”).
So all the dots get set to the same color, but it’s not navy.

Setting vs. Mapping

If we want to set the color to be navy for all of the dots, we do it outside the aes() designation:

ggplot(data = Births2015,
       aes(x = date, y = births)) +   # map variables 
  geom_point(color = "navy")    +   # set attributes
  labs(title = "US Births in 2015")

Note that color = "navy" is now outside of the aesthetics list. That’s how ggplot2 distinguishes between mapping and setting.
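
The difference can be checked directly by inspecting the colors ggplot2 ends up using. The sketch below uses the built-in `mpg` data as a stand-in for `Births2015`, so it only assumes ggplot2 is installed.

```r
library(ggplot2)

# Mapping: "navy" is treated as a one-level variable and sent through the
# default color scale.
p_map <- ggplot(mpg, aes(x = displ, y = hwy, color = "navy")) +
  geom_point()

# Setting: "navy" is used literally as the point color.
p_set <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "navy")

# Colors actually assigned to the points in each plot.
unique(layer_data(p_map)$colour)  # a default palette color, not "navy"
unique(layer_data(p_set)$colour)
```

The mapped version also gains a legend with a single entry labeled navy, which is another hint that a mapping was used where a setting was intended.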

How do we make this plot?

ggplot(data = Births2015,
       aes(x = date,
           y = births)) + 
  geom_line(aes(color = wday)) +      
  geom_point(color = "navy")  +         
  labs(title = "US Births in 2015")

ggplot() establishes the default data and aesthetics for the geoms, but each geom may change these defaults.
good practice: put into ggplot() the things that affect all (or most) of the layers; rest in geom_XXXX()

Setting vs. Mapping (again)

Information gets passed to the plot via:

map the variable information inside the aes (aesthetic) command
set the non-variable information outside the aes (aesthetic) command

Other geoms

apropos("^geom_")

 [1] "geom_abline"                  "geom_area"                   
 [3] "geom_ash"                     "geom_bar"                    
 [5] "geom_bin_2d"                  "geom_bin2d"                  
 [7] "geom_blank"                   "geom_boxplot"                
 [9] "geom_bracket"                 "geom_col"                    
[11] "geom_contour"                 "geom_contour_filled"         
[13] "geom_count"                   "geom_crossbar"               
[15] "geom_curve"                   "geom_density"                
[17] "geom_density_2d"              "geom_density_2d_filled"      
[19] "geom_density_line"            "geom_density_ridges"         
[21] "geom_density_ridges_gradient" "geom_density_ridges2"        
[23] "geom_density2d"               "geom_density2d_filled"       
[25] "geom_dotplot"                 "geom_errorbar"               
[27] "geom_errorbarh"               "geom_exec"                   
[29] "geom_freqpoly"                "geom_function"               
[31] "geom_hex"                     "geom_histogram"              
[33] "geom_hline"                   "geom_jitter"                 
[35] "geom_label"                   "geom_label_repel"            
[37] "geom_line"                    "geom_linerange"              
[39] "geom_lm"                      "geom_map"                    
[41] "geom_mosaic"                  "geom_mosaic_jitter"          
[43] "geom_mosaic_text"             "geom_path"                   
[45] "geom_pictogram"               "geom_point"                  
[47] "geom_pointrange"              "geom_polygon"                
[49] "geom_pwc"                     "geom_qq"                     
[51] "geom_qq_line"                 "geom_quantile"               
[53] "geom_rangeframe"              "geom_raster"                 
[55] "geom_rect"                    "geom_ribbon"                 
[57] "geom_ridgeline"               "geom_ridgeline_gradient"     
[59] "geom_rug"                     "geom_segment"                
[61] "geom_sf"                      "geom_sf_label"               
[63] "geom_sf_text"                 "geom_signif"                 
[65] "geom_smooth"                  "geom_spline"                 
[67] "geom_spoke"                   "geom_step"                   
[69] "geom_stripped_cols"           "geom_stripped_rows"          
[71] "geom_text"                    "geom_text_repel"             
[73] "geom_tile"                    "geom_tufteboxplot"           
[75] "geom_violin"                  "geom_vline"                  
[77] "geom_vridgeline"              "geom_waffle"

Other geoms

help pages will tell you their aesthetics, default stats, etc.

?geom_area             # for example

Let’s try `geom_area`

ggplot(data = Births2015,
       aes(x = date,
           y = births, 
           fill = wday)) + 
  geom_area() +
  labs(title = "US Births in 2015")

… not a good plot

overplotting is hiding much of the data
extending y-axis to 0 may or may not be desirable.

Side note: what makes a plot good?

Most (all?) graphics are intended to help us make comparisons

How does something change over time?
Do my treatments matter? How much?
Do treatment and control respond the same way?

Key plot metric

Does my plot make the comparisons I am interested in:

easily, and
accurately?

Time for some different data

HELPrct: Health Evaluation and Linkage to Primary care randomized clinical trial. Subjects admitted for treatment for addiction to one of three substances.

head(HELPrct)

age	anysubstatus	anysub	cesd	d1	daysanysub	dayslink	drugrisk	e2b	female	sex	g1b	homeless	i1	i2	id	indtot	linkstatus	link	mcs	pcs	pss_fr	racegrp	satreat	sexrisk	substance	treat	avg_drinks	max_drinks	hospitalizations
37	1	yes	49	3	177	225	0	NA	0	male	yes	housed	13	26	1	39	1	yes	25.11	58.4	0	black	no	4	cocaine	yes	13	26	3
37	1	yes	30	22	2	NA	0	NA	0	male	yes	homeless	56	62	2	43	NA	NA	26.67	36.0	1	white	no	7	alcohol	yes	56	62	22
26	1	yes	39	0	3	365	20	NA	0	male	no	housed	0	0	3	41	0	no	6.76	74.8	13	black	no	2	heroin	no	0	0	0
39	1	yes	15	2	189	343	0	1	1	female	no	housed	5	5	4	28	0	no	43.97	61.9	11	white	yes	4	heroin	no	5	5	2
32	1	yes	39	12	2	57	0	1	0	male	no	homeless	10	13	5	38	1	yes	21.68	37.3	10	black	no	6	cocaine	no	10	13	12
47	1	yes	6	1	31	365	0	NA	1	female	no	housed	4	4	6	29	0	no	55.51	46.5	5	black	no	5	cocaine	yes	4	4	1

Who are the people in the study?

ggplot(data = HELP_data,
       aes(x = substance)) + 
  geom_bar() +
  labs(title = "HELP trial")

Hmm. What’s up with y?
- stat_count() is being applied to the data before geom_bar() gets to do its thing. Counting the rows in each category creates the y values.
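
A small sketch of where the y values in a bar plot come from: the counts computed by geom_bar()'s default stat match a table computed by hand. The built-in `mpg` data is used as a stand-in so that the example needs only ggplot2.

```r
library(ggplot2)

# geom_bar() counts the rows in each category for us.
p_bar <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# The same plot built from pre-computed counts with geom_col(),
# which uses the y values as given.
counts <- as.data.frame(table(class = mpg$class))
p_col <- ggplot(counts, aes(x = class, y = Freq)) +
  geom_col()

# Counts computed by the stat, next to the hand-made table.
data.frame(class = counts$class,
           by_stat = layer_data(p_bar)$count,
           by_hand = counts$Freq)
```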

Who are the people in the study?

ggplot(data = HELP_data,
       aes(x = substance,
           fill = children)) + 
  geom_bar() +
  labs(title = "HELP trial")

Who are the people in the study?

ggplot(HELP_data,
       aes(x = substance,
           fill = children)) + 
  geom_bar(position = "fill") +
  labs(title = "HELP trial",
       y = "proportion")

How old are people in the HELP study?

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(data = HELP_data,
       aes(x = age)) + 
  geom_histogram() +
  labs(title = "HELP trial")

Notice the messages

stat_bin: Histograms are not mapping the raw data but binned data.
stat_bin() performs the data transformation.
binwidth: a default binwidth has been selected, but we should really choose our own.

Setting the binwidth manually

ggplot(data = HELP_data,
       aes(x = age)) + 
  geom_histogram(binwidth = 2) +
  labs(title = "HELP trial")

How old are people in the HELP study? – Other geoms

ggplot(data = HELP_data,
       aes(x = age)) + 
  geom_freqpoly(binwidth = 2) +
  labs(title = "HELP clinical trial at detoxification unit")

ggplot(data = HELP_data,
       aes(x = age)) + 
  geom_density() +
  labs(title = "HELP clinical trial at detoxification unit")

Selecting stat and geom manually

Every geom comes with a default stat

for simple cases, the stat is stat_identity() which does nothing
we can mix and match geoms and stats however we like

ggplot(data = HELP_data,
       aes(x = age)) + 
  geom_line(stat = "density") +
  labs(title = "HELP clinical trial at detoxification unit")

Selecting stat and geom manually

Every stat comes with a default geom, every geom with a default stat

we can specify stats instead of geom, if we prefer
we can mix and match geoms and stats however we like

ggplot(data = HELP_data,
       aes(x = age)) + 
  stat_density(geom = "line") +
  labs(title = "HELP clinical trial at detoxification unit")

More combinations

ggplot(data = HELP_data,
       aes(x = age)) + 
  geom_point(stat = "bin", binwidth = 3) + 
  geom_line(stat = "bin", binwidth = 3)  +
  labs(title = "HELP clinical trial at detoxification unit")

More combinations

ggplot(data = HELP_data,
       aes(x = age)) + 
  geom_area(stat = "bin", binwidth = 3)  +
  labs(title = "HELP clinical trial at detoxification unit")

More combinations

ggplot(data = HELP_data,
       aes(x = age)) + 
  geom_point(stat = "bin", 
             binwidth = 3, 
             aes(size = after_stat(count))) +
  geom_line(stat = "bin", binwidth = 3) +
  labs(title = "HELP clinical trial at detoxification unit")

How much drinking? (i1)

HELP_data |> 
  ggplot(aes(x = i1)) + geom_histogram() +
  labs(title = "HELP clinical trial at detoxification unit")

How much drinking? (i1)

HELP_data |> 
  ggplot(aes(x = i1)) + geom_density() +
  labs(title = "HELP clinical trial at detoxification unit")

How much drinking? (i1)

HELP_data |> 
  ggplot(aes(x = i1)) + geom_area(stat = "density") +
  labs(title = "HELP clinical trial at detoxification unit")

Covariates: Adding in more variables

Using color and linetype:

ggplot(data = HELP_data,
       aes(x = i1,
           color = substance,
           linetype = children)) + 
  geom_line(stat = "density") +
  labs(title = "HELP clinical trial at detoxification unit")

Using color and facets

ggplot(data = HELP_data,
       aes(x = i1, color = substance)) + 
  geom_line(stat = "density") + 
  facet_grid( . ~ children ) +
  labs(title = "HELP clinical trial at detoxification unit")

ggplot(data = HELP_data,
       aes(x = i1, color = substance)) + 
  geom_line(stat = "density") + 
  facet_grid( children ~ . ) +
  labs(title = "HELP clinical trial at detoxification unit")

Boxplots

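Boxplots use stat_boxplot(), which computes the five-number summary (whiskers reach at most 1.5 × IQR beyond the box by default) and flags more extreme points as outliers.

By convention the quantitative variable is mapped to y and the grouping variable to x. Since ggplot2 3.3.0 the orientation is inferred from the mapping, and the grouping variable is optional.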

HELP_data |> 
  ggplot(aes(x = substance, y = age, color = children)) + 
  geom_boxplot() +
  labs(title = "HELP clinical trial at detoxification unit")
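
The summary that the boxplot stat computes can be pulled out of the plot and compared with a direct calculation. This sketch swaps in the built-in `mpg` data, since it needs only ggplot2.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_boxplot()

# One row per box: whisker ends (ymin, ymax), box edges (lower, upper),
# and the median (middle). The whisker ends are not necessarily the
# minimum and maximum, because outliers are plotted separately.
box <- layer_data(p)[, c("ymin", "lower", "middle", "upper", "ymax")]
box

# Medians computed directly, for comparison with the middle column.
medians <- tapply(mpg$hwy, mpg$drv, median)
medians
```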

Horizontal boxplots

Horizontal boxplots are obtained by flipping the coordinate system:

coord_flip() may be used with other plots as well to reverse the roles of x and y on the plot.

ggplot(data = HELP_data,
       aes(x = substance, 
           y = age, 
           color = children)) + 
  geom_boxplot() +
  coord_flip() +
  labs(title = "HELP clinical trial at detoxification unit")

Axes scaling with boxplots

We can scale the continuous axis

ggplot(data = HELP_data,
       aes(x = substance, 
           y = age, 
           color = children)) + 
  geom_boxplot() +
  coord_trans(y = "exp") +
  labs(title = "HELP clinical trial at detoxification unit")

Give me some space

We’ve triggered a new feature: dodge (for dodging things left/right). We can control how much if we set the dodge manually.

ggplot(data = HELP_data,
       aes(x = substance, 
           y = age, 
           color = children)) + 
  geom_boxplot(position = position_dodge(width=1)) +
  labs(title = "HELP clinical trial at detoxification unit")

Issues with bigger data

Although we can see a generally positive association (as we would expect), the overplotting may be hiding information.

library(NHANES)
dim(NHANES)

[1] 10000    76

ggplot(data = NHANES,
       aes(x = Height, y = Weight)) +
  geom_point() + 
  facet_grid( Gender ~ PregnantNow )

Using alpha (opacity)

One way to deal with overplotting is to set the opacity low.

ggplot(data = NHANES,
       aes(x = Height, y = Weight)) +
  geom_point(alpha=0.01) + 
  facet_grid( Gender ~ PregnantNow )

geom_density2d

Alternatively (or simultaneously) we might prefer a different geom altogether.

ggplot(data = NHANES,
       aes(x = Height, y = Weight)) +
  geom_density2d() + 
  facet_grid( Gender ~ PregnantNow )
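
Another option for large data is to bin in two dimensions and map the count in each cell to fill. The sketch below uses the `diamonds` data from ggplot2 in place of NHANES (so no extra package is needed), and the choice of 40 bins is arbitrary.

```r
library(ggplot2)

# Each tile summarizes many diamonds; fill shows how many fall in the tile.
p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_bin_2d(bins = 40) +
  scale_fill_viridis_c()

# The layer holds one row per non-empty tile rather than one per diamond.
tiles <- layer_data(p)
c(points = nrow(diamonds), tiles = nrow(tiles))

p
```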

Multiple layers

ggplot(data = HELP_data, 
       aes(x = children, y = age)) +
  geom_boxplot(outlier.size = 0) +
  geom_point(alpha=.6) +
  coord_flip() +
  labs(title = "HELP clinical trial at detoxification unit")

ggplot(data = HELP_data,
       aes(x = children, y = age)) +
  geom_boxplot(outlier.size = 0) +
  geom_jitter(alpha=.6, width = 0.1) +
  coord_flip() +
  labs(title = "HELP clinical trial at detoxification unit")

Multiple layers

ggplot(data = HELP_data,
       aes(x = children, y = age)) +
  geom_boxplot(outlier.size = 0) +
  geom_point(alpha=.6, 
             position = position_jitter(width=.1, height=0)) +
  coord_flip() +
  labs(title = "HELP clinical trial at detoxification unit")
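
Jitter is random, so the same code can draw a slightly different plot each time. One way to make it reproducible, sketched here with the built-in `mpg` data as a stand-in, is the `seed` argument of `position_jitter()`. The seed value itself is arbitrary.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = drv, y = hwy)) +
  # Hide the boxplot's own outlier points so they are not drawn twice.
  geom_boxplot(outlier.shape = NA) +
  # Jitter horizontally only, with a fixed seed for reproducibility.
  geom_point(alpha = 0.6,
             position = position_jitter(width = 0.1, height = 0, seed = 2025))

# Building the plot twice gives the same jittered x positions.
x_first  <- layer_data(p, 2)$x
x_second <- layer_data(p, 2)$x
identical(x_first, x_second)
```

Keeping `height = 0` matters here: the y values are the data, so only the categorical axis should be perturbed.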

Things I haven’t mentioned (much)

coords (coord_flip() is good to know about)
themes (for customizing appearance)
position (position_dodge(), position_jitterdodge(), position_stack(), etc.)
transforming axes

themes

library(ggthemes)
ggplot(Births2015, aes(x = date, y = births)) + 
  geom_point() + 
  theme_wsj()

`jitterdodge()`

ggplot(data = HELP_data, 
       aes(x = substance, y = age, color = children)) +
  geom_boxplot(coef = 10, position = position_dodge()) +
  geom_point(aes(color = children, 
                 fill = children), 
             position = position_jitterdodge()) +
  labs(title = "HELP clinical trial at detoxification unit")

A little bit of everything

ggplot(data = HELP_data, aes(x = substance, y = age, color = children)) +
  geom_boxplot(coef = 10, position = position_dodge(width=1)) +
  geom_point(aes(fill = children), alpha=.5, 
             position = position_jitterdodge(dodge.width=1, jitter.width = 0.2)) + 
  facet_wrap(~homeless) +
  labs(title = "HELP clinical trial at detoxification unit")

Want to learn more?

docs.ggplot2.org/
R for Data Science by Hadley Wickham and Garrett Grolemund

What’s around the corner?

shiny

interactive graphics / modeling
https://shiny.rstudio.com/

plotly

Plotly is an R package for creating interactive web-based graphs via plotly’s JavaScript graphing library, plotly.js. The plotly R library contains the ggplotly function, which will convert ggplot2 figures into a Plotly object. Furthermore, you have the option of manipulating the Plotly object with the style function.

https://plot.ly/ggplot2/getting-started/

gganimate

gganimate tutorial

Footnotes

¹ image credit: Better Data Visualization by Schwabish

Reuse

CC-BY-SA-4.0
