install.packages("devtools")
devtools::install_github("nicholasjhorton/FederalistPapers")
Project 2
Completing a full text analysis
Overview
You will find a data set containing string data. The data could be on newspaper articles, tweets, songs, plays, movie reviews, or anything else you can imagine. Then you will answer questions of interest and tell a story about your data using string and regular expression skills you have developed.
Your analysis must contain the following elements:
- at least 3 str_*() functions
- at least 3 regular expressions
- at least 2 illustrative, well-labeled plots or tables
- a description of what insights can be gained from your plots and tables
- a reference / documentation of the data source.
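As a rough illustration of what the str_*() and regular expression requirements might look like, here is a minimal sketch. The `headlines` vector is made up for this example and does not come from any of the datasets; your own analysis would use a text column from the data you choose.

```r
library(stringr)

# Hypothetical toy data, invented for illustration only
headlines <- c(
  "Senate passes budget bill in 2019",
  "Local team wins 3-1 in overtime",
  "New study links sleep and memory"
)

# str_detect(): does the headline contain any digit?
has_digit <- str_detect(headlines, "\\d")

# str_extract(): pull out the first standalone four-digit number (NA if none)
year <- str_extract(headlines, "\\b\\d{4}\\b")

# str_count(): count words that start with a capital letter
n_capitalized <- str_count(headlines, "\\b[A-Z][a-z]+")
```

Each call pairs one stringr function with one regular expression, so a short pipeline like this would cover the counts in the list above. It would still need plots or tables and a narrative to make a complete analysis.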
Logistics:
- please include all your code used in the analysis.
- make sure that all graphs are well-labeled (including x and y axes, title of the graph, and accurate and succinct labels for color and fill).
- do not include superfluous error or warning messages.
- include a few sentences describing each of your plots or tables. That is, tell the reader what they see when they look at the plot. Your narrative description should be in the text part of the qmd file, not as a comment in an R chunk.
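One possible way to meet the labeling requirement is ggplot2's labs() function, sketched below. The `word_counts` data frame and all label text are invented for illustration. Replace them with summaries and labels that fit your own data.

```r
library(ggplot2)

# Hypothetical counts, invented for illustration only
word_counts <- data.frame(
  word = c("love", "time", "home"),
  n = c(12, 9, 5),
  source = c("songs", "songs", "poems")
)

# labs() sets the title, both axis labels, and the legend title for fill
p <- ggplot(word_counts, aes(x = word, y = n, fill = source)) +
  geom_col() +
  labs(
    title = "Most frequent words in a toy corpus",
    x = "Word",
    y = "Number of occurrences",
    fill = "Text source"
  )
p
```

In a Quarto document, the chunk options `#| warning: false` and `#| message: false` at the top of an R chunk keep package start-up messages and warnings out of the rendered page.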
Some potential places to find text data
I’ve gathered some potential datasets for you to work with. All of the datasets below contain some or a lot of text.
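- synopses data frame for Broadway Weekly Grosses
- The Federalist Papers (FederalistPapers package)
  - Load the package into R using
devtools::install_github("nicholasjhorton/FederalistPapers")
- All of Emily Dickinson’s poems
  - Load the package into R using
devtools::install_github("Amherst-Statistics/DickinsonPoems")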
- All of Barack Obama’s tweets archived by the National Archives.
- The “Dear Abby” stories underlying The Pudding’s 30 Years of American Anxieties article
  - See the data on The Pudding’s GitHub site
  - Load the data in using
read_csv("https://raw.githubusercontent.com/the-pudding/data/master/dearabby/raw_da_qs.csv")
- NY Times headlines from the RTextTools package (see below)
library(RTextTools)
library(tidyverse)  # as_tibble() comes from tibble, which is loaded with the tidyverse
data(NYTimes)
as_tibble(NYTimes)
- the options are endless – be resourceful and creative!
Timeline
Mini-Project 2 must be submitted on Canvas (not Gradescope) by 11:59 PM on Wednesday, October 2. You will add a tab to your Quarto webpage for Mini-Project 2 and submit the new page’s URL.
:::