Project 4

SQL

You will use SQL to query the Wideband Acoustic Immittance (WAI) Database hosted by Smith College. WAI measurements are being developed as noninvasive auditory diagnostic tools for people of all ages, and the WAI Database hosts WAI ear measurements that have been published in peer-review articles. The goal of the database is to “enable auditory researchers to share WAI measurements and combine analyses over multiple datasets.”1

You have two primary goals:

  1. Duplicate Figure 1 from Voss (2019). You will need to query the WAI Database to build a dataset which you can pipe into ggplot() to recreate Figure 1 as closely as possible (might not be possible to reproduce exactly).

  2. Find a single study (from the WAI database) where subjects of different sex, race, ethnicity, or age groups were enrolled, and produce plots of frequency vs. mean absorption by the group of interest.

You should be using JOINs in both (1) and (2), do the JOINing in SQL, not in R. Indeed, do as much of the work as possible using SQL functions! Really only the ggplot() should be in done in R.

Timeline

Mini-Project 4 must be submitted on Canvas (not Gradescope) by 11:59 PM on Tuesday November 26. You will add a tab to your Quarto webpage for Mini-Project 4 and submit the new page’s URL.

Hints

  • Parse the caption from Figure 1 carefully to determine how mean absorbances are calculated: “Mean absorbances for the 12 studies within the WAI database as of July 1, 2019. Noted in the legend are the peer-reviewed publications associated with the datasets, the number of individual ears, and the equipment used in the study. When multiple measurements were made on the same ear, the average from those measurements was used in the calculation across subjects for a given study. Some subjects have measurements on both a right and a left ear, and some subjects have measurements from only one ear; this figure includes every ear in the database and does not control for the effect of number of ears from each subject.”
  • filter for only the 12 studies shown in Figure 1 (and also for frequencies shown in Figure 1)
  • study the patterns of frequencies. It seems that most researchers used the same set of frequencies for each subject, ear, and session.
  • note the scale of the x-axis
  • the labels on the legend contains AuthorsShortList, Year, and Instrument, in addition to the number of unique ears (I think Werner’s N may be incorrect?). To get the number of unique ears, you might want something like this inside your SELECT query: COUNT(DISTINCT SubjectNumber, Ear) AS Unique_Ears
  • when creating the labels, you’ll need to GROUP BY Identifier, Instrument. That is, group by both the Identifier and the Instrument. The Instrument will not perfectly match the labels in the plot.

Logistics:

  • start by describing what you plan to do (3-4 sentences). end with a description of what you did (3-4 sentences). That is, use words to guide the reader through your analysis.
  • please include all your code used in the analysis.
  • make sure that all graphs are well-labeled (including x and y axes, title of the graph, and accurate and succinct labels for color and fill).
  • do not include superfluous error or warning messages.
  • include a few sentences describing each of your plots or tables. That is, tell the reader what they see when they look at the plot. Your narrative description should be in the text part of the qmd file, not as a comment in an R chunk.
  • include the source of the data.

Starter Code

library(RMariaDB)
con_wai <- dbConnect(
  MariaDB(), host = "scidb.smith.edu",
  user = "waiuser", password = "smith_waiDB", 
  dbname = "wai"
)
Measurements <- tbl(con_wai, "Measurements")
PI_Info <- tbl(con_wai, "PI_Info")
Subjects <- tbl(con_wai, "Subjects")

# collect(Measurements)
SHOW TABLES;
7 records
Tables_in_wai
Codebook
Measurements
Measurements_pre2020
PI_Info
PI_Info_OLD
Subjects
Subjects_pre2020
DESCRIBE Measurements;
Displaying records 1 - 10
Field Type Null Key Default Extra
Identifier varchar(50) NO PRI NA
SubjectNumber int NO PRI NA
Session int NO PRI NA
Ear varchar(50) NO PRI
Instrument varchar(50) NO PRI
Age float YES NA
AgeCategory varchar(50) YES NA
EarStatus varchar(50) YES NA
TPP float YES NA
AreaCanal float YES NA
SELECT *
FROM Measurements
LIMIT 0, 5;
5 records
Identifier SubjectNumber Session Ear Instrument Age AgeCategory EarStatus TPP AreaCanal PressureCanal SweepDirection Frequency Absorbance Zmag Zang
Abur_2014 1 1 Left HearID 20 Adult Normal -5 4.42e-05 0 Ambient 210.938 0.0333379 113780000 -0.233504
Abur_2014 1 1 Left HearID 20 Adult Normal -5 4.42e-05 0 Ambient 234.375 0.0315705 103585000 -0.235778
Abur_2014 1 1 Left HearID 20 Adult Normal -5 4.42e-05 0 Ambient 257.812 0.0405751 92951696 -0.233482
Abur_2014 1 1 Left HearID 20 Adult Normal -5 4.42e-05 0 Ambient 281.250 0.0438399 86058000 -0.233421
Abur_2014 1 1 Left HearID 20 Adult Normal -5 4.42e-05 0 Ambient 304.688 0.0486400 79492800 -0.232931

References

Voss, SE. 2019. “Resource Review.” Ear and Hearing 40 (6). https://doi.org/10.1097/AUD.0000000000000790.

Footnotes

  1. Thanks to Paul Roback at St. Olaf for sharing this project idea.↩︎

Reuse