library(RMariaDB)
<- dbConnect(
con_wai MariaDB(), host = "scidb.smith.edu",
user = "waiuser", password = "smith_waiDB",
dbname = "wai"
)<- tbl(con_wai, "Measurements")
Measurements <- tbl(con_wai, "PI_Info")
PI_Info <- tbl(con_wai, "Subjects")
Subjects
# collect(Measurements)
Project 4
SQL
You will use SQL to query the Wideband Acoustic Immittance (WAI) Database hosted by Smith College. WAI measurements are being developed as noninvasive auditory diagnostic tools for people of all ages, and the WAI Database hosts WAI ear measurements that have been published in peer-review articles. The goal of the database is to “enable auditory researchers to share WAI measurements and combine analyses over multiple datasets.”1
You have two primary goals:
Duplicate Figure 1 from Voss (2019). You will need to query the WAI Database to build a dataset which you can pipe into
ggplot()
to recreate Figure 1 as closely as possible (might not be possible to reproduce exactly).Find a single study (from the WAI database) where subjects of different sex, race, ethnicity, or age groups were enrolled, and produce plots of frequency vs. mean absorption by the group of interest.
You should be using JOIN
s in both (1) and (2), do the JOIN
ing in SQL, not in R. Indeed, do as much of the work as possible using SQL functions! Really only the ggplot()
should be in done in R.
Timeline
Mini-Project 4 must be submitted on Canvas (not Gradescope) by 11:59 PM on Tuesday November 26. You will add a tab to your Quarto webpage for Mini-Project 4 and submit the new page’s URL.
Hints
- Parse the caption from Figure 1 carefully to determine how mean absorbances are calculated: “Mean absorbances for the 12 studies within the WAI database as of July 1, 2019. Noted in the legend are the peer-reviewed publications associated with the datasets, the number of individual ears, and the equipment used in the study. When multiple measurements were made on the same ear, the average from those measurements was used in the calculation across subjects for a given study. Some subjects have measurements on both a right and a left ear, and some subjects have measurements from only one ear; this figure includes every ear in the database and does not control for the effect of number of ears from each subject.”
- filter for only the 12 studies shown in Figure 1 (and also for frequencies shown in Figure 1)
- study the patterns of frequencies. It seems that most researchers used the same set of frequencies for each subject, ear, and session.
- note the scale of the x-axis
- the labels on the legend contains
AuthorsShortList
,Year
, andInstrument
, in addition to the number of unique ears (I think Werner’s N may be incorrect?). To get the number of unique ears, you might want something like this inside yourSELECT
query:COUNT(DISTINCT SubjectNumber, Ear) AS Unique_Ears
- when creating the labels, you’ll need to
GROUP BY Identifier, Instrument
. That is, group by both theIdentifier
and theInstrument
. TheInstrument
will not perfectly match the labels in the plot.
Logistics:
- start by describing what you plan to do (3-4 sentences). end with a description of what you did (3-4 sentences). That is, use words to guide the reader through your analysis.
- please include all your code used in the analysis.
- make sure that all graphs are well-labeled (including x and y axes, title of the graph, and accurate and succinct labels for color and fill).
- do not include superfluous error or warning messages.
- include a few sentences describing each of your plots or tables. That is, tell the reader what they see when they look at the plot. Your narrative description should be in the text part of the qmd file, not as a comment in an R chunk.
- include the source of the data.
Starter Code
TABLES; SHOW
Tables_in_wai |
---|
Codebook |
Measurements |
Measurements_pre2020 |
PI_Info |
PI_Info_OLD |
Subjects |
Subjects_pre2020 |
DESCRIBE Measurements;
Field | Type | Null | Key | Default | Extra |
---|---|---|---|---|---|
Identifier | varchar(50) | NO | PRI | NA | |
SubjectNumber | int | NO | PRI | NA | |
Session | int | NO | PRI | NA | |
Ear | varchar(50) | NO | PRI | ||
Instrument | varchar(50) | NO | PRI | ||
Age | float | YES | NA | ||
AgeCategory | varchar(50) | YES | NA | ||
EarStatus | varchar(50) | YES | NA | ||
TPP | float | YES | NA | ||
AreaCanal | float | YES | NA |
SELECT *
FROM Measurements
LIMIT 0, 5;
Identifier | SubjectNumber | Session | Ear | Instrument | Age | AgeCategory | EarStatus | TPP | AreaCanal | PressureCanal | SweepDirection | Frequency | Absorbance | Zmag | Zang |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Abur_2014 | 1 | 1 | Left | HearID | 20 | Adult | Normal | -5 | 4.42e-05 | 0 | Ambient | 210.938 | 0.0333379 | 113780000 | -0.233504 |
Abur_2014 | 1 | 1 | Left | HearID | 20 | Adult | Normal | -5 | 4.42e-05 | 0 | Ambient | 234.375 | 0.0315705 | 103585000 | -0.235778 |
Abur_2014 | 1 | 1 | Left | HearID | 20 | Adult | Normal | -5 | 4.42e-05 | 0 | Ambient | 257.812 | 0.0405751 | 92951696 | -0.233482 |
Abur_2014 | 1 | 1 | Left | HearID | 20 | Adult | Normal | -5 | 4.42e-05 | 0 | Ambient | 281.250 | 0.0438399 | 86058000 | -0.233421 |
Abur_2014 | 1 | 1 | Left | HearID | 20 | Adult | Normal | -5 | 4.42e-05 | 0 | Ambient | 304.688 | 0.0486400 | 79492800 | -0.232931 |
References
Footnotes
Thanks to Paul Roback at St. Olaf for sharing this project idea.↩︎