November 13, 2024
Consider a database of information from Stanford Open Policing Project on over 200 million traffic stops in the US.
88 tables in the traffic
database! Lots of details on traffic stops across cities and highway patrols.
raw_row_number | date | time | district | region | subject_race | subject_sex | officer_id_hash | type | raw_descent_description |
---|---|---|---|---|---|---|---|---|---|
5692933 | 13:59:00 | 0753 | WILSHIRE | hispanic | male | 15ecd81b00 | pedestrian | HISPANIC | |
240731 | 2010-01-01 | 00:05:00 | 665 | WEST TRAFFIC | other | male | b707de41e0 | vehicular | OTHER |
240592|240593 | 2010-01-01 | 00:10:00 | 1258 | SEVENTY-SEVENTH | hispanic | male | b49ae7078c | pedestrian | HISPANIC |
241116 | 2010-01-01 | 00:10:00 | 1635 | FOOTHILL | hispanic | male | e70ce46248 | vehicular | HISPANIC |
240681 | 2010-01-01 | 00:15:00 | 882 | WEST LA | other | male | 8587bd743c | vehicular | OTHER |
240602 | 2010-01-01 | 00:20:00 | 559 | HARBOR | hispanic | male | 601bf1fdff | vehicular | HISPANIC |
240594 | 2010-01-01 | 00:20:00 | 667 | HOLLYWOOD | hispanic | male | fb2e6d8ab3 | pedestrian | HISPANIC |
241434 | 2010-01-01 | 00:25:00 | 1658 | FOOTHILL | hispanic | female | 5100c8c759 | vehicular | HISPANIC |
240582 | 2010-01-01 | 00:30:00 | 1041 | WEST VALLEY | black | female | d54d90a4d0 | vehicular | BLACK |
240642 | 2010-01-01 | 00:30:00 | 1162 | NORTH EAST | hispanic | male | 5d6fbef1d3 | pedestrian | HISPANIC |
Still using a {sql}
chunk. The DESCRIBE
command shows the 10 field names (variables) in the ca_los_angeles_2020_04_01
table. Some of the variables are characters (text
) and some are date
or time
.
Field
is the name of the variableType
is the type of the variable (numeric, character, etc.)Null
indicates whether or not NULL
values are acceptable. NULL
values are never allowed for a Primary Key.Key
designates whether the variable is a Primary Key. PRI
is a primary key, all unique, no NULL
values. UNI
is a unique key, but NULL
values are allowed. MUL
is a non-unique (“Multiple”) key, allowing for duplicates, can speed up queries.Default
provides the default value of the variable if no value is provided when a row is added to the database.Queries in SQL start with the SELECT
keyword and consist of several clauses, which must be written in the following order:
SELECT
FROM
JOIN
WHERE
GROUP BY
HAVING
ORDER BY
LIMIT
The clauses are similar to data wrangling verbs in R, but the order in SQL is super important!
SELECT
, include a corresponding FROM
*
returns every columnThe shortest SQL query is the following SELECT
command. Do not run this command!!! The ca_los_angeles_2020_04_01
table has 5 million rows, and we do not want to look at them (print to screen) simultaneously.
Much better for big tables:
raw_row_number | date | time | district | region | subject_race | subject_sex | officer_id_hash | type | raw_descent_description |
---|---|---|---|---|---|---|---|---|---|
5692933 | 13:59:00 | 0753 | WILSHIRE | hispanic | male | 15ecd81b00 | pedestrian | HISPANIC | |
240731 | 2010-01-01 | 00:05:00 | 665 | WEST TRAFFIC | other | male | b707de41e0 | vehicular | OTHER |
240592|240593 | 2010-01-01 | 00:10:00 | 1258 | SEVENTY-SEVENTH | hispanic | male | b49ae7078c | pedestrian | HISPANIC |
241116 | 2010-01-01 | 00:10:00 | 1635 | FOOTHILL | hispanic | male | e70ce46248 | vehicular | HISPANIC |
240681 | 2010-01-01 | 00:15:00 | 882 | WEST LA | other | male | 8587bd743c | vehicular | OTHER |
240602 | 2010-01-01 | 00:20:00 | 559 | HARBOR | hispanic | male | 601bf1fdff | vehicular | HISPANIC |
240594 | 2010-01-01 | 00:20:00 | 667 | HOLLYWOOD | hispanic | male | fb2e6d8ab3 | pedestrian | HISPANIC |
241434 | 2010-01-01 | 00:25:00 | 1658 | FOOTHILL | hispanic | female | 5100c8c759 | vehicular | HISPANIC |
240582 | 2010-01-01 | 00:30:00 | 1041 | WEST VALLEY | black | female | d54d90a4d0 | vehicular | BLACK |
240642 | 2010-01-01 | 00:30:00 | 1162 | NORTH EAST | hispanic | male | 5d6fbef1d3 | pedestrian | HISPANIC |
How do we know how many traffic stops are in the database? Two different ways of counting the rows (that produce the same number):
COUNT(*) | SUM(1) |
---|---|
5418402 | 5418402 |
More than 5.4 million LAPD traffic stops.
Let’s look at New Orleans, LA
How old was the car being driven?
Unfortunately, date
is saved as a date variable, and vehicle_year
is saved as a double. Fortunately, we can convert vehicle_year
to a date variable.
SELECT
date, vehicle_year,
STR_TO_DATE(CONCAT(vehicle_year, '-01-01'), '%Y-%m-%d') AS vehicle_date
FROM la_new_orleans_2020_04_01
LIMIT 0, 10;
date | vehicle_year | vehicle_date |
---|---|---|
2010-01-01 | 2005 | 2005-01-01 |
2010-01-01 | 2005 | 2005-01-01 |
2010-01-01 | 2005 | 2005-01-01 |
2010-01-01 | 2003 | 2003-01-01 |
2010-01-01 | ||
2010-01-01 | ||
2010-01-01 | ||
2010-01-01 | ||
2010-01-01 | ||
2010-01-01 |
Why can’t we find the difference between the two time variables?
SELECT
date, vehicle_year,
STR_TO_DATE(CONCAT(vehicle_year, '-01-01'), '%Y-%m-%d') AS vehicle_date
TIMESTAMPDIFF(YEAR, vehicle_date, date) AS car_age
FROM la_new_orleans_2020_04_01
LIMIT 0, 10;
Error: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'TIMESTAMPDIFF(YEAR, vehicle_date, date) AS car_age
FROM la_new_orleans_2020_0' at line 4 [1064]
Solution 1: two layers of SELECT
SELECT
(i.e., inside) layer creates the new variablesSELECT
(i.e., outside) layer subtracts the two timesSELECT
date,
vehicle_date,
TIMESTAMPDIFF(YEAR, vehicle_date, date) AS car_age
FROM (
SELECT
date,
STR_TO_DATE(CONCAT(vehicle_year, '-01-01'), '%Y-%m-%d') AS vehicle_date
FROM la_new_orleans_2020_04_01)
AS subquery_table
LIMIT 0, 10;
date | vehicle_date | car_age |
---|---|---|
2010-01-01 | 2005-01-01 | 5 |
2010-01-01 | 2005-01-01 | 5 |
2010-01-01 | 2005-01-01 | 5 |
2010-01-01 | 2003-01-01 | 7 |
2010-01-01 | ||
2010-01-01 | ||
2010-01-01 | ||
2010-01-01 | ||
2010-01-01 | ||
2010-01-01 |
Solution 2: apply the STR_TO_DATE()
function inside the TIMESTAMPDIFF()
function
SELECT
date,
vehicle_year,
TIMESTAMPDIFF(YEAR,
STR_TO_DATE(CONCAT(vehicle_year, '-01-01'), '%Y-%m-%d'),
date) AS car_age
FROM la_new_orleans_2020_04_01
LIMIT 0, 10;
date | vehicle_year | car_age |
---|---|---|
2010-01-01 | 2005 | 5 |
2010-01-01 | 2005 | 5 |
2010-01-01 | 2005 | 5 |
2010-01-01 | 2003 | 7 |
2010-01-01 | ||
2010-01-01 | ||
2010-01-01 | ||
2010-01-01 | ||
2010-01-01 | ||
2010-01-01 |
There is a distinction between clauses that operate on the variables of the original table versus those that operate on the variables of the results set.
date
and vehicle_year
are columns in the original table - they are written to disk on the SQL server.
car_age
exist only in the results set, which is passed from the server (SQL server) to the client (e.g., RStudio or DBeaver) in your computer’s memory and is not written to disk.
Returns only unique rows.
raw_actions_taken |
---|
Stop Results: Citation Issued;Subject Type: Driver |
Stop Results: Verbal Warning;Search Occurred: Yes;Search Types: Vehicle;Search Types: Pat-down;Search Types: Driver;Legal Basises: Consent to search;Legal Basises: Probable cause |
Stop Results: Verbal Warning;Subject Type: Driver;Search Occurred: No;Evidence Seized: No |
Stop Results: Verbal Warning;Subject Type: Driver |
Stop Results: Verbal Warning;Subject Type: Driver;Search Occurred: No |
Stop Results: No action taken;Search Occurred: No |
Stop Results: Citation Issued;Subject Type: Driver;Search Occurred: No |
Stop Results: Citation Issued;Subject Type: Driver;Search Occurred: No;Evidence Seized: No |
Stop Results: No action taken;Search Occurred: No;Search Types: Pat-down |
outcome |
---|
citation |
warning |
arrest |
vehicle_make | vehicle_color |
---|---|
DODGE | BLACK |
NISSAN | BLUE |
JEEP | GRAY |
GMC - GENERAL MOTORS CORP. | BLUE |
CHEVROLET | YELLOW |
OLDSMOBILE | SILVER |
TOYOTA | GREEN |
PONTIAC | BRONZE |
CHEVROLET | WHITE |
(note that we haven’t yet arranged the distinct values!)
The WHERE
clause is analogous to the filter()
function in dplyr. However, keep in mind that there are two SQL commands that resemble the dplyr filter()
function. WHERE
operates on the original data in the table and HAVING
operates on the result set.
What was the time of day for those traffic stops search_conducted
was true and vehicle_make
was Subaru?
Note that in SQL the equality logical is =
and in R the equality logical is ==
. Note also that the WHERE command is case-insensitive!
SELECT time, search_conducted, vehicle_year, vehicle_make
FROM la_new_orleans_2020_04_01
WHERE search_conducted = 1
AND vehicle_make = "Subaru"
LIMIT 0, 10;
time | search_conducted | vehicle_year | vehicle_make |
---|---|---|---|
21:05:00 | 1 | 1996 | SUBARU |
21:05:00 | 1 | 1996 | SUBARU |
03:27:00 | 1 | 1998 | SUBARU |
19:28:00 | 1 | 2012 | SUBARU |
21:40:00 | 1 | 1998 | SUBARU |
10:38:00 | 1 | 2014 | SUBARU |
05:01:00 | 1 | 1998 | SUBARU |
22:00:00 | 1 | 1998 | SUBARU |
13:46:00 | 1 | 2007 | SUBARU |
22:50:00 | 1 | 1999 | SUBARU |
BETWEEN
can be used to specify a range of values for a numeric value. BETWEEN
is inclusive.
SELECT time, search_conducted, vehicle_year, vehicle_make
FROM la_new_orleans_2020_04_01
WHERE vehicle_year BETWEEN 1970 and 1999
AND vehicle_make = "Subaru"
LIMIT 0, 10;
time | search_conducted | vehicle_year | vehicle_make |
---|---|---|---|
20:16:00 | 0 | 1991 | SUBARU |
20:54:00 | 0 | 1998 | SUBARU |
17:09:00 | 0 | 1999 | SUBARU |
13:20:00 | 0 | 1996 | SUBARU |
20:44:00 | 0 | 1998 | SUBARU |
21:05:00 | 1 | 1996 | SUBARU |
21:05:00 | 1 | 1996 | SUBARU |
18:52:00 | 0 | 1995 | SUBARU |
17:36:00 | 0 | 1996 | SUBARU |
08:40:00 | 0 | 1999 | SUBARU |
IN
is similar to the dplyr %in%
function which specifies distinct values for the variable.
SELECT time, search_conducted, vehicle_year, vehicle_make
FROM la_new_orleans_2020_04_01
WHERE vehicle_make IN ("Subaru", "Dodge")
AND search_conducted = 1
LIMIT 0, 10;
time | search_conducted | vehicle_year | vehicle_make |
---|---|---|---|
03:45:00 | 1 | 2010 | DODGE |
20:29:00 | 1 | 2012 | DODGE |
21:12:00 | 1 | 2011 | DODGE |
21:12:00 | 1 | 2011 | DODGE |
02:00:00 | 1 | 2014 | DODGE |
20:46:00 | 1 | 2008 | DODGE |
20:46:00 | 1 | 2008 | DODGE |
04:47:00 | 1 | 2014 | DODGE |
19:16:00 | 1 | 1995 | DODGE |
09:43:00 | 1 | 2009 | DODGE |
AND
takes precedent over OR
in the order of operations, when there are no parentheses.
SELECT time, search_conducted, vehicle_year, vehicle_make
FROM la_new_orleans_2020_04_01
WHERE vehicle_year BETWEEN 1970 and 1999 OR
vehicle_make IN ("Subaru", "Dodge") AND
search_conducted = 1
LIMIT 0, 10;
time | search_conducted | vehicle_year | vehicle_make |
---|---|---|---|
16:03:00 | 0 | 1997 | GMC - GENERAL MOTORS CORP. |
16:03:00 | 0 | 1997 | GMC - GENERAL MOTORS CORP. |
07:54:00 | 0 | 1995 | TOYOTA |
21:01:00 | 0 | 1997 | GMC - GENERAL MOTORS CORP. |
21:06:00 | 0 | 1996 | DODGE |
21:16:00 | 0 | 1996 | DODGE |
10:20:00 | 0 | 1994 | MERCEDES-BENZ |
10:35:00 | 0 | 1998 | PONTIAC |
12:46:00 | 0 | 1996 | ACURA |
15:17:00 | 0 | 1997 | OLDSMOBILE |
same as:
SELECT time, search_conducted, vehicle_year, vehicle_make
FROM la_new_orleans_2020_04_01
WHERE vehicle_year BETWEEN 1970 and 1999 OR
(vehicle_make IN ("Subaru", "Dodge") AND
search_conducted = 1)
LIMIT 0, 10;
time | search_conducted | vehicle_year | vehicle_make |
---|---|---|---|
16:03:00 | 0 | 1997 | GMC - GENERAL MOTORS CORP. |
16:03:00 | 0 | 1997 | GMC - GENERAL MOTORS CORP. |
07:54:00 | 0 | 1995 | TOYOTA |
21:01:00 | 0 | 1997 | GMC - GENERAL MOTORS CORP. |
21:06:00 | 0 | 1996 | DODGE |
21:16:00 | 0 | 1996 | DODGE |
10:20:00 | 0 | 1994 | MERCEDES-BENZ |
10:35:00 | 0 | 1998 | PONTIAC |
12:46:00 | 0 | 1996 | ACURA |
15:17:00 | 0 | 1997 | OLDSMOBILE |
Parentheses take precedent over AND
and OR
.
SELECT time, search_conducted, vehicle_year, vehicle_make
FROM la_new_orleans_2020_04_01
WHERE (vehicle_year BETWEEN 1970 and 1999 OR
vehicle_make IN ("Subaru", "Dodge") ) AND
search_conducted = 1
LIMIT 0, 10;
time | search_conducted | vehicle_year | vehicle_make |
---|---|---|---|
03:45:00 | 1 | 2010 | DODGE |
07:11:00 | 1 | 1999 | CADILLAC |
20:21:00 | 1 | 1999 | MERCURY |
20:29:00 | 1 | 2012 | DODGE |
20:31:00 | 1 | 1993 | INFINITY |
21:12:00 | 1 | 2011 | DODGE |
21:12:00 | 1 | 2011 | DODGE |
02:00:00 | 1 | 2014 | DODGE |
14:31:00 | 1 | 1994 | JEEP |
04:39:00 | 1 | 1995 | CHEVROLET |
IS NULL
not = NULL
(because NULL
indicates unknown). (Just like how we need to use is.na()
in R!)
SELECT time, search_conducted, vehicle_year, vehicle_make
FROM la_new_orleans_2020_04_01
WHERE vehicle_make IS NULL
LIMIT 0, 10;
time | search_conducted | vehicle_year | vehicle_make |
---|---|---|---|
02:06:00 | 0 | ||
02:06:00 | 0 | ||
02:06:00 | 0 | ||
02:06:00 | 0 | ||
02:06:00 | 0 | ||
02:06:00 | 0 | ||
02:06:00 | 0 | ||
08:02:00 | 0 | ||
20:05:00 | 0 | ||
20:05:00 | 0 |
NULL
exampleThe logic of NULL
:1
NULL
, you’ll just get NULL
. For instance if \(x\) is NULL
, then \(x > 3\), \(1 = x\), and \(x + 4\) all evaluate to NULL
. Even \(x =\) NULL
evaluates to NULL
! if you want to check whether \(x\) is NULL
, use x IS NULL
or x IS NOT NULL
.NULL
short-circuits with boolean operators. That means a boolean expression involving NULL
will evaluate to:
NULL
value is really TRUE or FALSE.NULL
value is really TRUE or FALSE.NULL
, if it depends on the NULL
value.NULL
exampleConsider the following table and SQL query to evaluate WHERE age <= 20 OR num_dogs = 3
:
SELECT * FROM (
SELECT 'Ace' AS name, 20 AS age, 4 as num_dogs
UNION
SELECT 'Ada' AS name, NULL AS age, 3 as num_dogs
UNION
SELECT 'Ben' AS name, NULL AS age, NULL as num_dogs
UNION
SELECT 'Cho' AS name, 27 AS age, NULL as num_dogs
) AS temptable;
name | age | num_dogs |
---|---|---|
Ace | 20 | 4 |
Ada | 3 | |
Ben | ||
Cho | 27 |
NULL
exampleWhere does the WHERE
clause do? It tells us that we only want to keep the rows satisfying the age <= 20
OR num_dogs = 3
. Let’s consider each row one at a time:
age <= 20
evaluates to TRUE so the claim is satisfied.age <= 20
evaluates to NULL
but num_dogs = 3
evaluates to TRUE so the claim is satisfied.age <= 20
evaluates to NULL
and num_dogs = 3
evaluates to NULL
so the overall expression is NULL
which has a FALSE value.age <= 20
evaluates to FALSE and num_dogs = 3
evaluates to NULL
so the overall expression evaluates to NULL
(because it depends on the value of the NULL
).Thus we keep only Ace and Ada.
NULL
exampleSELECT * FROM (
SELECT 'Ace' AS name, 20 AS age, 4 as num_dogs
UNION
SELECT 'Ada' AS name, NULL AS age, 3 as num_dogs
UNION
SELECT 'Ben' AS name, NULL AS age, NULL as num_dogs
UNION
SELECT 'Cho' AS name, 27 AS age, NULL as num_dogs
) AS temptable
WHERE age <= 20 OR num_dogs = 3;
name | age | num_dogs |
---|---|---|
Ace | 20 | 4 |
Ada | 3 |
The GROUP BY
clause will direct SQL to carry out the query separately for each category in the grouped variable.
COUNT()
, SUM()
, MAX()
, MIN()
, and AVG()
.SELECT subject_race,
COUNT(*) AS num_stops,
SUM(1) AS num_stops_also,
SUM(2) AS double_stops
FROM la_new_orleans_2020_04_01
WHERE vehicle_year BETWEEN 1970 and 1999
GROUP BY subject_race;
subject_race | num_stops | num_stops_also | double_stops |
---|---|---|---|
2217 | 2217 | 4434 | |
asian/pacific islander | 315 | 315 | 630 |
black | 31586 | 31586 | 63172 |
hispanic | 2347 | 2347 | 4694 |
other | 23 | 23 | 46 |
unknown | 211 | 211 | 422 |
white | 11475 | 11475 | 22950 |
For those people whose cars are between 1970 and 1999, how old is the youngest and oldest person for each subject_race
?
SELECT subject_race,
COUNT(*) AS num_stops,
MIN(subject_age) AS min_age,
MAX(subject_age) AS max_age
FROM la_new_orleans_2020_04_01
WHERE vehicle_year BETWEEN 1970 and 1999
GROUP BY subject_race;
subject_race | num_stops | min_age | max_age |
---|---|---|---|
2217 | |||
asian/pacific islander | 315 | 17 | 78 |
black | 31586 | 10 | 100 |
hispanic | 2347 | 13 | 87 |
other | 23 | 19 | 68 |
unknown | 211 | 17 | 81 |
white | 11475 | 12 | 99 |
GROUP BY
will work applied to multiple columns.
SELECT subject_race,
COUNT(*) AS num_stops,
MIN(subject_age) AS min_age,
MAX(subject_age) AS max_age,
DAYNAME(date) AS wday
FROM la_new_orleans_2020_04_01
GROUP BY subject_race, wday;
subject_race | num_stops | min_age | max_age | wday |
---|---|---|---|---|
1800 | Friday | |||
1590 | Monday | |||
1332 | Saturday | |||
1172 | Sunday | |||
1953 | Thursday | |||
1877 | Tuesday | |||
2006 | Wednesday | |||
asian/pacific islander | 551 | 16 | 78 | Friday |
asian/pacific islander | 484 | 13 | 78 | Monday |
asian/pacific islander | 475 | 15 | 78 | Saturday |
ORDER BY
allows us to look at interesting aspects of the data by sorting the data.
SELECT subject_race,
COUNT(*) AS num_stops,
MIN(subject_age) AS min_age,
MAX(subject_age) AS max_age,
DAYNAME(date) AS wday
FROM la_new_orleans_2020_04_01
GROUP BY subject_race, wday
ORDER BY min_age ASC;
subject_race | num_stops | min_age | max_age | wday |
---|---|---|---|---|
1332 | Saturday | |||
1172 | Sunday | |||
1877 | Tuesday | |||
2006 | Wednesday | |||
1953 | Thursday | |||
1800 | Friday | |||
1590 | Monday | |||
black | 49945 | 10 | 110 | Friday |
black | 42593 | 10 | 106 | Saturday |
black | 39252 | 10 | 92 | Sunday |
WHAT?!?!! How are 10 year olds getting pulled over for traffic stops?
SELECT subject_race,
COUNT(*) AS num_stops,
MIN(subject_age) AS min_age,
MAX(subject_age) AS max_age,
DAYNAME(date) AS wday
FROM la_new_orleans_2020_04_01
GROUP BY subject_race, wday
ORDER BY max_age DESC;
subject_race | num_stops | min_age | max_age | wday |
---|---|---|---|---|
black | 49945 | 10 | 110 | Friday |
black | 56459 | 10 | 110 | Tuesday |
black | 57153 | 10 | 110 | Wednesday |
black | 54858 | 10 | 110 | Thursday |
hispanic | 2007 | 11 | 110 | Thursday |
black | 49557 | 10 | 110 | Monday |
white | 18494 | 11 | 107 | Friday |
black | 42593 | 10 | 106 | Saturday |
white | 20764 | 11 | 105 | Wednesday |
white | 15778 | 11 | 104 | Sunday |
So many people over 100 years old… the data seem fishy!
SELECT subject_race,
COUNT(*) AS num_stops,
MIN(subject_age) AS min_age,
MAX(subject_age) AS max_age,
DAYNAME(date) AS wday
FROM la_new_orleans_2020_04_01
GROUP BY subject_race, wday
ORDER BY wday, subject_race;
subject_race | num_stops | min_age | max_age | wday |
---|---|---|---|---|
black | 2 | 17 | 24 | |
white | 2 | 23 | 57 | |
1800 | Friday | |||
asian/pacific islander | 551 | 16 | 78 | Friday |
black | 49945 | 10 | 110 | Friday |
hispanic | 1851 | 11 | 83 | Friday |
other | 61 | 19 | 66 | Friday |
unknown | 510 | 17 | 75 | Friday |
white | 18494 | 11 | 107 | Friday |
1590 | Monday |
Note that both GROUP BY
and ORDER BY
evaluate the data after it has been retrieved. Therefore, the functions operate on the results set, not the original rows of the data.
We are able to GROUP BY
and ORDER BY
on the new variables we had created, wday
.
Recall that WHERE
acts only on the original data. If we are interested in traffic stops that happened on Friday
, we need to use the derived variable wday
instead of the raw variable date
. Fortunately, HAVING
works on the results set.
SELECT subject_race,
COUNT(*) AS num_stops,
MIN(subject_age) AS min_age,
MAX(subject_age) AS max_age,
DAYNAME(date) AS wday
FROM la_new_orleans_2020_04_01
GROUP BY subject_race, wday
HAVING wday = 'Friday';
subject_race | num_stops | min_age | max_age | wday |
---|---|---|---|---|
1800 | Friday | |||
asian/pacific islander | 551 | 16 | 78 | Friday |
black | 49945 | 10 | 110 | Friday |
hispanic | 1851 | 11 | 83 | Friday |
other | 61 | 19 | 66 | Friday |
unknown | 510 | 17 | 75 | Friday |
white | 18494 | 11 | 107 | Friday |
While it worked out quite well for us that HAVING
was able to filter the data based on the results set, the use of HAVING
was quite onerous because the entire data set was considered before the filter was applied. That is, if the filter can be done on the original data using WHERE
, the query will be much faster and more efficient.
Note: HAVING
requires a GROUP BY
clause. And the variable(s) used in HAVING
must also be part of the GROUP BY
clause.
LIMIT
truncates the query to specified rows. The first number is the offset (i.e., the number of rows to skip), the second number is the (maximum) number of rows to return. Here, we return rows 154219 through 154228.
The first number is optional.
raw_row_number | date | time | location | lat | lng | district | zone | subject_age | subject_race | subject_sex | officer_assignment | type | arrest_made | citation_issued | warning_issued | outcome | contraband_found | contraband_drugs | contraband_weapons | frisk_performed | search_conducted | search_person | search_vehicle | search_basis | reason_for_stop | vehicle_color | vehicle_make | vehicle_model | vehicle_year | raw_actions_taken | raw_subject_race |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
442170 | 2017-04-20 | 13:15:00 | Gentilly Blvd & Franklin Ave | 30.0 | -90.0 | 3 | Y | 31 | black | male | 3rd District | vehicular | 0 | 0 | 1 | warning | 0 | 0 | 0 | 0 | TRAFFIC VIOLATION | BLACK | DODGE | CHARGER | 2008 | Stop Results: Verbal Warning;Subject Type: Driver;Search Occurred: No;Evidence Seized: No;Consent To Search: No;Exit Vehicle: No | BLACK | ||||
442169 | 2017-04-20 | 13:15:00 | Gentilly Blvd & Franklin Ave | 30.0 | -90.0 | 3 | Y | 32 | black | male | 3rd District | vehicular | 1 | 0 | 0 | arrest | 0 | 0 | 0 | 0 | TRAFFIC VIOLATION | BLACK | DODGE | CHARGER | 2008 | Stop Results: Physical Arrest;Subject Type: Passenger;Search Occurred: No;Evidence Seized: No;Consent To Search: No;Exit Vehicle: No | BLACK | ||||
442094 | 2017-04-20 | 01:16:00 | Broadway & Freret St | 29.9 | -90.1 | 2 | H | 41 | black | male | 2nd District | pedestrian | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | consent | SUSPECT PERSON | Stop Results: No action taken;Subject Type: Pedestrian;Search Occurred: Yes;Search Types: Pat-down;Evidence Seized: No;Legal Basises: Consent to search;Consent To Search: Yes;Exit Vehicle: No | BLACK | |||||
442171 | 2017-04-20 | 13:17:00 | Franklin Ave & Gentilly Blvd | 30.0 | -90.0 | 3 | Y | 27 | black | male | 3rd District | vehicular | 0 | 1 | 0 | citation | 0 | 0 | 0 | 0 | TRAFFIC VIOLATION | GRAY | PONTIAC | GRAND PRIX | 2008 | Stop Results: Citation Issued;Subject Type: Driver;Search Occurred: No;Evidence Seized: No;Consent To Search: No;Exit Vehicle: Yes | BLACK | ||||
442093 | 2017-04-20 | 01:22:00 | S Claiborne Ave & Thalia St | 29.9 | -90.1 | 6 | J | 28 | white | female | 6th District | vehicular | 0 | 0 | 1 | warning | 0 | 0 | 0 | 0 | TRAFFIC VIOLATION | BLUE | JEEP | GRAND CHEROKEE | 2002 | Stop Results: Verbal Warning;Subject Type: Driver;Search Occurred: No;Evidence Seized: No;Consent To Search: No;Exit Vehicle: No | WHITE | ||||
442162 | 2017-04-20 | 13:27:00 | 007XX Saint Charles Ave | 8 | H | 30 | white | male | 8th District | vehicular | 0 | 1 | 0 | citation | 0 | 0 | 0 | 0 | TRAFFIC VIOLATION | YELLOW | DODGE | GRAND CARAVAN | 2013 | Stop Results: Citation Issued;Subject Type: Driver;Search Occurred: No;Evidence Seized: No;Consent To Search: No;Exit Vehicle: No | WHITE | ||||||
442160 | 2017-04-20 | 13:27:00 | Tulane Ave & S Jefferson Davis Pkwy | 30.0 | -90.1 | 1 | L | 22 | black | female | 1st District | vehicular | 0 | 1 | 1 | citation | 0 | 0 | 0 | 0 | TRAFFIC VIOLATION | WHITE | CHEVROLET | OTHER | 2007 | Stop Results: Citation Issued;Stop Results: Verbal Warning;Subject Type: Driver;Search Occurred: No;Evidence Seized: No;Consent To Search: No;Exit Vehicle: No | BLACK | ||||
442159 | 2017-04-20 | 13:31:00 | Diana St & Wagner St | 29.9 | -90.0 | 4 | H | 20 | black | female | 4th District | vehicular | 0 | 0 | 1 | warning | 0 | 0 | 0 | 0 | TRAFFIC VIOLATION | WHITE | NISSAN | ALTIMA | 2003 | Stop Results: Verbal Warning;Subject Type: Driver;Search Occurred: No;Evidence Seized: No;Consent To Search: No;Exit Vehicle: No | BLACK | ||||
442161 | 2017-04-20 | 13:35:00 | S Jefferson Davis Pkwy & Tulane Ave | 30.0 | -90.1 | 1 | L | 28 | black | male | 1st District | vehicular | 0 | 1 | 0 | citation | 0 | 0 | 0 | 0 | 1 | 1 | 1 | consent | TRAFFIC VIOLATION | BLACK | HONDA | ACCORD | 1991 | Stop Results: Citation Issued;Subject Type: Driver;Search Occurred: Yes;Search Types: Vehicle;Evidence Seized: No;Legal Basises: Consent to search;Consent To Search: Yes;Exit Vehicle: Yes | BLACK |
442650 | 2017-04-20 | 13:37:00 | Saint Louis St & N Peters St | 30.0 | -90.1 | 8 | D | 57 | black | male | 8th District | 0 | 1 | 0 | citation | 0 | 0 | 0 | 0 | CRIMINAL VIOLATION | Stop Results: Citation Issued;Subject Type: Pedestrian;Search Occurred: No;Evidence Seized: No;Consent To Search: No;Exit Vehicle: No | BLACK |
If you are working in R to run SQL commands, you may want to use the query output for further analysis or visualizations.
#| output.var: "name_of_variable"
inside the {sql}
chunk.name_of_variable
will then be available to be used in the R environment.Always a good idea to terminate the SQL connection when you are done with it.