library(mdsr)
db <- dbConnect_scidb("airlines") # establish a connection to the database "airlines"SQL basics
$$
$$
In this note we will access a database made available as a companion to the textbook Modern Data Science with R by Baumer, Kaplan, and Horton (2021). This note follows very closely a note prepared by Dr. Will Consagra. More examples of SQL queries on this database can be found here.
A database is a collection of (usually large) data sets, not stored on your local machine, and organized a way that facilitates retrieval of selected subsets of the data. SQL stands for structured query language and is a language used to retrieve data stored in a database. SQL commands are not executed by your local machine, but rather by the machine storing the data, and the results of the “queries” submitted in SQL are sent to your machine. This allows you to work with massive data sets without storing them on your own computer.
The first step in retrieving data with SQL queries is to create a connection to a database. We will do this from within R using the library mdsr, which enables connection to a database called “airlines” stored on an Amazon Web Services server. The following R commands establish the connection:
The airlines database contains information on about 18 million flights between US airports taking place during the years 2013–2015.
Exploring a database
Now that we have created a connection to a database, we can submit SQL queries to retrieve data from it. If we are compiling a .qmd document in Rstudio, we can execute SQL in a code chunk. We just need to open the code chunk with {sql connection=<db>}, where <db> is whatever you have named the database connection.
The data sets in a database are called tables. To see a list of the tables, we can submit the following SQL command:
SHOW TABLES| Tables_in_airlines |
|---|
| airports |
| carriers |
| flights |
| flights_summary |
| planes |
We see that there are five tables, or data sets in the “airlines” database.
In order to get a summary of a particular table, we can use DESCRIBE:
DESCRIBE flights| Field | Type | Null | Key | Default | Extra |
|---|---|---|---|---|---|
| year | smallint(4) | YES | NA | ||
| month | smallint(2) | YES | NA | ||
| day | smallint(2) | YES | NA | ||
| dep_time | smallint(4) | YES | NA | ||
| sched_dep_time | smallint(4) | YES | NA | ||
| dep_delay | smallint(4) | YES | NA | ||
| arr_time | smallint(4) | YES | NA | ||
| sched_arr_time | smallint(4) | YES | NA | ||
| arr_delay | smallint(4) | YES | NA | ||
| carrier | varchar(2) | NO |
DESCRIBE carriers| Field | Type | Null | Key | Default | Extra |
|---|---|---|---|---|---|
| carrier | varchar(7) | NO | PRI | ||
| name | varchar(255) | NO |
DESCRIBE planes| Field | Type | Null | Key | Default | Extra |
|---|---|---|---|---|---|
| tailnum | varchar(6) | NO | PRI | ||
| year | int(11) | YES | NA | ||
| type | text | YES | NA | ||
| manufacturer | text | YES | NA | ||
| model | text | YES | NA | ||
| engines | int(11) | YES | NA | ||
| seats | int(11) | YES | NA | ||
| speed | int(11) | YES | NA | ||
| engine | text | YES | NA |
DESCRIBE airports| Field | Type | Null | Key | Default | Extra |
|---|---|---|---|---|---|
| faa | varchar(3) | NO | PRI | ||
| name | varchar(255) | YES | NA | ||
| lat | decimal(10,7) | YES | NA | ||
| lon | decimal(10,7) | YES | NA | ||
| alt | int(11) | YES | NA | ||
| tz | smallint(4) | YES | NA | ||
| dst | char(1) | YES | NA | ||
| city | varchar(255) | YES | NA | ||
| country | varchar(255) | YES | NA |
The two pdfs airports_table_schema and flights_table_schema give descriptions of the airports and flights tables in the airlines database.
Retrieving data
An SQL query can have the following commands:
SELECT: Set which fields, i.e. variables/columns, from which to retrieve values.FROM: Set table or tables you want to retrieve values from.JOIN: Merge tables together.WHERE: Retrieve records (rows of data) conditionally .GROUP BY: Aggregate records.HAVING: Conditionally output records based on newly created variables.ORDER BY: Sort the records in the output data set.LIMIT: Set the number of records to output.
We will go through each of these.
SELECT and FROM
Every SQL query must have a SELECT and a FROM command.
To get the names of the carriers in the “carriers” table, we can run the following:
SELECT name
FROM carriers| name |
|---|
| Titan Airways |
| Tradewind Aviation |
| Comlux Aviation, AG |
| Master Top Linhas Aereas Ltd. |
| Flair Airlines Ltd. |
| Swift Air, LLC |
| DCA |
| ACM AIR CHARTER GmbH |
| Inter Island Airways, d/b/a Inter Island Air |
| Polar Airlines de Mexico d/b/a Nova Air |
Here we pull several fields from the “flights” table, but we limit the number of records to pull so we don’t get an enormous amount of data:
SELECT carrier, origin, dest, dep_time, year, month, day
FROM flights
LIMIT 20| carrier | origin | dest | dep_time | year | month | day |
|---|---|---|---|---|---|---|
| AA | LAX | DFW | 2 | 2013 | 10 | 1 |
| FL | SFO | ATL | 4 | 2013 | 10 | 1 |
| AA | SFO | DFW | 11 | 2013 | 10 | 1 |
| AA | SEA | ORD | 14 | 2013 | 10 | 1 |
| UA | LAX | IAH | 16 | 2013 | 10 | 1 |
| UA | SFO | IAH | 22 | 2013 | 10 | 1 |
| US | LAX | CLT | 29 | 2013 | 10 | 1 |
| AS | ANC | SEA | 29 | 2013 | 10 | 1 |
| DL | SEA | MSP | 31 | 2013 | 10 | 1 |
| DL | LAX | MSP | 32 | 2013 | 10 | 1 |
We can also construct new variables out of the ones in the table. One way to do this is with CONCAT and AS:
SELECT CONCAT(name,' (',faa,')') AS airport, city AS location
FROM airports
ORDER BY name
LIMIT 10| airport | location |
|---|---|
| Aberdeen Regional Airport (ABR) | Aberdeen |
| Abilene Rgnl (ABI) | Abilene |
| Abraham Lincoln Capital (SPI) | Springfield |
| Acadiana Rgnl (ARA) | Louisiana |
| Adak Airport (ADK) | Adak Island |
| Adams Fld (LIT) | Little Rock |
| Addison (ADS) | Addison |
| Adirondack Regional Airport (SLK) | Saranac Lake |
| Akhiok Airport (AKK) | Akhiok |
| Akiak Airport (AKI) | Akiak |
Note that in the above query we also used AS to rename a field. We also used ORDER BY to sort the resulting table.
WHERE
Now suppose we only want flights departing from Dallas/Forth-Worth (DFW). We can use WHERE as follows:
SELECT carrier, origin, dest, year, month, day
FROM flights
WHERE origin = 'DFW'
LIMIT 20| carrier | origin | dest | year | month | day |
|---|---|---|---|---|---|
| AA | DFW | MIA | 2014 | 10 | 1 |
| US | DFW | CLT | 2013 | 10 | 1 |
| US | DFW | CLT | 2014 | 10 | 1 |
| OO | DFW | IAH | 2013 | 10 | 1 |
| AA | DFW | ORD | 2015 | 10 | 1 |
| AA | DFW | CLT | 2015 | 10 | 1 |
| DL | DFW | ATL | 2013 | 10 | 1 |
| DL | DFW | ATL | 2014 | 10 | 1 |
| UA | DFW | ORD | 2014 | 10 | 1 |
| AA | DFW | MIA | 2013 | 10 | 1 |
We can use AND to add more conditions:
SELECT carrier, origin, dest, year, month, day
FROM flights
WHERE origin = 'DFW' AND year = 2013 AND month = 1 AND day = 15
LIMIT 20| carrier | origin | dest | year | month | day |
|---|---|---|---|---|---|
| US | DFW | CLT | 2013 | 1 | 15 |
| OO | DFW | IAH | 2013 | 1 | 15 |
| AA | DFW | ORD | 2013 | 1 | 15 |
| AA | DFW | MIA | 2013 | 1 | 15 |
| AA | DFW | LGA | 2013 | 1 | 15 |
| DL | DFW | ATL | 2013 | 1 | 15 |
| OO | DFW | LAX | 2013 | 1 | 15 |
| MQ | DFW | HOU | 2013 | 1 | 15 |
| OO | DFW | ORD | 2013 | 1 | 15 |
| US | DFW | PHX | 2013 | 1 | 15 |
We can also use OR. Here we collect all the flight departing or arriving at the Columbia Metropolitan Airport (CAE) on a particular day. Note that we can use parentheses to enforce an order of operations:
SELECT carrier, origin, dest, year, month, day
FROM flights
WHERE (origin = 'CAE' OR dest = 'CAE') AND year = 2013 AND month = 1 AND day = 15
LIMIT 20| carrier | origin | dest | year | month | day |
|---|---|---|---|---|---|
| EV | CAE | ATL | 2013 | 1 | 15 |
| EV | CAE | IAD | 2013 | 1 | 15 |
| MQ | CAE | DFW | 2013 | 1 | 15 |
| EV | CAE | ORD | 2013 | 1 | 15 |
| DL | CAE | ATL | 2013 | 1 | 15 |
| EV | CAE | ATL | 2013 | 1 | 15 |
| MQ | DFW | CAE | 2013 | 1 | 15 |
| EV | CAE | ATL | 2013 | 1 | 15 |
| EV | CAE | ORD | 2013 | 1 | 15 |
| EV | ATL | CAE | 2013 | 1 | 15 |
We can convert the year, month, and day information to an actual date using the STR_TO_DATE function. Let’s consider flights departing from CAE with destination IAH.
SELECT carrier, origin, dest, STR_TO_DATE(CONCAT(year,'/', month,'/', day),"%Y/%m/%d") AS date
FROM flights
WHERE origin = 'CAE' AND dest = 'IAH'
LIMIT 50| carrier | origin | dest | date |
|---|---|---|---|
| EV | CAE | IAH | 2015-10-01 |
| EV | CAE | IAH | 2014-10-01 |
| EV | CAE | IAH | 2013-10-01 |
| EV | CAE | IAH | 2014-10-02 |
| EV | CAE | IAH | 2015-10-02 |
| EV | CAE | IAH | 2013-10-02 |
| EV | CAE | IAH | 2014-10-03 |
| EV | CAE | IAH | 2013-10-03 |
| EV | CAE | IAH | 2015-10-04 |
| EV | CAE | IAH | 2013-10-04 |
Get a list of mile-high airports and sort them by altitude in descending order:
SELECT faa, name, alt
FROM airports
WHERE alt >= 5280
ORDER BY alt DESC
LIMIT 20| faa | name | alt |
|---|---|---|
| TEX | Telluride | 9078 |
| TVL | Lake Tahoe Airport | 8544 |
| ASE | Aspen Pitkin County Sardy Field | 7820 |
| GUC | Gunnison - Crested Butte | 7678 |
| BCE | Bryce Canyon | 7590 |
| ALS | San Luis Valley Regional Airport | 7539 |
| LAR | Laramie Regional Airport | 7284 |
| LAM | Los Alamos Airport | 7171 |
| EVW | Evanston-Uinta CO Burns Fld | 7143 |
| MMH | Mammoth Yosemite Airport | 7128 |
Operators BETWEEN, DISTINCT, and IN
The BETWEEN operator lets us specify a range of values:
SELECT CONCAT(name,' (',faa,')') AS airport, city, lat, lon
FROM airports
WHERE lon BETWEEN -90 AND -80 AND lat BETWEEN 45 AND 50
ORDER BY airport| airport | city | lat | lon |
|---|---|---|---|
| Alpena County Regional Airport (APN) | Alpena | 45.07810 | -83.56030 |
| Charlevoix Municipal Airport (CVX) | Charelvoix | 45.30478 | -85.27533 |
| Chippewa County International Airport (CIU) | Sault Ste Marie | 46.25080 | -84.47240 |
| Delta County Airport (ESC) | Escanaba | 45.72278 | -87.09361 |
| Drummond Island Airport (DRM) | Drummond Island | 46.00931 | -83.74393 |
| Eagle River (EGV) | Eagle River | 45.93233 | -89.26828 |
| Ephraim-Gibraltar Airport (3D2) | Ephraim | 45.13542 | -87.18794 |
| Ford Airport (IMT) | Iron Mountain | 45.81836 | -88.11456 |
| Houghton County Memorial Airport (CMX) | Hancock | 47.16840 | -88.48910 |
| Lakeland (ARV) | Minocqua - Woodruff | 45.92778 | -89.73083 |
Note that we do not have to SELECT the fields used in the WHERE statement.
The DISTINCT operator selects only the unique values of a field. We can use DISTINCT to see that all the airports in the airports table are within the United States.
SELECT DISTINCT country
FROM airports| country |
|---|
| United States |
Here are all the distinct destinations flown to from CAE in the year 2015:
SELECT DISTINCT dest AS Destination
FROM flights
WHERE origin = 'CAE' AND year = 2015| Destination |
|---|
| ATL |
| LGA |
| ORD |
| DFW |
| IAH |
| IAD |
The IN operator can be used in the WHERE statement in order to keep a record if a field has one of several specified values:
SELECT flight, origin, dest,
STR_TO_DATE(CONCAT(year,'/', month,'/', day),"%Y/%m/%d") AS date
FROM flights
WHERE origin = 'CAE' AND dest IN ('IAH','DFW') AND year = 2014
LIMIT 30| flight | origin | dest | date |
|---|---|---|---|
| 3663 | CAE | DFW | 2014-10-01 |
| 3544 | CAE | DFW | 2014-10-01 |
| 4253 | CAE | IAH | 2014-10-01 |
| 3492 | CAE | DFW | 2014-10-01 |
| 3663 | CAE | DFW | 2014-10-02 |
| 3544 | CAE | DFW | 2014-10-02 |
| 4253 | CAE | IAH | 2014-10-02 |
| 3492 | CAE | DFW | 2014-10-02 |
| 3663 | CAE | DFW | 2014-10-03 |
| 3544 | CAE | DFW | 2014-10-03 |
GROUP BY
We can use GROUP BY to aggregate groups of records sharing the same value of a field. Here we compute the total number of flights leaving each airport in 2013:
SELECT origin, COUNT(*) AS Departures
FROM flights
WHERE year = 2013
GROUP BY origin
ORDER BY Departures DESC
LIMIT 20| origin | Departures |
|---|---|
| ATL | 398878 |
| ORD | 306439 |
| DFW | 286427 |
| DEN | 226597 |
| LAX | 224821 |
| IAH | 181071 |
| PHX | 177027 |
| SFO | 168070 |
| DTW | 163550 |
| CLT | 148745 |
We can put list several fields after GROUP BY. Below we obtain the number of flights departing CAE on each day in the year 2013:
SELECT origin, COUNT(*) AS numDepartures, STR_TO_DATE(CONCAT(year,'/', month,'/', day),"%Y/%m/%d") AS theDate
FROM flights
WHERE year = 2013 AND origin = 'CAE'
GROUP BY year, month, day| origin | numDepartures | theDate |
|---|---|---|
| CAE | 11 | 2013-01-01 |
| CAE | 18 | 2013-01-02 |
| CAE | 18 | 2013-01-03 |
| CAE | 18 | 2013-01-04 |
| CAE | 14 | 2013-01-05 |
| CAE | 15 | 2013-01-06 |
| CAE | 19 | 2013-01-07 |
| CAE | 18 | 2013-01-08 |
| CAE | 17 | 2013-01-09 |
| CAE | 19 | 2013-01-10 |
Besides simply counting the records in each group, we can compute, for example the average number of minutes departures were delayed for each carrier:
SELECT origin, carrier, AVG(dep_delay) AS avg_delay
FROM flights
WHERE year = 2013 AND origin = 'CAE'
GROUP BY carrier| origin | carrier | avg_delay |
|---|---|---|
| CAE | 9E | 14.5786 |
| CAE | DL | 5.8581 |
| CAE | EV | 14.1871 |
| CAE | MQ | 7.4855 |
HAVING
We use HAVING to obtain a subset of records, similarly to how we use WHERE, but we let HAVING operate on newly created variables.
Here we select carriers with over 500 flights departing 2013 from any of the airports CAE, GSP, and CHS:
SELECT carrier, COUNT(*) AS departures
FROM flights
WHERE year = 2013 AND origin IN ('CAE','GSP','CHS')
GROUP BY carrier
HAVING departures > 500 -- HAVING uses new variables you have defined
ORDER BY departures DESC
LIMIT 20| carrier | departures |
|---|---|
| EV | 13105 |
| DL | 5552 |
| WN | 4612 |
| MQ | 3469 |
| 9E | 2947 |
| YV | 2345 |
| B6 | 959 |
In the above, we could move everything in the WHERE statement to the HAVING statement, but it will run slower!! So it is best practice to use HAVING only on newly created variables.
Merging tables
We can write SQL commands to merge data sets in an inner join, a left join, or a right join. One has to specify the “left” table, the type of join, the “right” table, and then instructions for how to match records in the left and right tables.
This info goes on the FROM line like this: FROM (left table) (type of join) (right table) ON (conditions for matching records).
Inner join
Suppose we want to pull into the last table the full names of the carriers, which must be retrieved from the carriers table. We can use an inner join to do this. Note that we must now refer to each field relative to its table as “table.field”:
SELECT carriers.name AS carrier, COUNT(*) AS departures
FROM flights JOIN carriers ON flights.carrier = carriers.carrier
WHERE flights.year = 2013 AND flights.origin IN ('CAE','GSP','CHS')
GROUP BY flights.carrier
HAVING departures > 500 -- HAVING uses new variables you have defined| carrier | departures |
|---|---|
| Endeavor Air Inc. | 2947 |
| JetBlue Airways | 959 |
| Delta Air Lines Inc. | 5552 |
| ExpressJet Airlines Inc. | 13105 |
| Envoy Air | 3469 |
| Southwest Airlines Co. | 4612 |
| Mesa Airlines Inc. | 2345 |
In the FROM line we can designate aliases for the tables in order to make our code more concise. Here is the same query as above:
SELECT c.name AS carrier, COUNT(*) AS departures
FROM flights AS f
JOIN carriers AS c ON f.carrier = c.carrier
WHERE f.year = 2013 AND f.origin IN ('CAE','GSP','CHS')
GROUP BY f.carrier
HAVING departures > 500 -- HAVING uses new variables you have defined
ORDER BY departures DESC| carrier | departures |
|---|---|
| ExpressJet Airlines Inc. | 13105 |
| Delta Air Lines Inc. | 5552 |
| Southwest Airlines Co. | 4612 |
| Envoy Air | 3469 |
| Endeavor Air Inc. | 2947 |
| Mesa Airlines Inc. | 2345 |
| JetBlue Airways | 959 |
Returning to the table showing the average departure delay time for each carrier leaving from CAE, we now merge with this the carriers table in order to display the full name of each carrier. We now group the results by the year as well as by the carrier:
SELECT f.origin, f.year, c.name, AVG(f.dep_delay) AS avg_delay
FROM flights AS f
JOIN carriers AS c ON f.carrier = c.carrier
WHERE f.origin = 'CAE'
GROUP BY f.carrier, f.year| origin | year | name | avg_delay |
|---|---|---|---|
| CAE | 2013 | Endeavor Air Inc. | 14.5786 |
| CAE | 2013 | Delta Air Lines Inc. | 5.8581 |
| CAE | 2014 | Delta Air Lines Inc. | 5.1412 |
| CAE | 2015 | Delta Air Lines Inc. | 4.0054 |
| CAE | 2013 | ExpressJet Airlines Inc. | 14.1871 |
| CAE | 2014 | ExpressJet Airlines Inc. | 13.6217 |
| CAE | 2015 | ExpressJet Airlines Inc. | 11.1207 |
| CAE | 2013 | Envoy Air | 7.4855 |
| CAE | 2014 | Envoy Air | 10.3580 |
| CAE | 2015 | Envoy Air | 14.7472 |
Here we obtain a summary of the kinds of planes departing CAE by merging the flights table with the planes table:
SELECT p.model, p.manufacturer, p.seats, COUNT(*) departures
FROM flights AS f
JOIN planes AS p ON f.tailnum = p.tailnum
WHERE f.origin = 'CAE' AND f.year = '2013'
GROUP BY p.model
ORDER BY departures DESC
LIMIT 20| model | manufacturer | seats | departures |
|---|---|---|---|
| CL-600-2B19 | CANADAIR | 55 | 2455 |
| EMB-145LR | EMBRAER | 55 | 537 |
| MD-88 | MCDONNELL DOUGLAS AIRCRAFT CO | 142 | 325 |
| DC-9-51 | MCDONNELL DOUGLAS | 139 | 206 |
| EMB-145XR | EMBRAER | 55 | 156 |
| CL-600-2C10 | BOMBARDIER INC | 80 | 118 |
| CL-600-2D24 | BOMBARDIER INC | 95 | 51 |
| A319-114 | AIRBUS | 145 | 44 |
| A320-211 | AIRBUS INDUSTRIE | 182 | 17 |
| A320-212 | AIRBUS | 182 | 11 |
As another example, we next make a table of flights which has additional information about the airports not found in the flights table—namely the full name of the airport and its altitude, which we must pull from the airports table. We return only flights from airports at altitudes above 1 mile:
SELECT f.origin, a.name, f.dest, f.carrier, a.alt -- use table aliases here which are defined in FROM
FROM flights AS f JOIN airports AS a ON f.origin = a.faa
WHERE a.alt > 5280
LIMIT 20| origin | name | dest | carrier | alt |
|---|---|---|---|---|
| DEN | Denver Intl | JFK | B6 | 5431 |
| DEN | Denver Intl | CLT | US | 5431 |
| DEN | Denver Intl | IAH | UA | 5431 |
| DRO | Durango La Plata Co | DEN | EV | 6685 |
| COS | City Of Colorado Springs Muni | ORD | OO | 6187 |
| DEN | Denver Intl | PHX | US | 5431 |
| CPR | Natrona Co Intl | SLC | OO | 5347 |
| DEN | Denver Intl | IAD | UA | 5431 |
| DEN | Denver Intl | PHX | WN | 5431 |
| ABQ | Albuquerque International Sunport | SLC | OO | 5355 |
We can merge more than two data sets by adding more JOIN commands. Below we list flights departing in January of 2013 from mile-high airports in planes with over 100 seats, along with the full names of the carriers of these flights:
SELECT a.name AS origin, f.dest,
STR_TO_DATE(CONCAT(f.year,'/', f.month,'/', f.day),"%Y/%m/%d") AS date,
CONCAT(LPAD(f.hour,2,'0'),':',LPAD(f.minute,2,'0')) AS sched_dep_time, -- format scheduled departure time as HH:MM
c.name AS carrier, p.model, p.seats
FROM flights AS f
JOIN airports AS a ON f.origin = a.faa
JOIN carriers AS c ON f.carrier = c.carrier
JOIN planes AS p ON f.tailnum = p.tailnum
WHERE a.alt > 5280 AND f.year = '2013' AND f.month = '1' AND p.seats > 100
LIMIT 20| origin | dest | date | sched_dep_time | carrier | model | seats |
|---|---|---|---|---|---|---|
| Denver Intl | PHL | 2013-01-01 | 07:00 | US Airways Inc. | A320-214 | 182 |
| Denver Intl | PHL | 2013-01-02 | 14:15 | US Airways Inc. | A320-214 | 182 |
| Denver Intl | CLT | 2013-01-06 | 02:00 | US Airways Inc. | A320-214 | 182 |
| Denver Intl | PHL | 2013-01-20 | 07:00 | US Airways Inc. | A320-214 | 182 |
| Denver Intl | PHL | 2013-01-07 | 11:05 | US Airways Inc. | A320-214 | 182 |
| Denver Intl | SFO | 2013-01-03 | 14:35 | United Air Lines Inc. | 737-824 | 149 |
| Denver Intl | SFO | 2013-01-07 | 14:35 | United Air Lines Inc. | 737-824 | 149 |
| Yampa Valley | IAH | 2013-01-21 | 14:07 | United Air Lines Inc. | 737-824 | 149 |
| Denver Intl | IAD | 2013-01-27 | 17:50 | United Air Lines Inc. | 737-824 | 149 |
| Denver Intl | OMA | 2013-01-29 | 19:39 | United Air Lines Inc. | 737-824 | 149 |
We can do multiple joins involving the same data sets. For example, if I want a table giving the full name of both the origin and destination airport of several flights, I can join twice on the airports table. The query below is an example of this.
SELECT f.flight, o.name AS origin, d.name AS destination, p.model
FROM flights AS f
JOIN airports AS o ON f.origin = o.faa
JOIN airports AS d ON f.dest = d.faa
JOIN planes AS p on f.tailnum = p.tailnum
WHERE p.model LIKE '737%' AND f.distance < 500 AND f.year = 2013 AND f.month = 1 AND f.day = 15
LIMIT 20| flight | origin | destination | model |
|---|---|---|---|
| 54 | Louis Armstrong New Orleans Intl | George Bush Intercontinental | 737-824 |
| 1116 | Newark Liberty Intl | General Edward Lawrence Logan Intl | 737-824 |
| 1703 | Newark Liberty Intl | General Edward Lawrence Logan Intl | 737-824 |
| 1211 | Denver Intl | Will Rogers World | 737-824 |
| 1290 | Chicago Ohare Intl | Port Columbus Intl | 737-824 |
| 1285 | Cleveland Hopkins Intl | Newark Liberty Intl | 737-724 |
| 1289 | Newark Liberty Intl | General Edward Lawrence Logan Intl | 737-824 |
| 158 | Hartsfield Jackson Atlanta Intl | Tampa Intl | 737-76N |
| 552 | Tampa Intl | Hartsfield Jackson Atlanta Intl | 737-76N |
| 1623 | General Edward Lawrence Logan Intl | Newark Liberty Intl | 737-824 |
The above query used LIKE, which allows you to search for a substring match. LIKE '737%' matches when a string begins with '737', LIKE '%737' would match on strings ending in '737', and LIKE '%737%' would match when a string contained '737' anywhere.
Left and right joins
An inner join, such as we have performed above, will only include in the output table records for which a match was found. Suppose a flight in the flights table had a carrier which does not appear in the carriers table; in this case, the record would be dropped, or deleted. If we wish to keep all the flights regardless of whether the carrier appears in the carriers table, we can do a left join, which will keep all the records in the left data set.
SELECT f.origin, a.name AS airport, a.alt, c.name AS carrier, f.dest
FROM flights AS f
JOIN airports AS a ON f.origin = a.faa
LEFT JOIN carriers AS c ON f.carrier = c.carrier
WHERE a.alt > 5280
LIMIT 20| origin | airport | alt | carrier | dest |
|---|---|---|---|---|
| DEN | Denver Intl | 5431 | JetBlue Airways | JFK |
| DEN | Denver Intl | 5431 | US Airways Inc. | CLT |
| DEN | Denver Intl | 5431 | United Air Lines Inc. | IAH |
| DRO | Durango La Plata Co | 6685 | ExpressJet Airlines Inc. | DEN |
| COS | City Of Colorado Springs Muni | 6187 | SkyWest Airlines Inc. | ORD |
| DEN | Denver Intl | 5431 | US Airways Inc. | PHX |
| CPR | Natrona Co Intl | 5347 | SkyWest Airlines Inc. | SLC |
| DEN | Denver Intl | 5431 | United Air Lines Inc. | IAD |
| DEN | Denver Intl | 5431 | Southwest Airlines Co. | PHX |
| ABQ | Albuquerque International Sunport | 5355 | SkyWest Airlines Inc. | SLC |
A right join works in the same way.