What's in a SQL query?
SELECT * FROM book WHERE price > 15.00 ORDER BY title;
SELECT query is composed of several components, but I won't introduce them in the order they appear in a query because I believe it's a backwards way of thinking.
The first I think about when writing a SQL query is what data my question ultimately applies to. For example, in the query above the expression
price > 15.00 removes all books that cost $15.00 or less. Of course multiple conditions can be provided to slice up the data in any way.
... WHERE price < 10.0 OR price > 100.00
WHERE clause defines the conditions that are used to filter the data.
Conditions most commonly act on each row individually. That is, the statement
price < 10.0 AND price > 100.0 will test the
price in each row and remove the rows that don't pass.
An aggregation is composed of one or more aggregate functions to be applied to the table rows.
Count the number of books
SELECT COUNT(*) FROM book
Calculate the average price of all books
SELECT AVG(price) FROM book
Multiple aggregation calculations
SELECT MIN(price), MAX(price), AVG(price) FROM book
Using aggregate functions like this will always return a single row.
As an aside, most database engines will throw an error if a non-aggregated value is added to the
SELECT caluse, for example
SELECT price, AVG(price) FROM book. SQLite will gladly return an arbitrary value for that
As the name suggests, the
GROUP BY clause enables grouping rows by value. The simple thing to remember is that any columns that appear in the
GROUP BY must also appear in the
Group rows by genre
SELECT genre FROM book GROUP BY genre
The above query returns one row per
genre. Multiple columns can be added to group rows by each combination of values.
Group rows by genre and author
SELECT genre, author FROM book GROUP BY genre, author
An aside, there is a shortcut available to the
GROUP BY statements above using the
Select distinct values
SELECT DISTINCT genre, author FROM book
GROUP BY Aggregations
GROUP BY clause is used with aggregate functions. The aggregate functions are effectively applied to each group individually.
Count of books per genre
SELECT genre, COUNT(*) FROM book GROUP BY genre
Average price of book per genre
SELECT genre, AVG(price) FROM book GROUP BY genre
It is common to ask questions about aggregate calculations. One might think these conditions could just be throw in the
WHERE clause with the others, but there is a separate clause called
HAVING for this.
Average price of book per genre above $10
SELECT genre, AVG(price) FROM book GROUP BY genre HAVING AVG(price) > 10.0
AVG(price) in the
SELECT clause is not required, but included because it is typically desired to have the calculated averages in the result.
The reason is being the
GROUP BY is applied after the data has been filtered by the
WHERE clause. Likewise, the
HAVING clause is applied after the
GROUP BY clause. Again this is because the aggregate calculations apply after each group has been defined, and then the those aggregated rows will be filter further by the
Genres with more than 100 books within the $10-$20 price range
SELECT genre, COUNT(*) FROM book WHERE price BETWEEN 10 AND 20 GROUP BY genre HAVING COUNT(*) > 100
And now... SELECT
I purposefully deferred talking about the
SELECT clause until now because I believe it is important to think about the constraints of a question before worrying about what data you actually want to view.
As stated above, the only real constraint with what must be in the
SELECT clause are the columns that are declared in the
GROUP BY clause and vice versa (excluding aggregate functions of course).
Expressions can also be used to return a representation of the data.
SELECT price, (price * 0.08) AS tax FROM book
Note, parentheses were added for visual clarity. They are typically not required.
This text by no means covers the totality of the Structred Query Language (SQL), but rather presents a way of thinking when attempting to write a query. I have gotten caught up many a time when attempting to deconstruct a natural language question and writing in SQL.
The one topic I left out is regarding table joins. I did this because it can be overwhelming to think about since the number of columns and rows may grow expontentially depending on the type of join.
Again, the way to think about joins is that you are effectively starting with a larger table. When you perform a join, you are effectivaly annotating one table with the data from another. As a quick example:
Count the books per author
SELECT author.name, COUNT(*) FROM book, author WHERE book.author_id = author.id GROUP BY author.name
The join in the above query produces a table that looks like this (the redundant
id columns are the primary keys of each table):
id title price genre author_id id name -- ----- ----- ----- --------- -- ----
Once the table is constructed (from the database engine's point-of-view) the rest of the query components will be applied as expected.