SQL basics for software engineers. Each section will outline some basic SQL syntax and operations. I will use an example to walk through some basic concepts.
For our example data we will work with two tables the first is a list of orders filled by a chip company.
Table orders:
| order_id | order_date | buyer_id | num_bags |
|----------|------------|----------|----------|
| 1 | 2022-03-01 | b1 | 3 |
| 2 | 2022-03-01 | b2 | 12 |
| 3 | 2022-03-01 | b1 | 2 |
...
Table buyer_location:
| buyer_id | state |
|----------|-------|
| b1 | NY |
| b2 | CA |
| b3 | NY |
...
We can use the SELECT LIMIT and FROM keywords to get a better idea about what data is in our database.
SELECT *
FROM orders
LIMIT 10
Results interactive:
| order_id | order_date | buyer_id | num_bags |
|----------|------------|----------|----------|
| 1 | 2022-03-01 | b1 | 3 |
| 2 | 2022-03-01 | b2 | 12 |
| 3 | 2022-03-01 | b1 | 2 |
| 4 | 2022-03-03 | b3 | 16 |
| 5 | 2022-03-21 | b1 | 2 |
| 6 | 2022-04-05 | b1 | 5 |
| 7 | 2022-04-13 | b2 | 9 |
| 8 | 2022-04-14 | b4 | 4 |
| 9 | 2022-04-23 | b1 | 5 |
| 10 | 2022-04-30 | b1 | 2 |
SELECT *
returns all the columns. If we want less data we can select only a sub-set of the fields.
SELECT order_id, order_date
FROM orders
LIMIT 10
Results interactive:
| order_id | order_date |
|----------|------------|
| 1 | 2022-03-01 |
| 2 | 2022-03-01 |
| 3 | 2022-03-01 |
| 4 | 2022-03-03 |
| 5 | 2022-03-21 |
| 6 | 2022-04-05 |
| 7 | 2022-04-13 |
| 8 | 2022-04-14 |
| 9 | 2022-04-23 |
| 10 | 2022-04-30 |
To get the most recent 10 orders we can modify our query:
SELECT *
FROM orders
ORDER BY order_date DESC
LIMIT 10
Results interactive:
| order_id | order_date | buyer_id | num_bags |
|----------|------------|----------|----------|
| 19 | 2022-06-16 | b1 | 9 |
| 18 | 2022-06-14 | b7 | 4 |
| 17 | 2022-06-03 | b1 | 23 |
| 16 | 2022-05-22 | b2 | 18 |
| 15 | 2022-05-14 | b6 | 3 |
| 14 | 2022-05-08 | b2 | 5 |
| 12 | 2022-05-06 | b5 | 12 |
| 13 | 2022-05-06 | b1 | 14 |
| 11 | 2022-05-02 | b2 | 30 |
| 10 | 2022-04-30 | b1 | 2 |
To change direction you can use:
We might want to get all the orders for one buyer_id
. The WHERE clause defines a boolean filter to restrict query results
SELECT *
FROM orders
WHERE buyer_id='b2'
Results interactive:
| order_id | order_date | buyer_id | num_bags |
|----------|------------|----------|----------|
| 2 | 2022-03-01 | b2 | 12 |
| 7 | 2022-04-13 | b2 | 9 |
| 11 | 2022-05-02 | b2 | 30 |
| 14 | 2022-05-08 | b2 | 5 |
| 16 | 2022-05-22 | b2 | 18 |
Get total number of orders; COUNT() gets the number of occurrences.
SELECT COUNT(*)
FROM orders
Results interactive:
| COUNT(*) |
|----------|
| 19 |
Get total number of bags sold; SUM() adds the values.
SELECT SUM(num_bags)
FROM orders
Results:
| SUM(num_bags) |
|---------------|
| 178 |
Get average number of bags per order.
SELECT AVG(num_bags)
FROM orders
Results:
| AVG(num_bags) |
|-------------------|
| 9.368421052631579 |
Get total number of bags for each buyer_id
.
SELECT buyer_id, SUM(num_bags)
FROM orders
GROUP BY buyer_id
Results interactive:
| buyer_id | SUM(num_bags) |
|----------|---------------|
| b1 | 65 |
| b2 | 74 |
| b3 | 16 |
| b4 | 4 |
| b5 | 12 |
| b6 | 3 |
| b7 | 4 |
Show order info only for buyers in NY.
SELECT *
FROM orders
LEFT JOIN buyer_location
ON orders.buyer_id = buyer_location.buyer_id
WHERE state='NY'
Results interactive:
| order_id | order_date | buyer_id | num_bags | buyer_id | state |
|----------|------------|----------|----------|----------|-------|
| 1 | 2022-03-01 | b1 | 3 | b1 | NY |
| 3 | 2022-03-01 | b1 | 2 | b1 | NY |
| 4 | 2022-03-03 | b3 | 16 | b3 | NY |
| 5 | 2022-03-21 | b1 | 2 | b1 | NY |
| 6 | 2022-04-05 | b1 | 5 | b1 | NY |
| 9 | 2022-04-23 | b1 | 5 | b1 | NY |
| 10 | 2022-04-30 | b1 | 2 | b1 | NY |
| 13 | 2022-05-06 | b1 | 14 | b1 | NY |
| 17 | 2022-06-03 | b1 | 23 | b1 | NY |
| 19 | 2022-06-16 | b1 | 9 | b1 | NY |
Combine the above JOIN with a group by to get the number of bags ordered for each state.
WITH orders_and_state AS(
SELECT *
FROM orders
LEFT JOIN buyer_location
ON orders.buyer_id = buyer_location.buyer_id
)
SELECT state, SUM(num_bags)
FROM orders_and_state
GROUP BY state
Results interactive:
| state | SUM(num_bags) |
|-------|---------------|
| CA | 86 |
| NY | 81 |
| PA | 3 |
| TX | 4 |
| WA | 4 |
This blog post covered some common SQL operations and provided examples.
Get all the buyers from states that contain the letter ‘A’:
SELECT *
FROM buyer_location
WHERE state LIKE '%A%'
Results interactive:
| buyer_id | state |
|----------|-------|
| b2 | CA |
| b5 | CA |
| b6 | PA |
| b7 | WA |
Get total number of bags ordered each month
You can generates months info by parsing the order_date
:
SELECT order_id,
order_date,
strftime('%m', order_date) AS month
FROM orders
Results:
| order_id | order_date | month |
|----------|------------|-------|
| 1 | 2022-03-01 | 03 |
| 2 | 2022-03-01 | 03 |
| 3 | 2022-03-01 | 03 |
Now we just need to GROUP BY to get the totals:
SELECT strftime('%Y', order_date) AS year,
strftime('%m', order_date) AS month,
SUM(num_bags)
FROM orders
GROUP BY year, month
Results interactive:
| year | month | SUM(num_bags) |
|------|-------|---------------|
| 2022 | 03 | 35 |
| 2022 | 04 | 25 |
| 2022 | 05 | 82 |
| 2022 | 06 | 36 |
SQLime is an SQLite playground that runs in your browser. It allows me to upload example databases and code samples into an environment so you can run. SQLite is probably more limited than the runtime you are used to in a production system, but working within these limitations allows me to distribute examples to you.
Different databases are optimized for different access patterns.