PearceCodes

SQL Prequel

2022-04-20 - Tags:

SQL basics for software engineers. Each section will outline some basic SQL syntax and operations. I will use an example to walk through some basic concepts.

For our example data we will work with two tables the first is a list of orders filled by a chip company.

Table orders:

| order_id | order_date | buyer_id | num_bags |
|----------|------------|----------|----------|
| 1        | 2022-03-01 | b1       | 3        |
| 2        | 2022-03-01 | b2       | 12       |
| 3        | 2022-03-01 | b1       | 2        |
 ...

Table buyer_location:

| buyer_id | state |
|----------|-------|
| b1       | NY    |
| b2       | CA    |
| b3       | NY    |
 ...

Sample data - SELECT FROM

We can use the SELECT LIMIT and FROM keywords to get a better idea about what data is in our database.

SELECT *
FROM orders
LIMIT 10

Results interactive:

| order_id | order_date | buyer_id | num_bags |
|----------|------------|----------|----------|
| 1        | 2022-03-01 | b1       | 3        |
| 2        | 2022-03-01 | b2       | 12       |
| 3        | 2022-03-01 | b1       | 2        |
| 4        | 2022-03-03 | b3       | 16       |
| 5        | 2022-03-21 | b1       | 2        |
| 6        | 2022-04-05 | b1       | 5        |
| 7        | 2022-04-13 | b2       | 9        |
| 8        | 2022-04-14 | b4       | 4        |
| 9        | 2022-04-23 | b1       | 5        |
| 10       | 2022-04-30 | b1       | 2        |

SELECT * returns all the columns. If we want less data we can select only a sub-set of the fields.

SELECT order_id, order_date
FROM orders
LIMIT 10

Results interactive:

| order_id | order_date |
|----------|------------|
| 1        | 2022-03-01 |
| 2        | 2022-03-01 |
| 3        | 2022-03-01 |
| 4        | 2022-03-03 |
| 5        | 2022-03-21 |
| 6        | 2022-04-05 |
| 7        | 2022-04-13 |
| 8        | 2022-04-14 |
| 9        | 2022-04-23 |
| 10       | 2022-04-30 |

Sorting - ORDER

To get the most recent 10 orders we can modify our query:

SELECT *
FROM orders
ORDER BY order_date DESC
LIMIT 10

Results interactive:

| order_id | order_date | buyer_id | num_bags |
|----------|------------|----------|----------|
| 19       | 2022-06-16 | b1       | 9        |
| 18       | 2022-06-14 | b7       | 4        |
| 17       | 2022-06-03 | b1       | 23       |
| 16       | 2022-05-22 | b2       | 18       |
| 15       | 2022-05-14 | b6       | 3        |
| 14       | 2022-05-08 | b2       | 5        |
| 12       | 2022-05-06 | b5       | 12       |
| 13       | 2022-05-06 | b1       | 14       |
| 11       | 2022-05-02 | b2       | 30       |
| 10       | 2022-04-30 | b1       | 2        |

To change direction you can use:

Filter data - WHERE

We might want to get all the orders for one buyer_id. The WHERE clause defines a boolean filter to restrict query results

SELECT *
FROM orders
WHERE buyer_id='b2'

Results interactive:

| order_id | order_date | buyer_id | num_bags |
|----------|------------|----------|----------|
| 2        | 2022-03-01 | b2       | 12       |
| 7        | 2022-04-13 | b2       | 9        |
| 11       | 2022-05-02 | b2       | 30       |
| 14       | 2022-05-08 | b2       | 5        |
| 16       | 2022-05-22 | b2       | 18       |

Basic aggregations - COUNT(), SUM(), and AVG()

Get total number of orders; COUNT() gets the number of occurrences.

SELECT COUNT(*)
FROM orders

Results interactive:

| COUNT(*) |
|----------|
| 19       |

Get total number of bags sold; SUM() adds the values.

SELECT SUM(num_bags)
FROM orders

Results:

| SUM(num_bags) |
|---------------|
| 178           |

Get average number of bags per order.

SELECT AVG(num_bags)
FROM orders

Results:

| AVG(num_bags)     |
|-------------------|
| 9.368421052631579 |

Group aggregations - GROUP BY

Get total number of bags for each buyer_id.

SELECT buyer_id, SUM(num_bags)
FROM orders
GROUP BY buyer_id

Results interactive:

| buyer_id | SUM(num_bags) |
|----------|---------------|
| b1       | 65            |
| b2       | 74            |
| b3       | 16            |
| b4       | 4             |
| b5       | 12            |
| b6       | 3             |
| b7       | 4             |

Joining data - JOIN

Show order info only for buyers in NY.

SELECT *
FROM orders
LEFT JOIN buyer_location
ON orders.buyer_id = buyer_location.buyer_id
WHERE state='NY'

Results interactive:

| order_id | order_date | buyer_id | num_bags | buyer_id | state |
|----------|------------|----------|----------|----------|-------|
| 1        | 2022-03-01 | b1       | 3        | b1       | NY    |
| 3        | 2022-03-01 | b1       | 2        | b1       | NY    |
| 4        | 2022-03-03 | b3       | 16       | b3       | NY    |
| 5        | 2022-03-21 | b1       | 2        | b1       | NY    |
| 6        | 2022-04-05 | b1       | 5        | b1       | NY    |
| 9        | 2022-04-23 | b1       | 5        | b1       | NY    |
| 10       | 2022-04-30 | b1       | 2        | b1       | NY    |
| 13       | 2022-05-06 | b1       | 14       | b1       | NY    |
| 17       | 2022-06-03 | b1       | 23       | b1       | NY    |
| 19       | 2022-06-16 | b1       | 9        | b1       | NY    |

Combining queries - WITH AS

Combine the above JOIN with a group by to get the number of bags ordered for each state.

WITH orders_and_state AS( 
    SELECT *
    FROM orders
    LEFT JOIN buyer_location
    ON orders.buyer_id = buyer_location.buyer_id
)
SELECT state, SUM(num_bags)
FROM orders_and_state
GROUP BY state

Results interactive:

| state | SUM(num_bags) |
|-------|---------------|
| CA    | 86            |
| NY    | 81            |
| PA    | 3             |
| TX    | 4             |
| WA    | 4             |

Conclusion

This blog post covered some common SQL operations and provided examples.

Appendix

String operations

Get all the buyers from states that contain the letter ‘A’:

SELECT *
FROM buyer_location
WHERE state LIKE '%A%'

Results interactive:

| buyer_id | state |
|----------|-------|
| b2       | CA    |
| b5       | CA    |
| b6       | PA    |
| b7       | WA    |

Date operations

Get total number of bags ordered each month

You can generates months info by parsing the order_date:

SELECT order_id, 
    order_date,
    strftime('%m', order_date) AS month
FROM orders

Results:

| order_id | order_date | month |
|----------|------------|-------|
| 1        | 2022-03-01 | 03    |
| 2        | 2022-03-01 | 03    |
| 3        | 2022-03-01 | 03    |

Now we just need to GROUP BY to get the totals:

SELECT strftime('%Y', order_date) AS year,
  strftime('%m', order_date) AS month, 
  SUM(num_bags)
FROM orders
GROUP BY year, month

Results interactive:

| year | month | SUM(num_bags) |
|------|-------|---------------|
| 2022 | 03    | 35            |
| 2022 | 04    | 25            |
| 2022 | 05    | 82            |
| 2022 | 06    | 36            |

SQLime

SQLime is an SQLite playground that runs in your browser. It allows me to upload example databases and code samples into an environment so you can run. SQLite is probably more limited than the runtime you are used to in a production system, but working within these limitations allows me to distribute examples to you.

OLAP vs OLTP

Different databases are optimized for different access patterns.