A Live Developer Journal

SQL the sequel

Notes based on Head First SQL

Scenario: We have a large bookshelf and a stack of movies. Our task is to group the movies together into categories with the help of an SQL table, which we will then use to know where to place our movies on the bookshelf (can also be scaled to a movie store). One of the problems we need to solve, is designing a table that works out how to group movies that belong to multiple categories.

If a movie is placed into one of those categories, then it can be confusing if you expected to find it in a different category, or were unsure which category to look for in the first place.

At the moment, we have a table that contains a movie title, and a column each for each of the categories a movie can belong to. If the movie does belong to a category, the value is 'T' for true, and if it doesn't, the value is set to 'F' for false.

Category column

Before addressing the which category should this movie be shelved inot, we are first going to write an SQL statement to make this table a little better

We can use the CASE expression to check an existing column's value against a condition, if it meets the condition, then the new column is filled with a specified value.

In our movie example, we will change the category of the movie from 'T' to the name of the category if it was marked as a 'T'. If no 'T' values are found, then we will set our category name to 'misc'.

NOTE: I don't like the following case statement. It's hard coded, inflexible and has way too much duplication, so for me this is just a learning exercise and not something I'd do in a real project without a very good reason.


UPDATE move_table
SET category =
CASE
  WHEN drama = 'T' THEN 'drama'
  WHEN comedy = 'T' THEN 'comedy'
  WHEN action = 'T' THEN 'action'
  WHEN gore = 'T' THEN 'horror'
  WHEN scifi = 'T' THEN 'scifi'
  WHEN for_kids = 'T' THEN 'family'
  WHEN cartoon = 'T' THEN 'family'
  ELSE 'misc'
END;

The order matters when using this approach, because the category the movie ends up in is the first one that has a 'T' value. If not in order, then a movie might be added to one category when a different one would be a better fit even though technically it belongs to both genres.

Just writing out that explaination above makes me dislike this approach even more. We should make it so that the people adding the movies into the database don't have to think about it. There is too much room for error here. Better off adding it manually than doing it this way. It's also impractical to make the order matter when the order in which movies come and go are unpredictable.

Another problem is that if we have an R-rated cartoon, then it might end up being categorised as a cartoon instead of a more appropriate (and kid friendly) category.

Another problem (there are many) is that this approach makes it difficult to change things. So if we decide to introduce new categories where for example we want R-rated movies to be seperate from G and PG-rated movies within the same category, then this is difficult to make happen.

OH GOOD, this was an example to show how not to do SQL, that's good. I thought it was odd as the rest of the book was pretty good. Happy to have picked up on all the 'ick' moments mentioned (ranted about).

This is why a good table design is important, they can get out of hand pretty quickly.

Order by

The ORDER BY statement allows you to alphabetically order any column.


SELECT title, category
FROM movie_table
WHERE
category = 'family'
ORDER BY title;

The query above returns a list of movie titles that belong to the 'family category' in alphabetical order.

SQL rules of order


SELECT title, category, purchased
FROM movie_table
ORDER BY category, purchased, title;

The query above returns a list of movie titles, the category they belong to as well as when they were purchased. The list is ordered by category in alphabetical order, and then the movies inside each category are ordered by oldest purchase. In other words, the query above shows the oldest to most recently purchased movies in each category.

If there are two movies in the same category with the same purchase date, then the titles will be sorted alphabetically.

ASC and DESC sort order

By default, records are sorted in ascending order (A-Z) and 1-99,999, which is why the purchase dates are returned oldest to newest.

To change the order that any column in sorted by, you can add the ASC or DESC keyword directly after the column name in your ORDER BY statement.

The following query returns the purchase date of the movies from ascending (oldest) to descending (newest) order:


SELECT title, purchased
FROM movie_table
ORDER BY title ASC, purchased DESC;

Adding totals

The sum function works by totalling the values in a column designated by parentheses.


SELECT SUM(sales)
FROM cookie_sales
WHERE first_name = 'Nicole';

The query above calculates the number of cookie sales that Nicole made. More technically, we use the sum function to add together all of the numeric values contained within the 'sales' column inside of the 'cookie_sales' table, where the first_name column contains the value (name) 'Nicole'.

Adding multiple totals

In the following query, we SUM each of the girl sprout sales at the same time. We do this by adding a GROUP BY to our SUM statement, which groups all of the first name values for each girl and totals the sales for this group.


SELECT first_name, SUM(sales)
FROM cookie_sales
GROUP BY first_name
ORDER BY SUM(sales) DESC;

The query above returns a list of Girl Sprout names and the total cookie sales they made in order of who sold the most.

Total by highest average


SELECT first_name, AVG(sales)
FROM cookie_sales
GROUP BY first_name;

In the query above, we add together each of the girl sprout girl sales and use the AVG function which then divides the sales by the total number of values to find the average values for that group.

MIN and MAX


SELECT first_name, MAX(sales)
FROM cookie_sales
GROUP BY first_name;

In the query above, we use the MAX function to find the highest value in a column (the highest number of cookie sales in a day). We can use the MIN function to find the lowest value.

Count records that match a condition


SELECT COUNT(sale_date)
FROM cookie_sales;

The query above returns the number of days that cookies were sold on. If the value in this column is NULL, then it isn't counted. The problem with this is that if there are more than one records with the same date, then they will both be counted. So it won't be an accurate measure of how many days the cookies were sold on with this query alone.

Count records that match a condition - NO duplicates


SELECT DISTINCT sale_date
FROM cookie_sales
ORDER BY sale_date;

In the query above, we return a list of dates that cookies were sold on, making sure that there are no duplicates. The dates are ordered from oldest to most recent sale date.


SELECT COUNT(DISTINCT sale_date)
FROM cookie_sales;

The query above returns the number of days that cookies were sold on, without duplicates if there are more than one record sharing the same date that cookies were sold on.

Limit the number of results


SELECT first_name, SUM(sales)
FROM cookie_sales
GROUP BY first_name
ORDER BY SUM(sales) DESC
LIMIT 2;

The query above returns a list of girl sprout names and their total cookie sale amounts from highest to lowest. The number of results listed have been limited to 2 using the LIMIT keyword. This is useful if you only want to know who won 2nd/3rd/4th/5th etc place. This is really useful when you are working with large amounts of data, for example a large music database where you only want to see the top 100 songs played in order of popularity.

You can also make the limit more specific. If you only wanted to see the 2nd place winner and not the first, you would write LIMIT 1,1 SQL starts counting at 0, which is why we have written 1,1 to get the second value. The first value is which record to start from (including itself), and the second value is which record to end on (including itself).

More about table design

Your table design should do the heavy lifting for you. Don't write convoluted queries to 'get around' badly designed tables.

Think outside of the single table

We need to move the non-atomic columns in our table into new tables. We need more tables that can work with the current table to allow us to associate each person with more than one interest.

EXAMPLE: In our old clown tracking table, we stored the clown's information (name), name of location, last seen and activities all in one table.

We can split this table into five seperate tables:

Database schema

A representation of all the structures, such as tables and columns, in your database, along with how they connect, is known as a schema. Creating a visual representation of your database can help you see how things connect when you're writing your queries. However, your schema can also be written in text format.

BEFORE

🔑 = this column is a primary key

✈️ 🔑 = this column is a foreign key

my_contacts
contact_id 🔑
last_name
first_name
phone
email
gender
birthday
profession
city
state
status
interests
seeking

Splitting a single big table into two tables.

  1. Move the interests column into a new table. It will hold all of the interests, each in an individual column, from the my_contacts table.
  2. Add columns that will let us identify which interests belong to which person in the contacts table.
interests
int_id 🔑
interest
contact_id ✈️ 🔑

The foreign key

The FOREIGN KEY is a column in a table that references the primary key of another table. In this case, it tells you which interest belongs to each person in the my_contacts table.

We can make sure that a foreign key contains a meaningful value, one that exists in the parent table, by using a constraint. Adding a foreign key gives you a few advantages, like getting error messages if you violate the rules, which will stop you doing anything accidentally to break the table.

Adding a constrain means that you can only put values in the child table's foreign key that already exist in the parent table.

Creating a table with a foreign key

To create a table with a foreign key, you need to know the name of the primary key in the parent table as well as the name of the parent table.


CREATE TABLE interests (
int_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
interest VARCHAR(50) NOT NULL,
contact_id INT NOT NULL,
CONSTRAINT my_contacts_contact_id_fk
FOREIGN KEY (contact_id)
REFERENCES my_contacts (contact_id);

When you enter DESC interests; to see the structure of a table that has a foreign key in it, you will see 'MUL' in the 'Key' column which means that multiple occurrences of the same value may be stored in this column. This is what allows us to keey track of multiple interests for each contact_id in my_contacts.

The foreign key constraint makes sure that if you try to delete the row in a primary key table or change the primary key value, then you will get an error if the primary key value is a foreign key constraint in another table. To delete a row from your table that has a primary key if it shows up in another table as a foreign key, you must first remove it from the foreign key row first.

Other constraints besides the foreign key

Relationships between tables

We need to associate lots of people with lots of interests. This is one of three possible patterns you will see again and again with your data: one-to-one, one-to-many and many-to-many. Once you have identified the pattern your data matches, coming up with the design of yur table is easy

One-to-one

In the one-to-one pattern, a record in table A can have at most ONE matching record in table B. One example of this is where table A contains your name, and table B contains your salary details and social security number to isolate them in order to keep them more secure.

Both tables with contain your ID number. The employee_id in the parent table is a primary key, the employee_id in the child table is a foreign key.

When to use one-to-one tables:

One-to-many

One-to-many means that a record in table A may have many matching records in table B, but a record in table B can only match one record in table A.

For example, in a contacts table (table A), each person has only one 'prof_id' (proffession id), but more than one person in the contacts list may have the same prof_id. For example, the prof id for Programmer may show up more than once in my_contacts, but each person in my contacts will only have one prof id.

In this case, we moved the profession column to a new child table and changed the proffession column in the parent table to a foreign key called 'prof_id'. Since it's a one-to-many relationship, we can use the prof_id in both tabels to allow us to connect them.

Because the prof id can repeat (have many matching records), this can't be a primary key, it is a foreign key because it references a key from another table.

Many-to-many

Many children own many toys. More than one child can have a toy that is of the same brand.

To create a many-to-many table, we need two one-to-many tables that link to a junction table. A junction table contains the primary key columns of the two tables we want to relate to each other.

The children table contains two columns, a 'child_id 🔑' column, and a 'child' column.

The toy table also contains two columns, a 'toy_id 🔑' column and a 'toy_name' column.

The junction table contains two columns, a 'child_id ✈️ 🔑' and a 'toy_id ✈️ 🔑' column.

Composite keys

Composite keys are like primary keys, but whereas the primary key uses only one column, a composite key uses multiple columns.

So far, we have talked about how data in a table relates to other tables, but not how the columns in a table relate to each other.

A composite key is a primary key composed of multiple columns, creating a unique key.

While there may be duplicate data in each of the individual columns, the assumption is that together, both columns will form a unique pair.

Imagine have a table of superheroes, where the columns are: name, power and weakness. We assume that when combined, the name and power columns will create a unique composite key.

Dependencies

Functional dependencies

When a column's data must change when another column's data is modified, the first column is functionally dependent on the second.

For example, if we have two columns, a name and an initial column, when we change one of them, we must change the other.

A quick way to describe a function dependency is to write this: T.x ->; T.y (a good idea to map dependencies in not SQL projects too - personal opinion).

super_heroes.name ->; super_heroes.initials. This means: In the super heroes relational table, the initial column in functionally dependent on the name column.

Non-dependent

A non-dependent column does not change if another colum changes, it stands alone.

Partial functional dependency

A partial functional dependency means that a non-key column is dependent on some, but not all of the columns in a composite primary key. In our superheroes table, the initials column is partially dependent on name, because if the superhero's name changes, the initials value will too, but if the power changes, and not the same, our superhero's initials will stay the same.

Transistive functional dependency

We also need to consider how each non-key column relates to the others. If an arch-enemy moves to a different city, it doesn't change his arch enemy id. If changing any of the non-key columns might cause any of the other columns to change, you have a transitive dependency.

For example, if the arch enemy id changes, then the arch enemy city could but may not for definite change alnog with it.

A transitive functional dependency is when any non-key column is related to any of the other non-key columns.

Recap 1NF, 2NF and 3NF rules that you want all of your tables to follow

1NF

  1. Columns contain only atomic values
  2. No repeating groups of data

2NF

  1. Be in 1NF
  2. Have no partial or functional dependencies

3NF

  1. Be in 2NF
  2. Have no transitive dependencies

A table follows 1NF AND 2NF rules if ALL the columns in the table are part of the primary key OR it has a single column primary key.