Week 6

Updated on 28 Dec 2018

Table Design II

This unit will concentrate mainly on appropriate field types, primary keys and indexes.

Student_id Student_Name Unit Final_Grade Age Grade_Description
S0001234 James Jones COMP306 HD 12 High Distinction
S0001235 Sarah Smith COMP306 PA 10 Pass
S0002345 Bobby Brown COMP306 PA 10 Pass
S0002234 Tracy Taylor COMP306 CR 11 Credit
S0001244 Luke Lang COMP306 WW 12 Fail

The data shown above might have been extracted from a MySQL database. If that were the case, then the table structure might look something like this:

Field Type Attributes NULL
Student_ID varchar(8) No
Student_Name varchar(30) No
Unit varchar(7) No
Final_Grade varchar(2)
Age tinyint(2) unsigned
Grade_Description varchar(30)

Defining the column structures is only part of designing a proper database table. We still need to look at defining a primary key and indexes if appropriate.

KEYS

Keys are pieces of data that help to identify a row of information in a table (a row is also called a record). There are two types of keys, primary and foreign.

Questions:

  • What is a primary key? What is a foreign key?

Primary Keys & Indexes

Primary keys are very important for database tables because they are used to help identify a row of information in a table. A primary key is a unique identifier that follows these rules:

  • Always have a value
  • Have a value that remains the same (never changes)
  • Have a unique value for each record in the table.

An example of a primary key used by the government would be your Tax File Number, ABN etc. Invoice numbers from the University, Energy and communication companies could also be used as primary keys.

On the database table, the primary key is automatically an index. An index on a database table is very similar to an index of a book. In a book index, special keywords are indexed to specific page numbers. This allows you to find the topic you’re looking for very quickly. Without an index you’d have to search every page of the book until you found the topic you were searching for…

Question:

  • Do you understand how a book index works?

Indexes

A database table is very similar to a book index. Without an index, the database would need to search every record until it found the records that you wanted. On a table that held very few records this won’t pose a problem, however as your table grows (records are added) performance issues will arise.

The index of a book contains specific keywords. Common language use words are not in the index. Words like THE, THAT, THERE, IS, A etc are not found in the index because they are used so prolifically. In fact not only are you likely to find these words on each page, chances are those words would be used in the construction of each sentence on every page of the book.

This is also applicable to database tables. Remember that the primary key must be unique, and it is also good practice if the fields where you’ve defined indexes are near to unique also. (Unlike the primary key, you can define an index on a column that won’t be unique).

A field that stored a person’s gender would be inappropriate to be indexed because like the book example, 1 in every 2 records are likely to match your search. It would be like indexing THE in a book!

If we know that a field is appropriate to index, how do we know if it needs to be indexed? The best approach is to look at the likely SQL statements that are going to be executed. Fields that appear in the ORDER BY or WHERE clauses are likely candidates for indexing.

In these examples, likely candidates for indexing would be the columns Student_Name and Age, although I’m not sure Age would be appropriate to index anyway.

SELECT   Student_id, Student_Name
FROM     Student_table
ORDER BY Student_Name
SELECT   Student_Name, Final_Grade
FROM     Student_table
WHERE    Age = 10

Normalization

Normalization is a set of rules that define the organization of data within the database. In database design text books you’ll come across terms such as 1st Normal Form, 2nd Normal Form and 3rd Normal Form. Normal Forms are the process that is followed for normalizing a database.

Questions:

  • Have you heard of the term Normalization?
  • What does a Normalized database look like?

1st Normal Form

Student_id Student_Name Unit Final_Grade Age Grade_Description
S0001234 James Jones COMP306 HD 12 High Distinction
S0001235 Sarah Smith COMP306 PA 10 Pass
S0002345 Bobby Brown COMP306 PA 10 Pass
S0002234 Tracy Taylor COMP306 CR 11 Credit
S0001244 Luke Lang COMP306 WW 12 Fail

The data shown above might have been extracted from a MySQL database. If that were the case, then the table structure might look something like this:

Field Type Attributes NULL
Student_ID varchar(8) No
Student_Name varchar(30) No
Unit varchar(7) No
Final_Grade varchar(2)
Age tinyint(2) unsigned
Grade_Description varchar(30)

For a database to be in First Normal Form, each column must contain only ONE value. This database is not in 1NF because we are storing two values in the Student_Name column. First and Last Name!

The Grade Description stores only one value, even though some descriptions may contain more than one word. Therefore for this database to be 1NF compliant, the table structure would need to look like this:

Field Type Attributes NULL
Student_ID varchar(8) No
Last_Name varchar(30) No
First_Name varchar(30) No
Unit varchar(7) No
Final_Grade varchar(2)
Age tinyint(2) unsigned
Grade_Description varchar(30)

2nd Normal Form

The database must first be in 1NF before you can proceed to the 2NF. In the 2NF, every column in a table that is not a key MUST relate to the primary key. This is a rule that encompasses “The concept of having a table is to allow data to be broken down into logical, smaller, more manageable units.”.

“…address details on the addresses table … registration information (the units you’re enrolled in) on the registration table.” (lecture notes last week).

The above quotes is what 2NF is all about. So if we have a look at our table with the Student_Id column as the primary key we should be able to see that the database is not 2NF compliant.

Questions:

  • What fields are DIRECTLY related to the Student_ID?
  • What fields could be ‘pushed’ off onto their own table?
  • Looking at repeated sections of data is another clue for creating another table. Can you see where this might be applicable for our 1NF table?

Applying the rules for the 2nd normal form, we could have a database design like this (NB none of the tables are linked yet):

STUDENT_TABLE

Field Type Attributes NULL
Student_ID varchar(8) Primary Key No
Last_Name varchar(30) No
First_Name varchar(30) No
Age tinyint(2) unsigned

UNIT_TABLE

Field Type Attributes NULL
Unit_ID smallint(3) Primary Key No
Unit_Code varchar(7) No
Unit_Description varchar(45)

GRADE_TABLE

Field Type Attributes NULL
Grade_ID tinyint(2) Primary Key No
Grade_Code varchar(2) No
Grade_Description varchar(45)

Questions:

  • Do the table designs for the 2NF make sense?
  • How would you link the tables? (E.g. Do we add a Unit_ID to the Student_Table or add a Student_ID to the Unit_Table?)

Trying to link the tables together as they are doesn’t make much sense. Whether we add a Student_ID column to the Unit_Table or add a Unit_ID to the Student_Table; neither option ‘feels’ right. It doesn’t feel right because the relationship doesn’t exist, and we are violating the 2NF principles.

In the case where something doesn’t feel right then we should look at creating a new table that links the 2 tables together.

Registration_TABLE

Field Type Attributes NULL
Registration_ID integer auto_increment (Primary Key) No
Student_ID varchar(8) Foreign Key No
Unit_ID smallint(3) Foreign Key No
Grade_ID tinyint(2) Foreign Key No

Primary Keys, as with indexes can be defined for single or multiple columns.

Questions:

  • What business rule can be deduced from the definition of the Primary Key on the Registration Table?
  • What else might need to be considered for this table?
  • What business rule could be deduced if we had no Registration_ID field, and the Student_ID and Unit_ID fields formed the primary key?

SQL Functions

The SQL language also defines a set of functions that can be used when querying the data. The types of functions available to MySQL include:

  • Text Functions
  • Concatenation and Aliases
  • Numeric Functions
  • Date and Time Functions (formatting)
  • Encryption
  • Grouping

The use of functions is generally used with SELECT statements as shown in the examples below.

SELECT   Student_ID, Last_Name, MIN(age) 
FROM     Student_Table
GROUP BY Student_ID
SELECT   COUNT(Student_ID)
FROM     Student_Table
SELECT   age, COUNT(age)
FROM     Student_Table
GROUP BY age

Questions:

  • Do any of the queries not make sense?
  • What sort of data do the above queries return?

Table Joins

When we extract data from a database we will most likely be extracting fields from more than one table. In this situation we will have to JOIN the tables. Tables are usually linked with a Foreign Key on one table with a Primary Key on another.

If we wanted to SELECT the Last_Name, First_name (from the Student table) and Unit_Code (from the Units table) we would need to write an SQL statement that looks like this:

SELECT CONCAT_WS(' ', ST.first_name, ST.last_name) as full_name,
       UT.unit_code
FROM   Student_Table ST, 
       Unit_Table UT, 
       Registration_Table RT
WHERE  ST.Student_ID = RT.Student_ID
AND    RT.Unit_ID = UT.Unit_ID

In this query we are matching the Student_ID from the Student_Table to the Student_ID on the Registration_Table AND matching the Unit_ID from the Registration_Table to the Unit_ID on the Unit_Table.

Questions:

  • What would happen if we didn’t include AND RT.UNIT_ID = …
  • What is a column Alias? What is a table alias, and why would one be necessary?

DML Statements

Not only will we be retrieving information from a database, but we will also be INSERT(ing), UPDATE(ing) or DELETE(ing) records from a database. In general, the same sort of queries that were written for single table SELECT statements can be modified for INSERT, UPDATE or DELETE.

Here are some DML examples.

UPDATE student_table
SET    age = 12
WHERE  student_first_name = 'Brent';
DELETE FROM registration_table
WHERE       unit_id = 2;
INSERT INTO grade_table
VALUES (0, 'VHD', 'Very High Distinction');

Questions:

  • What do each of the DML statements do?
  • How would we write an INSERT statement that didn’t need to insert data into every field?
  • Why is the WHERE clause so important in the UPDATE and DELETE statements?
  • What preventative measure could you perform before executing an UPDATE or DELETE statement?

PHP and MySQL

When we combine variables from POST or GET with our queries, then we can start to build some really useful web pages. Have a look at this modified version of the previous SQL statements.

if(isset($_POST['submit']))
  {
  if(!empty($_POST['last_name']))
    $last_name = $_POST['last_name'];
  else
    $last_name = FALSE;

  if(!empty($_POST['age']) && is_numeric($_POST['age']))
    $age = $_POST['age'];
  else
    $age = FALSE;

  if($last_name && $age)
    {
    $query = "UPDATE student_table
              SET    age = $age
              WHERE  student_last_name = '$last_name'";

    $result = mysqli_query($dbc, $query);
    
    if($result)
      echo '<p>Information Updated Successfully</p>';
    else
      echo '<p>ERROR: Could not update</p>';
    }  
  else  //-- if($last_name && $age)
    echo '<p>The last name and age field must be ...</p>';
  
  } //-- if(isset($_POST['submit']))

Questions:

  • What assumptions have been made with regards to the POST variables?
  • Why is $last_name in quotes, but not $age in the query?