Week 5

Updated on 28 Dec 2018

Databases

Every business has data which requires some organized method, or mechanism for maintaining the data. This mechanism is referred to as a database management system (DBMS).

In the early years of computing, databases were a flat file system stored on a main-frame. In the early 1980’s, a new model of storing data was developed. This new method was known as a relational database (RDBMS), and today after more than 25 years the model is still in use today.

Along with the relational database, came a new language to access and manipulate the data. The language was approved for the American National Standards Institute (ANSI) in 1986, and later for ISO in 1987. The standard was based on an implementation from IBM.

Questions:

  • What is a relational database?
  • What is the name of the language referred to in this slide?
  • What are the advantages and disadvantages of having a standard?

In a relational database, the database is divided into logical units called tables where tables are generally related to each other. The concept of having a table is to allow data to be broken down into logical, smaller, more manageable units.

For example, the university stores your address details on the addresses table and your registration information (the units you’re enrolled in) on the registration table. The two tables would be linked together by a unique identifier.

The language used with relational databases, SQL, has 6 major categories into which the commands can be grouped.

  • DDL, Data Definition Language
  • DML, Data Manipulation Language
  • DQL, Data Query Language
  • DCL, Data Control Language
  • Data administration commands
  • Transactional control commands

In this unit we will focus on the SQL commands associated with querying the database and inserting / updating records in the database.

SQL

DML: Data Manipulation Language

  • INSERT
  • UPDATE
  • DELETE

DQL: Data Query Language

  • SELECT

Although there are entire books written for SQL, we’ll be concentrating on the 4 commands mentioned here for this course

Questions:

  • The SQL statements are fairly self explanatory. What do you think each of the listed commands do?
  • SQL commands are similar to functions. What would be a fundamental difference between DML statements and the SELECT statement?

Transactions

A transaction is the propagation of one or more changes to the database via DML commands. All transactions have a beginning and an end, and can be saved or undone. When a transaction is executed and completes successfully, the target table is not immediately changed, although it may appear so according to the output.

When a transaction successfully completes, there are transactional control commands that are used to finalize the transaction. Either saving the changes made by the transaction to the database or reversing the changes made to the database.

Transactions are controlled via the transactional control commands (part of the SQL language), with the most popular commands: COMMIT and ROLLBACK.

We will not be using transactions during the lab exercises, however it is important you understand the theory behind transactions.

Questions:

  • What do you think COMMIT and ROLLBACK do?
  • What would be an advantage of using transactions?

Field Types

Most databases have specific data types that can be associated with various columns in a table. For example, the column where you are storing the age of a person would probably only accept numeric values. The column where you are storing the name of a person would most definitely be able to store character values. In most circumstances you would specify the width of the column, i.e. the total number of characters permissible for any one value.

The following are some of the field types available in MySQL.

  • CHAR Fixed length from 0 to 255 characters
  • VARCHAR Variable length from 0 to 255 characters
  • TEXT Maximum length 65,535 characters
  • MEDIUMTEXT Maximum length 16,777,215 characters
  • DATETIME 8 bytes
  • TINYINT -128 to 127 or 0 to 255 (1 byte)
  • SMALLINT -32,768 to 32767 or 0 to 65,535 (2 bytes)
  • MEDIUMINT -8,388,608 to 8,388,607 or 0 to 16,777,215 (3 bytes)
  • INT -2,147,483,648 to 2,147,483,647 or 0 to 4,294,967,295 (4 bytes)

There is one other type of number field that can be defined for a column. DOUBLE or FLOAT. These types allow for a number with a decimal. There are also a couple of variants of the DATETIME field type as well, such as DATE (stores only the date), TIME (stores the time only) and TIMESTAMP which is similar to the DATETIME except for the Format.

Earlier I mentioned that a Relational Database stored its data in a collection of tables. The best way to visualize a table is to imagine the data displayed in a Spreadsheet as shown below.

Student_id Student_Name Unit Final_Grade Age Grade_Description
S0001234 James Jones COMP306 HD 12 High Distinction
S0001235 Sarah Smith COMP306 PA 10 Pass
S0002345 Bobby Brown COMP306 PA 10 Pass
S0002234 Tracy Taylor COMP306 CR 11 Credit
S0001244 Luke Lang COMP306 WW 12 Fail

Questions:

  • Assuming the data came from a single table, how would you define each of the columns for the data extract above?
  • Why do the field names use underscores and not spaces?
  • Do you know what an index is? What columns would you put an index on?
  • Does this table make sense?

SQL Statements

Statement:

SELECT   Student_id, Student_Name
FROM     Student_table
ORDER BY Student_Name

Output:

Student_id Student_Name
S0002345 Bobby Brown
S0001234 James Jones
S0001244 Luke Lang
S0001235 Sarah Smith
S0002234 Tracy Taylor

Statement:

SELECT   Student_Name, Final_Grade
FROM     Student_table
WHERE    Age > 10

Output:

Student_id Final_Grade
James Jones HD
Tracy Taylor CR
Luke Lang WW

Statement:

SELECT    Student_id, Student_Name, Final_Grade, Age
FROM      Student_table
WHERE     ((Age = 10 AND Final_Grade IN ('HD', 'CR', 'PA'))
OR        (Age > 10 AND Final_Grade IN ('HD', 'DI')))
ORDER BY  Age, Student_Name

Output:

Student_id Student_Name Final_Grade Age
S0002345 Bobby Brown PA 10
S0001235 Sarah Smith PA 10
S0001234 James Jones HD 12

Writing DQL statements is a fairly straight forward exercise. First SELECT the fields you wish to select. Next select the table(s) where you wish to get the data FROM. Next write your WHERE clause with optional AND and OR statements for each additional clause. Finally use the ORDER BY command to determine how you want the data sorted. In essence:

SELECT   column names
FROM     table(s)
WHERE    conditions
ORDER BY column names

At a minimum you need the SELECT and FROM commands. You can use the short-hand asterisk * in the SELECT statement if you want to select ALL columns.

You can only use the column names from the SELECT in the ORDER BY. I.E. you can’t ORDER BY a column you haven’t SELECTed, however you can use an unselected column in the WHERE clause.

Questions:

  • What is the difference between OR, AND. Is there a difference between AND, OR in PHP and SQL?

Table Design

An important aspect of creating dynamic, database driven websites is the ability to create proper database tables. Important aspects of table design include primary keys, indexes, constraints and appropriate field types. Some advanced RDBMS systems also provide triggers on tables and / or rows for DML statements.

This unit will concentrate mainly on appropriate field types, primary keys and indexes.

Student_id Student_Name Unit Final_Grade Age Grade_Description
S0001234 James Jones COMP306 HD 12 High Distinction
S0001235 Sarah Smith COMP306 PA 10 Pass
S0002345 Bobby Brown COMP306 PA 10 Pass
S0002234 Tracy Taylor COMP306 CR 11 Credit
S0001244 Luke Lang COMP306 WW 12 Fail

The data shown above might have been extracted from a MySQL database. If that were the case, then the table structure might look something like this:

Field Type Attributes NULL
Student_ID varchar(8) No
Student_Name varchar(30) No
Unit varchar(7) No
Final_Grade varchar(2)
Age tinyint(2) unsigned
Grade_Description varchar(30)

Questions:

  • Is this a good table design? If not, where could we make improvements?

PHP and MySQL

Incorporating SQL statements within PHP is very easy. In fact MySQL support is built into the PHP library via an offical extension, and once a Database connection has been established the following code is all you need to extract information from a database and display it on the web.

$query = "SELECT   Student_id, Student_Name
          FROM     Student_table
          ORDER BY Student_Name";

$result = mysqli_query($dbc, $query);

while($row = mysqli_fetch_array($result, MYSQLI_ASSOC))
  {
  echo '<p>' . $row['Student_id'] . '&nbsp;&nbsp;' . 
               $row['Student_Name'] . '</p>';
  }

The mysqli_query function executes the query that is passed to it, and returns the result set. mysqli_fetch_array returns one record at a time from the result set.

Questions:

  • What does the HTML code   do?
  • What would happen if mysqli_query couldn’t execute the SQL statement?

PHP and MySQL – with OOP

The previous example demonstrated the mysqli extension using procedural style code. The next code snippet shows the same example using the OOP version of the library.

$query = "SELECT   Student_id, Student_Name
          FROM     Student_table
          ORDER BY Student_Name";

$result = $dbc->query($query);

while($row = $result->fetch_array(MYSQLI_ASSOC))
  {
  echo '<p>' . $row['Student_id'] . '&nbsp;&nbsp;' . 
               $row['Student_Name'] . '</p>';
  }

The code looks similar, but instead of passing objects to functions, we’re invoking the associated method of that object. For instance, instead of passing the database connection object to the query function as shown below (first example).

$result = mysqli_query($dbc, $query);

We simply invoke that method as part of the object.

$result = $dbc->query($query);

The same thing occurs with fetching the array (…fetch_array…), and all the other mysqli functions.