Chapter 16
Chapter 16
Database
Normalization
Now that you understand the Data Definition Language
(DDL) portion of SQL, it's time to apply that new knowledge to a lesson on database
theory. Today you learn about the concept of data normalization. You develop a working
definition of data normalization and learn about the advantages of normalizing your
databases. You also explore each of the five rules of data normalization, including
reasons for applying these rules. When you have completed today's lesson, you will be able
to identify ways to use data normalization to improve database integrity and performance.
Throughout today's lesson, you normalize a real
database using the data definition SQL statements you learned about on Day 13
("Creating Databases with SQL") and Day 15 ("Updating Databases with
SQL"), and by using Visual Basic's Visdata application that you learned about in the
first week (see Day 7, "Using the Visdata Program").
The topic of data normalization could easily take up
an entire guide--and there are several excellent guides on it. This lesson approaches data
normalization from a practical standpoint rather than a theoretical standpoint.Here you
focus on two particular questions: What are the rules? How can these rules help me improve
my Visual Basic database applications? To start, let's develop a working definition of
data normalization and talk about why it can improve your Visual Basic applications.
What Is Data
Normalization?
Data normalization is a process of refining database
structures to improve the speed at which data can be accessed and to increase database
integrity. This is not easy. Very often, optimizing a table for speed is not the same as
optimizing for integrity. Putting a database together involves discovering the data
elements you need and then creating a set of tables to hold those elements. The tables and
fields you define make up the structure of the database. The structure you decide upon
affects the performance of your database programs. Some database layouts can improve
access speed. For example, placing all related information in a single table allows your
programs to locate all needed data by looking in one place. On the other hand, you can lay
out your database in a way that improves data integrity. For example, placing all the
invoice line item data in one table and the invoice address information in another table
prevents users from deleting complete addresses when they remove invoice line items from
the database. Well-normalized databases strike a balance between speed and integrity.
High-speed tables have few index constraints and can
have several, sometimes repetitive, fields in a single record. The few constraints make
updates, insertions, and deletions faster. The repetitive fields make it easier to load
large amounts of data in a single SQL statement instead of finding additional, related
data in subsidiary tables linked through those slower index constraints.
Databases built for maximum integrity have many
small data tables. Each of these tables can have several indexes--mostly foreign keys
referencing other tables in the database. If a table is built with high integrity in mind,
it is difficult to add invalid data to the database without firing off database error
messages. Of course, all that integrity checking eats precious ticks off the microchip
clock.
Good data normalization results in data tables that
make sense in a fundamental way. Well-normalized tables are easy to understand when you
look at them. It is easy to see what kind of data they store and what types of updates
need to be performed. Usually, it is rather easy to create data entry routines and simple
reports directly from well-normalized tables. In fact, the rule of thumb is this: If it's
hard to work with a data table, it probably needs more normalization work.
For the rest of this lesson, you use the Visdata
application to build data tables. If you have not already looked at the lesson on Day 7,
turn there first for information on how to use Visdata to maintain relational databases.
A Typical
Database Before Normalization
To illustrate the process of normalization, let's
start with an existing database table. The database NORMDAT1.MDB can be found in
the TYSDBVB5\SOURCE\DATA directory of the CD that shipped with this guide. Load
this into the Visdata application and open the Table1 data table in design mode. Your
screen should look something like the one in Figure 16.1.
Figure 16.1. Displaying
Table1 before normalization.
This data table holds information about employees of a small company. The table contains
fields for the employee ID and employee name, and the ID, name, and location of the
department to which this employee is currently assigned. It also includes fields for
tracking the employee's job skills, including the skill code, the name, the department in
which the skill was learned, and the ability level that the employee has attained for the
designated skill. Up to three different skills can be maintained for each employee.
This table is rather typical of those you find in
existing record-oriented databases. It is designed to quickly give users all the available
information on a single employee. It is also a fairly simple task to build a data entry
form for this data table. The single form can hold the employee and department fields at
the top of the form and the three skill field sets toward the bottom of the form. Figure
16.2 shows a simple data form for this table generated by Visdata.
Access to the information in the table is fast and
the creation of a data entry screen is easy. So this is a well-normalized table, right?
Wrong. Three of the five rules of normalization that you learn in the rest of this lesson
are broken, and the other two are in jeopardy! Some of the problems are obvious, some are
not. Let's go through each of the five rules of normalization and see how applying these
rules can improve the data table.
Figure 16.2. The data
entry form for Table1.
Rule 1: Eliminate Repeating Groups
The first area in which Table1 needs some work is in
the repeating skill fields. Why include columns in the data table called SkillCode1,
SkillCode2, SkillCode3, or SkillName1, SkillName2, SkillName3, and so forth? You want to
be able to store more than one set of skills for an employee, right? But what if you want
to store data on more than three skills acquired by a single employee? What if most of the
employees only have one or two skills, and very few have three skills? Why waste the blank
space for the third skill? Even more vexing, how easy will it be to locate all employees
in the data table that have a particular skill?
NOTE: The first rule of data normalization
states that you should make a separate table for each set of related columns and give each
table a primary key. Databases that adhere to this first rule of normalization are said to
be in the First Normal Form.
The first rule of data normalization is to eliminate
repeating groups of data in a data table. Repeating groups of data, such as the skill
fields (SkillCodeX, SkillNameX, SkillDeptIDX, and SkillLevelX), usually indicates the need
for an additional table. Creating the related table greatly improves readability and
allows you to keep as few or as many skill sets for each employee as you need without
wasting storage space.
The fields that relate to employee skills need to be
separated from the others in the table. You don't need to put all 12 skill fields in the
new table, though. You only need one of each of the unique data fields. The new database
now has not one, but two data tables. One, called Skills, contains only the skill fields.
The other table, called Employees, contains the rest of the fields. Table 16.1 shows how
the two new tables look.
Table 16.1. Eliminating
repeating data.
Skills Table |
Employees Table |
EmpID |
EmpID |
SkillCode |
EmpName |
SkillName |
DeptID |
SkillDeptID |
DeptName |
SkillLevel |
DeptLocation |
Notice that the first field in both tables is the EmpID field. This field is used to
relate the two tables. Each record in the Skill table contains the employee ID and all
pertinent data on a single job skill (code, name, department learned, and ability level).
If a single employee has several skills, there is a single record in the Skill table for
each job skill acquired by an employee. For example, if a single employee has acquired
five skills, there are five records with the same employee ID in the Skills table.
Each record in the Skills table must contain a valid
value in the EmpID field or it should be rejected. In other words, each time a record is
added to the Skills table, the value in the EmpID field should be checked against values
in the EmpID field of the Employees table. If no match is found, the Skills record must be
corrected before it is written to the database. You remember from the discussion of SQL
Data Definition Language statements on Day 13 that this is a FOREIGN KEY
CONSTRAINT. The field EmpID in the Skills table is a foreign key that references the field
EmpID in the Employees table. Also, the EmpID field in the Employees table should be a
primary field to make sure that each record in the Employee table has a unique EmpID
value.
Now that you know the fields and index constraints
you need, you can use SQL DDL to create two new tables. If you have not already done so,
start the Visdata application and open the NORMDAT1.MDB database. Now you create
two new tables that bring the database into compliance with the first rule of data
normalization.
First, create the table that holds
all the basic employee data. This table has all the fields that were in the Table1 table,
minus the skill fields. Using the information in Table 16.1 as a guide, enter an SQL DDL
statement in the SQL window of Visdata that creates the Employees data table. Your SQL
statement should resemble Listing 16.1.
Listing 16.1.
Creating the Employees table.
CREATE TABLE Employees
(EmpID TEXT(5),
EmpName TEXT(30),
DeptID TEXT(5),
DeptName TEXT(20),
DeptLocation TEXT(20),
CONSTRAINT PKEmpID PRIMARY KEY (EmpID));
The EmpID field has been designated as a primary key field. This guarantees that no two
records in the Employees data table can have the same EmpID value. You can use the EmpID
field in the next table you create (the Skills table) as the reference field that links
the two tables. Because you are using the EmpID field as a link, it must be a unique value
in the Employees table in order to maintain database integrity. What you are doing here is
setting up a one-to-many relationship between the Employees table (the one-side) and the
Skills table (the many-side). Any time you establish a one-to-many relationship, you must
make sure that the reference field (in this case, the EmpID field) is unique on the
one-side of the relationship.
Now that you have built the Employees
table, you can create the table that holds all the skills data. Use the information in
Table 16.1 to write an SQL DDL statement that creates a table called Skills. Make sure the
new table has the field EmpID and that the EmpID field is built with the correct index
constraint to enforce one-to-many database integrity. Your SQL statement should look like
the one in Listing 16.2.
Listing 16.2.
Creating the Skills table.
CREATE TABLE Skills
(EmpID TEXT(5),
SkillCode TEXT(5),
SkillName TEXT(20),
SkillDeptID TEXT(5),
SkillLevel INTEGER,
CONSTRAINT PKSkills PRIMARY KEY (SkillCode,EmpID), |
CONSTRAINT FKEmpID FOREIGN KEY (EmpID)
REFERENCES Employees(EmpID)); You can see in Listing 16.2 that you have used the FOREIGN
KEY_REFERENCES syntax to establish and maintain the table relationship. As you
remember from the SQL lessons on Day 13 and Day 15, the FOREIGN KEY_REFERENCES
syntax makes sure that any entry in the Skills.EmpID field can be found in the related
Employees.EmpID field. If users enter a value in the Skills.EmpID field that cannot be
found in any Employees.EmpID field, Visual Basic automatically issues a database error
message. This message is generated by Visual Basic, not by your program.
That is how you build tables that adhere to the
first rule of data normalization. To see how these tables look when they have live data in
them, use Visdata to load the TYSDBVB5\ SOURCE\DATA\NORMDAT2.MDB database. This
database contains the Employees and Skills tables with data already loaded into them.
Figure 16.3 shows how Visdata displays the two new tables that have live data.
Figure 16.3. The new
Employees and Skills tables from NORMDAT2.MDB.
NOTE: Before continuing with today's lesson,
load the NORMDAT2.MDB database into Visdata.
Rule 2:
Eliminate Redundant Data
Another aspect of the Skills table also needs
attention. Although moving the repeating skills fields into a separate table improves the
database, you still have work to do. The Skills table contains redundant data. That is,
data is stored in several places in the database. Redundant data in your database can lead
to serious database integrity problems. It's best to eliminate as many occurrences of
redundant data as possible.
NOTE: The second rule of data normalization
states that if a column depends only on part of a multivalued key, you remove it to a
separate table. In other words, if you need to fill in two fields in order to truly
identify the record (JobID and JobName), but only one of those fields is needed to perform
a lookup in the table, you need a new table. Databases that conform to this rule are said
to be in the Second Normal Form.
For example, the Skills table includes a field
called SkillCode. This field contains a code that identifies the specific skill (or
skills) each employee has acquired. If two employees have gained the same skill, that
skill appears twice in the Skills file. The same table also includes a field called
SkillName. This field contains a meaningful name for the skill represented by the value in
the SkillCode field. This name is much more readable and informative than the SkillCode
value. In essence, these two fields contain the same data, represented slightly
differently. This is the dreaded redundant data you have to eliminate!
Before you jump into fixing things,
first review the details regarding redundant data and how it can adversely affect the
integrity of your database.
Update
Integrity Problems
When you keep copies of data elements in several
rows in the same table or in several different tables (such as job names to go with job ID
codes), you have a lot of work ahead of you when you want to modify the copied data. If
you fail to update one or more of these copies, you can ruin the integrity of your
database. Redundant data can lead to what are known as update integrity problems.
Imagine that you have built a huge
database of employee skills using the tables you built in the preceding section. All is
going great when, suddenly, the Human Resources Department informs you that it has
designed a new set of names for the existing skill codes. You now have to go through the
entire database and update all the records in the Skills table, searching out the old
skill name and updating the SkillName field with the new skill name. Because this is an
update for the entire data table, you have to shut down the database until the job is
complete in order to make sure no one is editing records while you're performing this
update. Also, you probably have to change some Visual Basic code that you built to verify
the data entry. All in all, it's a nasty job. If that isn't enough, how about a little
power outage in the middle of your update run? Now you have some records with the old
names, and some with the new names. Things are really messed up!
Delete
Integrity Problems
Although the update integrity problem
is annoying, you can suffer through most of those problems. In fact, almost all database
programmers have had to face similar problems before. The more troublesome integrity
problem resulting from redundant data comes not during updates, but during deletes. Let's
assume you have properly handled the mass update required by the Human Resources
Department. Then you discover that there is only one employee in the entire database that
has the SkillCode S099 (Advanced Customer Service course). No other employee has attained
this high level of training. Now, that employee is leaving the organization. When you
delete the employee record from the file, you delete the only reference to the Advanced
Customer Service course! There is no longer any record of the Advanced Customer Service
course in your entire database, which is a real problem.
The
Normalization Solution
The way to reduce these kinds of data integrity
problems is to pull out the redundant data and place it in a separate table. You need a
single table, called SkillMaster, that contains only the SkillCode and the SkillName data
fields. This table is linked to the Skills table through the SkillCode field. Now, when
the Human Resources department changes the skill names, you only need to update a single
record--the one in the SkillMaster table. Because the Skills table is linked to the
SkillMaster table, when you delete the employee with the certification for SkillCode S099,
you don't delete the last reference to the skill. It's still in the SkillMaster table.
TIP: Another plus to this type of table
separation is in speeding data entry. With only one field to enter, and especially a brief
code, data entry operators can more quickly fill in fields on the table's form.
Also, you now have a single table that lists all the
unique skills that can be acquired by your employees. You can produce a Skills list for
employees and managers to review. If you add fields that group the skills by department,
you can even produce a report that shows all the skills by department. This would be very
difficult if you were stuck with the file structure you developed in the preceding
section.
So let's redefine the Skills table and the
SkillMaster table to conform to the second rule of data normalization. Table 16.2 shows
the fields you need for the two tables.
Table 16.2. The field
list for the Skills and SkillMaster tables.
EmpSkills |
SkillMaster |
EmpID |
SkillCode |
SkillCode |
SkillName |
SkillDeptID |
|
SkillLevel |
|
You can see that you have renamed the Skills table to EmpSkills to better reflect its
contents. You have also moved the SkillName field out of the EmpSkills table and created
SkillMaster, a small table that contains a list of all the valid skills and their
descriptive names. Now you have the added bonus of being able to add a FOREIGN KEY
constraint to the EmpSkills table. This improves database integrity without adding any
additional programming code!
Listing 16.3 shows the two SQL DDL
statements that create the EmpSkills and the SkillMaster data tables. Note the use of FOREIGN
KEY constraints in the EmpSkills table.
Listing
16.3. Creating the SkillMaster and EmpSkills tables.
CREATE TABLE SkillMaster
(SkillCode TEXT(5),
SkillName TEXT(20),
CONSTRAINT PKSkillMaster PRIMARY KEY (SkillCode))
CREATE TABLE EmpSkills
(EmpID TEXT(5),
SkillCode TEXT(5),
SkillDeptID TEXT(5),
SkillLevel INTEGER,
CONSTRAINT PKSkills PRIMARY KEY (SkillCode,EmpID),
CONSTRAINT FKEmpID2 FOREIGN KEY (EmpID)
REFERENCES Employees(EmpID),
CONSTRAINT FKSkillCode FOREIGN KEY (SkillCode)
REFERENCES SkillMaster(SkillCode)); |
Use Visdata to add these two new tables to the NORMDAT2.MDB database. The
database TYSDBVB5\SOURCE\DATA\NORMDAT3.MDB contains a complete database with the
data tables Employees, EmpSkills, and SkillMaster fully populated with data. This is
demonstrated in Figure 16.4.
You now have a database that conforms to the first
two rules of data normalization. You have eliminated repeating data and redundant data.
You have one more type of data to eliminate from your tables. You handle that in the
following section.
Figure 16.4. The new
Employees, EmpSkills, and SkillMaster tables.
NOTE: Before continuing with the lesson, load
the NORMDAT3.MDB database into Visdata.
Rule 3:
Eliminate Columns Not Dependent on the Primary Key
By now, you're probably getting the idea. You are
looking for hints in the table structure that lead you into traps further down the road.
Will this table be easy to update? What happens if you delete records from this table? Is
it easy to get a comprehensive list of all the unique records in this table? Asking
questions like these can uncover problems that are not so apparent when you first build a
table.
When you are building a data table, you should also
be concerned about whether a field describes additional information about the key field.
In other words, is the field you are about to add to this table truly related to the key
field? If not, the field in question should not be added to the table. It probably needs
to be in its own table. This process of removing fields that do not describe the key field
is how you make your data tables conform to the third rule of data
normalization--eliminate columns not dependent on keys.
NOTE: The third rule of data normalization
states that if a column does not fully describe the index key, that column should be moved
to a separate table. In other words, if the columns in your table don't really need to be
in this table, they probably need to be somewhere else. Databases that follow this rule
are known to be in the Third Normal Form.
In these database examples, you have data describing
the various departments in the company stored in the Employees table. Although the DeptID
field is important to the Employees description (it describes the department to which the
employee belongs), the department-specific data should not be stored with the employee
data. Yes, you need another table. This table should contain only department-specific data
and be linked to the Employees table through the DeptID field. Table 16.3 lists the
modified Employees table and the new Departments table.
Table 16.3. The modified
Employees table and the new Departments table.
Employees |
Departments |
EmpID |
DeptID |
EmpName |
DeptName |
DeptID |
DeptLocation |
Notice that the Employees table is much simpler now that you have eliminated all unrelated
fields. Use Visdata to construct SQL DDL statements that create the new Departments table
and then modify the Employees table and the EmpSkills table to increase database integrity
(yes, more foreign keys!). First, use the SQL DDL in Listing 16.4 to create the
Departments table. Check your work against Figure 16.5.
Listing
16.4. Creating the Departments table.
CREATE TABLE Departments
(DeptID TEXT(5),
DeptName TEXT(20),
DeptLocation TEXT(20),
CONSTRAINT PKDeptID PRIMARY KEY (DeptID)) |
Now alter the Employees table. You need to do two things:
- Remove the DeptName column from the table.
- Add a FOREIGN KEY constraint to enforce
referential integrity on the Employees.DeptID field.
Listing 16.5 contains the SQL DDL
statements to create the modified Employees table.
Listing
16.5. Creating the new Employees table.
CREATE TABLE Employees
(EmpID TEXT(5),
EmpName TEXT(30),
DeptID TEXT(5),
CONSTRAINT PKEmpID PRIMARY KEY (EmpID),
CONSTRAINT FKEmpDept FOREIGN KEY (DeptID)
REFERENCES Departments(DeptID)) |
Now you need to modify the EmpSkills table to add the referential integrity check on the
EmpSkills.SkillDeptID field. The new SQL DDL should look like Listing 16.6.
Listing
16.6. Creating the new EmpSkills table.
CREATE TABLE EmpSkills2
(EmpID TEXT(5),
SkillCode TEXT(5),
SkillDeptID TEXT(5),
SkillLevel INTEGER,
CONSTRAINT PKEmpSkill2 PRIMARY KEY (SkillCode,EmpID),
CONSTRAINT FKSkillMast FOREIGNKEY (SkillCode)
REFERENCES SkillMaster(SkillCode),
CONSTRAINT FKSkillDept FOREIGN KEY (SkillDeptID)
REFERENCES Departments(DeptID)); |
The database NORMDAT4.MDB contains a complete set of tables that conform to the
third rule of data normalization. Use Visdata to load NORMDAT4.MDB and review the
data tables. Attempt to add some data that does not follow the integrity rules. Try
deleting records. This shows you how Visual Basic issues database error messages when you
try to save a record that breaks the referential integrity rules.
The first three rules of data
normalization involve the elimination of repeating, redundant, or unrelated data fields.
The last two rules involve isolating multiple relationships to improve overall database
integrity. The first three rules are usually all that you need to produce well-designed
databases. However, there are times when additional normalization can improve the quality
of your database design. In the next two sections, you learn rules 4 and 5 of data
normalization.
Figure 16.5. The
Departments table added to NORMDAT4.MDB.
Do Not Store Calculated Data in Your Tables
It is important to note here that one of the results
of the third rule of data normalization is that you should not store calculated fields in
a data table. Calculated fields are fields that contain derived data such as year-to-date
totals, a line in the invoice table that contains the totals of several other rows in the
invoice table, and so forth. Calculated fields do not describe the primary key. Calculated
fields are derived data. It is a bad practice to store derived data in live data tables.
Derived data can easily fall out of sync with the
individual rows that make up the total data. What happens if the individual rows that add
up to the total are altered or deleted? How do you make sure the row that holds the total
is updated each time any line item row is changed? Storing derived data might seem to be
faster, but it is not easier. And dealing with derived data opens your database to
possible update and delete integrity problems each time a user touches either the prime
data rows or the total data rows. Calculated data should not be stored. It should always
be computed using the prime data at the time it is needed.
NOTE: Before continuing with this lesson,
load the NORMDAT4.MDB database into Visdata.
Rule 4:
Isolate Independent Multiple Relationships
The fourth rule of data normalization concerns the
handling of independent multiple relationships. This rule is applied whenever you have
more than one one-to-many relationship on the same data table. The relationship between
the Employees table and the EmpSkills table is a one-to-many relationship. There can be
many EmpSkills records related to one Employee record. Let's add an additional attribute
of employees to create a database that has more than a single one-to-many relationship.
Assume that the Human Resources Department has
decided it needs more than just the skill names and skill levels attained for each
employee. Human Resources also wants to add the level of education attained by the
employee for that skill. For example, if the employee has an accounting skill and has an
associate's degree in guidekeeping, Human Resources wants to store the degree information,
too. If an employee has been certified as an electrician and works in the Maintenance
Department, the Human Resources group wants to know that.
The first thing you might want to do is add a new
column to the EmpSkills table--maybe a field called Degree, maybe even a field for
YearCompleted. This makes sense because each skill might have an associated education
component. It makes sense, but it is not a good idea. What about the employee who is
currently working in the Customer Service Department but has an accounting degree? Just
because the employee has a degree does not mean that employee has the skills to perform a
particular job or is working in a position directly related to his or her degree. The
degree and the job skills are independent of each other. Therefore, even though the skills
data and the degree data are related, they should be isolated in separate tables and
linked through a foreign key relationship.
NOTE: The fourth rule of data normalization
dictates that no table can contain two or more one-to-many or many-to-many relationships
that are not directly related. In other words, if the data element is important (the
college degree) but not directly related to other elements in the record (the customer
service rep with an accounting degree), you need to move the college degree element to a
new table. Databases that follow this rule are in the Fourth Normal Form.
Table 16.4 shows a sample Training table that can be
used to hold the education information for each employee. Now the Human Resources
department can keep track of education achievements independent of acquired job skills.
Note that the EmpID directly connects the two relationships. If the Training table has
only one entry per employee, the two relationships are a one-to-one relationship between
the Employees table and the Training table, and a one-to-many relationship between the
Employees table and the EmpSkills table. Of course, if any employee has more than one
degree, both relationships become one-to-many.
Table 16.4. The sample
training data table.
EmpID
Degree
YearCompleted InstitutionName |
Listing 16.7 is a sample SQL DDL statement that creates the Training data table with the
proper relationship constraint. Enter this statement in the SQL window of Visdata while
you have the NORMDAT4.MDB database open. Check your results against Figure 16.6.
Listing
16.7. Creating the Training table.
CREATE TABLE Training
(EmpID TEXT(5),
Degree TEXT(20),
YearCompleted INTEGER,
InstitutionName TEXT(30),
CONSTRAINT PKTraining PRIMARY KEY (EmpID,Degree),
CONSTRAINT FKEmpTrn FOREIGN KEY (EmpID)
REFERENCES Employees (EmpID)) |
The database NORMDAT5.MDB contains a complete version of the database normalized
up to the fourth rule of data normalization. Use Visdata to open the database and review
the table structure.
NOTE: Before continuing with the lesson, load
the NORMDAT5.MDB database into Visdata.
Figure 16.6. The Training
table shows the degree achievements for the Employees table.
Rule 5: Isolate Related Multiple Relationships
The last remaining rule of data normalization covers
the handling of related multiple relationships in a database. Unlike the fourth rule,
which deals with independent, one-to-many, multiple relationships, the fifth rule is used
to normalize related, many-to-many multiple relationships. Related, many-to-many multiple
relationships do not occur frequently in databases. However, when they do come up, these
types of data relations can cause a great deal of confusion and hassle when you're
normalizing your database. You won't invoke this rule often, but when you do it pays off!
Imagine that the Maintenance Department decides it
wants to keep track of all the large equipment used on the shop floor by various
departments. It uses this data to keep track of where the equipment is located. The
Maintenance Department also wants to keep a list of suppliers for the equipment in cases
of repair or replacement. When you were a novice, you might have decided to design a
single table that held the department ID, equipment name, and supplier name. But, as I'm
sure you have guessed by now, that is not the correct response. What if the Maintenance
Department has more than one supplier for the same type of equipment? What if a single
supplier provides more than one of the types of equipment used in the plant? What if some
departments are restricted in the suppliers they can use to repair or replace their
equipment?
NOTE: The fifth rule of data normalization
dictates that you should isolate related multiple relationships within a database. In
other words, if several complex relationships exist in your database, separate each of the
relationships into its own table. Databases that adhere to this rule are known to be in
the Fifth Normal Form.
The following list shows the relationships that have
been exposed in this example:
- Each department can have several pieces of equipment.
- Each piece of equipment can have more than one
supplier.
- Each supplier can provide a variety of pieces of
equipment.
- Each department can have a restricted list of
suppliers.
Although each of the preceding business rules are
simple, putting them all together in the database design is tough. It's the last item that
really complicates things. There is more than one way to solve this kind of puzzle. The
one suggested here is just one of the many possibilities.
First, you need to expose all the tables that you
need to contain the data. The preceding list describes two one-to-many relationships
(department to equipment, and department to supplier, with restrictions) and one
many-to-many relationship (equipment to supplier, supplier to equipment). Each of those
relationships can be expressed in simple tables. Two additional tables not mentioned, but
certainly needed, are a table of all the equipment in the building (regardless of its
location) and a table of all the suppliers (regardless of their department affiliation).
Table 16.5 shows sample field layouts for the required tables. The Equipment and Supplier
tables are shortened in this example. If you were designing these tables for a real
database project, you would add several other fields.
Table 16.5. The Fifth
Rule sample data tables.
Equipment |
Supplier |
EquipID |
SupplierID |
EquipName |
SupplierName |
DatePurchased |
SupplierAddress |
Listing 16.8 contains the SQL DDL statements to create these tables. Figure 16.7 shows the
results of executing these statements.
Listing
16.8. Creating the Equipment and the Supplier tables.
CREATE TABLE Equipment
(EquipID TEXT (10),
EquipName TEXT(30),
DatePurchased DATE,
CONSTRAINT PKEquipID PRIMARY KEY (EquipID))
CREATE TABLE Supplier
(SupplierID TEXT (10),
SupplierName TEXT(30),
SupplierAddress MEMO,
CONSTRAINT PKSupplier PRIMARY KEY (SupplierID)) |
Figure 16.7. Supplier and
Equipment tables in NORMDAT6.MDB.
The next two data tables describe the relationships between Supplier and Equipment and
between Supplier and Departments. You remember that departments can be restricted to
certain suppliers when repairing or replacing equipment. By setting up a table such as the
DeptSupplier table described next, you can easily maintain a list of valid suppliers for
each department. Similarly, as new suppliers are discovered for equipment, they can be
added to the EquipSupplier table. Refer to Table 16.6 for a sample list of fields.
Table 16.6.
EquipSupplier and DeptSupplier tables.
EquipSupplier |
DeptSupplier |
EquipID |
DeptID |
SupplierID |
SupplierID |
These two tables are short because they are only needed to enforce expressed simple
relationships between existing data tables. Creating small tables such as these is a handy
way to reduce complex relationships to more straightforward ones. It is easier to create
meaningful CONSTRAINT clauses when the tables are kept simple, too. The SQL DDL
statements for these two tables appear in Listing 16.9. The result of executing these
statements in Visdata appears in Figure 16.8.
Figure 16.8. EquipSupplier
and DeptSupplier tables.
Listing 16.9. Creating the EquipSupplier and DeptSupplier
tables.
CREATE TABLE EquipSupplier
(EquipID TEXT(10),
SupplierID TEXT(10),
CONSTRAINT PKEqSpl PRIMARY KEY (EquipID,SupplierID),
CONSTRAINT FKEqSplEquip FOREIGN KEY (EquipID)
REFERENCES Equipment(EquipID),
CONSTRAINT FKEqSplSupplier FOREIGN KEY (SupplierID)
REFERENCES Supplier(SupplierID))
CREATE TABLE DeptSupplier
(DeptID TEXT(5),
SupplierID TEXT(10),
CONSTRAINT PKDeptSpl PRIMARY KEY (DeptID,SupplierID),
CONSTRAINT FKDptSplDept FOREIGN KEY (DeptID)
REFERENCES Departments(DeptID),
CONSTRAINT FKDptSplSupplier FOREIGN KEY (SupplierID)
REFERENCES Supplier(SupplierID)) |
Notice that, in these two tables, the CONSTRAINT definitions are longer than the
field de-finitions. This is common when you begin to use the power database integrity
aspects of SQL databases.
Finally, you need a single table that expresses the
Equipment-Supplier-Department relationship. This table shows which department has which
equipment supplied by which supplier. More importantly, you can build this final table
with tight constraints that enforce all these business rules. Both the Department-Supplier
relationship and the Equipment-Supplier relationship are validated before the record is
saved to the database. This is a powerful data validation tool--all without writing any
Visual Basic code! Table 16.7 and the SQL DDL statement in Listing 16.10 show how this
table can be constructed. See Figure 16.9 to review the results of executing these
statements.
Table 16.7. The
Department-Equipment-Supplier data table.
DeptID
EquipID SupplierID |
Listing 16.10. Creating the DeptEqpSuplr table.
CREATE TABLE DeptEqpSuplr
(DeptID TEXT(5),
EquipID TEXT(10),
SupplierID TEXT(10),
CONSTRAINT PFDeptEq PRIMARY KEY (DeptID, EquipID),
CONSTRAINT FKEqSupl FOREIGN KEY (EquipID,SupplierID)
REFERENCES EquipSupplier(EquipID,SupplierID),
CONSTRAINT FKDeptSupl FOREIGN KEY (DeptID,SupplierID)
REFERENCES DeptSupplier(DeptID,SupplierID)) |
The Microsoft Access database NORMDAT6.MDB contains a set of live data for the
tables described in this section. Use Visdata to open the database and review the table
structure. Try adding or deleting records in ways that would break integrity rules. Notice
that none of the last three tables defined (EquipSupplier, DeptSupplier, and DeptEqpSuplr)
allow edits on any existing record. This is because you defined the primary key as having
all the fields in a record. Because you cannot edit a primary key value, you must first
delete the record, and then add the modified version to the data table.
Figure 16.9. The
EquipSupplier, DeptSupplier, and DeptEqpSuplr tables.
Summary
In today's lesson, you learned how to improve
database integrity and access speed using the five rules of data normalization. You
learned the following five rules:
- Rule 1: Eliminate Repeating Groups. If you have a set
of fields that have the same name followed by a number (Skill1, Skill2, Skill3, and so
forth), remove these repeating groups, create a new table for the repeating data, and
relate it to the key field in the first table.
- Rule 2: Eliminate Redundant Data. Don't store the
same data in two different locations. This can lead to update and delete errors. If
equivalent data elements are entered in two fields, remove the second data element, create
a new master table with the element and its partner as a key field, and then place the key
field as a relationship in the locations that formerly held both data elements.
- Rule 3: Eliminate Columns Not Dependent on Keys. If
you have data elements that are not directly related to the primary key of the table,
these elements should be removed to their own data table. Only store data elements that
are directly related to the primary key of the table. This particularly includes derived
data or other calculations.
- Rule 4: Isolate Independent Multiple Relationships.
Use this rule to improve database design when you are dealing with more than one
one-to-many relationship in the database. Before you add a new field to a table, ask
yourself whether this field is really dependent upon the other fields in the table. If
not, create a new table with the independent data.
- Rule 5: Isolate Related Multiple Relationships. Use
this rule to improve database design when you are dealing with more than one many-to-many
relationship in the database. If you have database rules that require multiple references
to the same field or sets of fields, isolate the fields into smaller tables and construct
one or more link tables that contain the required constraints that enforce database
integrity.
Quiz
- 1. Is it a good idea to optimize your database
strictly for speed?
2. What is meant by the term First Normal Form?
3. Explain how the second rule of data normalization differs from the first rule of
normalization.
4. Should you include fields in a data table that are the calculated results of
other fields in the same table?
5. When would you invoke the fourth rule of data normalization?
6. When would you invoke the fifth rule of data normalization?
Exercises
- 1. As a computer consultant, you have landed a
contract to build a customer tracking system for your local garage. After several days of
interviews with the owner, mechanics, and staff members, you have determined that the
following data fields should be included in your database. Many of the customers of this
garage have more than one automobile. Therefore, you are requested to leave room for
tracking two cars per customer.
Use these fields: CustomerID, CustomerName, Address, City, State, Zip, Phone,
SerialNumber, License, VehicleType1, Make1, Model1, Color1, Odometer1, VehicleType2,
Make2, Model2, Color2, Odometer2.
Optimize this data into tables using the rules of data normalization discussed in today's
lesson. Identify all primary and foreign keys.
2. Write the SQL statements that create the tables you designed in Exercise 1.
-