Database in VB Free Tutorial

Lean Data_VB
	First Program
	Creating Database
	Database Objects
	Data Entry Forms
	Input Validation
	Crystal Reports
	Visdata Program
	Selecting Data
	Micro-Jet Engine
	Visual Basic Code
	Graphs
	List Boxes, Grids
	SQL
	Error Handling
	Updating Database
	Normalization
	Multiuser
	Data Control
	ODBC
	Replication
	Security

Free Tutorials

HTML

Learn HTML

Learn CSS

Learn XML

Learn WML

Database

Learn Access

Learn Data-VB

Learn Oracle

Learn SQL

Programming

Learn C++

Learn JavaScript

Learn Vbscript

Learn VisualBasic

Chapter 16

Database Normalization

Chapter 16 Database Normalization

Now that you understand the Data Definition Language (DDL) portion of SQL, it's time to apply that new knowledge to a lesson on database theory. Today you learn about the concept of data normalization. You develop a working definition of data normalization and learn about the advantages of normalizing your databases. You also explore each of the five rules of data normalization, including reasons for applying these rules. When you have completed today's lesson, you will be able to identify ways to use data normalization to improve database integrity and performance.

Throughout today's lesson, you normalize a real database using the data definition SQL statements you learned about on Day 13 ("Creating Databases with SQL") and Day 15 ("Updating Databases with SQL"), and by using Visual Basic's Visdata application that you learned about in the first week (see Day 7, "Using the Visdata Program").

The topic of data normalization could easily take up an entire guide--and there are several excellent guides on it. This lesson approaches data normalization from a practical standpoint rather than a theoretical standpoint.Here you focus on two particular questions: What are the rules? How can these rules help me improve my Visual Basic database applications? To start, let's develop a working definition of data normalization and talk about why it can improve your Visual Basic applications.

What Is Data Normalization?

Data normalization is a process of refining database structures to improve the speed at which data can be accessed and to increase database integrity. This is not easy. Very often, optimizing a table for speed is not the same as optimizing for integrity. Putting a database together involves discovering the data elements you need and then creating a set of tables to hold those elements. The tables and fields you define make up the structure of the database. The structure you decide upon affects the performance of your database programs. Some database layouts can improve access speed. For example, placing all related information in a single table allows your programs to locate all needed data by looking in one place. On the other hand, you can lay out your database in a way that improves data integrity. For example, placing all the invoice line item data in one table and the invoice address information in another table prevents users from deleting complete addresses when they remove invoice line items from the database. Well-normalized databases strike a balance between speed and integrity.

High-speed tables have few index constraints and can have several, sometimes repetitive, fields in a single record. The few constraints make updates, insertions, and deletions faster. The repetitive fields make it easier to load large amounts of data in a single SQL statement instead of finding additional, related data in subsidiary tables linked through those slower index constraints.

Databases built for maximum integrity have many small data tables. Each of these tables can have several indexes--mostly foreign keys referencing other tables in the database. If a table is built with high integrity in mind, it is difficult to add invalid data to the database without firing off database error messages. Of course, all that integrity checking eats precious ticks off the microchip clock.

Good data normalization results in data tables that make sense in a fundamental way. Well-normalized tables are easy to understand when you look at them. It is easy to see what kind of data they store and what types of updates need to be performed. Usually, it is rather easy to create data entry routines and simple reports directly from well-normalized tables. In fact, the rule of thumb is this: If it's hard to work with a data table, it probably needs more normalization work.

For the rest of this lesson, you use the Visdata application to build data tables. If you have not already looked at the lesson on Day 7, turn there first for information on how to use Visdata to maintain relational databases.

A Typical Database Before Normalization

To illustrate the process of normalization, let's start with an existing database table. The database NORMDAT1.MDB can be found in the TYSDBVB5\SOURCE\DATA directory of the CD that shipped with this guide. Load this into the Visdata application and open the Table1 data table in design mode. Your screen should look something like the one in Figure 16.1.

Figure 16.1. Displaying Table1 before normalization.

This data table holds information about employees of a small company. The table contains fields for the employee ID and employee name, and the ID, name, and location of the department to which this employee is currently assigned. It also includes fields for tracking the employee's job skills, including the skill code, the name, the department in which the skill was learned, and the ability level that the employee has attained for the designated skill. Up to three different skills can be maintained for each employee.

This table is rather typical of those you find in existing record-oriented databases. It is designed to quickly give users all the available information on a single employee. It is also a fairly simple task to build a data entry form for this data table. The single form can hold the employee and department fields at the top of the form and the three skill field sets toward the bottom of the form. Figure 16.2 shows a simple data form for this table generated by Visdata.

Access to the information in the table is fast and the creation of a data entry screen is easy. So this is a well-normalized table, right? Wrong. Three of the five rules of normalization that you learn in the rest of this lesson are broken, and the other two are in jeopardy! Some of the problems are obvious, some are not. Let's go through each of the five rules of normalization and see how applying these rules can improve the data table.

Figure 16.2. The data entry form for Table1.

Rule 1: Eliminate Repeating Groups

The first area in which Table1 needs some work is in the repeating skill fields. Why include columns in the data table called SkillCode1, SkillCode2, SkillCode3, or SkillName1, SkillName2, SkillName3, and so forth? You want to be able to store more than one set of skills for an employee, right? But what if you want to store data on more than three skills acquired by a single employee? What if most of the employees only have one or two skills, and very few have three skills? Why waste the blank space for the third skill? Even more vexing, how easy will it be to locate all employees in the data table that have a particular skill?

NOTE: The first rule of data normalization states that you should make a separate table for each set of related columns and give each table a primary key. Databases that adhere to this first rule of normalization are said to be in the First Normal Form.

The first rule of data normalization is to eliminate repeating groups of data in a data table. Repeating groups of data, such as the skill fields (SkillCodeX, SkillNameX, SkillDeptIDX, and SkillLevelX), usually indicates the need for an additional table. Creating the related table greatly improves readability and allows you to keep as few or as many skill sets for each employee as you need without wasting storage space.

The fields that relate to employee skills need to be separated from the others in the table. You don't need to put all 12 skill fields in the new table, though. You only need one of each of the unique data fields. The new database now has not one, but two data tables. One, called Skills, contains only the skill fields. The other table, called Employees, contains the rest of the fields. Table 16.1 shows how the two new tables look.

Table 16.1. Eliminating repeating data.

*Skills Table*	*Employees Table*
EmpID	EmpID
SkillCode	EmpName
SkillName	DeptID
SkillDeptID	DeptName
SkillLevel	DeptLocation

Notice that the first field in both tables is the EmpID field. This field is used to relate the two tables. Each record in the Skill table contains the employee ID and all pertinent data on a single job skill (code, name, department learned, and ability level). If a single employee has several skills, there is a single record in the Skill table for each job skill acquired by an employee. For example, if a single employee has acquired five skills, there are five records with the same employee ID in the Skills table.

Each record in the Skills table must contain a valid value in the EmpID field or it should be rejected. In other words, each time a record is added to the Skills table, the value in the EmpID field should be checked against values in the EmpID field of the Employees table. If no match is found, the Skills record must be corrected before it is written to the database. You remember from the discussion of SQL Data Definition Language statements on Day 13 that this is a FOREIGN KEY CONSTRAINT. The field EmpID in the Skills table is a foreign key that references the field EmpID in the Employees table. Also, the EmpID field in the Employees table should be a primary field to make sure that each record in the Employee table has a unique EmpID value.

Now that you know the fields and index constraints you need, you can use SQL DDL to create two new tables. If you have not already done so, start the Visdata application and open the NORMDAT1.MDB database. Now you create two new tables that bring the database into compliance with the first rule of data normalization.

First, create the table that holds all the basic employee data. This table has all the fields that were in the Table1 table, minus the skill fields. Using the information in Table 16.1 as a guide, enter an SQL DDL statement in the SQL window of Visdata that creates the Employees data table. Your SQL statement should resemble Listing 16.1.

Listing 16.1. Creating the Employees table.

CREATE TABLE Employees
   (EmpID TEXT(5),
    EmpName TEXT(30),
    DeptID TEXT(5),
    DeptName TEXT(20),
    DeptLocation TEXT(20),

CONSTRAINT PKEmpID PRIMARY KEY (EmpID)); The EmpID field has been designated as a primary key field. This guarantees that no two records in the Employees data table can have the same EmpID value. You can use the EmpID field in the next table you create (the Skills table) as the reference field that links the two tables. Because you are using the EmpID field as a link, it must be a unique value in the Employees table in order to maintain database integrity. What you are doing here is setting up a one-to-many relationship between the Employees table (the one-side) and the Skills table (the many-side). Any time you establish a one-to-many relationship, you must make sure that the reference field (in this case, the EmpID field) is unique on the one-side of the relationship.

Now that you have built the Employees table, you can create the table that holds all the skills data. Use the information in Table 16.1 to write an SQL DDL statement that creates a table called Skills. Make sure the new table has the field EmpID and that the EmpID field is built with the correct index constraint to enforce one-to-many database integrity. Your SQL statement should look like the one in Listing 16.2.

Listing 16.2. Creating the Skills table.

CREATE TABLE Skills
   (EmpID TEXT(5),
    SkillCode TEXT(5),
    SkillName TEXT(20),
    SkillDeptID TEXT(5),
    SkillLevel INTEGER,
    CONSTRAINT PKSkills PRIMARY KEY (SkillCode,EmpID),

CONSTRAINT FKEmpID FOREIGN KEY (EmpID) REFERENCES Employees(EmpID)); You can see in Listing 16.2 that you have used the FOREIGN KEY_REFERENCES syntax to establish and maintain the table relationship. As you remember from the SQL lessons on Day 13 and Day 15, the FOREIGN KEY_REFERENCES syntax makes sure that any entry in the Skills.EmpID field can be found in the related Employees.EmpID field. If users enter a value in the Skills.EmpID field that cannot be found in any Employees.EmpID field, Visual Basic automatically issues a database error message. This message is generated by Visual Basic, not by your program.

That is how you build tables that adhere to the first rule of data normalization. To see how these tables look when they have live data in them, use Visdata to load the TYSDBVB5\ SOURCE\DATA\NORMDAT2.MDB database. This database contains the Employees and Skills tables with data already loaded into them. Figure 16.3 shows how Visdata displays the two new tables that have live data.

Figure 16.3. The new Employees and Skills tables from NORMDAT2.MDB.

NOTE: Before continuing with today's lesson, load the NORMDAT2.MDB database into Visdata.

Rule 2: Eliminate Redundant Data

Another aspect of the Skills table also needs attention. Although moving the repeating skills fields into a separate table improves the database, you still have work to do. The Skills table contains redundant data. That is, data is stored in several places in the database. Redundant data in your database can lead to serious database integrity problems. It's best to eliminate as many occurrences of redundant data as possible.

NOTE: The second rule of data normalization states that if a column depends only on part of a multivalued key, you remove it to a separate table. In other words, if you need to fill in two fields in order to truly identify the record (JobID and JobName), but only one of those fields is needed to perform a lookup in the table, you need a new table. Databases that conform to this rule are said to be in the Second Normal Form.

For example, the Skills table includes a field called SkillCode. This field contains a code that identifies the specific skill (or skills) each employee has acquired. If two employees have gained the same skill, that skill appears twice in the Skills file. The same table also includes a field called SkillName. This field contains a meaningful name for the skill represented by the value in the SkillCode field. This name is much more readable and informative than the SkillCode value. In essence, these two fields contain the same data, represented slightly differently. This is the dreaded redundant data you have to eliminate!

Before you jump into fixing things, first review the details regarding redundant data and how it can adversely affect the integrity of your database.

Update Integrity Problems

When you keep copies of data elements in several rows in the same table or in several different tables (such as job names to go with job ID codes), you have a lot of work ahead of you when you want to modify the copied data. If you fail to update one or more of these copies, you can ruin the integrity of your database. Redundant data can lead to what are known as update integrity problems.

Imagine that you have built a huge database of employee skills using the tables you built in the preceding section. All is going great when, suddenly, the Human Resources Department informs you that it has designed a new set of names for the existing skill codes. You now have to go through the entire database and update all the records in the Skills table, searching out the old skill name and updating the SkillName field with the new skill name. Because this is an update for the entire data table, you have to shut down the database until the job is complete in order to make sure no one is editing records while you're performing this update. Also, you probably have to change some Visual Basic code that you built to verify the data entry. All in all, it's a nasty job. If that isn't enough, how about a little power outage in the middle of your update run? Now you have some records with the old names, and some with the new names. Things are really messed up!

Delete Integrity Problems

Although the update integrity problem is annoying, you can suffer through most of those problems. In fact, almost all database programmers have had to face similar problems before. The more troublesome integrity problem resulting from redundant data comes not during updates, but during deletes. Let's assume you have properly handled the mass update required by the Human Resources Department. Then you discover that there is only one employee in the entire database that has the SkillCode S099 (Advanced Customer Service course). No other employee has attained this high level of training. Now, that employee is leaving the organization. When you delete the employee record from the file, you delete the only reference to the Advanced Customer Service course! There is no longer any record of the Advanced Customer Service course in your entire database, which is a real problem.

The Normalization Solution

The way to reduce these kinds of data integrity problems is to pull out the redundant data and place it in a separate table. You need a single table, called SkillMaster, that contains only the SkillCode and the SkillName data fields. This table is linked to the Skills table through the SkillCode field. Now, when the Human Resources department changes the skill names, you only need to update a single record--the one in the SkillMaster table. Because the Skills table is linked to the SkillMaster table, when you delete the employee with the certification for SkillCode S099, you don't delete the last reference to the skill. It's still in the SkillMaster table.

TIP: Another plus to this type of table separation is in speeding data entry. With only one field to enter, and especially a brief code, data entry operators can more quickly fill in fields on the table's form.

Also, you now have a single table that lists all the unique skills that can be acquired by your employees. You can produce a Skills list for employees and managers to review. If you add fields that group the skills by department, you can even produce a report that shows all the skills by department. This would be very difficult if you were stuck with the file structure you developed in the preceding section.

So let's redefine the Skills table and the SkillMaster table to conform to the second rule of data normalization. Table 16.2 shows the fields you need for the two tables.

Table 16.2. The field list for the Skills and SkillMaster tables.

EmpSkills	SkillMaster
EmpID	SkillCode
SkillCode	SkillName
SkillDeptID
SkillLevel

You can see that you have renamed the Skills table to EmpSkills to better reflect its contents. You have also moved the SkillName field out of the EmpSkills table and created SkillMaster, a small table that contains a list of all the valid skills and their descriptive names. Now you have the added bonus of being able to add a FOREIGN KEY constraint to the EmpSkills table. This improves database integrity without adding any additional programming code!

Listing 16.3 shows the two SQL DDL statements that create the EmpSkills and the SkillMaster data tables. Note the use of FOREIGN KEY constraints in the EmpSkills table.

Listing 16.3. Creating the SkillMaster and EmpSkills tables.

CREATE TABLE SkillMaster
   (SkillCode TEXT(5),
    SkillName TEXT(20),
    CONSTRAINT PKSkillMaster PRIMARY KEY (SkillCode))
CREATE TABLE EmpSkills
   (EmpID TEXT(5),
    SkillCode TEXT(5),
    SkillDeptID TEXT(5),
    SkillLevel INTEGER,
    CONSTRAINT PKSkills PRIMARY KEY (SkillCode,EmpID),
    CONSTRAINT FKEmpID2 FOREIGN KEY (EmpID)
       REFERENCES Employees(EmpID),
    CONSTRAINT FKSkillCode FOREIGN KEY (SkillCode)
REFERENCES SkillMaster(SkillCode));

Use Visdata to add these two new tables to the NORMDAT2.MDB database. The database TYSDBVB5\SOURCE\DATA\NORMDAT3.MDB contains a complete database with the data tables Employees, EmpSkills, and SkillMaster fully populated with data. This is demonstrated in Figure 16.4.

You now have a database that conforms to the first two rules of data normalization. You have eliminated repeating data and redundant data. You have one more type of data to eliminate from your tables. You handle that in the following section.

Figure 16.4. The new Employees, EmpSkills, and SkillMaster tables.

NOTE: Before continuing with the lesson, load the NORMDAT3.MDB database into Visdata.

Rule 3: Eliminate Columns Not Dependent on the Primary Key

By now, you're probably getting the idea. You are looking for hints in the table structure that lead you into traps further down the road. Will this table be easy to update? What happens if you delete records from this table? Is it easy to get a comprehensive list of all the unique records in this table? Asking questions like these can uncover problems that are not so apparent when you first build a table.

When you are building a data table, you should also be concerned about whether a field describes additional information about the key field. In other words, is the field you are about to add to this table truly related to the key field? If not, the field in question should not be added to the table. It probably needs to be in its own table. This process of removing fields that do not describe the key field is how you make your data tables conform to the third rule of data normalization--eliminate columns not dependent on keys.

NOTE: The third rule of data normalization states that if a column does not fully describe the index key, that column should be moved to a separate table. In other words, if the columns in your table don't really need to be in this table, they probably need to be somewhere else. Databases that follow this rule are known to be in the Third Normal Form.

In these database examples, you have data describing the various departments in the company stored in the Employees table. Although the DeptID field is important to the Employees description (it describes the department to which the employee belongs), the department-specific data should not be stored with the employee data. Yes, you need another table. This table should contain only department-specific data and be linked to the Employees table through the DeptID field. Table 16.3 lists the modified Employees table and the new Departments table.

Table 16.3. The modified Employees table and the new Departments table.

*Employees*	*Departments*
EmpID	DeptID
EmpName	DeptName
DeptID	DeptLocation

Notice that the Employees table is much simpler now that you have eliminated all unrelated fields. Use Visdata to construct SQL DDL statements that create the new Departments table and then modify the Employees table and the EmpSkills table to increase database integrity (yes, more foreign keys!). First, use the SQL DDL in Listing 16.4 to create the Departments table. Check your work against Figure 16.5.

Listing 16.4. Creating the Departments table.

CREATE TABLE Departments
(DeptID TEXT(5),
DeptName TEXT(20),
DeptLocation TEXT(20),
CONSTRAINT PKDeptID PRIMARY KEY (DeptID))

Now alter the Employees table. You need to do two things:

Remove the DeptName column from the table.
Add a FOREIGN KEY constraint to enforce referential integrity on the Employees.DeptID field.

Listing 16.5 contains the SQL DDL statements to create the modified Employees table.

Listing 16.5. Creating the new Employees table.

CREATE TABLE Employees
   (EmpID TEXT(5),
    EmpName TEXT(30),
    DeptID TEXT(5),
    CONSTRAINT PKEmpID PRIMARY KEY (EmpID),
    CONSTRAINT FKEmpDept FOREIGN KEY (DeptID)
REFERENCES Departments(DeptID))

Now you need to modify the EmpSkills table to add the referential integrity check on the EmpSkills.SkillDeptID field. The new SQL DDL should look like Listing 16.6.

Listing 16.6. Creating the new EmpSkills table.

CREATE TABLE EmpSkills2
   (EmpID TEXT(5),
    SkillCode TEXT(5),
    SkillDeptID TEXT(5),
    SkillLevel INTEGER,
    CONSTRAINT PKEmpSkill2 PRIMARY KEY (SkillCode,EmpID),
    CONSTRAINT FKSkillMast FOREIGNKEY (SkillCode)
       REFERENCES SkillMaster(SkillCode),
    CONSTRAINT FKSkillDept FOREIGN KEY (SkillDeptID)
REFERENCES Departments(DeptID));

The database NORMDAT4.MDB contains a complete set of tables that conform to the third rule of data normalization. Use Visdata to load NORMDAT4.MDB and review the data tables. Attempt to add some data that does not follow the integrity rules. Try deleting records. This shows you how Visual Basic issues database error messages when you try to save a record that breaks the referential integrity rules.

The first three rules of data normalization involve the elimination of repeating, redundant, or unrelated data fields. The last two rules involve isolating multiple relationships to improve overall database integrity. The first three rules are usually all that you need to produce well-designed databases. However, there are times when additional normalization can improve the quality of your database design. In the next two sections, you learn rules 4 and 5 of data normalization.

Figure 16.5. The Departments table added to NORMDAT4.MDB.

Do Not Store Calculated Data in Your Tables

It is important to note here that one of the results of the third rule of data normalization is that you should not store calculated fields in a data table. Calculated fields are fields that contain derived data such as year-to-date totals, a line in the invoice table that contains the totals of several other rows in the invoice table, and so forth. Calculated fields do not describe the primary key. Calculated fields are derived data. It is a bad practice to store derived data in live data tables.

Derived data can easily fall out of sync with the individual rows that make up the total data. What happens if the individual rows that add up to the total are altered or deleted? How do you make sure the row that holds the total is updated each time any line item row is changed? Storing derived data might seem to be faster, but it is not easier. And dealing with derived data opens your database to possible update and delete integrity problems each time a user touches either the prime data rows or the total data rows. Calculated data should not be stored. It should always be computed using the prime data at the time it is needed.

NOTE: Before continuing with this lesson, load the NORMDAT4.MDB database into Visdata.

Rule 4: Isolate Independent Multiple Relationships

The fourth rule of data normalization concerns the handling of independent multiple relationships. This rule is applied whenever you have more than one one-to-many relationship on the same data table. The relationship between the Employees table and the EmpSkills table is a one-to-many relationship. There can be many EmpSkills records related to one Employee record. Let's add an additional attribute of employees to create a database that has more than a single one-to-many relationship.

Assume that the Human Resources Department has decided it needs more than just the skill names and skill levels attained for each employee. Human Resources also wants to add the level of education attained by the employee for that skill. For example, if the employee has an accounting skill and has an associate's degree in guidekeeping, Human Resources wants to store the degree information, too. If an employee has been certified as an electrician and works in the Maintenance Department, the Human Resources group wants to know that.

The first thing you might want to do is add a new column to the EmpSkills table--maybe a field called Degree, maybe even a field for YearCompleted. This makes sense because each skill might have an associated education component. It makes sense, but it is not a good idea. What about the employee who is currently working in the Customer Service Department but has an accounting degree? Just because the employee has a degree does not mean that employee has the skills to perform a particular job or is working in a position directly related to his or her degree. The degree and the job skills are independent of each other. Therefore, even though the skills data and the degree data are related, they should be isolated in separate tables and linked through a foreign key relationship.

NOTE: The fourth rule of data normalization dictates that no table can contain two or more one-to-many or many-to-many relationships that are not directly related. In other words, if the data element is important (the college degree) but not directly related to other elements in the record (the customer service rep with an accounting degree), you need to move the college degree element to a new table. Databases that follow this rule are in the Fourth Normal Form.

Table 16.4 shows a sample Training table that can be used to hold the education information for each employee. Now the Human Resources department can keep track of education achievements independent of acquired job skills. Note that the EmpID directly connects the two relationships. If the Training table has only one entry per employee, the two relationships are a one-to-one relationship between the Employees table and the Training table, and a one-to-many relationship between the Employees table and the EmpSkills table. Of course, if any employee has more than one degree, both relationships become one-to-many.

Table 16.4. The sample training data table.

EmpID

Degree

YearCompleted InstitutionName

Listing 16.7 is a sample SQL DDL statement that creates the Training data table with the proper relationship constraint. Enter this statement in the SQL window of Visdata while you have the NORMDAT4.MDB database open. Check your results against Figure 16.6.

Listing 16.7. Creating the Training table.

CREATE TABLE Training
   (EmpID TEXT(5),
    Degree TEXT(20),
    YearCompleted INTEGER,
    InstitutionName TEXT(30),
    CONSTRAINT PKTraining PRIMARY KEY (EmpID,Degree),
    CONSTRAINT FKEmpTrn FOREIGN KEY (EmpID)
REFERENCES Employees (EmpID))

The database NORMDAT5.MDB contains a complete version of the database normalized up to the fourth rule of data normalization. Use Visdata to open the database and review the table structure.

NOTE: Before continuing with the lesson, load the NORMDAT5.MDB database into Visdata.

Figure 16.6. The Training table shows the degree achievements for the Employees table.

Rule 5: Isolate Related Multiple Relationships

The last remaining rule of data normalization covers the handling of related multiple relationships in a database. Unlike the fourth rule, which deals with independent, one-to-many, multiple relationships, the fifth rule is used to normalize related, many-to-many multiple relationships. Related, many-to-many multiple relationships do not occur frequently in databases. However, when they do come up, these types of data relations can cause a great deal of confusion and hassle when you're normalizing your database. You won't invoke this rule often, but when you do it pays off!

Imagine that the Maintenance Department decides it wants to keep track of all the large equipment used on the shop floor by various departments. It uses this data to keep track of where the equipment is located. The Maintenance Department also wants to keep a list of suppliers for the equipment in cases of repair or replacement. When you were a novice, you might have decided to design a single table that held the department ID, equipment name, and supplier name. But, as I'm sure you have guessed by now, that is not the correct response. What if the Maintenance Department has more than one supplier for the same type of equipment? What if a single supplier provides more than one of the types of equipment used in the plant? What if some departments are restricted in the suppliers they can use to repair or replace their equipment?

NOTE: The fifth rule of data normalization dictates that you should isolate related multiple relationships within a database. In other words, if several complex relationships exist in your database, separate each of the relationships into its own table. Databases that adhere to this rule are known to be in the Fifth Normal Form.

The following list shows the relationships that have been exposed in this example:

Each department can have several pieces of equipment.
Each piece of equipment can have more than one supplier.
Each supplier can provide a variety of pieces of equipment.
Each department can have a restricted list of suppliers.

Although each of the preceding business rules are simple, putting them all together in the database design is tough. It's the last item that really complicates things. There is more than one way to solve this kind of puzzle. The one suggested here is just one of the many possibilities.

First, you need to expose all the tables that you need to contain the data. The preceding list describes two one-to-many relationships (department to equipment, and department to supplier, with restrictions) and one many-to-many relationship (equipment to supplier, supplier to equipment). Each of those relationships can be expressed in simple tables. Two additional tables not mentioned, but certainly needed, are a table of all the equipment in the building (regardless of its location) and a table of all the suppliers (regardless of their department affiliation). Table 16.5 shows sample field layouts for the required tables. The Equipment and Supplier tables are shortened in this example. If you were designing these tables for a real database project, you would add several other fields.

Table 16.5. The Fifth Rule sample data tables.

*Equipment*	*Supplier*
EquipID	SupplierID
EquipName	SupplierName
DatePurchased	SupplierAddress

Listing 16.8 contains the SQL DDL statements to create these tables. Figure 16.7 shows the results of executing these statements.

Listing 16.8. Creating the Equipment and the Supplier tables.

CREATE TABLE Equipment
   (EquipID TEXT (10),
    EquipName TEXT(30),
    DatePurchased DATE,
    CONSTRAINT PKEquipID PRIMARY KEY (EquipID))
CREATE TABLE Supplier
   (SupplierID TEXT (10),
    SupplierName TEXT(30),
    SupplierAddress MEMO,
CONSTRAINT PKSupplier PRIMARY KEY (SupplierID))

Figure 16.7. Supplier and Equipment tables in NORMDAT6.MDB.

The next two data tables describe the relationships between Supplier and Equipment and between Supplier and Departments. You remember that departments can be restricted to certain suppliers when repairing or replacing equipment. By setting up a table such as the DeptSupplier table described next, you can easily maintain a list of valid suppliers for each department. Similarly, as new suppliers are discovered for equipment, they can be added to the EquipSupplier table. Refer to Table 16.6 for a sample list of fields.

Table 16.6. EquipSupplier and DeptSupplier tables.

*EquipSupplier*	*DeptSupplier*
EquipID	DeptID
SupplierID	SupplierID

These two tables are short because they are only needed to enforce expressed simple relationships between existing data tables. Creating small tables such as these is a handy way to reduce complex relationships to more straightforward ones. It is easier to create meaningful CONSTRAINT clauses when the tables are kept simple, too. The SQL DDL statements for these two tables appear in Listing 16.9. The result of executing these statements in Visdata appears in Figure 16.8.

Figure 16.8. EquipSupplier and DeptSupplier tables.

Listing 16.9. Creating the EquipSupplier and DeptSupplier tables.

CREATE TABLE EquipSupplier
   (EquipID TEXT(10),
    SupplierID TEXT(10),
    CONSTRAINT PKEqSpl PRIMARY KEY (EquipID,SupplierID),
    CONSTRAINT FKEqSplEquip FOREIGN KEY (EquipID)
       REFERENCES Equipment(EquipID),
    CONSTRAINT FKEqSplSupplier FOREIGN KEY (SupplierID)
       REFERENCES Supplier(SupplierID))
CREATE TABLE DeptSupplier
   (DeptID TEXT(5),
    SupplierID TEXT(10),
    CONSTRAINT PKDeptSpl PRIMARY KEY (DeptID,SupplierID),
    CONSTRAINT FKDptSplDept FOREIGN KEY (DeptID)
        REFERENCES Departments(DeptID),
    CONSTRAINT FKDptSplSupplier FOREIGN KEY (SupplierID)
REFERENCES Supplier(SupplierID))

Notice that, in these two tables, the CONSTRAINT definitions are longer than the field de-finitions. This is common when you begin to use the power database integrity aspects of SQL databases.

Finally, you need a single table that expresses the Equipment-Supplier-Department relationship. This table shows which department has which equipment supplied by which supplier. More importantly, you can build this final table with tight constraints that enforce all these business rules. Both the Department-Supplier relationship and the Equipment-Supplier relationship are validated before the record is saved to the database. This is a powerful data validation tool--all without writing any Visual Basic code! Table 16.7 and the SQL DDL statement in Listing 16.10 show how this table can be constructed. See Figure 16.9 to review the results of executing these statements.

Table 16.7. The Department-Equipment-Supplier data table.

DeptID

EquipID SupplierID

Listing 16.10. Creating the DeptEqpSuplr table.

CREATE TABLE DeptEqpSuplr
   (DeptID TEXT(5),
    EquipID TEXT(10),
    SupplierID TEXT(10),
    CONSTRAINT PFDeptEq PRIMARY KEY (DeptID, EquipID),
    CONSTRAINT FKEqSupl FOREIGN KEY (EquipID,SupplierID)
        REFERENCES EquipSupplier(EquipID,SupplierID),
    CONSTRAINT FKDeptSupl FOREIGN KEY (DeptID,SupplierID)
REFERENCES DeptSupplier(DeptID,SupplierID))

The Microsoft Access database NORMDAT6.MDB contains a set of live data for the tables described in this section. Use Visdata to open the database and review the table structure. Try adding or deleting records in ways that would break integrity rules. Notice that none of the last three tables defined (EquipSupplier, DeptSupplier, and DeptEqpSuplr) allow edits on any existing record. This is because you defined the primary key as having all the fields in a record. Because you cannot edit a primary key value, you must first delete the record, and then add the modified version to the data table.

Figure 16.9. The EquipSupplier, DeptSupplier, and DeptEqpSuplr tables.

Summary

In today's lesson, you learned how to improve database integrity and access speed using the five rules of data normalization. You learned the following five rules:

Rule 1: Eliminate Repeating Groups. If you have a set of fields that have the same name followed by a number (Skill1, Skill2, Skill3, and so forth), remove these repeating groups, create a new table for the repeating data, and relate it to the key field in the first table.
Rule 2: Eliminate Redundant Data. Don't store the same data in two different locations. This can lead to update and delete errors. If equivalent data elements are entered in two fields, remove the second data element, create a new master table with the element and its partner as a key field, and then place the key field as a relationship in the locations that formerly held both data elements.
Rule 3: Eliminate Columns Not Dependent on Keys. If you have data elements that are not directly related to the primary key of the table, these elements should be removed to their own data table. Only store data elements that are directly related to the primary key of the table. This particularly includes derived data or other calculations.
Rule 4: Isolate Independent Multiple Relationships. Use this rule to improve database design when you are dealing with more than one one-to-many relationship in the database. Before you add a new field to a table, ask yourself whether this field is really dependent upon the other fields in the table. If not, create a new table with the independent data.
Rule 5: Isolate Related Multiple Relationships. Use this rule to improve database design when you are dealing with more than one many-to-many relationship in the database. If you have database rules that require multiple references to the same field or sets of fields, isolate the fields into smaller tables and construct one or more link tables that contain the required constraints that enforce database integrity.

Quiz

1. Is it a good idea to optimize your database strictly for speed?

2. What is meant by the term First Normal Form?

3. Explain how the second rule of data normalization differs from the first rule of normalization.

4. Should you include fields in a data table that are the calculated results of other fields in the same table?

5. When would you invoke the fourth rule of data normalization?

6. When would you invoke the fifth rule of data normalization?

Exercises

1. As a computer consultant, you have landed a contract to build a customer tracking system for your local garage. After several days of interviews with the owner, mechanics, and staff members, you have determined that the following data fields should be included in your database. Many of the customers of this garage have more than one automobile. Therefore, you are requested to leave room for tracking two cars per customer.
Use these fields: CustomerID, CustomerName, Address, City, State, Zip, Phone, SerialNumber, License, VehicleType1, Make1, Model1, Color1, Odometer1, VehicleType2, Make2, Model2, Color2, Odometer2.
Optimize this data into tables using the rules of data normalization discussed in today's lesson. Identify all primary and foreign keys.

2. Write the SQL statements that create the tables you designed in Exercise 1.

2019 Soft Lookup Corp. Privacy Statement