|
table1 and table2 are different data tables in the same database. columnA represents a single column in each of the tables. Use Visdata to enter and execute the following SQL statement. Compare your result set to the one in Figure 8.11. SELECT Titles.Title, Publishers.Name FROM Publishers, Titles WHERE Publishers.PubID =Titles.PubID
This is the first SQL statement you have encountered today that lists columns from two different tables. When selecting columns from more than one table, it is good programming practice to precede the column name with the table name and join the two with the period (.). As long as the column name is unique among all columns in the tables from which you are selecting, SQL does not require you to use the table.column syntax. But it is a good habit to do so, especially when you are building SQL statements in Visual Basic code. You should also notice that the WHERE clause comparison columns (Publishers.PubID and Titles.PubID) were not included in the SELECT portion of the statement. You do not have to include the column in the SELECT portion of the statement to use it in the WHERE portion of the statement, as long as the column already exists in the underlying table. Combining tables using the WHERE clause always returns a nonupdateable result set. You cannot update the columns in a view created in this manner. If you want to link tables together and also be able to update the underlying tables for that view, you need to use the JOIN clause, which is covered later today. You can combine the link-type and limit-type versions of the WHERE clause in a single SQL SELECT_FROM statement. Execute the following statement and compare your results to those in Figure 8.12.
The preceding SQL statement selects only those records in which the PubID columns match and the PubID values are between 5 and 10. You can use the WHERE clause to link more than two data tables. The linking column for table1 and table2 does not have to be the same column for table2 and table3. Execute the following statement and review your results against those in Figure 8.13.
You might have noticed that SQL assigns column names to the result sets. There are times when these assigned names can be misleading or incomplete. You can use the AS keyword to rename the columns in the result set. The following SQL statement is one example of using the AS keyword in the SELECT statement to rename the column headers of the result set. This renaming does not affect the original column names in the underlying tables. Execute the following SQL statement and compare your results to those in Figure 8.14.
Now that you know how to use the SELECT_FROM statement to select the desired rows and columns from data tables, read about how to use SQL functions to calculate and manipulate data within your selected columns and rows. SQL Aggregate FunctionsThe SQL standards define a core set of functions that are present in all SQL-compliant systems. These functions are known as aggregate functions. Aggregate functions are used to quickly return computed results of numeric data stored in a column. The SQL aggregate functions available through the Microsoft Access Jet database engine are
The following SQL statement illustrates all five of the SQL aggregate functions. Enter and execute this statement, and check your results against Figure 8.15. SELECT COUNT(Units) AS UnitCount, AVG(Units) AS UnitAvg, SUM(Units) AS UnitSum, MIN(Units) AS UnitMin, MAX(Units) AS UnitMax FROM guideSales
SELECT COUNT(Units) AS UnitCount, AVG(Units) AS UnitAvg, SUM(Units) AS UnitSum, MIN(Units) AS UnitMin, MAX(Units) AS UnitMax FROM guideSales WHERE Qtr = 1
|
SELECT Name,
City+","+StateProv+" "+Zip AS ADDRESS FROM Publishers |
Figure 8.18. Using Visual
Basic syntax to combine columns.
You can also use Visual Basic functions as part of the WHERE clause in an SQL
statement. The following example (Figure 8.19) returns only rows that have the letter a as
the second character in the Name column.
SELECT Name FROM Publishers WHERE Mid$(Name,2,1)="a"
Figure 8.19. Using Visual
Basic functions in an SQL WHERE clause.
Even though using familiar Visual Basic functions and syntax is very handy, it has its
drawbacks. Chief among them is the fact that after you create an SQL statement that uses
Visual Basic-specific portions, your code is no longer portable. If you ever move the SQL
statements to another database engine (such as SQL Server), you must remove the Visual
Basic-specific portions of the SQL statements and replace them with something else that
will work with the database engine you are using. This will not be an issue if you plan to
stick with the Microsoft Access Jet engine for all your database access.
Another possible drawback that you'll encounter if you use Visual Basic-specific syntax in your SQL statements is that of speed. Extensive use of Visual Basic-specific code within SQL statements results in a slight performance hit. The speed difference is minor, but it should be considered.
It is better to use as few Visual Basic-specific functions in your SQL statements as possible. You will not limit the portability of your code, and you will not suffer from unduly slow processing of the SQL statements.
NOTE: You can't use user-defined functions within your SQL statements when you use the Microsoft Access Jet database engine from within Visual Basic. You can only use the built-in SQL functions and the predefined Visual Basic functions.
Now that you know how to create basic SQL SELECT_FROM statements and you know how to use the built-in SQL functions, return to the basic SELECT_FROM statement and add a few more enhancements to your SQL tool kit.
There are times when you select data from a table that has more than one occurrence of the rows you are trying to collect. For example, you want to get a list of all the customers that have at least one order on file in the Orders table. The problem is that some customers have several orders in the table. You don't want to see those names appear more than once in your result set. You can use the DISTINCT keyword to make sure that you do not get duplicates of the same customer in your result set.
Enter and execute the following statement. As a test, execute the same SQL statement without the DISTINCT clause and compare the result sets. Refer to Figure 8.20 as an example.
SELECT DISTINCT AUID FROM Titles ORDER BY AUID
Figure 8.20. Using the DISTINCT
keyword to remove duplicates from a result set.
If you include more than one column in the SELECT list, all columns are used to
evaluate the uniqueness of the row. Execute and compare the result sets of the following
two SQL statements. Refer to Figure 8.21 as a guide.
SELECT DISTINCT Title
FROM guideSales
SELECT DISTINCT Title, Units FROM guideSales
Figure 8.21. Using DISTINCT
on multiple columns.
Notice that the first SQL statement returns a single record for each Title in the data
table. The second SQL statement returns more records for each Title because there are
distinct Units values for each Title.
There are also times when you want to collect data on all rows that are distinct in any of the fields. Instead of using the DISTINCT keyword and listing all the fields in the table, you can use the DISTINCTROW keyword. The following SQL statement (see Figure 8.22) uses DISTINCTROW to return the same records as the SQL statement in the previous example.
SELECT DISTINCTROW * FROM guideSales ORDER BY Title
Figure 8.22. Using DISTINCTROW
in an SQL statement.
Both the DISTINCT and DISTINCTROW keywords enable you to limit the
contents of the result set based on the uniqueness of one or more columns in the data
table. In the next section, you'll learn how you can limit the contents of the result set
to the records with the highest numeric values in selected columns.
You can use the TOP n or TOP n PERCENT SQL keywords to limit the number of records in your result set. Suppose you want to get a list of the five top-selling guides in a data table. You can use the TOP n clause to get just that. TOP n returns the first n number of records. If you have two records of the same value, SQL returns both records. For the previous example, if the fifth and sixth records were both equal, the result set would contain six records, not just five.
When you use the TOP clause, you must also use the ORDER BY clause to make sure that your result set is sorted. If you do not use the ORDER BY clause, you receive an arbitrary set of records because SQL first executes the ORDER BY clause and then selects the TOP n records you requested. Without the ORDER BY clause, it is quite likely that you will not get the results you intended. If a WHERE clause is present, SQL performs the WHERE clause, the ORDER BY clause, and then the TOP n clause. As you can see, failure to use the ORDER BY clause most certainly returns garbage in your result set (see Figure 8.23).
SELECT TOP 5 * FROM guideSales ORDER BY Sales DESC
Figure 8.23. Using TOP
n to limit the result set.
Notice that the preceding example uses the DESC keyword in the ORDER BY
clause. Whether you use the DESC or ASC ORDER BY format, the result set
still contains the first n records in the table (based on the sort). Also note that the
result set contains more than five records, because several records have the same Sales
value.
The TOP n PERCENT version returns not the top five records, but the top five percent of the records in the underlying data table. The results of the following SQL statement (see Figure 8.24) contain several more records than the result set shown previously.
SELECT TOP 5 PERCENT * FROM guideSales ORDER BY Sales
Figure 8.24. Using TOP
n PERCENT to limit the result set.
One of the more powerful SQL clauses is the GROUP BY_HAVING clause. This clause lets you use the SQL aggregate functions discussed earlier today to easily create result sets that contain a list of subtotals of the underlying data table. For example, you might want to be able to create a data set that contains a list of Titles and the total Units sold, by Title. The following SQL statement (see Figure 8.25) can do that:
SELECT Title, SUM(Units) AS UnitsSold FROM guideSales GROUP BY Title
Figure 8.25. Using GROUP
BY to create subtotals.
The GROUP BY clause requires that all numeric columns in the SELECT column list be a part of an SQL aggregate function (SUM, AVG, MIN, MAX, and COUNT). Also, you cannot use the * as part of the SELECT column list when you use the GROUP BY clause.
What if you wanted to get a list of all the guide titles that sold more than 100 units for the year? The first thought would be to use a WHERE clause:
SELECT Titles, SUM(Units) AS UnitsSold WHERE Sum(Units) > 100 GROUP BY Units
However, if you try to run this SQL statement, you discover that SQL does not allow aggregate functions within the WHERE clause. You really want to use a WHERE clause after the aggregate function has created a resulting column. In plain English, the query needs to perform the following steps:
Luckily, you don't have to actually write all this in a series of SQL statements. You can get the same results by adding the HAVING keyword to the GROUP BY clause. The HAVING clause acts the same as the WHERE clause, except that the HAVING clause acts upon the resulting columns created by the GROUP BY clause, not the underlying columns. The following SQL statement (see Figure 8.26) returns only the Titles that have sold more than 100 units in the last year:
SELECT Title, SUM(Units) AS UnitsSold FROM guideSales GROUP BY Title HAVING SUM(Units)>100
Figure 8.26. Using the HAVING
clause with GROUP BY.
The columns used in the HAVING clause do not have to be the same columns listed in the SELECT clause. The contents of the HAVING clause follow the same rules as those for the contents of the WHERE clause. You can use logical operators AND, OR, and NOT, and you can include VB-specific functions as part of the HAVING clause. The following SQL statement (see Figure 8.27) returns sales in dollars for all titles that have more than 100 units sold and whose titles have the letter a as the second letter in the title:
SELECT Title, SUM(Sales) AS SalesAmt FROM guideSales GROUP BY Title HAVING SUM(Units)>100 AND Mid$(Title,2,1)="a"
Figure 8.27. Using a
complex HAVING clause.
The JOIN clause is a very powerful optional SQL clause. Remember when you learned how to link two tables together using WHERE table1.column1 = table2.column1? The only problem with using the WHERE clause is that the result set is not updateable. What if you need to create an updateable result set that contains columns from more than one table? You use JOIN.
There are three types of JOIN clauses in Microsoft Access Jet SQL:
The following sections describe each form of JOIN and how it is used in your programs. The INNER JOIN The INNER JOIN can be used to create a result set that contains only those records that have an exact match in both tables. Enter and execute the following SQL statement (see Figure 8.28):
SELECT PublisherComments.Comments, Publishers.Name, Publishers.StateProv FROM PublisherComments INNER JOIN Publishers ON PublisherComments.PubID = Publishers.PubID
Figure 8.28. Using the INNER
JOIN SQL clause.
NOTE: PublisherComments was used as the name for the table used in the preceding example. When creating a Microsoft Access database, we could easily have named the table "Publisher Comments" (note the space), in which case, we would have had to enclose the table name in brackets in the preceding query, like so:
[Publisher Comments]
This is a good time to point out that it is a bad idea to use embedded spaces as table names. Not only do you need to include brackets around the name in a query, but also the Wizard available to up-size Access data files to Microsoft SQL Server does not work successfully on tables with spaces embedded in their names.
The preceding SQL statement returns all the records from the Publisher table that have a PubID that matches a PubID in the [Publisher Comments] table. This type of JOIN returns all the records that reside within both tables--thus, an INNER JOIN.
This is handy if you have two tables that you know are not perfectly matched against a single column and you want to create a result set that contains only those rows that match on both sides. The INNER JOIN also works well when you have a parent table (such as a CustomerTable) and a child table (such as a ShipAddressTable) with a one-to-one relationship. Using an INNER JOIN, you can quickly create a list of all CustomerTable records that have a corresponding ShipAddressTable record on file.
INNER JOINs work best when you create a JOIN on a column that is unique in both tables. If you use a table that has more than one occurrence of the JOIN column, you'll get a row for each occurrence in the result set. This might be undesirable. The following example illustrates the point (see Figure 8.29):
SELECT Titles.Title,guideSales.Units FROM Titles INNER JOIN guideSales ON Titles.Title = guideSales.Title
In the previous example, the table guideSales has numerous entries for each title (one for each quarter recorded), so the result of the INNER JOIN returns each Title multiple times. The LEFT JOIN The LEFT JOIN is one of the two outer joins in the SQL syntax. Although INNER JOIN returns only those rows that have corresponding values in both tables, the outer joins return all the records from one side of the join, whether or not there is a corresponding match on the other side of the join. The LEFT JOIN clause returns all the records from the first table on the list (the left-most table) and any records on the right side of the table that have a matching column value. Figure 8.30 shows the same SQL query that was shown in Figure 8.28.
SELECT Publishers.Name,PublisherComments.Comments FROM Publishers LEFT JOIN PublisherComments ON Publishers.PubID = PublisherComments.PubID
Figure 8.29. Using an INNER
JOIN on a non-unique column.
Figure 8.30. Using the LEFT JOIN clause.
Notice that the result set has blank comments in several places. The LEFT JOIN is
handy when you want a list of all the records in the master table and any records in the
dependent table that are on file. The RIGHT JOIN The RIGHT JOIN works the same as
the LEFT JOIN except that the result set is based on the second (right-hand)
table in the JOIN statement. You can use the RIGHT JOIN in the same
manner you would use the LEFT JOIN.
Another powerful SQL clause is the UNION clause. This SQL keyword lets you create a union between two tables or SQL queries that contain similar, but unrelated, data. A UNION query is handy when you want to collate information from two queries into a single result set. Because UNION queries return nonupdateable result sets, they are good for producing on-screen displays, reports, and base data for generating graphs and charts.
For example, if you have a customer table and a vendor table, you might want to get a list of all vendors and customers who live in the state of Ohio. You could write an SQL statement to select the rows from the Customers table. Then write an SQL statement to select the rows from the Vendors table. Combine the two SQL statements into a single SQL phrase using the UNION keyword. Now you can get a single result set that contains the results of both queries.
In the following SQL statement (see Figure 8.31), you are creating a result set that contains all Publishers and Buyers located in the state of New York.
SELECT Name, City, StateProv, Zip FROM
Publishers WHERE StateProv='NY' UNION SELECT Name, City, StateProv, Zip FROM Buyers WHERE StateProv='NY' ORDER BY Zip |
Figure 8.31. An example
of a UNION query.
A note of caution when using the UNION query. To keep the same number of data
columns, SQL does a data type override to insert results into columns that are not the
same data types. The UNION query uses the column names of the first SQL query in
the statement and creates a result set that displays the data even if data types must be
altered to do so.
Each portion of the UNION query must have the same number of columns. If the first query results in six displayable columns, the query on the other side of the UNION statement must also result in six columns. If there is not an equal number of columns on each side of a UNION query, you receive an SQL error message.
You can also use UNION queries on the same table. The following SQL statement (see Figure 8.32) shows how you can use SQL to return the top-selling titles and the bottom-selling titles in the same result set:
SELECT SUM(Sales) AS TotSales,Title
FROM guideSales GROUP BY Title HAVING SUM(Sales)>4000 UNION SELECT SUM(Sales) AS TotSales,Title FROM guideSales GROUP BY Title HAVING SUM(Sales)<1000 ORDER BY TotSales |
Figure 8.32. Using UNION
on the same data table.
You can use Visual Basic stored queries (QueryDefs) as replacements for the complete SQL
statement on either side of a UNION keyword. You can also link several SQL
queries together with successive UNION keywords.
The last SQL statement covered today is the TRANSFORM_PIVOT statement. This is a very powerful SQL tool that enables you to create result sets that contain summarized data in a form known as a crosstab query. Instead of trying to explain a crosstab query, let's look at a sample problem.
Suppose you have a data table that contains information on guide titles and sales by quarter (sound familiar?). You have been asked to produce a view set that lists each guide title down the left side and each quarter across the top with the sales figures for each quarter to the right of the guide title. The only problem is that your data table has a single record for each quarter for each guide. For example, if guide A has sales in three quarters this year, you have three rows in your data table. If guide B has sales for four quarters, you have four rows, and so on. How can you produce a view that lists the quarters as columns instead of as rows?
You can accomplish this with a complicated set of subsequent SQL statements that produces temporary views, merges them together, and so on. Thanks to the folks who invented the Microsoft Access Jet database engine, however, you can use the TRANSFORM_PIVOT statement instead. You can produce the entire result set in one SQL statement using TRANSFORM_PIVOT. The following SQL statement shows how this can be done. See Figure 8.33 for a sample result set.
TRANSFORM SUM(guideSales.Sales) SELECT Title FROM guideSales GROUP BY Title PIVOT guideSales.Qtr
Figure 8.33. The TRANSFORM_PIVOT
example.
Notice the form of the TRANSFORM_PIVOT statement. It starts with the TRANSFORM keyword, not the SELECT keyword. Notice that a single SQL aggregate function immediately follows the TRANSFORM keyword. This is required, even if no real totaling will be performed. After the TRANSFORM aggregate function clause, you have the standard SELECT_FROM clause. Notice that the preceding example did not include the guidesales.Sales column in the SELECT statement because it will be produced by the TRANSFORM_PIVOT clause automatically. The GROUP BY clause is required in order to tell SQL how to treat the successive rows that will be handled for each guideSales.Title. Finally, add the PIVOT keyword, followed by the column that you want to use, as the set of headers that follow out to the right of the GROUP BY clause.
TRANSFORM_PIVOT uses the data in the PIVOT column as column headers in the result set. You will have as many columns in your result set as you have unique values in your PIVOT column. This is important to understand. Using columns that contain a limited set of data (such as months of the year) produces valuable result sets. However, using a column that contains unique data (such as the CustomerID column) produces a result set with an unpredictable number of columns.
The nice thing about TRANSFORM_PIVOT is that it is easy to produce several different views of the same data by just changing the PIVOT column. For example, what if you wanted to see the guide sales results by guideSales.SaleRep instead of by guideSales.Qtr? All you have to do is change the PIVOT field. See the following code example and Figure 8.34.
TRANSFORM SUM(guideSales.Sales) SELECT Title FROM guideSales GROUP BY Title PIVOT guideSales.SalesRep
Figure 8.34. Changing the
PIVOT field.
Notice, in Figure 8.34, that you can see a column with the header <>. When Microsoft Access Jet ran the SQL statement, it discovered some records that had no value in the guideSales.SaleRep column. SQL automatically created a new column (<>) to hold these records and make sure they were not left out of the result set.
Even though TRANSFORM_PIVOT is a powerful SQL tool, there is one drawback to its widespread use in your programs. The TRANSFORM_PIVOT clause is not an ANSI-SQL clause. Microsoft added this clause as an extension of the ANSI-SQL command set. If you use it in your programs, you will not be able to port your SQL statements to other back-end databases that do not support the TRANSFORM_PIVOT SQL clause. Despite this drawback, you will find TRANSFORM_PIVOT a very valuable SQL tool when it comes to producing result sets for summary reports, data graphs, and charts.
Visual Basic 5 allows for the use of nested SELECT queries. These are often referred to as SQL subqueries and are literally queries contained within queries. Nested SELECT queries can prove to be useful when you want to perform a query based upon the results of another query.
To demonstrate the use of a SQL subquery, let's start Visdata and open the SUBQRY.MDB database that can be found in the \\TYSDBVB5\SOURCE\DATA directory of the CD that shipped with this guide. This database contains a sample listing of authors, publishers, and guide sales activity (notice that this database is very similar to the guideS5.MDB database) for a fictitious publisher. Our goal in this exercise is to extract the phone numbers of all the authors who sold more than 500 guides in the first quarter.
As you examine the table structure of this database in Visdata, you notice the guideSales table contains the sales records by quarter, but the phone number is contained in the Authors table. We therefore need to query the guideSales table to find all the authors who sold more than 500 guides in the first quarter, and then use that result set to find the writers' phone numbers in the Authors table. We need to build a nested SELECT query.
To do this, enter the following code into the SQL Statement window of Visdata. Execute your statement. The result set should look similar to Figure 8.35.
SELECT * FROM Authors WHERE AUID IN (SELECT AUID FROM guidesales WHERE Sales>500 AND Qtr=1)
Figure 8.35. The results
of an SQL subquery.
An SQL subquery has three main components--the comparison, the expression, and the SQL statement. The comparison in our example is the SELECT FROM Authors query. The expression is the IN keyword. The SQL statement is the SELECT statement within the parentheses.
NOTE: The SQL statement on which you base the comparison statement must be a SELECT statement. This statement must also be enclosed in parentheses and is referred to as a subquery.
In the exercise, the SELECT statement contained within the parentheses (the sqlstatement, or subquery) is executed first to determine which authors sold more than 500 guides in the first quarter. The SELECT statement outside of the parentheses (the main query) is then executed upon the result set created by the subquery. The IN keyword instructs the SELECT FROM Authors statement to take only those records that were extracted by the subquery.
Other keywords that can be used in the expression include ANY and ALL. Also, numeric expression such as > and < can be used in conjunction with the keyword to make comparisons.
For example, if you use the syntax > ANY in comparing the main query with the subquery (WHERE AUID > ANY), your result set displays all records from the main query that have a value greater than any value extracted from the subquery. Using > ALL (WHERE AUID > All) extracts only those records that are greater in value than every record extracted by the subquery.
Without the ability to use nested SQL statements, the preceding exercise would have required you to perform a JOIN on the two tables, or build a table to store the subqueries result set, and then execute the main query on the table. The use of nested SQL SELECT statements can be a great time saver.
Today you learned how to create basic SQL statements that select data from existing tables. You learned that the most fundamental form of the SQL statement is the SELECT_FROM clause. This clause is used to select one or more columns from a table and display the results of that statement in a result set, or view.
You also learned about the optional clauses that you can add to the SELECT_FROM clause:
You also learned about additional SQL keywords that you can use to control the contents of the result set:
Finally, you learned about the SQL subquery, and how to nest SELECT statements to extract data from a table based upon the results of another SELECT statement.
As a corporate MIS staff member, you are given the task of assisting the Accounting Department in extracting data from its accounts payable and accounts receivable systems. As part of your analysis, you determine that these systems possess the following data tables and fields:
CustomerMaster
CustomerIDName
Address
City
State
Zip
Phone
CustomerTypeCustomerType
CustomerType
DescriptionOpenInvoice
InvoiceNoCustomerID
Date
Description
AmountSuppliers
SupplierIDName
Address
City
State
Zip
Phone
Use this information to answer the questions that follow: