Today's topic is clauses--not the kind that distribute presents during the holidays, but the ones you use with a SELECT statement. By the end of the Chapter you will understand and be able to use the following clauses:
To get a feel for where these functions fit in, examine the general syntax for a SELECT statement:
SELECT [DISTINCT | ALL] { * | { [schema.]{table | view | snapshot}.* | expr } [ [AS] c_alias ] [, { [schema.]{table | view | snapshot}.* | expr } [ [AS] c_alias ] ] ... } FROM [schema.]{table | view | snapshot}[@dblink] [t_alias] [, [schema.]{table | view | snapshot}[@dblink] [t_alias] ] ... [WHERE condition ] [GROUP BY expr [, expr] ... [HAVING condition] ] [{UNION | UNION ALL | INTERSECT | MINUS} SELECT command ] [ORDER BY {expr|position} [ASC | DESC] [, {expr|position} [ASC | DESC]] ...]
NOTE: In my experience with SQL, the ANSI standard is really more of an ANSI "suggestion." The preceding syntax will generally work with any SQL engine, but you may find some slight variations.
NOTE: You haven't yet had to deal with a complicated syntax diagram. Because many people find syntax diagrams more puzzling than illuminating when learning something new, this guide has used simple examples to illustrate particular points. However, we are now at the point where a syntax diagram can help tie the familiar concepts to today's new material.
Don't worry about the exact syntax--it varies slightly from implementation to implementation anyway. Instead, focus on the relationships. At the top of this statement is SELECT, which you have used many times in the last few days. SELECT is followed by FROM, which should appear with every SELECT statement you typed. (You learn a new use for FROM tomorrow.) WHERE, GROUP BY, HAVING, and ORDER BY all follow. (The other clauses in the diagram--UNION, UNION ALL, INTERSECT, and MINUS--were covered in Chapter 3, "Expressions, Conditions, and Operators.") Each clause plays an important part in selecting and manipulating data.
NOTE: We have used two implementations of SQL to prepare today's examples. One implementation has an SQL> prompt and line numbers (Personal Oracle7), and the other (Borland's ISQL) does not. You will also notice that the output displays vary slightly, depending on the implementation.
Using just SELECT and FROM, you are limited to returning every row in a table. For example, using these two key words on the CHECKS table, you get all seven rows:
SQL> SELECT * 2 FROM CHECKS;
CHECK# PAYEE AMOUNT REMARKS -------- -------------------- -------- ------------------ 1 Ma Bell 150 Have sons next time 2 Reading R.R. 245.34 Train to Chicago 3 Ma Bell 200.32 Cellular Phone 4 Local Utilities 98 Gas 5 Joes Stale $ Dent 150 Groceries 16 Cash 25 Wild Night Out 17 Joans Gas 25.1 Gas
7 rows selected.
With WHERE in your vocabulary, you can be more selective. To find all the checks you wrote with a value of more than 100 dollars, write this:
SQL> SELECT * 2 FROM CHECKS 3 WHERE AMOUNT > 100;
The WHERE clause returns the four instances in the table that meet the required condition:
CHECK# PAYEE AMOUNT REMARKS -------- -------------------- -------- ------------------ 1 Ma Bell 150 Have sons next time 2 Reading R.R. 245.34 Train to Chicago 3 Ma Bell 200.32 Cellular Phone 5 Joes Stale $ Dent 150 Groceries
WHERE can also solve other popular puzzles. Given the following table of names and locations, you can ask that popular question, Where's Waldo?
SQL> SELECT * 2 FROM PUZZLE;
NAME LOCATION -------------- -------------- TYLER BACKYARD MAJOR KITCHEN SPEEDY LIVING ROOM WALDO GARAGE LADDIE UTILITY CLOSET ARNOLD TV ROOM
6 rows selected.
SQL> SELECT LOCATION AS "WHERE'S WALDO?" 2 FROM PUZZLE 3 WHERE NAME = 'WALDO';
WHERE'S WALDO? -------------- GARAGE
Sorry, we couldn't resist. We promise no more corny queries. (We're saving those for that SQL bathroom humor guide everyone's been wanting.) Nevertheless, this query shows that the column used in the condition of the WHERE statement does not have to be mentioned in the SELECT clause. In this example you selected the location column but used WHERE on the name, which is perfectly legal. Also notice the AS on the SELECT line. AS is an optional assignment operator, assigning the alias WHERE'S WALDO? to LOCATION. You might never see the AS again, because it involves extra typing. In most implementations of SQL you can type
SQL> SELECT LOCATION "WHERE'S WALDO?" 2 FROM PUZZLE 3 WHERE NAME ='WALDO';
and get the same result as the previous query without using AS:
WHERE'S WALDO? -------------- GARAGE
After SELECT and FROM, WHERE is the third most frequently used SQL term.
STARTING WITH is an addition to the WHERE clause that works exactly like LIKE(<exp>%). Compare the results of the following query:
SELECT PAYEE, AMOUNT, REMARKS FROM CHECKS WHERE PAYEE LIKE('Ca%');
PAYEE AMOUNT REMARKS ==================== =============== ============== Cash 25 Wild Night Out Cash 60 Trip to Boston Cash 34 Trip to Dayton
with the results from this query:
SELECT PAYEE, AMOUNT, REMARKS FROM CHECKS WHERE PAYEE STARTING WITH('Ca');
PAYEE AMOUNT REMARKS ==================== =============== ============== Cash 25 Wild Night Out Cash 60 Trip to Boston Cash 34 Trip to Dayton
The results are identical. You can even use them together, as shown here:
SELECT PAYEE, AMOUNT, REMARKS FROM CHECKS WHERE PAYEE STARTING WITH('Ca') OR REMARKS LIKE 'G%';
PAYEE AMOUNT REMARKS ==================== =============== =============== Local Utilities 98 Gas Joes Stale $ Dent 150 Groceries Cash 25 Wild Night Out Joans Gas 25.1 Gas Cash 60 Trip to Boston Cash 34 Trip to Dayton Joans Gas 15.75 Gas
WARNING: STARTING WITH is a common feature of many implementations of SQL. Check your implementation before you grow fond of it.
From time to time you will want to present the results of your query in some kind of order. As you know, however, SELECT FROM gives you a listing, and unless you have defined a primary key (see Chapter 10, "Creating Views and Indexes"), your query comes out in the order the rows were entered. Consider a beefed-up CHECKS table:
SQL> SELECT * FROM CHECKS;
CHECK# PAYEE AMOUNT REMARKS -------- -------------------- -------- ------------------ 1 Ma Bell 150 Have sons next time 2 Reading R.R. 245.34 Train to Chicago 3 Ma Bell 200.32 Cellular Phone 4 Local Utilities 98 Gas 5 Joes Stale $ Dent 150 Groceries 16 Cash 25 Wild Night Out 17 Joans Gas 25.1 Gas 9 Abes Cleaners 24.35 X-Tra Starch 20 Abes Cleaners 10.5 All Dry Clean 8 Cash 60 Trip to Boston 21 Cash 34 Trip to Dayton 11 rows selected.
You're going to have to trust me on this one, but the order of the output is exactly the same order as the order in which the data was entered. After you read Chapter 8, "Manipulating Data," and know how to use INSERT to create tables, you can test how data is ordered by default on your own.
The ORDER BY clause gives you a way of ordering your results. For example, to order the preceding listing by check number, you would use the following ORDER BY clause:
SQL> SELECT * 2 FROM CHECKS 3 ORDER BY CHECK#;
CHECK# PAYEE AMOUNT REMARKS -------- -------------------- -------- ------------------ 1 Ma Bell 150 Have sons next time 2 Reading R.R. 245.34 Train to Chicago 3 Ma Bell 200.32 Cellular Phone 4 Local Utilities 98 Gas 5 Joes Stale $ Dent 150 Groceries 8 Cash 60 Trip to Boston 9 Abes Cleaners 24.35 X-Tra Starch 16 Cash 25 Wild Night Out 17 Joans Gas 25.1 Gas 20 Abes Cleaners 10.5 All Dry Clean 21 Cash 34 Trip to Dayton
11 rows selected.
Now the data is ordered the way you want it, not the way in which it was entered. As the following example shows, ORDER requires BY; BY is not optional.
SQL> SELECT * FROM CHECKS ORDER CHECK#; SELECT * FROM CHECKS ORDER CHECK# * ERROR at line 1: ORA-00924: missing BY keyword
What if you want to list the data in reverse order, with the highest number or letter first? You're in luck! The following query generates a list of PAYEEs that stars at the end of the alphabet:
SQL> SELECT * 2 FROM CHECKS 3 ORDER BY PAYEE DESC; CHECK# PAYEE AMOUNT REMARKS -------- -------------------- -------- ------------------ 2 Reading R.R. 245.34 Train to Chicago 1 Ma Bell 150 Have sons next time 3 Ma Bell 200.32 Cellular Phone 4 Local Utilities 98 Gas 5 Joes Stale $ Dent 150 Groceries 17 Joans Gas 25.1 Gas 16 Cash 25 Wild Night Out 8 Cash 60 Trip to Boston 21 Cash 34 Trip to Dayton 9 Abes Cleaners 24.35 X-Tra Starch 20 Abes Cleaners 10.5 All Dry Clean 11 rows selected.
The DESC at the end of the ORDER BY clause orders the list in descending order instead of the default (ascending) order. The rarely used, optional keyword ASC appears in the following statement:
SQL> SELECT PAYEE, AMOUNT 2 FROM CHECKS 3 ORDER BY CHECK# ASC;
PAYEE AMOUNT -------------------- --------- Ma Bell 150 Reading R.R. 245.34 Ma Bell 200.32 Local Utilities 98 Joes Stale $ Dent 150 Cash 60 Abes Cleaners 24.35 Cash 25 Joans Gas 25.1 Abes Cleaners 10.5 Cash 34 11 rows selected.
The ordering in this list is identical to the ordering of the list at the beginning of the section (without ASC) because ASC is the default. This query also shows that the expression used after the ORDER BY clause does not have to be in the SELECT statement. Although you selected only PAYEE and AMOUNT, you were still able to order the list by CHECK#.
You can also use ORDER BY on more than one field. To order CHECKS by PAYEE and REMARKS, you would query as follows:
SQL> SELECT * 2 FROM CHECKS 3 ORDER BY PAYEE, REMARKS;
CHECK# PAYEE AMOUNT REMARKS -------- -------------------- -------- ------------------ 20 Abes Cleaners 10.5 All Dry Clean 9 Abes Cleaners 24.35 X-Tra Starch 8 Cash 60 Trip to Boston 21 Cash 34 Trip to Dayton 16 Cash 25 Wild Night Out 17 Joans Gas 25.1 Gas 5 Joes Stale $ Dent 150 Groceries 4 Local Utilities 98 Gas 3 Ma Bell 200.32 Cellular Phone 1 Ma Bell 150 Have sons next time 2 Reading R.R. 245.34 Train to Chicago
Notice the entries for Cash in the PAYEE column. In the previous ORDER BY, the CHECK#s were in the order 16, 21, 8. Adding the field REMARKS to the ORDER BY clause puts the entries in alphabetical order according to REMARKS. Does the order of multiple columns in the ORDER BY clause make a difference? Try the same query again but reverse PAYEE and REMARKS:
SQL> SELECT * 2 FROM CHECKS 3 ORDER BY REMARKS, PAYEE;
CHECK# PAYEE AMOUNT REMARKS -------- -------------------- -------- -------------------- 20 Abes Cleaners 10.5 All Dry Clean 3 Ma Bell 200.32 Cellular Phone 17 Joans Gas 25.1 Gas 4 Local Utilities 98 Gas 5 Joes Stale $ Dent 150 Groceries 1 Ma Bell 150 Have sons next time 2 Reading R.R. 245.34 Train to Chicago 8 Cash 60 Trip to Boston 21 Cash 34 Trip to Dayton 16 Cash 25 Wild Night Out 9 Abes Cleaners 24.35 X-Tra Starch 11 rows selected.
As you probably guessed, the results are completely different. Here's how to list one column in alphabetical order and list the second column in reverse alphabetical order:
SQL> SELECT * 2 FROM CHECKS 3 ORDER BY PAYEE ASC, REMARKS DESC; CHECK# PAYEE AMOUNT REMARKS -------- -------------------- -------- ------------------ 9 Abes Cleaners 24.35 X-Tra Starch 20 Abes Cleaners 10.5 All Dry Clean 16 Cash 25 Wild Night Out 21 Cash 34 Trip to Dayton 8 Cash 60 Trip to Boston 17 Joans Gas 25.1 Gas 5 Joes Stale $ Dent 150 Groceries 4 Local Utilities 98 Gas 1 Ma Bell 150 Have sons next time 3 Ma Bell 200.32 Cellular Phone 2 Reading R.R. 245.34 Train to Chicago 11 rows selected.
In this example PAYEE is sorted alphabetically, and REMARKS appears in descending order. Note how the remarks in the three checks with a PAYEE of Cash are sorted.
TIP: If you know that a column you want to order your results by is the first column in a table, then you can type ORDER BY 1 in place of spelling out the column name. See the following example.
SQL> SELECT * 2 FROM CHECKS 3 ORDER BY 1; CHECK# PAYEE AMOUNT REMARKS -------- -------------------- -------- ------------------ 1 Ma Bell 150 Have sons next time 2 Reading R.R. 245.34 Train to Chicago 3 Ma Bell 200.32 Cellular Phone 4 Local Utilities 98 Gas 5 Joes Stale $ Dent 150 Groceries 8 Cash 60 Trip to Boston 9 Abes Cleaners 24.35 X-Tra Starch 16 Cash 25 Wild Night Out 17 Joans Gas 25.1 Gas 20 Abes Cleaners 10.5 All Dry Clean 21 Cash 34 Trip to Dayton 11 rows selected.
This result is identical to the result produced by the SELECT statement that you used earlier today:
SELECT * FROM CHECKS ORDER BY CHECK#;
On Chapter 3 you learned how to use aggregate functions (COUNT, SUM, AVG, MIN, and MAX). If you wanted to find the total amount of money spent from the slightly changed CHECKS table, you would type:
SELECT * FROM CHECKS;
Here's the modified table:
CHECKNUM PAYEE AMOUNT REMARKS ======== =========== =============== ====================== 1 Ma Bell 150 Have sons next time 2 Reading R.R. 245.34 Train to Chicago 3 Ma Bell 200.33 Cellular Phone 4 Local Utilities 98 Gas 5 Joes Stale $ Dent 150 Groceries 16 Cash 25 Wild Night Out 17 Joans Gas 25.1 Gas 9 Abes Cleaners 24.35 X-Tra Starch 20 Abes Cleaners 10.5 All Dry Clean 8 Cash 60 Trip to Boston 21 Cash 34 Trip to Dayton 30 Local Utilities 87.5 Water 31 Local Utilities 34 Sewer 25 Joans Gas 15.75 Gas
Then you would type:
SELECT SUM(AMOUNT) FROM CHECKS; SUM =============== 1159.87
This statement returns the sum of the column AMOUNT. What if you wanted to find out how much you have spent on each PAYEE? SQL helps you with the GROUP BY clause. To find out whom you have paid and how much, you would query like this:
SELECT PAYEE, SUM(AMOUNT) FROM CHECKS GROUP BY PAYEE; PAYEE SUM ==================== =============== Abes Cleaners 34.849998 Cash 119 Joans Gas 40.849998 Joes Stale $ Dent 150 Local Utilities 219.5 Ma Bell 350.33002 Reading R.R. 245.34
The SELECT clause has a normal column selection, PAYEE, followed by the aggregate function SUM(AMOUNT). If you had tried this query with only the FROM CHECKS that follows, here's what you would see:
SELECT PAYEE, SUM(AMOUNT) FROM CHECKS; Dynamic SQL Error -SQL error code = -104 -invalid column reference
SQL is complaining about the combination of the normal column and the aggregate function. This condition requires the GROUP BY clause. GROUP BY runs the aggregate function described in the SELECT statement for each grouping of the column that follows the GROUP BY clause. The table CHECKS returned 14 rows when queried with SELECT * FROM CHECKS. The query on the same table, SELECT PAYEE, SUM(AMOUNT) FROM CHECKS GROUP BY PAYEE, took the 14 rows in the table and made seven groupings, returning the SUM of each grouping.
Suppose you wanted to know how much you gave to whom with how many checks. Can you use more than one aggregate function?
SELECT PAYEE, SUM(AMOUNT), COUNT(PAYEE) FROM CHECKS GROUP BY PAYEE; PAYEE SUM COUNT ==================== =============== =========== Abes Cleaners 34.849998 2 Cash 119 3 Joans Gas 40.849998 2 Joes Stale $ Dent 150 1 Local Utilities 219.5 3 Ma Bell 350.33002 2 Reading R.R. 245.34 1
This SQL is becoming increasingly useful! In the preceding example, you were able to perform group functions on unique groups using the GROUP BY clause. Also notice that the results were ordered by payee. GROUP BY also acts like the ORDER BY clause. What would happen if you tried to group by more than one column? Try this:
SELECT PAYEE, SUM(AMOUNT), COUNT(PAYEE) FROM CHECKS GROUP BY PAYEE, REMARKS; PAYEE SUM COUNT ==================== =============== =========== Abes Cleaners 10.5 1 Abes Cleaners 24.35 1 Cash 60 1 Cash 34 1 Cash 25 1 Joans Gas 40.849998 2 Joes Stale $ Dent 150 1 Local Utilities 98 1 Local Utilities 34 1 Local Utilities 87.5 1 Ma Bell 200.33 1 Ma Bell 150 1 Reading R.R. 245.34 1
The output has gone from 7 groupings of 14 rows to 13 groupings. What is different about the one grouping with more than one check associated with it? Look at the entries for Joans Gas:
SELECT PAYEE, REMARKS FROM CHECKS WHERE PAYEE = 'Joans Gas'; PAYEE REMARKS ==================== ==================== Joans Gas Gas Joans Gas Gas
You see that the combination of PAYEE and REMARKS creates identical entities, which SQL groups together into one line with the GROUP BY clause. The other rows produce unique combinations of PAYEE and REMARKS and are assigned their own unique groupings.
The next example finds the largest and smallest amounts, grouped by REMARKS:
SELECT MIN(AMOUNT), MAX(AMOUNT) FROM CHECKS GROUP BY REMARKS; MIN MAX =============== =============== 245.34 245.34 10.5 10.5 200.33 200.33 15.75 98 150 150 150 150 34 34 60 60 34 34 87.5 87.5 25 25 24.35 24.35
Here's what will happen if you try to include in the select statement a column that has several different values within the group formed by GROUP BY:
SELECT PAYEE, MAX(AMOUNT), MIN(AMOUNT) FROM CHECKS GROUP BY REMARKS; Dynamic SQL Error -SQL error code = -104 -invalid column reference
This query tries to group CHECKS by REMARK. When the query finds two records with the same REMARK but different PAYEEs, such as the rows that have GAS as a REMARK but have PAYEEs of LOCAL UTILITIES and JOANS GAS, it throws an error.
The rule is, Don't use the SELECT statement on columns that have multiple values for the GROUP BY clause column. The reverse is not true. You can use GROUP BY on columns not mentioned in the SELECT statement. For example:
SELECT PAYEE, COUNT(AMOUNT) FROM CHECKS GROUP BY PAYEE, AMOUNT; PAYEE COUNT ==================== =========== Abes Cleaners 1 Abes Cleaners 1 Cash 1 Cash 1 Cash 1 Joans Gas 1 Joans Gas 1 Joes Stale $ Dent 1 Local Utilities 1 Local Utilities 1 Local Utilities 1 Ma Bell 1 Ma Bell 1 Reading R.R. 1
This silly query shows how many checks you had written for identical amounts to the same PAYEE. Its real purpose is to show that you can use AMOUNT in the GROUP BY clause, even though it is not mentioned in the SELECT clause. Try moving AMOUNT out of the GROUP BY clause and into the SELECT clause, like this:
SELECT PAYEE, AMOUNT, COUNT(AMOUNT) FROM CHECKS GROUP BY PAYEE; Dynamic SQL Error -SQL error code = -104 -invalid column reference
SQL cannot run the query, which makes sense if you play the part of SQL for a moment. Say you had to group the following lines:
SELECT PAYEE, AMOUNT, REMARKS FROM CHECKS WHERE PAYEE ='Cash'; PAYEE AMOUNT REMARKS ==================== =============== =============== Cash 25 Wild Night Out Cash 60 Trip to Boston Cash 34 Trip to Dayton
If the user asked you to output all three columns and group by PAYEE only, where would you put the unique remarks? Remember you have only one row per group when you use GROUP BY. SQL can't do two things at once, so it complains: Error #31: Can't do two things at once.
How can you qualify the data used in your GROUP BY clause? Use the table ORGCHART and try this:
SELECT * FROM ORGCHART;
NAME TEAM SALARY SICKLEAVE ANNUALLEAVE =============== ======== =========== =========== =========== ADAMS RESEARCH 34000.00 34 12 WILKES MARKETING 31000.00 40 9 STOKES MARKETING 36000.00 20 19 MEZA COLLECTIONS 40000.00 30 27 MERRICK RESEARCH 45000.00 20 17 RICHARDSON MARKETING 42000.00 25 18 FURY COLLECTIONS 35000.00 22 14 PRECOURT PR 37500.00 24 24
If you wanted to group the output into divisions and show the average salary in each division, you would type:
SELECT TEAM, AVG(SALARY) FROM ORGCHART GROUP BY TEAM; TEAM AVG =============== =========== COLLECTIONS 37500.00 MARKETING 36333.33 PR 37500.00 RESEARCH 39500.00
The following statement qualifies this query to return only those departments with average salaries under 38000:
SELECT TEAM, AVG(SALARY) FROM ORGCHART WHERE AVG(SALARY) < 38000 GROUP BY TEAM; Dynamic SQL Error -SQL error code = -104 -Invalid aggregate reference
This error occurred because WHERE does not work with aggregate functions. To make this query work, you need something new: the HAVING clause. If you type the following query, you get what you ask for:
SELECT TEAM, AVG(SALARY) FROM ORGCHART GROUP BY TEAM HAVING AVG(SALARY) < 38000;
TEAM AVG =============== =========== COLLECTIONS 37500.00 MARKETING 36333.33 PR 37500.00
HAVING enables you to use aggregate functions in a comparison statement, providing for aggregate functions what WHERE provides for individual rows. Does HAVING work with nonaggregate expressions? Try this:
SELECT TEAM, AVG(SALARY) FROM ORGCHART GROUP BY TEAM HAVING SALARY < 38000; TEAM AVG =============== =========== PR 37500.00
Why is this result different from the last query? The HAVING AVG(SALARY) < 38000 clause evaluated each grouping and returned only those with an average salary of under 38000, just what you expected. HAVING SALARY < 38000, on the other hand, had a different outcome. Take on the role of the SQL engine again. If the user asks you to evaluate and return groups of divisions where SALARY < 38000, you would examine each group and reject those where an individual SALARY is greater than 38000. In each division except PR, you would find at least one salary greater than 38000:
SELECT NAME, TEAM, SALARY FROM ORGCHART ORDER BY TEAM; NAME TEAM SALARY =============== =============== =========== FURY COLLECTIONS 35000.00 MEZA COLLECTIONS 40000.00 WILKES MARKETING 31000.00 STOKES MARKETING 36000.00 RICHARDSON MARKETING 42000.00 PRECOURT PR 37500.00 ADAMS RESEARCH 34000.00 MERRICK RESEARCH 45000.00
Therefore, you would reject all other groups except PR. What you really asked was Select all groups where no individual makes more than 38000. Don't you just hate it when the computer does exactly what you tell it to?
WARNING: Some implementations of SQL return an error if you use anything other than an aggregate function in a HAVING clause. Don't bet the farm on using the previous example until you check the implementation of the particular SQL you use.
Can you use more than one condition in your HAVING clause? Try this:
SELECT TEAM, AVG(SICKLEAVE),AVG(ANNUALLEAVE) FROM ORGCHART GROUP BY TEAM HAVING AVG(SICKLEAVE)>25 AND AVG(ANNUALLEAVE)<20;
The following table is grouped by TEAM. It shows all the teams with SICKLEAVE averages above 25 days and ANNUALLEAVE averages below 20 days.
TEAM AVG AVG =============== =========== =========== MARKETING 28 15 RESEARCH 27 15
You can also use an aggregate function in the HAVING clause that was not in the SELECT statement. For example:
SELECT TEAM, AVG(SICKLEAVE),AVG(ANNUALLEAVE) FROM ORGCHART GROUP BY TEAM HAVING COUNT(TEAM) > 1; TEAM AVG AVG =============== =========== =========== COLLECTIONS 26 21 MARKETING 28 15 RESEARCH 27 15
This query returns the number of TEAMs with more than one member. COUNT(TEAM) is not used in the SELECT statement but still functions as expected in the HAVING clause.
The other logical operators all work well within the HAVING clause. Consider this:
SELECT TEAM,MIN(SALARY),MAX(SALARY) FROM ORGCHART GROUP BY TEAM HAVING AVG(SALARY) > 37000 OR MIN(SALARY) > 32000; TEAM MIN MAX =============== =========== =========== COLLECTIONS 35000.00 40000.00 PR 37500.00 37500.00 RESEARCH 34000.00 45000.00
The operator IN also works in a HAVING clause, as demonstrated here:
SELECT TEAM,AVG(SALARY) FROM ORGCHART GROUP BY TEAM HAVING TEAM IN ('PR','RESEARCH'); TEAM AVG =============== =========== PR 37500.00 RESEARCH 39500.00
Nothing exists in a vacuum, so this section takes you through some composite examples that demonstrate how combinations of clauses perform together.
Find all the checks written for Cash and Gas in the CHECKS table and order them by REMARKS.
SELECT PAYEE, REMARKS FROM CHECKS WHERE PAYEE = 'Cash' OR REMARKS LIKE'Ga%' ORDER BY REMARKS;
PAYEE REMARKS ==================== ==================== Joans Gas Gas Joans Gas Gas Local Utilities Gas Cash Trip to Boston Cash Trip to Dayton Cash Wild Night Out
Note the use of LIKE to find the REMARKS that started with Ga. With the use of OR, data was returned if the WHERE clause met either one of the two conditions.
What if you asked for the same information and group it by PAYEE? The query would look something like this:
SELECT PAYEE, REMARKS FROM CHECKS WHERE PAYEE = 'Cash' OR REMARKS LIKE'Ga%' GROUP BY PAYEE ORDER BY REMARKS;
This query would not work because the SQL engine would not know what to do with the remarks. Remember that whatever columns you put in the SELECT clause must also be in the GROUP BY clause--unless you don't specify any columns in the SELECT clause.
Using the table ORGCHART, find the salary of everyone with less than 25 days of sick leave. Order the results by NAME.
SELECT NAME, SALARY FROM ORGCHART WHERE SICKLEAVE < 25 ORDER BY NAME;
NAME SALARY =============== =========== FURY 35000.00 MERRICK 45000.00 PRECOURT 37500.00 STOKES 36000.00
This query is straightforward and enables you to use your new-found skills with WHERE and ORDER BY.
Again, using ORGCHART, display TEAM, AVG(SALARY), AVG(SICKLEAVE), and AVG(ANNUALLEAVE) on each team:
SELECT TEAM, AVG(SALARY), AVG(SICKLEAVE), AVG(ANNUALLEAVE) FROM ORGCHART GROUP BY TEAM;
TEAM AVG AVG AVG =============== =========== =========== =========== COLLECTIONS 37500.00 26 21 MARKETING 36333.33 28 15 PR 37500.00 24 24 RESEARCH 39500.00 26 15
An interesting variation on this query follows. See if you can figure out what happened:
SELECT TEAM, AVG(SALARY), AVG(SICKLEAVE), AVG(ANNUALLEAVE) FROM ORGCHART GROUP BY TEAM ORDER BY NAME; TEAM AVG AVG AVG =============== =========== =========== =========== RESEARCH 39500.00 27 15 COLLECTIONS 37500.00 26 21 PR 37500.00 24 24
MARKETING 36333.33 28 15
A simpler query using ORDER BY might offer a clue:
SELECT NAME, TEAM FROM ORGCHART ORDER BY NAME, TEAM; NAME TEAM =============== =========== ADAMS RESEARCH FURY COLLECTIONS MERRICK RESEARCH MEZA COLLECTIONS PRECOURT PR RICHARDSON MARKETING STOKES MARKETING WILKES MARKETING
When the SQL engine got around to ordering the results of the query, it used the NAME column (remember, it is perfectly legal to use a column not specified in the SELECT statement), ignored duplicate TEAM entries, and came up with the order RESEARCH, COLLECTIONS, PR, and MARKETING. Including TEAM in the ORDER BY clause is unnecessary, because you have unique values in the NAME column. You can get the same result by typing this statement:
SELECT NAME, TEAM FROM ORGCHART ORDER BY NAME; NAME TEAM =============== ============ ADAMS RESEARCH FURY COLLECTIONS MERRICK RESEARCH MEZA COLLECTIONS PRECOURT PR RICHARDSON MARKETING STOKES MARKETING WILKES MARKETING
While you are looking at variations, don't forget you can also reverse the order:
SELECT NAME, TEAM FROM ORGCHART ORDER BY NAME DESC; NAME TEAM =============== ============ WILKES MARKETING STOKES MARKETING RICHARDSON MARKETING PRECOURT PR MEZA COLLECTIONS MERRICK RESEARCH FURY COLLECTIONS ADAMS RESEARCH
Is it possible to use everything you have learned in one query? It is, but the results will be convoluted because in many ways you are working with apples and oranges--or aggregates and nonaggregates. For example, WHERE and ORDER BY are usually found in queries that act on single rows, such as this:
SELECT * FROM ORGCHART ORDER BY NAME DESC;
NAME TEAM SALARY SICKLEAVE ANNUALLEAVE =============== ======== =========== =========== =========== WILKES MARKETING 31000.00 40 9 STOKES MARKETING 36000.00 20 19 RICHARDSON MARKETING 42000.00 25 18 PRECOURT PR 37500.00 24 24 MEZA COLLECTIONS 40000.00 30 27 MERRICK RESEARCH 45000.00 20 17 FURY COLLECTIONS 35000.00 22 14 ADAMS RESEARCH 34000.00 34 12
GROUP BY and HAVING are normally seen in the company of aggregates:
SELECT PAYEE, SUM(AMOUNT) TOTAL, COUNT(PAYEE) NUMBER_WRITTEN FROM CHECKS GROUP BY PAYEE HAVING SUM(AMOUNT) > 50; PAYEE TOTAL NUMBER_WRITTEN ==================== =============== ============== Cash 119 3 Joes Stale $ Dent 150 1 Local Utilities 219.5 3 Ma Bell 350.33002 2 Reading R.R. 245.34 1
You have seen that combining these two groups of clauses can have unexpected results, including the following:
SELECT PAYEE, SUM(AMOUNT) TOTAL, COUNT(PAYEE) NUMBER_WRITTEN FROM CHECKS WHERE AMOUNT >= 100 GROUP BY PAYEE HAVING SUM(AMOUNT) > 50;
PAYEE TOTAL NUMBER_WRITTEN ==================== =============== ============== Joes Stale $ Dent 150 1 Ma Bell 350.33002 2 Reading R.R. 245.34 1
Compare these two result sets and examine the raw data:
SELECT PAYEE, AMOUNT FROM CHECKS ORDER BY PAYEE; PAYEE AMOUNT ==================== =============== Abes Cleaners 10.5 Abes Cleaners 24.35 Cash 25 Cash 34 Cash 60 Joans Gas 15.75 Joans Gas 25.1 Joes Stale $ Dent 150 Local Utilities 34 Local Utilities 87.5 Local Utilities 98 Ma Bell 150 Ma Bell 200.33 Reading R.R. 245.34
You see how the WHERE clause filtered out all the checks less than 100 dollars before the GROUP BY was performed on the query. We are not trying to tell you not to mix these groups--you may have a requirement that this sort of construction will meet. However, you should not casually mix aggregate and nonaggregate functions. The previous examples have been tables with only a handful of rows. (Otherwise, you would need a cart to carry this guide.) In the real world you will be working with thousands and thousands (or billions and billions) of rows, and the subtle changes caused by mixing these clauses might not be so apparent.
ToChapter you learned all the clauses you need to exploit the power of a SELECT statement. Remember to be careful what you ask for because you just might get it. Your basic SQL education is complete. You already know enough to work effectively with single tables. Tomorrow (Chapter 6, "Joining Tables") you will have the opportunity to work with multiple tables.
A We did indeed cover WHERE on Chapter 3. You needed a knowledge of WHERE to understand how certain operators worked. WHERE appears again toChapter because it is a clause, and today's topic is clauses.
The Workshop provides quiz questions to help solidify your understanding of the material covered, as well as exercises to provide you with experience in using what you have learned. Try to answer the quiz and exercise questions before checking the answers in Appendix F, "Answers to Quizzes and Exercises."
2. What is the function of the GROUP BY clause, and what other clause does it act like?
3. Will this SELECT work?
SQL> SELECT NAME, AVG(SALARY), DEPARTMENT FROM PAY_TBL WHERE DEPARTMENT = 'ACCOUNTING' ORDER BY NAME GROUP BY DEPARTMENT, SALARY;
5. Can you use ORDER BY on a column that is not one of the columns in the SELECT statement?
2. Using the CHECKS table, write a SELECT that will return the following:
CHECK#_____PAYEE_______AMOUNT 1 MA BELL 150