Web based School

Oracle's SQL: An Overview

SQL—The Standard

Data Types

Numeric
Date
Character
Binary
Others

The CREATE Statement

Tables
Indexes
Sequences
Other Objects

Writing Queries

Built-In Functions
Know Your Tables and Views
Joining Tables

Avoid Cartesian Joins
Outer Joins
Subqueries

The DECODE Statement
INSERTs, UPDATEs, and DELETEs
Master/Detail or Parent/Child SQL
Additional Tips
Summary.

4

Structured Query Language (SQL) was introduced by IBM as the language to interface with its prototype relational database management system, System R. The first commercially available SQL relational database management system was introduced in 1979 by Oracle Corporation. Today, SQL has become an industry standard, and Oracle Corporation clearly leads the world in relational database management system technology.

Because SQL is a non-procedural language, sets of records can be manipulated instead of one record at a time. The syntax is free-flowing, enabling you to concentrate on the data presentation. Oracle has two optimizers (cost- and rule-based) that will parse the syntax and format it into an efficient statement before the database engine receives it for processing. The database administrator (DBA) determines which optimizer is in effect for each database instance.

SQL—The Standard

The American National Standards Institute (ANSI) has declared SQL as the standard language for relational database management systems. Most companies that produce relational database management systems support SQL and tend to comply with the ANSI SQL89 standard.

Data Types

This chapter does not give a lesson on data modeling and creating a proper database schema. In order to write proper SQL statements, familiarity with database objects (tables, views, constraints) are essential.

One general rule to follow when you are writing SQL statements is that data types cannot be mixed. Conversion utilities are available to convert from one type to another. These conversion functions are covered later in this chapter.

Numeric

The NUMBER data type is used to store zero, negative, positive, fixed, and floating point numbers with up to 38 digits of precision. Numbers range between 1.0x10 ^-130 and 1.0x10 ¹²⁶.

Numbers can be defined in one of three ways:

NUMBER(p,s)

where p is the precision up to 38 digits and s is the scale (number of digits to the right of the decimal point). The scale can range between -84 to 127.

NUMBER (p)

This is a fixed-point number with a scale of zero and a precision of p.

NUMBER

This is a floating-point number with a precision of 38.

The following table shows how Oracle stores different scales and precisions:

Actual Data	Defined as	Stored as
123456.789	NUMBER(6,2)	123456.79
123456.789	NUMBER(6)	123457
123456.789	NUMBER(6,-2)	123400
123456.789	NUMBER	123456.789

Date

Instead of storing date and time information in a character or numeric format, IBM created a separate data type. For each DATE data type, the following information is stored:

Century

Year

Month

Day

Hour

Minute

Second

You can easily retrieve the current date and time by using the function SYSDATE.

Date arithmetic is possible using number constants or other dates. Only addition and subtraction are supported. For example, SYSDATE + 7 will return one week from today.

Every database system has a default date format that is defined by the initialization parameter NLS_DATE_FORMAT. This parameter is usually set to DD-MON-YY, where DD is the day of the month (the first day of the month is 01), MON is the abbreviated month name, and YY is a two-digit year designation.

If you do not specify a time, the default time is 12:00:00 a.m. If only the time component is captured, the default date will be the first day of the current month.

Character

There are four character types available:

The CHAR data type is used where fixed-length fields are necessary. Any length up to 255 characters can be specified. The default length is 1. When data is entered, any space left over will be filled with blanks. All alpha-numeric characters are allowed.
The VARCHAR2 is used for variable-length fields. A length component must be supplied when you use this data type. The maximum length is 2000 characters. All alpha-numeric characters are allowed.
The LONG data type is used to store large amounts of variable-length text. Any length up to 2 GB can be specified. Be aware that there are some restrictions to using this data type, such as:

Only one column per table can be defined as LONG.

A LONG column cannot be indexed.

A LONG column cannot be passed as an argument to a procedure.

A function cannot be used to return a LONG column.

A LONG column cannot be used in where, order by, group by, or connect by clauses.
4. The VARCHAR data type is synonymous with VARCHAR2. Oracle Corporation is reserving this for future use. Do not use this data type.

Binary

Two data types, RAW and LONGRAW, are available for storing binary type data such as digitized sound and images. These data types take on similar characteristics as the VARCHAR2 and LONG data types already mentioned.

Use the RAW data type to store binary data up to 2000 characters and use the LONGRAW data type to store binary data up to 2 GB.

Oracle only stores and retrieves binary data; no string manipulations are allowed. Data is retrieved as hexadecimal character values.

Others

Every row in the database has an address. You can retrieve this address by using the ROWID function. The format of the ROWID is as follows:

BLOCK.ROW.FILE

BLOCK is the data block of the data FILE containing the ROW. The data is in hexadecimal format and has the data type ROWID.

MLSLABEL is a data type used to store the binary format of a label used on a secure operating system.

The CREATE Statement

The CREATE statement opens the world to the user. Whether a simple temporary table is to be created or a complex database schema, you will repeatedly use the CREATE statement. Only a few of the more common CREATE statements are covered here.

Tables

Every database designer will have to create a table sometime. The CREATE TABLE system privilege is needed to execute this command. The DBA is responsible for administering these privileges. The syntax to create a table is

CREATE TABLE schema.TABLE (COLUMN DATATYPE default expression column constraint) table constraint

PCTFREE x PCTUSED x INITRANS x MAXTRANS x TABLESPACE name STORAGE clause CLUSTER cluster clause

ENABLE clause DISABLE clause AS subquery

In this syntax, SCHEMA is an optional parameter to identify which database schema to place this table in. The default is your own.

TABLE is mandatory and is the name of your table.

COLUMN DATATYPE are required to identify each column in the table. Separate the columns with commas. There is a maximum of 254 columns per table.

The DEFAULT expression is optional and is used to assign a default value to a column when a subsequent insert statement fails to assign a value.

COLUMN CONSTRAINT is optional. It is used to define an integrity constraint such as not null.

TABLE CONSTRAINT is optional and is used to define an integrity constraint as part of the table, such as the primary key.

PCTFREE is optional but has a default of 10. This indicates that 10 percent for each data block will be reserved for future updates to the table's rows. Integers from 1 to 99 are allowed.

PCTUSED is optional but has a default of 40. This indicates the minimum percentage of space used that Oracle maintains before a data block becomes a candidate for row insertion. Integers from 1 to 99 are allowed. The sum of PCTFREE and PCTUSED must be less than 100.

INITRANS is optional but has a default of 1. Integers from 1 to 255 are allowed. It is recommended that you leave this alone. This is an allocation of the number of transaction entries assigned within the data block for the table.

MAXTRANS is optional but has a default that is a function of the data block size. This is used to identify the maximum number of concurrent transactions that can update a data block for your table. It is recommended that this parameter not be changed.

TABLESPACE is optional but has a default value as the tablespace name of the owner of the schema. A different tablespace name than the default can be used. Tablespace names are usually application-dependent. The DBA will be able to give proper recommendations.

STORAGE is optional and has default characteristics defined by the DBA.

CLUSTER is optional and specifies that a table is to be part of a cluster. You must identify the columns from the table that need to be clustered. Typically, the cluster columns are columns that comprise the primary key.

ENABLE is optional and turns on an integrity constraint.

DISABLE is optional and turns off an integrity constraint.

AS SUBQUERY is optional and inserts the rows returned by the subquery into the table upon creation.

Once the table is created, you can use the ALTER TABLE command to make alterations to the table. To modify an integrity constraint, DROP the constraint first, and then re-create it.

Let's look at two examples on creating tables:

CREATE TABLE ADDRESSES (ADRS_ID NUMBER(6),

    ACTIVE_DATE DATE,

    BOX_NUMBER NUMBER(6),

    ADDRS_1 VARCHAR2(40),

    ADDRS_2 VARCHAR2(40),

    CITY VARCHAR2(40),

    STATE VARCHAR2(2),

    ZIP VARCHAR2(10));

This is the simplest form of a table create using all of the default capabilities. The second example follows:

CREATE TABLE ADDRESSES (ADRS_ID NUMBER(6) CONSTRAINT PK_ADRS PRIMARY KEY,

ACTIVE_DATE DATE DEFAULT SYSDATE,

    BOX_NUMBER NUMBER(6) DEFAULT NULL,

    ADDRS_1 VARCHAR2(40) NOT NULL,

    ADDRS_2 VARCHAR2(40) DEFAULT NULL,

    CITY VARCHAR2(40) DEFAULT NULL,

    STATE VARCHAR2(2) DEFAULT 'NY',

    ZIP VARCHAR2(10))

PCTFREE 5

PCTUSED 65

TABLESPACE adrs_data

STORAGE (INITIAL 5140

NEXT 5140

MINEXTENTS 1

MAXEXTENTS 10

PCTINCREASE 10);

In this example, data constraints are being utilized and certain storage parameters will be in effect. Using PCTFREE and PCTUSED is a good idea if your data is relatively static.

Indexes

Indexes are used to increase performance of the database. An index is created on one or more columns of a table or cluster. Multiple indexes per table are allowed. The CREATE INDEX system privilege is needed to execute this command. The DBA is responsible for administering these privileges. The syntax to create an index is

CREATE INDEX schema.index ON schema.table (COLUMN ASC/DESC) CLUSTER schema.cluster INITRANS x MAXTRANS x TABLESPACE name STORAGE clause PCTFREE x NOSORT

In this syntax, SCHEMA is an optional parameter to identify which database schema to place this table in. The default is your own.

INDEX is mandatory and is the name of the index.

ON is a mandatory reserved word.

TABLE is a mandatory table name upon which the index will be built.

COLUMN is the column name to be indexed. If there is more than one column, make sure they are in order of priority.

ASC/DESC are optional parameters. Indexes are built in ascending order by default. Use DESC for descending order.

CLUSTER is needed only if this index is for a cluster.

INITRANS is optional but has a default of 1. Integers from 1 to 255 are allowed. It is recommended that this parameter not be changed. This is an allocation of the number of transaction entries assigned within the data block for the index.

MAXTRANS is optional but has a default that is a function of the data block size. It is used to identify the maximum number of concurrent transactions that can update a data block for the index. It is recommended that this parameter not be changed.

TABLESPACE is optional but has a default value as the tablespace name of the owner of the schema. A different tablespace name than the default might be needed. The DBA will be able to give some recommendations.

STORAGE is optional and has default characteristics defined by the DBA.

PCTFREE is optional but has a default of 10. This indicates that 10 percent for each data block will be reserved for future updates to the index. Integers from 1 to 99 are allowed.

NOSORT is an optional parameter that will save time when creating the index if the table data is already stored in ascending order. This cannot be used if a clustered index is being created.

Using the addresses table defined from the create table example, two indexes will be created in the next example.

CREATE INDEX x_adrs_id ON ADDRESSES (ADRS_ID);

This will create an index on the adrs_id column only.

CREATE INDEX x_city_state ON ADDRESSES (CITY,STATE)

TABLESPACE application_indexes;

This index has two columns; CITY is the primary column. In order for queries to use an index, the column names must be part of the select statement. If a select statement included STATE but not CITY, the index would not be used. However, if the select statement contained a reference to CITY but not STATE, part of the index would be used because CITY is the first column of the index.

Sequences

Sequences are a great way to have the database automatically generate unique integer primary keys. The CREATE SEQUENCE system privilege is needed to execute this command. The DBA is responsible for administering these privileges. The syntax to create a sequence is

CREATE SEQUENCE schema.name

   INCREMENT BY x

   START WITH x

   MAXVALUE x    NOMAXVALUE

   MINVALUE x    NOMINVALUE

   CYCLE         NOCYCLE

   CACHE x       NOCACHE

   ORDER         NOORDER

In this syntax, SCHEMA is an optional parameter that identifies which database schema to place this sequence in. The default is your own.

NAME is mandatory because it is the name of the sequence.

INCREMENT BY is optional. The default is one. Zero is not allowed. If a negative integer is specified, the sequence will descend in order. A positive integer will make the sequence ascend (the default).

START WITH is an optional integer that enables the sequence to begin anywhere.

MAXVALUE is an optional integer that places a limit on the sequence.

NOMAXVALUE is optional. It causes the maximum ascending limit to be 10 ²⁷ and -1 for descending sequences. This is the default.

MINVALUE is an optional integer that determines the minimum a sequence can be.

NOMINVALUE is optional. It causes the minimum ascending limit to be 1 and -(10 ²⁶) for descending sequences. This is the default.

CYCLE is an option that enables the sequence to continue even when the maximum has been reached. If the maximum is reached, the next sequence that will be generated is whatever the minimum value is.

NOCYCLE is an option that does not enable the sequence to generate values beyond the defined maximum or minimum. This is the default.

CACHE is an option that enables sequence numbers to be preallocated that will be stored in memory for faster access. The minimum value is 2.

NOCACHE is an option that will not enable the preallocation of sequence numbers.

ORDER is an option that ensures the sequence numbers are generated in order of request.

NOORDER is an option that does not ensure that sequence numbers are generated in the order they are requested.

If you want to create a sequence for your adrs_id column in the ADDRESSES table, it could look like the following example:

CREATE SEQUENCE adrs_seq

   INCREMENT BY 5

   START WITH 100;

To generate a new sequence number, use the pseudocolumn NEXTVAL. This needs to be preceded with your sequence name. For example, adrs_seq.nextval would return 100 for the first access and 105 for the second. If determining the current sequence number is necessary, use CURRVAL. Therefore, adrs_seq.currval will return the current value of the sequence.

Other Objects

The purpose of this chapter is not to elaborate on every SQL statement. The ones given have been covered to give an overview of the more common create statements. Listed next is an alphabetical list of all objects that can be created with the CREATE statement.

CREATE xxx, where xxx is one of the following:

CLUSTER

CONTROLFILE

DATABASE

DATABASE LINK

DATAFILE

FUNCTION

INDEX

PACKAGE BODY

PACKAGE

PROCEDURE

PROFILE

ROLE

ROLLBACK SEGMENT

SCHEMA

SEQUENCE

SNAPSHOT

SNAPSHOT LOG

SYNONYM

TABLE

TABLESPACE

TRIGGER

USER

VIEW

Writing Queries

To retrieve data from the database, use the SELECT statement. Once again, proper privileges are required and are maintained by the DBA. The SELECT statement has the following format:

SELECT column(s)

FROM tables(s)

WHERE conditions are met

GROUP BY selected columns

ORDER BY column(s);

Every SQL statement ends with a semicolon (;). When you are writing scripts (disk files) that will be executed, you can also use a slash (\) to terminate the SQL statement.

When SELECT column(s) is used, it is assumed that all of the columns fitting the WHERE clause will be retrieved. It is sometimes necessary to only retrieve columns that are distinct from one another. To do this, use the reserved word DISTINCT before the column descriptions. In the following example, a SELECT statement is used to retrieve all of the cities and states from the addresses table (defined previously).

SELECT city, state

FROM addresses;

When this code run, every city and state will be retrieved from the table. If 30 people lived in Rochester, NY, the data would be displayed 30 times. To see only one occurrence for each city and state use the DISTINCT qualifier, as shown in the following example:

SELECT DISTINCT city, state

FROM addresses;

This will cause only one row to be retrieved for entries with Rochester, NY.

The FROM clause is a listing of all tables needed for the query. You can use table aliases to help simplify queries, as shown in the following example:

SELECT adrs.city, adrs.state

FROM addresses adrs;

In this example, the alias adrs has been given to the table addresses. The alias will be used to differentiate columns with the same name from different tables.

The WHERE clause is used to list the criteria necessary to restrict the output from the query or to join tables in the FROM clause. See the following example.

SELECT DISTINCT city, state

FROM addresses

WHERE state in ('CA','NY','CT')

     AND city is NOT NULL;

This example will retrieve cities and states that are in the states of California, New York, and Connecticut. The check for NOT NULL cities will not bring data back if the city field was not filled in.

The GROUP BY clause tells Oracle how to group the records together when certain functions are used.

SELECT dept_no, SUM(emp_salary)

FROM emp

GROUP BY dept_no;

The GROUP BY example will list all department numbers once with the summation of the employee salaries for that particular department.

Built-In Functions

Functions are an intrinsic part of any SQL statement. Table 4.1 shows an alphabetical list of SQL functions.

Table 4.1. SQL functions.

*Name*	*Type*	*Syntax*	*Returns*
ABS	Number	ABS(n)	Absolute value of n.
ADD_MONTHS	Date	ADD_MONTHS(a,b)	Date a plus b months.
ASCII	Character	ASCII(c)	Decimal representation of c.
AVG	Group	AVG(DISTINCT\|ALL n)	Average value of n. ALL is default.
CEIL	Number	CEIL(n)	Smallest integer equal to or greater than n.
CHARTOROWID	Conversion	CHARTOROWID(c)	Converts character to rowid data type.
CHR	Character	CHR(n)	Character having binary equivalent to n.
CONCAT	Character	CONCAT(1,2)	Character 1 concatenated with character 2.
CONVERT	Conversion	CONVERT(a,	Converts character
		dest_c [,source_c])	string a from one character set to another. The source source_c to the destination character set dest_c.
COS	Number	COS(n)	Cosine of n.
COSH	Number	COSH(n)	Hyperbolic cosine of n.
COUNT	Group	COUNT(DISTINCT\|ALL e)	Number of rows in a query. ALL is default. e can be represented as * to indicate all columns.
EXP	Number	EXP(n)	e raised to the nth power.
FLOOR	Number	FLOOR(n)	Largest integer equal to or less than n.
GREATEST	Other	GREATEST(e [,e]...)	The greatest of the list of expressions e.
HEXTORAW	Conversion	HEXTORAW	Converts hexadecimal character c to raw.
INITCAP	Character	INITCAP	c with the first letter of each word in uppercase.
INSTR	Character	INSTR	Searches 1 with
		(1, 2 [, n [, m]])	nth character for mth occurrence of 2 and returns the position of the occurrence.
INSTRB	Character	INSTRB(1,2[,n[,m]])	Same as INSTR except numeric parameters are in terms of bytes.
LAST_DAY	Dae	LAST_DAY(a)	Last day of the month (date) containing a.
LEAST	Other	LEAST(e [,e]...)	The least of the list of expressions e.
LENGTH	Character	LENGTH(c)	Number of characters in c. If c is a fixed-length data type (char), all trailing blanks are included.
LENGTHB	Character	LENGTHB	Same as LENGTH except in bytes.
LN	Number	LN(n)	Natural logarithm if n, where n > 0.
LOG	Number	LOG(b,n)	Logarithm, base b, of n.
LOWER	Character	LOWER(c)	c with all letters in lowercase.
LPAD	Character	LPAD(1,n [,2])	Character 1 left padded to length of n. If character 2 is not omitted, use as a pattern instead of blanks.
LTRIM	Character	LTRIM(c [,set])	Removed characters from the left of c. If set s defined, remove initial characters up to the first character not in set.
MAX	Other	MAX(DISTINCT\|ALL e)	Maximum of expression e. ALL is default.
MIN	Other	MIN(DISTINCT\|ALL e)	Minimum of expression e. ALL is default.
MOD	Number	MOD(r,n)	Remainder of r divided by n.
MONTHS_BETWEEN	Date	MONTHS_BETWEEN(a,b)	Number of days between dates a and b.
NEW_TIME	Date	NEW_TIME(a, z1, z2)	Date and time in time zone z2 when date and time in time zone z1 are a).
NEXT_DAY	Date	NEXT_DAY(a, c)	Date of first weekday identified by c that is later than date a.
NLSSORT	Character	NLSSORT((c [,parm])	String of bytes to sort c.
NLS_INITCAP	Character	NLS_INITCAP	c with the first
		(c [,parm])	letter of each word in uppercase. parm has the form of NLS_SORT = s where s is a linguistic sort or binary.
NLS_LOWER	Character	NLS_LOWER(c [,parm])	c with all letters lowercase. See parm above.
NLS_UPPER	Character	NLS_UPPER(c [,parm])	c with all letters uppercase. See parm above.
NVL	Other	NVL(e1, e2)	If e1 is null, returns e2. If e1 is not null, returns e1.
POWER	Number	POWER(m,n)	m raised to the nth power.
RAWTOHEX	Conversion	RAWTOHEX(raw)	Converts raw value to its hexadecimal equivalent.
REPLACE	Character	REPLACE(c, s1 [, r2])	Replace each occurrence of string s1 in c with r2. If r2 is omitted then all occurrences of s1 are removed.
ROUND	Date	ROUND(n [,f])	Date rounded to format model f. If f is omitted, n will be rounded to nearest day.
ROUND	Number	ROUND(n[,m])	n rounded to m places right of decimal point. If m is omitted, to 0 places.
ROWIDTOCHAR	Conversion	ROWIDTOCHAR(rowid)	Converts rowid to varchar2 format with length of 18.
RPAD	Character	RPAD(1, n [, 2])	1 right-padded to length of n with 2.
RTRIM	Character	RTRIM(c [, s])	c with characters removed after last character not in set s. If s is omitted, set defaulted to ''.
SIGN	Number	SIGN(n)	-1 if n < 0, 0 if n = 0, 1 if n > 0.
SIN	Number	SIN(n)	Sine of n.
SINH	Number	SINH(n)	Hyperbolic sine of n.
SOUNDEX	Character	SOUNDEX(c)	A string with phonetic representation of c.
SUBSTR	Character	SUBSTR(c, m [,n])	A portion of c beginning at character number m for n characters. If m is negative, Oracle counts backward from the end of c. If n is omitted, all characters are returned to the end of c.
SUBSTRB	Character	SUBSTRB(c, m [,n])	The same as SUBSTR except m and n are number of bytes.
SQRT	Number	SQRT(n)	Square root of n.
STDDEV	Group	STDDEV(DISTINCT\|ALL n)	Standard deviation of number n.
SUM	Group	SUM(DISTINCT\|ALL n)	Sum of numbers n.
SYSDATE	Date	SYSDATE	Current date and time.
TAN	Number	TAN(n)	Tangent of n.
TANH	Number	TANH(n)	Hyperbolic tangent of n.
TO_CHAR	Conversion	TO_CHAR	Converts d date to
		(d [,f [,parm])	varchar2 data type with format f and nls_date_language of parm.
TO_CHAR	Conversion	TO_CHAR	Converts n number
		(n [,f [,parm])	data type to a varchar2 equivalent and number format element parm.
TO_DATE	Conversion	TO_DATE	Converts varchar2
		(c [, f [, parm])	data type c to date data type with format f and nls date format element parm.
TO_MULTI_BYTE	Conversion	TO_MULTI_BYTE	Converts c to their corresponding multibyte equivalent.
TO_NUMBER	Conversion	TO_NUMBER	Converts character
		(c [,f [, parm]])	c to a number using format f and nls number format element parm.
TO_SINGLE_BYTE	Conversion	TO_SINGLE_BYTE(c)	Converts multibyte character c to its single byte equivalent.
TRANSLATE	Character	TRANSLATE(c, f, t)	c with each occurrences in f with each corresponding character in t.
TRUNC	Date	TRUNC(c [,f])	c with time portion truncated to format f.
TRUNC	Number	TRUNC(n[,m])	n truncated to m decimal places. If m is omitted, to 0 places.
UID	Other	UID	An integer that uniquely identifies the user.
USER	Other	USER	Current user as a varchar2.
UPPER	Character	UPPER(c)	c with all letters in uppercase.
VARIANCE	Group	VARIANCE	Variance of number
		(DISTINCT\|ALL n)	n.
VSIZE	Other	VSIZE(e)	Number of bytes from the internal representation of e.

Now look at some examples using functions.

SELECT SUBSTR(addrs_1,1,30),

                city, state, zip

FROM addresses

WHERE addrs_1 is not null

      AND UPPER(city)  =  'ROCHESTER'

      AND TO_NUMBER(SUBSTR(zip,1,5)) > 14525

      AND NVL(active_date,SYSDATE)  >  TO_DATE('01-JAN-90');

Notice the use of the UPPER function. When Oracle performs character string comparisons, the case (upper- and lower-) of the strings in question have to match exactly. Therefore, 'Rochester' does not equal 'ROCHESTER'. The UPPER function will ensure that the column city will be converted to uppercase prior to the comparison of the literal 'ROCHESTER'.

The SUBSTR function is also used to retrieve the characters 1 through 30 of column addrs_1. All remaining characters beyond 30 will not be seen. This function is also used in the WHERE clause to retrieve the first five characters of the zip column before converting it to a numerical value. The comparison is made after the conversion has taken place.

If the column active_date contains any nulls, they will be included in the data set because of the NVL function. If active_date is null, the current date will be returned before the comparison is made to the constant '01-JAN-90'. The constant '01-JAN-90' is converted to a date data type to ensure format compatibility. For a complete list of all date formats, see "Elements of SQL" in Oracle's SQL Language Reference Manual.

SELECT dept_no,

               SUM(emp_salary),

               AVG(emp_salary)

FROM emp

WHERE dept_no = dept_no

GROUP BY dept_no;

This example shows the use of the SUM and AVG functions. The retrieved data will show the summation of employee salaries and the average salary by department. Notice that the GROUP BY clause has to be used in this query.

Know Your Tables and Views

To ensure that your data contains all of the required columns and restrictions, you must be familiar with the database schema. If a schema diagram is not available, there are numerous ways to find out what tables or views might be needed for writing queries. One way is to look at some of the data dictionary tables.

To view all of the data dictionary table names, issue the following SELECT statement:

SELECT table_name

FROM dictionary

ORDER BY table_name;

Some of the tables of interest should be all_tables, all_columns, all_views and all_constraints.

To view the column names of these tables, issue 'DESC table_name'. DESC stands for DESCribe and 'table_name' is the name of the table in question, such as 'all_tables'. Therefore, 'DESC all_tables' will return all of the columns and their data types for the table 'all_tables'.

With the help of the data dictionary tables, it is possible to determine what tables, views, and constraints are in effect for the application in question.

Joining Tables

Tables are physically joined in the FROM clause of your query. They are logically joined in the WHERE clause. Table columns that appear in the WHERE clause must have the table name listed in the FROM clause. The WHERE clause is where the tables relate one to another.

The way in which the WHERE clause is constructed greatly affects the performance of the query. A two-table join does not necessarily perform better than a 10-table join.

If there are a lot of queries that have a large number of tables joined together (more than seven tables, for example), you might need to consider denormalizing certain data elements to reduce the number of table joins. This type of denormalization might be required when user productivity or system performance has significantly decreased.

Table 4.2 shows three tables that you will be working with for the examples.

Table. 4.2. Table descriptions.

*Table Name*	*Column Name*	*Data Type*
emp	emp_id	number(6)
emp	adrs_id	number(6)
emp	first_name	varchar2(40)
emp	last_name	varchar2(40)
emp	dept_no	number(3)
emp	hire_date	date
emp	job_title	varchar2(40)
emp	salary	number(6)
emp	manager_id	number(6)

dept	dept_no	number(3)
dept	name	varchar(40)
dept	adrs_id	number(6)

addresses	adrs_id	number(6)
addresses	active_date	date
addresses	box_number	number(6)
addresses	adrs_1	varchar2(40)
addresses	adrs_2	varchar2(40)
addresses	city	varchar2(40)
addresses	state	varchar2(2)
addresses	zip	varchar2(10)

In the following example, a query is written that will list all departments with their corresponding employees and the city in which the department resides.

SELECT d.name,

       e.last_name,

       e.first_name,

       a.city

FROM emp        e,

     dept       d,

     addresses  a

WHERE d.dept_no    = e.dept_no

  AND a.adrs_id    = d.adrs_id

ORDER BY d.name,e.last_name,e.first_name;

If the employee city needed to be retrieved as well, the query could be written like the following:

SELECT d.name,

       a.city  dept_city,

       e.last_ name,

       e.first_name,

       z.city  emp_city

FROM emp        e,

     dept       d,

     addresses  a,

     addresses  z

WHERE  d.dept_no      = e.dept_no

  AND  a.adrs_id      = d.adrs_id

  AND  z.adrs_id      = e.adrs_id

ORDER BY d.name,e.last_name,e.first_name;

In this example the addresses table was joined twice, enabling the city column to be retrieved for both the department and employee. In order to clarify the output, aliases were assigned to the different city columns in the SELECT portion of the query.

The following example adds the employee manager's name to the query.

SELECT d.name,

       a.city  dept_city,

       e.last_name,

       e.first_name,

       z.city  emp_city,

       m.first_name || m.last_name manager

FROM emp          e,

     dept       d,

     addresses  a,

     addresses  z,

     emp        m

WHERE  d.dept_no      = e.dept_no

  AND  a.adrs_id      = d.adrs_id

  AND  z.adrs_id      = e.adrs_id

  AND  m.emp_id       = e.manager_id

ORDER BY    d.name,e.last_name,e.first_name;

The output from this query will cause the manager (alias) column to appear as one column even though it is made from two columns. The symbol (||) is used to concatenate columns together.

Avoid Cartesian Joins

A Cartesian join happens when the WHERE clause is not properly constructed. A record is returned for every occurrence in table Z and table X. See the following example:

SELECT X.name,

       Z.last_name,

       Z.first_name

FROM   emp          Z,

       dept         X

ORDER BY X.name, Z.last_name;

If the emp table had 10 employees and the department table had three departments, this query would return 30 rows. For each department name, all employees would be listed because the tables are not joined properly (not at all in this example). If the join condition WHERE X.dept_no = Z.dept_no existed, only 10 rows would be retrieved.

Outer Joins

When the columns of a table are outer joined, this tells the database to retrieve rows even if data is not found. The plus symbol (+) is used to denote an outer join condition, as shown in the following example:

SELECT d.name,

       a.city,

       e.last_name,

       e.first_name

FROM emp            e,

     dept           d,

     addresses      a

WHERE       d.dept_no(+)   = e.dept_no

       AND  a.adrs_id      = d.adrs_id

ORDER BY d.name,e.last_name,e.first_name;

If the president of the company was never assigned a department, his name would never be retrieved in previous examples because his department number would be null. The outer join would cause all rows to be retrieved even if there is not a match for dept_no.

Outer joins are effective but will make the query perform slower. You might need to rewrite the query if you need to improve performance.

Subqueries

Subqueries, or nested queries, are used to bring back a set of rows to be used by the parent query. Depending on how the subquery is written, it can be executed once for the parent query or it can be executed once for each row returned by the parent query. If the subquery is executed for each row of the parent, this is called a correlated subquery.

A correlated subquery can be easily identified if it contains any references to the parent subquery columns in its WHERE clause. Columns from the subquery cannot be referenced anywhere else in the parent query. The following example demonstrates a non-correlated subquery.

SELECT e.first_name,

       e.last_name,

       e.job_title

FROM emp  e

WHERE e.dept_no in (SELECT dept_no

                    FROM dept

                    WHERE name = 'ADMIN');

In this example, all employee names and job titles will be retrieved for the department 'ADMIN'. Notice the use of the operator in when referring to the subquery. The in operator is used when one or more rows might be returned by a subquery. If the equal operator (=) is used, it is assumed that only one row will be returned. If the equal operator (=) is used and more than one row is returned, Oracle will return an error.

This statement could have been written by directly joining the dept table with the emp table in the main or parent query. Subqueries are sometimes used for performance gain. If the parent query contains a lot of tables, it might be advantageous to break up the WHERE clause into subqueries.

SELECT d.name,

       e.first_name,

       e.last_name,

       e.job_title

FROM emp  e,

     dept d

WHERE e.dept_no = d.dept_no

  AND d.adrs_id = (SELECT adrs_id

                   FROM ADDRESSES

                   WHERE adrs_id = d.adrs_id)

ORDER BY d.name, e.job_title, e.last_name;

In this example, all employees with their corresponding departments will be retrieved only for departments that have a valid adrs_id in the addresses table. This is a correlated subquery because the subquery references a column in the parent query.

SELECT d.name,

       e.first_name,

       e.last_name,

       e.job_title

FROM emp  e,

     dept d

WHERE e.dept_no = d.dept_no

  AND not exists (SELECT 'X'

                  FROM ADDRESSES

                  WHERE city in ('ROCHESTER','NEW YORK')

                    AND adrs_id = d.adrs_id)

ORDER BY d.name, e.job_title, e.last_name;

This example will return all departments and employees except where departments are located in 'ROCHESTER' and 'NEW YORK'. SELECT 'X' will return a true or false type answer that will be evaluated by the not exists operator. Any constant could be used here; 'X' is only one example.

The DECODE Statement

One of the most powerful and overlooked SQL statements is the DECODE statement. The DECODE statement has the following syntax:

DECODE(val, exp1, exp2, exp3, exp4, ..., def);

DECODE will first evaluate the value or expression val and then compare expression exp1 to val. If val equals exp1, expression exp2 will be returned. If val does not equal exp1, expression exp3 will be evaluated and returns expression exp4 if val equals exp3. This process continues until all expressions have been evaluated. If there are no matches, the default def will be returned.

SELECT e.first_name,

       e.last_name,

       e.job_title,

       DECODE(e.job_title, 'President', '******', e.salary)

FROM emp e

WHERE e.emp_id in (SELECT NVL(z.manager_id, e.emp_id)

                   FROM emp z);

In this example, all manager names will be retrieved with their salaries. When the row identifying the president is displayed, show '******' instead of his salary. Also notice the NVL function used to evaluate a null manager ID. Only the president will have a null manager ID, which would not have been retrieved without the NVL.

Also notice that DECODE is evaluating job_title and returning salary, which would normally be a data type mismatch since the job title and salary columns are different data types but is okay here.

SELECT e.first_name,

       e.last_name,

       e.job_title,

       e.salary

FROM emp e

WHERE DECODE(USER,'PRES',e.emp_id,

             UPPER(e.last_name),e.emp_id,  0) = e.emp_id ;

In this example, if the user is the president, all employees will be returned with their corresponding salary. For all other users, only one row will be retrieved, enabling the user to see his or her own salary only.

SELECT e.first_name,

       e.last_name,

       e.job_title,

       DECODE(USER,'ADMIN',DECODE(e.job_title, 'PRESEDENT', '*****', e.salary),

                                               'PRES', e.salary, '******')

FROM emp e

WHERE e.emp_id in (SELECT NVL(z.manager_id, e.emp_id)

                   FROM emp z);

In this example, the DECODE statement is nested with another DECODE statement. If the Oracle user is 'ADMIN', show the salaries except for the president's salary. If the Oracle user is 'PRES', show all salaries and if the user is anybody else, return '******'.

Another place the DECODE statement can be used is in the ORDER BY clause. The next example will sort the output in such a way that the president is the first row returned followed by the departments 'SALES', 'ADMIN', and then 'IS' with their corresponding employees.

SELECT d.name,

       e.job_title,

       e.first_name,

       e.last_name

FROM emp e,

     dept d

WHERE d.dept_no = e.dept_no

ORDER BY DECODE(e.job_title,'PRESIDENT', 0,

                DECODE(d.name,'SALES',   1, 

                              'ADMIN',   2, 3)), e.last_name;

This example does not ORDER BY e.job_title but uses this column to search for the title 'PRESIDENT' and returns a 0. For all other rows, another DECODE is used to evaluate the department name and returning numbers 1, 2, or 3 depending upon what the department name is. After the DECODEs are finished, the data is further sorted by employee last name e.last_name.

INSERTs, ***UPDATEs, and DELETE***s

The INSERT statement is used to put new rows into the database. This can be done one row at a time using the VALUES expression or a whole set of records at a time using a subquery. The following is the syntax for an INSERT statement:

INSERT INTO schema.table column(s) VALUES subquery

where schema is an optional parameter to identify which database schema to use for the insert. The default is your own.

table is mandatory and is the name of the table.

column is a list of columns that will receive the inserted values.

VALUES is used when one row of data will be inserted. Values are represented as constants.

subquery is used when the VALUES option is not used. The columns in the subquery must match the sequence and data types of the columns in the insert list.

INSERT INTO dept  (dept_no,

                   name,

                   adrs_id)

   VALUES (dept_seq.NEXTVAL,

           'CUSTOMER SERVICE',

           adrs_seq.NEXTVAL);

This example inserts one row into the table dept. Sequences dept_seq and adrs_seq are used to retrieve the next numeric values for dept_no and adrs_id.

If multiple rows need to be inserted, the INSERT EXAMPLE 1 statement would have to executed for each individual row. If a subquery can be used, multiple rows would be inserted for each row returned by the subquery.

INSERT INTO emp (emp_id,

                 first_name,

                 last_name,

                 dept_no,

                 hire_date,

                 job_title,

                 salary,

                 manager_id)

      SELECT emp_seq.NEXTVAL,

             new.first_name,

             new.last_name,

             30,

             SYSDATE,

             'CUSTOMER REPRESENTATIVE',

             new.salary,

             220

      FROM candidates new

      WHERE new.accept      = 'YES'

            AND new.dept_no = 30;

This example will insert all rows form the candidates table that have been assigned to department number 30 and have been accepted. Because the department number and manager ID are known, they are used as constants in the subquery. The UPDATE statement is used to change existing rows in the database. The syntax for the UPDATE statement is

UPDATE  schema.table SET column(s) = expr sub  query  WHERE condition

where

schema is an optional parameter to identify which database schema to use for the update. The default is your own.

table is mandatory and is the name of the table.

SET is a mandatory reserved word.

column is a list of columns that will receive the updated values.

expr is the new value to be assigned.

sub query is a select statement that will retrieve the new data values.

WHERE is optional and is used to restrict which rows are to be updated.

UPDATE emp

   SET dept_no = 30

WHERE last_name = 'DOE'

      AND first_name = 'JOHN';

This example will transfer an employee named JOHN DOE to department 30. If there is more than one JOHN DOE, further restrictions will have to be made in the WHERE clause.

UPDATE emp

   SET salary = salary + (salary * .05);

This update example will give everyone in table emp a 5 percent increase in salary.

UPDATE emp a

   SET a.salary = (SELECT a.salary 

                       + (a.salary * DECODE(d.name, 'SALES', .1,

                                                    'ADMIN', .07,

                                                             .06))

FROM dept d

                   WHERE d.dept_no = a.dept_no)

WHERE a.dept_no = (SELECT dept_no

                   FROM dept y, addresses z

                   WHERE y.adrs_id = z.adrs_id

                     AND z.city    = 'ROCHESTER');

This example will give raises to employees located in Rochester. The amount of the raise is handled by the DECODE statement evaluating the department name. Employees in the Sales department will receive a 10 percent raise, the Admin department receives a seven percent raise, and everyone else receives a 6 percent raise.

The DELETE statement is used to remove database rows. The syntax for DELETE is

DELETE FROM schema.table WHERE condition

where

SCHEMA is an optional parameter to identify which database schema to use for the delete. The default is your own.

TABLE is mandatory and is the name of the table.

WHERE restricts the delete operation.

DELETE FROM addresses

WHERE adrs_id = (SELECT e.adrs_id

                 FROM emp e

                 WHERE e.last_name   = 'DOE'

                   AND e.first_name  = 'JOHN');

DELETE FROM emp  e

WHERE e.last_name   = 'DOE'

  AND e.first_name  = 'JOHN';

If employee John Doe left the company, you probably would want to delete him from the database. One way to accomplish this is to delete the row containing his name from the addresses table and the emp table. In order to find John Doe in the addresses table, you must perform a subquery using the emp table. Therefore, the entry in the emp table has to be the last row to be deleted, or else there would be an orphan row in the addresses table.

DELETE FROM dept

WHERE adrs_id is null;

In this example, all rows in the dept table will be deleted if the corresponding adrs_id is null.

Deletes are permanent! Once the commit has taken place, it is impossible to get the row(s) back apart from issuing an INSERT statement. There is not an undo command available.

Master/Detail or Parent/Child SQL

Any time a SQL statement is constructed with multiple tables, a parent/child relationship is usually in effect.

The user must be familiar with the database schema in use and the corresponding constraints in order to properly join tables. Writing poorly constructed SELECT statements will not harm the database but might decrease the system performance and possibly give a false relationship representation to the users. If there are poorly constructed INSERT, UPDATE, or DELETE statements, the effect could be disastrous.

Before you see any examples, certain assumptions need to be made.

An employee cannot be entered without a department number. This indicates that the emp table is a child of the dept table.
Addresses do not have to be entered when creating a new employee or department. Therefore, the addresses table is optional and is a child of the emp table and a child of the dept table.

If these constraints are enforced in the database, protection would be provided when a parent row is deleted but does not delete the corresponding children.

SELECT d.name                          dept_name,

       d.dept_no                       dept_number,

       e.first_name || e.last_name     emp_name,

       e.job_title                     title,

       e.hire_date                     start_date

FROM dept    d,

     emp     e

WHERE d.dept_no   =  e.dept_no

ORDER BY d.name, e.last_name;

In this example, all the department names and numbers will be displayed (the parent) with all of the corresponding employees (the children) in the departments.

SELECT d.name                                              dept_name,

       d.dept_no                                           dept_number,

       e.first_name || e.last_name                         emp_name,

       e.job_title                                         title,

       e.hire_date                                         start_date,

       DECODE(a.box_number, NULL, a.adrs_1, a.box_number)  address,

       DECODE(a.adrs_2, NULL, NULL, a.adrs_2)              address_2,

       a.city || ', '||a.state ||'   '||a.zip              city_stat_zip

FROM dept       d,

     emp        e,

     addresses  a

WHERE d.dept_no   =  e.dept_no

  AND e.adrs_id   =  a.adrs_id (+)

ORDER BY d.name, e.last_name;

This example shows the addition of the optional child table, called addresses. An outer join, (+) is used so that the employee row will still be retrieved even if there is no address information available yet. The DECODEs will retrieve the box number or adrs 1 depending upon the existence of box number.

When you write INSERTs, UPDATEs, or DELETEs, be careful and make sure the proper relationships exist within the subqueries. If you must manipulate each row from the query, you must use a cursor that is part of the PL/SQL language.

Additional Tips

The following is a summary of some of the information already given in this chapter, along with some new ideas.

When you are comparing date data types, it might be wise to truncate the dates (TRUNC hire_date) to ensure that the time component does not cause erroneous results. If the application enables the insertion of time into the date data types, then the inserted times will also be taken into consideration when manipulating dates.
When writing SQL statements, null values will not be considered by the database unless they are explicitly called for.

SELECT e.first_name || e.last_name      emp_name,

       z.first_name || z.last_name      manager

FROM emp e,

     emp z

WHERE  z.emp_id = e.manager_id;

In this example, all rows for the employee table will be retrieved except for one, the row containing the president. This is because the manager_id for the president is null.

The following example shows how to retrieve the president along with the other employees.

SELECT e.first_name || e.last_name      emp_name,

       z.first_name || z.last_name      manager

FROM emp e,

     emp z

WHERE  z.emp_id = NVL(e.manager_id, e.emp_id);

This code checks for a null manager ID. If the manager ID is null, the database will return the employee ID, which will validate properly and return a row for the president.

Summary.

This chapter covered the major components of the SQL language, placing emphasis on functionality that is frequently used or misunderstood (such as with the DECODE statement). The reader should have sufficient information to venture out and begin writing good SQL code.