Subqueries and Query Expressions - Subqueries and Joins
(Page 6 of 12 )
You may have noticed as you read through this chapter that many of the queries that were written using subqueries could also have been written as multitable queries, or joins. This is often the case, and SQL allows you to write the query either way. This example illustrates the point:
List the names and ages of salespeople who work in offices in the Western region.
SELECT NAME, AGE
FROM SALESREPS
WHERE REP_OFFICE IN (SELECT OFFICE
FROM OFFICES
WHERE REGION = 'Western')
NAME AGE
-------------- ------
Sue Smith 48
Larry Fitch 62
Nancy Angelli 49
This form of the query closely parallels the stated request. The subquery yields a list of offices in the Western region, and the main query finds the salespeople who work in one of the offices in the list. Here is an alternative form of the query, using a two-table join:
List the names and ages of salespeople who work in offices in the Western region.
SELECT NAME, AGE
FROM SALESREPS, OFFICES
WHERE REP_OFFICE = OFFICE
AND REGION = 'Western'
NAME AGE
-------------- -----
| Sue Smith | 48 |
| Larry Fitch | 62 |
| Nancy Angelli | 49 |
This form of the query joins the SALESREPS table to the OFFICES table to find the region where each salesperson works, and then eliminates those who do not work in the Western region.
Either of the two queries will find the correct salespeople, and neither one is right or wrong. Many people will find the first form (with the subquery) more natural, because the English request doesn’t ask for any information about offices, and because it seems a little strange to join the SALESREPS and OFFICES tables to answer the request. Of course if the request is changed to ask for some information from the OFFICES table:
List the names and ages of the salespeople who work in offices in the Western region and the cities where they work.
the subquery form will no longer work, and the two-table query must be used. Conversely, many queries with subqueries cannot be translated into an equivalent join. Here is a simple example:
List the names and ages of salespeople who have above average quotas.
SELECT NAME, AGE
FROM SALESREPS
WHERE QUOTA > (SELECT AVG (QUOTA)
FROM SALESREPS)
NAME AGE
------- ------
Bill Adams 37
Sue Smith 48
Larry Fitch 62
In this case, the inner query is a summary query and the outer query is not, so there is no way the two queries can be combined into a single join.
Nested Subqueries
All of the queries described thus far in this chapter have been two-level queries, involving a main query and a subquery. Just as you can use a subquery inside a main query, you can use a subquery inside another subquery. Here is an example of a request that is naturally represented as a three-level query, with a main query, a subquery, and a sub-subquery:
List the customers whose salespeople are assigned to offices in the Eastern sales region.
SELECT COMPANY
FROM CUSTOMERS
WHERE CUST_REP IN (SELECT EMPL_NUM
FROM SALESREPS
WHERE REP_OFFICE IN (SELECT OFFICE
FROM OFFICES
WHERE REGION = 'Eastern'))
COMPANY
-----------
First Corp.
Smithson Corp.
AAA Investments
JCP Inc.
Chen Associates
QMA Assoc.
Ian & Schmidt
Acme Mfg.
. . .
In this example, the innermost subquery:
SELECT OFFICE
FROM OFFICES
WHERE REGION = 'Eastern'
produces a column containing the office numbers of the offices in the Eastern region. The next subquery:
SELECT EMPL_NUM
FROM SALESREPS
WHERE REP_OFFICE IN (subquery)
produces a column containing the employee numbers of the salespeople who work in one of the selected offices. Finally, the outermost query:
SELECT COMPANY
FROM CUSTOMERS
WHERE CUST_REP IN (subquery)
finds the customers whose salespeople have one of the selected employee numbers.
The same technique used in this three-level query can be used to build queries with four or more levels. The ANSI/ISO SQL standard does not specify a maximum number of nesting levels, but in practice, a query becomes much more time-consuming as the number of levels increases. The query also becomes more difficult to read, understand, and maintain when it involves more than one or two levels of subqueries. Many SQL implementations restrict the number of subquery levels to a relatively small number.
Correlated Subqueries * In concept, SQL performs a subquery over and over again—once for each row of the main query. For many subqueries, however, the subquery produces the same results for every row or row group. Here is an example:
List the sales offices whose sales are below the average target.
SELECT CITY
FROM OFFICES
WHERE SALES < (SELECT AVG(TARGET)
FROM OFFICES)
CITY
---------
Denver
Atlanta
In this query, it would be silly to perform the subquery five times (once for each office). The average target doesn’t change with each office; it’s completely independent of the office currently being tested. As a result, SQL can handle the query by first performing the subquery, yielding the average target ($550,000), and then converting the main query into:
SELECT CITY
FROM OFFICES
WHERE SALES < 550000.00
Commercial SQL implementations automatically detect this situation and use this shortcut whenever possible to reduce the amount of processing required by a subquery. However, the shortcut cannot be used if the subquery contains an outer reference, as in this example:
List all of the offices whose targets exceed the sum of the quotas of the salespeople who work in them:
SELECT CITY
FROM OFFICES
WHERE TARGET > (SELECT SUM(QUOTA)
FROM SALESREPS
WHERE REP_OFFICE = OFFICE)
CITY
------------
Chicago
Los Angeles
For each row of the OFFICES table to be tested by the WHERE clause of the main query, the OFFICE column (which appears in the subquery as an outer reference) has a different value. Thus, SQL has no choice but to carry out this subquery five times—once for each row in the OFFICES table. A subquery containing an outer reference is called a correlated subquery because its results are correlated with each individual row of the main query. For the same reason, an outer reference is sometimes called a correlated reference.
A subquery can contain an outer reference to a table in the FROM clause of any query that contains the subquery, no matter how deeply the subqueries are nested. A column name in a fourth-level subquery, for example, may refer to one of the tables named in the FROM clause of the main query, or to a table named in the FROM clause of the second-level subquery or the third-level subquery that contains it. Regardless of the level of nesting, an outer reference always takes on the value of the column in the current row of the table being tested.
Because a subquery can contain outer references, there is even more potential for ambiguous column names in a subquery than in a main query. When an unqualified column name appears within a subquery, SQL must determine whether it refers to a table in the subquery’s own FROM clause, or to a FROM clause in a query containing the subquery. To minimize the possibility of confusion, SQL always interprets a column reference in a subquery using the nearest FROM clause possible. To illustrate this point, in this example, the same table is used in the query and in the subquery:
List the salespeople who are over 40 and who manage a salesperson over quota.
SELECT NAME
FROM SALESREPS
WHERE AGE > 40
AND EMPL_NUM IN (SELECT MANAGER
FROM SALESREPS
WHERE SALES > QUOTA)
NAME
Sam Clark
Larry Fitch
The MANAGER, QUOTA, and SALES columns in the subquery are references to the
SALESREPS table in the subquery’s own FROM clause; SQL does not interpret them as outer references, and the subquery is not a correlated subquery. SQL can perform the subquery first in this case, finding the salespeople who are over quota and generating a list of the employee numbers of their managers. SQL can then turn its attention to the main query, selecting managers whose employee numbers appear in the generated list.
If you want to use an outer reference within a subquery like the one in the previous example, you must use a table alias to force the outer reference. This request, which adds one more qualifying condition to the previous one, shows how:
List the managers who are over 40 and who manage a salesperson who is over quota and who does not work in the same sales office as the manager.
SELECT NAME
FROM SALESREPS MGRS
WHERE AGE > 40
AND MGRS.EMPL_NUM IN (SELECT MANAGER
FROM SALESREPS EMPS
WHERE EMPS.QUOTA > EMPS.SALES
AND EMPS.REP_OFFICE <> MGRS.REP_OFFICE)
NAME
-------------
Sam Clark
Larry Fitch
The copy of the SALESREPS table used in the main query now has the tag MGRS, and the copy in the subquery has the tag EMPS. The subquery contains one additional search condition, requiring that the employee’s office number does not match that of the manager. The qualified column name MGRS.OFFICE in the subquery is an outer reference, and this subquery is a correlated subquery.
This article is excerpted from MySQL The Complete Reference by Vikram Vaswani (McGraw-Hill, 2003; ISBN 0072224770). Check it out at your favorite bookstore today. Buy this book now. |
Next: Subqueries in the HAVING Clause * >>
More MS SQL Server Articles
More By McGraw-Hill/Osborne