Microsoft Excel is one the most versatile spreadsheet applications available to any office and engineering work. Amateur and novice users of this application are just content to make tables, perform some basic addition/subtraction and make minor use of basic Excel functions such as SUM.

Sadly, this continues to be true even if it takes them a WHOLE DAY to analyze gigantic spreadsheets, consisting of enormous amounts of data. The usual reasons given are the lack of skills needed to manipulate large amounts of information (called a "database") in a short span of time.

This is very a common problem in financial, engineering, academic and even in IT sectors. Working with a database in Excel is a daunting task at first; you can only be efficient if you know how to maximize the use of Excel database functions. Database functions in Excel are one of the most overlooked functions for both amateur and novice users.

This tutorial will focus on how to set up your data to comply with Excel database requirements, as well as how to analyze and maximize the information you get out of it in the easiest way possible. This will significantly increase your office productivity.

{mospagebreak title=What is a database in Microsoft Excel?}

People seem to be afraid to learn when the word "database" appears. Databases are always associated with technical complexity, but the first step to learning about them is to understand their definition.

We hear about "databases" everywhere. You can read about them whenever you learn about web programming. Database programs include MySQL, MS Access, Oracle, etc. But the real meaning of database is not that complex as it should be.

In its simplest terms, a database is just a table. If you are creating a table in Excel, you are starting to create a database. But you need to follow three basic requirements to call it a "database:"

**First Requirement**: It should have a "Fieldname" as the first row.

**Second Requirement**: The data must be inserted as "Rows" and will be called "Records." The total number of rows should be greater than one to call it a "database."

**Third Requirement**: The database should consist of several columns called "Fields" and should be "relational" in nature.

Below is a sample screen shot of a relational database table:

The field names are the following:

- Person
- Height
- Age
- Weight
- IQ

The records are the data shown in rows. Finally, in the example, there is a field that corresponds to the five columns.

This is the standard database definition. It’s even true in MySQL, the most popular open source database used in connection with dynamic websites.

In MS Excel, there are 12 database functions that can be used for both basic and advanced database manipulation. They are:

- DCOUNT
- DCOUNTA
- DGET
- DMAX
- DMIN
- DSUM
- DPRODUCT
- DAVERAGE
- DSTDEV
- DSTDEVP
- DVAR
- DVARP

{mospagebreak title=Syntax for database functions and rules for criteria tables}

As with any other Excel functions, there is syntax that needs to be properly understood and followed in order for Excel to process those data correctly. We all know that if we are not able to use correct syntax, it can lead to errors or inaccurate results in the spreadsheet.

The database function syntax is this:

[database function](database,field,criteria)

It is impossible to understand without illustrating the syntax using an actual Excel table.

For the sake of illustration, please refer to the two tables below. The first one is called "criteria table" and the other table (below the criteria table) is the actual database to be analyzed:

Here are some rules that cover "Criteria Tables."

- Criteria tables should always be above the database.
- Criteria should be treated as an Excel table, NOT a database.
- If you need to query the database, you should use the criteria.
- Criteria tables are REQUIRED to query the database.
- Criteria tables should be adjusted each time you need to make a unique query to the database. (see examples later)
- The column names used in the criteria table should be exactly the same as the database field names (including casing and format).
- The number of columns in criteria tables may not be equal to the number of columns used by the database.
- The first row in any criteria table should be the field names.

{mospagebreak title=Database function examples}

The screen shot in the previous section uses a database extracted from a manufacturing line. It shows various information, such as the % Yield, production numbers, input and output quantity of each production lot.

Let us illustrate how to use those the most important database functions using actual data.

FIRST EXAMPLE:

DCOUNT(database,field,criteria)

Using the criteria table given in the screen shot, let say we will count the total number of TYH-1MN3 and TUP-2MN4 production lots with production number above 1200:

=DCOUNT(A6:E17,"Production number",A1:B3)

This is equal to 5. Bear in mind that the database is being represented by A6:E17 while the criteria table should query A1:B3. The field is "Production number," since we are interested in finding the number of lots.

You should punch in the formula below the database, not above or in between criteria tables. This should avoid any silly mistakes. There is a sample Excel spreadsheet that you can download at the end of this tutorial containing all the illustrated examples.

SECOND EXAMPLE:

What if you need to compute the average yield of NVB-9MN1 product, with input quantity above 4550 but less than 5000?

This problem is impossible to solve in less than five minutes with thousands of records using traditional Excel functions and manual data manipulation. But if you use database functions, this can be solved with only one function using the criteria table (revising the criteria table and formulating the database function query):

By following the rules of "Criteria Tables," we can creatively modify to fit our query. We use two cells for one field name, "Input Quantity," so that we can place those conditions in it (>4550 and <5000)

Then the Excel database query should be:

=DAVERAGE(A6:E17,"Yield",A1:E2)

If you do this correctly the answer to the query is: 98.2%

So far the only functions we’ve discussed are DCOUNT and DAVERAGE.

THIRD EXAMPLE:

This example illustrates the use of DSUM to query the database and to sum up the numerical data.

Okay, using the same criteria table on the latest screen shot, our task is to find the total output quantity of NVB-9MN1 product, with input quantity above 4550 but less than 5000.

The proper database query should be:

=DSUM(A6:E17,"Output Quantity",A1:E2)

and the resulting sum is : 9417

FOURTH EXAMPLE:

So far the examples above are simple, since they only use one database function to produce the result In this last example, I will show you how to combine database functions to arrive at a result.

Consider this complex query: what if you need to compute the average yield of NVB-9MN1, TYH-1MN3 and TUP-2MN4 products, with input quantity above 4550 but less than 5000?

The criteria tables will be adjusted as you see below:

We can compute the average yield by first counting the number of valid events, as well as the sum that falls within the criteria:

=(DCOUNT(A9:E20,"Input Quantity",A1:E2))+(DCOUNT(A9:E20,"Input Quantity",A3:E4))+(DCOUNT(A9:E20,"Input Quantity",A5:E6))

then for the sum:

‘=(DSUM(A9:E20,"Yield",A1:E2))+(DSUM(A9:E20,"Yield",A3:E4))+(DSUM(A9:E20,"Yield",A5:E6))

Now that we have two values, Sum and the Number of events, their average can then be computed as follows:

Average = Sum/ Number of Events

Then you can compute the average yield to be: 96.6%

Here is a sample Excel workbook containing all of the illustrated examples:

http://www.php-developer.org/wp-content/uploads/tutorials/Excel database functions sample sheets.xls

very simple presentation but very effective tool

Very helpful and the last example can be greatly simplified. The row headings do not have to be repeated for each criteria. With the extra headings removed the database again starts at row 6.

And the formula can be combined — =dcount(a6:e17,”Input Quantity”,a1:e4). And a similar version can be used for dsum.