Referential Integrity Explained: Bonding Tables Together
Referential integrity is important to any database design. Why is it so important, and how does it work? Alf Pedersen defines the phrase and spells out why you wouldn't want to build a database without it.
Contributed by Alf Pedersen Rating: / 14 November 17, 2004
In an ordinary database design, referential integrity is a key component. Have you ever wondered what this term really means? I have. Let's explore Referential Integrity together. But first, get yourself a cup of coffee, as we need to work a little on this topic.
First, let's break up this phrase. This is always a smart thing to do; if you can't see it all at once, break it down until you can study a small part of the problem which you can see.
Looking up the word referential in a dictionary yields an explanation something like this:
Ref`er*en"tial: Containing a reference; pointing to something out of itself;
Hmm, something outside itself...not contained inside, but outside. Now, let us check our integrity: A new lookup gives us the following:
The state of being unimpaired; soundness. The quality or condition of being whole or undivided; completeness.
I really like that one. That means something is not broken, but sound and complete.
Referential integrity means just that pointers from something to something outside itself are sound and unbroken.
In database design, the term referential integrity simply means that if a row in a table has a pointer to a row in another table, the row in the table that is pointed at, must be sound (exist). Put another way, you should not remove a row which contains information that other row(s) depend on.
The following example will further clarify this. Take this simple server model:
This is a model that should be easy to understand: We have two tables, TRANSACTIONS and ACCOUNTS. For a given account, there may be one or many (or no) transactions. Sounds fair enough: if you are like me, your bank account has far to many transactions each month. OK, I'll be serious again.
Let's look at it once more. Let's say that the above model is for a bank with lots of accounts and transactions. What if we just deleted all (or just one) accounts?
We would end up with a lot of transactions, but if we wanted to check out which name each transaction belonged to, we couldn't tell! Referential integrity is broken; it is no longer, as defined above, sound and unbroken.
It goes without saying that this is a very unhealthy situation for an information system that is relying upon a good database design.
Referential integrity is a dead serious concept. From the little example above, I'm certain that you can see the dangers of breaking referential integrity. But do you also notice the danger of changing the value of the primary key in ACCOUNTS? Giving an account a new number will just as effectively break the rule of referential integrity as deleting the very same row. That is why it is so important to be absolutely sure when you choose your primary key for ANY table.
Come to think of it, we are now talking about the subject of another article on my site: Candidate Keys. I consider that a very important article, and it has been published on many prominent sites on the Net. (That does NOT imply that my site is not prominent...). However, the original is on this site.
Now let's return to the real subject: How can we enforce referential integrity, or put another way, how can we be sure that referential integrity is maintained?
A long, long time ago, before the database world had really powerful products, referential integrity was programmed in each application SW piece. Image all the possible sources for bugs popping up all over the place!
A relational database system supports the enforcement of referential integrity as constraints: special definitions done by the DDL (Data Definition Language) part of the database. Let us look at how these constraints can be defined by using Oracle's DDL (I have the full script also for SQL Server in the Database Normalization eBook). The constraint definitions look like this:
The first two ALTER-statements are establishing the primary key for each table. The third statement is the important one in our context: it tells Oracle that the ACCOUNT_NO in TRANSACTIONS is referencing the primary key ACCOUNT_NO in ACCOUNTS, and that Oracle should see to it that any references from TRANSACTIONS to a given account in ACCOUNTS should never be broken. And, if you delete a row in ACCOUNTS that has TRANSACTIONS, then that's exactly what you are trying to do!
Actually, the last constraint definition also establishes ACCOUNT_NO in TRANSACTIONS as a foreign key for ACCOUNTS. That is really most of the difference between primary and foreign keys: a primary key in one table becomes a foreign key in tables it has a mother-child relation to. But that's a different article for some day to come...
Let us take this a bit further. The following example shows you the two tables and which rows they contain (for the TRANSACTIONS table I have selected only the interesting columns). Now, look at the end of it: I try to delete the row in ACCOUNTS, and I promptly (and correctly) get the error message.
Isn't that nice? The database is actually stopping me, and preventing me from breaking the referential integrity between ACCOUNTS and TRANSACTIONS. And rightfully so.
The next question then arises: how does your application detect this? The answer is: by comprehensive error detection procedures. Yes, procedures: stored procedures that throw exceptions when an error occurs. That way, no matter what application is running, it bumps into constraint control, error processing logic and error handling that is defined outside the application itself.
However, that is also the subject of a completely different article. For now, I hope you have gotten a few ideas for your own database design and application development, as well as I hope that the "mystery" of referential integrity is unveiled.
One final note: If you are wondering how to create database functions and procedures, check out this: Ref cursors in stored functions and calling a ref cursor from Java.