Rob Gonda's Blog

Database Data Integrity: The Basics

Applications have usually multiple tiers, such as user interface, business model, and data. It is not an uncommon practice to rely on the application level for data validation, since in most cases, it is a single application that interacts with one particular database. However, what would happen if one field got 'corrupt', because of a bug in the application? What would happen if multiple applications accessed the same data, and but one was missing a particular validation rule?

The data is held in a database, and it should be the database that maintained and verified integrity. You must be thinks, of course, you define datatypes and  table relations, but that's not enough. How many times have you -- or your dba -- defined a gender field as a char(1), or a uuid primary key as a char(35)? What makes you think that somehow, let's call it glitch, the gender field couldn't end up with a 'x' value, when you're only accepting 'm' or 'f'? or the id key could end up with '400', when taking only uuids. Chances are in fact, that your application in fact may be validating a uuid datatype, since it's native to ColdFusion, and anything other than a uuid key will break your application.

In fact, and without pointing at anyone, I've downloaded and played with dozens of open-source applications that use a uuid as a primary key, providing a sql schema script that does not validate that a proper uuid is being placed.

What is the solution? Constraints; and they should always be used. A constraint is nothing but a rule that all data have to comply, enforced at the database level. The most common constraints is the foreign key, where the database automatically enforce one column value (fk) to match any column value of a different table (pk), which are automatically created when you declare foreign keys

However, you can also create your own constraints, for example, gender have to be 'm', or 'f'; a credit card expiration have to be between 2005 and 2020, number of children have to be between 0 and 20, and so on. Your datatype can only restrict so much, but you can add additional validation to ensure that if falls into your business logic.

You can add constrains using sql scripts, or using sql GUI.

The following examples are for Microsoft SQL, but should be able to adapt them to any database that supports constraints.

To add a constraint to a gender field with datatype char(1), simple open the table in design mode, click the constraint icon (top toolbar, right icon, shape of a grid) and click the new button. The constraint expression must return a boolean; type ([gender] = 'f' or [gender] = 'm'). Following, you may assign any name to this constraint, commonly prefixed by CK_ for check. (see screen shot)

To add the same constraints by scripting, you may open a script window or the query analyzer and type the following

ALTER TABLE [dbo].[users] ADD
    CONSTRAINT [CK_users_gender] CHECK ([gender] = 'f' or [gender] = 'm')
GO


After this is in place, anytime you try to insert or alter the gender field with any character other than 'f' or 'm', the database will throw an error, which you can catch with the constraint name.

Tomorrow I will post how to validate regular expressions, including UUID datatypes.

Related Blog Entries

TrackBacks
There are no trackbacks for this entry.

Trackback URL for this entry:
http://www.robgonda.com/blog/trackback.cfm?6515CBC4-3048-7431-E41F34372A3E2F86

Comments (Comment Moderation is enabled. Your comment will not appear until approved.)
This blog is running version 5.9.003. Contact Blog Owner