Rob Gonda's Blog

SQL Server 2005 Service Pack 2

Microsoft released Service Pack 2 for SQL Server 2005 today. As usual, it fixes a large number of bugs. In addition, customers running Vista must apply the update to be fully supported. The update also allows "unlimited virtual instances" to run on "fully licensed" SQL Server 2005 Enterprise Edition.
There are many other perks included with the update, which I shall try shortly.

Vertica: new RDBMS claims to be 100x faster

"Vertica describes its offering as a “grid-enabled, column-oriented relational database management system” that runs on industry standard hardware. It is designed to handle data warehousing, business intelligence, fraud detection and other applications, even in environments with hundreds of terabytes of data. The company says its technology can be used to execute queries 100 times faster than traditional row-oriented relational database management systems"

Vertica's product is in beta testing and the company is inviting those who want to be early adopters to give it a whirl. [full story]

Microsoft SQL Server Database Publishing Wizard

The SQL Server Database Publishing Wizard enables the deployment of SQL Server databases into a hosted environment on either a SQL Server 2000 or 2005 server. It generates a single SQL script file which can be used to recreate a database (both schema and data) in a shared hosting environment where the only connectivity to a server is through a web-based control panel with a script execution window. If supported by the hosting service provider, the Database Publishing Wizard can also directly upload databases to servers located at the shared hosting provider.

Leverage SQL Session at the South Florida CFUG

I will be speaking this coming February 22nd at the South Florida CFUG. For this month I chose a topic that will benefit you regardless of your programming language of preference, and should hopefully allow you to take back something that you can apply immediately.

Topic: Leverage the power of SQL
Description: Many developers don't realize the power of SQL to perform data-related tasks and computations. Learn how to use triggers, stored procedures, constraints, and user-defined functions to their full potential, and see the huge impact this could have in your organization or day-to-day coding.

Tell your friends.

ColdFusion Vs. SQL UUID

A few days ago I blogged about database-level data integrity and promised a follow-up concentrating on UUIDs.

UUID stands for Universally Unique Identifier. The intent of UUIDs is to enable distributed systems to uniquely identify information without significant central coordination. Thus, anyone can create a UUID and use it to identify something with reasonable confidence that the identifier will never be unintentionally used by anyone for anything else. Information labelled with UUIDs can therefore be later combined into a single database without needing to resolve name conflicts. The most widespread use of this standard is in Microsoft's Globally Unique Identifiers (GUIDs), which implement this standard (source: Wikipedia).

A UUID is essentially a 16-byte (128-bit) number. In its canonical form a UUID may look like this:

    xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx (8-4-4-4-12)

However, for some reason ColdFusion's UUID looks like

    xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxxxxxx (8-4-4-16)

Microsoft SQL Server has a native datatype called uniqueidentifier, which represents the 36-character GUID. Many ColdFusion developers choose not to use it because a GUID cannot be implicitly validated by ColdFusion and cannot be seamlessly moved to a different database such as MySQL, PostgreSQL, or Oracle.

The most widely adopted solution is to use a 35-character primary key and insert a ColdFusion UUID. But how do you validate a proper UUID at the database level? What if you want the database to generate the primary key? If the key gets altered, it will fail ColdFusion's implicit UUID datatype validation.

The solution is to add some constraints at the database level.

It is really simple to generate a UUID, since all it takes is removing the fourth hyphen from a GUID.

CREATE FUNCTION dbo.newUUID(@GUID varchar(36))
RETURNS varchar(35)
AS
BEGIN
 RETURN left(@GUID, 23) + right(@GUID,12)
END


Note that because SQL Server does not allow calling newid() inside a user-defined function, we need to pass the GUID in as a parameter. That said, we can now add a default value to our primary keys and let SQL Server generate them for us:

Default Value: dbo.newUUID(newid())
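The same default can also be set by script instead of through the table designer. This is a minimal sketch: the table dbo.articles and its columns are made up for illustration, and dbo.newUUID is assumed to exist exactly as defined above.

```sql
-- Hypothetical table used only for this example.
CREATE TABLE dbo.articles (
    ID char(35) NOT NULL
        CONSTRAINT DF_articles_ID DEFAULT (dbo.newUUID(newid()))
        CONSTRAINT PK_articles PRIMARY KEY,
    title varchar(100) NOT NULL
)
GO

-- ID is left out of the column list, so the default generates it.
INSERT INTO dbo.articles (title) VALUES ('First post')

-- The ID column should show a value in the 8-4-4-16 format.
SELECT ID, title FROM dbo.articles
```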


Validating a UUID is a little more complicated, since SQL Server has no native isUUID or isGUID function. I chose to use a regular expression, but guess what? SQL Server 2000 has no regular expression capabilities.

So step one is to create a regular expression evaluator function:

CREATE FUNCTION dbo.find_regular_expression
    (
        @source varchar(5000),
        @regexp varchar(1000),
        @ignorecase bit = 0
    )
RETURNS bit
AS
    BEGIN
        DECLARE @hr integer
        DECLARE @objRegExp integer
        DECLARE @objMatches integer
        DECLARE @objMatch integer
        DECLARE @count integer
        DECLARE @results bit
       
        EXEC @hr = sp_OACreate 'VBScript.RegExp', @objRegExp OUTPUT
        IF @hr <> 0 BEGIN
            SET @results = 0
            RETURN @results
        END
        EXEC @hr = sp_OASetProperty @objRegExp, 'Pattern', @regexp
        IF @hr <> 0 BEGIN
            SET @results = 0
            RETURN @results
        END
        EXEC @hr = sp_OASetProperty @objRegExp, 'Global', false
        IF @hr <> 0 BEGIN
            SET @results = 0
            RETURN @results
        END
        EXEC @hr = sp_OASetProperty @objRegExp, 'IgnoreCase', @ignorecase
        IF @hr <> 0 BEGIN
            SET @results = 0
            RETURN @results
        END   
        EXEC @hr = sp_OAMethod @objRegExp, 'Test', @results OUTPUT, @source
        IF @hr <> 0 BEGIN
            SET @results = 0
            RETURN @results
        END
        EXEC @hr = sp_OADestroy @objRegExp
        IF @hr <> 0 BEGIN
            SET @results = 0
            RETURN @results
        END
    RETURN @results
    END


Now that we have this, all we need is the UUID regex pattern and a call to this function.

CREATE FUNCTION dbo.isUUID (@uuid varchar(35)) 
RETURNS bit AS 
BEGIN

DECLARE @uuidRegex varchar(50)
SET @uuidRegex = '^[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{16}$'

RETURN dbo.find_regular_expression(@uuid,@uuidRegex ,0)

END


Alright! Now we have an isUUID function, which you can easily invoke from anywhere. Open a SQL script window and execute:

SELECT [dbo].[isUUID]('D929E4FB-537C-495F-BB3F31B8E42C0FBB')
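It is worth trying a few values that should be rejected as well. This is a quick sketch that assumes dbo.isUUID and dbo.find_regular_expression were created as shown above; the input strings are arbitrary test values.

```sql
-- Lowercase hex digits: should return 0, because the pattern only lists
-- A-F and isUUID calls the evaluator with @ignorecase = 0.
SELECT [dbo].[isUUID]('d929e4fb-537c-495f-bb3f31b8e42c0fbb') AS lowercase_uuid

-- A native GUID with the fourth hyphen still in place: should return 0.
SELECT [dbo].[isUUID]('D929E4FB-537C-495F-BB3F-31B8E42C0FBB') AS native_guid

-- Not a UUID at all: should return 0.
SELECT [dbo].[isUUID]('400') AS plain_number
```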


Now that we tested it and know how it works, all we need is to add a constraint to your primary key:

Open your table in design mode, click constraints, new, and add this line:

([dbo].[isUUID]([ID]) = 1)


where ID is the name of the primary key.
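The same constraint can be added by script rather than through the designer. A minimal sketch, assuming dbo.isUUID exists as defined above; the table dbo.comments, its columns, and the constraint names are hypothetical.

```sql
-- Hypothetical table used only for this example.
CREATE TABLE dbo.comments (
    ID char(35) NOT NULL CONSTRAINT PK_comments PRIMARY KEY,
    body varchar(1000) NULL
)
GO

-- Same expression as in the designer, wrapped in a named CHECK constraint.
ALTER TABLE [dbo].[comments] ADD
    CONSTRAINT [CK_comments_ID] CHECK ([dbo].[isUUID]([ID]) = 1)
GO

-- Should be rejected with an error that names CK_comments_ID,
-- because '400' is not a UUID.
INSERT INTO [dbo].[comments] ([ID], [body]) VALUES ('400', 'bad key')
```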

So you learned how to generate a UUID, default your primary key to one, validate a UUID with a regular expression, and add a constraint to enforce data integrity at the database level.

Database Data Integrity: The Basics

Applications usually have multiple tiers, such as user interface, business model, and data. It is not uncommon to rely on the application level for data validation, since in most cases a single application interacts with one particular database. However, what would happen if one field got corrupted because of a bug in the application? What would happen if multiple applications accessed the same data, but one was missing a particular validation rule?

The data is held in a database, and it should be the database that maintains and verifies its integrity. You must be thinking: of course, you define datatypes and table relations. But that's not enough. How many times have you (or your DBA) defined a gender field as a char(1), or a UUID primary key as a char(35)? What makes you think that, through some glitch, the gender field couldn't end up with an 'x' value when you're only accepting 'm' or 'f', or that the id key couldn't end up with '400' when it should only hold UUIDs? Chances are that your application validates the UUID datatype, since it's native to ColdFusion, and anything other than a UUID key will break your application.

In fact, and without pointing at anyone, I've downloaded and played with dozens of open-source applications that use a uuid as a primary key, providing a sql schema script that does not validate that a proper uuid is being placed.

What is the solution? Constraints, and they should always be used. A constraint is nothing but a rule that all data must comply with, enforced at the database level. The most common constraint is the foreign key, where the database automatically forces a column value (the FK) to match a column value in a different table (the PK); these constraints are created automatically when you declare foreign keys.

However, you can also create your own constraints. For example: gender has to be 'm' or 'f'; a credit card expiration year has to be between 2005 and 2020; number of children has to be between 0 and 20; and so on. Your datatype can only restrict so much, but you can add additional validation to ensure that the data fits your business logic.

You can add constraints using SQL scripts or using the SQL GUI.

The following examples are for Microsoft SQL Server, but you should be able to adapt them to any database that supports constraints.

To add a constraint to a gender field with datatype char(1), simply open the table in design mode, click the constraint icon (top toolbar, right icon, shape of a grid), and click the New button. The constraint expression must return a boolean; type ([gender] = 'f' or [gender] = 'm'). Next, you may assign any name to this constraint, commonly prefixed with CK_ for check. (see screen shot)

To add the same constraint by script, open a script window or Query Analyzer and type the following:

ALTER TABLE [dbo].[users] ADD
    CONSTRAINT [CK_users_gender] CHECK ([gender] = 'f' or [gender] = 'm')
GO


After this is in place, any time you try to insert or update the gender field with any character other than 'f' or 'm', the database will throw an error that names the violated constraint, so you can catch it by constraint name.
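The other rules mentioned earlier (expiration year between 2005 and 2020, children between 0 and 20) follow the same pattern. Here is a self-contained sketch; the table dbo.members, its columns, and the constraint names are all made up for illustration.

```sql
-- Hypothetical table with the constraints declared inline.
CREATE TABLE dbo.members (
    member_id int NOT NULL PRIMARY KEY,
    gender char(1) NOT NULL
        CONSTRAINT CK_members_gender CHECK (gender = 'f' OR gender = 'm'),
    cc_expiration_year smallint NULL
        CONSTRAINT CK_members_cc_year CHECK (cc_expiration_year BETWEEN 2005 AND 2020),
    children tinyint NULL
        CONSTRAINT CK_members_children CHECK (children BETWEEN 0 AND 20)
)
GO

-- Accepted: every value satisfies its constraint.
INSERT INTO dbo.members (member_id, gender, cc_expiration_year, children)
VALUES (1, 'f', 2010, 2)

-- Rejected: 'x' violates CK_members_gender.
INSERT INTO dbo.members (member_id, gender, cc_expiration_year, children)
VALUES (2, 'x', 2010, 2)
```

Keep in mind that a CHECK constraint lets NULL through, so a column that must always be validated should also be declared NOT NULL.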

Tomorrow I will post how to validate regular expressions, including UUID datatypes.

SQL: A Case For CROSS JOIN

Sometimes we have to select data from two or more tables to make our result complete, so we have to perform a join. It is usually an INNER JOIN or a [LEFT | RIGHT | FULL] OUTER JOIN, but SQL also provides a CROSS JOIN. A CROSS JOIN takes every row of one table and combines it with every row of a second table; because of this, it does not allow an ON clause. The occasions to use it are rare, so I decided to illustrate one.

Imagine a schema where you have the following tables: contentKeys, languages, and content. This db allows you to store content in various languages. contentKeys stores the unique keys for content pieces, each of which, combined with a language, identifies a unique piece of content in that language. The schema looks as follows:

The content table has a unique constraint on FK_key and FK_language (FK denotes a foreign key).
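In script form, the schema might look roughly like the sketch below. The column names are taken from the queries that follow; the datatypes, identity settings, and constraint name are assumptions.

```sql
-- Assumed schema; only the table and column names come from the queries in this post.
CREATE TABLE dbo.contentKeys (
    pk_key int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    content_key varchar(50) NOT NULL
)

CREATE TABLE dbo.languages (
    pk_language int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    code char(2) NOT NULL,
    [language] varchar(50) NOT NULL
)

CREATE TABLE dbo.content (
    pk_content int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    fk_key int NOT NULL REFERENCES dbo.contentKeys (pk_key),
    fk_language int NOT NULL REFERENCES dbo.languages (pk_language),
    content varchar(4000) NULL,
    -- One piece of content per key/language pair.
    CONSTRAINT UQ_content_key_language UNIQUE (fk_key, fk_language)
)
```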

Now, what if you need to know which keys exist for one language and not for others, or even which keys exist and contain content in no languages at all? We'll build a query to show this information.

The first step is to find all combinations of keys and languages. To do this we need to combine all entries in the contentKeys table with all entries in the languages table.

SELECT * FROM contentKeys CROSS JOIN languages


The next step is understanding OUTER JOINs. An outer join selects all of the records from one database table and only those records in the second table that have matching values in the joined field. In a left outer join, the selected records will include all of the records in the first database table. In a right outer join, the selected records will include all records of the second database table.

That said, if you OUTER JOIN the combination of all possible keys in all possible languages with your content table, the resulting query will let you know which keys have been translated, and which ones have not.

SELECT * from [CROSS-JOINED-QUERY] helper LEFT OUTER JOIN dbo.content
ON helper.pk_language = dbo.content.fk_language AND helper.pk_key = dbo.content.fk_key


We called the cross-joined table 'helper', and this query will return all of its rows, matching them to the content table. All the existing content/language combinations will have data in the content table's columns, and those that do not exist will have null values. You may add a WHERE clause to keep only the null values, which will indicate exactly which content key / language combinations are missing.

So for the full query, we'll take advantage of SQL's ability to alias a subquery as a derived table, and it looks like this:

SELECT helper.content_key, helper.code, helper.[language],
content.pk_content, helper.pk_key, helper.pk_language,
content.content
FROM (SELECT * FROM contentKeys CROSS JOIN languages) helper
    LEFT OUTER JOIN dbo.content
    ON helper.pk_language = dbo.content.fk_language AND helper.pk_key = dbo.content.fk_key
ORDER BY helper.content_key, helper.[language]
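To list only the missing translations, the query above can be filtered on a content column that is never null for a real row. A sketch, assuming pk_content is the primary key of the content table:

```sql
-- Key/language combinations that have no row in the content table.
SELECT helper.content_key, helper.code, helper.[language]
FROM (SELECT * FROM contentKeys CROSS JOIN languages) helper
    LEFT OUTER JOIN dbo.content
    ON helper.pk_language = dbo.content.fk_language AND helper.pk_key = dbo.content.fk_key
WHERE dbo.content.pk_content IS NULL
ORDER BY helper.content_key, helper.[language]

-- Keys that have no content in any language at all.
SELECT k.content_key
FROM contentKeys k
WHERE NOT EXISTS (SELECT 1 FROM dbo.content c WHERE c.fk_key = k.pk_key)
```

The second query does not need the CROSS JOIN, since it only asks whether a key appears in the content table at all.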

SQL Views and Performance

After my last post on SQL, case sensitivity, and views, Brian Kotek brought up an excellent point about performance.

It turns out that views are very handy, but not very optimized for performance. Tables are generally indexed: SQL Server can index the data in particular columns so it doesn't have to scan the whole table each time you query it. All primary keys are indexed, but you can create as many additional indexes as you want (when and why is a topic for another post).
If you use a view, its columns will not be indexed automatically. With SQL Server 2000, Microsoft introduced indexed views; their indexes are dynamic, so changes to the data in the base tables are automatically reflected in the indexed view. A view can be indexed only if it complies with certain prerequisites:

  • Must be created with the WITH SCHEMABINDING view option
  • May only refer to base tables in the same database.
  • If there is a GROUP BY clause, the view may not have a HAVING, CUBE, or ROLLUP.
  • May not have an OUTER JOIN clause.
  • May not have a UNION.
  • May not have DISTINCT or TOP clauses
  • May not have full-text predicates such as CONTAINSTABLE
  • May not have a ROWSET function such as OPENROWSET
  • May not use derived tables or subqueries.
  • Must be created with ANSI_NULLS ON and QUOTED_IDENTIFIER ON

You can create an index manually like this:

-- These options must be ON before the view is created.
SET QUOTED_IDENTIFIER ON
SET ANSI_NULLS ON
GO

CREATE VIEW OrderDetailsXSB   WITH SCHEMABINDING
AS
SELECT
OD.OrderID, OD.ProductID, P.ProductName , OD.UnitPrice
, OD.Quantity, OD.Discount
FROM dbo.Products P
INNER JOIN dbo.[Order Details] OD
ON P.ProductID = OD.ProductID
GO

-- The first index on a view must be unique and clustered.
CREATE UNIQUE CLUSTERED INDEX [IDX_Order_Details_X]
ON OrderDetailsXSB (OrderID, ProductID
, ProductName, Quantity)
GO
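One caveat when querying the view: on editions other than Enterprise and Developer, the optimizer uses the view's index only if the query names the view with the NOEXPAND hint. A short sketch, assuming the OrderDetailsXSB view and its index exist as created above; the order ID is just an example value.

```sql
-- NOEXPAND tells the optimizer to read the indexed view directly
-- instead of expanding it into its base tables.
SELECT OrderID, ProductID, ProductName, Quantity
FROM OrderDetailsXSB WITH (NOEXPAND)
WHERE OrderID = 10248
```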

For more information on Indexed Views, check out Microsoft's official documentation.

SQL Automated Insert Statements

Generating SQL scripts from ms-sql can sometimes be painful. SQL Server does a great job of generating schema scripts, including tables, views, functions, stored procedures, triggers, constraints, and indexes, but where's my data? DTS (Data Transformation Services) can help if you're moving data between databases, but what if you need to send someone a change script, or simply generate install scripts? I can't believe SQL Server provides no option for that.
Behold, I possess the solution! While researching this problem, I finally found a flexible stored procedure that generates INSERT statements.

Check out this list of examples:

Example 1:    To generate INSERT statements for table 'titles':
       
        EXEC sp_generate_inserts 'titles'

Example 2:     To omit the column list in the INSERT statement: (Column list is included by default)
        IMPORTANT: If you have too many columns, you are advised to omit the column list, as shown below,
        to avoid erroneous results
       
        EXEC sp_generate_inserts 'titles', @include_column_list = 0

Example 3:    To generate INSERT statements for 'titlesCopy' table from 'titles' table:

        EXEC sp_generate_inserts 'titles', 'titlesCopy'

Example 4:    To generate INSERT statements for 'titles' table for only those titles
        which contain the word 'Computer' in them:
        NOTE: Do not complicate the FROM or WHERE clause here. It's assumed that you are good with T-SQL if you are using this parameter

        EXEC sp_generate_inserts 'titles', @from = "from titles where title like '%Computer%'"

Example 5:     To specify that you want to include TIMESTAMP column's data as well in the INSERT statement:
        (By default TIMESTAMP column's data is not scripted)

        EXEC sp_generate_inserts 'titles', @include_timestamp = 1

Example 6:    To print the debug information:
 
        EXEC sp_generate_inserts 'titles', @debug_mode = 1

Example 7:     If you are not the owner of the table, use @owner parameter to specify the owner name
        To use this option, you must have SELECT permissions on that table

        EXEC sp_generate_inserts Nickstable, @owner = 'Nick'

Example 8:     To generate INSERT statements for the rest of the columns excluding images
        When using this option, DO NOT set @include_column_list parameter to 0.

        EXEC sp_generate_inserts imgtable, @ommit_images = 1

Example 9:     To generate INSERT statements excluding (omitting) IDENTITY columns:
        (By default IDENTITY columns are included in the INSERT statement)

        EXEC sp_generate_inserts mytable, @ommit_identity = 1

Example 10:     To generate INSERT statements for the TOP 10 rows in the table:
       
        EXEC sp_generate_inserts mytable, @top = 10

Example 11:     To generate INSERT statements with only those columns you want:
       
        EXEC sp_generate_inserts titles, @cols_to_include = "'title','title_id','au_id'"

Example 12:     To generate INSERT statements by omitting certain columns:
       
        EXEC sp_generate_inserts titles, @cols_to_exclude = "'title','title_id','au_id'"

Example 13:    To avoid checking the foreign key constraints while loading data with INSERT statements:
       
        EXEC sp_generate_inserts titles, @disable_constraints = 1

Example 14:     To exclude computed columns from the INSERT statement:
        EXEC sp_generate_inserts MyTable, @ommit_computed_cols = 1
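To get a feel for the underlying idea, here is a hand-rolled sketch that builds INSERT statements with plain string concatenation. It is not how sp_generate_inserts is implemented; it only illustrates the technique, and it assumes the titles table from the pubs sample database with its title_id and title columns.

```sql
-- Builds one INSERT statement per row as a string.
-- REPLACE doubles any single quotes so the generated statement stays valid.
SELECT 'INSERT INTO titles (title_id, title) VALUES ('''
    + REPLACE(title_id, '''', '''''') + ''', '''
    + REPLACE(title, '''', '''''') + ''')'
FROM titles
WHERE title LIKE '%Computer%'
```

A query like this has to be rewritten for every table and datatype, which is exactly the work the stored procedure saves you.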


Download here the version for ms-sql 2000 or ms-sql 2005. I am sure there are some nice commercial solutions I should check out, so if you know any good ones, please comment below.

Select * from Ben

Ben Forta has been blogging about MS SQL lately, and we should thank him, since it's useful information.
For example, here's a comparison between temp tables and table variables. I always use table variables -- a lot -- and they're quite useful. I never ran into any of the restrictions Ben mentioned, but it's good to know that if I run into any of them, there is an alternative; for example, I never had to Select Into a table variable, but I do see where it can be extremely handy.
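As a small sketch of the difference (the table and column names are made up for illustration): a table variable has to be declared with its columns up front, while SELECT INTO can create a temp table on the fly.

```sql
-- Table variable: declared like any other variable and scoped to the batch.
DECLARE @recent TABLE (id int NOT NULL PRIMARY KEY, name varchar(50) NOT NULL)
INSERT INTO @recent (id, name) VALUES (1, 'first')
INSERT INTO @recent (id, name) VALUES (2, 'second')
SELECT id, name FROM @recent

-- Temp table: SELECT INTO creates it from the query result,
-- which is not possible with a table variable as the target.
SELECT id, name INTO #recent_copy FROM @recent
SELECT id, name FROM #recent_copy
DROP TABLE #recent_copy
```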

Another nice post mentions how to perform case-sensitive searches w/o changing the collation for the entire database. As you may know, you can define a collation when creating a database, which determines the character set and case sensitivity among other properties, but you can also use a different collation at run time; I actually did not know that. Ben shows two examples:

SELECT *
FROM MyTable
WHERE Col3 COLLATE SQL_Latin1_General_CP1_CS_AS LIKE '%foo%'
--- or ---
CREATE VIEW MyTableCS AS
SELECT Col1, Col2, Col3 COLLATE SQL_Latin1_General_CP1_CS_AS as Col3
FROM MyTable
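A quick way to see the effect without any table is to compare two literals under each collation. This is a sketch that uses the case-insensitive counterpart of the collation above for contrast:

```sql
-- With a CI (case-insensitive) collation the strings compare as equal;
-- with a CS (case-sensitive) collation they do not.
SELECT
    CASE WHEN 'Foo' = 'foo' COLLATE SQL_Latin1_General_CP1_CI_AS
         THEN 'equal' ELSE 'different' END AS case_insensitive,
    CASE WHEN 'Foo' = 'foo' COLLATE SQL_Latin1_General_CP1_CS_AS
         THEN 'equal' ELSE 'different' END AS case_sensitive
```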

This reminds me that I should blog more about SQL ... you may expect some coming soon.
