Tuesday, December 18, 2007

Passing a Table to A Stored Procedure

Introduction
Most of us try to be very deliberate about designing database code (stored procedures, functions, views, etc.) in a re-usable and manageable manner. This is particularly important when the application is large. Code for common functionality should be identified and moved to a function that can be called from different parts of the application. The same may be done with views, stored procedures, and so on. Designing the code in this manner increases its manageability as well as its re-usability, and thus gives better productivity and fewer bugs.
Sometimes, while attempting to achieve the above, we come across hurdles due to the limitations of TSQL. At times TSQL does not give us quite the same freedom as other application development platforms. In this article, I present such a case, where a re-usable stored procedure is created to which a table can be passed as an argument.
The Problem
Let us say, we are working on an Inventory Management System. When a transaction (sales order, invoice, receipt of goods, inventory adjustment etc) takes place, we need to update the available inventory of the items affected by the transaction. We already have a stored procedure to save/update each transaction. Each of those stored procedures needs to update the inventory of all the items affected by the current transaction.
Please note that, the word 'Transaction' above, does not refer to Database Transactions. They refer to the various Inventory Operations supported by the application.
Since the inventory needs to be updated from different places, it makes sense to move that part of the code to a separate stored procedure. Then this new stored procedure needs to be called from different places from where the inventory is to be updated. So far it looks simple. But the difficult part is to pass the items to be updated.
A TABLE variable would seem to be the ideal solution. If we could pass a TABLE variable containing the list of items to be updated, the complexity could be reduced to a great extent. But SQL Server does not allow passing a TABLE variable as a parameter to a stored procedure. So what is the next option?
In this article, I present a solution to the above scenario by using XML as the format to pass a table to a stored procedure. The CALLER can transform the table (query result) into an XML variable and pass it to the stored procedure. The CALLEE can either convert the XML parameter back to a TABLE variable or directly use XQuery on the XML variable.
The Caller
The CALLER should transform the table to an XML variable. The DATA may come from a table or a query. The following example shows how to create an XML variable from the results of a query.
/*
Let us first create a sample table.
*/
CREATE TABLE [dbo].[OrderDetails](
	[OrderDetailID] [int] IDENTITY(1,1) NOT NULL,
	[ItemNumber] [varchar](20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
	[Qty] [int] NULL
) ON [PRIMARY]

/*
Populate the sample table with values
*/
INSERT INTO OrderDetails(ItemNumber, Qty)
SELECT 'A001', 10
UNION SELECT 'A002', 20
UNION SELECT 'A003', 30

/*
The query below returns the results in XML format.
*/
SELECT ItemNumber, Qty FROM OrderDetails FOR XML RAW('item'), ROOT('items')
/*
OUTPUT:

<items>
  <item ItemNumber="A001" Qty="10" />
  <item ItemNumber="A002" Qty="20" />
  <item ItemNumber="A003" Qty="30" />
</items>
*/

In the article Advanced XML Processing - II, I have presented a few detailed examples which demonstrate the different ways to generate and format query results as XML.
Now, let us assign the resultant XML value to an XML variable.

-- Declare the variable
DECLARE @x XML

-- store the results of the Query to XML variable
SET @x = (SELECT ItemNumber, Qty FROM OrderDetails FOR XML RAW('item'), ROOT('items'), TYPE)

-- select the values from the XML variable (to make sure that we did it correctly)
SELECT
	x.item.value('@ItemNumber[1]', 'VARCHAR(20)') AS ItemNumber,
	x.item.value('@Qty[1]', 'INT') AS Qty
FROM @x.nodes('//items/item') AS x(item)

At this stage, we have an XML variable ready, which we could pass to a child procedure/function. The XML variable contains the values that we want the child procedure/function to process/update. The child procedure can either transform the XML variable back to a TABLE or it can directly read the values from the XML variable.
The Callee
So far, we have seen how to create an XML variable from the results of a query. This XML variable can be passed to another stored procedure which can update the inventory data based on the item information passed to the procedure. The simplest way is to create a wrapper view around the XML variable and use it as if it is a table.
Let us create another sample table, Inventory, which will be updated with the information passed through the XML parameter. The following script will create the sample table.

CREATE TABLE [dbo].[Inventory](
	[InventoryID] [int] IDENTITY(1,1) NOT NULL,
	[ItemNumber] [varchar](20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
	[Stock] [int] NULL
) ON [PRIMARY]

INSERT INTO Inventory (ItemNumber, Stock)
SELECT 'A001', 0
UNION SELECT 'A002', 0
UNION SELECT 'A003', 0


The following sample code shows the implementation needed at the side of the 'callee'.

CREATE PROCEDURE [dbo].[UpdateInventory1]
(
	@x XML
)
AS

SET NOCOUNT ON

/*
The code below creates a wrapper view around the XML variable and updates the
"Inventory" table with the information.
*/
UPDATE inv SET
	stock = inv.stock + v.Qty
FROM Inventory inv
INNER JOIN (
	SELECT
		x.item.value('@ItemNumber[1]','varchar(20)') AS ItemNumber,
		x.item.value('@Qty[1]','INT') AS Qty
	FROM @x.nodes('//items/item') AS x(item)
) v ON (v.ItemNumber = inv.ItemNumber)

RETURN


Execute

Let us execute the procedure now. Run the following code.
-- Declare the variable
DECLARE @x XML

-- store the results of the Query to XML variable
SET @x = (SELECT ItemNumber, Qty FROM OrderDetails FOR XML RAW('item'), ROOT('items'), TYPE)

-- execute the stored procedure
EXECUTE UpdateInventory1 @x

-- review the results
SELECT * FROM inventory

Updated Procedure

The sample code above creates a wrapper view around the XML variable. This is a pretty simple and straight-forward approach. You can still access the values as if they were coming from a table or view; the complexity of XML processing is absorbed in the inner view.
The example below demonstrates another syntax, which updates the table directly from the XML variable.

CREATE PROCEDURE [dbo].[UpdateInventory2]
(
	@x XML
)
AS

SET NOCOUNT ON

/*
This version of the stored procedure has a slightly enhanced version of the
TSQL code. This version updates the table directly from the XML variable,
rather than converting the XML data to a view.
*/
UPDATE inv SET
	stock = inv.stock + x.item.value('@Qty[1]','INT')
FROM Inventory inv
INNER JOIN @x.nodes('//items/item') x(item) ON
	(x.item.value('@ItemNumber[1]','varchar(20)') = inv.ItemNumber)

RETURN

Conclusions
In the past few years, I have several times come across situations where I needed a way to pass a table to a function or stored procedure. Since SQL Server does not allow passing a TABLE variable to a function or stored procedure, the only way I could make it work is the approach presented above. There may be other ways to get this done too. It is apparent that there will be a small performance penalty in doing this. I did not do extensive tests to see if there is a performance problem, but I have not noticed any performance issues in my applications so far.

Login creation in SQL Server 2005 for SQL Server Authentication Mode

Introduction

There are two types of authentication modes in SQL Server.
1. Windows Authentication Mode
2. SQL Server and Windows Authentication Mode

Before we proceed, a few questions come to mind. They are as follows:

What is the meaning of authentication?
Authentication means identifying whether the visitor/user has the privileges or rights to access something or not.

Why do we need authentication?
We need authentication to protect our information/secrets from anonymous user access. We can limit visitor/user access by applying different authentication rules.

Many people don't know the difference between Windows Authentication Mode and SQL Server and Windows Authentication Mode (Mixed Mode). They often don't know how to create logins and work with them.

In Windows Authentication Mode, only Windows users (operating system users) can access SQL Server.


In SQL Server and Windows Authentication Mode, both Windows (operating system) users and SQL Server users can access SQL Server.

Description
In this article, I am going to explain how to create a SQL Server user/login and work with it.


Steps to create and use User/Login in SQL Server 2005

Open SQL Server 2005



Select Windows Authentication from the Authentication drop-down and press the Connect button.


Right-click on the SQL Server name on your computer (e.g. "techniqu-6ded0c" in this case) and click on Properties.


After clicking on Properties this screen will open


Click on Security and the screen will change like this.


On this screen, select SQL Server and Windows Authentication Mode and press the OK button. Now both Windows and SQL Server users/logins will be able to access SQL Server.
Note: If you leave Windows Authentication Mode selected, SQL Server users/logins will not be able to access SQL Server.

After this, restart SQL Server 2005 and it will come back in the same state as when we opened it. Here the first phase completes, and now let's move on to the second phase.

Expand the Security folder in the left panel, right-click on the Logins folder and click on New Login. This screen will open.


Enter the Login Name
Click on SQL Server Authentication
Enter both passwords
Uncheck the Enforce password policy checkbox (if you want to use this, then give a strong/complex password)
When you complete this, click on Server Roles and it will change like this.


Here you can define the roles of this user. Tick the sysadmin checkbox, or tick the checkboxes according to your requirements for user roles. When complete, click on User Mapping and it will change like this.



Set each database (master, model, etc.) role to db_owner and public only, or tick the checkboxes according to your requirements for the user role. When done, click on Status and it will change like this.


Click on Grant and Enabled only and press the OK button. You will receive a "User/Login created successfully" message. If it returns an error, then you are missing something in these steps.

Close SQL Server Management Studio and open it again, and now connect with this newly created user/login.


This is how a login is created and used in SQL Server 2005.
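For those who prefer scripting, here is a minimal T-SQL sketch that is roughly equivalent to the GUI walkthrough above. The login name, password, and database name are placeholders only; adjust the server roles and database mappings to your own requirements.

-- A sketch only: MyAppLogin, the password, and MyDatabase are placeholders.
CREATE LOGIN MyAppLogin
    WITH PASSWORD = 'Str0ng!Passw0rd',
         CHECK_POLICY = OFF   -- same effect as unchecking "Enforce password policy"

-- Server Roles step: add the login to a fixed server role if required.
EXEC sp_addsrvrolemember 'MyAppLogin', 'sysadmin'

-- User Mapping step: map the login to a user in a database and grant a role.
USE MyDatabase
CREATE USER MyAppLogin FOR LOGIN MyAppLogin
EXEC sp_addrolemember 'db_owner', 'MyAppLogin'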


Conclusion

Sometimes a graphical representation of things, ideas, or processes gives a better description and elaboration of the requirements.

Thursday, November 22, 2007

Difference between Truncate and Delete

Introduction
Hi to all, and here I am with a new article. The topic of this article is "The Difference between Truncate and Delete".
Truncate and Delete are both used to delete data from a table. Both commands delete only the data of the specified table; they cannot remove the whole table (data along with structure). So while both SQL statements delete only the data from the table, they differ from each other in many aspects such as syntax, performance, resource usage, etc., so let's take a look at both of these commands.
Truncate
The TRUNCATE command in SQL removes all rows from a table without logging the individual row deletions in the transaction log. The TRUNCATE statement has the same functionality as the DELETE statement in that it deletes the data from the table without modifying or deleting the structure of the table; however, you can't use the WHERE clause with the TRUNCATE statement.
The syntax for this statement is:
TRUNCATE TABLE [ { database_name.[ schema_name ]. | schema_name . } ] table_name
Table_name: The name of the table to truncate, or from which all rows are removed.
To execute it, use a query like the one below:
TRUNCATE TABLE authors
The above command will delete all data from the authors table.
Delete
The DELETE command in SQL also removes rows from a table, but it logs the individual row deletions in the transaction log. You can also use the WHERE Clause with the DELETE statement to qualify which rows are to be deleted.
Here I am showing just the simple syntax. For a more detailed explanation of the DELETE syntax, visit this link: DELETE (Transact-SQL)
Syntax:
DELETE FROM [ { database_name.[ schema_name ]. | schema_name . } ] table_name
Database_name: The name of the database in which the table exists. This is optional. If it is not included, the current database context is assumed.
Schema_name: The name of the schema in which the table exists. This is optional. If it is not included, the default schema of the current user is assumed.
Table_name: The name of the table from which rows are deleted.
A simple command looks like this query:
DELETE FROM authors
The above query will delete all data from the authors table.
In DELETE statements you can limit your DELETE query using the WHERE clause, to delete only the particular records that fulfill the condition of the WHERE clause. In this case, only those records matching the WHERE clause will be deleted, not all records. A limited DELETE query is shown below.
DELETE FROM authors WHERE AuthorId IN (1,2,3)
This statement only deletes rows from the authors table which have author ids of 1, 2, or 3.
The Differences between Truncate and Delete
Above, you have seen the DELETE and TRUNCATE statements. The two statements are similar, but many differences exist between them. Those similarities and differences are explained below:
TRUNCATE and DELETE remove the data, not the structure
Both commands remove rows from a table, but the table structure and its columns, constraints, indexes, and so on remain. To remove the table definition in addition to its data, use the DROP TABLE statement.
Conditional deletion of data
Conditional deletion of data means that not all rows are deleted. Let's suppose I have a table authors, and from this table I want to delete the authors that are living in Australia. Let's examine our options for doing this with each command.
TRUNCATE - In the case of the TRUNCATE command we can't perform conditional deletion, because no WHERE clause is allowed with this command.
DELETE - The DELETE command provides conditional deletion of data from the table using the WHERE clause.
Delete and Truncate both are logged operations:
In most of the articles I have read on the Internet, I have seen this written: "delete is a logged operation and truncate is not a logged operation", meaning that when we run the delete command it logs (records) information about the deleted rows, and when we run the truncate command it doesn't log any data. But this is not true; truncate is also a logged operation, just in a different way. It uses fewer system and transaction log resources than delete. The TRUNCATE command uses minimal logging, which is why it is faster than delete. So both delete and truncate are logged operations, but they work differently, as shown below.
DELETE is a logged operation on a per-row basis. The DELETE statement removes rows one at a time and records an entry in the transaction log for each deleted row. So if you are deleting a huge number of records, it can cause your transaction log to grow. This means the deletion of a huge number of records will use more server resources, as it logs each and every row that is deleted, and your transaction log will grow very rapidly. Since the delete statement records each deleted row, it is also slower. Some people ask: if this is done for each row, then why doesn't Microsoft modify the delete statement to not record each deleted row? The answer is that when you run your databases in full recovery mode, detailed logging is necessary for SQL Server to be able to recover your database to the most recent state.
TRUNCATE logs the deallocation of the data pages in which the data exists. TRUNCATE is faster than DELETE due to the way TRUNCATE "removes" rows from the table. It won't log the deletion of each row; instead it logs the deallocation of the data pages of the table. The TRUNCATE statement removes the data by deallocating the data pages used to store the table data and records only the page deallocation in the transaction log. Actually, TRUNCATE does not remove data, but rather deallocates whole data pages and removes pointers to indexes. The data still exists until it is overwritten or the database is shrunk. This action does not require a lot of resources and is therefore very fast. It is a common mistake to think that TRUNCATE is not logged. This is wrong. The deallocation of the data pages is recorded in the log file. Therefore, "Books Online (BOL)" refers to TRUNCATE operations as "minimally logged" operations. You can use TRUNCATE within a transaction, and when this transaction is rolled-back, the data pages are reallocated again and the database is again in its original, consistent state.
To be more specific, let's take a look at an example which will show you that truncate is also a logged operation.
/* Create a dummy table in a non-production (dummy) database. */
CREATE TABLE tranTest
(
Id int Identity(1,1),
Name Varchar(100)
)
/*Now insert the records in the tranTest table.*/
INSERT INTO tranTest(Name) VALUES('prashant')
INSERT INTO tranTest(Name) VALUES('prateek')
INSERT INTO tranTest(Name) VALUES('praveen')
INSERT INTO tranTest(Name) VALUES('prakash')
INSERT INTO tranTest(Name) VALUES('prabhat')
/* As you know, delete is a logged operation, which means that
in a transaction, if we roll back the transaction after deleting
the records from the table, it will restore all deleted records. */
BEGIN TRAN
DELETE FROM tranTest
SELECT * FROM tranTest
ROLLBACK
SELECT * FROM tranTest
/*Now delete all records from the table.*/
DELETE FROM tranTest
/*And Insert new fresh records in the table.*/
INSERT INTO tranTest(Name) VALUES('prashant')
INSERT INTO tranTest(Name) VALUES('prateek')
INSERT INTO tranTest(Name) VALUES('praveen')
INSERT INTO tranTest(Name) VALUES('prakash')
INSERT INTO tranTest(Name) VALUES('prabhat')
/* Now, since truncate is also a logged operation, it must likewise
restore all records that are deleted from the tranTest table in
the transaction below. */
BEGIN TRAN
TRUNCATE TABLE tranTest
SELECT * FROM tranTest
ROLLBACK
SELECT * FROM tranTest
So when we run the code above, TRUNCATE also restores all the records, which means the TRUNCATE operation is also being logged somewhere, just with minimal resources. Here, "minimal resources" means (as I understand it) that you can roll back the TRUNCATE within that open transaction, but once the transaction is committed or the connection is closed, the deallocation is final. DELETE, on the other hand, writes enough to the transaction log for the deleted rows to be recovered later as well.
Behavior of Delete and Truncate for identity columns
OK, now the case of identity columns. The TRUNCATE and DELETE commands behave differently with identity columns. When we use truncate, it resets the counter used by the identity column for new rows to the seed value defined for the column. DELETE, on the other hand, does not reset the counter of your identity column; it keeps the same counter for new rows. In both cases, if no seed was defined, the default value 1 is used. Because TRUNCATE resets the identity column counter, in cases where you want to retain the identity counter, use DELETE instead of TRUNCATE.
Why do these two behave differently? I don't know, but T-SQL provides you two options that you can use as needed. If you want all the data in the table deleted and the counter restarted from 1, then truncate can help you. If you want to delete all the records but don't want to reset your counter, then delete is there for you.
An example of this is below:
/* First, if you already created the tranTest table, then delete it from your database. */
DROP Table tranTest
/* Then Create a dummy table in a non-production(dummy) database with an Identity column
*/
CREATE TABLE tranTest
(
Id int Identity(1,1),
Name Varchar(100)
)
/* Now insert the records in the tranTest table. */
INSERT INTO tranTest(Name) VALUES('prashant')
INSERT INTO tranTest(Name) VALUES('prateek')
INSERT INTO tranTest(Name) VALUES('praveen')
INSERT INTO tranTest(Name) VALUES('prakash')
INSERT INTO tranTest(Name) VALUES('prabhat')
/* If you run the query below you'll see that the
max value of the identity column is 5, because we have
inserted only five records into this table. */
SELECT * FROM tranTest
/* Now delete all the records from the table using the delete command. */
DELETE FROM tranTest
/* Now, by running the above command, your table is empty,
so insert 5 new records into the table to see whether, after
using the delete command, the identity starts from 1 or from 6. */
INSERT INTO tranTest(Name) VALUES('prashant')
INSERT INTO tranTest(Name) VALUES('prateek')
INSERT INTO tranTest(Name) VALUES('praveen')
INSERT INTO tranTest(Name) VALUES('prakash')
INSERT INTO tranTest(Name) VALUES('prabhat')
/* After running the query below you'll see that your identity now
starts from 6, because delete does not reset the counter of your
identity column. */
SELECT * FROM tranTest
/* Now drop your table again and create it again. */
DROP TABLE tranTest

/* Create a dummy table again in a non-production(dummy) database
to see the effect of truncate command on identity columns */
CREATE TABLE tranTest
(
Id int Identity(1,1),
Name Varchar(100)
)
/* And insert fresh new records into the table. */
INSERT INTO tranTest(Name) VALUES('prashant')
INSERT INTO tranTest(Name) VALUES('prateek')
INSERT INTO tranTest(Name) VALUES('praveen')
INSERT INTO tranTest(Name) VALUES('prakash')
INSERT INTO tranTest(Name) VALUES('prabhat')
/* Now at this point the counter of this table's
identity column is 5, which is the max value of the Id column. */
SELECT * FROM tranTest
/* Then truncate the table. */
TRUNCATE TABLE tranTest
/* and insert new records */
INSERT INTO tranTest(Name) VALUES('prashant')
INSERT INTO tranTest(Name) VALUES('prateek')
INSERT INTO tranTest(Name) VALUES('praveen')
INSERT INTO tranTest(Name) VALUES('prakash')
INSERT INTO tranTest(Name) VALUES('prabhat')
/* Now you'll see that after truncating the table the
identity is reset to its seed value. */
SELECT * FROM tranTest
/* So this example explains the behaviour of both of these commands for identity columns. */
TRUNCATE is a DDL command whereas DELETE is a DML command
This is also a common difference you might have read in many articles: TRUNCATE is a DDL (data definition language) operation and DELETE is a DML (data manipulation language) operation. Yes, according to SQL Server it's true. But why is it so, why is TRUNCATE DDL and DELETE DML? Let's look at this.
When we run the TRUNCATE command it puts a "Schema modification (Sch-M)" lock on the table. What is "schema modification (Sch-M)"?
The Database Engine uses schema modification (Sch-M) locks during a table data definition language (DDL) operation, such as adding a column or dropping a table. During the time that it is held, the Sch-M lock prevents concurrent access to the table. This means the Sch-M lock blocks all outside operations until the lock is released. Now you'll ask: how can it block any modification to the table, when in the case of TRUNCATE we seem to be performing a modification because we are deleting data? But deleting the data is only one side of the coin. As you read above, internally TRUNCATE doesn't remove the data; rather it deallocates the data pages. Because TRUNCATE doesn't perform any data modification in the table, the DELETE trigger is not called. So we are not modifying the data of the table, BUT, as you know, TRUNCATE resets the identity counter of the column in the table, which means TRUNCATE is modifying the table definition or structure, and that comes under DDL operations. Also, while you are truncating a table, you can't modify or add any data to the table. So, to be a DDL operation, an operation has to fulfill conditions such as those written below:
Modifying a table structure or definition comes under DDL operations, and
When you are modifying the table structure, you can't access the table to do any data modification.
Since TRUNCATE is doing all the activities above, that proves that TRUNCATE is a DDL operation.
Now we move to the DELETE command. In the case of the DELETE command I am not sure which lock is taken, but as we know (and as you can read above) the DELETE command deletes rows one by one. It modifies the data by deleting it from the table, and because DELETE performs data modifications, the DELETE trigger is called. The DELETE command does not modify the table structure in any manner, unlike TRUNCATE, which modifies the identity column by resetting its value.
To be a DML operation, an operation has to fulfill conditions such as those written below:
Modifying the table data.
While you are modifying the table data, you can't perform any table structure modification on the table.
Here the DELETE command is modifying the data of the table, and while the delete statement is running you can't modify the table structure. So we can say that DELETE is a DML operation.
Behavior of Truncate and Delete for Triggers
Triggers are an important topic in SQL Server, and here I am talking about how TRUNCATE and DELETE behave differently with respect to triggers. As you all know, triggers fire whenever any data modification happens in the table. Because DELETE removes rows from the table, the DELETE trigger (whether INSTEAD OF or AFTER) will fire if present; the INSERT and UPDATE triggers will not be fired. TRUNCATE, as explained below, does not fire the DELETE trigger at all.
So let's take both one by one:
TRUNCATE - When we run the TRUNCATE command to delete all the rows of a table, it actually doesn't remove any rows; rather it deallocates the data pages. So in the case of the TRUNCATE command, triggers will not be fired, because no row-level modification takes place. As we know, TRUNCATE is a DDL command, and DDL commands don't modify your data; instead they modify your table structure and definition.
DELETE - In the case of DELETE, the DELETE trigger will be fired if present, and both INSTEAD OF and AFTER triggers defined for DELETE will be fired as well. As we know, the delete command is a DML command and it deletes the data on a row-by-row basis, which means delete is modifying the data by deleting it from the table; and a trigger fires whenever a DML operation on the table takes place.
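To see this difference for yourself, here is a small hedged sketch (the table and trigger names are made up for illustration) that you can run in a scratch database:

-- Create a throwaway table and an AFTER DELETE trigger that just prints a message.
CREATE TABLE dbo.TriggerTest (Id int IDENTITY(1,1), Name varchar(100))
GO
CREATE TRIGGER trg_TriggerTest_Delete ON dbo.TriggerTest
AFTER DELETE
AS
    PRINT 'DELETE trigger fired'
GO
INSERT INTO dbo.TriggerTest (Name) VALUES ('row one')
INSERT INTO dbo.TriggerTest (Name) VALUES ('row two')

DELETE FROM dbo.TriggerTest        -- prints 'DELETE trigger fired'

INSERT INTO dbo.TriggerTest (Name) VALUES ('row three')
TRUNCATE TABLE dbo.TriggerTest     -- no message: the trigger does not fire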
Where we can use these statements
There are some restrictions on the use of both of these statements, as follows:
For Delete
The DELETE statement may fail if it violates a trigger or tries to remove a row referenced by data in another table with a FOREIGN KEY constraint. If the DELETE removes multiple rows, and any one of the removed rows violates a trigger or constraint, the statement is canceled, an error is returned, and no rows are removed.
For Truncate
You cannot use TRUNCATE TABLE on tables that:
Are referenced by a FOREIGN KEY constraint.
Participate in an indexed view.
Are published using transactional replication or merge replication.
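The foreign key restriction is easy to demonstrate. The following is a hedged sketch (the table names are invented for the example): even when the referencing table is empty, TRUNCATE on the referenced table fails, while DELETE succeeds.

CREATE TABLE dbo.Customers (CustomerId int PRIMARY KEY, Name varchar(100))
CREATE TABLE dbo.Orders
(
    OrderId    int PRIMARY KEY,
    CustomerId int FOREIGN KEY REFERENCES dbo.Customers(CustomerId)
)

-- Fails with an error, because Customers is referenced by a FOREIGN KEY constraint.
TRUNCATE TABLE dbo.Customers

-- Works (as long as no Orders rows actually reference the deleted customers).
DELETE FROM dbo.Customers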
Permissions for performing the TRUNCATE or DELETE operation
To use either statement you need certain permissions on the server, which decide whether you can perform the Delete or Truncate action on the table or not. To truncate a table, you need at least ALTER permission on the table (which is granted by default to the table owner, members of the sysadmin fixed server role, and the db_owner and db_ddladmin fixed database roles). To delete rows using the delete statement, you only need the DELETE permission.
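As a hedged illustration (the user and table names are placeholders), the minimum grants for each operation would look something like this:

-- Enough to run DELETE FROM dbo.Inventory
GRANT DELETE ON dbo.Inventory TO MyAppUser

-- Enough to run TRUNCATE TABLE dbo.Inventory
GRANT ALTER ON dbo.Inventory TO MyAppUser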
The above is the detailed explanation of the differences between both of these statements. Please feel free to comment or ask questions about this article.

Wednesday, November 21, 2007

XML Workshop XI - Default Namespaces

Introduction
In the previous workshop (XML Workshop X) we have seen a basic introduction to XML Namespaces. We have seen examples which clearly explained why we need to have namespaces in XML. In this session we will examine XML Default Namespaces and see how to generate XML which contains default namespace definitions.

Default Namespace
A default namespace applies to all the unprefixed elements of an XML document. A default namespace declaration is usually placed at the root of an XML document. In this case, the default namespace is said to have global scope and all the unprefixed elements of the XML document will belong to the default namespace. It is also quite possible to have a default namespace defined for a specific XML element. In that case, the scope of the namespace declaration lasts up to the closing tag of the element. Let us look at a few examples.
Here is the XML that we generated in the previous lab.

<Configuration
    xmlns:db="urn:www.dotnetquest.com/DbConnection"
    xmlns:net="urn:www.dotnetquest.com/NetConnection">
  <net:Connection>
    <net:Provider>World Wide Internet Providers</net:Provider>
    <net:Speed>512 KBPS</net:Speed>
  </net:Connection>
  <db:Connection>
    <db:Provider>SQL Client Provider</db:Provider>
    <db:Protocol>TCP/IP</db:Protocol>
    <db:Authentication>Windows</db:Authentication>
  </db:Connection>
</Configuration>
Given below is the equivalent version of the same XML which uses a default namespace.

<Configuration
    xmlns="urn:www.dotnetquest.com/DbConnection"
    xmlns:net="urn:www.dotnetquest.com/NetConnection">
  <net:Connection>
    <net:Provider>World Wide Internet Providers</net:Provider>
    <net:Speed>512 KBPS</net:Speed>
  </net:Connection>
  <Connection>
    <Provider>SQL Client Provider</Provider>
    <Protocol>TCP/IP</Protocol>
    <Authentication>Windows</Authentication>
  </Connection>
</Configuration>
The above XML is equivalent to the previous one that we have seen. The only difference is that the new version of the XML makes use of a default namespace declaration. Let us modify the TSQL query that we created in the previous lab so that it will generate the XML structure with a default namespace declaration. If you have not done the previous lab, you need to create the sample tables and populate them. You can find the script here. The following query will generate the XML structure that we just discussed.

WITH XMLNAMESPACES
(
    -- This is the default namespace
    DEFAULT 'urn:www.dotnetquest.com/DbConnection',
    'urn:www.dotnetquest.com/NetConnection' AS net
)
SELECT
    net.Provider AS 'net:Connection/net:Provider',
    net.Speed AS 'net:Connection/net:Speed',
    -- we don't need the prefix any more
    db.Provider AS 'Connection/Provider',
    db.Protocol AS 'Connection/Protocol',
    db.[Authentication] AS 'Connection/Authentication'
FROM NetConnection net
CROSS JOIN DbConnection db
FOR XML PATH('Configuration')
Reading values
Now let us see how to read values from an XML variable which contains namespace information. In the previous labs we have seen several examples of reading values from XML variables and columns. We have not seen any example with namespace information so far. Here is a query that reads values from an XML variable which contains namespace information.

declare @x xml
set @x = '
<Configuration
    xmlns:db="urn:www.dotnetquest.com/DbConnection"
    xmlns:net="urn:www.dotnetquest.com/NetConnection">
  <net:Connection>
    <net:Provider>World Wide Internet Providers</net:Provider>
    <net:Speed>512 KBPS</net:Speed>
  </net:Connection>
  <db:Connection>
    <db:Provider>SQL Client Provider</db:Provider>
    <db:Protocol>TCP/IP</db:Protocol>
    <db:Authentication>Windows</db:Authentication>
  </db:Connection>
</Configuration>
'
-- read values from the XML variable
SELECT
    x.c.value(
        'declare namespace net="urn:www.dotnetquest.com/NetConnection";
        (net:Connection/net:Provider)[1]', 'varchar(max)')
        AS NetProvider,
    x.c.value(
        'declare namespace net="urn:www.dotnetquest.com/NetConnection";
        (net:Connection/net:Speed)[1]', 'varchar(max)')
        AS Speed,
    x.c.value(
        'declare namespace db="urn:www.dotnetquest.com/DbConnection";
        (db:Connection/db:Provider)[1]', 'varchar(max)')
        AS DbProvider,
    x.c.value(
        'declare namespace db="urn:www.dotnetquest.com/DbConnection";
        (db:Connection/db:Protocol)[1]', 'varchar(max)')
        AS Protocol,
    x.c.value(
        'declare namespace db="urn:www.dotnetquest.com/DbConnection";
        (db:Connection/db:Authentication)[1]', 'varchar(max)')
        AS [Authentication]
FROM @x.nodes('/Configuration') x(c)
Using WITH XMLNAMESPACES you can make this query simpler. Here is a different syntax which produces the same results, but uses WITH XMLNAMESPACES.

declare @x xml
set @x = '
<Configuration
    xmlns:db="urn:www.dotnetquest.com/DbConnection"
    xmlns:net="urn:www.dotnetquest.com/NetConnection">
  <net:Connection>
    <net:Provider>World Wide Internet Providers</net:Provider>
    <net:Speed>512 KBPS</net:Speed>
  </net:Connection>
  <db:Connection>
    <db:Provider>SQL Client Provider</db:Provider>
    <db:Protocol>TCP/IP</db:Protocol>
    <db:Authentication>Windows</db:Authentication>
  </db:Connection>
</Configuration>
'
-- read values from the XML variable
;WITH XMLNAMESPACES
(
    'urn:www.dotnetquest.com/NetConnection' AS net,
    'urn:www.dotnetquest.com/DbConnection' AS db
)
SELECT
    x.c.value(
        '(net:Connection/net:Provider)[1]', 'varchar(20)')
        AS NetProvider,
    x.c.value(
        '(net:Connection/net:Speed)[1]', 'varchar(10)')
        AS Speed,
    x.c.value(
        '(db:Connection/db:Provider)[1]', 'varchar(20)')
        AS DbProvider,
    x.c.value(
        '(db:Connection/db:Protocol)[1]', 'varchar(10)')
        AS Protocol,
    x.c.value(
        '(db:Connection/db:Authentication)[1]', 'varchar(10)')
        AS [Authentication]
FROM @x.nodes('/Configuration') x(c)

/*
OUTPUT:

NetProvider          Speed      DbProvider           Protocol   Authentication
-------------------- ---------- -------------------- ---------- --------------
World Wide Internet  512 KBPS   SQL Client Provider  TCP/IP     Windows

(1 row(s) affected)
*/

Conclusions
This workshop focused on explaining XML namespaces. We have seen how to generate XML which contains namespace information. We then saw how to read values from an XML variable which contains namespace information.

Tuesday, October 16, 2007

Moving the SQL 2005 System Databases

A typical installation of Microsoft SQL 2005 ends up with system databases buried five levels deep under the C:\Program Files directory. Most DBAs prefer their data, including system databases, reside on a different drive than the executables. But moving the system databases, including master and the new hidden mssqlsystemresource database, is non-trivial, even scary. After doing this several times manually, I wrote a script to perform this task.

System Databases

The system databases, according to Microsoft (System Databases) include:
Master - Holds the description of other databases, logins, and master-only system tables.
Model - The snapshot of new databases that are created.
Msdb - Contains code and data for the SQL Server Agent and SQL Server Management Studio.
Tempdb - Where #tables and ##tables get created, and internal sorting space is written.
Mssqlsystemresource - Internal hidden read-only database containing system objects.
Other databases, such as distribution, may also be included depending upon your configuration. In this article, I refer only to the above five databases.
Why would you want to move these databases? A few reasons. Personally, I don't like the idea of important database changes happening somewhere in the depths of the Program Files directory. I'd rather know that all my databases are on an easy-to-find directory structure. More importantly, database storage is usually on a RAID array or SAN separate from the C: boot disk. Why not have the protection and performance those disks afford for the critical system data? If you lose your boot disk, you can reinstall the OS and SQL and, after pointing to the right locations, be back up quickly with no data loss.

A Special Database Requires Special Care

The newest member of the system databases is mssqlsystemresource. It's hard to tell exactly what it does, but Books Online (Resource Database) indicates that "SQL Server system objects, such as sys.objects, are physically persisted in the Resource database." We are also told there that the only supported operation on this database is to move it, so we're not breaking any rules! Another tidbit: this database can't be backed up. It must be copied like an EXE file.
A very important fact that is easy to miss is that this database MUST appear in the same location as the master database. I didn't notice this restriction at first, and it caused some strange errors when we upgraded to SQL 2005 SP1. It turns out the service pack upgraded an older unused copy of the resource database, but not the copy in the same location as master. To play it safe, don't keep unused copies of this database in the default location; rather, keep backup copies in a clearly marked backup directory.
In order to move this database, we'll need special startup and trace flags and SQL DDL commands. And, of course, we need to ensure the path for the master database (specified in the service startup parameters in the registry) and the location of the mssqlsystemresource.mdf and .ldf files are the same.

Steps to Moving

The steps to moving the databases are outlined well in Books Online (Moving System Databases). To move the databases, we'll need to do the following (a T-SQL sketch of step a appears after this list):

  a. Use ALTER DATABASE MODIFY FILE to tell the master database where most of the system databases will reside.

  b. Update the service startup parameters in the registry so that the service finds the master database and log.

  c. Stop the service to unlock the files.

  d. Move the files to the new location.

  e. Start the service with the -f (minimal configuration) flag and the -T3608 trace flag to prevent automatic recovery.

  f. Use ALTER DATABASE MODIFY FILE to record the new location of the mssqlsystemresource database.

  g. Move the mssqlsystemresource file.

  h. Set the mssqlsystemresource database to read only.

  i. Stop and start the SQL Server service in normal mode.
Examining The Code

Download the code


The .CMD script MUST be run on the SQL Server machine itself, and the accompanying .SQL script must live in the same directory as the .CMD script. The script requires two parameters - the instance name and the full path where the data files will be moved (leave off the trailing backslash). If using the default instance, provide MSSQLSERVER for the instance name.
After initial setup and parameter checking, SQLCMD is called with the -Q parameter to accomplish step a for model, msdb, and tempdb. Then SQLCMD is called again using the -i parameter to run the SQL script. Interestingly, though, it is called from the FOR command. This powerful but arcane command can do a lot in a batch script, but it is often far from obvious just what it's doing. In this case, FOR is used as a means to communicate from SQL back to the calling script. The script will return the old path of the system files, information we will need to move the files in a later step.
But how to get the variable from SQL to the calling batch script? This was enough of a challenge that I considered writing it all as a SQL script using SQLCMD's new parameters, but again, going from SQL @variables to SQLCMD $(variables) proved difficult. I considered writing the script in Perl, but didn't want to require everyone to download the Perl runtime. I finally settled on the ugly, but working, FOR /F command to execute SQLCMD with an input file and return its result as an environment variable that can be used later in the script. With the /F switch, FOR will run the single-quoted command specified in the parentheses, then set the batch variable (%%s) to the output of the command. Specifying -h-1 on the SQLCMD indicates no column headers and dash divider lines on output. We also use the "delims=;" option to have the batch variable read up to the first semicolon it sees, since the default action of delimiting on spaces would choke in the presence of "Program Files" and "Microsoft SQL Server".
The .SQL script first uses the xp_regread procedure to get the directory path (MSSQL.n) associated with an instance name. Notice the use of the $(InstName) variable. SQLCMD, unlike OSQL, can easily read environment variables. The value of this key tells us where to search in the registry for the SQL Server service parameters. Plugging in the values, we read the SQLArg0, SQLArg1, and SQLArg2 values that contain the -d, -l, and -e parameters. There is no set order to these parameters, so we need to iterate through them all. We obtain the old path for sending back to the calling script, replace it with the new path, and call the xp_regwrite procedure to write it out. The script can then proceed in a straightforward way to accomplish steps c-i.

Caveats

The code assumes that all of the system databases are in the same location. If you've already moved tempdb, for example, you might want to comment out that portion of the code in the .CMD script. The assumption is also made that the -d and -l parameters occur within the first three parameters of the service startup. It's not likely, but other options such as trace flags could be specified before these parameters and break this part of the code. Before running this code, always back up your data files. Get a screen print of the Advanced tab in the SQL Server Configuration Manager, so that you can restore the registry values manually if necessary. Run it on a test server first if at all possible. If you don't have a test server, consider installing the server on your workstation to test it out.

Conclusion

Moving the system databases offers some real advantages but needs to be done carefully and consistently. The script presented here can help to accomplish this, particularly when run soon after installation of a new SQL instance.

Monday, October 1, 2007

Working Your SOX Off

Introduction
It used to be a mainframe joke. "What is a SOC4?" people would ask when the error came up. The answer: "To put on your foot, of course". Now, however, SOX is a word that strikes terror in the hearts of IT people everywhere.
In 2002, the United States government passed a law called the Sarbanes-Oxley Act. Nicknamed SOX (or Sarbox by some), this act has confused many a manager and implementation guru because of its broad, open-to-interpretation language. Having just gone through a major implementation at my workplace, I understand the worry and fear that accompanies this law. I hope my own experiences can shed a little light on this major, and often misunderstood, project.
If you are in the United States, or do business in the United States, and your company is over a certain size, you have to abide by a lot of regulatory laws - privacy, financial and documentation. SOX is a combination of financial and documentation. It consists of several different Titles, each of which establishes just what is legal and illegal under the new act, but the gist of what applies to the IT world is the word "accountability".
Accountability in this context means documentation of established procedures, automation of reporting and tasks wherever possible, and the ability to prove to an independent auditor that documented procedures are followed all the time. This includes financial reporting. Contrary to popular belief (and I was surprised by this one too), not all financial reporting has to be completely automated. There are instances where you can have financials in a spreadsheet that the accountants manipulate and change. The trick is to have this information documented. Always remember it's about "What, Why, Who and How".
The best way to start with a SOX implementation is to do a discovery phase. Find out what processes you currently have documented. Then list the ones you don't have documented. Processes include --but are not limited to-- security, access, permissions, reports, responsibilities and documentation location. I'll go through each of these one at a time.

Security
This goes hand in hand with access and permissions. You need to list every single possible database, application, file share and server that you have control over. Then you need a list of who has any permissions or access to these items. This includes listing out Windows Groups (not the individual members), Windows Users, SQL Logins, or anyone who has local admin access to a Server.
Now there are certain circumstances where you won't have complete control over your security. For instance, if you're a DBA who does soft Admin tasks on a server where another team controls the hardware, you might not be able to keep them from adding groups or users to the Server's OS. Don't sweat it. Just find out the name of the team in question and include a notation in your documentation that says what you are responsible for and that Team X also has the ability to add logins to Server ABC.
Access & Permissions
Here we go with the hand in hand stuff I mentioned earlier. On the security list you just created, you need to include the access levels and permissions of each individual user / group. Since access and permissions aren't necessarily the same thing, I've listed them both.
Remember how Santa Claus checks his list twice? Creating the security / access / permissions list is just the beginning. Once you have it, you have to make a commitment (and document that commitment) to check the list on a regular basis, matching it against the reality of what security is actually implemented. You also have to fix problems as you find them, and have a documented process for what to do, and when, for the fixes.
Lastly, you have to have a documented process for granting permissions/access when someone joins the team or company, and for removing those permissions when someone leaves the team or company. And most important, you have to prove that you follow those procedures!
Here's a real life example. In my workplace, I wrote up the policy dictating who got what permissions in which SQL environments (and on which file shares), while a co-worker created a SQL job that appends the current S/A/P settings to an Excel sheet on a weekly basis. We double-check that sheet each Monday to see who has what and whether anything violates the written policy. We then investigate how that permission got there (including the who) and then remove the permission. Sometimes, we even remove the permissions of the person who granted that access, if it turns out that grantor is doing things outside the scope of his/her job duties. Lastly, we document those violations for later reference and tell the boss (so he doesn't get blindsided when someone asks him about the violation).

Reports
This may sound silly, but SOX requires you to document your reports. The documentation for this should include the business rules used in generating the reports and, if necessary, an explanation of the math behind it. Also, if you have financial reports being generated by hand (data entry into a spreadsheet anyone?), you have to automate as much of the report as possible. Hand-keyed financials are no longer acceptable to the Feds unless the data in question is something that changes constantly, like an interest rate.
This is important to note. Not everything has to be automated when it comes to financial reporting. Just as much as possible. Then you have to document why you didn't automate something, what process is to be followed in the manual entry, who can make those manual entries and how those entries are to be made. And, again, you have to be able to prove to the auditor that someone isn't just making up numbers.
The general rule for this one is: start with the simple reports (working your way to the more complicated ones) and automate as many of them as you can. But make sure you write requirements for those reports as you work, or an auditor may come along and make you redo the whole process.

Responsibilities
Make sure you document who is responsible for what area of SOX implementation and follow-up. For example: if you have an internal Help Desk who is responsible for setting up Windows Logins, group memberships and permissions, don't forget to include them in your documentation (and your project). When asked, you need to be able to tell an auditor that it was the Help Desk that gave that user Domain Admin permissions.
And the auditors will ask. They want to see if you actually know the SOX processes as this gives them a good idea if you're even following them. After all, if you don't know the processes, chances are you aren't following them because you can't answer their questions.
So make sure to integrate your SOX documentation with that of all affected teams, to get their input and to document everyone's roles and responsibilities for tracking SOX violations. Doing this before the auditors show up will make your life that much easier.
Documentation Location
Last but not least, you have to document your documentation location.
Say what??? Yep, as strange as it sounds, you have to put down in your SOX documentation where you plan to store (electronically & physically) all the stuff you just wrote up. You also have to publish those locations to all affected team members. Auditors do actually ask you where your documents are and want you to prove you can get to them. And if you forget to tell one person about it, you get dinged severely in the audit.
Implementation Advice
Not all companies have to implement SOX. If you're a small business under a certain number of employees (and I don't know the number off the top of my head), you are exempt from SOX. It's always good to implement anyway under the notion that you'll improve your company security. And, if your company grows, you won't have to do it later on when life gets more complicated. Finally, when it comes to actually doing the project, there are a few "shortcuts" that might be available to you.
First, get your lists together of what might be affected. Meet with the entire team to weed out anything that doesn't really need to be done or add things that have been forgotten. Get a Project Manager to write up a project timeline and assign priorities to the different tasks. This is the most important thing you can do. If you get a surprise visit from the auditors and can prove that you're at least working on the issue (show them the project plan), they probably won't ding you as badly as if you didn't have anything done and had no proof you were in the middle of implementation.
Second, if you're in a large company, find out if other departments have done their SOX implementations yet. If so, they might be able to give you advice or a sample list of what they did. This could save you some time if you know what your own auditors are looking for.
Third, call the auditors! Beat them to the punch. Tell them you're in the middle of an implementation and you're stuck. You'd like advice on what you should be checking and, if your boss doesn't mind, can they come in and give you a pseudo-audit so you have an idea of what they are looking for. Believe it or not, the auditors will be more than happy to help you out. After all, it looks good on their records if you come out as a compliance All-Star.
Fourth, while it might be better to over-document than under-document, you still have to make sure that everyone on your team understands the documents. If they don't, you've just wasted an awful lot of time for nothing.
Last, there are rumors that SOX is due to expire soon, but skimming the text of the actual law, I don't see any expiration date listed. And since SOX compliance can actually help your business more than hurt it, I would recommend going through the processes anyway. Certainly you could clear away a lot of deadwood processes simply by finding out what would need to be documented for SOX. After all, why write documentation on a legacy report with no data that no one ever looks at? Just delete the report and go on to the next thing.

Conclusion
SOX isn't actually as hard as a lot of people make it out to be. The main problem is that it is time and resource intensive, which means lots of big dollar signs if you have to pay people overtime or hire temps to help you out during the implementation. It's also very easy to let it slide when you have more pressing projects to complete. But if you document as you work on those other projects, you've just saved yourself time later down the road.

Thursday, September 20, 2007

INFORMATION_SCHEMA

The algorithm for generating the temp table name suffix changed between Shiloh and Yukon (SQL Server 2000 and SQL Server 2005, respectively). In any case, it is not based upon the session id.
I suggest you give your temp table unique prefixes and do this:
use tempdb
go
select * from INFORMATION_SCHEMA.TABLES
where TABLE_CATALOG = 'tempdb'
and TABLE_SCHEMA = USER
and TABLE_NAME like '#DIRECTORY%'
go
Note that TABLE_SCHEMA = USER only works in Shiloh. Reason - because of the user/schema separation feature. In Yukon, the TABLE_SCHEMA is really that ... the table's schema name... which might not be the same as the user name. We have real schemas now. User X can own schemas Y and Z. All schema names occupy the same namespace regardless of owner, however.
Another difference between Shiloh and Yukon is this: You cannot use 3-part names to refer to tempdb from another database context unless you are sa. You must "use" tempdb and stick to a 2-part name, as shown in the example above. This works in Yukon, however, for non-sa users.
In Summary
For Shiloh
TABLE_SCHEMA = user name
This won’t work from non-tempdb calling context unless you’re sa/dbo. You get an empty set back.
use otherdb
go
select * from tempdb.INFORMATION_SCHEMA.TABLES
go
The temp table name is formed from login time stamp + nest level.

For Yukon
TABLE_SCHEMA = schema name
This will work from non-tempdb calling context even if you are a least-privileged user. You get the rows back.
use otherdb
go
select * from tempdb.INFORMATION_SCHEMA.TABLES
go
The temp table name is formed from an internal counter.

If you want to write code that works both on Shiloh and Yukon for non-sa users, then:
a) You must "use tempdb".
b) You must use a 2-part name: SELECT * FROM INFORMATION_SCHEMA.TABLES.
c) You must assume that for Yukon customers, the schema name == user name. This will be the case for all upgraded databases. It will also hold true as long as your customers avoid the user/schema separation features, and it holds true for the old "sp_adduser" API.
d) You can enforce (c) by using DDL triggers in Yukon and doing ROLLBACKs on CREATE SCHEMA and CREATE USER statements.
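A minimal sketch of what item (d) might look like (the trigger name is invented; this is one possible reading of the suggestion, not an official recipe):

-- Database-level DDL trigger that rolls back attempts to create new schemas
-- or users, preserving the "schema name == user name" assumption in (c).
CREATE TRIGGER trg_BlockSchemaUserSeparation
ON DATABASE
FOR CREATE_SCHEMA, CREATE_USER
AS
BEGIN
    RAISERROR('CREATE SCHEMA / CREATE USER are blocked on this database.', 16, 1)
    ROLLBACK
END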

Wednesday, September 12, 2007

SQL Server 2005 UNICODE considerations.

ISV applications today often require international support. Migrating an existing non-Unicode database to Unicode is a commonly discussed topic in the ISV application space. This BLOG entry discusses the general considerations and approaches to migrate a database to Unicode with SQL Server.

SQL Server Unicode Support

SQL Server Unicode data types support UCS-2 encoding. Unicode data types store character data using two bytes for each character rather than one byte. There are 65,536 different bit patterns in two bytes, so Unicode can use one standard set of bit patterns to encode each character in all languages, including languages such as Chinese that have large numbers of characters.

In SQL Server, data types that support Unicode data are:
nchar
nvarchar
nvarchar(max) – new in SQL Server 2005
ntext
Use of nchar, nvarchar, nvarchar(max), and ntext is the same as char, varchar, varchar(max), and text, respectively, except:
Unicode supports a wider range of characters.
More space is needed to store Unicode characters.
The maximum size of nchar and nvarchar columns is 4,000 characters, not 8,000 characters like char and varchar.
Unicode constants are specified with a leading N, for example, N'A Unicode string'

Migration Considerations

Migrating a database to Unicode involves considerations beyond simply changing columns to use Unicode data types. It requires careful planning of the database and application to avoid losing or corrupting data. The discussion of application planning is outside the scope of this blog. On the database side, the things to consider before migration are:
1) Column size
2) Page size
3) Variable
4) Disk Space

Column Size
Unicode data types store each character with two bytes instead of one byte. The SQL Server column limit is 8,000 bytes for both non-Unicode and Unicode columns, so the maximum size of a Unicode column is 4,000 characters. When migrating to Unicode, any column data beyond 4,000 characters will be truncated. Column data should be scanned beforehand to prevent data loss. If a Unicode column needs to hold more than 4,000 characters, the column should be converted to nvarchar(max).
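As a hedged sketch of such a scan (the table and column names below are placeholders, not part of any real schema), a query like the following can flag values that would be truncated when a varchar(8000) column becomes nvarchar(4000):

-- Find rows whose character data would not fit in nvarchar(4000).
SELECT DocumentId, LEN(DocumentText) AS CharCount
FROM dbo.Documents
WHERE LEN(DocumentText) > 4000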
Page Size
The SQL Server page size is 8K. The size of a character column doubles when migrating to Unicode, so existing data that used to fit in an 8K page might not fit any more. In SQL Server 2000, the only way to extend a record beyond 8060 bytes of data is through a BLOB (text, image) column. With SQL Server 2005, the 8K page size is still there, but SQL Server 2005 supports large rows by allowing data in variable-size columns to overflow automatically when the record length grows beyond 8060 bytes. SQL Server 2005 has a mechanism to dynamically move data off page, or pull data back in page, as a record size increases beyond 8K or decreases to fit within an 8K page. The flexibility provided by SQL Server 2005 could mean one less consideration for Unicode migration. However, be aware that this flexibility comes with a price: constantly moving data off and on the page can affect performance. If you know a column will most likely contain a large block of data, use the nvarchar(max) data type, and use the sp_tableoption "large value types out of row" option to store the max data type out of page.
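For example (a hedged sketch; dbo.Documents is again just a placeholder table with an nvarchar(max) column), the table option mentioned above is set like this:

-- Store nvarchar(max)/varbinary(max)/xml values out of row so that large
-- values do not compete for in-row page space.
EXEC sp_tableoption 'dbo.Documents', 'large value types out of row', 1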
Variable
Nchar, nvarchar, nvarchar(max) and ntext should be used when declaring a variable for use with the Unicode schema. With a Unicode data type declaration, the column size specifier is in characters instead of bytes. Unicode literals are specified with a leading uppercase N, for example N'A Unicode string'. The N stands for National Language in the SQL-92 standard. If a query containing a non-Unicode variable or literal is executed against a Unicode schema, the variable/literal will need to be converted to Unicode during execution, causing additional execution overhead. Existing T-SQL code (stored procedures, user defined functions, check constraints, computed columns) should be reviewed to ensure 1) valid variable declaration, 2) use of the N identifier, and 3) that any byte operations are still valid.
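A minimal sketch of the difference (the table and column names below are only illustrative):

DECLARE @name nvarchar(50)   -- size specifier is 50 characters, not bytes
SET @name = N'Bücher'        -- the N prefix keeps the literal in Unicode

-- Without the N prefix the literal is non-Unicode and would have to be
-- converted to Unicode at execution time when compared to an nvarchar column.
SELECT * FROM dbo.Products WHERE ProductName = @name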
Disk Space
SQL Server uses a single byte for storing a character in a non-Unicode code page that is non-DBCS, and two bytes for Asian languages that use DBCS. For Unicode data, SQL Server uses two bytes for each character. Depending on the amount of character data and character-based indexes in the database, the size of the database could increase significantly after migrating a non-Asian database to Unicode. Sufficient disk space should be planned for the migration.
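To get a rough baseline before the migration, a quick sketch (run it in the database you plan to convert; myDb is just an illustrative name):

USE myDb
GO
-- Reports the current database size and unallocated space
EXEC sp_spaceused
GO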

Migrating to Unicode Database
There are two general approaches to migrate data from non-Unicode to Unicode: 1) Full Export and Import 2) In-place Conversion.
Full Export and Import
In most cases a full export and import of the data is needed to convert it to Unicode without data loss. The steps to migrate to Unicode using a full export and import are:
Backup the database
Scan the database for possible data truncation, column size, page size issues
Create a new database with Unicode schema (without indexes)
Export non-Unicode data
Import data into Unicode database
Create indexes on Unicode database
Review existing T-SQL code to ensure variable declaration, byte operations are still valid
To simplify the process of exporting and importing data into SQL Server, you can use SQL Server Integration Service or the SQL Server BCP utility. Both provide automatic Unicode conversion.

Using BCP

To prevent data loss when copying code page data into a Unicode database, you can specify that the data be exported and imported in Unicode format with the –N or the –w flag. The following table compares the two options.

-N: Use Unicode character format for all character data, and native (database) data types for all non-character data. This option offers higher performance than the –w option, but it limits BCP data transfers to SQL Servers with the same character set and sort order.
-w: Use Unicode character data format for all columns.

For example, to export data from the table “strTbl” using a trusted connection:

bcp myDb.dbo.strTbl out C:\strTbl.Dat -N -T

When importing the data, use BCP fast mode when possible. Fast mode can save substantial time when loading data. To take advantage of BCP fast mode, the following criteria must be met:

The “select into/bulkcopy” database option must be set
The target table should not have any indexes
The target table must not be part of the replication
Use TABLOCK to lock the target table

For example, to import data into the Unicode table “wstrTbl” using the data file exported above:
bcp myUnicodeDb.dbo.wstrTbl in C:\strTbl.Dat -N -T
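If you want to combine the TABLOCK requirement above with the import, bcp accepts it as a load hint (a sketch, using the same illustrative database and table names):

bcp myUnicodeDb.dbo.wstrTbl in C:\strTbl.Dat -N -T -h "TABLOCK"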


Using SQL Server Integration Service

If there are requirements to pre-process data before migrating data to Unicode, SQL Server Integration Service (SSIS) would be a better tool than BCP. The advantages of using SSIS over BCP are 1) better manageability and control 2) no need for intermediate disk storage for bcp files 3) user friendly GUI interface.

Using an SSIS Data Flow Task to pump data from a non-Unicode to a Unicode database is straightforward. What you should be aware of is that SSIS is more restrictive about data mapping in a Data Flow. For example, with the old SQL 2000 DTS, you can do a Copy Column Transformation from a varchar column to an nvarchar column. If the source data does not have the same code page as the source server, data could be corrupted during the data pump; DTS does not prevent that. SSIS is more proactive in preventing data corruption. With SSIS, if you attempt a Copy Column Transformation between a varchar column and an nvarchar column, you will receive a compile-time error "cannot convert between Unicode and non-Unicode string data types". No implicit data conversion is allowed in SSIS. All data types need to match or be explicitly converted. To load non-Unicode data into a Unicode table, all character data needs to be explicitly converted to Unicode data types upstream using the Data Conversion or Derived Column Transformation. The following picture shows the Data Flow Task that uses a Data Conversion Transformation to explicitly convert non-Unicode character data to Unicode and fast load it into a Unicode table.

[Image: SSIS Data Flow Task with a Data Conversion Transformation feeding a fast-load Unicode destination]

In-Place Conversion

If the non-Unicode database character set is ASCII and the data is in the 7-bit range (such as US-English), it's possible to convert the data in place since UCS-2 is a superset of ASCII. You can use the Alter Table Alter Column statement to change columns from the non-Unicode data types char, varchar, varchar(max) and text to the Unicode data types nchar, nvarchar, nvarchar(max) and ntext. For example,

With the following table schema:

CREATE TABLE employee (
EmployeeID int NOT NULL,
NationalIDNumber varchar(15) NULL,
LoginID varchar(256) NULL,
ManagerID int NULL,
Title varchar(50) NULL,
BirthDate datetime NOT NULL,
Gender char(1) NULL
)

You could convert the table to Unicode with the following Alter Table Alter Column statements:

Alter Table employee Alter Column NationalIDNumber nvarchar(15)
Alter Table employee Alter Column LoginID nvarchar(256)
Alter Table employee Alter Column Title nvarchar(50)
Alter Table employee Alter Column Gender nchar(1)

Before you run the Alter Table Alter Column statements, ensure that all constraints and indexes on the columns are removed.
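For instance, if LoginID were covered by an index (the index name below is only illustrative), you would drop it first and recreate it after the conversion:

DROP INDEX IX_employee_LoginID ON employee
Alter Table employee Alter Column LoginID nvarchar(256)
CREATE INDEX IX_employee_LoginID ON employee (LoginID)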

Another method of in-place conversion, or rather a pseudo in-place conversion, is to use the SELECT INTO statement to create a new table with a Unicode schema and bulk load the data over to the new table. For example, with the above schema you could create a new Unicode table:

SELECT
EmployeeID,
CAST(NationalIDNumber as nvarchar(15)) as NationalIDNumber,
CAST(LoginID as nvarchar(256)) as LoginID,
ManagerID,
CAST(Title as nvarchar(50)) as Title,
BirthDate,
CAST(Gender as nchar(1)) as Gender
INTO
employee_wstr
FROM
employee

Depending on how many constraints, indexes you have defined on the character columns, the SELECT INTO method may perform better than the Alter Table Alter Column method.

In-place conversion is less time-consuming than a full export and import, but it should only be used with extreme discretion. To prevent data loss, the data should be fully scanned to ensure that none of it falls outside the ASCII 7-bit range.

Friday, August 24, 2007

Using CROSS APPLY in SQL Server 2005

My interest in writing this article was started by an MSDN article titled SQL Server 2005: The CLR Enters the Relational Stage. The article shows how to write a function that returns the top three countries per category. That's always been something that was difficult to do in SQL so I was curious about the approach. The article started out well but I was very unhappy by the end. It's just soooo much easier to do this in SQL Server 2005 using the new CROSS APPLY clause in Transact-SQL. So I'm going to write a query to return the top 3 orders for each customer and I'm going to do it in about 10 lines of SQL. (UPDATE: An alert reader found an even better approach!)

First please take a second to read the MSDN article. Pay special attention to how much C# code is required for the solution and how flexible it is.
Note: I also want to mention here that I started out trying to replicate their solution. I tried to install the MSI but that failed with an invalid characters message. I tried to recreate the CLR code but since he mostly posted snippets and had three versions I finally gave up trying to get that to work. One thing I really wanted to do was compare performance on identical data. Unfortunately I wasn't able to do that. My example will use the AdventureWorks database.

Let's start at the TOP
Along the way to making this solution work we'll discuss a few new features of SQL Server 2005. The first is that the TOP clause can now take a variable:

DECLARE @Rows INT
SET @Rows = 10

SELECT TOP ( @Rows ) *
FROM Sales.SalesOrderHeader

This will return the top 10 rows from SalesOrderHeader. You can also replace @Rows with anything that evaluates to a number. The following query looks odd but runs just fine:

SELECT TOP (
SELECT COUNT(*)
FROM Sales.Customer
) *
FROM Sales.SalesOrderHeader

There are 19,185 rows in the Customer table and this query returns the top 19,185 rows from SalesOrderHeader. You can also use the TOP clause for INSERT, UPDATE and DELETE statements. If you wanted to DELETE in batches of 500 you can now do that using the TOP clause.
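As a quick sketch of that batched-delete pattern (the table and WHERE clause below are only illustrative), you loop until TOP finds nothing left to delete:

-- Delete old rows 500 at a time to keep each transaction (and the log) small
WHILE 1 = 1
BEGIN
DELETE TOP (500)
FROM Sales.SalesOrderHeader
WHERE OrderDate < '2001-01-01'

IF @@ROWCOUNT = 0
BREAK
END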

My Function
Next we need a function to return the TOP X rows from SalesOrderHeader based on the total sales amount. That function looks like this:

CREATE FUNCTION dbo.fn_GetTopOrders(@custid AS int, @n AS INT)
RETURNS TABLE
AS
RETURN
SELECT TOP(@n) *
FROM Sales.SalesOrderHeader
WHERE CustomerID = @custid
ORDER BY TotalDue DESC
GO

Notice that it accepts the number of orders to return as a parameter. Also notice that I'm using SELECT * all over the place. I really encourage you to explicitly list out the columns when you write this for real. If you call this with a CustomerID and a number of rows it will return that many rows ordered by the total amount of the order in descending order. Also notice that there is an ORDER BY clause in this function. We'll talk about that in a minute.

Apply the APPLY Clause
The real magic happens when you use SQL Server 2005's new APPLY clause. The APPLY clause lets you join a table to a table-valued function. That lets you write a query like this:

SELECT C.CustomerID,
O.SalesOrderID,
O.TotalDue
FROM
AdventureWorks.Sales.Customer AS C
CROSS APPLY
AdventureWorks.dbo.fn_GetTopOrders(C.CustomerID, 3) AS O
ORDER BY
CustomerID ASC, TotalDue DESC

which results in this...

CustomerID SalesOrderID TotalDue
----------- ------------ ---------------------
1 45283 37643.1378
1 46042 34722.9906
1 44501 26128.8674
2 46976 10184.0774
2 47997 5469.5941
2 57044 4537.8484
3 53616 92196.9738
3 47439 78578.9054
3 48378 56574.3871
4 47658 132199.8023
. . .

The APPLY clause acts like a JOIN without the ON clause and comes in two flavors: CROSS and OUTER. The OUTER APPLY clause returns all the rows on the left side (Customers) whether they return any rows from the table-valued function or not. The columns that the table-valued function returns are null if no rows are returned. CROSS APPLY only returns rows from the left side (Customers) if the table-valued function returns rows.

Notice that I'm just passing in the CustomerID to the function. It returns the TOP 3 rows based on the amount of the order. Since I'm using CROSS APPLY a customer without orders won't appear in the list. I can also pass in a number other than 3 to easily return a different number of orders per customer. Even better I can pass in a different number of orders for each customer. So I could list the top 5 orders for one type of customer but the top 10 for another type of customer. How cool is that?!?
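For example, here's a sketch of that per-customer-type idea (it assumes the CustomerType column on Sales.Customer in AdventureWorks, where 'S' marks store customers; the 10/5 split is just an illustration):

SELECT C.CustomerID,
O.SalesOrderID,
O.TotalDue
FROM
AdventureWorks.Sales.Customer AS C
CROSS APPLY
AdventureWorks.dbo.fn_GetTopOrders(C.CustomerID,
CASE C.CustomerType WHEN 'S' THEN 10 ELSE 5 END) AS O
ORDER BY
C.CustomerID ASC, O.TotalDue DESC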

And it gets even better. Remember the function has an ORDER BY in it. It will always return the top orders based on the amount. However you can change the ORDER BY clause in the query that calls the function to display those rows in whatever order you want. You could easily display the top 3 orders in ascending order instead.
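And going back to the two APPLY flavors, here's a sketch of the OUTER form: customers with no orders still show up, with NULLs in the order columns.

SELECT C.CustomerID,
O.SalesOrderID,
O.TotalDue
FROM
AdventureWorks.Sales.Customer AS C
OUTER APPLY
AdventureWorks.dbo.fn_GetTopOrders(C.CustomerID, 3) AS O
ORDER BY
C.CustomerID ASC, O.TotalDue DESC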

Performance and Conclusion
Unfortunately I wasn't able to compare the performance of my solution to MSDN's. The query plan from my solution doesn't look that bad. (And did I mention that it showed me a full query plan for the query with the UDF rather than just a UDF step? Sweet!) If anyone gets their solution running send me the code and I'll update the article.

I also think their solution returns the result as a comma separated value. Now that would be a great use for the CLR. You could easily wrap that around the results of this function for a fast, efficient procedure. And it happens to be the next article I'm working on. :) If you want to see a great article on using the CLR inside SQL Server I'd start with Using CLR Integration in SQL Server 2005. The code is from an earlier beta but it gives you a great idea of when to use CLR code.

In summary I think that 5-10 lines of SQL is much simpler than 50+ lines of C# (or VB) for this type of solution. The article provides a lot of good information on the CLR inside SQL Server. I just don't think this is the best place to use it. One of my biggest fears is that people will use the CLR incorrectly and cause performance problems. This is the first real case I've seen of this. Is this really how Microsoft wants us to use the CLR?

UPDATE
I was curious when I posted this how long it would take to find an even better solution. It didn't take long. Arnold Fribble posted it in the forums. The syntax needed a little tweaking but it appears you can do this in a single query. And how did I forget the PARTITION BY clause of the ROW_NUMBER function. Doh! Here's the cleaned up query:

SELECT
CustomerID,
SalesOrderID,
TotalDue
FROM (
SELECT
O.CustomerID,
O.SalesOrderID,
O.TotalDue,
ROW_NUMBER = ROW_NUMBER() OVER (
PARTITION BY O.CustomerID
ORDER BY O.TotalDue DESC)
FROM
Sales.SalesOrderHeader AS O
) AS d
WHERE
d.ROW_NUMBER <= 3

Tuesday, August 21, 2007

Using the OUTPUT Clause to Capture Identity Values on Multi-Row Inserts

SQL Server 2005 introduced the OUTPUT clause which we can use to capture values from the inserted and deleted virtual tables. Previously this data was only available through triggers. We can use this in an INSERT ... SELECT statement to capture all the inserted identity values. Previously this required some type of loop or temporarily altering the target table.

We'll start with two tables: a product table and table of products to insert. The scenario is a vendor that sends you a complete list of all their products and you only need to insert the rows that don't already exist. However you need to insert those new rows into multiple tables. The following script will create the tables in tempdb based on data in AdventureWorks.

USE tempdb
GO
IF EXISTS (SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[Product]') AND type in (N'U'))
DROP TABLE [dbo].[Product]
GO
IF EXISTS (SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].ProductsToInsert') AND type in (N'U'))
DROP TABLE [dbo].ProductsToInsert
GO

CREATE TABLE Product (
ProductID INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
[Name] NVARCHAR(50) NOT NULL,
ProductNumber NVARCHAR(25) NOT NULL,
ListPrice MONEY NOT NULL)
GO
CREATE UNIQUE INDEX IX_Product_ProductNumber ON Product ( ProductNumber )
GO

CREATE TABLE ProductsToInsert (
RowID INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
[Name] NVARCHAR(50) NOT NULL,
ProductNumber NVARCHAR(25) NOT NULL,
ListPrice MONEY NOT NULL,
InsertedIdentityValue INT NULL)
GO
INSERT Product ([Name], ProductNumber, ListPrice)
SELECT TOP 450 [Name], ProductNumber, ListPrice
FROM AdventureWorks.Production.Product
ORDER BY SellStartDate, ProductID
GO
INSERT ProductsToInsert ([Name], ProductNumber, ListPrice)
SELECT [Name], ProductNumber, ListPrice
FROM AdventureWorks.Production.Product
GO

The Product table has an identity column as its primary key. Product number is a natural key on the table. The ProductsToInsert table has the ProductNumber column and a column for whatever identity value is inserted when we put the row into the Product table. You'll notice I only put 450 of the products in the Product table to start with but all 504 in the ProductsToInsert table.

A simple script to insert the new products looks like this:

use tempdb
GO
INSERT Product ([Name], ProductNumber, ListPrice)
SELECT
[Name], ProductNumber, ListPrice
FROM
ProductsToInsert I
WHERE
NOT EXISTS (SELECT 1
FROM Product
WHERE ProductNumber = I.ProductNumber)

That inserts the 54 products that weren't in the table previously. We can use the OUTPUT clause to return back the rows that were inserted. That looks like this:

INSERT Product ([Name], ProductNumber, ListPrice)
OUTPUT inserted.ProductID,
inserted.[Name],
inserted.ProductNumber,
inserted.ListPrice
SELECT
[Name], ProductNumber, ListPrice
FROM
ProductsToInsert I
WHERE
NOT EXISTS (SELECT 1 FROM Product
WHERE ProductNumber = I.ProductNumber)

When that statement is run it returns the following recordset back to the client:

ProductID Name ProductNumber ListPrice
----------- ----------------------------------- --------------------- -------------
451 LL Bottom Bracket BB-7421 53.99
452 ML Bottom Bracket BB-8107 101.24
453 HL Bottom Bracket BB-9108 121.49

. . .

504 HL Touring Handlebars HB-T928 91.57

(54 row(s) affected)

That is almost what we want. We have the identity values in the result set but we don't have a way to work with the data and we don't have a way to tie it back to the original source row. We'll need to add two things. First we'll need to store this result set in a table variable. I'll also remove a few columns to make it easier to read and add an update statement to save the identity value. That script looks like this:

DECLARE @InsertedRows TABLE (ProductID INT, ProductNumber NVARCHAR(25) )

INSERT Product ([Name], ProductNumber, ListPrice)
OUTPUT inserted.ProductID,
inserted.ProductNumber
INTO @InsertedRows
SELECT
[Name], ProductNumber, ListPrice
FROM
ProductsToInsert AS I
WHERE
NOT EXISTS (SELECT 1 FROM Product
WHERE ProductNumber = I.ProductNumber)

UPDATE ProductsToInsert
SET InsertedIdentityValue = T.ProductID
FROM ProductsToInsert I
JOIN @InsertedRows T ON T.ProductNumber = I.ProductNumber

SELECT RowID, ProductNumber, InsertedIdentityValue
FROM ProductsToInsert
WHERE InsertedIdentityValue IS NOT NULL

We declare a table variable to store the results of the OUTPUT clause. We use the OUTPUT INTO syntax to store the results into the table variable. Next we use the table variable to update the source table with the inserted identity columns. After that a simple SELECT statement returns the new values which we can easily use in other statements:

RowID ProductNumber InsertedIdentityValue
----------- ------------------------- ---------------------
451 HB-T721 503
452 HB-T928 504
453 FB-9873 502

. . .

504 BK-R19B-52 470

(54 row(s) affected)

The OUTPUT clause can also be used with UPDATE and DELETE statements and return values from either the inserted or deleted table. In its simplest form the OUTPUT clause greatly simplifies importing data into SQL Server.
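As a quick sketch of the DELETE flavor (using the same Product table; the WHERE clause is just an illustration), OUTPUT can hand back the rows you just removed:

-- Capture the deleted rows so they can be archived or audited
DECLARE @DeletedRows TABLE (ProductID INT, ProductNumber NVARCHAR(25))

DELETE Product
OUTPUT deleted.ProductID, deleted.ProductNumber INTO @DeletedRows
WHERE ListPrice = 0

SELECT ProductID, ProductNumber FROM @DeletedRows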

SQL Server 2005 Encryption – Encryption and data length limitations

The SQL Server 2005 encryption solution has some limitations, and one that has raised a few questions is the limited amount of plaintext that can be encrypted. I hope this article helps answer that question.

What is the limit on the data length that can be encrypted?
The answer to this question is a little bit more complex than a simple number: the limit on the input (plaintext) really depends on the output (ciphertext), and it depends on the key you are using and on whether you are using optional features of the encryption builtins.

You can use the following formula to predict the length of the ciphertext that will result from calling EncryptByKey, based on the plaintext length, the key algorithm used and whether the optional authenticator parameter is being used. (Note: this formula is valid only for the DES and AES key families; it is not valid for RC4, and I would recommend not using RC4 at all.)

@CipherLen = ( FLOOR( (8 + @PTLen + (@UsesAuth * @HASHLEN)) / @BLOCK ) + 1 ) * @BLOCK + 16 + @BLOCK + 4

Where:
@CipherLen: The ciphertext length in bytes
@PTLen: The plaintext length in bytes
@UsesAuth: 1 if using the optional authenticator parameter, 0 otherwise
@HASHLEN: 20 bytes for SHA1 (The authenticator parameter adds a SHA1 hash to the plaintext)
@BLOCK: The length in bytes per block. 8 for the DES family, 16 for AES

Now let’s explain every part of the formula to make it easier to understand:
( FLOOR( (
8: Encrypted internal header
+ @PTLen: Plaintext length
+ (@UsesAuth * @HASHLEN): If the authenticator parameter is used, the plaintext will include a hash derived from it.
) / @BLOCK ) + 1 ) * @BLOCK: This portion of the formula as a whole predicts the length of the ciphertext, including padding. Note: if the plaintext length fits exactly in one block, there will be an additional block of padding; this part of the formula reflects that behavior.
+ 16: Key GUID (not encrypted). This information is used during decryption to identify the key that needs to be used.
+ @BLOCK: Initialization vector (IV)
+ 4: Internal header that describes the BLOB version number

Once we have reviewed this formula, we can discuss the limitations of the encryption builtins: the output (ciphertext) is limited to 8000 bytes. What does this mean? It means that the plaintext limit is a little bit below 8000 bytes (i.e. for AES encryption using the authenticator parameter it is around 7920 bytes). If you try to encrypt a larger plaintext, the builtin will fail (return null) as it won't be able to fit all the output in the available buffer.
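Here is a small sketch to sanity-check the formula against EncryptByKey itself. It assumes an AES symmetric key named key1 that has already been created and opened; the key name and the creation steps in the comments are illustrative only.

-- Assumes something like:
--   CREATE SYMMETRIC KEY key1 WITH ALGORITHM = AES_256 ENCRYPTION BY PASSWORD = '...'
--   OPEN SYMMETRIC KEY key1 DECRYPTION BY PASSWORD = '...'
DECLARE @PTLen int, @UsesAuth int, @HASHLEN int, @BLOCK int
SET @PTLen = 100     -- plaintext length in bytes
SET @UsesAuth = 0    -- not using the authenticator parameter
SET @HASHLEN = 20    -- SHA1
SET @BLOCK = 16      -- AES block size

SELECT ( FLOOR( (8 + @PTLen + (@UsesAuth * @HASHLEN)) / @BLOCK ) + 1 ) * @BLOCK + 16 + @BLOCK + 4 AS PredictedLen,
datalength( encryptbykey( key_guid('key1'), convert( varbinary(100), replicate('x', 100) ) ) ) AS ActualLen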

For certificates, the way to calculate the maximum length of plaintext we can encrypt, as well as the ciphertext length, is easier because it is based on the key modulus.
@Cipher_len = key_modulus (regardless of the plaintext length)
@Max_Plaintext_len = key_modulus – 11

By default, the self-signed certificates created by SQL Server 2005 have a 1024-bit modulus = 128 bytes, therefore the ciphertexts are always 128 bytes and the maximum plaintext that can be encrypted is 117 bytes. The same formula applies to any certificate you may want to import into the system (the limit is 3456 bits for the private key's modulus).
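A tiny sketch of that 117-byte boundary with a default (1024-bit) self-signed certificate; the certificate name and password below are illustrative only:

-- With a 1024-bit certificate the ciphertext is always 128 bytes and
-- anything over 117 bytes of plaintext fails (returns NULL).
CREATE CERTIFICATE certLimitDemo
ENCRYPTION BY PASSWORD = 'Dem0 p@ssw0rd!'
WITH SUBJECT = 'Plaintext length demo'

SELECT datalength( encryptbycert( cert_id('certLimitDemo'), convert( varbinary(117), replicate('x', 117) ) ) ) AS Len117,
encryptbycert( cert_id('certLimitDemo'), convert( varbinary(118), replicate('x', 118) ) ) AS Over117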

For more information on this one, I recommend going to http://msdn.microsoft.com/library/default.asp?url=/library/en-us/seccrypto/security/cryptencrypt.asp and looking at the Remarks section for CryptEncrypt.

Should I always use this formula to calculate the size of my encrypted columns?
I strongly recommend against treating the result of the formula as a hardcoded length for your encrypted columns. I would suggest estimating the expected maximum ciphertext length (either by creating a sample using EncryptByKey directly on your system or by using the formula) and adding a little bit of extra space. My personal suggestion would be to reserve at least 1 extra block (8 bytes for the DES family, 16 bytes for AES), but it really depends on the specific scenario.
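For instance (a sketch; the table and sizes are purely illustrative), if the formula predicts 68 bytes for an AES-encrypted 11-byte value, you might declare the column with one extra AES block of headroom:

-- 68 bytes predicted by the formula + 16 bytes (one extra AES block) of headroom
CREATE TABLE dbo.CustomerSSN (
CustomerID int NOT NULL,
SSN_encrypted varbinary(84) NOT NULL
)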

The formula I shared here is a way to predict the length of the ciphertext, but there is no guarantee that it will work for future algorithms or even for the existing algorithms in future versions of the product. As you have noticed, SQL Server encryption includes a few fields that are not part of the ciphertext itself, and in future versions of the product there may be a need to include new fields.


But I really need to encrypt BLOBs larger than 8000 bytes! Is there anything I can do?
Using the native solution, there is nothing you can do. Your application will need to slice the input before encrypting it.
I have created a small code sample that creates a UDF (user defined function) that is essentially a wrapper around EncryptByKey. It basically partitions the data into pieces, encrypts them, and then pastes them together in a single BLOB that can be decrypted by its counterpart wrapper UDF for DecryptByKey. I created two different sets of UDFs, one for symmetric keys and one for certificates.

The main idea behind this demo is to split the BLOB we want to encrypt in smaller segments that we can encrypt and then concatenate their ciphertexts in such a way that we can recover the individual segments, decrypt them, and paste them again in a single LOB (the original plaintext).

Demo


/***********************************************************************
* This posting is provided "AS IS" with no warranties,
* and confers no rights.
*
* Authors: Raul Garcia,
* Tanmoy Dutta,
* Pankaj Kamat,
* Laurentiu Cristofor
* Date: 11/07/2005
* Description:
*
* Create a scalar functions that allow encryption and decryption
* of large objects (> 8000 bytes).
*
* (c) 2005 Microsoft Corporation. All rights reserved.
*
**********************************************************************/

CREATE DATABASE demo_LobEncryption
go

USE demo_LobEncryption
go

--------------------------------------------------------------------
-- Create a new Encryption function – EncryptLob.
--
-- This is a wrapper around the EncryptByKey builtin
-- In order to keep this demo simple, it doesn't support
-- all the options of that builtin
--
-- PARAMETERS
-- @key_guid: Symmetric key GUID,
-- equivalent to EncryptByKey’s first argument.
-- This key needs to be already opened
-- before calling the function
-- @lob: The plaintext to encrypt.
-- Limits for the datalength are defined
-- by the varbinary(max) data type
-- RETURN VALUE
-- varbinary(max): If the encryption succeeded,
-- it will return the encrypted data
-- as described in NOTES
-- NULL is returned if there is any error
-- NOTES:
-- The plaintext will be split in blocks that can be
-- encrypted individually and then will be concatenated together:
-- ([2 bytes - ciphertext block length][ciphertext block])
-- per plaintext block
CREATE FUNCTION dbo.EncryptLob(
@key_guid uniqueidentifier,
@lob varbinary(max) )
returns varbinary(max)
as
BEGIN
declare @PtLimit int
declare @Total int
declare @CurPos int
declare @Aux varbinary(8000)
declare @AuxLen int
declare @PlaintextColumnLimit int
declare @Cipher varbinary(max)

-- Limit for the PT block
SET @PtLimit = 7800
SET @CurPos = 1
SET @Total = datalength( @lob )
SET @Cipher = null

---- Simple validation of the input parameters

if(
@lob is not null
AND encryptbykey( @key_guid, 0x00 ) is not null
-- Can we encrypt with the key specified by the GUID
)
BEGIN
WHILE @CurPos <= @Total
BEGIN
-- Get a new PT block
SELECT @Aux = substring( @Lob, @CurPos, @PtLimit )
SET @CurPos = @CurPos + @PtLimit

-- Encrypt the PT block
SET @Aux = encryptbykey(@key_guid, @Aux)
-- If any block failed to encrypt,
-- we should just return NULL (failed)
if( @Aux is null )
return null

-- Get the ciphertext length
-- 2 bytes should be enough
SET @AuxLen = datalength( @Aux )
SET @Aux = convert( binary(2), @AuxLen ) + @Aux

-- Only set the @Cipher for the first block,
-- otherwise concatenate the existing data
-- with the new cipher block
if( @Cipher is null )
SET @Cipher = @Aux
ELSE
SET @Cipher = @Cipher + @Aux
END
END

return @Cipher
END
go
-------------------------------------------------------------------
-- Create a new Decryption function - DecryptLob.
--
-- This is a wrapper around the decryptByKey builtin.
-- It is the counterpart of the EncryptLob function.
--
-- PARAMETERS
-- @Cipher: A ciphertext that was generated by
-- EncryptLob.
-- The decryption key needs to be already
-- opened before calling the function
-- RETURN VALUE
-- varbinary(max): If the decryption succeeded, it will
-- return the plaintext as varbinary(max)
-- NULL is returned if there is any error
--
-- NOTES:
-- The ciphertext will be split in blocks that can be
-- decrypted individually and then concatenated together:
-- ([2 bytes - ciphertext block length][ciphertext block])
-- per plaintext block
CREATE FUNCTION dbo.DecryptLob( @Cipher varbinary(max) )
returns varbinary(max)
as
BEGIN
declare @PtLimit int
declare @Total int
declare @CurPos int
declare @Aux varbinary(8000)
declare @AuxLen int
declare @LobPt varbinary(max)

SET @CurPos = 1
SET @Total = datalength( @Cipher )
SET @LobPt = null

-- No op for null or empty data
if( @Cipher is not null
AND @Total > 0
)
BEGIN
WHILE @CurPos <= @Total
BEGIN
-- Read the next cipher block length
SELECT @AuxLen = substring( @Cipher, @CurPos, 2 )
SET @CurPos = @CurPos + 2

-- Length field validation.
-- Any unexpected length will result in error
if( @AuxLen <= 0 OR @AuxLen > 8000 )
return null

-- Get the next cipher block
SELECT @Aux = substring( @Cipher, @CurPos, @AuxLen )
SET @CurPos = @CurPos + @AuxLen

-- If there is any discrepancy,
-- it is either a data truncation error
-- (cipher was truncated)
-- or a data corruption error.
-- We will fail the whole operation.
if( datalength( @Aux ) <> @AuxLen )
return null

-- Decrypt the current cipher block
SET @Aux = decryptbykey( @Aux )

-- ... and make sure we could decrypt it.
-- Again, any error here will terminate
-- the operation and return null
if( @Aux is null )
return null

-- Either set (if first block) or concatenate
-- to the available PT we already decrypted
if( @LobPt is null )
SET @LobPt = @Aux
ELSE
SET @LobPt = @LobPt + @Aux
END
END

return @LobPt
END
go

---------------------- Test
CREATE SYMMETRIC KEY key1 WITH ALGORITHM = AES_192 ENCRYPTION BY PASSWORD = '|\/|y p@22\/\/0rD'
go

OPEN SYMMETRIC KEY key1 DECRYPTION BY PASSWORD = '|\/|y p@22\/\/0rD'
go

--------------- Quick Demo
declare @x varbinary(max)
declare @y varbinary(max)
declare @z varchar(max)
declare @i int
set @i = 0
SET @x = convert( varbinary(max), '1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ')
while @i < 15 -- PT will be a little bit less than 2MB
BEGIN
SET @i = @i + 1
SET @x = @x + @x
END
SELECT @y = dbo.EncryptLob( key_guid('key1'), @x )
SELECT datalength( @y ) as 'Ciphertext length'
if( @y is not null )
BEGIN
SELECT @z = convert( varchar(max), dbo.DecryptLob( @y ))

if( @z is not null )
BEGIN
SELECT datalength( @x) as 'original PT len', datalength(@z) as 'decrypted PT len', substring( @z, datalength(@z) - 16, 100) as 'Last 16 chars'
if( @z = @x )
print 'decrypted PT == original PT'
else
print 'Unexpected error. PTs didn''t match'
END
ELSE
print 'Failed to decrypt. Make sure the key is opened'
END
ELSE
print 'Failed to encrypt. Make sure the key is opened'
go

--------------------------------------------------------------------
--------------------------------------------------------------------
--------------------------------------------------------------------
-- Create a new Encryption function - EncryptLobByCert.
--
-- This is a wrapper around EncryptByCert builtin
-- In order to keep this demo simple, it doesn't support
-- all the options EncryptByCert supports.
--
-- PARAMETERS
-- @cert_id: Cert ID, equivalent to EncryptByCert
-- first argument.
-- @lob: The plaintext to encrypt.
-- Limits for the datalength are defined
-- by the varbinary(max) data type
-- RETURN VALUE
-- varbinary(max): If the encryption succeeded,
-- it will return a byte stream that
-- with the encrypted data. The byte stream
-- format is described in NOTES
-- NULL if there is any error
-- NOTES:
-- The plaintext will be split in blocks that can be
-- encrypted individually and then concatenated together:
-- ([2 bytes - ciphertext block length][ciphertext block])
-- per plaintext block
-- Remember that asymmetric key encryption/decryption is orders
-- of magnitude more expensive than symmetric key
-- encryption/decryption.
-- We recommend avoiding the use of asymmetric key encryption for
-- large amounts of data, but these functions may be used to
-- protect data < 1000 bytes that are larger than what
-- the SQL builtins are currently supporting
-- (for example, data larger than 117 bytes)
CREATE FUNCTION dbo.EncryptLobByCert( @cert_id int, @lob varbinary(max) )
returns varbinary(max)
as
BEGIN
declare @PtLimit int
declare @Total int
declare @CurPos int
declare @Aux varbinary(8000)
declare @AuxLen int
declare @PlaintextColumnLimit int
declare @Cipher varbinary(max)

SET @Aux = encryptbycert( @cert_id, 0x00 )
if( @Aux is null ) -- Return null if we cannot encrypt by the specified cert
return null

-- Calculate the limit for the PT block
SET @PtLimit = datalength( @Aux ) - 11

SET @CurPos = 1
SET @Total = datalength( @lob )
SET @Cipher = null

---- Main encryption loop (input was validated above)
WHILE @CurPos <= @Total
BEGIN
-- Get a new PT block
SELECT @Aux = substring( @Lob, @CurPos, @PtLimit )
SET @CurPos = @CurPos + @PtLimit

-- Encrypt the PT block
SET @Aux = encryptbycert( @cert_id, @Aux)
-- If any block failed to encrypt,
-- we should just return NULL (failed)
if( @Aux is null )
return null

-- Get the ciphertext length
-- 2 bytes should be enough
SET @AuxLen = datalength( @Aux )
SET @Aux = convert( binary(2), @AuxLen ) + @Aux

-- Only set the @Cipher for the first block,
-- otherwise concatenate the existing data with
-- the new cipher block
if( @Cipher is null )
SET @Cipher = @Aux
ELSE
SET @Cipher = @Cipher + @Aux
END

return @Cipher
END
go


---------------------------------------------------------------------
-- Create a new Decryption function - DecryptLobByCert.
--
-- This is a wrapper around DecryptByCert builtin.
-- It is the counterpart of EncryptLobByCert function.
--
-- PARAMETERS
-- @cert_id: Cert ID, equivalent to EncryptByCert
-- first argument.
-- @Cipher: A ciphertext that was generated by
-- EncryptLobByCert.
-- @Password: Password for the private key.
-- If encrypted by DBMK, you should use null
-- for this parameter.
--
-- RETURN VALUE
-- varbinary(max): If the decryption succeeded, it will
-- return the plaintext as varbinary(max)
--
-- NOTES
-- The ciphertext will be split in blocks that can be
-- decrypted individually and then concatenated together:
-- ([2 bytes - ciphertext block length][ciphertext block])
-- per plaintext block
CREATE FUNCTION dbo.DecryptLobByCert(
@cert_id int,
@Cipher varbinary(max),
@Password nvarchar(4000) )
returns varbinary(max)
as
BEGIN
declare @PtLimit int
declare @Total int
declare @CurPos int
declare @Aux varbinary(8000)
declare @AuxLen int
declare @LobPt varbinary(max)

SET @CurPos = 1
SET @Total = datalength( @Cipher )
SET @LobPt = null

-- No op for null or empty data
if( @Cipher is not null
AND @Total > 0
)
BEGIN
WHILE @CurPos <= @Total
BEGIN
-- Read the next cipher block length
SELECT @AuxLen = substring( @Cipher, @CurPos, 2 )
SET @CurPos = @CurPos + 2

-- Length field validation. Any unexpected
-- length will result in error
if( @AuxLen <= 0 OR @AuxLen > 8000 )
return null

-- Get the next cipher block
SELECT @Aux = substring( @Cipher, @CurPos, @AuxLen )
SET @CurPos = @CurPos + @AuxLen

-- If there is any discrepancy,
-- it is either a data truncation error
-- (cipher was truncated)
-- or a data corruption error.
-- We will fail the whole operation.
if( datalength( @Aux ) <> @AuxLen )
return null

-- Decrypt the current cipher block
SET @Aux = decryptbycert( @cert_id, @Aux, @Password )

-- ... and make sure we could decrypt it.
-- Again, any error here will terminate
-- the operation and return null
if( @Aux is null )
return null

-- Either set (if first block) or concatenate
-- to the available PT we already decrypted
if( @LobPt is null )
SET @LobPt = @Aux
ELSE
SET @LobPt = @LobPt + @Aux
END
END

return @LobPt
END
go

------------ Test
CREATE CERTIFICATE certTest ENCRYPTION BY PASSWORD = 'c3R+ p@zz\/\/0Rd!' WITH SUBJECT = 'Demo - LOB encryption'
go

declare @x varbinary(max)
declare @y varbinary(max)
declare @z varchar(max)
declare @i int
set @i = 0
SET @x = convert( varbinary(max), '1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ')
while @i < 4 -- PT will be a little bit more than 1k
BEGIN
SET @i = @i + 1
SET @x = @x + @x
END
SELECT @y = dbo.EncryptLobByCert( cert_id('CertTest'), @x )
SELECT datalength( @y ) as 'Ciphertext length'
--select @y as CipherTextLob
if( @y is not null )
BEGIN
SELECT @z = convert( varchar(max), dbo.DecryptLobByCert( cert_id('CertTest'), @y, N'c3R+ p@zz\/\/0Rd!' ))

if( @z is not null )
BEGIN
SELECT datalength( @x) as 'original PT len', datalength(@z) as 'decrypted PT len', substring( @z, datalength(@z) - 16, 100) as 'Last 16 chars'
if( @z = @x )
print 'decrypted PT == original PT'
else
print 'Unexpected error. PTs didn''t match'
END
ELSE
print 'Failed to decrypt. Make sure the key is opened'
END
ELSE
print 'Failed to encrypt. Make sure the key is opened'
go

------ Test2
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'D8|\/||< p455\/\/oRx!'
go

CREATE CERTIFICATE certTest2 WITH SUBJECT = 'Demo - LOB encryption DBMK protected'
go


declare @x varbinary(max)
declare @y varbinary(max)
declare @z varchar(max)
declare @i int
set @i = 0
SET @x = convert( varbinary(max), '1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ')
while @i < 4 -- PT will be a little more than 1k
BEGIN
SET @i = @i + 1
SET @x = @x + @x
END
SELECT @y = dbo.EncryptLobByCert( cert_id('CertTest2'), @x )
SELECT datalength( @y ) as 'Ciphertext length'
--select @y as CipherTextLob
if( @y is not null )
BEGIN
-- Use null for the password parameter
SELECT @z = convert( varchar(max), dbo.DecryptLobByCert( cert_id('CertTest2'), @y, null ))

if( @z is not null )
BEGIN
SELECT datalength( @x) as 'original PT len', datalength(@z) as 'decrypted PT len', substring( @z, datalength(@z) - 16, 100) as 'Last 16 chars'
if( @z = @x )
print 'decrypted PT == original PT'
else
print 'Unexpected error. PTs didn''t match'
END
ELSE
print 'Failed to decrypt. Make sure the key is opened'
END
ELSE
print 'Failed to encrypt. Make sure the key is opened'
go
----------------------- CLEANUP
use master
go

DROP DATABASE demo_LobEncryption
go
----------------------- END OF DEMO -----------------------
