Wednesday, October 20, 2010

How to Share Data Between Stored Procedures --Erland Sommarskog http://www.sommarskog.se/share_data.html

http://www.sommarskog.se/share_data.html


How to Share Data Between Stored Procedures
An SQL text by Erland Sommarskog, SQL Server MVP. Most recent update 2010-01-10.


This article tackles two related questions:

How can I use the result set from one stored procedure in another, also expressed as How can I use the result set from a stored procedure in a SELECT statement?
How can I pass a table as a parameter from one stored procedure to another?
In this article I will discuss a number of methods, and also point out their advantages and drawbacks. Some of the methods apply only when you want to reuse a result set, whereas others apply in both situations. In the case you want to reuse a result set, most methods require you to rewrite the stored procedure in one way or another, but there are some methods that do not.

This table summarises the methods:

Method Input/ Output SQL Server versions Comment
Using OUTPUT Parameters Output All Not generally applicable, but sometimes overlooked.
Table-valued Functions Output SQL 2000 Probably the best method for output, but has some restrictions.
Inline Functions Use this when you want to reuse a single SELECT.
Multi-statement Functions When you need to encapsulate more complex logic.
Using a Table In/Out All Most general methods with no restrictions, but a little more complex to use.
Sharing a Temp Table Mainly for single pair of caller/callee.
Process-keyed Table Best choice for many callers to same callee.
Global Temp Tables A variation of Process-Keyed.
INSERT-EXEC Output SQL 6.5 Does not require rewrite. Has some gotchas.
Table Parameters and Table Types In/(Out) SQL 2008 Could have been the final answer, but due to a restriction it is only mildly useful in this context.
Using the CLR Output SQL 2005 Does not require a rewrite. Clunky, but is useful as a last resort when INSERT-EXEC does not work.
OPENQUERY Output SQL 7 Does not require rewrite. Tricky with many pitfalls.
Using XML In/Out SQL 2005 A roundabout way that requires you to make a rewrite, but it has some advantages over the other methods.
Using Cursor Variables Output SQL 7 Not recommendable.

At the end of the article, I briefly discuss the particular situation when your stored procedures are on different servers.

A question very close to the second question above is how to pass table data from a client, but that is a topic which is a bit outside the scope of this text. Of the methods that I discuss in this article, only table parameters and XML are useful for this case. For a more general discussion on passing structured data from client to SQL Server, see my article Arrays and Lists in SQL Server.

OUTPUT Parameters
This method can only be used when the result set is one single row. Nevertheless, this is a method that is sometimes overlooked. Say you have this simple stored procedure:

CREATE PROCEDURE insert_customer @name nvarchar(50),
@address nvarchar(50),
@city nvarchar(50) AS
DECLARE @cust_id int
BEGIN TRANSACTION
SELECT @cust_id = coalesce(MAX(cust_id), 0) + 1 FROM customers (UPDLOCK)
INSERT customers (cust_id, name, address, city)
VALUES (@cust_id, @name, @address, @city)
COMMIT TRANSACTION
SELECT @cust_id
That is, the procedure inserts a row into a table, and returns the id for the row.
Rewrite this procedure as:

CREATE PROCEDURE insert_customer @name nvarchar(50),
@address nvarchar(50),
@city nvarchar(50),
@cust_id int OUTPUT AS
BEGIN TRANSACTION
SELECT @cust_id = coalesce(MAX(cust_id), 0) + 1 FROM customers (UPDLOCK)
INSERT customers (cust_id, name, address, city)
VALUES (@cust_id, @name, @address, @city)
COMMIT TRANSACTION
You can now easily call insert_customer from another stored procedure. Just recall that in T-SQL you need to specify the OUTPUT keyword also in the call:

EXEC insert_customer @name, @address, @city, @cust_id OUTPUT
Note: this example has a single output parameter, but a stored procedure can have many output parameters.

If you take this path, you need to learn how to use OUTPUT parameters with your client API. As this is not a text on client programming, I leave it to the reader to explore how to do that. I let suffice to say that whichever client library you are using, it can be done. I also like to add that using OUTPUT parameters rather than result sets (where this is possible) results in simpler client code, as you don't have to open recordsets and that. You can also expect better performance, although it is only likely to exhibit if you run a tight loop and call the same procedure all over again.

Table-valued User-defined Functions
User-defined functions were introduced in SQL 2000 and there are in fact three kinds of them: 1) scalar functions 2) inline table-valued functions. 3) multi-statement table-valued functions. It is only the latter two that are of interest here.

I am not going to give an in-depth description of user-defined functions, but just give some quick examples. For full details, please see the CREATE FUNCTION topic in Books Online.

Table-valued functions are far more generally applicable than output parameters, but as we shall see, there are still quite a few restrictions, so there are situations where you cannot use them.

Inline Functions
Here is a example of an inline function taken from Books Online:

CREATE FUNCTION SalesByStore (@storeid varchar(30))
RETURNS TABLE AS
RETURN (SELECT title, qty
FROM sales s, titles t
WHERE s.stor_id = @storeid and
t.title_id = s.title_id)
To use it, you simply say:
SELECT * FROM SalesByStore('6380')
That is, you use it just like was a table or a view. Inline functions are indeed much like a parameterized view, because the query optimizer expands the function as if it was a macro, and generates the plan as if you had provided the expanded query. Thus, there is no performance cost for packaging a SELECT statement into a table-valued function. For this reason, when you want to reuse a stored procedure that consists of a single SELECT statement, rewriting it into an inline UDF is without doubt the best choice. (Or instead of rewriting it, move the SELECT into a UDF, and rewrite the existing procedure as a wrapper on the function, so that the client is unaffected.)
When you are on SQL 2000, there is one restriction, though: you cannot use system functions that are nondeterministic, that is they do not return the same value for the same input parameters on each call. A typical example is getdate(). This restriction has been lifted in SQL 2005.

Multi-statement Functions
A multi-statement function has a body that can have as many statements as you like. You need to declare a return table, and you insert the data to return into that table. Here is the function above as a multi-statement function:

CREATE FUNCTION SalesByStore (@storeid varchar(30))
RETURNS @t TABLE (title varchar(80) NOT NULL,
qty smallint NOT NULL) AS
BEGIN
INSERT @t (title, qty)
SELECT t.title, s.qty
FROM sales s
JOIN titles t ON t.title_id = s.title_id
WHERE s.stor_id = @storeid
RETURN
END
You use multi-statement functions as inline functions, but in difference to inline functions, they are not expanded in place, but instead it's like you would call a stored procedure in the middle of the query, and return the data in a temp table. This permits you to move more complex stored procedure into functions.

Compared to inline functions, multi-statement functions incur some overhead due to the return table. More important, though, is that if you use the function in a query, the optimizer will have no idea of what the function returns, and will make standard assumptions. Many times this is not a real issue, but particularly if the result is huge, it can be. In this case, you may be better off getting the result into a temp table, as a temp table has statistics that the optimizer can consider. And once there, using one of the table methods or INSERT-EXEC may be better choice. (More about statistics etc in the section using a temp table.)

It follows from this, that there is rarely any reason to consider which sort of function to use. If you can express you problem in a single query, use a inline function. Use multi-statement only when you have to.

A drawback with multi-statement functions, is that there are many things you cannot do in a UDF, because a UDF must never change database state. This leads to the following restrictions:

You can only perform INSERT, UPDATE or DELETE statements on table variables local to the function.
You cannot call stored procedures (with the exception of extended stored procedures).
You cannot invoke dynamic SQL.
You cannot create temp tables (use table variables instead).
As with inline functions, on SQL 2000, you cannot access non-deterministic system functions. Yet a restriction is that on SQL 2000, you cannot use user-defined data types in the return table. On SQL 2005 you can, but not if they have rules or defaults bound to them.

Please see the Remarks section for the CREATE FUNCTION topic in Books Online for a complete list of restrictions.

Using a Table
Of course, what could be better to use in a database than a table? We will look at two ways to do this. The beauty of using tables to share the data, is that you can use this in all situations. You may feel, though, that they require more red tape to use. There are also some potential performance issues with these methods, related to recompilation.

Sharing a Temp Table
When you want to share data between a single pair of a calling stored procedure (caller) and a called stored procedure (callee), using a local temp table is probably the most appealing choice, as you do not have to create a permanent table. But the more procedures you throw into the mix, the more likely that you will face a maintenance problem that makes this method less preferable.

The method itself is as simple as this:

CREATE PROCEDURE called_procedure @par1 int,
@par2 bit,
... AS
...
INSERT/UPDATE/DELETE #tmp
go
CREATE PROCEDURE caller AS
DECLARE ...
CREATE TABLE #tmp (col1 int NOT NULL,
col2 char(5) NULL,
...)
...
EXEC called_procedure @par1, @par2 ...
SELECT * FROM #tmp
go
In this example, caller creates the temp table, and called_procedure fills in in. But it could also be the case that caller fills the table with input data, that called_procedure then uses for some processing.

There is a potential maintenance problem here: if called_procedure is called from many places, and you want to change what columns that called_procedure read/writes, you need to revisit all calling stored procedures to edit the temp-table definition. For this reason, sharing temp tables is mainly useful when you have a single pair of caller and callee. If there are several callers, using a process-keyed table is probably a better choice. Then again, if the temp is small, maybe only a single column of customer IDs to process, the table is likely to be very stable. Use your own good judgement to determine what is best for you.

Yet a possibility was suggested to me by Richard St-Aubin. The callers would create the temp table with a single dummy column, and then call a stored procedure that uses ALTER TABLE to add the real columns. This procedure would be the single holder of the temp table definition. (You cannot put the CREATE TABLE in the stored procedure, as in that case the temp table goes away when the procedure exits. Using ALTER TABLE evades the problem.) This method can definitely be worth exploring, but I like to add a word of caution. Adding columns to a table at run-time can lead to unexpected errors if the procedure is recompiled. If you call the procedure directly after the CREATE TABLE statement, you should be fairly safe. But this depends on the fine print in SQL Server, so it could break with a new release. (And in any case, it does not work with SQL 6.5.)

(A more advanced way to tackle the maintenance problem is to use a pre-processor, and put the definition of the temp table in an include file. If you have a C compiler around, you can use the C pre-processor. My AbaPerls includes Preppis which we use in the system I spend most of my time with.)

There is an interesting possibility for flexibility here. The callee only has to care about the columns it reads or writes. This permits a caller to add extra columns for its own usage when it creates the temp tables, and two callers to the same callee could have different definitions of the same temp table.

You need to be aware of that there is a possible performance problem with using temp tables and that is recompilation. This can be a serious issue on SQL 7 and SQL 2000. Each time the caller is invoked, a new instance of the temp table is created, and for this reason the callee will be recompiled (as the table actually could have a new definition). The cost for a recompilation of a procedure with a single SELECT that is moderately complex, may be less than 200 ms. A longer procedure of 100-200 lines of code may take over a second to compile. If the procedure itself needs a minute to run, this is no big issue. But if the procedure once compiled executes in sub-second, that recompilation second will be very noticeable to end users. So this is an important issue that you need to consider before you choose this method.

In SQL 2005 the situation is different, as it does recompilation on statement level. This means that only the statements that refer to the shared temp table will be recompiled. Thus, a callee that accepts input data into a temp table, could copy that data into a local temp table or a table variable, and only that statement would be recompiled, and that is likely to be cheap. Likewise a callee that fills in output data, only needs to refer to the shared table in a final INSERT statement.

Here I only discussed recompilation because the temp table is a new table each time. Temp tables can also cause recompilations because of auto-statistics, something that applies in general when you use temp tables, not only when you share them between stored procedures. I return to the topic of auto-statistics when I look at process-keyed tables.

If you want to reuse an existing procedure by sharing a temp table, you do as for a table-valued function: you move to the core to a common procedure, and transform the existing procedure into a wrapper so that client-side programs are unaffected.

Process-Keyed Tables
This method evades the maintenance problem by using a permanent table instead. There is still a recompilation problem, but of a different nature.

A process-keyed table is simply a permanent table that acts like a temp table. To permit several processes to use the table simultaneously, the table has an extra column to identify the process. The simplest way to identify a process is the global variable @@spid (@@spid is the process id in SQL Server). In fact, this is so common, that these tables are often referred to as spid-keyed tables. Here is an example:

CREATE TABLE process_keyed (spid int NOT NULL,
col1 int NOT NULL,
col2 char(5) NULL,
...)
go
CREATE CLUSTERED INDEX processkey_ix ON process_keyed (spid)
-- Add other columns as needed.
go
...
DELETE process_keyed WHERE spid = @@spid
INSERT process_keyed (spi, col1, col2, ....)
VALUES (@@spid, @val1, @val2, ...)
...
SELECT col1, col2, ...
FROM process_keyed
WHERE spid = @@spid
...
DELETE process_keyed WHERE spid = @@spid
A few things to note here:

The table should have a clustered index on the process key (spid in this example), as all queries against the table will include the condition WHERE spid = @@spid.
You should delete any existing data for @@spid before you insert any data into the table, as a safety precaution.
And when you are finished using the data you should delete it, so that it does not occupy any extra space.
If you start to use this sort of tables on a larger scale in your database, I would recommend that you use some naming convention so that you easily can do:

SELECT 'TRUNCATE TABLE ' + name
FROM sysobjects
WHERE type = 'U'
AND name LIKE '%yourpattern%'
and then cut and paste the result into a query window. It cannot be denied that a drawback with this method is that sloppy programmers could forget to delete data when they are done. Beside wasting space, this can also lead to subsequent uses of the table will use incorrect row-count estimates when building query plans.

In the example, I used @@spid as the process key, and this is great as long only stored procedures are involved. But if a client needs to pass a process key, @@spid is likely to be a poor choice, as modern clients typically connect and disconnect and thus may get different spids. In this case, make the process-key a GUID (data type uniqueidentifier) and create the process key with the NEWID() function.

The nice thing with process-keyed tables, is that since the table is permanent, there is no maintenance problem. But what about recompilation? It turns out that also with a process-keyed table you are likely to face recompilation issues, but for a difference reason than for temp tables, to wit changed statistics. Recall that SQL Server has auto-statistics enabled by default. Auto-statistics kicks in when the first 500 rows are added, or when 20% of the rows have changed. (For full details on recompilation, see this white paper by Eric Hansen.) Since a process-keyed table is typically empty when it is not in use, auto-statistics sets in often. Sometimes this can be a good thing, as the statistics may help the optimizer to find a better plan. Other times, the recompilation adds extra seconds to the execution time, which may not always be acceptable. In difference to temp tables, there is a possible way out: use sp_autostats to disable auto-stats for your process-keyed table.

One disadvantage with process-keyed tables is that you tend to put them in the same database as all the other tables. This means that if you insert and delete a lot of data in your tables, that will take up more space in your log backups without serving any real purpose. This issue can be addressed by putting all your process-keyed tables in a separate database with simple recovery, or using a global temp table, which I will discuss in the next session.

Despite the drawbacks, process-keyed tables is often of the best alternative, and in the system I work with, we have quite a few of these tables.

Global Temp Tables
If you create a table with two leading number characters (e.g. ##temp), that is a global temp table. In difference to a regular temp table, a global temp table is visible to all processes. However, when the process that created the table goes away, so goes the table (with some delay if an other process is running a query against the table in that precise moment). That makes global temp tables difficult to use on any real global basis, but only when you have control over all involved processes like when spawning a second process through xp_cmdshell.

However, SQL Server MVP Itzik Ben-Gan recently made me aware of a special case: if you create a global temp table in a startup procedure, the global temp table will be around as long as the server is up, unless someone explicitly drops it. This makes it possible to use a global temp table as a process-keyed table. The advantage over a regular table is that you get the simple recovery for free. In the previous section I suggested that that you could put your process-keyed tables in a separate database with simple recovery, but that will bring you some hassle with administering users and permissions in multiple databases, something you are relieved from when you use a global temp table.

Here is a quick sample of how you create a global temp table this way:

USE master
go
CREATE PROCEDURE create_global_temp AS
CREATE TABLE ##global(spid int NOT NULL,
-- other columns here
)
go
EXEC sp_procoption create_global_temp, 'startup, 'true'
It shall not be denied that there are some problems with this solution. What if you need to change the definition of the global temp table in way that cannot be handled with ALTER TABLE? Having to restart the server to get new definition in place may not be acceptable. One way to address is to refer to your process-keyed tables through synonyms (a feature added in SQL 2005). In development, you let the synonym point to a local table, and only when you are ready for production you change the synonym to refer to a global temp table.

INSERT-EXEC
The main advantage with INSERT-EXEC is that does not require a change to the procedure being called. You do like this:

CREATE TABLE #tmp (...)
INSERT #tmp (...)
EXEC some_sp @par1, @par2 ...
The definition of #tmp must match the result set of the stored procedure exactly. The stored procedure should only return one result set. (But it will work with several result sets, as long as they match the table you insert into.) The table can be a regular table, a temp table or a table variable. (Except that on SQL 2000, where you cannot use a table variable.)

At the surface, this method looks neat, but there are a couple of issues that you should be aware of:

It can't nest. If some_sp tries to call some_other_sp with INSERT-EXEC, you will get an error message. Thus, you can only have one INSERT-EXEC active at a time. This is a restriction in SQL Server.
There is a serious maintenance problem. If someone changes the result set in the callee, the INSERT-EXEC statement will fail, because the column list must match the INSERT statement exactly. Keep in mind that the person who changes the callee may not even be aware of that you are calling it with INSERT-EXEC. In SQL 2008, this can be handled by using a table type, as I will discuss in the next section.
The procedure is executed in the context of a transaction. Since the procedure is called as part of an INSERT statement, there will always be an open transaction when the procedure is executed. This is generally not a cause for any greater concern, but occasionally it can cause surprises when you call system procedures. It can also be a hassle if you call a procedure in a linked server and you get errors about distributed transactions.
A ROLLBACK aborts the batch. If the called procedure issues a ROLLBACK statement, you get an error message saying you cannot do this. The transaction is rolled back, though. (Since the batch is aborted.) Warning: on SQL 6.5, this is a lot worse. When I made an experiment, my SQL Server process crashed, and when I reconnected, tempdb, where I had run the test, was corrupted.
INSERT-EXEC does not start an implicit transaction on some SQL Server versions. This applies only if: 1) you have SQL 2000 SP3 or earlier or SQL 7, and 2) you run with SET IMPLICIT_TRANSACTIONS ON. With this setting, an INSERT statement should start a transaction, but because of a bug this does not happen with INSERT-EXEC. This bug is fixed in SQL 2000 SP4 and SQL 2005. It is not present in SQL 6.5.
Because of these issues, I advice against using INSERT-EXEC as a general solution. This method is mainly useful when you cannot change the code of the procedure you are calling, for instance a system stored procedure, or a stored procedure in a third-party application.

Sometimes INSERT-EXEC can be useful with dynamic SQL, like this:

INSERT #tmp (...)
EXEC(@sql)
In this case, the statement in @sql is presumably created locally, so if the result set from the dynamic SQL changes, you probably know to change your table as well. But the restriction that INSERT-EXEC can't nest remains, and you still get a transaction context. For this reason, it is in many cases better to put the INSERT statement inside the EXEC() statement. There is also a performance aspect to it, that SQL Server MVP Adam Machanic has detailed in a blog post. The short summary is that with INSERT-EXEC, data does not go directly to the target table, but bounces over a "parameter table", which incurs some overhead. Then again, if your target table is a temp table, and put the INSERT inside the EXEC, you may face a performance issue because of recompilations.

I should also add that if you are using INSERT-EXEC to get scalar values back from the EXEC() statement to assign variables in the surrounding procedure, you should probably use sp_executesql instead. Dynamic SQL is a complex topic, and if you are not acquainted with it, I recommend you to read my article The Curse and Blessings of Dynamic SQL.

Table Parameters and Table Types
A new feature in SQL 2008 is table-valued parameters. You can pass a table variable as a parameter to a stored procedure. When you create your procedure, you don't put the table definition directly in the parameter list of the procedure, instead you first have to create a table type, and use that in the procedure definition. At first glance it may seem like step of extra work, but when you think of it, it makes very much sense: you will need to declare the table in at least two places, in the caller and in the callee. So why not have the definition in one place?

Here is a quick example that illustrates how you do it:

CREATE TYPE my_table_type AS TABLE(a int NOT NULL,
b int NOT NULL)
go
CREATE PROCEDURE the_callee @indata my_table_type READONLY AS
INSERT targettable (col1, col2)
SELECT a, b FROM @indata
go
CREATE PROCEDURE the_caller AS
DECLARE @data my_table_type
INSERT @data (a, b)
VALUES (5, 7)
EXEC the_callee @data
go
So this is the final solution that makes everything else I've talked of in this article of academic interest? Unfortunately, it's the other way round. See that word READONLY in the procedure definition? That word is compulsory with a table parameter. That is, table parameters are for input only, and you cannot use them to get data back. There are of course when input-only tables are of use, but most of the time I share a temp table or use a process-keyed table it's for input-output or output-only.

For this reason I have written a separate article, Why read-only table parameters is not enough, where I try to whip up support for a Connect issue in order to persuade the dev team to change this. I wrote it already during the beta of SQL 2008, but the limitation was not lifted before SQL 2008 shipped. Let's really hope that in the next version of SQL Server, we can use table parameters to pass data in all directions!

Nevertheless, there are some workarounds to make it possible to employ this feature. Say that you have want to pass a table to a stored procedure and have the callee to update some columns in that table. By using table types and INSERT-EXEC, you can make this in a more robust way than in SQL 2005 and earlier. Here is an outline. First how the caller looks like:

DECLARE @intable my_table_type,
@outtable my_table_type

-- Fill @intable with data

INSERT @outtable
EXEC some_procedure @intable
And here is how the callee would be like:

CREATE PROCEDURE some_procedure @intable my_table_type READONLY AS
DECLARE @copy my_table_type
INSERT @copy SELECT * FROM @intable
-- Code to update @copy follows there.
SELECT * FROM @copy
This code is somewhat perplexing, because it employs constructs that usually are considered very bad practice: SELECT * and INSERT without an explicit column list. But in this particular context, this is alright, because we know for certain that we are copying data between two tables that have exactly the same definition. (By the way, don't consider for a second of adding an IDENTITY column to the table type. That would break this scheme entirely.) The use of the table type, also addresses the most serious problem with INSERT-EXEC: if the final SELECT in the callee is changed, the caller will break. Of course, a very ignorant programmer could still wreak havoc by replacing the * with a column list, or join with a second table. But the table type should serve as a clear signal that there is a dependency.

Obviously, there are some obvious disadvantages with this method. You need to copy the data twice, which may be a costly affair if we are talking 100 000 rows. And INSERT-EXEC does still not nest in SQL 2008.

Using the CLR
As you will see this method is quite bulky to use, so it is not likely to be your first choice, and nor should it. However, if you are in the position that you need to capture the output from a stored procedure that you cannot change, and you run into one of the limitations with INSERT-EXEC, the CLR can be your last resort.

The idea is simple: you write a stored procedure in a CLR language like C# or VB .NET that runs the procedure and captures the result set(s) into a dataset. Then you write the data from the dataset back to the table where you want the data. While simple, it breaks some best practices when it comes how to use the CLR in SQL Server: avoid data access from the CLR, and if you do, only use a data reader, not datatables and datasets. The reason you should normally should not do data access from the CLR, is simply that T-SQL is much better equipped for this. And you should stay away from datatables and datasets, because you should not fill up the memory outside the buffer pool in SQL Server with data. But as I said, using the CLR is a last resort.

As a recap, here are the main situations where INSERT-EXEC fails you, and you would want to turn to the CLR:

The procedure you returns several result sets with different structures. This is true for many system procedures in SQL Server.
The procedure you call, cannot be called with an active transaction. See the final note in this section for an example.
The procedure you call, does already use INSERT-EXEC.
The CLR has one more advantage over INSERT-EXEC: it is less sensitive to changes in the output from the procedure you call. If a column is added to the result set of the procedure, your CLR procedure can still work. I'm not really sure that this advantage is worth the extra hassle, though.

Let's have a look at an example. When you call the system procedure sp_helpdb for a specific database, it produces two result sets, of which the second result set lists the files for the database. Say that you want to gather this output for all database on the server. You cannot use INSERT-EXEC due to the multiple result sets. To address this issue, I wrote a stored procedure in C# that you find in the file helpdb.cs and in the script helpdb.sql you can see how I employ it. The C# procedure first runs sp_helpdb with the DataAdapter.Fill method to get the data into a dataset. It then iterates over the rows in the second datatable in the dataset and insert these into the temp table created by the SQL Script.

You may ask: is the data adapter and the dataset really needed? Why not just run ExecuteReader and insert rows as we receive them? Yes, this is possible, but then you would need two connections, since the first connection would be busy with receiving data. This second connection would be not be the context connection, and thus you would need to give your assembly EXTERNAL_ACCESS permission. For this reason I prefer to use a dataadapter. But if the procedure you call can return many rows, it may be better to take the extra hassle with the permission, so that you don't fill up SQL Server with a big data set.

As I mentioned above, SQL 2008 adds table-valued parameters. I've written a second version of the helpdb procedure, available in the file helpdb-2008.cs, where I pass the data to the INSERT statement in a table parameter instead. This is neat, because I just pass the data table as a parameter, and insert all rows in one go. On the other hand, I need to create a table type to make this possible. I like to highlight two more things in helpdb-2008:

Since I want the database name in the final output, I pass this as a separate parameter to the INSERT statement, as it is not included in the data table.
The first column from sp_helpdb is called name, and in the temp table I've changed that to logicalname to make it clearer. However, in the table type the column is called name, since it must match the names in the datatable which gets its names from the output of sp_helpdb.
Note: I initially got this idea when SQL Server MVP Steve Jones tried to run DBCC SHRINKDATABASE from INSERT-EXEC to capture the result set SHRINKDATABASE produces. However, this command cannot be run in a transaction, so it did not work out. I suggested to him that the CLR could work, and when I tested it I found that it did ... on SQL 2008 only. On SQL 2005, my process was killed with an access violation (which means a bug in SQL Server), so in that case not even the last resort worked.

OPENQUERY
Just like INSERT-EXEC this is a method that leaves the called stored procedure untouched. But OPENQUERY is poor choice for the general case, because there are several traps as well as performance issues with OPENQUERY.

This construct was introduced in SQL 7 together with its cousin OPENROWSET to permit you to send pass-through queries to linked servers. It can be very useful, not the least with non-relational data sources such as Active Directory. You can also call back to your own server, permitting you to say things like:

SELECT * FROM OPENQUERY(LOCALSERVER, 'EXEC sp_who') WHERE status = 'runnable'
Thus, you can use the result set from a stored procedure right out of the box in a SELECT statement and do whatever you like with it.

While this is neat, it takes some effort to come there. The first thing is that the settings ANSI_NULLS and ANSI_WARNINGS must be ON for queries involving linked servers. For an ad-hoc statement the two commands

SET ANSI_NULLS ON
SET ANSI_WARNINGS ON
will suffice. These settings are also ON by default with most means of connection, the main exception being DB-Library clients. If you call OPENQUERY from a stored procedure, you need to be aware of that the ANSI_NULLS setting for a stored procedure is saved with the procedure when it is created. And when you create procedures from Enterprise Manager in SQL 7 and SQL 2000, ANSI_NULLS is OFF. (This is not the case in SQL Server Management Studio for SQL 2005 and SQL 2008.)

The next thing is to define LOCALSERVER. It may look like a keyword in the example, but it is in fact only a name. This is how you do it:

sp_addlinkedserver @server = 'LOCALSERVER', @srvproduct = '',
@provider = 'SQLOLEDB', @datasrc = @@servername
To create a linked server, you must have the permission ALTER ANY SERVER, or be a member of any of the fixed server roles sysadmin or setupadmin.

OPENQUERY opens a new connection to SQL Server. This has some implications:

The procedure that you call with OPENQUERY cannot refer temp tables created in the current connection.
The new connection has its own default database (defined with sp_addlinkedserver, default is master), so all object specification must include a database name.
If you have an open transaction and are holding locks when you call OPENQUERY, the called procedure can not access what you lock. That is, if you are not careful you will block yourself.
Connecting is not for free, so there is a performance penalty.
There are some other issues with OPENQUERY that we need to cover more in detail. The first to notice is that its parameters cannot be variables, so you cannot say:

SELECT * FROM OPENQUERY(LOCALSERVER, @sql)
If the stored procedure takes parameters, and these are not known until run-time, you will have to use dynamic SQL with all the baggage that comes with it. See here for an example of using OPENQUERY with dynamic SQL.

The second issue is due to that to be able to compile a query that includes OPENQUERY, SQL Server must retrieve metadata from the linked server. SQL Server makes all connections to the linked server through OLE DB, and the way OLE DB determines metadata (on SQL Server) is to run the command batch preceded by SET FMTONLY ON. When FMTONLY is ON, SQL Server does not execute any statements, but only returns metadata about the result sets. FMTONLY can be a cause for confusion in more than one way. One example, is that if the procedure creates a temp table, you will get an error message, since the table never gets created in FMTONLY mode. For instance:

SELECT * FROM OPENQUERY(LOOPBACK, 'EXEC pubs..sp_helpindex authors')
On SQL 2000, this results in:

Server: Msg 208, Level 16, State 1, Line 1
Invalid object name '#spindtab'.
(On SQL 2005, the message is different and longer. Not less confusing, though.) The workaround for this is to say:

SELECT * FROM OPENQUERY(LOCALSERVER,
'SET FMTONLY OFF EXEC pubs..sp_helpindex authors')
But beware! This means that the procedure is actually executed twice, so there certainly is a performance cost. And if the procedure performs updating actions, these are also performed twice which may be the completely wrong thing to do.

A similar issue appears, if the procedure you call uses dynamic SQL to return the results set. For instance:

SELECT * FROM OPENQUERY(LOOPBACK, 'EXEC sp_who2')
returns on SQL 2000:

Could not process object 'EXEC sp_who2'.
The OLE DB provider 'SQLOLEDB' indicates that the object has no columns.
Again, the workaround with SET FMTONLY OFF can be applied. (But only on SQL 2000. On SQL 2005 you will get an error, because sp_who2 returns two columns called SPID.)

Yet another possible source of confusion with FMTONLY is demonstrated by this script:

CREATE TABLE nisse (a int NOT NULL)
go
CREATE PROCEDURE silly_sp @x int AS
--SET NOCOUNT ON
INSERT nisse VALUES (@x)
SELECT @x * @@trancount
SELECT @x * 3
go
SELECT * FROM OPENQUERY(KESÄMETSÄ, 'EXEC tempdb.dbo.silly_sp 7')
go
SELECT * FROM nisse
go
The yields the same errors as for sp_who2, saying that there are no columns. The reason for this message is that the first "result set" comes from the rows affected message from the INSERT statement, which is thus the "result set" OLE DB finds when it runs the query. Adding SET NOCOUNT ON to the procedure, resolves this issue.

Leaving FMTONLY, we turn to the next thing that may be confusing when using OPENQUERY. When SQL Server executes the query for real, the OLE DB provider first issues SET IMPLICIT_TRANSACTIONS ON. With this setting SQL Server starts a transaction when an INSERT, UPDATE or DELETE statement is executed. (This also applies to a few more statements, see Books Online for details.) This can give some surprises. For instance, take the script above. Once SET NOCOUNT ON is in force, this is the output:

-----------
7

(1 row(s) affected)

a
-----------

(0 row(s) affected)
You see that I get back '7' from the call to silly_sp, which indicates that @@trancount is 1, and there is thus an open transaction, despite there is no BEGIN TRANSACTION in the procedure. (I don't get the '21' that I get when execute silly_sp directly, because with OPENQUERY, I only get one result set.) You also see that when I SELECT directly from nisse after the call to OPENQUERY, that this table is empty; this is because the implicit transaction was rolled back.

There are a few more things to know about OPENQUERY:

If you use table variables rather than temp tables in the called procedure, you don't need to use the trick with SET FMTONLY OFF. Then again, if you are to rewrite the stored procedure you are calling, this article has presented other ways to access its result set that all are superior.
Normally when you set up a linked server to another SQL Server instance, you use SQLOLEDB or SQL Server Native Client. But you can also set up the linked server with MSDASQL instead, that is OLE DB over ODBC. This may resolve some of the issues that I've mentioned here. However, Microsoft deprecates the use of MSDASQL for connections to SQL Server, so it is difficult to recommend it.
By now, you might be overwhelmed by the complexity of the issues with OPENQUERY, and that is sort of the point. OPENQUERY was not intended for accessing the local server, and you should think twice before you use it.

Using XML
Already in SQL 2000, it was possible to use XML for input. You can pass an XML string as an input parameter to a stored procedure, and in the procedure you use sp_xml_preparedocument and OPENXML to extract data from the XML document into one or more tables. This is a very valuable method to pass tabular data from client level to a stored procedure. But in SQL 2000, you cannot use XML to pass data from one procedure to another, at least not easily.

But this is possible in SQL 2005, thanks to the new xml data type. Here is an example how to do this:

CREATE PROCEDURE get_data @data xml OUTPUT AS
SELECT @data = (SELECT au_id, au_lname, au_fname
FROM pubs..authors
FOR XML RAW('authors'))
go
CREATE PROCEDURE the_caller AS
DECLARE @x xml
EXEC get_data @x OUTPUT
SELECT au_id = T.item.value('@au_id', 'varchar(11)'),
au_lname = T.item.value('@au_lname', 'varchar(40)'),
au_fname = T.item.value('@au_fname', 'varchar(20)')
FROM @x.nodes('/authors') AS T(item)
In get_data, you use one of the several options of FOR XML to build your XML document. Here I have used RAW which is the simplest option for this task. I have also added TYPE so that the query returns the xml data type, but strictly this is not necessary as else there would be an implicit conversion to xml.

In the caller, you use the xml type method nodes to shred the document into a one-column table of table fragments. You need to provide an alias for the table, as well as a name for that single column. You see this as T(item) above. Then you use another xml type method, value, to get the individual values out of the fragment. The value method takes two arguments whereof the first addresses the value you want to extract, and the second is the data type. The first parameter is a fairly complex story, but as long as you follow the example above, you don't really need to know any more. In the XML section of my article Arrays and Lists in SQL Server 2005 and Beyond, I have some more information about nodes and values.

In the example I use XML to pass data back to the caller, but you could also use XML to pass data in the opposite direction. (Although on SQL 2008, it is certainly cleaner to use table parameters.)

The result set in the example is from a single table, but what if we have some form of parent/child-relationship? Say that we also want to return all titles for the authors. With temp tables or process-keyed tables, the natural solution would be to use two tables (or actually three, since in pubs there is a many-to-many relationship between titles and authors, but I overlook this here.) But since XML is hierarchical, it would be more natural to put everything in a single XML document, and here is a query to do this:

SELECT a.au_id ,
a.au_lname,
a.au_fname ,
(SELECT t.title
FROM pubs..titleauthor ta
JOIN pubs..titles t ON t.title_id = ta.title_id
WHERE a.au_id = ta.au_id
FOR XML RAW('titles'), TYPE)
FROM pubs..authors a
FOR XML RAW('authors'), TYPE
Rather than a regular join query, I use a subquery, for the titles, because I only want one node per author with all titles. (With a join, I get one author node for each title, so that authors with many books appear in multiple nodes.) The subquery also uses FOR XML to create a nested XML document, and this time the TYPE option is mandatory to get things to work the way we want. (It is actually possible to construct an XML document with one node per author, if you use FOR XML AUTO, but I leave it to the reader to explore this option.)

To retrieve the titles from the XML document, you could use this query:

SELECT au_id = T.item.value('../@au_id', 'varchar(11)'),
title = T.item.value('@title', 'varchar(80)')
FROM @x.nodes('/authors/titles') AS T(item)
Is this a useful method or is just cute? I got a mail from David Walker, and he went as far as saying this is the only method that really works. And undoubtedly, it less contrived than using the CLR. You are not caught up with the limitations of table-valued functions, nor with the nesting limitation of INSERT-EXEC. You also have the flexibility that if you add a column to the XML document, because one caller needs it, you don't need to change all other callers. And XML is certainly is a better choice than OPENQUERY. Compared to temp tables and process-keyed tables, you don't have to be worried about recompilation, and nor that programmers fails to clean up a process-keyed table after use.

When it comes to performance, you get some cost for building the XML document and parsing it shortly thereafter. Then again, you don't need to write to disk, so XML may still be more efficient. Larger XML documents are likely to spill to disk, though. A general caveat is that inappropriate addressing in a large XML document may be a real performance buster. I would recommend that you benchmark, if performance is a concern for you.

So XML certainly has its benefits. Still... One problem with XML is that is more sensitive to errors. If you make a spelling mistake in the argument to value, you will silently get NULL back, and no error message. Likewise, if you get the argument to nodes wrong, you will simply get no rows back. The same problem arises if you change a column alias or a node name in the FOR XML query, and forget to update a caller. When you use process-keyed tables or temp tables, you get errors at compile time or at least at run-time. One further problem is that we have to specify the data type for each column in the call to value, which can lead to errors if we do it wrong. True, this is similar to having the declaration of a temp table in two placea, but in this case there are some possible workarounds (preprocessor, ALTER TABLE), but for XML this repetition is unavoidable.

One advantage with using tables, particularly process-keyed tables, is that the schema is better documented in a table, than in an XML document which is created in a stored procedure. Admittedly, this is more of interest when you have many callers to the same callee that implements some core functionality. For a single pair of caller and callee this is less of an issue.

Personally, I can't but see that this method has k-l-u-d-g-e spelled all over it. We are in a relational database, and of course we want to work with data in tables. Yes, XML can be a good solution, but somehow it makes me think of the proverbial camel being pushed through the needle's eye. So, my preference is for temp tables and process-keyed tables, but as they say, your mileage may vary. And as long we don't have output table parameters, undeniably all methods are kludges to some extent.

Using Cursor Variables
This method was suggested by Peter Radocchia. Cursor variables were introduced in SQL 7, but I suspect that many SQL developers are at most only dimly aware of their existence. I never use them myself. Here is an example of how you use them to bring the result set from one procedure to another:

CREATE PROCEDURE get_cursor @cursor CURSOR VARYING OUTPUT AS
SET @cursor = CURSOR STATIC FOR
SELECT au_id, au_lname, au_fname FROM pubs..authors
OPEN @cursor
go
CREATE PROCEDURE caller AS
DECLARE @cursor CURSOR
DECLARE @au_id char(11),
@au_fname varchar(40),
@au_lname varchar(40)
SET NOCOUNT ON
EXEC get_cursor @cursor OUTPUT
WHILE 1 = 1
BEGIN
FETCH NEXT FROM @cursor into @au_id, @au_lname, @au_fname
IF @@fetch_status <> 0
BREAK
PRINT 'Au_id: ' + @au_id + ', name: ' + @au_fname + ' ' + @au_lname
END
DEALLOCATE @cursor
go
EXEC caller
Note that the cursor is STATIC. A FAST_FORWARD may also work, but you could get problems if you use the default keyset-driven cursors.

I will have to admit that I see little reason to use this method. Just like INSERT-EXEC, this method requires an exact match between the caller the callee for the column list. And since data is processed row by row, performance is likely to take a serious toll if there are any volumes.

The Challenges of Linked Servers
If you procedures are on different servers, your options are reduced, particularly for input. So let's look at output first.

Output
If all you want to do is to get data back, several of the methods listed above works:

OUTPUT parameters – but only for data types that are 8000 bytes or less. That is, output parameters that are xml, varchar(MAX) etc cannot be retrieved.

INSERT-EXEC – this works as long as the result set does not include types that are not permitted in remote operations. One such example is xml. But varchar(MAX) is OK.

OPENQUERY – since OPENQUERY is a feature for linked servers in the first place, there is no difference to what I discussed above.

Using the CLR – I have actually not tested this, but it should be no different from what I said above.

XML – You cannot pass a XML document as an OUTPUT parameter, unless you convert it to (n)varchar, but then it must not exceed 8000 bytes. However, if you use INSERT-EXEC, you can use XML documents of unlimited length, if you convert them to nvarchar(MAX).

The other methods do not work, and that includes user-defined functions. Remote user-defined function calls are not permitted.

Input
If you want to pass large of data for input over a linked server, there are two possibilities. Or two kludges if you like.

One is to use XML, but since the xml data type is not supported in remote procedure calls, the caller needs to pass the XML document as nvarchar(MAX) or varbinary(MAX). Or call the linked server directly through a CLR procedure. The parameter on the other side can still be xml.

The other alternative is really messy. The caller stores the data in a process-keyed table locally and then calls the remote procedure, passing the process-key. The remote procedure then calls back to the first server and either selects directly from process-keyed table, or calls a procedure on the source server with INSERT-EXEC.

Table-valued parameters may sound like a good idea, but they are not permitted in calls to remote procedures.

Acknowledgments and Feedback
The issue about using SET FMTONLY ON is something that I learnt from Umachandar Jayachandran at Microsoft. SQL Server MVP Tony Rogerson pointed out that a process-keyed table should have a clustered index on the process key. Simon Hayes suggested some clarifications. Peter Radocchia suggested the cursor method. Richard St-Aubin suggested an interesting trick when sharing temp tables. Thanks to SQL Server MVP Iztik Ben-Gan for making me aware of global temp tables and startup procedures. Sankar Reddy pointed out to me that my original suggestion for XML as a solution for linked servers was flawed. Greg Borota pointed out that an old leftover from SQL 2000 still was in the text. SQL Server MVP Adam Machanic made some interesting revelations about INSERT-EXEC with dynamic SQL. David Walker encouraged me to write more in depth on XML, and SQL Server MVP Denis Gobo gave me a tip on that part. Jay Michael pointed out an error in the section on table parameters.

If you have suggestions for improvements, corrections on topic, language or formatting, please mail me at esquel@sommarskog.se. If you have technical questions that any knowledgeable person could answer, I encourage you to post to any of the newsgroups microsoft.public.sqlserver.programming or comp.databases.ms-sqlserver.
Revision History
2010-01-10 – Extended the XML section with more examples and a deeper discussions on pros and cons. Updated the section table parameters for the fact that SQL 2008 is no longer in beta, and fixed error in code sample. Modified the section on OPENQUERY to explain why FMTONLY ON exists more accurately.

2009-06-29 – Added a brief discussion on performance about INSERT-EXEC with dynamic SQL, and a reference to a blog post from SQL Server MVP Adam Machanic.

2009-05-18 – The section on INSERT-EXEC said that it does not work with table variables, which is right on SQL 2000 only.

2008-08-16 – Added a trick for sharing temp tables, suggested by Richard St-Aubin.

2008-06-06 – Added a section on linked servers, and removed the note on linked servers in the XML section, since it was not very accurate.

2008-03-03 – Added a section on how could use the CLR when INSERT-EXEC fails you. Reviewed the section on XML anew, pointing out that it's useful when working with linked servers.

2007-09-23 – Added sections on global temp tables and table parameters. Reviewed the section on XML.

2005-12-19 – Article revised to cover SQL 2005, and added section on cursor variables.

2005-03-27 – Various minor clarifications on suggestion from Simon Hayes. The bug about INSERT-EXEC and IMPLICIT_TRANSACTIONS is now fixed in SQL 2000 SP4 and SQL 2005.

Back to my home page.

No comments:

Post a Comment