rschandrastechblog: September 2010

Tuesday, September 14, 2010

Previously committed rows might be missed if NOLOCK hint is used

http://blogs.msdn.com/b/sqlcat/archive/2007/02/01/previously-committed-rows-might-be-missed-if-nolock-hint-is-used.aspx

Previously committed rows might be missed if NOLOCK hint is used
Rate This
LuborK 1 Feb 2007 2:05 PM
Comments 9
I received a question from one of our customers about using the NOLOCK hint: can it cause missing rows in scans even if the rows were committed well before my SELECT with NOLOCK starts?

The NOLOCK hint is employed by many users to avoid contention on tables where updates are performed concurrently with selects. The popular belief is that the only side effect of using the NOLOCK hint is that it may cause reading uncommitted rows. These would be the rows that may be either rolled back or are part of larger concurrent transaction from which some rows may not be retrieved because they had not been written to the pages at the time the NOLOCK SELECT has been scanning them. It is much less obvious that even rows committed a long time before my NOLOCK transaction started might be skipped in the scan. (After I asked several SQL Server experts around me, I found that no one except for the development lead who owns the component in question knew this.) Additionally, a scenario theoretically exists (I don’t have a repro yet; read below) where the same record might be accounted for twice if SELECT reads a table with a NOLOCK hint.

This blog contains a simple repro – two scripts that you have to run on two connections concurrently showing double-counting rows. The third script is used to identify cases when the count of rows in the table decreased in the subsequent scan despite the fact we are only inserting rows into the table.

I will then show how you can realize the same concurrency benefit in SQL Server 2005 that is provided by NOLOCK while maintaining transactional consistency, and without using the NOLOCK hint. Later, I will explain which table scans might have this problem and which will not (even with NOLOCK), and how you can distinguish the scans by investigating their showplans. Then, I put up a challenge – create a repro that shows double-counting previously committed rows with the NOLOCK hint.

First, you should understand that SQL Server is using two kinds of scans for reading data from tables and indexes: allocation scans and range scans. An allocation scan is scanning the pages in physical (allocation) order as they are laid down on the disk. In this situation, no particular logical order of rows is guaranteed. If the optimizer requires rows to be scanned in a certain order, then a range scan is employed. This is usually when there is a predicate on the leading column of an index or if the optimizer requires visiting the rows in the logical order (for example, because of aggregation or ORDER BY in the query).

Next, you should know that if a new row is inserted into an index (clustered or not), it might cause page split. This happens if the existing page is full and the new row belongs between two existing rows (because of the key value of the new row is between the key values of the two rows) and it cannot fit physically into the same page. The page is split into two (approximately ½ empty each), and the new row is inserted where it belongs.

Here is a simple scenario that shows that the NOLOCK hint might miss some rows when concurrently executed updates (INSERTs in our case) are causing page splits. The INSERTs in script #2 will cause page splits because the table has a clustered key. First, you insert two rows, one with key value 1, and another with key value 10000. Then, you insert rows with key values 2, 9999, 3, 9998, etc. When the first page is full, it is split as described above. However, if the concurrently performed SELECT is using the NOLOCK hint on the same table, and the optimizer had chosen the allocation scan, the scan is moving only forward. Therefore, it might miss the ½ page of rows if the newly allocated page falls behind the scan.

Here are the scripts:

Script #1 – This script is trying to count the rows in the table t1. You can run three versions; see the “activate only one…” comment in the script. The first version has no hint, the second has the NOLOCK hint, and the third has both the NOLOCK hint and a BETWEEN predicate. The predicate causes the optimizer to select the range scan instead of the unordered scan in the previous two cases. You start this script and let it loop on the while statement until 10000 rows are inserted into the table t1 by the second script.

use test;

go

drop table tcnt;

create table tcnt(num int, cnt int);

drop table t1;

create table t1 (a int primary key, b char(500));

declare @i int, @tcnt1 int, @tcnt2 int;

set @i=1;

set @tcnt1=0;

set @tcnt2=0;

insert into tcnt values (0,0);

while (@tcnt2<10000) begin -- Activate only one of the following three SELECT statements and compare the behaviors -- select @tcnt2=count(*) from t1; --- this will perform the correct count -- select @tcnt2=count(*) from t1 with (NOLOCK); -- this one might skip some previously committed records --select @tcnt2=count(*) from t1 with (NOLOCK) where a between 0 and 10001; -- this one will work correctly because the predicate is causes the optimizer to select a range scan if (@tcnt2<>@tcnt1)

begin

insert into tcnt values (@i,@tcnt2);

set @i=@i+1;

set @tcnt1=@tcnt2;

end;

end;

Script #2 – You activate this script from another connection against the same database where you ran the first one.

use test;

go

declare @imin int, @imax int;

set @imin=1;

set @imax=10000;

while (@imax>@imin)

begin

insert into t1 values (@imin,'x');

insert into t1 values (@imax,'y');

set @imin=@imin+1;

set @imax=@imax-1;

end;

Script #3 – You run this one to investigate the result after the two previous scripts are finished. It will show all cases when the count(*) value decreased after additional committed inserts were performed. You can observe this only in the second scenario in script #1, when you run the query with NOLOCK hint and without the BETWEEN predicate. If NOLOCK is not used or the NOLOCK hint is used in conjunction with the BETWEEN predicate, the following query returns an empty result:

select * from tcnt t1, tcnt t2 where t1.numt2.cnt;

If the statement select @tcnt2=count(*) from t1 with (NOLOCK); is executed in script #1, the result of the above select * is non-empty. The following shows some of many rows I see in my output:

num cnt num cnt

----------- ----------- ----------- ----

622 1189 624 1187

623 1191 624 1187

687 1310 690 1309

688 1312 690 1309

689 1313 690 1309

1044 2004 1051 2003

1045 2006 1051 2003

1046 2008 1051 2003

1047 2010 1051 2003

1048 2012 1051 2003

1049 2014 1051 2003

1050 2015 1051 2003

1460 2817 1463 2815

……..

The “num” column is the sequence number of the count(*) against the table where we inserted rows. The “cnt” column is the result of the same count(*). Therefore, one can expect that if my count(*) #622 counted 1189 rows, count #624 must logically see more rows because we are only inserting. However, we see only 1187 rows. This shows that scan #624 must have missed some rows we have encountered in scan #622. An important detail is that the insert transaction of each single row was committed, which is the default behavior.

In SQL Server 2005, no-blocking scans can be performed by using alter database set READ_COMMITTED_SNAPSHOT ON (for more information, see (http://msdn2.microsoft.com/en-us/library/ms189050.aspx). This is the best way to get rid of the NOLOCK statements, achieve greater concurrency, and return consistent data. But be aware that there can be negative performance implications – for example, the READ_COMMITTED_SNAPSHOT will force using range scans while the NOLOCK hint is using allocation scan. Paul Randal’s blog at http://blogs.msdn.com/sqlserverstorageengine/archive/2006/11/09/when-can-allocation-order-scans-be-used.aspx explains many cases when SQL Server might use allocation scans. However, the blog does not mention cases when the allocation scan is replaced by range scan even if NOLOCK (or TABLOCK or READ UNCOMMITED SNAPSHOT isolation) is used. You can find out if range scan is used with NOLOCK hint by investigating the graphical showplan. There you will see the following Seek Predicate property (at least one of the Start or End Range must be present for the scan to be “range scan”):

Start Range: [test].[dbo].[t1].a >= CONVERT_IMPLICIT(int,[@1],0), End Range: [test].[dbo].[t1].a <= CONVERT_IMPLICIT(int,[@2],0) Similarly, in the XML showplan, you will find either something similar to StartRange ScanType="GE” or EndRange ScanType="LE" (note the GE is for the >= , and LE is for the <= predicate type, respectively).

I’m challenging readers to find similar reproductions showing twice-scanned rows. Please submit your solution here, or submit feedback to http://blogs.msdn.com/sqlcat/contact.aspx. I will reward the first correct solution by sending the author one of the books from the “Inside Microsoft® SQL Server™ 2005” series (http://www.microsoft.com/mspress/findabook/list/series_SH.aspx).

After I finished the above investigation I searched the Web and found Tony Rogerson’s blog with the same theme – skipping and double-counting previously committed rows when NOLOCK hint is used. The blog is located at http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/11/10/1280.aspx. Maybe I should have searched first; but this way, I enjoyed the discovery, and creating the repro. I decided to post my own blog for several reasons - the repro is a bit different, I explain the discrepancy between the range and allocation scans, and I show how you can find out if your NOLOCK is returning unexpected row sets. Most importantly, I think there are still many users using the NOLOCK hint without realizing the potentially serious consequences that both blogs describe.

Lubor Kollar

SQL Server Development

Customer Advisory Team

Table Hints : NOLOCK AND READPAST

http://articles.techrepublic.com.com/5100-10878_11-6185492.html

Takeaway: Table locking hints provide developers much tighter control of their transactions. Look at the benefits and disadvantages of using the NOLOCK and READPAST table hints in SQL Server.

When data in a database is read or modified, the database engine uses special types of controls, called locks, to maintain integrity in the database. Locks basically work by making sure database records involved in a transaction cannot be modified by other transactions until the first transaction has committed, ensuring database consistency.

Get SQL tips in your inbox

TechRepublic's SQL Server newsletter, delivered each Tuesday, contains hands-on tips that will help you become more adept with this powerful relational database management system.
Automatically sign up today! When designing database applications, you should keep in mind the different types of locks that will be issued, and the different levels of isolation your transactions will occur. Typically, the SQL Server defaults work fine for what you are trying to accomplish. However, there will be times when it is advantageous to manually make hints to how locks are issued on your tables in your SQL statements.

This article focuses on two table hints: NOLOCK and READPAST. I'll set up a table to use for our example queries. Execute the script in Listing A to create the SalesHistory table and populate the table with data.

NOLOCK
This table hint, also known as READUNCOMMITTED, is applicable to SELECT statements only. NOLOCK indicates that no shared locks are issued against the table that would prohibit other transactions from modifying the data in the table.

The benefit of the statement is that it allows you to keep the database engine from issuing locks against the tables in your queries; this increases concurrency and performance because the database engine does not have to maintain the shared locks involved. The downside is that, because the statement does not issue any locks against the tables being read, some "dirty," uncommitted data could potentially be read. A "dirty" read is one in which the data being read is involved in a transaction from another connection. If that transaction rolls back its work, the data read from the connection using NOLOCK will have read uncommitted data. This type of read makes processing inconsistent and can lead to problems. The trick is being able to know when you should use NOLOCK.

As a side note, NOLOCK queries also run the risk of reading "phantom" data, or data rows that are available in one database transaction read but can be rolled back in another. (I will take a closer look at this side effect in part two of this article series.)

The following example shows how NOLOCK works and how dirty reads can occur. In the script below, I begin a transaction and insert a record in the SalesHistory table.

BEGIN TRANSACTION INSERT INTO SalesHistory (Product, SaleDate, SalePrice) VALUES ('PoolTable', GETDATE(), 500) The transaction is still open, which means that the record that was inserted into the table still has locks issued against it. In a new query window, run the following script, which uses the NOLOCK table hint in returning the number of records in the SalesHistory table.

SELECT COUNT(*) FROM SalesHistory WITH(NOLOCK)The number of records returned is 301. Since the transaction that entered the record into the SalesHistory table has not been committed, I can undo it. I'll roll back the transaction by issuing the following statement:

ROLLBACK TRANSACTIONThis statement removes the record from the SalesHistory table that I previously inserted. Now I run the same SELECT statement that I ran earlier:

SELECT COUNT(*) FROM SalesHistory WITH(NOLOCK)This time the record count returned is 300. My first query read a record that was not yet committed -- this is a dirty read.

READPAST
This is a much less commonly used table hint than NOLOCK. This hint specifies that the database engine not consider any locked rows or data pages when returning results.

The advantage of this table hint is that, like NOLOCK, blocking does not occur when issuing queries. In addition, dirty reads are not present in READPAST because the hint will not return locked records. The downside of the statement is that, because records are not returned that are locked, it is very difficult to determine if your result set, or modification statement, includes all of the necessary rows. You may need to include some logic in your application to ensure that all of the necessary rows are eventually included.

The READPAST table hint example is very similar to the NOLOCK table hint example. I'll begin a transaction and update one record in the SalesHistory table.

BEGIN TRANSACTION UPDATE TOP(1) SalesHistory SET SalePrice = SalePrice + 1Because I do not commit or roll back the transaction, the locks that were placed on the record that I updated are still in effect. In a new query editor window, run the following script, which uses READPAST on the SalesHistory table to count the number of records in the table.

SELECT COUNT(*)
FROM SalesHistory WITH(READPAST)My SalesHistory table originally had 300 records in it. The UPDATE statement is currently locking one record in the table. The script above that uses READPAST returns 299 records, which means that because the record I am updating is locked, it is ignored by the READPAST hint.

Tim Chapman is a SQL Server database administrator who works for a bank in Louisville, KY, and has more than 7 years of IT experience. He is also Microsoft certified in SQL Server 2000 and SQL Server 2005. If you would like to contact Tim, please e-mail him at chapman.tim@gmail.com.

Monday, September 6, 2010

How to find duplicate records in a table ?

select * from emp

group by (empid)

having count(empid)>1

--------------------------------------------------------------------------------

select * from emp where EmpID in(select EmpID
from emp
group by EmpID having count(EmpID)>1)

SQL SCANS AND SEEKS

http://itknowledgeexchange.techtarget.com/sql-server/back-to-basics-whats-the-difference-between-a-scan-and-a-seek/

There are a few basic operations which SQL will perform when looking for the data that you need. Here they are listed in the order of worst to best.

Table Scan OR Clustered Index Scan (Essentially Same)
Index Scan
Clustered Index Seek
Index Seek

The basic rule to follow is Scans are bad, Seeks are good.

When SQL Server does a scan it loads the object which it wants to read from disk into memory, then reads through that object from top to bottom looking for the records that it needs.

When SQL Server does a seek it knows where in the index that the data is going to be, so it loads up the index from disk, goes directly to the part of the index that it needs and reads to where the data that it needs ends. This is obviously a must more efficient operation than a scan, as SQL already knows where the data is that it is looking for.

When SQL Server is looking for your data probably one of the largest things which will make SQL Server switch from a seek to a scan is when some of the columns are you looking for are not included in the index you want it to use. Most often this will have SQL Server fall back to doing a clustered index scan, since the Clustered index contains all the columns in the table. This is one of the biggest reasons (in my opinion at least) that we now have the ability to INCUDE columns in an index, without adding those columns to the indexed columns of the index. By including the additional columns in the index we increase the size of the index, but we allow SQL Server to read the index, without having to go back to the clustered index, or to the table it self to get these values.

We’ll look at this more shortly when we look at execution plans.

Sunday, September 5, 2010

A Quick Question : How will add code in your page to "Bookmark this site" or "Bookmark this page" ?

window.external.AddFavorite('URL','bookmark text')

Note that this works in IE only.

Bookmark Us

Indexes Good Practices

1. Build index on columns of integer type

Integers take less space to store, which means the query will be faster. If the column you want to build an index for is not of type integer, consider creating a surrogate integer key (or simply a surrogate column of type integer) which maps one-to-one to the column you want to build the index for.

2. Keep index as narrow as possible

Narrower indexes take less space, require less time to process, which in turn means the query will run faster.

3. Column order is important

For indexes covering multiple columns, the order of the columns in the index is important. The best practice is to use the column with the lowest cardinality first, and the column with the highest cardinality last. Recall cardinality means the number of distinct values for that column. So, if "SELECT DISTINCT (COLUMN1) FROM TABLE_NAME;" returns 5, that means the cardinality for COLUMN1 is 5.

4. Make sure the column you are building an index for is declared NOT NULL

This can decrease the size of the index, which in turn will speed up the query.

Also look at

http://blog.sqlauthority.com/2009/01/18/sql-server-difference-between-index-scan-and-index-seek/

http://blog.sqlauthority.com/2009/08/24/sql-server-index-seek-vs-index-scan-diffefence-and-usage-a-simple-note/

http://blog.sqlauthority.com/2007/03/30/sql-server-index-seek-vs-index-scan-table-scan/

Friday, September 3, 2010

How to get GridView RowIndex in RowCommand function

Protected Sub RouteDataGridView_RowCommand(ByVal sender As Object, ByVal e As System.Web.UI.WebControls.GridViewCommandEventArgs) Handles RouteDataGridView.RowCommand

''MAY BE REQUIRED LATER FOR BUTTONS
''If (e.CommandName = "DeleteRoutepoints") Then
''Dim grrow As GridViewRow = CType(CType(e.CommandSource, Button).Parent.Parent, GridViewRow)
''cRoute.DeleteRouteConfiguration(grrow.Cells(0).Text.ToString.Trim, grrow.Cells(1).Text.ToString.Trim, grrow.Cells(2).Text.ToString.Trim)
''End If

End Sub

rschandrastechblog