<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Sql By Minh&#039;s Blog</title>
	<atom:link href="http://sqlbyminh.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://sqlbyminh.wordpress.com</link>
	<description>A Microsoft SQL Server / SQL Web Blog</description>
	<lastBuildDate>Wed, 15 Dec 2010 18:29:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='sqlbyminh.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Sql By Minh&#039;s Blog</title>
		<link>http://sqlbyminh.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://sqlbyminh.wordpress.com/osd.xml" title="Sql By Minh&#039;s Blog" />
	<atom:link rel='hub' href='http://sqlbyminh.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Database Normalization &#8211; 2NF HOW-TO</title>
		<link>http://sqlbyminh.wordpress.com/2009/11/08/database-normalization-2nf-how-to/</link>
		<comments>http://sqlbyminh.wordpress.com/2009/11/08/database-normalization-2nf-how-to/#comments</comments>
		<pubDate>Sun, 08 Nov 2009 04:07:25 +0000</pubDate>
		<dc:creator>sqlbyminh</dc:creator>
				<category><![CDATA[MSSQL - Basics]]></category>
		<category><![CDATA[1NF]]></category>
		<category><![CDATA[2NF]]></category>
		<category><![CDATA[first normal form]]></category>
		<category><![CDATA[normalization]]></category>
		<category><![CDATA[repeating groups]]></category>

		<guid isPermaLink="false">http://sqlbyminh.wordpress.com/?p=513</guid>
		<description><![CDATA[In this post we continue our discussion on database normalization with regard to achieving 2NF]]></description>
			<content:encoded><![CDATA[<p>Welcome back, everyone. In this post we continue our discussion of the data normalization process, in particular <em>second-normal-form</em>, otherwise known as 2NF. Data normalization is an additive process. That is, if your data is 2NF compliant, it is also 1NF compliant. Therefore, if you haven&#8217;t read my post on 1NF, please read it before continuing.</p>
<p>So what is 2NF? Well, 2NF is an extension of 1NF. Whereas 1NF&#8217;s intent was to eliminate <em>repeating groups</em>, 2NF&#8217;s intent is to eliminate <em>redundant data</em>. A table is said to be in 2NF if all redundant data has been eliminated. Suppose that for each student in our table, we also maintain their mailing address. Let&#8217;s assume that the student table currently looks like this:</p>
<p><strong>Student Table</strong></p>
<table border="1">
<tbody>
<tr>
<th><span style="font-size:xx-small;">StudentId</span></th>
<th><span style="font-size:xx-small;">LastName</span></th>
<th><span style="font-size:xx-small;">FirstName</span></th>
<th><span style="font-size:xx-small;">Addr1</span></th>
<th><span style="font-size:xx-small;">Addr2</span></th>
<th><span style="font-size:xx-small;">City</span></th>
<th><span style="font-size:xx-small;">State</span></th>
<th><span style="font-size:xx-small;">Zip</span></th>
</tr>
<tr>
<td><span style="font-size:xx-small;">1</span></td>
<td><span style="font-size:xx-small;">Smith</span></td>
<td><span style="font-size:xx-small;">John</span></td>
<td><span style="font-size:xx-small;">1123 Westchester Blvd.</span></td>
<td></td>
<td><span style="font-size:xx-small;">Covina</span></td>
<td><span style="font-size:xx-small;">CA</span></td>
<td><span style="font-size:xx-small;">91311</span></td>
</tr>
<tr>
<td><span style="font-size:xx-small;">2</span></td>
<td><span style="font-size:xx-small;">Ericson</span></td>
<td><span style="font-size:xx-small;">Robert</span></td>
<td><span style="font-size:xx-small;">655 5th Street</span></td>
<td><span style="font-size:xx-small;">Apt. 2</span></td>
<td><span style="font-size:xx-small;">Marina Del Rey</span></td>
<td><span style="font-size:xx-small;">CA</span></td>
<td><span style="font-size:xx-small;">91365</span></td>
</tr>
<tr>
<td><span style="font-size:xx-small;">3</span></td>
<td><span style="font-size:xx-small;">Edwards</span></td>
<td><span style="font-size:xx-small;">Abby</span></td>
<td><span style="font-size:xx-small;">944 Main Street</span></td>
<td><span style="font-size:xx-small;">P.O. Box 11</span></td>
<td><span style="font-size:xx-small;">Beverly Hills</span></td>
<td><span style="font-size:xx-small;">CA</span></td>
<td><span style="font-size:xx-small;">91202</span></td>
</tr>
<tr>
<td><span style="font-size:xx-small;">4</span></td>
<td><span style="font-size:xx-small;">Richmon</span></td>
<td><span style="font-size:xx-small;">Stanley</span></td>
<td><span style="font-size:xx-small;">3900 Arbor Drive</span></td>
<td><span style="font-size:xx-small;">Apt. 7</span></td>
<td><span style="font-size:xx-small;">West Hills</span></td>
<td><span style="font-size:xx-small;">CA</span></td>
<td><span style="font-size:xx-small;">91374</span></td>
</tr>
</tbody>
</table>
<p>This table is in 1NF but has redundant data, namely, the STATE column. Now, some of you may argue that ADDR1 and ADDR2 are <em>repeating groups</em>. After all, I said in my last post that any time you see a numeral at the end of a column name, it&#8217;s usually indicative of a repeating group. That&#8217;s still the case. However, with regard to address information, there is a lot of debate as to whether ADDR1 and ADDR2 should be considered a repeating group. Both columns contain address information, right?</p>
<p>Keep in mind that while ADDR1 represents the <em>main</em> (street) address, ADDR2 often represents a sub-unit within the building. While it&#8217;s true that both columns represent a location, they are in fact different location <em>types</em>. From a relational standpoint, types often map to attributes, which map to columns in a table. Therefore, since ADDR1 and ADDR2 represent different types, many DBAs (including myself) do not consider ADDR2 to be part of a repeating group.</p>
<p>You can see that ADDR2 could hold P.O. boxes, apartment units, departments, etc. It&#8217;s just easier to label the column ADDR2. Now back to our discussion. As mentioned earlier, this table is in 1NF but not 2NF because of the redundant data in the STATE column. Although the CITY and ZIP columns are not redundant in this sample, they have the potential to be. Therefore, they need to be in their own tables as dictated by 2NF.</p>
<p>This table also violates 2NF because the <em>non-key</em> columns ADDR1, ADDR2, CITY, STATE and ZIP are <em>not</em> fully dependent on the primary key (StudentId).  Let me put this as a question.  Given a student&#8217;s last name and first name, would you also need to know the values of ADDR1, ADDR2, CITY, STATE and ZIP in order to determine the StudentId?  The answer is obviously NO.  You would only need to know the student&#8217;s last and first name.  Given this answer, it&#8217;s clear that these <em>non-key</em> columns are <em>not</em> dependent on the primary key.  Therefore these columns need to be moved to their own table.  Below is the address table broken out:</p>
<p><strong>Student Table</strong></p>
<table border="1">
<tbody>
<tr>
<th><span style="font-size:xx-small;">StudentId</span></th>
<th><span style="font-size:xx-small;">LastName</span></th>
<th><span style="font-size:xx-small;">FirstName</span></th>
</tr>
<tr>
<td><span style="font-size:xx-small;">1</span></td>
<td><span style="font-size:xx-small;">Smith</span></td>
<td><span style="font-size:xx-small;">John</span></td>
</tr>
<tr>
<td><span style="font-size:xx-small;">2</span></td>
<td><span style="font-size:xx-small;">Ericson</span></td>
<td><span style="font-size:xx-small;">Robert</span></td>
</tr>
<tr>
<td><span style="font-size:xx-small;">3</span></td>
<td><span style="font-size:xx-small;">Edwards</span></td>
<td><span style="font-size:xx-small;">Abby</span></td>
</tr>
<tr>
<td><span style="font-size:xx-small;">4</span></td>
<td><span style="font-size:xx-small;">Richmon</span></td>
<td><span style="font-size:xx-small;">Stanley</span></td>
</tr>
</tbody>
</table>
<p><strong>Address Table</strong></p>
<table border="1">
<tbody>
<tr>
<th><span style="font-size:xx-small;">StudentId</span></th>
<th><span style="font-size:xx-small;">AddressId</span></th>
<th><span style="font-size:xx-small;">Addr1</span></th>
<th><span style="font-size:xx-small;">Addr2</span></th>
<th><span style="font-size:xx-small;">City</span></th>
<th><span style="font-size:xx-small;">State</span></th>
<th><span style="font-size:xx-small;">Zip</span></th>
</tr>
<tr>
<td><span style="font-size:xx-small;">1</span></td>
<td><span style="font-size:xx-small;">1</span></td>
<td><span style="font-size:xx-small;">1123 Westchester Blvd.</span></td>
<td></td>
<td><span style="font-size:xx-small;">Covina</span></td>
<td><span style="font-size:xx-small;">CA</span></td>
<td><span style="font-size:xx-small;">91311</span></td>
</tr>
<tr>
<td><span style="font-size:xx-small;">2</span></td>
<td><span style="font-size:xx-small;">2</span></td>
<td><span style="font-size:xx-small;">655 5th Street</span></td>
<td><span style="font-size:xx-small;">Apt. 2</span></td>
<td><span style="font-size:xx-small;">Marina Del Rey</span></td>
<td><span style="font-size:xx-small;">CA</span></td>
<td><span style="font-size:xx-small;">91365</span></td>
</tr>
<tr>
<td><span style="font-size:xx-small;">3</span></td>
<td><span style="font-size:xx-small;">3</span></td>
<td><span style="font-size:xx-small;">944 Main Street</span></td>
<td><span style="font-size:xx-small;">P.O. Box 11</span></td>
<td><span style="font-size:xx-small;">Beverly Hills</span></td>
<td><span style="font-size:xx-small;">CA</span></td>
<td><span style="font-size:xx-small;">91202</span></td>
</tr>
<tr>
<td><span style="font-size:xx-small;">4</span></td>
<td><span style="font-size:xx-small;">4</span></td>
<td><span style="font-size:xx-small;">3900 Arbor Drive</span></td>
<td><span style="font-size:xx-small;">Apt. 7</span></td>
<td><span style="font-size:xx-small;">West Hills</span></td>
<td><span style="font-size:xx-small;">CA</span></td>
<td><span style="font-size:xx-small;">91374</span></td>
</tr>
</tbody>
</table>
<p>Closer, but this still violates 2NF.  The values in the CITY, STATE and ZIP columns can still be repeated from row to row.  Therefore they need to be moved into their own tables as well.  Below is a new CITY table:</p>
<p><strong>City Table</strong></p>
<table border="1">
<tbody>
<tr>
<th><span style="font-size:xx-small;">CityId</span></th>
<th><span style="font-size:xx-small;">City</span></th>
</tr>
<tr>
<td><span style="font-size:xx-small;">1</span></td>
<td><span style="font-size:xx-small;">Covina</span></td>
</tr>
<tr>
<td><span style="font-size:xx-small;">2</span></td>
<td><span style="font-size:xx-small;">Marina Del Rey</span></td>
</tr>
<tr>
<td><span style="font-size:xx-small;">3</span></td>
<td><span style="font-size:xx-small;">Beverly Hills</span></td>
</tr>
<tr>
<td><span style="font-size:xx-small;">4</span></td>
<td><span style="font-size:xx-small;">West Hills</span></td>
</tr>
</tbody>
</table>
<p>Now for the STATE table:</p>
<p><strong>State Table</strong></p>
<table border="1">
<tbody>
<tr>
<th><span style="font-size:xx-small;">StateId</span></th>
<th><span style="font-size:xx-small;">State</span></th>
</tr>
<tr>
<td><span style="font-size:xx-small;">1</span></td>
<td><span style="font-size:xx-small;">CA</span></td>
</tr>
</tbody>
</table>
<p>Now for the ZIP table:</p>
<p><strong>Zip Table</strong></p>
<table border="1">
<tbody>
<tr>
<th><span style="font-size:xx-small;">ZipId</span></th>
<th><span style="font-size:xx-small;">Zip</span></th>
</tr>
<tr>
<td><span style="font-size:xx-small;">1</span></td>
<td><span style="font-size:xx-small;">91311</span></td>
</tr>
<tr>
<td><span style="font-size:xx-small;">2</span></td>
<td><span style="font-size:xx-small;">91365</span></td>
</tr>
<tr>
<td><span style="font-size:xx-small;">3</span></td>
<td><span style="font-size:xx-small;">91202</span></td>
</tr>
<tr>
<td><span style="font-size:xx-small;">4</span></td>
<td><span style="font-size:xx-small;">91374</span></td>
</tr>
</tbody>
</table>
<p>Each of the <i>lookup tables</i> above is also in 2NF because the lookup description is <i>dependent</i> on the primary key.  That is, you must know the ID for the given record in the given table to find the description and vice versa.</p>
<p>Here is the revised ADDRESS table.  It now uses <i>foreign keys</i> to reference the lookup tables that we just created above.</p>
<p><strong>Address Table</strong></p>
<table border="1">
<tbody>
<tr>
<th><span style="font-size:xx-small;">StudentId</span></th>
<th><span style="font-size:xx-small;">AddressId</span></th>
<th><span style="font-size:xx-small;">Addr1</span></th>
<th><span style="font-size:xx-small;">Addr2</span></th>
<th><span style="font-size:xx-small;">CityId</span></th>
<th><span style="font-size:xx-small;">StateId</span></th>
<th><span style="font-size:xx-small;">ZipId</span></th>
</tr>
<tr>
<td><span style="font-size:xx-small;">1</span></td>
<td><span style="font-size:xx-small;">1</span></td>
<td><span style="font-size:xx-small;">1123 Westchester Blvd.</span></td>
<td></td>
<td><span style="font-size:xx-small;">1</span></td>
<td><span style="font-size:xx-small;">1</span></td>
<td><span style="font-size:xx-small;">1</span></td>
</tr>
<tr>
<td><span style="font-size:xx-small;">2</span></td>
<td><span style="font-size:xx-small;">2</span></td>
<td><span style="font-size:xx-small;">655 5th Street</span></td>
<td><span style="font-size:xx-small;">Apt. 2</span></td>
<td><span style="font-size:xx-small;">2</span></td>
<td><span style="font-size:xx-small;">1</span></td>
<td><span style="font-size:xx-small;">2</span></td>
</tr>
<tr>
<td><span style="font-size:xx-small;">3</span></td>
<td><span style="font-size:xx-small;">3</span></td>
<td><span style="font-size:xx-small;">944 Main Street</span></td>
<td><span style="font-size:xx-small;">P.O. Box 11</span></td>
<td><span style="font-size:xx-small;">3</span></td>
<td><span style="font-size:xx-small;">1</span></td>
<td><span style="font-size:xx-small;">3</span></td>
</tr>
<tr>
<td><span style="font-size:xx-small;">4</span></td>
<td><span style="font-size:xx-small;">4</span></td>
<td><span style="font-size:xx-small;">3900 Arbor Drive</span></td>
<td><span style="font-size:xx-small;">Apt. 7</span></td>
<td><span style="font-size:xx-small;">4</span></td>
<td><span style="font-size:xx-small;">1</span></td>
<td><span style="font-size:xx-small;">4</span></td>
</tr>
</tbody>
</table>
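<p>For readers who want to try this out, here is a minimal T-SQL sketch of the tables above.  It is only an illustration, not a definitive design: the data types, the NOT NULL and UNIQUE choices, and the CityId, StateId and ZipId column names in the address table are my assumptions.</p>
<pre>
-- Lookup tables: one row per distinct value (data types are assumed)
create table dbo.City
(
   CityId int         not null primary key,
   City   varchar(50) not null unique
);

create table dbo.[State]
(
   StateId int     not null primary key,
   [State] char(2) not null unique
);

create table dbo.Zip
(
   ZipId int     not null primary key,
   Zip   char(5) not null unique
);

create table dbo.Student
(
   StudentId int         not null primary key,
   LastName  varchar(50) not null,
   FirstName varchar(50) not null
);

-- Address rows reference the student and the lookup tables through foreign keys
create table dbo.[Address]
(
   AddressId int          not null primary key,
   StudentId int          not null references dbo.Student (StudentId),
   Addr1     varchar(100) not null,
   Addr2     varchar(100) null,
   CityId    int          not null references dbo.City (CityId),
   StateId   int          not null references dbo.[State] (StateId),
   ZipId     int          not null references dbo.Zip (ZipId)
);
</pre>
<p>The lookup tables are created first so that the foreign keys in the address table have something to reference.</p>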
<p>That&#8217;s it for 2NF.  In the next post we will take this another step and cover 3NF.  See you then.</p>
<p>Minh</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlbyminh.wordpress.com/2009/11/08/database-normalization-2nf-how-to/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/5f2c86e3673f90375a5664df9574fed5?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">sqlbyminh</media:title>
		</media:content>
	</item>
		<item>
		<title>Database Normalization &#8211; 1NF HOW-TO</title>
		<link>http://sqlbyminh.wordpress.com/2009/07/09/database-normalization-1nf-how-to/</link>
		<comments>http://sqlbyminh.wordpress.com/2009/07/09/database-normalization-1nf-how-to/#comments</comments>
		<pubDate>Thu, 09 Jul 2009 04:30:29 +0000</pubDate>
		<dc:creator>sqlbyminh</dc:creator>
				<category><![CDATA[MSSQL - Basics]]></category>
		<category><![CDATA[1NF]]></category>
		<category><![CDATA[data integrity]]></category>
		<category><![CDATA[first normal form]]></category>
		<category><![CDATA[normalization]]></category>
		<category><![CDATA[redundant data]]></category>
		<category><![CDATA[relation]]></category>
		<category><![CDATA[repeating groups]]></category>

		<guid isPermaLink="false">http://sqlbyminh.wordpress.com/?p=448</guid>
		<description><![CDATA[What is data normalization?  Why should we normalize our data and how do we do it?]]></description>
			<content:encoded><![CDATA[<p>In this post we are going to tackle the concept of <em>database normalization</em> and how it can be applied.  Before we get into that however, let&#8217;s look at the intent of data normalization.  In short, normalization, as it relates to the field of relational database design, is an effort to reduce or eliminate redundant data.  This is done by breaking the data down into well-defined constructs and then associating them to each other through the use of defined relationships.</p>
<p>So what is <em>redundant data</em>?  By definition, it is the <em>unnecessary duplication of data</em>.  Why should we strive to eliminate it?  First, it will reduce the overall storage requirement of the data.  Secondly, it will make querying the data much easier.  Thirdly, it makes it easy to change data.  We&#8217;ll see an example of these benefits later in this series on normalization.</p>
<p>Notice that I did not say that normalization will guarantee <em>data integrity</em>.  Data integrity is also important.  Strictly speaking, however, normalization <em>promotes</em> data integrity but does not necessarily govern it.  In my experience, that is left to the DBA to define and implement.</p>
<p>So what is an example of redundant data?  Take a look at the table below.  It depicts a fictional table used to store student registration data.  Can you see the redundant data?</p>
<table border="1">
<tr>
<th>Student</th>
<th>Class1</th>
<th>Instructor1</th>
<th>Class2</th>
<th>Instructor2</th>
</tr>
<tr>
<td>Smith, John</td>
<td>Biology</td>
<td>Darwin, Charles</td>
<td>Math</td>
<td>Pascal, Blaise</td>
</tr>
<tr>
<td>Ericson, Robert</td>
<td>Biology</td>
<td>Darwin, Charles</td>
<td>Chemistry</td>
<td>Bohr, Niels</td>
</tr>
<tr>
<td>Edwards, Abby</td>
<td>Physics</td>
<td>Newton, Isaac</td>
<td>Biology</td>
<td>Darwin, Charles</td>
</tr>
<tr>
<td>Richmon, Stanley</td>
<td>Computer Science</td>
<td>Boyer, Robert</td>
<td>Archaeology</td>
<td>Carter, Howard</td>
</tr>
</table>
<p>This table design violates <em>first-normal-form</em>, also referred to as 1NF.  1NF dictates that a table must not have any <em>repeating groups</em>.  A repeating group occurs when a given <em>instance</em>, in this case STUDENT, has an attribute, in this case <em>CLASS</em> and <em>INSTRUCTOR</em>, that contains multiple values.  In this example, the columns CLASS1 and CLASS2 are representative of a repeating group.  They both convey an attribute type of CLASS with multiple values for each STUDENT.  The values are stored as CLASS1 and CLASS2.  The same is true of the INSTRUCTOR attribute.  More than likely, when you see a column name ending in a numeral, as in this example, it&#8217;s a safe bet that it&#8217;s a repeating group.  A novice DBA might redesign the table to look like this in an effort to achieve 1NF:</p>
<table border="1">
<tr>
<th>Student</th>
<th>Class</th>
<th>Instructor</th>
</tr>
<tr>
<td>Smith, John</td>
<td>Biology | Math</td>
<td>Darwin, Charles | Pascal, Blaise</td>
</tr>
<tr>
<td>Ericson, Robert</td>
<td>Biology | Chemistry</td>
<td>Darwin, Charles | Bohr, Niels</td>
</tr>
<tr>
<td>Edwards, Abby</td>
<td>Physics | Biology</td>
<td>Newton, Isaac | Darwin, Charles</td>
</tr>
<tr>
<td>Richmon, Stanley</td>
<td>Computer Science | Archaeology</td>
<td>Boyer, Robert | Carter, Howard</td>
</tr>
</table>
<p>Close but no cigar.  1NF also dictates that a column (attribute) must be <em>atomic</em>.  That is, it can contain only <em>one</em> value.  This second example violates that because the STUDENT, CLASS and INSTRUCTOR columns contain multiple values.  Below is the same data in 1NF:</p>
<p><strong>Student Table</strong></p>
<table border="1">
<tr>
<th>StudentId</th>
<th>LastName</th>
<th>FirstName</th>
</tr>
<tr>
<td>1</td>
<td>Smith</td>
<td>John</td>
</tr>
<tr>
<td>2</td>
<td>Ericson</td>
<td>Robert</td>
</tr>
<tr>
<td>3</td>
<td>Edwards</td>
<td>Abby</td>
</tr>
<tr>
<td>4</td>
<td>Richmon</td>
<td>Stanley</td>
</tr>
</table>
<p><strong>Class Table</strong></p>
<table border="1">
<tr>
<th>ClassId</th>
<th>ClassName</th>
</tr>
<tr>
<td>1</td>
<td>Archaeology</td>
</tr>
<tr>
<td>2</td>
<td>Biology</td>
</tr>
<tr>
<td>3</td>
<td>Chemistry</td>
</tr>
<tr>
<td>4</td>
<td>Computer Science</td>
</tr>
<tr>
<td>5</td>
<td>Math</td>
</tr>
<tr>
<td>6</td>
<td>Physics</td>
</tr>
</table>
<p><strong>Instructor Table</strong></p>
<table border="1">
<tr>
<th>InstructorId</th>
<th>LastName</th>
<th>FirstName</th>
</tr>
<tr>
<td>1</td>
<td>Bohr</td>
<td>Niels</td>
</tr>
<tr>
<td>2</td>
<td>Boyer</td>
<td>Robert</td>
</tr>
<tr>
<td>3</td>
<td>Carter</td>
<td>Howard</td>
</tr>
<tr>
<td>4</td>
<td>Darwin</td>
<td>Charles</td>
</tr>
<tr>
<td>5</td>
<td>Newton</td>
<td>Isaac</td>
</tr>
<tr>
<td>6</td>
<td>Pascal</td>
<td>Blaise</td>
</tr>
</table>
<p>Notice how we&#8217;ve broken down our original table into <em>three</em> well-defined tables?  Each table contains the same data as our original table in terms of the <em>entity</em> or instance it represents.  The repeating group is gone and each column in the three tables is atomic.  What&#8217;s missing though is some construct (table) that relates the three to each other.  For that, we use what is known as a <em>relation</em> table.  A relation table is one which relates an entity to another entity.  Below is the relation table that would be used to associate the three entities to each other:</p>
<p><strong>Class Relation Table</strong></p>
<table border="1">
<tr>
<th>StudentId</th>
<th>ClassId</th>
<th>InstructorId</th>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>4</td>
</tr>
<tr>
<td>1</td>
<td>5</td>
<td>6</td>
</tr>
<tr>
<td>2</td>
<td>2</td>
<td>4</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>1</td>
</tr>
<tr>
<td>3</td>
<td>6</td>
<td>5</td>
</tr>
<tr>
<td>3</td>
<td>2</td>
<td>4</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>2</td>
</tr>
<tr>
<td>4</td>
<td>1</td>
<td>3</td>
</tr>
</table>
<p>As you can see, the table above relates each of the three entities to each other.  We can tell that Smith, John (StudentId = 1) is enrolled in Biology (ClassId = 2) and Math (ClassId = 5).  We know that Biology (ClassId = 2) is taught by Charles Darwin (InstructorId = 4) and that Math (ClassId = 5) is taught by Blaise Pascal (InstructorId = 6).</p>
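<p>Here is a rough T-SQL sketch of how the relation table could be declared and then read back.  The table names (dbo.Student, dbo.Class, dbo.Instructor, dbo.ClassRelation) and the composite primary key are assumptions on my part; the key assumes that a student is enrolled in a given class only once.</p>
<pre>
-- Hypothetical relation table; assumes the three entity tables already exist
create table dbo.ClassRelation
(
   StudentId    int not null references dbo.Student (StudentId),
   ClassId      int not null references dbo.Class (ClassId),
   InstructorId int not null references dbo.Instructor (InstructorId),
   primary key (StudentId, ClassId)
);

-- List the classes and instructors for StudentId = 1
select s.LastName, s.FirstName, c.ClassName, i.LastName as InstructorLastName
from dbo.ClassRelation cr
   inner join dbo.Student s on s.StudentId = cr.StudentId
   inner join dbo.Class c on c.ClassId = cr.ClassId
   inner join dbo.Instructor i on i.InstructorId = cr.InstructorId
where
   s.StudentId = 1
</pre>
<p>With the sample data above, this query should return the Biology and Math enrollments for John Smith.</p>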
<p>You will find that normalized data have many tables but few columns.  This is a by-product of the normalization process.  For those of you who are astute readers, you may be pondering the fact that the relation table (Class Relation Table) seems to have some duplicative data.  You are correct.  The StudentId, ClassId and InstructorId do indeed occur multiple times in this table.  That&#8217;s fine though.  Remember that <em>redundant data</em> is defined as the <em>unnecessary duplication of data</em>.  In the case of the relation table, the duplication is necessary (and perfectly legal) to relate the three entities to each other.</p>
<p>Come back in a few days.  We&#8217;ll continue our discussion of the normalization process by exploring the second level of normalization, namely 2NF.  Feel free to comment on this post with your thoughts, suggestions or critiques.</p>
<p>Minh</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlbyminh.wordpress.com/2009/07/09/database-normalization-1nf-how-to/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/5f2c86e3673f90375a5664df9574fed5?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">sqlbyminh</media:title>
		</media:content>
	</item>
		<item>
		<title>SQL &#8211; Optimizing SELECT Statement for Speed</title>
		<link>http://sqlbyminh.wordpress.com/2009/07/02/sql-optimizing-select-statement-for-speed/</link>
		<comments>http://sqlbyminh.wordpress.com/2009/07/02/sql-optimizing-select-statement-for-speed/#comments</comments>
		<pubDate>Thu, 02 Jul 2009 20:01:27 +0000</pubDate>
		<dc:creator>sqlbyminh</dc:creator>
				<category><![CDATA[MSSQL - Basics]]></category>
		<category><![CDATA[optimize]]></category>
		<category><![CDATA[select]]></category>

		<guid isPermaLink="false">http://sqlbyminh.wordpress.com/?p=397</guid>
		<description><![CDATA[These are some general performance guidelines regarding the SELECT statement.]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s been a while since my last post.  I&#8217;ve been busy with a server relocation.  Here&#8217;s my latest.</p>
<p>The SELECT statement is the most often used statement in any relational database.  It is therefore worth optimizing any such statement so that data can be returned as quickly as possible.  Depending on how they&#8217;re written, SELECT statements can cause blocking, deadlocks, memory pressure and CPU pressure.  In this post we&#8217;ll discuss how to write a SELECT statement to mitigate some of these issues.</p>
<p>Here is a list of DOs and DON&#8217;Ts when writing a SELECT statement:</p>
<ol>
<li>DO return only the columns that you need.</li>
<li>DO return only the rows that you need.</li>
<li>DO use only the appropriate index(es) in your statement.</li>
<li>DO use the appropriate locking granularity.</li>
<li>DO cache frequently used datasets in a table variable or temp table.</li>
<li>DO use the appropriate JOIN hint.</li>
<li>DON&#8217;T use SELECT INTO if possible.</li>
<li>DON&#8217;T use FUNCTIONs if possible.</li>
<li>DON&#8217;T use CASE statements if possible.</li>
<li>DON&#8217;T put SELECT statements inside a TRANSACTION.</li>
<li>DON&#8217;T use temporary tables, use table variables if possible.</li>
</ol>
<p>Now let&#8217;s discuss each one of these items in more detail.</p>
<p>1.  <em>DO return only the columns that you need.</em></p>
<p>I often see code like this:</p>
<pre>select * from dbo.someTable</pre>
<p>There&#8217;s nothing wrong with code like this if the client <em>really</em> needs all the columns from the table.  More often than not though, the client is only interested in a subset of the columns.  It&#8217;s a waste to return the additional columns if the client doesn&#8217;t need to consume them.</p>
<p>2.  <em>DO return only the rows that you need.</em></p>
<p>Again, we&#8217;ll use the code example in item #1.  Notice how it returns all rows in the table?  Same rationale applies here:  It&#8217;s a waste to return rows that the client may not need.  Rather, the SELECT statement should be written to return only the rows that the client is interested in like so:</p>
<pre>
select col1, col2, col3
from dbo.someTable
where
   col4 = 'xyz'
</pre>
<p>3.  <em>DO use only the appropriate index(es) in your statement.</em></p>
<p>If the table we&#8217;re selecting from has an appropriate index, we can help MSSQL select the right index by using an index hint or by specifying a criterion in the WHERE clause that would help MSSQL pick that index.</p>
<p>Assume that <em>dbo.someTable</em> has an index on <em>col4</em>.  The code example in item #2 would use this index.  What if the index on <em>dbo.someTable</em> was defined such that the index is on <em>col4, col5</em>?  MSSQL can still use this index because the WHERE clause <em>partially</em> matches the index.</p>
<p>MSSQL would not be able to seek on the index if it was defined using <em>col5, col4</em>.  In other words, MSSQL can still use an index if it&#8217;s a <em>leading index</em>.  That is, an index whose columns, read from LEFT to RIGHT, match the columns in the WHERE clause starting with the first indexed column.</p>
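<p>To make the leading-column rule concrete, here is a small sketch of the two index definitions just discussed.  The index names are made up for illustration, and the comments describe what I would expect for a WHERE clause that filters on <em>col4</em> alone.</p>
<pre>
-- col4 is the leading column, so "where col4 = 'xyz'" can seek on this index
create index IX_COL4_COL5 on dbo.someTable (col4, col5);

-- col5 is the leading column, so a filter on col4 alone cannot seek on this index
-- (at best MSSQL could scan it)
create index IX_COL5_COL4 on dbo.someTable (col5, col4);
</pre>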
<p>Sometimes, however, no matter how you write your WHERE clause, MSSQL may not use the index you want.  In such a case, you could use an <em>index hint</em>.  An index hint tells MSSQL to use a given index when executing your SELECT statement.  You would use it like so:</p>
<p><code></p>
<pre>
select col1, col2, col3
from dbo.someTable with (index(IX_COL4))
where
   col4 = 'xyz'
</pre>
<p></code></p>
<p>Assuming that the index IX_COL4 is defined with COL4 as its first column, the hint instructs MSSQL to use this index.</p>
<p>4.  <em>DO use the appropriate locking granularity.</em></p>
<p>When selecting data, you need to use the right locking level for your environment.  If you can get away with using a <a href="http://sqlbyminh.wordpress.com/2009/04/28/sql-using-dirty-reads" target="_blank">dirty read</a> then use the NOLOCK hint like so:</p>
<p><code></p>
<pre>
select col1, col2, col3
from dbo.someTable with (nolock, index(IX_COL4))
where
   col4 = 'xyz'
</pre>
<p></code></p>
<p>If you can&#8217;t, then use an UPDATE lock (UPDLOCK) in conjunction with either a ROWLOCK or a PAGLOCK.  A ROWLOCK will lock all the rows that qualify in your WHERE clause, whereas a PAGLOCK will lock all the pages that contain the rows that qualify in your WHERE clause.</p>
<p>Having said that, it may seem that using a ROWLOCK is the way to go.  While ROWLOCK will give you high concurrency, it does make MSSQL work harder.  For every row that&#8217;s locked, it must maintain information about that lock.  The same goes with a PAGLOCK.  So what&#8217;s the difference?</p>
<p>Well, assume that your query finds 10000 rows that qualify.  If you&#8217;re using a ROWLOCK, MSSQL has created 10000 lock records internally.  Now assume that you&#8217;re using a PAGLOCK and that the same 10000 rows were found on 1000 pages.  MSSQL would have created only 1000 lock records internally.</p>
<p>That&#8217;s far fewer resources for MSSQL to maintain and free up once your query is done.</p>
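<p>If you want to see the difference on your own tables, the sketch below may help.  It reuses <em>dbo.someTable</em> from the earlier items and wraps the SELECT in a transaction purely so that the locks are still held when they are counted.  It assumes SQL Server 2005 or later, where the sys.dm_tran_locks view is available, and an account that is allowed to query that view:</p>
<p><code></p>
<pre>
begin transaction

-- Row-level update locks: one lock per qualifying row.
select col1, col2, col3
from dbo.someTable with (updlock, rowlock)
where
   col4 = 'xyz'

-- Count the locks this session currently holds, by type.
select resource_type, request_mode, count(*) as lockCount
from sys.dm_tran_locks
where
   request_session_id = @@spid
group by resource_type, request_mode

-- Release the locks.
rollback transaction
</pre>
<p></code></p>
<p>Running the same batch with PAGLOCK in place of ROWLOCK should show the update locks counted against pages instead of individual rows, which makes the two lock counts easy to compare.</p>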
<p>5.  <em>DO cache often-used datasets in a table variable or temp table.</em></p>
<p>You&#8217;ll often find that you need to re-use a particular dataset within a given stored procedure or script.  Instead of retrieving the same data for each statement you should cache this dataset for later retrieval.  For example, suppose you want to get a list of all the customers that have brought you more than $5000 in revenue since 1/1/2008.  Furthermore, suppose that you want to get a list of the invoices.  One way to do this is to use the code below:</p>
<p><code></p>
<pre>
-- Get customers with revenue for year that is &gt; $5000.
select
  c.customerId
  ,c.balance
  ,c.customerCreateDate
  ,c.customerName
from dbo.tblCustomer c with (updlock, paglock)
join dbo.tblInvoiceHdr h with (nolock)
on h.customerId = c.customerId
where
   h.tranDate between '1/1/2008' and getdate()
   and c.balance &gt; 5000

-- Get the invoice info.
select
   h.invoiceNumber
   ,h.invoiceSysId
   ,h.tranDate
   ,h.invoiceTotal
from dbo.tblInvoiceHdr h
join
(
   select
     c.customerId
     ,c.balance
     ,c.customerCreateDate
     ,c.customerName
   from dbo.tblCustomer c with (updlock, paglock)
   join dbo.tblInvoiceHdr h with (nolock)
   on h.customerId = c.customerId
   where
      h.tranDate between '1/1/2008' and getdate()
      and c.balance &gt; 5000
) custBal5K
on custBal5K.customerId = h.customerId
</pre>
<p></code></p>
<p>Notice how we <em>re-use</em> the SELECT statement from the first statement to generate the invoice list?  We could cache this output into a table to get better performance like so:</p>
<p><code></p>
<pre>
-- Declare table variable.
declare @t table
(
   customerId int not null
   ,balance money not null
   ,customerCreateDate datetime not null
   ,customerName varchar(100) not null
)

-- Get customers with revenue for year that is &gt; $5000.
insert into @t
(
   customerId
   ,balance
   ,customerCreateDate
   ,customerName
)
select
  c.customerId
  ,c.balance
  ,c.customerCreateDate
  ,c.customerName
from dbo.tblCustomer c with (updlock, paglock)
join dbo.tblInvoiceHdr h with (nolock)
on h.customerId = c.customerId
where
   h.tranDate between '1/1/2008' and getdate()
   and c.balance &gt; 5000

-- Echo customer list.
select * from @t

-- Echo invoices.
select
   h.invoiceNumber
   ,h.invoiceSysId
   ,h.tranDate
   ,h.invoiceTotal
from @t t
join dbo.tblInvoiceHdr h
on h.customerId = t.customerId
</pre>
<p></code></p>
<p>The code above caches the customer list in a table variable.  The invoice query can perform better because it doesn&#8217;t have to fetch the data from dbo.tblCustomer again as in the first example.  Obviously, this is a trivial example.  However, if the base table that you&#8217;re fetching from has a lot of rows, caching the data in this manner can be more efficient.</p>
<p>6.  <em>DO use the appropriate JOIN hint.</em></p>
<p>Most of the time, SQL Server is smart enough to use the most efficient type of JOIN to retrieve the data it needs.  There are times however where you may want to suggest to SQL the type of join to use.  SQL uses three types of JOINs.  They are as follows:</p>
<ul>
<li>LOOP JOIN</li>
<li>HASH JOIN</li>
<li>MERGE JOIN</li>
</ul>
<p>A <em>loop join</em> is one in which the records in one table are scanned for matches using every record in another table.  This type of join is called a loop join because it&#8217;s akin to searching for a match by looping over a set of values to do the comparison.  This type of join can be CPU-intensive and tends to be the slowest of the three when both inputs are large, although it is often efficient when one input is small and the other is indexed on the join column.</p>
<p>A <em>hash join</em> on the other hand uses a <em>hash table</em> instead of a loop to find the matches, hence its name.  In this method, a hash table is built to hold the values of the smaller table.  This is called the <em>build phase</em>.  Once the hash table is built, the rows in the larger table are used as inputs to scan the hash table for matches.  This is termed the <em>probe phase</em>.  Any matches are then sent back to the client.  This type of join is usually faster than the LOOP JOIN on large inputs but it is memory-intensive.</p>
<p>A <em>merge join</em> is used when both inputs are sorted on the columns being compared, typically because both tables have an index on those columns with the same sort order.  A row-by-row comparison is made between the sorted inputs.  If a match is found it is sent to the client.  If the two rows don&#8217;t match, the row that has the <em>lesser</em> value is discarded and SQL advances to the next row in that input.  This process repeats until there are no more rows to compare.  This type of join is often the <em>fastest</em> method for large inputs since the comparison is done on rows that are already identically sorted.</p>
<p>So how do you specify a JOIN hint in a SELECT statement?  Take a look at the code examples below:</p>
<p><strong>LOOP JOIN EXAMPLE</strong><br />
<code></p>
<pre>
select
   a.col1, b.col1
from tableA a
inner loop join tableB b
on b.col1 = a.col1
</pre>
<p></code></p>
<p><strong>HASH JOIN EXAMPLE</strong><br />
<code></p>
<pre>
select
   a.col1, b.col1
from tableA a
inner hash join tableB b
on b.col1 = a.col1
</pre>
<p></code></p>
<p><strong>MERGE JOIN EXAMPLE</strong><br />
<code></p>
<pre>
select
   a.col1, b.col1
from tableA a
inner merge join tableB b
on b.col1 = a.col1
</pre>
<p></code></p>
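<p>The examples above place the hint on the join itself.  T-SQL also accepts the same three keywords as a query-level hint in an OPTION clause, which asks for that join type across the whole statement.  A minimal sketch, reusing the tableA and tableB names from above:</p>
<p><code></p>
<pre>
select
   a.col1, b.col1
from tableA a
inner join tableB b
on b.col1 = a.col1
option (hash join)
</pre>
<p></code></p>
<p>One practical difference is worth knowing: a hint written on the join also makes MSSQL keep the join order as written, whereas the OPTION form leaves the join order to the optimizer.  Either way, compare the execution plans before and after, since a forced join type can easily be slower than the one MSSQL would have picked on its own.</p>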
<p>7.  <em>DON&#8217;T use SELECT INTO if possible.</em></p>
<p>It&#8217;s often necessary to cache data in a temporary table.  This is especially true if you intend to re-use the data often in your script or stored procedure.  Some DBAs (myself included) take the lazy way out and write code to store the data using this form:</p>
<p><code></p>
<pre>
select col1, col2, col3 into #t from dbo.someTable
</pre>
<p></code></p>
<p>While there&#8217;s nothing wrong with using this form, there <em>is</em> a drawback with using this method.  The SELECT&#8230;INTO statement is doing <em>two</em> operations here.  It is creating a temporary table (#t) and then inserting the data from dbo.someTable into #t.</p>
<p>Keep in mind that every MSSQL statement that modifies data or schema runs inside its own <em>implicit</em> transaction.  It makes sense when you think about it.  If a given statement fails for any reason, MSSQL has to roll back any changes as if the statement never executed.  Therefore, the first part of the SELECT&#8230;INTO operation, the one that is creating the temporary table #t, is also wrapped in a transaction until the INSERT portion is completed.</p>
<p>When an object is being modified, MSSQL places a <em>schema modification lock</em> on the object to ensure that no process can access the object until the modification has completed.  That is also true of temp tables.  However, MSSQL also places an additional schema lock on tempdb itself.  This lock on tempdb can block other processes that need to create temp tables in tempdb as well.  To get around this issue it is better to create the temp table first and then do an explicit insert as the code below demonstrates:</p>
<p><code></p>
<pre>
----- Create temp table. -----
-- While the temp table is being created, a schema lock on tempdb is enforced.
create table #t
(
   col1 int not null
   ,col2 varchar(50) not null
   ,col3 numeric(10,2) not null
)

----- Now do the INSERT. -----
-- Schema lock on tempdb is no longer in effect here!
insert into #t
(
   col1
   ,col2
   ,col3
)
select col1, col2, col3 from dbo.someTable
</pre>
<p></code></p>
<p>The code above accomplishes the same task but the schema lock placed on tempdb is so short-lived that other processes won&#8217;t be affected.</p>
<p>8.  <em>DON&#8217;T use FUNCTIONs if possible.</em></p>
<p>Let&#8217;s face it, as programmers and DBAs, we use functions all the time.  For the most part, using functions is not a big issue; unless it&#8217;s part of a large data-retrieval.  You see, functions by nature are executed <em>one row at a time, one expression at a time</em>.  Therefore, if you have a function that is part of a SELECT statement, the function is executed for each row of data that&#8217;s returned.  Consider the following code:</p>
<p><code></p>
<pre>
----- Get data from table. -----
select
   customerId
   ,customerName
   ,isnull(onCreditHold, 0) as onCreditHold
from dbo.someTable with (nolock)
</pre>
<p></code></p>
<p>In the select statement above, the function ISNULL( ) is being called for each row that is returned.  That&#8217;s not an issue if we&#8217;re talking about a small number of rows.  If, however, you&#8217;re returning a large number of rows, it could impact performance since the column ONCREDITHOLD has to be evaluated for each row.</p>
<p>Instead, you may want to create a DEFAULT constraint so that ONCREDITHOLD is 0 by default for inserts, and declare the column NOT NULL so that a NULL can never be stored.  That way you wouldn&#8217;t have to convert a NULL value to a 0.</p>
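<p>As a sketch of that approach, the statements below backfill any existing NULLs, make the column NOT NULL, and then add the default.  The constraint name DF_someTable_onCreditHold and the <em>bit</em> data type are assumptions for illustration only; adjust them to your own schema:</p>
<p><code></p>
<pre>
-- Backfill existing rows first so the NOT NULL change can succeed.
update dbo.someTable
set onCreditHold = 0
where
   onCreditHold is null

-- Assumes onCreditHold is a bit column.
alter table dbo.someTable
alter column onCreditHold bit not null

-- Hypothetical constraint name.
alter table dbo.someTable
add constraint DF_someTable_onCreditHold default (0) for onCreditHold

-- ISNULL( ) is no longer needed.
select
   customerId
   ,customerName
   ,onCreditHold
from dbo.someTable with (nolock)
</pre>
<p></code></p>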
<p>9.  <em>DON&#8217;T use CASE statements if possible.</em></p>
<p>CASE statements are like FUNCTIONs in that they are evaluated for each row.  Avoid using them if at all possible.  If you can&#8217;t, keep them as simple as possible and forgo nesting them.</p>
<p>10.  <em>DON&#8217;T put SELECT statements inside a TRANSACTION.</em></p>
<p>It doesn&#8217;t make sense to me, but I see a lot of T-SQL code where SELECT statements are wrapped in a transaction.  You should use transactions when inserting, deleting or updating data.  Transactions let you roll back any changes that might have occurred in the event of an error.  Since a SELECT query doesn&#8217;t change data, wrapping one in a transaction is wasteful and poor coding.</p>
<p>11.  <em>DON&#8217;T use temporary tables, use table variables if possible.</em></p>
<p>Both local and global temporary tables are more resource-intensive than their table variable counterparts.  Use table variables in lieu of temporary tables.</p>
<p>As always I welcome any feedback, corrections or insights regarding this post.<br />
Minh</p>
<br /> Tagged: optimize, select]]></content:encoded>
			<wfw:commentRss>http://sqlbyminh.wordpress.com/2009/07/02/sql-optimizing-select-statement-for-speed/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/5f2c86e3673f90375a5664df9574fed5?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">sqlbyminh</media:title>
		</media:content>
	</item>
		<item>
		<title>SQL &#8211; Stored Procedures</title>
		<link>http://sqlbyminh.wordpress.com/2009/05/07/sql-stored-procedures/</link>
		<comments>http://sqlbyminh.wordpress.com/2009/05/07/sql-stored-procedures/#comments</comments>
		<pubDate>Thu, 07 May 2009 01:51:31 +0000</pubDate>
		<dc:creator>sqlbyminh</dc:creator>
				<category><![CDATA[MSSQL - Basics]]></category>
		<category><![CDATA["stored procedure"]]></category>
		<category><![CDATA[default value]]></category>
		<category><![CDATA[exec]]></category>
		<category><![CDATA[execute]]></category>
		<category><![CDATA[nocount]]></category>
		<category><![CDATA[output]]></category>
		<category><![CDATA[parameter]]></category>

		<guid isPermaLink="false">http://sqlbyminh.wordpress.com/?p=352</guid>
		<description><![CDATA[In this post we take aim at the stored procedure.  What is it?  In a nutshell, it is compiled code that resides in a given database. Why would you want to have code in a database and not in a traditional application in some cases?  Well, for one, if you have to manipulate data that resides [...]]]></description>
			<content:encoded><![CDATA[<p>In this post we take aim at the <em>stored procedure</em>.  What is it?  In a nutshell, it is compiled code that resides in a given database.</p>
<p>Why would you want to have code in a database and not in a traditional application in some cases?  Well, for one, if you have to manipulate data that resides in a database, it is often faster to manipulate the data at the server and then return it to the client.</p>
<p>Secondly, if the logic is in the database, you can make modifications to the code in the database without having to re-deploy a new client to each and every user machine.</p>
<p>Thirdly, stored procedures often perform better than a pass-through statement.  We&#8217;ll cover that in a later post.  So just how do we create a stored procedure?  Let&#8217;s look at a basic one that&#8217;s used to return a <em>scalar</em> value.  Take a look at the code below:</p>
<p><u>Code</u></p>
<p><code></p>
<pre>create procedure dbo.uspTest
as
set nocount on
select 1
set nocount off
return 0</pre>
<p></code></p>
<p>There are three things to take note of here. Notice the name of the procedure has the prefix &#8220;dbo.&#8221; in front of it. This tells SQL that the procedure is &#8220;owned&#8221; by the <strong>dbo</strong> user. &#8220;dbo&#8221; stands for &#8220;database owner&#8221;. In effect we&#8217;re saying that this procedure is owned by the account that has been defined as the owner of the database in which this procedure resides. In many installations the database owner is the &#8220;sa&#8221; account in MSSQL.</p>
<p>The second thing to note is the use of the command &#8220;set nocount on&#8221;. This command instructs MSSQL not to send the client a message after each statement reporting how many records were modified, deleted, inserted or returned. This small addition cuts down on network traffic and can improve performance, so I recommend that you use it as much as possible.</p>
<p>The third thing is the return code or exit code. Notice how we&#8217;re returning the value zero (0). A stored procedure is <em>not</em> required to have a return code but it is good practice to do so. Traditionally, a return code of zero indicates success, any other value indicates failure. That is because the return code typically indicates the error number. Therefore, returning a value of zero indicates that there was no error.</p>
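<p>To read the return code from the caller&#8217;s side, assign the result of EXEC to a variable.  A minimal sketch using the dbo.uspTest procedure above (the variable name @rc is just for illustration):</p>
<p><code></p>
<pre>
declare @rc int

-- Run the procedure and capture its return code.
exec @rc = dbo.uspTest

-- Zero means success by the convention described above.
if @rc &lt;&gt; 0
   print 'dbo.uspTest failed with return code ' + cast(@rc as varchar(11))
</pre>
<p></code></p>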
<p>Let&#8217;s now look at how to create a stored procedure that takes arguments and returns a <em>dataset</em>. Take a look at the code below:</p>
<p><u>Code</u></p>
<p><code></p>
<pre>create procedure dbo.uspTest
(
   @custid int
)
as
set nocount on
select * from dbo.tbCustomer where customerId = @custid
set nocount off</pre>
<p></code></p>
<p>The procedure above takes one argument of type int[eger]. The name of the parameter is @custid. In MSSQL, parameter names are prefixed with the @ sign. The parameter name and its type must be separated by one or more spaces.</p>
<p>Look at the SELECT statement. See how it&#8217;s constructed? In this example, we&#8217;re retrieving any row from dbo.tbCustomer where the column <em>customerId</em> contains the value that @custid evaluates to. To use this procedure to retrieve customer records where the @custid value is 50289, we would execute the procedure using one of the following forms:</p>
<p><span style="text-decoration:underline;">Example 1</span></p>
<p><code></p>
<pre>exec dbo.uspTest 50289</pre>
<p></code></p>
<p><u>Example 2</u></p>
<p><code></p>
<pre>exec dbo.uspTest @custid = 50289</pre>
<p></code></p>
<p><u>Example 3</u></p>
<p><code></p>
<pre>declare @custIdValue int
set @custIdValue = 50289
exec dbo.uspTest @custid = @custIdValue</pre>
<p></code></p>
<p><u>Example 4</u></p>
<p><code></p>
<pre>declare @custId int
set @custId = 50289
exec dbo.uspTest @custid = @custId</pre>
<p></code></p>
<p>All of the above execute statements will return the same dataset. However, example 2 uses a <em>named parameter</em>. Named parameters allow you to pass in the arguments in any order so long as you indicate the parameter that the argument is mapped to. In example 2, we&#8217;re indicating that the value <em>50289</em> maps to parameter <em>@custid</em>. I favor named parameters since you don&#8217;t have to worry about passing in the arguments in the exact position required by the procedure.</p>
<p>Example 3 also uses a named parameter but passes in the value as a variable.</p>
<p>Example 4 deserves special attention. Notice that in this example, we&#8217;ve declared a <em>variable</em> that has the same name as the parameter. Believe it or not, this will execute without error. The reason is that MSSQL is smart enough to know that the token to the LEFT of the equal sign is the <em>parameter</em> and that the token to the RIGHT of the equal sign is the <em>argument</em>.</p>
<p><span style="text-decoration:underline;">Default Parameter Values</span></p>
<p>Often, it&#8217;s convenient to supply a <em>default</em> value for a parameter. To specify one, you simply add an equal sign and the default value after the parameter&#8217;s data type. Below is an example:</p>
<p><u>Code</u></p>
<p><code></p>
<pre>create procedure dbo.uspTest
(
   @custid int
   ,@creditHold bit = 0
)
as
set nocount on

select *
from dbo.tbCustomer
where
   customerId = @custid
   and onCreditHold = @creditHold

set nocount off</pre>
<p></code></p>
<p>In this example, we&#8217;ve added a second parameter named @creditHold.  This flag indicates whether we should return customer records where the customer is on credit hold with us.  The default value is zero (0).  This means that if <em>no value</em> is specified for this parameter, zero is used.</p>
<p>To return customers that are not on credit hold, we could execute the procedure in one of the following forms:</p>
<p><code></p>
<pre>
exec dbo.uspTest @custid = 50289

exec dbo.uspTest @custid = 50289, @creditHold = 0
</pre>
<p></code></p>
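<p>For completeness, here is a small sketch of two more calling forms for the same procedure.  The first asks for the default explicitly with the DEFAULT keyword; the second overrides the default so that the customer is returned only if it is on credit hold:</p>
<p><code></p>
<pre>
-- Explicitly request the default value (0) for @creditHold.
exec dbo.uspTest @custid = 50289, @creditHold = default

-- Override the default.
exec dbo.uspTest @custid = 50289, @creditHold = 1
</pre>
<p></code></p>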
<p><u>Output Parameters</u></p>
<p>A stored procedure can also be defined with an <em>output</em> parameter.  This type of parameter is one in which any changes made to the argument passed in will be retained once control has been returned to the caller.  It is in essence MSSQL&#8217;s way of specifying a parameter that is passed by <em>reference</em>.</p>
<p>Parameters in MSSQL stored procedures are by default <em>input only</em>.  That is, any changes made to the argument by the procedure <em>are not</em> retained when control passes back to the client.  The example below shows how to define an output parameter for our procedure:</p>
<p><u>Output Parameter Example</u></p>
<p><code></p>
<pre>create procedure dbo.uspTest
(
   @custid int
   ,@creditHold bit = 0
   ,@accountBalance numeric(19,2) output
)
as
set nocount on
select * from dbo.tbCustomer where customerId = @custid and onCreditHold = @creditHold
select @accountBalance = amountOwed from dbo.tbCustomer where customerId = @custid
set nocount off</pre>
<p></code></p>
<p>If we were to call the procedure like so:</p>
<p><code></p>
<pre>
declare @amountOwed numeric(19,2)
set @amountOwed = 0

exec dbo.uspTest
   @custid = 50289
   ,@creditHold = 0
   ,@accountBalance = @amountOwed output

select @amountOwed
</pre>
<p></code></p>
<p>Assuming that the given customer has a balance on the account, you would see that the value of @amountOwed is not zero (0), which is what we initialized it to, but rather the customer&#8217;s remaining balance.</p>
<p>Well, that&#8217;s it for this post.  There is so much more to stored procedures that I didn&#8217;t cover.  I hope however that I&#8217;ve been able to shed some light on what a stored procedure is and how to use them.  I recommend that you do further reading on them using MSSQL&#8217;s on-line help.</p>
<p>Sincerely,<br />
Minh</p>
<br /> Tagged: "stored procedure", default value, exec, execute, nocount, output, parameter]]></content:encoded>
			<wfw:commentRss>http://sqlbyminh.wordpress.com/2009/05/07/sql-stored-procedures/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/5f2c86e3673f90375a5664df9574fed5?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">sqlbyminh</media:title>
		</media:content>
	</item>
		<item>
		<title>SQL &#8211; Using Transactions</title>
		<link>http://sqlbyminh.wordpress.com/2009/04/29/sql-using-transactions/</link>
		<comments>http://sqlbyminh.wordpress.com/2009/04/29/sql-using-transactions/#comments</comments>
		<pubDate>Wed, 29 Apr 2009 17:50:59 +0000</pubDate>
		<dc:creator>sqlbyminh</dc:creator>
				<category><![CDATA[MSSQL - Basics]]></category>
		<category><![CDATA[archiving]]></category>
		<category><![CDATA[delete]]></category>
		<category><![CDATA[insert]]></category>
		<category><![CDATA[riaserror]]></category>
		<category><![CDATA[rollback]]></category>
		<category><![CDATA[transaction]]></category>

		<guid isPermaLink="false">http://sqlbyminh.wordpress.com/?p=326</guid>
		<description><![CDATA[Using transaction in MSSQL to maintain data integrity.]]></description>
			<content:encoded><![CDATA[<p>In this post we look at the concept of a <em>transaction</em> and how to use it to maintain data integrity.  So exactly what is a transaction?  A transaction is a mechanism used by DBAs and developers to treat one or more units of work as one.  What does that mean?  Stay with me.  Let&#8217;s look at the classic real-world example of a transaction.</p>
<p>Every time you go to the bank you do one of three things:</p>
<ol>
<li>Deposit money into your account.</li>
<li>Withdraw money from your account.</li>
<li>Transfer funds from one account to another.</li>
</ol>
<p>Each of these activities to the average consumer would constitute a <em>single</em> unit of work.  In reality, it is made up of more than one activity.  For example,  when you are transferring funds from one account to another there are multiple steps (activities) that take place.  Let&#8217;s focus on the core activities involved in this example.</p>
<p>Funds are withdrawn from the first account (account 1) then those funds are deposited into the second account (account 2).  Notice that these are <em>two</em> units of work.  However, they must be treated as <em>one</em> unit of work to guarantee a <em>stable</em> transaction.  What do I mean by that?</p>
<p>Well, suppose that these two activities are not treated as one unit of work, that is, each step can fail without affecting the other step.  That&#8217;s bad.  What happens if the withdrawal from account 1 succeeded but the deposit to account 2 <em>failed</em>?  The bank would show that you took out, say, $100 from account 1 but you would not see the $100 in account 2.  You just lost money.</p>
<p>Let&#8217;s look at the flip side.  The withdrawal from account 1 fails but the deposit to account 2 succeeds.  You just gained money and the bank just lost money.  Not good for them.</p>
<p>This is where the <em>transaction</em> comes in.  The bank&#8217;s system would wrap these two steps into a transaction.  It forces both steps to succeed in order for it to be considered a valid transaction.  If <em>either</em> step fails, the transaction is <em>rolled back</em> as if neither one ever took place.  So how would we use a transaction?  Take a look at the code below.  It&#8217;s a simple example of moving a record from one database to another.  This is typical when you want to <em>archive</em> a record.</p>
<p><u>Code</u></p>
<p><code></p>
<pre>
declare @emsg nvarchar(2048), @e int

set @emsg = ''
set @e = 0

begin try
   begin transaction trnArchive

   -- Copy row in tbCust to tbCustArchive
   insert into dbArchive..tbCustArchive
   (
      custid
      ,custname
      ,address1
      ,address2
      ,city
      ,state
      ,zip
      ,phone
      ,fax
   )
   select
      custid
      ,custname
      ,address1
      ,address2
      ,city
      ,state
      ,zip
      ,phone
      ,fax
   from dbCurrent..tbCust with (updlock, paglock)
   where
      custid = 12345

   -- Now delete the row in tbCust.
   delete t
   from dbCurrent..tbCust t with (updlock, paglock)
   where
      t.custid = 12345

   -- commit transaction.
   commit trnArchive

end try
begin catch

   -- capture error info.
   set @emsg = error_message()
   set @e = error_number()

   -- roll back transaction, if one is still open.
   if @@trancount &gt; 0
      rollback transaction trnArchive

   -- raise error to caller.
   raiserror('Error: %s.  Code: %d.', 16, 1, @emsg, @e)
end catch
</pre>
<p></code></p>
<p>The code above wraps both the INSERT and DELETE operation inside a BEGIN TRY and END TRY block.  This allows us to detect exceptions when and if they occur.  In addition, since both operations are wrapped inside a TRANSACTION block, we can then ROLLBACK the transaction inside the BEGIN CATCH block.</p>
<p>This ensures that we don&#8217;t have a copy of the record in both tables should the INSERT succeed but the DELETE fail.  Now that you know how to use transactions, I must point out one important fact.  <strong>KEEP YOUR TRANSACTIONS AS SHORT AS POSSIBLE</strong>.  In other words, don&#8217;t try to process a large number of records in one transaction.</p>
<p>In the code example above, if we were to try and move 200000 customer records to archive in a high I/O environment, blocking might occur since we&#8217;re issuing an UPDLOCK and PAGLOCK when doing the INSERT and DELETE.  Process fewer records in each transaction to minimize blocking.</p>
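<p>One way to keep each transaction short is to archive in batches.  The sketch below is only an illustration under stated assumptions: the batch size of 1000 and the <em>custid &lt; 200000</em> filter are made up, and the table and column names are the ones from the example above.  Each pass picks a batch of keys, then moves just those rows inside its own transaction:</p>
<p><code></p>
<pre>
declare @batch table (custid int not null primary key)
declare @done bit, @emsg nvarchar(2048), @e int

set @done = 0

while @done = 0
begin
   -- Pick the next batch of keys (filter and batch size are assumptions).
   delete from @batch

   insert into @batch (custid)
   select top (1000) custid
   from dbCurrent..tbCust
   where
      custid &lt; 200000
   order by custid

   if @@rowcount = 0
      set @done = 1
   else
   begin
      begin try
         begin transaction trnArchive

         -- Copy this batch to the archive.
         insert into dbArchive..tbCustArchive
            (custid, custname, address1, address2, city, state, zip, phone, fax)
         select
            c.custid, c.custname, c.address1, c.address2, c.city, c.state, c.zip, c.phone, c.fax
         from dbCurrent..tbCust c with (updlock)
         join @batch b
         on b.custid = c.custid

         -- Delete the same batch from the source.
         delete c
         from dbCurrent..tbCust c
         join @batch b
         on b.custid = c.custid

         commit transaction trnArchive
      end try
      begin catch
         set @emsg = error_message()
         set @e = error_number()

         if @@trancount &gt; 0
            rollback transaction trnArchive

         -- Stop the loop and report the error to the caller.
         set @done = 1
         raiserror('Error: %s.  Code: %d.', 16, 1, @emsg, @e)
      end catch
   end
end
</pre>
<p></code></p>
<p>Because every batch commits on its own, a failure part-way through leaves the earlier batches archived and only the current batch rolled back.  Whether that is acceptable depends on your archiving rules, so treat the batch size as something to tune rather than a recommendation.</p>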
<p>Minh</p>
<br /> Tagged: archiving, delete, insert, riaserror, rollback, transaction]]></content:encoded>
			<wfw:commentRss>http://sqlbyminh.wordpress.com/2009/04/29/sql-using-transactions/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/5f2c86e3673f90375a5664df9574fed5?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">sqlbyminh</media:title>
		</media:content>
	</item>
		<item>
		<title>SQL &#8211; Using Dirty Reads</title>
		<link>http://sqlbyminh.wordpress.com/2009/04/28/sql-using-dirty-reads/</link>
		<comments>http://sqlbyminh.wordpress.com/2009/04/28/sql-using-dirty-reads/#comments</comments>
		<pubDate>Tue, 28 Apr 2009 23:50:02 +0000</pubDate>
		<dc:creator>sqlbyminh</dc:creator>
				<category><![CDATA[MSSQL - Basics]]></category>
		<category><![CDATA[checksum]]></category>
		<category><![CDATA[dirty]]></category>
		<category><![CDATA[lock]]></category>
		<category><![CDATA[read]]></category>
		<category><![CDATA[select]]></category>
		<category><![CDATA[spid]]></category>
		<category><![CDATA[timestamp]]></category>

		<guid isPermaLink="false">http://sqlbyminh.wordpress.com/?p=265</guid>
		<description><![CDATA[In this post we are going to look at dirty reads. Specifically, we&#8217;re going to look at why and when we might want to implement a dirty read and if so, what are the pitfalls of using them and how can we address them. Remember that a dirty read is one in which we are [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=sqlbyminh.wordpress.com&amp;blog=7350068&amp;post=265&amp;subd=sqlbyminh&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>In this post we are going to look at <em>dirty reads</em>. Specifically, we&#8217;re going to look at why and when we might want to use a dirty read, what the pitfalls are, and how we can address them.</p>
<p>Remember that a dirty read is one in which we are requesting data that may not have been committed yet.  What we&#8217;re saying to SQL is &#8220;Please return the data I asked for.  I don&#8217;t care if it isn&#8217;t the latest committed version at the time you pull it.&#8221;</p>
<p>So <em>why</em> would we ever want to implement a dirty read?  The reasons vary depending on your environment, application architecture and business model.  A good example is when you are running a report against a database that sees a moderate to high volume of updates.  Remember that the <em>default</em> behavior in MSSQL (the READ COMMITTED isolation level) is for reads to take <em>shared</em> locks.</p>
<p>This means that if another SPID holds an <em>exclusive</em> lock on a row or page that you need to read from (an UPDATE statement takes one when it modifies data), your SPID will be blocked until the other SPID&#8217;s transaction has been committed or rolled back.</p>
<p>In this case, it may be more desirable to have the report return quickly with the <em>possibility</em> that the data is dirty than to have it block (you probably won&#8217;t know how long it will be blocked) and keep the user waiting.</p>
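<p>For illustration, here are the two usual ways to ask SQL for a dirty read.  This is a minimal sketch that borrows the dbo.someTable table and its columns from the examples further down in this post; the value in the WHERE clause is made up.</p>
<pre>
-- Option 1: a table hint.  Only this table reference is read dirty.
select
   custid
   ,custname
from dbo.someTable with (nolock)
where
   state = 'CA'

-- Option 2: a session setting.  Every read that follows is dirty
-- until the isolation level is changed back.
set transaction isolation level read uncommitted

select
   custid
   ,custname
from dbo.someTable
where
   state = 'CA'

set transaction isolation level read committed
</pre>
<p>The hint is easier to limit to a single report query, while the session setting saves you from repeating the hint on every table in a long query.</p>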
<p>Yet another example is when you need to return data to an application that will then load the data into some type of grid for the user to select from.  Again, the task here is to show the data.  The user may not even select any record to process or view.  This is often the case when an application needs to initialize some sort of data grid.</p>
<p>What happens, however, when the user <em>does</em> want to edit a record based on what they see in the grid?  They&#8217;re making a decision based on data that might have changed already.  We would need a mechanism to determine if the row they selected has really changed.  If it has, then we need to notify them and refresh the row.  After all, the refresh may indicate to them that they <em>don&#8217;t</em> need to make the edit.</p>
<p>So how can we detect that data has changed if we&#8217;re using dirty reads?  Well, there are several methods to detect a change in data.  The first method, and the <em>least</em> desirable, is to use a <em>timestamp</em>.  A timestamp (also known as <em>rowversion</em>) is a type of column that gets updated with a <em>unique</em> binary number every time a record is inserted or updated.  Despite its name, it does not hold a date or time.  It&#8217;s often used as a means to &#8220;version-stamp&#8221; one or more rows in a table.</p>
<p>So assuming that a table has a timestamp column called <em>current_stamp</em>, here&#8217;s the code to fetch the data along with the timestamp value.  We&#8217;ll need it to compare against the current timestamp of the row when the UPDATE request is sent.</p>
<p><u>Code</u></p>
<p><code></p>
<pre>
-- Get row data along with timestamp.
select
   custid
   ,custname
   ,address1
   ,address2
   ,city
   ,state
   ,zip
   ,phone
   ,fax
   ,current_stamp
from dbo.someTable with (nolock)
where
  custname = 'acme, inc.'
</pre>
<p></code></p>
<p>Now assume that the application stores the <strong>custid</strong> and <strong>current_stamp</strong> values that were returned.  Further, assume that the user has made some changes to the data via the application.  The application then makes a call to a stored procedure to save the change.</p>
<p>We could implement a version-check inside the proc to see if the data being saved is out of date.  Below is the code:</p>
<p><u>Code</u></p>
<p><code></p>
<pre>
create procedure dbo.uspSave
(
   @custid int
   ,@custname varchar(30)
   ,@address1 varchar(50)
   ,@address2 varchar(50)
   ,@city varchar(50)
   ,@state varchar(2)
   ,@zip varchar(9)
   ,@phone varchar(20)
   ,@fax varchar(20)
   ,@last_stamp timestamp
)
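as
set nocount on
declare @currentStamp timestamp, @e int
set @e = 0

-- Run the check and the update in one transaction so that the
-- update lock taken by the SELECT is held until the UPDATE is done.
begin transaction

select
   @currentStamp = a.current_stamp
from dbo.someTable a with (updlock, paglock)
where
   a.custid = @custid

if (@currentStamp = @last_stamp)
begin
   -- No change in data.  Allow update.
   update a
   set
      a.custname = @custname
      ,a.address1 = @address1
      ,a.address2 = @address2
      ,a.city = @city
      ,a.state = @state
      ,a.zip = @zip
      ,a.phone = @phone
      ,a.fax = @fax
   from dbo.someTable a with (updlock, paglock)
   where
      a.custid = @custid

   commit transaction
end
else
begin
   -- The row was changed (or no longer exists).  Nothing was modified,
   -- so end the transaction to release the lock, then report the error.
   commit transaction
   raiserror('The record was changed by another user.', 16, 1)
   set @e = @@error
end
return @e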
</pre>
<p></code></p>
<p>The key is the IF statement, where we compare the timestamp at the time of the read to the timestamp of the record as it exists just before the UPDATE statement is run.  Timestamps are useful here because the value changes <em>every</em> time the record is updated.  If the timestamp is the same, we know that the data hasn&#8217;t changed; otherwise we raise an error.</p>
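<p>A common variation on this check, sketched here as an alternative rather than as part of the procedure above, is to fold the comparison into the UPDATE itself and look at @@ROWCOUNT afterwards.  The variable values below, including the stamp, are made up for illustration.</p>
<pre>
declare @custid int, @phone varchar(20), @last_stamp binary(8)
set @custid = 42                          -- hypothetical customer
set @phone = '555-0100'                   -- hypothetical new value
set @last_stamp = 0x00000000000007D1      -- stamp returned by the earlier read

-- The row is only updated if its stamp still matches the one we read.
update dbo.someTable
set
   phone = @phone
where
   custid = @custid
   and current_stamp = @last_stamp

-- Zero rows means the row was changed (or deleted) after we read it.
if (@@rowcount = 0)
begin
   raiserror('The record was changed by another user.', 16, 1)
end
</pre>
<p>Since the check and the write happen in a single statement, there is no window between them in which another SPID could change the row.</p>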
<p>The catch is that we&#8217;d have to create a TIMESTAMP column in the table, and that becomes a drawback when you&#8217;re dealing with a <em>very</em> large table.</p>
<p>If your table has 20M rows, there would be a unique 8-byte timestamp for <em>each</em> record, or roughly 160 MB in total.  This would be true <em>even</em> if the record is <em>never</em> updated after the initial INSERT.  That&#8217;s a lot of data overhead to have in order to detect change.</p>
<p>Another method is to use the <code><font color="#0000ff">CHECKSUM()</font></code> function.  Using this method, we would compute a checksum of the entire record (or a subset of columns, depending on your needs) during the read, then compute another checksum of the record prior to the update.  We then compare the two values to see if they are the same.  Keep in mind that two different rows can occasionally produce the same checksum, so this method can miss a change.</p>
<p>For example, using the SELECT statement and stored procedure above, we could modify them like so:</p>
<p><u>Code</u></p>
<p><code></p>
<pre>
-- Get row data along with value of checksum.
select *, checksum(*) as current_checksum
from dbo.someTable with (nolock)
where
  custname = 'acme, inc.'
</pre>
<p></code></p>
<p><code></p>
<pre>
create procedure dbo.uspSave
(
   @custid int
   ,@custname varchar(30)
   ,@address1 varchar(50)
   ,@address2 varchar(50)
   ,@city varchar(50)
   ,@state varchar(2)
   ,@zip varchar(9)
   ,@phone varchar(20)
   ,@fax varchar(20)
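   ,@last_checksum int
)
as
set nocount on
declare @currentChecksum int, @e int
set @e = 0

-- Run the check and the update in one transaction so that the
-- update lock taken by the SELECT is held until the UPDATE is done.
begin transaction

select
   @currentChecksum = checksum(*)
from dbo.someTable a with (updlock, paglock)
where
   a.custid = @custid

if (@currentChecksum = @last_checksum)
begin
   -- No change in data.  Allow update.
   update a
   set
      a.custname = @custname
      ,a.address1 = @address1
      ,a.address2 = @address2
      ,a.city = @city
      ,a.state = @state
      ,a.zip = @zip
      ,a.phone = @phone
      ,a.fax = @fax
   from dbo.someTable a with (updlock, paglock)
   where
      a.custid = @custid

   commit transaction
end
else
begin
   -- The row was changed (or no longer exists).  Nothing was modified,
   -- so end the transaction to release the lock, then report the error.
   commit transaction
   raiserror('The record was changed by another user.', 16, 1)
   set @e = @@error
end
return @e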
</pre>
<p></code></p>
<p>Yet another method would be to create a table that would function as your &#8220;lock&#8221; table.  As users access data, a &#8220;lock&#8221; record could be created.  Once the &#8220;lock&#8221; is released, the &#8220;lock&#8221; record could be deleted from your table.  If the &#8220;lock&#8221; record to be created <em>already</em> exists in your table, you would raise an error or return an error code indicating that someone is currently viewing the record.</p>
<p>Since the user can&#8217;t view the record unless the &#8220;lock&#8221; can be created, there&#8217;s little chance that they would be looking at a &#8220;dirty&#8221; record.</p>
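<p>Here is one way such a &#8220;lock&#8221; table could look.  This is only a sketch: the table name dbo.tbRecordLock, its columns and the sample values are all made up for illustration, and a real implementation would also need a way to clean up locks left behind by crashed sessions.</p>
<pre>
create table dbo.tbRecordLock
(
   custid int not null primary key
   ,lockedby varchar(30) not null
   ,lockdate datetime not null default (getdate())
)
go

declare @custid int, @lockedby varchar(30)
set @custid = 42            -- hypothetical record to edit
set @lockedby = 'jsmith'    -- hypothetical user

-- Try to take the lock.  The insert adds no row if a lock record
-- for this customer already exists.
insert into dbo.tbRecordLock
(
   custid
   ,lockedby
)
select
   @custid
   ,@lockedby
where not exists
(
   select 1
   from dbo.tbRecordLock l with (updlock, holdlock)
   where
      l.custid = @custid
)

if (@@rowcount = 0)
begin
   raiserror('The record is in use by another user.', 16, 1)
   return
end

-- ... the user views or edits the record here ...

-- Release the lock when the user is done.
delete from dbo.tbRecordLock
where
   custid = @custid
   and lockedby = @lockedby
</pre>
<p>The primary key on custid acts as a second line of defense: even if two sessions race past the NOT EXISTS check, only one of the inserts can succeed.</p>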
<p>IMHO, if your model is one that needs to support <em>high concurrency</em> with <em>minimal blocking</em> and you or your staff control the code, dirty reads with change detection are worth the effort to implement, provided you have the time and resources.</p>
<p>Some of you might have surmised (correctly so) that it&#8217;s a lot of work to build dirty reads into your application architecture or business model.  You need to weigh the non-blocking nature of dirty reads against the cost of implementing them.  I have only touched on <em>some</em> of the techniques that can be used.  Each has its own strengths and weaknesses.  You&#8217;ll need to investigate and choose the one that works best for your environment.</p>
<p>As always, I welcome any counter-point(s) or comments regarding my posts.<br />
Minh</p>
<br /> Tagged: checksum, dirty, lock, read, select, spid, timestamp]]></content:encoded>
			<wfw:commentRss>http://sqlbyminh.wordpress.com/2009/04/28/sql-using-dirty-reads/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/5f2c86e3673f90375a5664df9574fed5?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">sqlbyminh</media:title>
		</media:content>
	</item>
		<item>
		<title>SQL &#8211; Losing Your Identity</title>
		<link>http://sqlbyminh.wordpress.com/2009/04/25/sql-loosing-your-identity/</link>
		<comments>http://sqlbyminh.wordpress.com/2009/04/25/sql-loosing-your-identity/#comments</comments>
		<pubDate>Sat, 25 Apr 2009 02:29:01 +0000</pubDate>
		<dc:creator>sqlbyminh</dc:creator>
				<category><![CDATA[MSSQL - Basics]]></category>
		<category><![CDATA[@@identity]]></category>
		<category><![CDATA[scope_identity]]></category>
		<category><![CDATA[trigger]]></category>

		<guid isPermaLink="false">http://sqlbyminh.wordpress.com/?p=239</guid>
		<description><![CDATA[In this post we&#8217;ll discuss the use of IDENTITY columns in MSSQL. For those who haven&#8217;t used an identity column before, it&#8217;s simply a column that contains system generated sequential numbers. It is akin to Microsoft Access&#8217; AutoNumber column. First let&#8217;s look at how you would define an identity column. Take a look at the [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=sqlbyminh.wordpress.com&amp;blog=7350068&amp;post=239&amp;subd=sqlbyminh&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>In this post we&#8217;ll discuss the use of IDENTITY columns in MSSQL.  For those who haven&#8217;t used an identity column before, it&#8217;s simply a column that contains <em>system generated sequential numbers</em>.  It is akin to Microsoft Access&#8217; <strong>AutoNumber</strong> column.</p>
<p>First let&#8217;s look at how you would define an identity column.  Take a look at the DDL statement below:</p>
<p><u>Code</u></p>
<p><code></p>
<pre>
create table dbo.tbUser
(
   username varchar(10) not null default ('')
   ,rowid int identity(1,1) not null
)
</pre>
<p></code></p>
<p>Notice the column named <strong>rowid</strong>.  It&#8217;s defined as an integer column (int) that will contain auto-generated values.  The generation of these numbers will be taken care of for you by SQL.  Note that although I&#8217;ve used the <font color="#0000ff">int</font> data type here, SQL allows you to use <em>any</em> of the integer types (tinyint, smallint, int or bigint).  You could also use <font color="#0000ff">decimal</font> or <font color="#0000ff">numeric</font> as the type for an IDENTITY column, as long as the scale is 0.</p>
<p>The tokens <code>identity(1,1)</code> inform SQL that the column will be used as an identity column, that the seed value for the first record in this table will be 1, and that every subsequent number generated will be 1 greater than the last.  SQL stores this information internally so you don&#8217;t have to worry about it.</p>
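<p>To make the seed and increment a little more concrete, here is a small sketch that uses a different pair of values.  The table dbo.tbSeedDemo and its columns are made up for this illustration, and the table is dropped again at the end.</p>
<pre>
create table dbo.tbSeedDemo
(
   label varchar(10) not null
   ,rowid int identity(100,10) not null
)
go

insert into dbo.tbSeedDemo (label) values ('first')
insert into dbo.tbSeedDemo (label) values ('second')
insert into dbo.tbSeedDemo (label) values ('third')

-- The three rows get rowid 100, 110 and 120:
-- the seed first, then the seed plus the increment, and so on.
select
   label
   ,rowid
from dbo.tbSeedDemo
order by
   rowid

drop table dbo.tbSeedDemo
</pre>
<p>The rest of this post continues with the dbo.tbUser table defined earlier.</p>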
<p>Let&#8217;s do some inserts into the table above and see what we get.  Run the code below as many times as you like:</p>
<p><u>Code</u><br />
<code></p>
<pre>
insert into dbo.tbUser
(
   username
)
values
(
   'jsmith'
)
</pre>
<p></code></p>
<p>Now do a SELECT against this table by running the following:</p>
<p><u>Code</u></p>
<p><code></p>
<pre>
select * from dbo.tbUser with (nolock)
</pre>
<p></code></p>
<p>Notice how SQL has generated a <em>unique</em> value for the <em>rowid</em> column?  You might be asking &#8220;How can I retrieve the value that SQL generated after the INSERT?&#8221;  The answer is to use one of two built-in functions supplied by MSSQL.  The first is the <code><font color="#0000ff">@@IDENTITY</font></code> function.  The second is the <code><font color="#0000ff">scope_identity()</font></code> function.</p>
<p>Let&#8217;s talk about the <code>@@IDENTITY</code> function first.  This function allows you to retrieve the <em>last</em> identity value that was generated in your session.  So to get the value of the <em>rowid</em> column for a record that was just inserted into the table above, we would write the following code:</p>
<p><u>Code</u><br />
<code></p>
<pre>
declare @newRowId int
set @newRowId = 0

insert into dbo.tbUser
(
   username
)
values
(
   'jsmith'
)

set @newRowId = @@identity
select @newRowId
</pre>
<p></code></p>
<p>That all seems good.  However, the problem with the <code>@@IDENTITY</code> function is that it returns the identity value from the <em>last</em> insert your session made to <em>ANY</em> table.  Let me elaborate.  Suppose you had a TRIGGER on the dbo.tbUser table such that every time an INSERT was done, the TRIGGER would do an insert into another table (call this table dbo.tbLoginAudit), and this table also had an identity column.  What do you think would happen when you make a call to <code>@@IDENTITY</code>?  Well, let&#8217;s find out.  First let&#8217;s define the additional <code>dbo.tbLoginAudit</code> table.  Run the following code to create the table:</p>
<p><u>Code</u></p>
<p><code></p>
<pre>
create table dbo.tbLoginAudit
(
   username varchar(10) not null
   ,logindate datetime not null default (getdate())
   ,rowid int identity(1,1) not null
)
go
</pre>
<p></code></p>
<p>This table will be used by the trigger below to record when a user logs into some system.</p>
<p><u>Code &#8211; Create the TRIGGER</u></p>
<p><code></p>
<pre>
create trigger dbo.trgIns on dbo.tbUser for insert
as
set nocount on
insert into dbo.tbLoginAudit
(
   username
)
select
  i.username
from inserted i
set nocount off
go
</pre>
<p></code></p>
<p>The trigger above will now <em>automatically</em> insert a record into the dbo.tbLoginAudit table every time a new row is inserted into the dbo.tbUser table.  Keep in mind that the dbo.tbLoginAudit table also has an IDENTITY column.  Now before we do any more inserts into the dbo.tbUser table, let&#8217;s run a SELECT statement to find out what the current value for the <em>rowid</em> column is for this table.  Run the code below and take note of the <em>highest</em> value returned:</p>
<p><u>Code</u></p>
<p><code></p>
<pre>
select * from dbo.tbUser with (nolock) order by rowid
</pre>
<p></code></p>
<p>Now run the code below.  What you would normally expect is that the value returned by the code below would be the next value.  So if your current value for the <em>rowid</em> column is 22, you&#8217;d expect the value returned by the code below to be 23.  Let&#8217;s try it out.</p>
<p><u>Code</u></p>
<p><code></p>
<pre>
declare @rowid int
set @rowid = 0

insert into dbo.tbUser
(
   username
)
values
(
   'jsmith'
)

set @rowid = @@identity
select @rowid
</pre>
<p></code></p>
<p>Did you get the expected value?  My bet is you didn&#8217;t.  The reason is that the trigger we defined earlier on the dbo.tbUser table fires <em>AFTER</em> the insert statement.  The trigger <em>ALSO</em> does an INSERT into the dbo.tbLoginAudit table, which <em>ALSO</em> has an IDENTITY column.</p>
<p>Remember, I said that the problem with the <code><font color="#0000ff">@@IDENTITY</font></code> function is that it returns the <em>LAST</em> identity value used.  Since the TRIGGER fires after the INSERT statement, <code><font color="#0000ff">@@IDENTITY</font></code> returns the IDENTITY value that was used for the dbo.tbLoginAudit table, <em>NOT</em> the dbo.tbUser table.</p>
<p>So how might we get around this issue?  The solution is to use the other function I mentioned at the beginning of this post.  Remember the <code><font color="#0000ff">scope_identity()</font></code> function?  This function doesn&#8217;t have the same issue as <code><font color="#0000ff">@@IDENTITY</font></code>.  That is because, as its name implies, it is <em>scope aware</em>.  This function will return the last IDENTITY value generated by an INSERT in the same scope (the same batch, stored procedure, function or trigger) as the executing statement.</p>
<p>A trigger executes in a different scope from the statement that fired it.  The fact that <code><font color="#0000ff">@@IDENTITY</font></code> returned the value from the trigger shows that it is not limited to the current scope; it covers every scope in your session.  <code><font color="#0000ff">scope_identity()</font></code> is limited to the current scope, therefore it will return the IDENTITY value that was generated in the INSERT statement&#8217;s scope.  Try out the code below to see for yourself:</p>
<p><u>Code</u></p>
<p><code></p>
<pre>
declare @rowid int
set @rowid = 0

insert into dbo.tbUser
(
 username
)
values
(
 'jsmith'
)

set @rowid = scope_identity()
select @rowid

select * from dbo.tbUser with (nolock)
</pre>
<p></code></p>
<p>See how the IDENTITY value that is returned is NOT the one that was used in the dbo.tbLoginAudit table, but rather, the dbo.tbUser table?</p>
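<p>To see the difference side by side, the sketch below inserts one row and then selects both functions in the same batch.  It also includes IDENT_CURRENT(), a third built-in function that is not covered above; it returns the last identity value generated for a named table by any session.  The column aliases are made up for readability, and the sketch assumes the trigger defined earlier is still in place.</p>
<pre>
insert into dbo.tbUser
(
   username
)
values
(
   'jsmith'
)

select
   @@identity as last_in_session                   -- from dbo.tbLoginAudit (set by the trigger)
   ,scope_identity() as last_in_scope              -- from dbo.tbUser
   ,ident_current('dbo.tbUser') as last_for_table  -- from dbo.tbUser, any session
</pre>
<p>On a busy system the last column can already reflect an insert made by someone else, so scope_identity() remains the one to use when you want the value from your own INSERT.</p>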
<p>Have fun with SQL!<br />
Minh</p>
<br /> Tagged: @@identity, scope_identity, trigger]]></content:encoded>
			<wfw:commentRss>http://sqlbyminh.wordpress.com/2009/04/25/sql-loosing-your-identity/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/5f2c86e3673f90375a5664df9574fed5?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">sqlbyminh</media:title>
		</media:content>
	</item>
		<item>
		<title>Detecting Orphaned Rows</title>
		<link>http://sqlbyminh.wordpress.com/2009/04/24/detecting-orphaned-rows/</link>
		<comments>http://sqlbyminh.wordpress.com/2009/04/24/detecting-orphaned-rows/#comments</comments>
		<pubDate>Fri, 24 Apr 2009 23:39:02 +0000</pubDate>
		<dc:creator>sqlbyminh</dc:creator>
				<category><![CDATA[MSSQL - Basics]]></category>
		<category><![CDATA["left join"]]></category>
		<category><![CDATA["orphaned row"]]></category>
		<category><![CDATA["right join"]]></category>

		<guid isPermaLink="false">http://sqlbyminh.wordpress.com/?p=206</guid>
		<description><![CDATA[This post is a continuation of the post on SQL joins. Hopefully you have a better understanding of the types of joins available to you. Now it&#8217;s time to see how we can use them to detect orphaned rows. An orphaned row is a child row that has no parent row. Going back to our [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=sqlbyminh.wordpress.com&amp;blog=7350068&amp;post=206&amp;subd=sqlbyminh&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>This post is a continuation of the <a href="http://sqlbyminh.wordpress.com/2009/04/23/sql-joins/" target="_blank">post</a> on SQL joins.  Hopefully you have a better understanding of the types of joins available to you.  Now it&#8217;s time to see how we can use them to detect <em>orphaned</em> rows.</p>
<p>An orphaned row is a child row that has no parent row.  Going back to our invoice example, an invoice is usually broken down into two or more tables.  The <em>header</em> table contains rows that carry information about the invoice as a whole.  It would usually contain columns such as the following:</p>
<ul>
<li>CUSTOMER_ID</li>
<li>INVOICE_ID</li>
<li>INVOICE_NUMBER</li>
<li>INVOICE_DATE</li>
<li>INVOICE_REF</li>
</ul>
<p>The invoice detail table usually contains information specific to an invoice line item.  Some examples of columns that can be found in this type of table are:</p>
<ul>
<li>INVOICE_LINE_ID</li>
<li>INVOICE_ID</li>
<li>LINE_NUMBER</li>
<li>ITEM_NUMBER</li>
<li>QTY</li>
<li>UNIT_PRICE</li>
<li>LINE_TOTAL</li>
</ul>
<p>In the example above it is often said that the invoice header is the <em>parent</em> of the invoice detail and that the invoice detail is the <em>child</em> of the invoice header.  Normally, this relationship is easy to enforce.  Suppose however that the relationship hasn&#8217;t been enforced and you suspect that you have some orphaned detail records.  How could you go about finding them?</p>
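<p>One common way to enforce the relationship is a FOREIGN KEY constraint.  The sketch below uses two cut-down tables whose names (dbo.invHdrDemo, dbo.invDetDemo) and constraint name are made up for illustration, so that it does not disturb the test tables from the previous post.</p>
<pre>
create table dbo.invHdrDemo
(
   invnum varchar(5) not null primary key
   ,customer varchar(30) not null
)
go

create table dbo.invDetDemo
(
   invnum varchar(5) not null
      constraint fk_invDetDemo_invHdrDemo
      foreign key references dbo.invHdrDemo (invnum)
   ,itemnum varchar(10) not null
)
go

insert into dbo.invHdrDemo (invnum, customer) values ('00001', 'CUSTA')

-- Succeeds: header 00001 exists.
insert into dbo.invDetDemo (invnum, itemnum) values ('00001', 'I-BEV1')

-- Fails with a foreign key violation (error 547): there is no header 00005.
insert into dbo.invDetDemo (invnum, itemnum) values ('00005', 'I-BEV2')
</pre>
<p>Without such a constraint, orphaned detail rows can creep in unnoticed, and we need a query to find them.</p>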
<p>This is where the OUTER JOINs that we covered in our previous post come in handy.  Again, using the test data that was defined in the previous <a href="http://sqlbyminh.wordpress.com/2009/04/23/sql-joins/" target="_blank">post</a>, let&#8217;s write a query that finds any <em>invoice detail</em> line that does not have an invoice header associated with it.  Run the query below:</p>
<p><u>Code</u><br />
<code></p>
<pre>
select
   det.*
from dbo.invHdr hdr with (nolock)
right join dbo.invDet det with (nolock)
on det.invnum = hdr.invnum
where
   hdr.invnum is null
</pre>
<p></code></p>
<p>Notice that it returns one row.  If you were to visually inspect the data in the detail table you would find that this is the row that is missing an invoice header.  Notice that we are doing a RIGHT JOIN here.  Remember that a RIGHT JOIN returns all matching rows plus all non-matching rows from the right table.  Keep in mind that the header table is the &#8220;left&#8221; table and that the detail table is the &#8220;right&#8221; table.  How is it then that the query above returned only one row?</p>
<p>The answer lies in the WHERE clause that was specified.  It filters out the data after the join is made and tells SQL to return only the rows from the detail table where there is no matching header row.  This is an example of an <em>orphaned</em> record.</p>
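<p>For comparison, the same orphaned rows can also be found without an outer join.  The sketch below swaps in a NOT EXISTS subquery against the same test tables; it is an alternative way of asking the same question, not a replacement for the join shown above.</p>
<pre>
-- Return every detail row for which no header row
-- with the same invoice number exists.
select
   det.*
from dbo.invDet det
where not exists
(
   select 1
   from dbo.invHdr hdr
   where
      hdr.invnum = det.invnum
)
</pre>
<p>With the test data from the previous post this should return the same single row, the detail line for invoice 00005.</p>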
<p>To detect header rows that have no detail (these aren&#8217;t considered orphaned rows because the term applies to child tables) we can run the query below:</p>
<p><u>Code</u></p>
<p><code></p>
<pre>
select
   hdr.*
from dbo.invHdr hdr with (nolock)
left join dbo.invDet det with (nolock)
on det.invnum = hdr.invnum
where
   det.invnum is null
</pre>
<p></code></p>
<p>Now let&#8217;s assume we want to find any rows in the invoice header or invoice detail table that are missing their associated parent or child rows.  We could write the query like this:</p>
<p><u>Code</u></p>
<p><code></p>
<pre>
select
   hdr.*, det.*
from dbo.invHdr hdr with (nolock)
full outer join dbo.invDet det with (nolock)
on det.invnum = hdr.invnum
where
   hdr.invnum is null
   or det.invnum is null
</pre>
<p></code></p>
<p>Here we have instructed SQL (via the WHERE clause) to include any row where the invoice number is missing from either the header table or the detail table.  This in effect returns all rows where either a child is missing its parent or a parent has no detail.</p>
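<p>Since a FULL OUTER JOIN returns both kinds of problem rows in one result, it can help to label each row.  The sketch below is one possible way to do that; the <em>problem</em> column and its two labels are made up for illustration.</p>
<pre>
select
   coalesce(hdr.invnum, det.invnum) as invnum
   ,case
      when hdr.invnum is null then 'detail without header'
      else 'header without detail'
    end as problem
from dbo.invHdr hdr
full outer join dbo.invDet det
on det.invnum = hdr.invnum
where
   hdr.invnum is null
   or det.invnum is null
order by
   invnum
</pre>
<p>With the test data from the previous post this should list invoices 00002 and 00003 as headers without detail and invoice 00005 as a detail without a header.</p>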
<p>I hope you find this post and its companion <a href="http://sqlbyminh.wordpress.com/2009/04/23/sql-joins/" target="_blank">post</a> useful.  Please feel free to make any comments or suggestions regarding this post.</p>
<p>Minh</p>
<br /> Tagged: "left join", "orphaned row", "right join"]]></content:encoded>
			<wfw:commentRss>http://sqlbyminh.wordpress.com/2009/04/24/detecting-orphaned-rows/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/5f2c86e3673f90375a5664df9574fed5?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">sqlbyminh</media:title>
		</media:content>
	</item>
		<item>
		<title>SQL Joins</title>
		<link>http://sqlbyminh.wordpress.com/2009/04/23/sql-joins/</link>
		<comments>http://sqlbyminh.wordpress.com/2009/04/23/sql-joins/#comments</comments>
		<pubDate>Thu, 23 Apr 2009 02:48:44 +0000</pubDate>
		<dc:creator>sqlbyminh</dc:creator>
				<category><![CDATA[MSSQL - Basics]]></category>
		<category><![CDATA["cross join"]]></category>
		<category><![CDATA["inner join"]]></category>
		<category><![CDATA["left join"]]></category>
		<category><![CDATA["outer join"]]></category>
		<category><![CDATA["right join"]]></category>
		<category><![CDATA[joins]]></category>

		<guid isPermaLink="false">http://sqlbyminh.wordpress.com/?p=100</guid>
		<description><![CDATA[What is a JOIN?  Simply put, it is an operation that combines records from two or more tables.  There are three types of joins in SQL.  They are as follows: INNER JOIN (most common type) OUTER JOIN CROSS JOIN Let&#8217;s look at an INNER JOIN.  What does it do?  An inner join is an operation [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=sqlbyminh.wordpress.com&amp;blog=7350068&amp;post=100&amp;subd=sqlbyminh&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>What is a <em>JOIN</em>?  Simply put, it is an operation that combines records from two or more tables.  There are three types of joins in SQL.  They are as follows:</p>
<ul>
<li>INNER JOIN (most common type)</li>
<li>OUTER JOIN</li>
<li>CROSS JOIN</li>
</ul>
<p>Let&#8217;s look at an INNER JOIN.  What does it do?  An inner join is an operation that combines two or more tables into one resultset based on matches found in one or more columns in each of the participating tables.  For example, let&#8217;s say you have an INVOICE_HEADER table and an INVOICE_DETAIL table.  Assume that the two tables are defined as follows:</p>
<p><u>Create Tables</u><br />
<code></p>
<pre>create table dbo.invHdr
(
  invnum varchar(5) not null
  ,customer varchar(30) not null
  ,invdate datetime not null default (getdate())
)
go

create table dbo.invDet
(
  invnum varchar(5) not null
  ,trandate datetime not null default (getdate())
  ,itemnum varchar(10) not null
  ,itemdesc varchar(100) not null
  ,qty numeric(10, 2) not null default (0)
  ,unitprice numeric(10, 2) not null
)
go</pre>
<p></code></p>
<p>Now let&#8217;s create some test data for our examples:</p>
<p><u>Populate Tables</u><br />
<code></p>
<pre>----- Insert header records. -----
insert into dbo.invHdr
(
  invnum, customer
)
values
(
  '00001', 'CUSTA'
)

insert into dbo.invHdr
(
  invnum, customer
)
values
(
  '00002', 'CUSTA'
)

insert into dbo.invHdr
(
  invnum, customer
)
values
(
  '00003', 'CUSTB'
)

----- Insert detail records. -----
insert into dbo.invDet
(
  invnum
  ,trandate
  ,itemnum
  ,itemdesc
  ,qty
  ,unitprice
)
values
(
  '00001'
  ,'4/1/2008'
  ,'I-BEV1'
  ,'COKE, 6PACK'
  ,3
  ,5
)

insert into dbo.invDet
(
  invnum
  ,trandate
  ,itemnum
  ,itemdesc
  ,qty
  ,unitprice
)
values
(
  '00005'
  ,'4/10/2008'
  ,'I-BEV2'
  ,'COKE, 12PACK'
  ,1
  ,9
)</pre>
<p></code></p>
<p><u>INNER JOIN</u></p>
<p>Now that we&#8217;ve got some test data, let&#8217;s see how to do each type of join and look at the data it returns. First up is the INNER JOIN. Run this select statement:</p>
<p><code></p>
<pre>----- Inner join. -----
select
  a.*, b.*
from dbo.invHdr a (nolock)
join dbo.invDet b (nolock)
on b.invnum = a.invnum</pre>
<p></code></p>
<div id="attachment_142" class="wp-caption alignleft" style="width: 460px"><img class="size-full wp-image-142" title="innerjoin1" src="http://sqlbyminh.files.wordpress.com/2009/04/innerjoin11.gif?w=450&#038;h=50" alt="INNER JOIN example" width="450" height="50" /><p class="wp-caption-text">INNER JOIN example</p></div>
<p>Notice that the SELECT statement has returned a single row where a matching invoice number (invnum) was found in both the header and detail table.</p>
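<p>As a small sketch of the same inner join, the statement below lists the columns explicitly instead of using <em>a.*, b.*</em> and adds a computed line total. The alias <em>linetotal</em> is my own illustrative name and is not part of the tables above. With the test data, this should return the single matching row for invoice 00001.</p>
<pre>-- Inner join with an explicit column list.
-- "linetotal" is an illustrative alias, not a column in either table.
select
  a.invnum
  ,a.customer
  ,b.itemnum
  ,b.qty
  ,b.unitprice
  ,b.qty * b.unitprice as linetotal
from dbo.invHdr a
inner join dbo.invDet b
on b.invnum = a.invnum</pre>
<p>Writing INNER JOIN instead of just JOIN makes no functional difference; it only makes the intent explicit.</p>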
<p><u>LEFT JOIN</u></p>
<p>Now let&#8217;s look at what a LEFT JOIN would return. Run this statement:</p>
<p><code></p>
<pre>-- Left join.
select
  a.*, b.*
from dbo.invHdr a (nolock)
left join dbo.invDet b (nolock)
on b.invnum = a.invnum</pre>
<p></code></p>
<div id="attachment_143" class="wp-caption alignleft" style="width: 460px"><img class="size-full wp-image-142" title="leftjoin1" src="http://sqlbyminh.files.wordpress.com/2009/04/leftjoin1.gif?w=450&#038;h=50" alt="LEFT JOIN example" width="450" height="50" /><p class="wp-caption-text">LEFT JOIN example</p></div>
<p>Notice how the LEFT JOIN statement returns all matching rows from both tables <em>and</em> all rows from the &#8220;left&#8221; table <em>regardless</em> of whether there are rows in the &#8220;right&#8221; table. The image below depicts this graphically.</p>
<div id="attachment_144" class="wp-caption alignleft" style="width: 210px"><img class="size-full wp-image-142" title="leftjoin2" src="http://sqlbyminh.files.wordpress.com/2009/04/leftjoin2.gif?w=200&#038;h=50" alt="LEFT JOIN depiction" width="200" height="50" /><p class="wp-caption-text">LEFT JOIN depiction</p></div>
<p>The area in the middle represents the matching rows. The area on the left (blue) represents the rows from the &#8220;left&#8221; table for which no match was found in the &#8220;right&#8221; table. Because this is a LEFT JOIN, those rows are returned as well, with NULL in the columns that come from the right table. The area on the right (white) represents the rows from the &#8220;right&#8221; table that have no match in the left table. Only the <em>colored</em> areas are returned as part of the result.</p>
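<p>To make the unmatched rows easier to spot, here is a hedged sketch that substitutes placeholder values for the NULLs that the LEFT JOIN produces. The placeholder text '(none)' and the default quantity of 0 are assumptions chosen purely for illustration.</p>
<pre>-- Left join: every header row is returned.
-- Headers with no detail rows show the placeholder values instead of NULL.
select
  a.invnum
  ,a.customer
  ,coalesce(b.itemnum, '(none)') as itemnum
  ,coalesce(b.qty, 0) as qty
from dbo.invHdr a
left join dbo.invDet b
on b.invnum = a.invnum
order by a.invnum</pre>
<p>With the test data, invoices 00002 and 00003 have no detail rows, so they should come back with the placeholder values.</p>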
<p>You might be asking &#8220;How do I know which is the <em>left</em> table versus the <em>right</em> table?&#8221; The answer is simple. When you do any type of join, each table that is added to the join is considered to be the <em>right</em> table in relation to the table before it. For example, suppose we were to JOIN tables X, Y, and Z like so:</p>
<p>X &#8211;&gt; Y &#8211;&gt; Z</p>
<p>Here, X is considered to be the &#8220;left&#8221; table in relation to Y. Likewise, the table Y is considered to be the &#8220;left&#8221; table in relation to Z. It follows therefore that X is considered to be the &#8220;left&#8221; table in relation to Z. Some DBAs are under the misconception that the <em>equal</em> sign determines which table is &#8220;left&#8221; or &#8220;right&#8221;. Not so. For example, the two SELECT statements below would return identical results:</p>
<p><strong>SELECT statement 1</strong></p>
<p><code></p>
<pre>select a.*, b.*
from dbo.invHdr a (nolock)
left join dbo.invDet b (nolock)
on a.invnum = b.invnum</pre>
<p></code></p>
<p><strong>SELECT statement 2</strong></p>
<p><code></p>
<pre>select a.*, b.*
from dbo.invHdr a (nolock)
left join dbo.invDet b (nolock)
on b.invnum = a.invnum</pre>
<p></code></p>
<p>As you can see, the same results are returned. The <strong><em>ON</em></strong> clause simply states the <em>relationship</em> between the two tables: they are related via the <em>invnum</em> column. Which side of the equal sign each column appears on makes no difference.</p>
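<p>The X, Y, Z chain described earlier can be sketched with a third table. The table <em>dbo.itemCat</em> and its <em>category</em> column are hypothetical; I made them up only so there is something to join after invDet.</p>
<pre>-- Hypothetical lookup table (not part of the original examples).
create table dbo.itemCat
(
  itemnum varchar(10) not null
  ,category varchar(30) not null
)
go

insert into dbo.itemCat (itemnum, category)
values ('I-BEV1', 'BEVERAGE')
go

-- invHdr is "left" of invDet, and invDet is "left" of itemCat.
select
  a.invnum
  ,a.customer
  ,b.itemnum
  ,c.category
from dbo.invHdr a
left join dbo.invDet b
on b.invnum = a.invnum
left join dbo.itemCat c
on c.itemnum = b.itemnum
order by a.invnum</pre>
<p>Because both joins are LEFT JOINs, every row from invHdr is kept all the way through the chain, whether or not a detail row or a category exists for it.</p>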
<p><u>RIGHT JOIN</u></p>
<p>This type of join is simply the inverse of the LEFT JOIN. A <em>right join</em> returns all rows where matches were found in both tables as well as all rows from the right table. Here is a graphic that shows this concept:</p>
<div id="attachment_145" class="wp-caption aligncenter" style="width: 210px"><img class="size-full wp-image-142" title="rightjoin1" src="http://sqlbyminh.files.wordpress.com/2009/04/rightjoin1.gif?w=200&#038;h=50" alt="RIGHT JOIN depiction" width="200" height="50" /><p class="wp-caption-text">RIGHT JOIN depiction</p></div>
<p>If we were to re-write the SELECT statement used in the LEFT JOIN example to be a RIGHT JOIN, it would look like this:</p>
<p><code></p>
<pre>
select a.*, b.*
from dbo.invHdr a (nolock)
<strong><u>right</u></strong> join dbo.invDet b (nolock)
on b.invnum = a.invnum
</pre>
<p></code></p>
<p>Notice that the only change was replacing the word LEFT with RIGHT.  If you were to run this statement, you would see a result set laid out like the one in the LEFT JOIN example graphic.  The difference is that every row from the right table (invDet) is returned, and for detail rows with no matching header (invoice 00005 in our test data) the columns from the left table are NULL.</p>
<p>Keep in mind that this is because we are asking for all rows that match between the two tables and ALL rows from the right table <em>regardless</em> of whether a match was found in the left table.</p>
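<p>Since &#8220;left&#8221; and &#8220;right&#8221; are decided by the order in which the tables appear, a RIGHT JOIN can also be written as a LEFT JOIN with the tables swapped. The sketch below should return the same rows and columns as the RIGHT JOIN statement above:</p>
<pre>-- Same result as "invHdr a right join invDet b":
-- invDet is now listed first, so it is the "left" table.
select a.*, b.*
from dbo.invDet b
left join dbo.invHdr a
on b.invnum = a.invnum</pre>
<p>Which form to use is largely a matter of taste. Many people find it easier to read a query when the table whose rows must all be kept is listed first.</p>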
<p><u>FULL OUTER JOIN</u></p>
<p>Now let&#8217;s look at a full outer join.  This type of join returns all matching rows and all non-matching rows.  Below is a depiction of this concept.  As you can see, all the colored sections indicate the rows that would be returned.</p>
<div id="attachment_146" class="wp-caption aligncenter" style="width: 210px"><img class="size-full wp-image-142" title="FULL OUTER JOIN depiction" src="http://sqlbyminh.files.wordpress.com/2009/04/fullouterjoindepiction.gif?w=200&#038;h=50" alt="FULL OUTER JOIN depiction" width="200" height="50" /><p class="wp-caption-text">FULL OUTER JOIN depiction</p></div>
<p>Run the following statement to see the results:</p>
<p><u>Code</u></p>
<p><code></p>
<pre>
select
   a.*, b.*
from dbo.invHdr a (nolock)
full outer join dbo.invDet b (nolock)
on b.invnum = a.invnum
</pre>
<p></code></p>
<p>As you can see, the rows that are returned include both the matching rows and the <em>non-matching</em> rows.</p>
<div id="attachment_146" class="wp-caption aligncenter" style="width: 460px"><img class="size-full wp-image-147" title="FULL OUTER JOIN example" src="http://sqlbyminh.files.wordpress.com/2009/04/fullouterjoin1.gif?w=450&#038;h=50" alt="FULL OUTER JOIN example" width="450" height="50" /><p class="wp-caption-text">FULL OUTER JOIN example</p></div>
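<p>As a rough sketch of how to tell the three kinds of rows apart in a FULL OUTER JOIN, the statement below labels each row. The column alias <em>match_status</em> and the label text are illustrative names of my own.</p>
<pre>-- Label each row of the full outer join by where it came from.
select
  coalesce(a.invnum, b.invnum) as invnum
  ,case
     when a.invnum is null then 'detail only'
     when b.invnum is null then 'header only'
     else 'matched'
   end as match_status
from dbo.invHdr a
full outer join dbo.invDet b
on b.invnum = a.invnum
order by 1</pre>
<p>With the test data, invoice 00001 should be labeled as matched, 00002 and 00003 as header only, and 00005 as detail only.</p>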
<p><u>CROSS JOIN</u></p>
<p>A CROSS JOIN is one in which <em>all possible combinations</em> of the two tables are returned.  Run the code below to see the output:</p>
<p><code></p>
<pre>
select
   a.*, b.*
from dbo.invHdr a (nolock)
cross join dbo.invDet b (nolock)
</pre>
<p></code></p>
<p>As you can see from the graphic below, all the combinations of the records from both tables are returned.</p>
<div id="attachment_146" class="wp-caption aligncenter" style="width: 460px"><img class="size-full wp-image-148" title="CROSS JOIN example" src="http://sqlbyminh.files.wordpress.com/2009/04/crossjoin1.gif?w=450&#038;h=50" alt="CROSS JOIN example" width="450" height="50" /><p class="wp-caption-text">CROSS JOIN example</p></div>
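<p>Because a CROSS JOIN pairs every row of one table with every row of the other, the number of rows returned is the product of the two row counts. A quick way to check this, sketched with the aliases <em>expected_rows</em> and <em>actual_rows</em> that I chose for illustration:</p>
<pre>-- The two numbers should be equal (3 headers x 2 details with the test data).
select
  (select count(*) from dbo.invHdr)
    * (select count(*) from dbo.invDet) as expected_rows
  ,(select count(*)
    from dbo.invHdr a
    cross join dbo.invDet b) as actual_rows</pre>
<p>This is also why a CROSS JOIN on large tables should be used with care: the row count grows very quickly.</p>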
<p>In my next post I will show how you can use the LEFT JOIN, RIGHT JOIN or FULL OUTER JOIN to help you find <em>orphaned records</em> in your tables.  Orphaned records are a violation of <em>referential integrity</em> and should be addressed if and when they are found.</p>
<p>Check back in a couple of days.  Any feedback and/or comments are welcomed.<br />
Minh</p>
<p>My post on detecting orphaned records is available <a href="http://sqlbyminh.wordpress.com/2009/04/24/detecting-orphaned-rows/" target="_blank">here</a></p>
<br /> Tagged: "cross join", "inner join", "left join", "outer join", "right join", joins]]></content:encoded>
			<wfw:commentRss>http://sqlbyminh.wordpress.com/2009/04/23/sql-joins/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/5f2c86e3673f90375a5664df9574fed5?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">sqlbyminh</media:title>
		</media:content>

		<media:content url="http://sqlbyminh.files.wordpress.com/2009/04/innerjoin11.gif" medium="image">
			<media:title type="html">innerjoin1</media:title>
		</media:content>

		<media:content url="http://sqlbyminh.files.wordpress.com/2009/04/leftjoin1.gif" medium="image">
			<media:title type="html">leftjoin1</media:title>
		</media:content>

		<media:content url="http://sqlbyminh.files.wordpress.com/2009/04/leftjoin2.gif" medium="image">
			<media:title type="html">leftjoin2</media:title>
		</media:content>

		<media:content url="http://sqlbyminh.files.wordpress.com/2009/04/rightjoin1.gif" medium="image">
			<media:title type="html">rightjoin1</media:title>
		</media:content>

		<media:content url="http://sqlbyminh.files.wordpress.com/2009/04/fullouterjoindepiction.gif" medium="image">
			<media:title type="html">FULL OUTER JOIN depiction</media:title>
		</media:content>

		<media:content url="http://sqlbyminh.files.wordpress.com/2009/04/fullouterjoin1.gif" medium="image">
			<media:title type="html">FULL OUTER JOIN example</media:title>
		</media:content>

		<media:content url="http://sqlbyminh.files.wordpress.com/2009/04/crossjoin1.gif" medium="image">
			<media:title type="html">CROSS JOIN example</media:title>
		</media:content>
	</item>
		<item>
		<title>SQL &#8211; Unions</title>
		<link>http://sqlbyminh.wordpress.com/2009/04/15/sql-unions/</link>
		<comments>http://sqlbyminh.wordpress.com/2009/04/15/sql-unions/#comments</comments>
		<pubDate>Wed, 15 Apr 2009 02:32:29 +0000</pubDate>
		<dc:creator>sqlbyminh</dc:creator>
				<category><![CDATA[MSSQL - Basics]]></category>
		<category><![CDATA[select]]></category>
		<category><![CDATA[union]]></category>

		<guid isPermaLink="false">http://sqlbyminh.wordpress.com/?p=23</guid>
		<description><![CDATA[This post is in response to Mike&#8217;s request regarding UNIONs. Thanks Mike for the suggestion. What is a union?  A union is a SQL command that combines two resultsets together in one heterogenous view.  Let me elaborate.  In many database designs, it is common to partition data horizontally.  A classic example is dealing with sales [...]]]></description>
			<content:encoded><![CDATA[<p>This post is in response to Mike&#8217;s request regarding UNIONs.  Thanks Mike for the suggestion.</p>
<p>What is a <em>union</em>?  A union is a SQL operator that combines two or more result sets into a single result set.  Let me elaborate.  In many database designs, it is common to partition data <a href="http://blog.sqlauthority.com/2008/01/25/sql-server-2005-database-table-partitioning-tutorial-how-to-horizontal-partition-database-table/" target="_blank"><span style="color:#0000ff;">horizontally</span></a>.  A classic example is dealing with <em>sales</em> data.  As a database accumulates sales data over several years, a DBA might split the data horizontally so that the current year&#8217;s data is in one table (we&#8217;ll call this table CURRENT_SALES) and all sales data that is older than 1 year is in an archive table (HISTORICAL_SALES).</p>
<p>Let&#8217;s say that you need to provide the executive team with sales data that covers the current year and goes back 3 years prior to the current year.  Of course, this needs to be on the same report.  One way to do this is to create a <em>temporary table</em>, insert the data from the CURRENT_SALES table, and then insert the data from the HISTORICAL_SALES table.  Finally, you would return the data with a SELECT. Here&#8217;s an example:</p>
<p>Assume that your current sales table has the following structure:<br />
<code></p>
<pre>
create table dbo.current_sales
(
  customer_id int not null
  ,sale_date datetime not null default (getdate())
  ,revenue numeric(19, 5) not null
)
</pre>
<p></code></p>
<p>Further assume that your historical sales table has the following structure:</p>
<p><code></p>
<pre>
create table dbo.historical_sales
(
  customer_id int not null
  ,sale_date datetime not null
  ,revenue numeric(19, 5) not null
  ,archive_date datetime not null default (getdate())
)
</pre>
<p></code></p>
<p>One way to combine the data for the report might be something like this:</p>
<p><code></p>
<pre>
-- Create a temporary table.
create table #tSalesData
(
   customer_id int not null
   ,sale_date datetime not null
   ,revenue numeric(19, 5) not null
   ,row_display_order int identity(1,1) not null
)
go

-- Get sales data from current year.
insert into #tSalesData
(
   customer_id
   ,sale_date
   ,revenue
)
select
   customer_id
   ,sale_date
   ,revenue
from dbo.current_sales
order by
   sale_date

-- Get sales data from historical table.
insert into #tSalesData
(
   customer_id
   ,sale_date
   ,revenue
)
select
   customer_id
   ,sale_date
   ,revenue
from dbo.historical_sales
where
   sale_date between dateadd(yy, -4, getdate()) and getdate()
order by
   sale_date

-- Return data in insertion order; without an ORDER BY the row order is not guaranteed.
select * from #tSalesData
order by row_display_order
</pre>
<p></code></p>
<p>Keep in mind that this is a trivial example and one that I would not implement in a production environment. It is for illustrative purposes only. Even so, this example suffers from two main issues.</p>
<p>First, it uses a temporary table to store data that is fetched from the current sales table and historical sales table. While there is nothing <em>functionally</em> wrong with using a temporary table, it does come at a cost.  Any time you use a temporary table there are I/O costs inherent in any INSERT, UPDATE or DELETE operation performed on the table.</p>
<p>Secondly, it uses an ORDER BY clause in each of the two INSERT statements to order the data by <em>sale_date</em>, so the data is sorted twice.  On top of that, the above code example has <em>four</em> occurrences of I/O cost:</p>
<ul>
<li>The SELECT statement to fetch data for current year.</li>
<li>The SELECT statement to fetch data from history.</li>
<li>The INSERT statement to store the data.</li>
<li>The SELECT statement to return the data.</li>
</ul>
<p>While this may be acceptable for a relatively small dataset, the extra I/O becomes increasingly expensive as the dataset grows.  The example below shows how to accomplish the same result using a UNION:</p>
<p><code></p>
<pre>
select
   customer_id
   ,sale_date
   ,revenue
from dbo.current_sales
UNION
select
   customer_id
   ,sale_date
   ,revenue
from dbo.historical_sales
where
   sale_date between dateadd(yy, -4, getdate()) and getdate()
order by sale_date
</pre>
<p></code></p>
<p>The example above would incur <em>considerably</em> less I/O since there are only two read operations.  The previous example had <em>four</em> I/O operations.  Furthermore, recall that in the initial example <em>two</em> ORDER BY clauses were used to order the data.  In the example above, note that only <em>one</em> ORDER BY clause is used.</p>
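<p>If you want to compare the I/O of the two approaches on your own data rather than take my word for it, one option is SQL Server&#8217;s SET STATISTICS IO setting, which reports the reads performed by each statement in the Messages tab. A minimal sketch, assuming the two sales tables above exist:</p>
<pre>-- Turn on I/O reporting for this session.
set statistics io on
go

select
   customer_id
   ,sale_date
   ,revenue
from dbo.current_sales
UNION
select
   customer_id
   ,sale_date
   ,revenue
from dbo.historical_sales
where
   sale_date between dateadd(yy, -4, getdate()) and getdate()
order by sale_date
go

-- Turn I/O reporting back off.
set statistics io off
go</pre>
<p>Running the temporary table version the same way lets you compare the logical reads side by side. The actual numbers will depend on your data and indexes.</p>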
<p>Note that in the example above duplicates would be removed since we did not use the ALL argument.  Using the ALL argument <em>preserves</em> duplicates.</p>
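<p>For comparison, here is the same query sketched with UNION ALL. Every row from both tables is returned, including any rows that are exact duplicates of each other:</p>
<pre>-- UNION ALL keeps duplicate rows; plain UNION removes them.
select
   customer_id
   ,sale_date
   ,revenue
from dbo.current_sales
UNION ALL
select
   customer_id
   ,sale_date
   ,revenue
from dbo.historical_sales
where
   sale_date between dateadd(yy, -4, getdate()) and getdate()
order by sale_date</pre>
<p>If you know the two tables cannot contain the same row, UNION ALL is usually the cheaper choice because the server does not have to look for duplicates.</p>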
<p>As a final note, be aware that all queries that participate in a UNION <em>must</em> return the same number of columns, in the same order, with compatible data types.  The column names do not have to match; the result set takes its column names from the first query.</p>
<p>Some of you may have come to the conclusion that a UNION is similar to a JOIN in that both allow you to fetch and combine data from two or more tables.  I will have a post on JOINs and their usage next week and finish up with a post on their differences.</p>
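<p>Here is a small sketch of how the column lists line up in a UNION. SQL Server matches the columns by position, and the result set takes its column names from the first SELECT. The aliases <em>cust_id</em>, <em>event_date</em> and <em>amount</em> are illustrative names of my own, and pairing <em>sale_date</em> with <em>archive_date</em> is done only to show two differently named columns of the same data type being combined.</p>
<pre>-- Columns are matched by position, not by name.
-- The result set uses the names from the first SELECT.
select
   customer_id as cust_id
   ,sale_date as event_date
   ,revenue as amount
from dbo.current_sales
UNION ALL
select
   customer_id
   ,archive_date
   ,revenue
from dbo.historical_sales
order by event_date</pre>
<p>Note that the ORDER BY refers to <em>event_date</em>, the name given in the first SELECT.</p>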
<p>As always, I welcome any comments, insights and/or corrections.</p>
<p>Minh</p>
<br /> Tagged: select, union]]></content:encoded>
			<wfw:commentRss>http://sqlbyminh.wordpress.com/2009/04/15/sql-unions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/5f2c86e3673f90375a5664df9574fed5?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">sqlbyminh</media:title>
		</media:content>
	</item>
	</channel>
</rss>
