Last week I was working on reporting. QA team logged a bug that was not getting reproduced on development environment. First I reviewed the code at development, but was unable to figure the reason of count mismatch with details. Then I got the code reviewed by one of my seniors. He suggested me to replace inner join with intersect in a couple of conditions. I, till that day, used to this consider inner join and intersect as similar. But to my amazement both are completely different. So sharing my findings about difference between inner join and intersect:
Let’s have 2 different tables and we want the distinct rows which are common in both the tables what will we do to get them? Most of us will answer that they will use INNER JOIN to get the common rows of two tables, right? Wrong!!!!!!
This is a big misconception that INNER JOIN will always return all the common rows between two tables. In reality, INNER JOIN treats two NULLS as two different values rather than a same value so if you are joining a row based on a NULLable column there is a chance that if both tables have NULLs in that column then INNER JOIN will ignore those rows because
Two NULLs are not same in TSQL
To correctly retrieve all common rows between two tables, SQL Server 2005 has introduced the INTERSECT keyword. INTERSECT treats two NULLs as a same value and it returns all rows which are common in both the tables.
Additionally INNER JOIN retrieves all the records from the left table and all the records from the right table. Carefully observing we can notice many of the records as duplicate records. When INNER JOIN is used it gives us duplicate records, but that is not in the case of INTERSECT operator.
The duplicate records in UNION can be removed by using DISTINCT clause in SELECT statement. DISTINCT removes the duplicate rows and final result is exactly same in INTERSECT. In this way, INNER JOIN can simulate with INTERSECT when used with DISTINCT.
The results of an UPDATE statement are undefined if the statement includes a FROM clause that is not specified in such a way that only one value is available for each column occurrence that is updated (in other words, if the UPDATE statement is not deterministic). For example, given the UPDATE statement in the following script, both rows in table s meet the qualifications of the FROM clause in the UPDATE statement, but it is undefined which row from s is used to update the row in table t.
There aren’t many articles comparing performance between the two (UNION and INTERSECT) because they’re meant to substitute one for the other. INTERSECT is meant to find the distinct values combining two queries. That’s a very different goal than the INNER JOIN which is not finding distinct values, but values that simply match between two tables, derived or otherwise.
As a matter of fact, it’s worse than that since the INTERSECT operation requires that the values returned by both queries match in data type, so you couldn’t even do a normal INNER JOIN operation between disparate tables using INTERSECT.
But with similar data intersect takes a little less effort