Archive for February, 2010

on NULL and NOT IN

February 27, 2010

I’ve been trying to think of something “big” to write about for so long, I haven’t written anything.  So I’ll write about something “small” that I found out the other day.

It turns out, ‘NOT IN’ and ‘NULL’ can have an odd (to me) effect.

Say you are doing something like

SELECT a FROM table1
WHERE a NOT IN
(SELECT a FROM table2);

If there are any NULLs in the table2’s a column, you will never get any results from this query.
Here’s an example:

— first, here are the two tables I used:

mysql> select * from test1;
+——+
| a    |
+——+
|    1 |
|    2 |
|    3 |
|    4 |
|    5 |
|    6 |
+——+
6 rows in set (0.02 sec)

mysql> select * from test2;
+——+
| a    |
+——+
|    1 |
|    2 |
|    3 |
| NULL |
+——+
4 rows in set (0.00 sec)

mysql> select a from test1 where a not in (select a from test2);
Empty set (0.02 sec)

— personally, I expected to get 4,5,6.  For that, you can do the following:

mysql> select a from test1 where a not in (select a from test2 where a is NOT NULL);
+——+
| a    |
+——+
|    4 |
|    5 |
|    6 |
+——+
3 rows in set (0.00 sec)

Why is this?  It is actually to follow the SQL standards on NULL.  As mentioned at http://dev.mysql.com/doc/refman/5.1/en/comparison-operators.html ,

“To comply with the SQL standard, IN returns NULL not only if the expression on the left hand side is NULL, but also if no match is found in the list and one of the expressions in the list is NULL.”

Just as “1=NULL” is undefined rather than false, and “1<>NULL” is too, “1 in (null)” is also undefined (or null), and “1 NOT IN (null) is null”.

mysql> select 1 in (null);
+————-+
| 1 in (null) |
+————-+
|        NULL |
+————-+
1 row in set (0.00 sec)

mysql> select 1 not in (null);
+—————–+
| 1 not in (null) |
+—————–+
|            NULL |
+—————–+
1 row in set (0.00 sec)

To simplify the examples with tables above, we can do the following, and see similar results.

mysql> select 1 not in (2);
+————–+
| 1 not in (2) |
+————–+
| 1            |
+————–+
1 row in set (0.00 sec)

mysql> select 1 not in (2,null);
+——————-+
| 1 not in (2,null) |
+——————-+
| NULL              |
+——————-+
1 row in set (0.00 sec)

Personally, once I read the reasoning, it made sense to me, but initially I was surprised.  I thought I’d write about it, because it is common to do a “SELECT … WHERE NOT IN (SELECT …)”, as in the example above with table1 and table2, and the results might not be what you expect if you have any NULLs in your table.

Advertisements