Thursday, October 18, 2012

Case Sensitive MySQL Searches

MySQL's support for case sensitive search is explained somewhat opaquely in the aptly titled Case Sensitivity in String Searches documentation. In short, it explains that by default, MySQL won't treat strings as case sensitive when executing a statement such as:

SELECT first_name FROM contacts WHERE first_name REGEXP '^[a-z]';

This simple search to look for contacts whose first name starts with a lower case letter, will return *all* contacts because in the default character set used by MySQL (latin1), upper and lower case letters share the same collation. The documentation for both MySQL and PostgreSQL have lengthy discussions on the topic.

Enough with the backstory, how do I perform case sensitive searches?!

The docs say to convert the string representation to a binary one. This allows "comparisons [to] use the numeric values of the bytes in the operands". Let's see it in action:

SELECT first_name FROM contacts WHERE BINARY(first_name) REGEXP '^[a-z]';

There are other strategies available, such as changing the character set being used for comparisons with the COLLATE function. This would likely work better for cases where you had many columns to compare.

SELECT first_name FROM contacts WHERE first_name REGEXP '^[a-z]' COLLATE latin1_bin;

You can even go so far as to have MySQL switch character sets and collations. But you do have to do this for each database, each table, and each column you need to convert. Not terribly fun.

No comments:

Post a Comment