Why I Hate Regular Expressions

I have a love-hate relationship with regular expressions. I admire their power, yet I hate the pain and agony they have caused me on more than one occasion. Just recently one of my colleagues used a regular expression to validate passwords. It seems fair to use regular expressions for validating character sequence patterns; after all, that is exactly what they were created for. However, when regular expressions fail, they fail hard. What I mean is that they usually fail in the production environment.

In case you are wondering what a regular expression is, take a look at the Wikipedia explanation of regular expressions.

So we go live and all of a sudden new users can't be created, because the regular expression says the password is not valid due to one of the complexity rules. But how? The regular expression passed all the unit tests. In fact, the application works fine in our development and test deployments. So what happened? The validation works fine until it hits the regular expression rules, so we know the JavaScript is running, because the rules prior to the first regex test work. However, it seems that the regular expression is not running. To this day I am not sure how the issue was resolved. I know it has been resolved, but I can't even say for certain that it was the regular expression. I won't get into why I can't discuss the resolution; suffice it to say that the app was deployed in a somewhat sensitive and restricted environment.

What I will say is that when you choose regular expressions, you really have to consider the trade-offs. The regular expression in question was checking for a fixed number of lower-case characters, a fixed number of upper-case characters, a fixed number of special characters (from a whitelist), and a fixed number of numeric characters. These checks can easily be done with traditional code. The code is a lot shorter with the power of regular expressions, but what are you really gaining and what are you losing?

Let's take a look.

By using regular expressions, you gain performance and brevity.

It's really short when you code

/((?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%]).{6,20})/.test('MYpassword12#$')

To understand the regex above, take a look at this post.
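As a rough sketch, here is the same pattern assembled from named pieces so each lookahead is easier to read. The variable names are mine, and the `^` and `$` anchors are an assumption I added: without them, `.{6,20}` can match any substring, so the 20-character maximum is never actually enforced.

```javascript
// Each lookahead asserts "somewhere in the string there is at least one ...".
const hasDigit = "(?=.*\\d)";        // at least one digit
const hasLower = "(?=.*[a-z])";      // at least one lower-case letter
const hasUpper = "(?=.*[A-Z])";      // at least one upper-case letter
const hasSpecial = "(?=.*[@#$%])";   // at least one character from the whitelist
const lengthRule = ".{6,20}";        // between 6 and 20 characters in total

// Anchors (^ and $) are an assumption added here so the length rule
// applies to the whole password rather than to any substring of it.
const passwordPattern = new RegExp(
  "^" + hasDigit + hasLower + hasUpper + hasSpecial + lengthRule + "$"
);

console.log(passwordPattern.test("MYpassword12#$")); // true
console.log(passwordPattern.test("mypassword12#$")); // false: no upper-case letter
```

It is still one regular expression, so it still depends on the regex engine of whatever platform runs it; the only thing this buys you is readability.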

As opposed to

if (HasAtLeastTwoUpper("MYpassword12#$") && HasAtLeastTwoLower("MYpassword12#$") ....more checks...


By using traditional methods, you gain readability because the validation rules are explicit. In other words, you don't have to pull out a regex manual to understand what is being checked. You lose brevity, because the code will be a little longer, and you lose some performance.
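To make the comparison concrete, here is a minimal sketch of what the traditional checks might look like. Only HasAtLeastTwoUpper and HasAtLeastTwoLower are names from this post; every other function, the "at least two" threshold for digits and special characters, and the reuse of the @#$% whitelist and the 6 to 20 length range from the regex are assumptions for illustration, not the code from the actual app.

```javascript
// Hypothetical helper: counts the characters in `text` that satisfy `predicate`.
function CountMatching(text, predicate) {
  let count = 0;
  for (const ch of text) {
    if (predicate(ch)) count += 1;
  }
  return count;
}

// Assumed whitelist, borrowed from the regex version.
const SPECIAL_CHARS = "@#$%";

function HasAtLeastTwoUpper(password) {
  return CountMatching(password, (ch) => ch >= "A" && ch <= "Z") >= 2;
}

function HasAtLeastTwoLower(password) {
  return CountMatching(password, (ch) => ch >= "a" && ch <= "z") >= 2;
}

// Hypothetical: same idea for digits.
function HasAtLeastTwoDigits(password) {
  return CountMatching(password, (ch) => ch >= "0" && ch <= "9") >= 2;
}

// Hypothetical: same idea for whitelisted special characters.
function HasAtLeastTwoSpecial(password) {
  return CountMatching(password, (ch) => SPECIAL_CHARS.includes(ch)) >= 2;
}

// Hypothetical wrapper that applies every rule, including the assumed length range.
function IsValidPassword(password) {
  return password.length >= 6 &&
    password.length <= 20 &&
    HasAtLeastTwoUpper(password) &&
    HasAtLeastTwoLower(password) &&
    HasAtLeastTwoDigits(password) &&
    HasAtLeastTwoSpecial(password);
}

console.log(IsValidPassword("MYpassword12#$")); // true
console.log(IsValidPassword("Mypassword12#$")); // false: only one upper-case letter
```

It is clearly longer than the one-liner, but every rule has a name, and nothing here depends on how a particular regex engine behaves.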

So you gain performance with a regex, but consider that you are doing an iteration of one. In other words, you are only validating one password at a time, so the performance gain is negligible. If, on the other hand, we were validating tens of thousands of passwords in a batch process, there might actually be an advantage to choosing regular expressions over the traditional approach.

Like I said, I don't even know that the regular expressions were at fault in this case. I can say that I have experienced this type of problem before, and the cause was that one platform executed regular expressions differently from another, so the regex worked in one environment and failed in the other. But that's just it: using regular expressions gives you one more thing that can go wrong, one more thing that can be implemented incorrectly by the user's browser.

When you add regular expressions, you are adding one more point of failure, so you had better make sure that you really need that regex. Otherwise, my advice would be to stay away from regular expressions. As the old saying goes, "if you have a problem and you solve it with regular expressions, now you have two problems".
