Monday, October 19, 2009

Regular expression to check email address (PHP)

Here's the regex for checking an email address, according to totallyphp.co.uk:
^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$

The valid characters for the username part are: underscore, hyphen, letters from a to z, and numbers from 0 to 9. Dots may separate groups of these characters.
The valid characters for the domain part are the same as for the username, except that the underscore is not allowed. In addition, the last part of the domain can contain only 2 or 3 letters (e.g. com, au).

What's wrong with this regular expression?
The obvious problem is the last part. I own a domain name that ends with 'info'. 'info' has 4 characters, so it fails the last rule, which accepts only 2 or 3 letters.

Now, let's take a look at the first part: ^[_a-z0-9-]+
According to this part, an email address can start with an underscore, letter, number, or hyphen. However, the rule I want to enforce is that an email address starts with a letter or number, so these addresses should be rejected: _jim@example.com, -jim@example.com
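
Both problems can be reproduced with a short PHP sketch. This is only an illustration, not code from totallyphp.co.uk: it assumes PHP 7 or later and preg_match(), and the helper name matchesOriginalPattern() and the sample addresses are made up for the example. The pattern is lowercase-only as written, so the sample addresses are lowercase too.

```php
<?php
// The pattern quoted above, wrapped in "/" delimiters for preg_match().
const ORIGINAL_EMAIL_PATTERN =
    '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$/';

// Hypothetical helper: true when $address matches the original pattern.
function matchesOriginalPattern(string $address): bool
{
    return preg_match(ORIGINAL_EMAIL_PATTERN, $address) === 1;
}

$samples = ['jim@example.com', 'jim@example.info', '_jim@example.com', '-jim@example.com'];
foreach ($samples as $address) {
    $verdict = matchesOriginalPattern($address) ? 'accepted' : 'rejected';
    printf("%-20s %s\n", $address, $verdict);
}
```

With this pattern, jim@example.info is rejected because of the {2,3} rule, while the two addresses that start with an underscore or hyphen are accepted.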

Let's improve the regular expression.
To fix the first mistake, just change the last part from {2,3} to {2,4}. This accepts 'info', although endings longer than 4 letters are still rejected.

For the second mistake, we have to make sure that an email address cannot start with an underscore or hyphen.
We have to change ^[_a-z0-9-]+ to ^[a-z0-9]+
Then, so that underscores and hyphens are still allowed after the first character, we add this part right after it: [_a-z0-9-]*

The complete one is: ^[a-z0-9]+[_a-z0-9-]*(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$
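
As a quick check of the improved pattern, here is a minimal PHP sketch. It makes the same assumptions as any preg_match() example would (PHP 7 or later, "/" delimiters added around the pattern); the helper name matchesImprovedPattern() and the sample addresses are hypothetical.

```php
<?php
// The improved pattern from this post, wrapped in "/" delimiters for preg_match().
const IMPROVED_EMAIL_PATTERN =
    '/^[a-z0-9]+[_a-z0-9-]*(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$/';

// Hypothetical helper: true when $address matches the improved pattern.
function matchesImprovedPattern(string $address): bool
{
    return preg_match(IMPROVED_EMAIL_PATTERN, $address) === 1;
}

$samples = [
    'jim@example.com',          // ordinary address
    'jim@example.info',         // 4-letter ending, now allowed by {2,4}
    'jim_smith@example.com.au', // underscore after the first character
    '_jim@example.com',         // starts with an underscore
    '-jim@example.com',         // starts with a hyphen
];
foreach ($samples as $address) {
    $verdict = matchesImprovedPattern($address) ? 'accepted' : 'rejected';
    printf("%-26s %s\n", $address, $verdict);
}
```

The first three addresses are accepted and the last two are rejected, which is the behaviour the two fixes were meant to produce.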

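Update:
When I tried this out, I put the hyphen in the wrong position, and the regex didn't work properly. Instead of [_a-z0-9-], I wrote [_-a-z0-9]
The hyphen has a special meaning inside square brackets: between two characters it defines a range, so _-a is read as "every character from underscore to a". It is safe at the very end of the brackets, as in the original, but it's better to 'escape' the hyphen so that its position doesn't matter. This one will work: [_\-a-z0-9]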
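
The difference between the two spellings can be seen in a small PHP sketch. It assumes preg_match() (PCRE) on PHP 7 or later; other regex engines may treat the misplaced hyphen differently. The variable names and the helper classMatches() are made up for this example.

```php
<?php
// Misplaced hyphen: "_-a" is read as a range from "_" to "a",
// so most letters are no longer part of the character class.
$misplaced = '/^[_-a-z0-9]+$/';

// Escaped hyphen: "\-" is a literal hyphen and "a-z" stays a range.
$escaped = '/^[_\-a-z0-9]+$/';

// Hypothetical helper: true when the whole of $text matches $pattern.
function classMatches(string $pattern, string $text): bool
{
    return preg_match($pattern, $text) === 1;
}

foreach (['jim', 'jim-smith'] as $text) {
    printf(
        "%-10s misplaced: %-3s escaped: %s\n",
        $text,
        classMatches($misplaced, $text) ? 'yes' : 'no',
        classMatches($escaped, $text) ? 'yes' : 'no'
    );
}
```

Under these assumptions, the misplaced version fails on an ordinary word such as jim, while the escaped version accepts both samples.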


Sources:
http://www.totallyphp.co.uk/code/validate_an_email_address_using_regular_expressions.htm
http://www.melbourneit.com.au/help/index.php?questionid=1118
http://www.alertsite.com/help/RegexMatching.html
