Comparing E-mail Address Validating Regular Expressions
Updated: 2/3/2012
Summary
This page compares regular expressions that validate e-mail addresses in order to find the best one. The expression with the best score is currently the one used by PHP's filter_var(), which is based on a regex by Michael Rushton:
/^(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){255,})(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){65,}@)(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22))(?:\.(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-[a-z0-9]+)*\.){1,126}){1,}(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-[a-z0-9]+)*)|(?:\[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9][:\]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\]))$/iD
The best one that's been verified to work in JavaScript is Arluison Guillaume's improvement of Warren Gaebel's regex:
/^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+)*\.(aero|arpa|biz|com|coop|edu|gov|info|int|mil|museum|name|net|org|pro|travel|mobi|[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i
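Here is a minimal sketch of calling the JavaScript regex above. The helper name isValidEmail and the sample addresses are illustrative assumptions. The true/false comments come from tracing the pattern by hand, not from this page's test results.

```javascript
// Arluison Guillaume's improvement of Warren Gaebel's regex, copied verbatim from above.
// The trailing i flag makes the character classes and the TLD list case-insensitive.
const EMAIL_RE = /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+)*\.(aero|arpa|biz|com|coop|edu|gov|info|int|mil|museum|name|net|org|pro|travel|mobi|[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i;

// Hypothetical helper: returns true if the address matches the pattern.
// No g flag is used, so test() keeps no lastIndex state between calls.
function isValidEmail(address) {
  return EMAIL_RE.test(address);
}

const samples = [
  "john.doe@example.com",      // true
  "first.last@example.museum", // true: museum is in the TLD list
  "user@192.168.0.1",          // true: dotted-quad IP as the domain
  "a..b@example.com",          // false: empty atom between the dots
  "user@example",              // false: no TLD
  '"john doe"@example.com',    // false: quoted local parts are not supported
  "user@example.com:25",       // true: the optional :port group accepts this
];

for (const s of samples) {
  console.log(s, isValidEmail(s));
}
```

The last two samples show the trade-off in practice. The pattern rejects RFC-valid quoted local parts (a false negative), and it accepts a trailing :port, which is not part of e-mail address syntax (a false positive).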
Introduction
I was writing a web application and wanted a regular expression to validate users' e-mail addresses. When I searched for one, I found dozens of slightly different variations of the same expression. Almost all of the posts were followed by comments describing false positives or false negatives, but despite all the criticism I couldn't find a definitive "best" expression. So I set up an array of the most promising ones and ran them against a set of known valid and invalid addresses. I think it's better to accept a few invalid addresses than to reject any valid ones, so I'm aiming for zero false negatives and as few false positives as possible.
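The comparison method described above can be sketched in JavaScript. Each candidate is run against lists of known-valid and known-invalid addresses, and the misses are counted. The two candidate regexes and the handful of addresses are made up for illustration. They are not the expressions or the test data used on this page.

```javascript
// Hypothetical candidates and test data, for illustration only.
const candidates = {
  simple: /^[^@\s]+@[^@\s]+\.[^@\s]+$/,
  strictTld: /^[a-z0-9._%+-]+@[a-z0-9.-]+\.(com|net|org)$/i,
};

const validAddresses = [
  "john.doe@example.com",
  "user+tag@example.co.uk",
  '"quoted"@example.org',
];

const invalidAddresses = [
  "no-at-sign.example.com",
  "two@@example.com",
  "user@example..com",
];

// Score one regex: false negatives are valid addresses it rejects,
// false positives are invalid addresses it accepts.
function score(regex) {
  const falseNegatives = validAddresses.filter((a) => !regex.test(a));
  const falsePositives = invalidAddresses.filter((a) => regex.test(a));
  return { falseNegatives, falsePositives };
}

// Rank by fewest false negatives first, then by fewest false positives.
const ranking = Object.entries(candidates)
  .map(([name, regex]) => ({ name, ...score(regex) }))
  .sort(
    (x, y) =>
      x.falseNegatives.length - y.falseNegatives.length ||
      x.falsePositives.length - y.falsePositives.length
  );

for (const r of ranking) {
  console.log(r.name, "FN:", r.falseNegatives.length, "FP:", r.falsePositives.length);
}
// simple FN: 0 FP: 1
// strictTld FN: 2 FP: 1
```

Sorting on false negatives first reflects the stated preference for accepting a few invalid addresses over rejecting valid ones. Both toy patterns accept "user@example..com", which shows how easily an empty domain label slips through.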
It's been about 5 years since I first created this page, and I think we've found a relatively solid answer. If you know of an expression that tests better than the current one, or if you have other feedback, feel free to contact me. If you think any of the addresses are incorrectly labeled, please take it up with Cal Henderson or Dominic Sayers, since I've used their test data.
Notes
There's no perfect regular expression to validate e-mail addresses.
If you need something more advanced than a simple regex, check out RFC 822/2822/5322 Email Address Parser in PHP or is_email() Address Validation.
I think it's better to allow any TLD, even one that doesn't exist, than to put a static list of currently existing TLDs into the regex. The problem with a static list is that the regex will be used in production environments but probably won't be updated when new TLDs come out.
Some of these depend on being executed with case-insensitive regex functions (for example, PHP's /i modifier).
This page uses PHP's ereg() and preg_match() functions (ereg() has been deprecated since PHP 5.3.0), but some of these expressions will also work in JavaScript and other languages.
These regexes only check whether an address is syntactically valid. An address that does not exist on any mail server can still pass. Checking whether an address actually exists isn't always practical or necessary, though.
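Slow regexes can be a security risk. A pattern that backtracks heavily can be made to run for a very long time on crafted input, which an attacker can use as a denial-of-service vector.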
Remember the Robustness Principle when handling addresses that fail to validate.
Some of the test addresses below are very long and have been truncated for display.
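Three of the notes above can be illustrated with a short JavaScript sketch: allowing any TLD, relying on case-insensitive matching, and the risk posed by slow regexes. The patterns are deliberately simplified and hypothetical, not any of the expressions compared on this page. The 254-character cap is an assumption chosen to roughly mirror the overall length limit built into the filter_var() regex's first lookahead.

```javascript
// Hypothetical, simplified patterns: one with a static TLD list,
// one that accepts any alphabetic TLD of two or more letters.
const staticTld = /^[^@\s]+@[a-z0-9.-]+\.(com|net|org|info|museum)$/i;
const anyTld = /^[^@\s]+@[a-z0-9.-]+\.[a-z]{2,}$/i;

console.log(staticTld.test("user@example.photography")); // false: TLD not in the list
console.log(anyTld.test("user@example.photography"));    // true

// Case sensitivity: the same pattern as anyTld, but without the i flag.
const lowerOnly = /^[^@\s]+@[a-z0-9.-]+\.[a-z]{2,}$/;
console.log(lowerOnly.test("User@Example.COM")); // false: uppercase domain letters
console.log(anyTld.test("User@Example.COM"));    // true

// Slow regexes: nested quantifiers such as (a+)+ can backtrack exponentially
// on near-miss input, so cap the input length before running a complex pattern.
// Assumed cap: 254 characters, roughly matching the filter_var() regex's limit.
const MAX_ADDRESS_LENGTH = 254;

function guardedTest(regex, address) {
  if (address.length > MAX_ADDRESS_LENGTH) {
    return false; // reject before the regex engine does any work
  }
  return regex.test(address);
}

console.log(guardedTest(anyTld, "user@example.com"));                  // true
console.log(guardedTest(anyTld, "a".repeat(250) + "@example.com"));    // false: too long
```

A length cap does not fix a badly written pattern. It only bounds how much input the pattern has to chew through.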
Detailed...