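Validating an email address with a regular expression looks simple, but validating every standards-compliant address perfectly turns out to be quite complicated. The address format is specified in RFC 5322. The website emailregex.com collects email-validating regular expressions for a range of programming languages, and many of them are more complex than anything I had heard of, let alone seen. I can only wonder how badly email validation must have hurt the programmer who built the site! In production, though, you usually do not need a regular expression this complex to be right 99.99% of the time. For the sake of execution speed and test coverage, a simple version is usually enough:
/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i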
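As a rough illustration of how such a pattern might be applied, here is a minimal sketch using Python's `re` module. The names `SIMPLE_EMAIL` and `looks_like_email` are made up for this example. The pattern is the simple version with the dot before the top-level domain escaped, since an unescaped `.` would match any character.

```python
import re

# Simple pattern from the text; re.IGNORECASE plays the role of the /i flag.
SIMPLE_EMAIL = re.compile(r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}", re.IGNORECASE)


def looks_like_email(address: str) -> bool:
    """Return True if the whole string matches the simple pattern."""
    # fullmatch anchors at both ends, much like ^...$ in the original.
    return SIMPLE_EMAIL.fullmatch(address) is not None


if __name__ == "__main__":
    for candidate in ["alice@example.com", "alice@examplecom", "bob@example.museum"]:
        print(candidate, looks_like_email(candidate))
```

The last candidate is rejected because the pattern allows only 2 to 4 letters in the top-level domain, which is one of the cases a simple version gives up on.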
With that said, let's look at the stricter, more complex regular expressions:
General-purpose email regular expression (RFC 5322 compliant)
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
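Languages differ in their regex support, their syntax, and how much of the standard they try to cover, so the expressions differ from language to language:
Python
This one is a simple version:
r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)"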
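To show what "simple" means in practice, the sketch below compiles the Python pattern (with the domain dot escaped) and filters a list of candidate strings. `EMAIL_RE` and `filter_valid` are illustrative names, not part of any library.

```python
import re

# The Python pattern from the text, with the dot after the first domain label escaped.
EMAIL_RE = re.compile(r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)")


def filter_valid(addresses):
    """Keep only the strings that the pattern accepts."""
    return [a for a in addresses if EMAIL_RE.match(a)]


if __name__ == "__main__":
    candidates = [
        "user.name+tag@example.co.uk",
        "no-at-sign.example.com",
        "two@@example.com",
        "user@localhost",
        "user@example..com",
    ]
    for address in filter_valid(candidates):
        print(address)
```

The pattern is permissive: it rejects strings with no `@`, a doubled `@`, or no dot in the domain, but it still lets `user@example..com` through because the final character class also allows dots.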
JavaScript
This one gets a bit more complicated:
/^[-a-z0-9~!$%^&*_=+}{'?]+(\.[-a-z0-9~!$%^&*_=+}{'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+)*\.(aero|arpa|biz|com|coop|edu|gov|info|int|mil|museum|name|net|org|pro|travel|mobi|[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i
Swift
[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}
PHP
The PHP version is more complex still and covers more cases:
/^(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){255,})(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){65,}@)(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22))(?:\.(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-[a-z0-9]+)*\.){1,126}){1,}(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-[a-z0-9]+)*)|(?:\[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9][:\]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\]))$/iD
Perl / Ruby
Perl and Ruby are not impressed by the PHP version and can be stricter still:
(?:(?:
)?[ ])*(?:(?:(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:
)?[ ])*))*@(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*))*|(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:
)?[ ])*)*<(?:(?:
)?[ ])*(?:@(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*))*(?:,@(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*))*)*:(?:(?:
)?[ ])*)?(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:
)?[ ])*))*@(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*))*>(?:(?:
)?[ ])*)|(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:
)?[ ])*)*:(?:(?:
)?[ ])*(?:(?:(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:
)?[ ])*))*@(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*))*|(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:
)?[ ])*)*<(?:(?:
)?[ ])*(?:@(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*))*(?:,@(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*))*)*:(?:(?:
)?[ ])*)?(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:
)?[ ])*))*@(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*))*>(?:(?:
)?[ ])*)(?:,s*(?:(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:
)?[ ])*))*@(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*))*|(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:
)?[ ])*)*<(?:(?:
)?[ ])*(?:@(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*))*(?:,@(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*))*)*:(?:(?:
)?[ ])*)?(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:
)?[ ])*))*@(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*))*>(?:(?:
)?[ ])*))*)?;s
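Perl 5.10 and later
May I call the version above gibberish? I, for one, have no intention of deciphering it. Newer versions of Perl do offer a more readable variant (are you serious?):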
/(?(DEFINE)
   (?<address>         (?&mailbox) | (?&group))
   (?<mailbox>         (?&name_addr) | (?&addr_spec))
   (?<name_addr>       (?&display_name)? (?&angle_addr))
   (?<angle_addr>      (?&CFWS)? < (?&addr_spec) > (?&CFWS)?)
   (?<group>           (?&display_name) : (?:(?&mailbox_list) | (?&CFWS))? ; (?&CFWS)?)
   (?<display_name>    (?&phrase))
   (?<mailbox_list>    (?&mailbox) (?: , (?&mailbox))*)

   (?<addr_spec>       (?&local_part) \@ (?&domain))
   (?<local_part>      (?&dot_atom) | (?&quoted_string))
   (?<domain>          (?&dot_atom) | (?&domain_literal))
   (?<domain_literal>  (?&CFWS)? \[ (?: (?&FWS)? (?&dcontent))* (?&FWS)? \] (?&CFWS)?)
   (?<dcontent>        (?&dtext) | (?&quoted_pair))
   (?<dtext>           (?&NO_WS_CTL) | [\x21-\x5a\x5e-\x7e])

   (?<atext>           (?&ALPHA) | (?&DIGIT) | [!#\$%&'*+-/=?^_`{|}~])
   (?<atom>            (?&CFWS)? (?&atext)+ (?&CFWS)?)
   (?<dot_atom>        (?&CFWS)? (?&dot_atom_text) (?&CFWS)?)
   (?<dot_atom_text>   (?&atext)+ (?: \. (?&atext)+)*)

   (?<text>            [\x01-\x09\x0b\x0c\x0e-\x7f])
   (?<quoted_pair>     \\ (?&text))

   (?<qtext>           (?&NO_WS_CTL) | [\x21\x23-\x5b\x5d-\x7e])
   (?<qcontent>        (?&qtext) | (?&quoted_pair))
   (?<quoted_string>   (?&CFWS)? (?&DQUOTE) (?:(?&FWS)? (?&qcontent))* (?&FWS)? (?&DQUOTE) (?&CFWS)?)

   (?<word>            (?&atom) | (?&quoted_string))
   (?<phrase>          (?&word)+)

   # Folding white space
   (?<FWS>             (?: (?&WSP)* (?&CRLF))? (?&WSP)+)
   (?<ctext>           (?&NO_WS_CTL) | [\x21-\x27\x2a-\x5b\x5d-\x7e])
   (?<ccontent>        (?&ctext) | (?&quoted_pair) | (?&comment))
   (?<comment>         \( (?: (?&FWS)? (?&ccontent))* (?&FWS)? \) )
   (?<CFWS>            (?: (?&FWS)? (?&comment))* (?: (?:(?&FWS)? (?&comment)) | (?&FWS)))

   # No whitespace control
   (?<NO_WS_CTL>       [\x01-\x08\x0b\x0c\x0e-\x1f\x7f])

   (?<ALPHA>           [A-Za-z])
   (?<DIGIT>           [0-9])
   (?<CRLF>            \x0d \x0a)
   (?<DQUOTE>          ")
   (?<WSP>             [\x20\x09])
 )

 (?&address)/x
Ruby (simple version)
Ruby points out that it has a simple version too:
/\A([\w+\-].?)+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i
.NET
Who doesn't have a version like that, says .NET:
^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$
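grep command
When searching a file for email addresses with grep, I doubt you would write a regular expression that runs to several lines; a rough one will do:
$ grep -E -o "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}" filename.txt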
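For comparison, the same kind of extraction can be sketched in Python with `re.findall`, which returns every non-overlapping match much as `grep -o` prints them. `EMAIL_IN_TEXT` and `extract_emails` are hypothetical names chosen for this example, and the dot before the top-level domain is escaped.

```python
import re

# Same character classes as the grep command, with the TLD dot escaped.
EMAIL_IN_TEXT = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}")


def extract_emails(text: str) -> list:
    """Return every non-overlapping match found in the text."""
    return EMAIL_IN_TEXT.findall(text)


if __name__ == "__main__":
    sample = "Contact alice@example.com or bob.smith@mail.example.org, not carol@localhost."
    print(extract_emails(sample))
```

Because the pattern is unanchored, it finds addresses inside longer text, but it can also cut a match short: a top-level domain longer than six letters is truncated after the sixth.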
SQL Server
SQL Server can match patterns too, although patindex and like use wildcard patterns rather than full regular expressions. This snippet looks like it came from a production system, so it also thoughtfully catches people who mistyped their address. Note that it selects the rows whose address looks invalid:
select email
from table_name
where patindex ('%[ &'',":;!+=/()<>]%', email) > 0 -- Invalid characters
or patindex ('[@.-_]%', email) > 0 -- Valid but cannot be starting character
or patindex ('%[@.-_]', email) > 0 -- Valid but cannot be ending character
or email not like '%@%.%' -- Must contain at least one @ and one .
or email like '%..%' -- Cannot have two periods in a row
or email like '%@%@%' -- Cannot have two @ anywhere
or email like '%.@%' or email like '%@.%' -- Cannot have @ and . next to each other
or email like '%.cm' or email like '%.co' -- Cameroon or Colombia? Typos.
or email like '%.or' or email like '%.ne' -- Missing last letter
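The rules in that query can also be read as a checklist. The sketch below restates them in Python so that each rule reports its own reason. `email_problems` and the constant names are invented for this example, the comparison of endings is made case-insensitive as an assumption, and `[@.-_]` is read as the four literal characters `@ . - _`, which is presumably what the query intends.

```python
import re

# Characters the query treats as invalid anywhere in the address.
INVALID_CHARS = re.compile(r"""[ &'",:;!+=/()<>]""")
# Assumption: [@.-_] in the query means these four literal characters.
EDGE_CHARS = tuple("@.-_")
# Endings the query flags as likely typos (.cm, .co) or truncations (.or, .ne).
SUSPICIOUS_ENDINGS = (".cm", ".co", ".or", ".ne")


def email_problems(email: str) -> list:
    """Return the reasons an address would be flagged; an empty list means no rule fired."""
    problems = []
    if INVALID_CHARS.search(email):
        problems.append("invalid character")
    if email.startswith(EDGE_CHARS):
        problems.append("bad starting character")
    if email.endswith(EDGE_CHARS):
        problems.append("bad ending character")
    if not re.search(r"@.*\.", email):
        problems.append("needs an @ followed by a period")
    if ".." in email:
        problems.append("two periods in a row")
    if email.count("@") > 1:
        problems.append("more than one @")
    if ".@" in email or "@." in email:
        problems.append("@ next to a period")
    if email.lower().endswith(SUSPICIOUS_ENDINGS):
        problems.append("suspicious ending, possibly a typo")
    return problems


if __name__ == "__main__":
    for candidate in ["alice@example.com", "bob@example.co", ".carol@example.com"]:
        print(candidate, email_problems(candidate))
```

Returning the list of reasons, rather than a single yes or no, keeps the spirit of the commented query: it tells you which rule an address tripped.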
Oracle PL/SQL
Isn't this one a bit lazy, especially after all those "complex" regular expressions?
SELECT email
FROM table_name
WHERE REGEXP_LIKE (email, '[A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z]{2,4}');
MySQL
Fine, it seems the last one is just as lazy:
SELECT * FROM `users` WHERE `email` NOT REGEXP '^[A-Z0-9._%-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}$';
So, do you have an email-validating regular expression of your own to share?
Permanent link to this article: http://www.linuxidc.com/Linux/2015-08/121155.htm