
The Most Complex Email Validation Regex Ever

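Validating an email address with a regular expression looks like a simple task, but validating every address that is actually standards-compliant can get very complicated. The rules for email addresses come from RFC 5322. The site emailregex.com collects email-validation regexes for many programming languages, and several of them are more complex than anything I had ever seen, even if I had heard such things existed. One wonders how badly email validation must have hurt the programmer behind that site! In practice, though, a production system rarely needs a regex this complex to be 99.99% correct. Considering both execution efficiency and test coverage, a simple version is usually enough: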
    /^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i
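To see what the simple version gives up, here is a minimal Python sketch. The function name is_probably_email is our own, and re.IGNORECASE stands in for the /i flag:

```python
import re

# The "simple version" above; re.IGNORECASE plays the role of the /i flag.
SIMPLE_RE = re.compile(r"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$", re.IGNORECASE)


def is_probably_email(address: str) -> bool:
    """Return True if the address matches the simple pattern."""
    return SIMPLE_RE.match(address) is not None
```

With this pattern, info@example.museum is rejected because {2,4} caps the top-level domain at four letters, while a..b@example.com is accepted even though RFC 5322 does not allow two consecutive dots in an unquoted local part. Those rare corner cases are the price of the "99.99%" approach.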
So let's take a look at these stricter, more complex regexes:

The general regex for validating email addresses (compliant with RFC 5322)

    (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
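One way to try the RFC 5322 pattern is from Python. The sketch below is our own port under a few assumptions: the pattern is split into adjacent string literals only so each branch can carry a comment, re.IGNORECASE is added because the pattern lists only lowercase letters, and re.fullmatch supplies the anchoring that the bare pattern lacks.

```python
import re

# The RFC 5322 pattern above, split into adjacent string literals (Python
# joins them) so each branch can carry a comment.
RFC5322_RE = re.compile(
    # local part, dot-atom form: runs of atext separated by single dots
    r"(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
    # ...or quoted-string form
    r'|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]'
    r'|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")'
    r"@"
    # domain: dot-separated labels that neither start nor end with "-"
    r"(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
    # ...or a bracketed IPv4 address / general address literal
    r"|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:"
    r"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]"
    r"|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])",
    re.IGNORECASE,
)


def is_rfc5322(address: str) -> bool:
    """fullmatch anchors the pattern to the whole string."""
    return RFC5322_RE.fullmatch(address) is not None
```

Two consequences are worth noticing: john..doe@example.com fails because an unquoted local part may not contain two dots in a row, while the quoted form "john..doe"@example.com passes; and the domain branch requires at least one dot, so user@example is rejected.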
Because languages differ in their regex support, syntax, and coverage, the regexes differ from language to language as well:

Python

This is a simple version:
    r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)"
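A minimal sketch of how the Python pattern might be used; the function name is_valid_email is hypothetical:

```python
import re

# The Python pattern above, with the escaped dot in place.
EMAIL_RE = re.compile(r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)")


def is_valid_email(address: str) -> bool:
    """Return True if the anchored pattern matches the address."""
    return EMAIL_RE.search(address) is not None
```

One Python detail: $ also matches just before a trailing newline, so "user@example.com\n" is accepted too. If that matters, strip the input first or replace $ with \Z.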
 

JavaScript

This one is a bit more complicated:
    /^[-a-z0-9~!$%^&*_=+}{'?]+(\.[-a-z0-9~!$%^&*_=+}{'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+)*\.(aero|arpa|biz|com|coop|edu|gov|info|int|mil|museum|name|net|org|pro|travel|mobi|[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i
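A quick Python port (our own, with re.IGNORECASE standing in for /i) makes the design of the JavaScript pattern easier to see: the top-level domain must be one of a fixed list of generic TLDs or exactly two letters, an IPv4 address may stand in for the domain, and an optional :port suffix is allowed.

```python
import re

# Python port of the JavaScript pattern above (re.IGNORECASE ~ /i).
JS_STYLE_RE = re.compile(
    r"^[-a-z0-9~!$%^&*_=+}{'?]+(\.[-a-z0-9~!$%^&*_=+}{'?]+)*"
    r"@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+)*\."
    # fixed TLD whitelist, or any two-letter (country-code style) TLD
    r"(aero|arpa|biz|com|coop|edu|gov|info|int|mil|museum|name|net|org|pro|travel|mobi|[a-z][a-z])"
    # ...or a dotted IPv4 address instead of a domain name
    r"|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))"
    # optional :port suffix
    r"(:[0-9]{1,5})?$",
    re.IGNORECASE,
)


def js_style_match(address: str) -> bool:
    """Return True if the ported pattern matches the address."""
    return JS_STYLE_RE.match(address) is not None
```

As a result, a newer generic TLD such as .dev is rejected, while user@example.com:25 is accepted even though a port number is not part of an email address.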
 

Swift

    [A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}
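The Swift pattern carries no ^ or $ anchors, so whether it validates a whole string or merely finds an address inside one depends on the API that applies it. The Python sketch below (function names are hypothetical) shows the difference with re.search versus re.fullmatch:

```python
import re

# The Swift pattern above; note that it has no ^ or $ anchors.
SWIFT_PATTERN = r"[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}"


def contains_email(text: str) -> bool:
    """Unanchored use: True if an address appears anywhere in the text."""
    return re.search(SWIFT_PATTERN, text) is not None


def is_email(text: str) -> bool:
    """Anchored use: True only if the whole text is an address."""
    return re.fullmatch(SWIFT_PATTERN, text) is not None
```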
 

PHP

PHP's version is even more complex, and it covers more cases:
    /^(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){255,})(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){65,}@)(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22))(?:\.(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-[a-z0-9]+)*\.){1,126}){1,}(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-[a-z0-9]+)*)|(?:\[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9][:\]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\]))$/iD
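The two negative lookaheads at the start of the PHP pattern enforce length limits: roughly, the whole address may not reach 255 characters and the part before @ may not reach 65. As ordinary code the same idea is much easier to read. The sketch below is a simplification (it counts plain characters and splits at the last @), and its names are hypothetical:

```python
MAX_ADDRESS_LENGTH = 254  # the first lookahead rejects 255 or more
MAX_LOCAL_LENGTH = 64     # the second rejects 65 or more before "@"


def within_length_limits(address: str) -> bool:
    """Approximate the two length lookaheads of the PHP pattern."""
    local, at, _domain = address.rpartition("@")
    if not at:
        # No "@" at all: the rest of the pattern would reject it anyway.
        return False
    return len(address) <= MAX_ADDRESS_LENGTH and len(local) <= MAX_LOCAL_LENGTH
```

The trailing D modifier on the PHP pattern also matters: it makes $ match only at the very end of the subject, so a trailing newline is not accepted.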
 

Perl / Ruby

Perl and Ruby are not impressed by the PHP version; they say they can be even stricter:
  1. (?:(?: )?[ ])*(?:(?:(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^" \]|\.|(?:(?: )?[ ]))*"(?:(?: )?[ ])*)(?:.(?:(?: )?[ ])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^" \]|\.|(?:(?: )?[ ]))*"(?:(?: )?[ ])*))*@(?:(?: )?[ ])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[] \]|\.)*](?:(?: )?[ ])*)(?:.(?:(?: )?[ ])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[] \]|\.)*](?:(?: )?[ ])*))*|(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^" \]|\.|(?:(?: )?[ ]))*"(?:(?: )?[ ])*)*<(?:(?: )?[ ])*(?:@(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[] \]|\.)*](?:(?: )?[ ])*)(?:.(?:(?: )?[ ])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[] \]|\.)*](?:(?: )?[ ])*))*(?:,@(?:(?: )?[ ])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[] \]|\.)*](?:(?: )?[ ])*)(?:.(?:(?: )?[ ])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[] \]|\.)*](?:(?: )?[ ])*))*)*:(?:(?: )?[ ])*)?(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^" \]|\.|(?:(?: )?[ ]))*"(?:(?: )?[ ])*)(?:.(?:(?: )?[ ])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^" \]|\.|(?:(?: )?[ ]))*"(?:(?: )?[ ])*))*@(?:(?: )?[ ])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[] \]|\.)*](?:(?: )?[ ])*)(?:.(?:(?: )?[ ])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[] \]|\.)*](?:(?: )?[ ])*))*>(?:(?: )?[ ])*)|(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^" \]|\.|(?:(?: )?[ ]))*"(?:(?: )?[ ])*)*:(?:(?: )?[ ])*(?:(?:(?:[^()<>@,;:\".[]00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^" \]|\.|(?:(?: )?[ ]))*"(?:(?: )?[ ])*)(?:.(?:(?: )?[ ])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^" \]|\.|(?:(?: )?[ ]))*"(?:(?: )?[ ])*))*@(?:(?: )?[ 
])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[] \]|\.)*](?:(?: )?[ ])*)(?:.(?:(?: )?[ ])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[] \]|\.)*](?:(?: )?[ ])*))*|(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^" \]|\.|(?:(?: )?[ ]))*"(?:(?: )?[ ])*)*<(?:(?: )?[ ])*(?:@(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[] \]|\.)*](?:(?: )?[ ])*)(?:.(?:(?: )?[ ])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[] \]|\.)*](?:(?: )?[ ])*))*(?:,@(?:(?: )?[ ])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[] \]|\.)*](?:(?: )?[ ])*)(?:.(?:(?: )?[ ])*(?:[^()<>@,;:\".[]00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[] \]|\.)*](?:(?: )?[ ])*))*)*:(?:(?: )?[ ])*)?(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^" \]|\.|(?:(?: )?[ ]))*"(?:(?: )?[ ])*)(?:.(?:(?: )?[ ])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^" \]|\.|(?:(?: )?[ ]))*"(?:(?: )?[ ])*))*@(?:(?: )?[ ])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[] \]|\.)*](?:(?: )?[ ])*)(?:.(?:(?: )?[ ])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[] \]|\.)*](?:(?: )?[ ])*))*>(?:(?: )?[ ])*)(?:,s*(?:(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^" \]|\.|(?:(?: )?[ ]))*"(?:(?: )?[ ])*)(?:.(?:(?: )?[ ])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^" \]|\.|(?:(?: )?[ ]))*"(?:(?: )?[ ])*))*@(?:(?: )?[ ])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[] \]|\.)*](?:(?: )?[ ])*)(?:.(?:(?: )?[ ])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[] \]|\.)*](?:(?: )?[ ])*))*|(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^" \]|\.|(?:(?: )?[ ]))*"(?:(?: )?[ ])*)*<(?:(?: )?[ ])*(?:@(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ 
])+||(?=[["()<>@,;:\".[]]))|[([^[] \]|\.)*](?:(?: )?[ ])*)(?:.(?:(?: )?[ ])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[] \]|\.)*](?:(?: )?[ ])*))*(?:,@(?:(?: )?[ ])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[] \]|\.)*](?:(?: )?[ ])*)(?:.(?:(?: )?[ ])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[] \]|\.)*](?:(?: )?[ ])*))*)*:(?:(?: )?[ ])*)?(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^" \]|\.|(?:(?: )?[ ]))*"(?:(?: )?[ ])*)(?:.(?:(?: )?[ ])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|"(?:[^" \]|\.|(?:(?: )?[ ]))*"(?:(?: )?[ ])*))*@(?:(?: )?[ ])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[] \]|\.)*](?:(?: )?[ ])*)(?:.(?:(?: )?[ ])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?: )?[ ])+||(?=[["()<>@,;:\".[]]))|[([^[] \]|\.)*](?:(?: )?[ ])*))*>(?:(?: )?[ ])*))*)?;s
 

Perl 5.10 and later

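The version above... hmm, may I call it gibberish? I certainly have no intention of deciphering it. Of course, newer versions of Perl also offer a more readable version (are you serious?):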
    /(?(DEFINE)
      (?<address>(?&mailbox)|(?&group))
      (?<mailbox>(?&name_addr)|(?&addr_spec))
      (?<name_addr>(?&display_name)?(?&angle_addr))
      (?<angle_addr>(?&CFWS)?<(?&addr_spec)>(?&CFWS)?)
      (?<group>(?&display_name):(?:(?&mailbox_list)|(?&CFWS))?;
        (?&CFWS)?)
      (?<display_name>(?&phrase))
      (?<mailbox_list>(?&mailbox)(?:,(?&mailbox))*)
      (?<addr_spec>(?&local_part) \@ (?&domain))
      (?<local_part>(?&dot_atom)|(?&quoted_string))
      (?<domain>(?&dot_atom)|(?&domain_literal))
      (?<domain_literal>(?&CFWS)? \[ (?:(?&FWS)?(?&dcontent))*(?&FWS)?
        \] (?&CFWS)?)
      (?<dcontent>(?&dtext)|(?&quoted_pair))
      (?<dtext>(?&NO_WS_CTL)|[\x21-\x5a\x5e-\x7e])
      (?<atext>(?&ALPHA)|(?&DIGIT)|[!#\$%&'*+-/=?^_`{|}~])
      (?<atom> (?&CFWS)? (?&atext)+ (?&CFWS)?)
      (?<dot_atom> (?&CFWS)? (?&dot_atom_text) (?&CFWS)?)
      (?<dot_atom_text> (?&atext)+ (?: \. (?&atext)+)*)
      (?<text> [\x01-\x09\x0b\x0c\x0e-\x7f])
      (?<quoted_pair> \\ (?&text))
      (?<qtext> (?&NO_WS_CTL) | [\x21\x23-\x5b\x5d-\x7e])
      (?<qcontent> (?&qtext) | (?&quoted_pair))
      (?<quoted_string> (?&CFWS)? (?&DQUOTE) (?:(?&FWS)? (?&qcontent))*
        (?&FWS)? (?&DQUOTE) (?&CFWS)?)
      (?<word> (?&atom) | (?&quoted_string))
      (?<phrase> (?&word)+)
      # Folding white space
      (?<FWS> (?: (?&WSP)* (?&CRLF))? (?&WSP)+)
      (?<ctext> (?&NO_WS_CTL) | [\x21-\x27\x2a-\x5b\x5d-\x7e])
      (?<ccontent> (?&ctext) | (?&quoted_pair) | (?&comment))
      (?<comment> \( (?: (?&FWS)? (?&ccontent))* (?&FWS)? \) )
      (?<CFWS> (?: (?&FWS)? (?&comment))*
        (?: (?:(?&FWS)? (?&comment)) | (?&FWS)))
      # No whitespace control
      (?<NO_WS_CTL> [\x01-\x08\x0b\x0c\x0e-\x1f\x7f])
      (?<ALPHA> [A-Za-z])
      (?<DIGIT> [0-9])
      (?<CRLF> \x0d \x0a)
      (?<DQUOTE> ")
      (?<WSP> [\x20\x09])
    )
    (?&address)/x
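Python's re module has no (?(DEFINE)...) block or (?&name) subroutine calls, but the idea of building the pattern out of named grammar rules still carries over by splicing strings together. The sketch below is a heavily reduced, hypothetical version: it keeps only atext, dot_atom_text, quoted_string and addr_spec, and drops comments, folding whitespace and domain literals. In ATEXT the hyphen is escaped so that "+", "-" and "/" are listed individually, as in RFC 5322's atext.

```python
import re

# Named rules as plain strings; later rules are spliced together from earlier ones.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
DOT_ATOM_TEXT = rf"{ATEXT}+(?:\.{ATEXT}+)*"
QTEXT = r"[\x21\x23-\x5b\x5d-\x7e]"              # printable ASCII except " and \
QUOTED_PAIR = r"\\[\x01-\x09\x0b\x0c\x0e-\x7f]"  # backslash-escaped character
QUOTED_STRING = rf'"(?:{QTEXT}|{QUOTED_PAIR})*"'
LOCAL_PART = rf"(?:{DOT_ATOM_TEXT}|{QUOTED_STRING})"
DOMAIN = DOT_ATOM_TEXT                           # domain literals omitted
ADDR_SPEC = re.compile(rf"{LOCAL_PART}@{DOMAIN}")
```

Because each rule is an ordinary variable, it can be tested or reused on its own, which is the main readability win of the Perl 5.10 version. What string splicing cannot express is recursion, such as the nested comment rule, which needs real subroutine calls.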
 

Ruby (simple version)

Ruby says it actually has a simple version too: /\A([\w+\-].?)+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i

.NET

"Who doesn't have a version like this?" says .NET:
    ^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$
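How far \w reaches depends on the regex engine and its options. As an illustration in Python (not .NET), the same pattern accepts non-ASCII word characters by default and rejects them under re.ASCII; the function names are hypothetical:

```python
import re

# The .NET-style pattern above, tried with Python's re module.
DOTNET_STYLE = r"^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$"


def matches_unicode(address: str) -> bool:
    # Default for str patterns: \w matches Unicode word characters.
    return re.match(DOTNET_STYLE, address) is not None


def matches_ascii(address: str) -> bool:
    # re.ASCII restricts \w to [A-Za-z0-9_].
    return re.match(DOTNET_STYLE, address, re.ASCII) is not None
```

The pattern also shows its intent clearly: separators such as "." or "'" are allowed only between runs of word characters, so first..last@example.com is rejected while o'brien@example.com is accepted.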
 

The grep command

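If you are using grep to find email addresses in a file, I doubt you will write a regex that runs for several lines; something that roughly does the job is enough: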
    $ grep -E -o "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}" filename.txt
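With -o, grep prints every match instead of whole lines, so the job is extraction rather than validation. A minimal Python equivalent, where extract_emails and the file-reading helper are our own names:

```python
import re

# Same pattern as the grep command above.
EMAIL_IN_TEXT = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}")


def extract_emails(text: str) -> list:
    """Return every non-overlapping match, like grep -o."""
    return EMAIL_IN_TEXT.findall(text)


def extract_emails_from_file(path: str) -> list:
    """Hypothetical helper: scan a text file line by line."""
    found = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            found.extend(extract_emails(line))
    return found
```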

SQL Server 

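SQL Server can do this too, although strictly speaking the snippet below uses LIKE and PATINDEX wildcard patterns rather than full regular expressions. It looks like it was taken from a production environment, because it even considerately catches people who mistyped their email addresses: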
    select email
    from table_name where
    patindex ('%[ &'',":;!+=/()<>]%', email) > 0  -- Invalid characters
    or patindex ('[-@._]%', email) > 0  -- Valid but cannot be the starting character ('-' first, so it is not read as a range)
    or patindex ('%[-@._]', email) > 0  -- Valid but cannot be the ending character
    or email not like '%@%.%'  -- Must contain at least one @ and one .
    or email like '%..%'  -- Cannot have two periods in a row
    or email like '%@%@%'  -- Cannot have two @ anywhere
    or email like '%.@%' or email like '%@.%'  -- Cannot have @ and . next to each other
    or email like '%.cm' or email like '%.co'  -- Cameroon or Colombia? Typos.
    or email like '%.or' or email like '%.ne'  -- Missing last letter
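The SQL Server checks are really a list of independent rules, each flagging one kind of mistake. Rewritten as a hypothetical Python function (the rule names are ours), the structure looks like this; as in the SQL, "+" counts as an invalid character here:

```python
import re

# Each rule returns True when the address looks wrong, mirroring one
# condition of the WHERE clause above.
RULES = [
    ("invalid character", lambda e: re.search(r"[ &',\":;!+=/()<>]", e) is not None),
    ("bad first character", lambda e: e[:1] in {"@", ".", "-", "_"}),
    ("bad last character", lambda e: e[-1:] in {"@", ".", "-", "_"}),
    ("missing @ followed by .", lambda e: re.search(r"@.*\.", e) is None),
    ("two periods in a row", lambda e: ".." in e),
    ("more than one @", lambda e: e.count("@") > 1),
    ("@ next to .", lambda e: ".@" in e or "@." in e),
    ("likely TLD typo", lambda e: e.endswith((".cm", ".co", ".or", ".ne"))),
]


def problems(email: str) -> list:
    """Return the names of all rules the address breaks."""
    return [name for name, is_bad in RULES if is_bad(email)]
```

Keeping the rules separate has the same advantage as in SQL: each check is trivial to read, and the result says why an address was rejected instead of just that it failed.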
 

Oracle PL/SQL

Isn't this one a bit lazy, especially after all those "complex" regexes?
    SELECT email
    FROM table_name
    WHERE REGEXP_LIKE (email, '[A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z]{2,4}');
 

MySQL

Well, it seems the last one is just as lazy:
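    -- In a MySQL string literal (unless NO_BACKSLASH_ESCAPES is set), "\\." reaches the regex engine as "\."
    SELECT * FROM `users` WHERE `email` NOT REGEXP '^[A-Z0-9._%-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}$';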
So, do you have a regex for validating email addresses that you would like to share?