Java Regular Expression Tutorial

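This tutorial aims to help you master Java regular expressions, and also to help me review them.
What is a regular expression?
A regular expression defines a pattern for strings. Regular expressions can be used to search, edit, or otherwise process text. They are not tied to any one language, but each language's flavor differs in small ways. Java's regular expressions are most similar to Perl's.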
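The Java regular expression classes live in the java.util.regex package, which contains three classes: Pattern, Matcher, and PatternSyntaxException.
A Pattern object is the compiled version of a regular expression. It has no public constructor; we create a Pattern object by passing a regular expression to the public static method compile.
A Matcher is the regex engine object that matches an input string against a compiled Pattern. This class has no public constructor either; we obtain a Matcher by calling the Pattern object's matcher method with the input string as its argument. We can then call matches, which returns a boolean telling us whether the entire input string matches the regular expression.
If the regular expression's syntax is invalid, Pattern.compile throws a PatternSyntaxException.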
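As a minimal sketch of the three classes working together, the snippet below compiles a pattern, tests an input with matches(), and catches the exception raised by a deliberately malformed pattern. The class name BasicRegexDemo, the helper method names, and the sample strings are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class BasicRegexDemo {

    // Returns true when the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);  // compile the regex
        Matcher matcher = pattern.matcher(input);  // bind it to an input string
        return matcher.matches();                  // whole-input match
    }

    // Returns true when the regex compiles, false when its syntax is invalid.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole(".xx.", "MxxY")); // true
        System.out.println(matchesWhole(".xx.", "Mxx"));  // false: nothing left for the last '.'
        System.out.println(isValidRegex("*xx*"));         // false: the leading '*' has nothing to repeat
    }
}
```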
A complete program that exercises these classes appears further down, after the main Pattern and Matcher methods have been introduced.

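Because regular expressions always work on strings, Java 1.4 extended the String class with a matches method that tests a string against a pattern. Internally it uses the Pattern and Matcher classes to do the work, but it clearly reduces the amount of code you have to write.
The Pattern class also has a static matches method, which takes a regular expression and an input string as arguments and returns a boolean result.
The following code matches an input string against a regular expression in both ways.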

    String str = "bbb";
    System.out.println("Using String matches method: " + str.matches(".bb"));
    System.out.println("Using Pattern matches method: " + Pattern.matches(".bb", str));

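So if all you need is to check whether an input string matches a pattern, you can save time by calling String's matches method. You only need the Pattern and Matcher classes when you want to manipulate the input string or reuse a compiled pattern.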
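To illustrate the reuse case, here is a small sketch that compiles a pattern once and applies it to several inputs, next to the one-off String.matches call. The class name ReusePatternDemo, the constant name, and the input array are hypothetical.

```java
import java.util.regex.Pattern;

public class ReusePatternDemo {

    // Compiled once and reused for every input (example pattern from the text above).
    private static final Pattern ANY_CHAR_THEN_BB = Pattern.compile(".bb");

    // Counts how many inputs match the precompiled pattern as a whole.
    static int countMatching(String[] inputs) {
        int count = 0;
        for (String input : inputs) {
            if (ANY_CHAR_THEN_BB.matcher(input).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] inputs = {"bbb", "abb", "abc", "bb"};
        System.out.println(countMatching(inputs));   // 2 ("bbb" and "abb")
        System.out.println("bbb".matches(".bb"));    // true, one-off check
    }
}
```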
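Note that a pattern is applied to the input from left to right, and once a source character has been consumed by a match, it cannot be reused in another match.
For example, the regex "121" matches the string "31212142121" only twice, like this: "_121____121".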
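The left-to-right, non-overlapping behavior can be checked with repeated calls to Matcher.find(). The sketch below counts the matches and reproduces the masked string from the example. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonOverlappingDemo {

    // Counts matches found by repeated find() calls, scanning left to right.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    // Replaces every character that is not part of a match with '_'.
    static String maskUnmatched(String regex, String input) {
        char[] masked = new char[input.length()];
        Arrays.fill(masked, '_');
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            for (int i = matcher.start(); i < matcher.end(); i++) {
                masked[i] = input.charAt(i);
            }
        }
        return new String(masked);
    }

    public static void main(String[] args) {
        System.out.println(countMatches("121", "31212142121"));  // 2
        System.out.println(maskUnmatched("121", "31212142121")); // _121____121
        System.out.println(countMatches("121", "12121"));        // 1: the middle '1' is not reused
    }
}
```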
Common regular expression matching symbols

Java regular expression metacharacters

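There are two ways to use a metacharacter as an ordinary character in a regular expression:
Precede the metacharacter with a backslash (\).
Place the metacharacter between \Q (start of quote) and \E (end of quote).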
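Both escaping styles can be sketched as follows. Keep in mind that inside a Java string literal the backslash itself has to be doubled, so the regex \. is written "\\.". The class name EscapeDemo, the helper name, and the sample strings are assumptions for illustration.

```java
import java.util.regex.Pattern;

public class EscapeDemo {

    // Thin wrapper around Pattern.matches so the calls below read compactly.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // An unescaped '.' is a metacharacter that matches any character.
        System.out.println(matches("a.b", "axb"));            // true
        // A backslash-escaped '.' matches only a literal dot.
        System.out.println(matches("a\\.b", "axb"));          // false
        System.out.println(matches("a\\.b", "a.b"));          // true
        // Everything between \Q and \E is taken literally, including '+'.
        System.out.println(matches("\\Q1+1=2\\E", "1+1=2"));  // true
        // Without quoting, '+' is a quantifier, so the literal text does not match.
        System.out.println(matches("1+1=2", "1+1=2"));        // false
    }
}
```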
Regular expression quantifiers
A quantifier specifies how many times a character may occur in a match.

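Quantifiers can be used with character classes and capturing groups.
For example, [abc]+ means a, b, or c occurring one or more times.
(abc)+ means the capturing group "abc" occurring one or more times. We will discuss capturing groups next.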
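The difference between quantifying a character class and quantifying a group shows up in a quick check like the one below. The class name QuantifierDemo, the helper name, and the input strings are made up for illustration.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {

    // Whole-input match of the given regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // [abc]+ : any mix of a, b, c, at least one character.
        System.out.println(matches("[abc]+", "cabba"));   // true
        System.out.println(matches("[abc]+", ""));        // false: '+' needs at least one occurrence
        // (abc)+ : the exact sequence "abc", repeated one or more times.
        System.out.println(matches("(abc)+", "abcabc"));  // true
        System.out.println(matches("(abc)+", "cabba"));   // false: not a repetition of "abc"
        System.out.println(matches("(abc)+", "abcab"));   // false: the last repetition is incomplete
    }
}
```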
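Regular expression capturing groups
A capturing group treats multiple characters as a single unit. You create a group by enclosing part of the pattern in parentheses (). The portion of the input string that matches a capturing group is saved in memory and can be recalled with a backreference.
You can use the matcher.groupCount method to find the number of capturing groups in a regex pattern. For example, ((a)(bc)) contains 3 capturing groups: ((a)(bc)), (a), and (bc).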
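A short sketch of groupCount() on the pattern from the example follows. Groups are numbered by their opening parenthesis from left to right, and group 0, which stands for the whole match, is not included in the count. The class name GroupCountDemo and the helper method are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {

    // Returns the text captured by groups 1..groupCount(), or an empty array if there is no match.
    static String[] capturedGroups(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.matches()) {
            return new String[0];
        }
        String[] groups = new String[matcher.groupCount()];
        for (int i = 1; i <= matcher.groupCount(); i++) {
            groups[i - 1] = matcher.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        Matcher matcher = Pattern.compile("((a)(bc))").matcher("abc");
        System.out.println(matcher.groupCount());  // 3
        for (String group : capturedGroups("((a)(bc))", "abc")) {
            System.out.println(group);             // abc, then a, then bc
        }
    }
}
```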
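You can use a backreference inside a regular expression by writing a backslash (\) followed by the number of the group to recall.
Capturing groups and backreferences can be confusing, so let's work through an example.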

    // In a Java string literal each regex backslash is doubled: \w becomes "\\w".
    System.out.println(Pattern.matches("(\\w\\d)\\1", "a2a2"));             // true
    System.out.println(Pattern.matches("(\\w\\d)\\1", "a2b2"));             // false
    System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B2AB"));    // true
    System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B3AB"));    // false

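In the first example, the first capturing group at run time is (\w\d), which captures "a2" when matched against the input string "a2a2" and saves it in memory. So \1 refers to "a2", and the call returns true. For the same reason, the second statement prints false: \1 still requires "a2", but the input continues with "b2".
Try to work out the third and fourth statements yourself. :)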
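Now let's look at some of the important methods of the Pattern and Matcher classes.
We can create a Pattern object with flags. For example, Pattern.CASE_INSENSITIVE enables case-insensitive matching. The Pattern class also provides a split(CharSequence) method similar to the split(String) method of the String class.
The Pattern class's toString() method returns the regular expression string from which the pattern was compiled.
The Matcher class has start() and end() index methods that show exactly where in the input string a match was found.
The Matcher class also provides the string manipulation methods replaceAll(String replacement) and replaceFirst(String replacement).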
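As a small aside on toString() and flags, the sketch below inspects a compiled Pattern. Pattern.flags() is not mentioned in the text above and is used here only to read back the flags passed to compile. The class name PatternInfoDemo and the describe helper are made up for illustration.

```java
import java.util.regex.Pattern;

public class PatternInfoDemo {

    // Builds a short description from the pattern's source string and its flags.
    static String describe(Pattern pattern) {
        boolean ignoreCase = (pattern.flags() & Pattern.CASE_INSENSITIVE) != 0;
        return pattern.toString() + " (case-insensitive: " + ignoreCase + ")";
    }

    public static void main(String[] args) {
        Pattern withFlag = Pattern.compile("ab", Pattern.CASE_INSENSITIVE);
        System.out.println(withFlag.toString());                // ab
        System.out.println(describe(withFlag));                 // ab (case-insensitive: true)
        System.out.println(describe(Pattern.compile("\\W")));   // \W (case-insensitive: false)
    }
}
```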
Now let's see how these methods are used in a simple Java class.

    package com.journaldev.util;

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexExamples {

        public static void main(String[] args) {
            // using pattern with flags
            Pattern pattern = Pattern.compile("ab", Pattern.CASE_INSENSITIVE);
            Matcher matcher = pattern.matcher("ABcabdAb");

            // using Matcher find(), group(), start() and end() methods
            while (matcher.find()) {
                System.out.println("Found the text \"" + matcher.group()
                        + "\" starting at " + matcher.start()
                        + " index and ending at index " + matcher.end());
            }

            // using Pattern split() method; "\\W" is the regex \W (any non-word character)
            pattern = Pattern.compile("\\W");
            String[] words = pattern.split("one@two#three:four$five");
            for (String s : words) {
                System.out.println("Split using Pattern.split(): " + s);
            }

            // using Matcher.replaceFirst() and replaceAll() methods
            pattern = Pattern.compile("1*2");
            matcher = pattern.matcher("11234512678");
            System.out.println("Using replaceAll: " + matcher.replaceAll("_"));
            System.out.println("Using replaceFirst: " + matcher.replaceFirst("_"));
        }
    }

Output of the program above:

    Found the text "AB" starting at 0 index and ending at index 2
    Found the text "ab" starting at 3 index and ending at index 5
    Found the text "Ab" starting at 6 index and ending at index 8
    Split using Pattern.split(): one
    Split using Pattern.split(): two
    Split using Pattern.split(): three
    Split using Pattern.split(): four
    Split using Pattern.split(): five
    Using replaceAll: _345_678
    Using replaceFirst: _34512678

That covers the essentials of Java regular expressions. I hope this tutorial helps with your learning.