Extracting code listings

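2007-05-29 yycnet.yeah.net (Chinese translation by yyc)
You have no doubt noticed that every complete code listing in this book (as opposed to a code fragment) starts and ends with special comment markers: "//:" and "///:~". This meta-information is there so that the code can be extracted automatically from the book into compilable source files. In my previous book I had a system that automatically merged tested code files into the book. For this book, I found it easier to paste the code into the book once it had passed its initial tests and, since code is rarely right the first time, to edit it inside the book. But how do you then extract the code and test it? This program is the key. It may also be useful if you ever need to solve a text-processing problem, and it demonstrates many features of the String class.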
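I first save the entire book as a single file in ASCII text format. The CodePackager program has two modes, both described in usageString. With the -p flag, it expects an input file containing ASCII text (the contents of the book). It walks through this file, uses the comment markers to extract the code, and takes the file name on the first line of each listing as the name of the file to create. It also looks for a package statement in case the file must go into a particular directory, which is chosen from the path that the package statement implies.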
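That is not all. The program also tracks package names to detect chapter changes. Every package used in a chapter begins with c02, c03, c04, and so on, to mark the chapter it belongs to (packages beginning with com are the exception and are ignored for chapter tracking). As long as the first listing in each chapter contains a package statement carrying the chapter number, CodePackager can tell when the chapter changes and put the subsequent files into the new chapter's subdirectory.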
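The chapter-tracking rule can be sketched in isolation. The class and method names here (ChapterTracker, update) are hypothetical and do not appear in the listing, and the package names in main() are made-up sample input. The logic follows the description above: a package starting with com leaves the chapter unchanged, otherwise the text before the first dot becomes the current chapter.

```java
// Hypothetical helper, not part of CodePackager: tracks the current
// chapter directory from the package names seen so far.
class ChapterTracker {
  // The listing starts at c02, the first chapter with code.
  private String chapter = "c02";

  // Returns the chapter directory in effect after seeing packageName.
  String update(String packageName) {
    if (!packageName.startsWith("com")) {
      int firstDot = packageName.indexOf('.');
      chapter = (firstDot != -1)
        ? packageName.substring(0, firstDot)
        : packageName;
    }
    return chapter;
  }

  public static void main(String[] args) {
    ChapterTracker t = new ChapterTracker();
    // Sample package names, invented for illustration:
    System.out.println(t.update("c05.tools"));        // c05
    System.out.println(t.update("com.example.util")); // c05 (ignored)
    System.out.println(t.update("c07"));              // c07
  }
}
```

Keeping the state in one small object makes the rule easy to test. The listing keeps the same state in a static field of SourceCodeFile instead.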
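As each file is extracted, it is placed into a SourceCodeFile object, which in turn is placed into a collection (this process is described in detail later). These SourceCodeFile objects could simply be written out as files, but that brings us to the second use of this project. If you invoke CodePackager without the -p flag, it expects a "packed" file as input and extracts it into separate files. So the -p flag means that the extracted files are to be "packed" into a single file.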
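Why bother with a packed file? Because different computer platforms store text in files in different ways. The biggest issue is how the end of a line is represented, though other issues can exist as well. Java, however, has a special type of IO stream, DataOutputStream, which promises that data written on any machine can be read back correctly on any other machine with a DataInputStream. In other words, Java takes care of the platform-specific details, which is a large part of its appeal. So the -p flag stores everything in a single file in a universal format. Users download this file and the Java program from the Web, then run CodePackager on the file without the -p flag, and the files are extracted to the right places on their system. (An alternate subdirectory can be specified; otherwise the subdirectories are created under the current directory.) To make sure no platform-specific formats remain, File objects are used wherever a file or path is described. There is also a sanity check: an empty file is placed in each subdirectory, and its name tells you how many files you should find there.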
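The end-of-line problem is easy to see in a small experiment. This is a minimal sketch, not taken from the listing; the class name LineEndings and the method normalize are hypothetical. It relies on BufferedReader.readLine() accepting "\n", "\r", or "\r\n" as a line terminator, so text from any platform can be read line by line and written back out with whatever separator the local system wants. The unpacking constructor in the listing does the same thing when it appends eol to each line it reads.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of CodePackager.
class LineEndings {
  // Re-terminates every line of text with the given separator.
  static String normalize(String text, String eol) {
    BufferedReader in = new BufferedReader(new StringReader(text));
    StringBuilder out = new StringBuilder();
    try {
      String line;
      // readLine() strips "\n", "\r", or "\r\n" from each line:
      while ((line = in.readLine()) != null)
        out.append(line).append(eol);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toString();
  }

  public static void main(String[] args) {
    String mixed = "a\r\nb\rc\n"; // three different line endings
    String local = System.getProperty("line.separator");
    System.out.print(normalize(mixed, local));
  }
}
```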
Here is the complete listing, followed by a detailed explanation:

//: CodePackager.java
// "Packs" and "unpacks" the code in "Thinking
// in Java" for cross-platform distribution.
/* Commented so CodePackager sees it and starts
   a new chapter directory, but so you don't
   have to worry about the directory where this
   program lives:
package c17;
*/
import java.util.*;
import java.io.*;

class Pr {
  static void error(String e) {
    System.err.println("ERROR: " + e);
    System.exit(1);
  }
}

class IO {
  static BufferedReader disOpen(File f) {
    BufferedReader in = null;
    try {
      in = new BufferedReader(
        new FileReader(f));
    } catch(IOException e) {
      Pr.error("could not open " + f);
    }
    return in;
  }
  static BufferedReader disOpen(String fname) {
    return disOpen(new File(fname));
  }
  static DataOutputStream dosOpen(File f) {
    DataOutputStream in = null;
    try {
      in = new DataOutputStream(
        new BufferedOutputStream(
          new FileOutputStream(f)));
    } catch(IOException e) {
      Pr.error("could not open " + f);
    }
    return in;
  }
  static DataOutputStream dosOpen(String fname) {
    return dosOpen(new File(fname));
  }
  static PrintWriter psOpen(File f) {
    PrintWriter in = null;
    try {
      in = new PrintWriter(
        new BufferedWriter(
          new FileWriter(f)));
    } catch(IOException e) {
      Pr.error("could not open " + f);
    }
    return in;
  }
  static PrintWriter psOpen(String fname) {
    return psOpen(new File(fname));
  }
  static void close(Writer os) {
    try {
      os.close();
    } catch(IOException e) {
      Pr.error("closing " + os);
    }
  }
  static void close(DataOutputStream os) {
    try {
      os.close();
    } catch(IOException e) {
      Pr.error("closing " + os);
    }
  }
  static void close(Reader os) {
    try {
      os.close();
    } catch(IOException e) {
      Pr.error("closing " + os);
    }
  }
}

class SourceCodeFile {
  public static final String
    startMarker = "//:", // Start of source file
    endMarker = "} ///:~", // End of source
    endMarker2 = "}; ///:~", // C++ file end
    beginContinue = "} ///:Continued",
    endContinue = "///:Continuing",
    packMarker = "###", // Packed file header tag
    eol = // Line separator on current system
      System.getProperty("line.separator"),
    filesep = // System's file path separator
      System.getProperty("file.separator");
  public static String copyright = "";
  static {
    try {
      BufferedReader cr =
        new BufferedReader(
          new FileReader("Copyright.txt"));
      String crin;
      while((crin = cr.readLine()) != null)
        copyright += crin + "\n";
      cr.close();
    } catch(Exception e) {
      copyright = "";
    }
  }
  private String filename, dirname,
    contents = new String();
  private static String chapter = "c02";
  // The file name separator from the old system:
  public static String oldsep;
  public String toString() {
    return dirname + filesep + filename;
  }
  // Constructor for parsing from document file:
  public SourceCodeFile(String firstLine,
      BufferedReader in) {
    dirname = chapter;
    // Skip past marker:
    filename = firstLine.substring(
        startMarker.length()).trim();
    // Find space that terminates file name:
    if(filename.indexOf(" ") != -1)
      filename = filename.substring(
          0, filename.indexOf(" "));
    System.out.println("found: " + filename);
    contents = firstLine + eol;
    if(copyright.length() != 0)
      contents += copyright + eol;
    String s;
    boolean foundEndMarker = false;
    try {
      while((s = in.readLine()) != null) {
        if(s.startsWith(startMarker))
          Pr.error("No end of file marker for " +
            filename);
        // For this program, no spaces before
        // the "package" keyword are allowed
        // in the input source code:
        else if(s.startsWith("package")) {
          // Extract package name:
          String pdir = s.substring(
            s.indexOf(" ")).trim();
          pdir = pdir.substring(
            0, pdir.indexOf(";")).trim();
          // Capture the chapter from the package
          // ignoring the "com" subdirectories:
          if(!pdir.startsWith("com")) {
            int firstDot = pdir.indexOf(".");
            if(firstDot != -1)
              chapter =
                pdir.substring(0,firstDot);
            else
              chapter = pdir;
          }
          // Convert package name to path name:
          pdir = pdir.replace(
            '.', filesep.charAt(0));
          System.out.println("package " + pdir);
          dirname = pdir;
        }
        contents += s + eol;
        // Move past continuations:
        if(s.startsWith(beginContinue))
          while((s = in.readLine()) != null)
            if(s.startsWith(endContinue)) {
              contents += s + eol;
              break;
            }
        // Watch for end of code listing:
        if(s.startsWith(endMarker) ||
           s.startsWith(endMarker2)) {
          foundEndMarker = true;
          break;
        }
      }
      if(!foundEndMarker)
        Pr.error(
          "End marker not found before EOF");
      System.out.println("Chapter: " + chapter);
    } catch(IOException e) {
      Pr.error("Error reading line");
    }
  }
  // For recovering from a packed file:
  public SourceCodeFile(BufferedReader pFile) {
    try {
      String s = pFile.readLine();
      if(s == null) return;
      if(!s.startsWith(packMarker))
        Pr.error("Can't find " + packMarker
          + " in " + s);
      s = s.substring(
        packMarker.length()).trim();
      dirname = s.substring(0, s.indexOf("#"));
      filename = s.substring(s.indexOf("#") + 1);
      dirname = dirname.replace(
        oldsep.charAt(0), filesep.charAt(0));
      filename = filename.replace(
        oldsep.charAt(0), filesep.charAt(0));
      System.out.println("listing: " + dirname
        + filesep + filename);
      while((s = pFile.readLine()) != null) {
        // Watch for end of code listing:
        if(s.startsWith(endMarker) ||
           s.startsWith(endMarker2)) {
          contents += s;
          break;
        }
        contents += s + eol;
      }
    } catch(IOException e) {
      System.err.println("Error reading line");
    }
  }
  public boolean hasFile() {
    return filename != null;
  }
  public String directory() { return dirname; }
  public String filename() { return filename; }
  public String contents() { return contents; }
  // To write to a packed file:
  public void writePacked(DataOutputStream out) {
    try {
      out.writeBytes(
        packMarker + dirname + "#"
        + filename + eol);
      out.writeBytes(contents);
    } catch(IOException e) {
      Pr.error("writing " + dirname +
        filesep + filename);
    }
  }
  // To generate the actual file:
  public void writeFile(String rootpath) {
    File path = new File(rootpath, dirname);
    path.mkdirs();
    PrintWriter p =
      IO.psOpen(new File(path, filename));
    p.print(contents);
    IO.close(p);
  }
}

class DirMap {
  private Hashtable t = new Hashtable();
  private String rootpath;
  DirMap() {
    rootpath = System.getProperty("user.dir");
  }
  DirMap(String alternateDir) {
    rootpath = alternateDir;
  }
  public void add(SourceCodeFile f){
    String path = f.directory();
    if(!t.containsKey(path))
      t.put(path, new Vector());
    ((Vector)t.get(path)).addElement(f);
  }
  public void writePackedFile(String fname) {
    DataOutputStream packed = IO.dosOpen(fname);
    try {
      packed.writeBytes("###Old Separator:" +
        SourceCodeFile.filesep + "###\n");
    } catch(IOException e) {
      Pr.error("Writing separator to " + fname);
    }
    Enumeration e = t.keys();
    while(e.hasMoreElements()) {
      String dir = (String)e.nextElement();
      System.out.println(
        "Writing directory " + dir);
      Vector v = (Vector)t.get(dir);
      for(int i = 0; i < v.size(); i++) {
        SourceCodeFile f =
          (SourceCodeFile)v.elementAt(i);
        f.writePacked(packed);
      }
    }
    IO.close(packed);
  }
  // Write all the files in their directories:
  public void write() {
    Enumeration e = t.keys();
    while(e.hasMoreElements()) {
      String dir = (String)e.nextElement();
      Vector v = (Vector)t.get(dir);
      for(int i = 0; i < v.size(); i++) {
        SourceCodeFile f =
          (SourceCodeFile)v.elementAt(i);
        f.writeFile(rootpath);
      }
      // Add file indicating file quantity
      // written to this directory as a check:
      IO.close(IO.dosOpen(
        new File(new File(rootpath, dir),
          Integer.toString(v.size())+".files")));
    }
  }
}

public class CodePackager {
  private static final String usageString =
  "usage: java CodePackager packedFileName" +
  "\nExtracts source code files from packed \n" +
  "version of Tjava.doc sources into " +
  "directories off current directory\n" +
  "java CodePackager packedFileName newDir\n" +
  "Extracts into directories off newDir\n" +
  "java CodePackager -p source.txt packedFile" +
  "\nCreates packed version of source files" +
  "\nfrom text version of Tjava.doc";
  private static void usage() {
    System.err.println(usageString);
    System.exit(1);
  }
  public static void main(String[] args) {
    if(args.length == 0) usage();
    if(args[0].equals("-p")) {
      if(args.length != 3)
        usage();
      createPackedFile(args);
    }
    else {
      if(args.length > 2)
        usage();
      extractPackedFile(args);
    }
  }
  private static String currentLine;
  private static BufferedReader in;
  private static DirMap dm;
  private static void
  createPackedFile(String[] args) {
    dm = new DirMap();
    in = IO.disOpen(args[1]);
    try {
      while((currentLine = in.readLine())
          != null) {
        if(currentLine.startsWith(
            SourceCodeFile.startMarker)) {
          dm.add(new SourceCodeFile(
                   currentLine, in));
        }
        else if(currentLine.startsWith(
            SourceCodeFile.endMarker))
          Pr.error("file has no start marker");
        // Else ignore the input line
      }
    } catch(IOException e) {
      Pr.error("Error reading " + args[1]);
    }
    IO.close(in);
    dm.writePackedFile(args[2]);
  }
  private static void
  extractPackedFile(String[] args) {
    if(args.length == 2) // Alternate directory
      dm = new DirMap(args[1]);
    else // Current directory
      dm = new DirMap();
    in = IO.disOpen(args[0]);
    String s = null;
    try {
      s = in.readLine();
    } catch(IOException e) {
      Pr.error("Cannot read from " + in);
    }
    // Capture the separator used in the system
    // that packed the file:
    if(s.indexOf("###Old Separator:") != -1 ) {
      String oldsep = s.substring(
        "###Old Separator:".length());
      oldsep = oldsep.substring(
        0, oldsep.indexOf("#"));
      SourceCodeFile.oldsep = oldsep;
    }
    SourceCodeFile sf = new SourceCodeFile(in);
    while(sf.hasFile()) {
      dm.add(sf);
      sf = new SourceCodeFile(in);
    }
    dm.write();
  }
} ///:~

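Notice that the package statement is commented out. Because this is the first program in the chapter, a package statement is needed to tell CodePackager that the chapter has changed, but actually putting the program in a package would be a problem. When you create a package, you tie the resulting program to a particular directory structure. That is fine for most of the examples in this book, but CodePackager must be compiled and run from any directory, so its package statement is commented out. To CodePackager it still looks like an ordinary package statement, because the program is not sophisticated enough to detect multi-line comments. (It has no need for that sophistication, and here the limitation is convenient.)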
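The first two classes are support/utility classes that make the rest of the program more consistent to write and easier to read. The first, Pr, is similar to the ANSI C library function perror: both print an error message, but Pr also exits the program. The second class encapsulates the creation of files, a process that Chapter 10 showed to become verbose and tedious very quickly. The solution proposed in Chapter 10 was to create new classes; here, static methods are used instead. Inside those methods the usual exceptions are caught and handled. These methods make the rest of the code much cleaner to read.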
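The first class that helps solve the actual problem is SourceCodeFile, which represents all the information for one source file in the book: its contents, file name, and directory. It also holds a set of String constants: the markers for the start and end of a file, a marker used inside the packed file, the current system's end-of-line separator, the file path separator (note the use of System.getProperty() to find the local version), and a long copyright notice, which is read from the following Copyright.txt file: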

//////////////////////////////////////////////////
// Copyright (c) Bruce Eckel, 1998
// Source code file from the book "Thinking in Java"
// All rights reserved EXCEPT as allowed by the
// following statements: You may freely use this file
// for your own work (personal or commercial),
// including modifications and distribution in
// executable form only. Permission is granted to use
// this file in classroom situations, including its
// use in presentation materials, as long as the book
// "Thinking in Java" is cited as the source.
// Except in classroom situations, you may not copy
// and distribute this code; instead, the sole
// distribution point is http://www.BruceEckel.com
// (and official mirror sites) where it is
// freely available. You may not remove this
// copyright and notice. You may not distribute
// modified versions of the source code in this
// package. You may not use this file in printed
// media without the express permission of the
// author. Bruce Eckel makes no representation about
// the suitability of this software for any purpose.
// It is provided "as is" without express or implied
// warranty of any kind, including any implied
// warranty of merchantability, fitness for a
// particular purpose or non-infringement. The entire
// risk as to the quality and performance of the
// software is with you. Bruce Eckel and the
// publisher shall not be liable for any damages
// suffered by you or any third party as a result of
// using or distributing software. In no event will
// Bruce Eckel or the publisher be liable for any
// lost revenue, profit, or data, or for direct,
// indirect, special, consequential, incidental, or
// punitive damages, however caused and regardless of
// the theory of liability, arising out of the use of
// or inability to use software, even if Bruce Eckel
// and the publisher have been advised of the
// possibility of such damages. Should the software
// prove defective, you assume the cost of all
// necessary servicing, repair, or correction. If you
// think you've found an error, please email all
// modified files with clearly commented changes to:
// Bruce@EckelObjects.com. (please use the same
// address for non-code errors found in the book).
//////////////////////////////////////////////////

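When files are extracted from a packed file, the file separator of the system that packed them is also recorded, so that it can be replaced with the one appropriate to the local system.
The subdirectory for the current chapter is kept in the chapter field, which is initialized to c02. (You may notice that the listing in Chapter 2 does not contain a package statement.) The chapter field changes only when a package statement is found in the current file.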

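1. Building a packed file
The first constructor extracts a file from the ASCII text version of the book. The calling code (further down in the listing) reads and examines each line until it finds one that matches the start of a listing. At that point it creates a new SourceCodeFile object, passing it the first line (which the caller has already read) and the BufferedReader from which to read the rest of the source listing.
From here on you will see heavy use of String methods. To extract the file name, the overloaded version of substring() is called that takes a starting offset and runs to the end of the String. The starting index is the length() of startMarker, and trim() then removes white space from both ends of the result. The first line may also have words after the file name; these are detected with indexOf(), which returns -1 if it cannot find what you are looking for and otherwise returns the position of the first occurrence. Note that this is the overloaded version of indexOf() that takes a String as its argument rather than a character.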
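The file-name extraction can be tried on its own. This is a minimal sketch that repeats the three String calls just described; the class name FileNameFromMarker and the method fileName are hypothetical and do not appear in the listing.

```java
// Hypothetical helper, not part of CodePackager: isolates the
// substring() / trim() / indexOf() steps described above.
class FileNameFromMarker {
  static final String START_MARKER = "//:";

  // Extracts the file name from the first line of a listing.
  static String fileName(String firstLine) {
    // Skip past the marker, then drop surrounding white space:
    String name =
      firstLine.substring(START_MARKER.length()).trim();
    // Anything after the first space is not part of the name:
    int space = name.indexOf(" ");
    return (space != -1) ? name.substring(0, space) : name;
  }

  public static void main(String[] args) {
    // prints "CodePackager.java"
    System.out.println(fileName("//: CodePackager.java"));
    // prints "Hello.java"
    System.out.println(fileName("//: Hello.java  extra words"));
  }
}
```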
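Once the file name has been parsed and stored, the first line is placed into the contents String, which holds the complete text of the source listing. The remaining lines are then read and concatenated onto contents. It is not quite that simple, because certain situations need special handling. One is error checking: running into a startMarker means the listing currently being collected has no end marker. This is an error condition that aborts the program.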
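The other special case involves the package keyword. Although Java is a free-form language, this program requires the package keyword to be at the start of the line. When the package keyword is seen, the package name is extracted by looking for the space after the keyword and the semicolon at the end. (This could also be done in a single operation with the overloaded substring() that takes both a starting and an ending index.) The dots in the package name are then replaced with the file separator, on the assumption that the file separator is only one character long. That is probably true on all current systems, but it is the place to look if problems arise.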
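The single-operation variant mentioned in the parenthesis might look like the following sketch. The class name PackageToPath and the method toPath are hypothetical, and returning null for lines that do not parse is an assumption made for this example. The listing itself does no such checking.

```java
// Hypothetical helper, not part of CodePackager: converts a package
// statement to a directory path using the two-index substring().
class PackageToPath {
  // Returns the directory for a line such as "package c05.tools;",
  // or null if the line is not a package statement at line start.
  static String toPath(String line, char separator) {
    if (!line.startsWith("package")) return null;
    int start = line.indexOf(' ');
    int end = line.indexOf(';');
    if (start == -1 || end == -1 || end < start) return null;
    // One substring() call takes both the start and end index:
    String pkg = line.substring(start, end).trim();
    // Like the listing, this assumes a one-character separator:
    return pkg.replace('.', separator);
  }

  public static void main(String[] args) {
    // prints "c05/tools"
    System.out.println(toPath("package c05.tools;", '/'));
    // prints "null": the keyword is not at the start of the line
    System.out.println(toPath("  package c05;", '/'));
  }
}
```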
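The default behavior is to concatenate each line onto contents, together with the end-of-line string, until an endMarker is found, which tells the constructor to stop. If the end of the file is reached before an endMarker, that is treated as an error.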
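The collection loop, reduced to its essentials, can be sketched as follows. This is a simplification under stated assumptions, not the listing's code: the name ListingCollector is hypothetical, lines are joined with "\n" rather than the system eol, the C++ end marker and the continuation markers are left out, and the two error conditions throw IllegalStateException where the listing calls Pr.error() and exits.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of CodePackager.
class ListingCollector {
  static final String START = "//:", END = "} ///:~";

  // Collects lines up to and including the end marker.
  static String collect(String firstLine, BufferedReader in) {
    StringBuilder contents =
      new StringBuilder(firstLine).append('\n');
    try {
      String s;
      while ((s = in.readLine()) != null) {
        // A new start marker means the end marker was missing:
        if (s.startsWith(START))
          throw new IllegalStateException(
            "no end marker before next listing");
        contents.append(s).append('\n');
        if (s.startsWith(END)) return contents.toString();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    throw new IllegalStateException(
      "end marker not found before EOF");
  }

  public static void main(String[] args) {
    String text = "public class A {\n} ///:~\nsome prose\n";
    BufferedReader in =
      new BufferedReader(new StringReader(text));
    System.out.print(collect("//: A.java", in));
  }
}
```

Because the reader is shared with the caller, it is left positioned just after the end marker, which is what lets the calling loop continue scanning the prose that follows.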

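2. Extracting from a packed file
The second constructor recovers source files from a packed file. Here the calling method does not have to worry about skipping intermediate text, because the packed file contains all the source files placed end to end. All the constructor needs is the BufferedReader that supplies the information, and it takes what it needs from there. There is, however, some meta-information at the start of each listing, identified by the packMarker. If the packMarker is missing, the caller is trying to use this constructor where it does not apply.
Once the packMarker is found, it is stripped off, and the directory name (terminated by a "#") and the file name (which runs to the end of the line) are extracted. In both, the old separator is replaced with the one for the local machine, using the String replace() method. The old separator is stored at the start of the packed file; how it is extracted is shown later in the listing.
The rest of the constructor is simple. It reads each line and concatenates it onto contents until the endMarker is found.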
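The header parsing can be sketched as a small function. The names PackedHeader and parse are hypothetical, and throwing IllegalArgumentException on a malformed line is an assumption for this example, where the listing reports a missing marker through Pr.error(). The sample header in main() is made up.

```java
// Hypothetical helper, not part of CodePackager: parses a
// per-listing header line of the form "###dir#file".
class PackedHeader {
  static final String PACK_MARKER = "###";

  // Returns {directory, fileName} with oldSep replaced by localSep.
  static String[] parse(String line, char oldSep, char localSep) {
    if (!line.startsWith(PACK_MARKER))
      throw new IllegalArgumentException(
        "missing " + PACK_MARKER + " in " + line);
    String s = line.substring(PACK_MARKER.length()).trim();
    int hash = s.indexOf('#');
    if (hash == -1)
      throw new IllegalArgumentException("missing '#' in " + line);
    // The directory ends at '#'; the file name runs to line end:
    String dir = s.substring(0, hash).replace(oldSep, localSep);
    String file = s.substring(hash + 1).replace(oldSep, localSep);
    return new String[] { dir, file };
  }

  public static void main(String[] args) {
    // A header packed with '\' and unpacked with '/':
    String[] r = parse("###c05\\tools#Tool.java", '\\', '/');
    System.out.println(r[0] + " " + r[1]); // c05/tools Tool.java
  }
}
```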

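3. Accessing and writing the listings
The next set of methods are simple accessors: directory(), filename() (note that a method can have the same spelling and capitalization as a field), and contents(). hasFile() indicates whether the object contains a file; the need for it will become clear shortly.
The final two methods write the code listing to a file, either to a packed file via writePacked() or to a Java source file via writeFile(). All writePacked() needs is the DataOutputStream, opened elsewhere, that represents the file being written. It puts the header information on the first line and then calls writeBytes() to write contents in a "universal" format.
To write a Java source file, the file must first be created. This is done with IO.psOpen(), which is handed a File object containing not just the file name but the path as well. The question then is whether that path exists. The user may choose to put all the source directories under a completely different subdirectory, one that may not exist yet. So before each file is written, File.mkdirs() is called on the directory path that the file will go into. It creates the entire path in one step.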
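The mkdirs-then-write sequence can be sketched with current Java idioms. This is not the listing's code: the names WriteWithDirs and write are hypothetical, try-with-resources replaces the IO.psOpen() and IO.close() helpers, and the demonstration writes under the system temporary directory so that it leaves the working directory alone.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of CodePackager.
class WriteWithDirs {
  // Creates root/dirname (and any missing parents), then writes
  // contents to filename inside it. Returns the file written.
  static File write(String root, String dirname,
      String filename, String contents) {
    File path = new File(root, dirname);
    // mkdirs() returns false if it created nothing, for example
    // when the path already exists, so the result is not checked:
    path.mkdirs();
    File target = new File(path, filename);
    try (PrintWriter p =
           new PrintWriter(new FileWriter(target))) {
      p.print(contents);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return target;
  }

  public static void main(String[] args) {
    String root = new File(
      System.getProperty("java.io.tmpdir"),
      "codepackager-demo").getPath();
    File f = write(root, "c05" + File.separator + "tools",
      "Tool.java", "class Tool {}");
    System.out.println("wrote " + f);
  }
}
```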

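4. Containing the entire collection of listings
It is convenient to organize the listings by subdirectory, even though this means the whole collection must first be built in memory. There is another good reason for doing so: it makes the sanity check possible. As each subdirectory of listings is created, an extra file is added whose name contains the number of files that directory should hold.
The DirMap class produces this effect and demonstrates the concept of a "multimap". It is implemented with a Hashtable whose keys are the subdirectories to be created and whose values are Vector objects holding the SourceCodeFile objects for that directory. So instead of mapping a key to a single value, the multimap maps a key to a set of values through the associated Vector. This may sound complex, but it is remarkably straightforward to implement. Most of the code in DirMap deals with writing files; very little of it concerns the multimap.
There are two ways to create a DirMap: the default constructor assumes the directories should branch off the current directory, and the second constructor lets you specify an alternate absolute path as the starting directory.
The add() method packs a lot of action into a few lines. First the directory() is extracted from the SourceCodeFile being added, and the Hashtable is checked to see whether it already contains that key. If not, a new Vector is added to the Hashtable and associated with the key. At that point the Vector is in place one way or another, so it is fetched and the SourceCodeFile is added to it. Because a Vector combines so easily with a Hashtable, the power of both is amplified.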
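The same multimap idea can be written with the generic collections that later versions of Java provide. This is a sketch with swapped-in types, not the listing's implementation: LinkedHashMap and ArrayList stand in for Hashtable and Vector, plain strings stand in for SourceCodeFile objects, and the name DirMultiMap is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of CodePackager: one key
// (a directory) maps to a list of values (its files).
class DirMultiMap {
  private final Map<String, List<String>> map =
    new LinkedHashMap<>();

  // Creates the list for dir on first use, then appends file.
  void add(String dir, String file) {
    map.computeIfAbsent(dir, k -> new ArrayList<>()).add(file);
  }

  // Returns the files recorded for dir, or an empty list.
  List<String> get(String dir) {
    return map.getOrDefault(dir, Collections.emptyList());
  }

  public static void main(String[] args) {
    DirMultiMap m = new DirMultiMap();
    m.add("c05", "A.java");
    m.add("c05", "B.java");
    m.add("c06", "C.java");
    System.out.println(m.get("c05")); // [A.java, B.java]
    System.out.println(m.get("c06")); // [C.java]
  }
}
```

computeIfAbsent() folds the "check for the key, insert a new list if missing, then fetch it" sequence of add() into a single call.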
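Writing a packed file involves opening the output file (as a DataOutputStream, so that the data is "universal") and writing the header with the old separator on the first line. Next, an Enumeration of the Hashtable keys is produced and stepped through to select each directory and fetch the Vector associated with it, so that every SourceCodeFile in that Vector can be written to the packed file.
Writing the Java source files to their directories with write() is almost identical to writePackedFile(), since both methods simply call the appropriate method in SourceCodeFile. Here, however, the root path is passed to SourceCodeFile.writeFile(). After all the files in a directory have been written, the additional file whose name gives the number of files written is created as well.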
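The listing only writes the count file and leaves the checking to the reader. A check could be automated along the following lines. Everything in this sketch is an assumption rather than part of the book's program, including the name FileCountCheck and the rule that exactly one "N.files" marker must be present.

```java
import java.io.File;

// Hypothetical checker, not part of CodePackager.
class FileCountCheck {
  static final String SUFFIX = ".files";

  // True if dir holds exactly one "<N>.files" marker and
  // exactly N other regular files.
  static boolean verify(File dir) {
    File[] entries = dir.listFiles();
    if (entries == null) return false; // not a directory
    int expected = -1, actual = 0;
    for (File f : entries) {
      if (!f.isFile()) continue; // skip nested directories
      String name = f.getName();
      if (name.endsWith(SUFFIX)) {
        if (expected != -1) return false; // second marker
        try {
          expected = Integer.parseInt(name.substring(
            0, name.length() - SUFFIX.length()));
        } catch (NumberFormatException e) {
          return false; // marker name is not a number
        }
      } else {
        actual++;
      }
    }
    return expected == actual;
  }

  public static void main(String[] args) {
    File dir = new File(args.length > 0 ? args[0] : ".");
    System.out.println(dir + ": " + verify(dir));
  }
}
```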

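5. The main program
The classes described so far are all used in CodePackager. The first thing you see is the usage string, which is printed whenever the end user invokes the program incorrectly. It is printed by the usage() method, which also exits the program. All main() does is determine whether you want to create a packed file or extract from one, check that the arguments are correct, and call the appropriate method.
When a packed file is created, it is assumed to go in the current directory, so the DirMap is created with the default constructor. Once the file is open, each line is read and checked for two conditions:
(1) If the line starts with the start marker of a source listing, a new SourceCodeFile object is created. Its constructor reads the rest of the listing. The resulting reference is added directly to the DirMap.
(2) If the line starts with the end marker of a source listing, something has gone wrong, because end markers should be found only by the SourceCodeFile constructor.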

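When a packed file is extracted, the output can go into the current directory or into an alternate directory, so the DirMap object is created accordingly. The file is opened and its first line is read; the old file path separator is extracted from that line. The input is then used to create the first SourceCodeFile object, which is added to the DirMap. New SourceCodeFile objects are created and added as long as they contain a file. (The last one created simply returns when the input runs out, and hasFile() then returns false.)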
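Reading the old separator from the first line can be sketched on its own. The names OldSeparator and parse are hypothetical. Unlike the listing, which tests with indexOf() and assumes the line is present, this sketch requires the tag at the start of the line and returns null when the header is missing or malformed. Both choices are assumptions for this example.

```java
// Hypothetical helper, not part of CodePackager: parses the first
// line of a packed file, "###Old Separator:<sep>###".
class OldSeparator {
  static final String TAG = "###Old Separator:";

  // Returns the recorded separator, or null if the line is not
  // a well-formed separator header.
  static String parse(String firstLine) {
    if (firstLine == null || !firstLine.startsWith(TAG))
      return null;
    String rest = firstLine.substring(TAG.length());
    // The separator runs up to the closing "###":
    int hash = rest.indexOf('#');
    return (hash == -1) ? null : rest.substring(0, hash);
  }

  public static void main(String[] args) {
    System.out.println(parse("###Old Separator:\\###")); // \
    System.out.println(parse("###Old Separator:/###"));  // /
  }
}
```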