Handling the BOM in UTF-8 Files in Java

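What is UTF? UTF stands for Unicode Transformation Format: a scheme for turning the numbers (code points) that Unicode defines into data a program can store. In other words, a UTF is one way of encoding Unicode.

Inside the JVM, every String is Unicode, with no exceptions: a String is a sequence of UTF-16 code units, conceptually a char[]. Because there is only one representation, it is fair to say that a String in the JVM carries no encoding of its own. A byte[], by contrast, always has an encoding, such as Big5, GBK, GB2312, or UTF-8 (GBK, GB2312, and Big5 are not UTFs). Converting a GBK-encoded byte[] into a String is really a conversion from GBK to Unicode, and converting a String into a Big5-encoded byte[] is a conversion from Unicode to Big5. So when parsing bytes, we have to know whether they are in a UTF encoding, and which one.

How many UTFs are there? Using C++ type names as shorthand, let char, char16_t, and char32_t stand for unsigned 8-bit, 16-bit, and 32-bit integers. UTF-8, UTF-16, and UTF-32 use char, char16_t, and char32_t respectively as their code units. A single character takes one to four code units in UTF-8, one or two in UTF-16, and exactly one in UTF-32.

What is a BOM? The byte order mark is the character U+FEFF placed at the very beginning of a file to signal which Unicode encoding (and, for UTF-16 and UTF-32, which byte order) the file uses. In UTF-8 it is the three bytes EF BB BF.

What problem does the BOM cause? Editors such as Windows Notepad may write a BOM when saving a file as UTF-8. Java's UTF-8 decoder does not strip it, so when the file is read back, the text starts with an extra U+FEFF character that shows up as garbage.

How to fix it: when reading the file, detect the BOM for the specific encoding and strip it, as in the following method:

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    // Reads a text file and returns its lines concatenated (readLine() drops the line terminators).
    // UnicodeReader is not part of the JDK and its source is not shown here; it is assumed to be a
    // Reader that detects and skips a leading BOM, falling back to the given encoding otherwise.
    // StringFilter/RemoveString is likewise the author's own interface for post-processing each line.
    public static String ReadFile(String path, StringFilter filter) throws IOException {
        File file = new File(path);
        if (!file.exists()) {
            throw new IOException("File does not exist: " + path);
        }
        StringBuilder laststr = new StringBuilder();
        // try-with-resources closes the reader and the underlying stream, even on error
        try (InputStream in = new FileInputStream(file);
             BufferedReader reader = new BufferedReader(new UnicodeReader(in, "utf-8"))) {
            String tempString;
            while ((tempString = reader.readLine()) != null) {
                if (filter != null) {
                    tempString = filter.RemoveString(tempString);
                }
                laststr.append(tempString);
            }
        } catch (IOException e) {
            // keep the original exception as the cause instead of discarding it
            throw new IOException("Error reading file: " + path, e);
        }
        return laststr.toString();
    }

Permanent link to this article: http://www.linuxidc.com/Linux/2016-04/129739.htm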
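The point that a String has no encoding while a byte[] always does can be seen with the standard String.getBytes(Charset) and new String(byte[], Charset). This is a minimal sketch; the class EncodingDemo and its helpers are illustrative names, and GBK is exercised only when the running JDK supports it, since it is not among the charsets every Java implementation must provide.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // "\u4e2d\u6587" is the two-character string 中文; escapes avoid depending on the source-file encoding
    public static final String TEXT = "\u4e2d\u6587";

    /** String -> byte[]: encodes the String's UTF-16 chars into the given byte encoding. */
    public static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    /** byte[] -> String: decodes bytes back into UTF-16 chars using the given encoding. */
    public static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        byte[] utf8 = encode(TEXT, StandardCharsets.UTF_8);
        System.out.println("UTF-8 bytes: " + utf8.length);                     // 3 bytes per character
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(TEXT));  // round trip succeeds

        // GBK ships with common JDK builds but is not a guaranteed charset, so check first
        if (Charset.isSupported("GBK")) {
            Charset gbk = Charset.forName("GBK");
            byte[] gbkBytes = encode(TEXT, gbk);
            System.out.println("GBK bytes: " + gbkBytes.length);               // 2 bytes per character
            // Same String, different bytes: decoding with the wrong charset garbles the text
            System.out.println(decode(gbkBytes, StandardCharsets.UTF_8).equals(TEXT));
            System.out.println(decode(gbkBytes, gbk).equals(TEXT));
        }
    }
}
```

The same String yields different byte arrays under different charsets, which is why the decoding charset must match the one used to produce the bytes.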
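To make the UTF-8, UTF-16, and UTF-32 code-unit sizes concrete, the following sketch (the class CodeUnits is hypothetical) counts how many bytes a few characters occupy in each encoding, and also prints String.length(), which counts UTF-16 chars rather than characters. It assumes the UTF-32BE charset is available, as it is in OpenJDK builds, although the Java specification does not guarantee it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CodeUnits {
    /** Number of bytes the string occupies in the given encoding (these charsets write no BOM). */
    public static int byteCount(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // U+0041 'A', U+4E2D, and U+1F600 (outside the BMP, so a surrogate pair in UTF-16)
        String[] samples = {"A", "\u4e2d", "\uD83D\uDE00"};
        // Assumption: UTF-32BE is present (true for OpenJDK, not a guaranteed StandardCharsets member)
        Charset utf32 = Charset.forName("UTF-32BE");
        for (String s : samples) {
            System.out.printf("U+%04X  UTF-8=%d  UTF-16=%d  UTF-32=%d  String.length()=%d%n",
                    s.codePointAt(0),
                    byteCount(s, StandardCharsets.UTF_8),     // one to four 1-byte code units
                    byteCount(s, StandardCharsets.UTF_16BE),  // one or two 2-byte code units
                    byteCount(s, utf32),                      // always one 4-byte code unit
                    s.length());                              // number of UTF-16 chars
        }
    }
}
```

The last sample shows why "a String is a char[]" means a sequence of UTF-16 code units: one character outside the BMP has length() 2.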
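The garbage character at the start of a BOM-prefixed UTF-8 file is U+FEFF, because Java's UTF-8 decoder passes the BOM through as an ordinary character. A minimal sketch (BomDemo and stripBom are illustrative names) that decodes a BOM-prefixed byte array and then removes the leading U+FEFF from the resulting String:

```java
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // The UTF-8 encoding of the BOM (EF BB BF) followed by "abc"
    public static final byte[] WITH_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', 'b', 'c'};

    /** Drops a leading U+FEFF left behind by a decoder that does not strip the BOM. */
    public static String stripBom(String s) {
        return s.startsWith("\uFEFF") ? s.substring(1) : s;
    }

    public static void main(String[] args) {
        String decoded = new String(WITH_BOM, StandardCharsets.UTF_8);
        // The BOM survives decoding: 4 chars, the first of which is U+FEFF
        System.out.println(decoded.length());
        System.out.println(decoded.charAt(0) == '\uFEFF');
        System.out.println(decoded.equals("abc"));
        System.out.println(stripBom(decoded).equals("abc"));
    }
}
```

Stripping at the String level is enough when the text is already in memory; for streams it is usually cleaner to skip the BOM bytes before decoding.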
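The ReadFile method depends on UnicodeReader, which is not part of the JDK and whose source the article does not show. As an assumption about what such a reader does, here is a hypothetical stand-in, BomAwareReader, built on the JDK's PushbackInputStream: it peeks at the first bytes, recognizes the UTF-8 (EF BB BF) and UTF-16 (FE FF, FF FE) BOMs, skips them, and otherwise falls back to a default charset. UTF-32 BOMs are deliberately not handled, because FF FE 00 00 is ambiguous with a UTF-16LE BOM followed by U+0000.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public final class BomAwareReader {
    private BomAwareReader() {}

    /** Returns a Reader that skips a leading UTF-8/UTF-16 BOM; without one, uses defaultCharset. */
    public static Reader open(InputStream in, Charset defaultCharset) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or end of stream
        while (n < head.length) {
            int r = pin.read(head, n, head.length - n);
            if (r == -1) break;
            n += r;
        }

        Charset cs = defaultCharset;
        int bomLength = 0;
        if (n >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            cs = StandardCharsets.UTF_8;
            bomLength = 3;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            cs = StandardCharsets.UTF_16BE;
            bomLength = 2;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            cs = StandardCharsets.UTF_16LE;
            bomLength = 2;
        }

        // Push back the bytes that follow the BOM so the decoder sees only real content
        if (n > bomLength) {
            pin.unread(head, bomLength, n - bomLength);
        }
        return new InputStreamReader(pin, cs);
    }

    /** Convenience helper: decodes a whole byte array through open(). */
    public static String decode(byte[] bytes, Charset defaultCharset) {
        try (Reader reader = open(new ByteArrayInputStream(bytes), defaultCharset)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int len;
            while ((len = reader.read(buf)) != -1) {
                sb.append(buf, 0, len);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8WithBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        byte[] utf16LeWithBom = {(byte) 0xFF, (byte) 0xFE, 'h', 0, 'i', 0};
        byte[] noBom = {'h', 'i'};
        // All three decode to "hi" with no stray U+FEFF
        System.out.println(decode(utf8WithBom, StandardCharsets.UTF_8));
        System.out.println(decode(utf16LeWithBom, StandardCharsets.UTF_8));
        System.out.println(decode(noBom, StandardCharsets.UTF_8));
    }
}
```

In ReadFile, the call new UnicodeReader(in, "utf-8") could then be replaced by BomAwareReader.open(in, StandardCharsets.UTF_8), keeping the rest of the method unchanged.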