Simple CAPTCHA Recognition with Python and Its Toolkits

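Let's get straight to it.

[Figure: original image]

Step 1. Open the image.

im = Image.open("temp1.jpg")

Step 2. Convert the color image to grayscale. There are many ways to do this; here we use PIL's convert("L"), which applies the ITU-R 601-2 luma transform (L = 0.299 R + 0.587 G + 0.114 B), a weighted brightness comparable to the I component of the HSI color space.

imgry = im.convert("L")

[Figure: the grayscale image]

Step 3. Remove the noise from the image. This image is simple enough that plain thresholding does the job: pixels at or above the threshold are set to 1 and all others to 0. To do this, first build a lookup table, then let the library function perform the mapping.

threshold = 140
table = []
for i in range(256):
    if i < threshold:
        table.append(0)
    else:
        table.append(1)

Why 140? It was found by trial and error; the histogram can also serve as a guide. The mapping itself is

out = imgry.point(table, "1")

[Figure: the binarized image]

Step 4. Convert the characters in the image to text, using the image_to_string function from pytesser.

text = image_to_string(out)

Step 5. Refine the result. Observation shows that the CAPTCHA contains only digits, and that the recognition routine often misreads 8 as S. So a few replacements are applied to the recognized text.
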
# Correct characters that were misrecognized as letters using this table
rep = {"O": "0",
       "I": "1", "L": "1",
       "Z": "2",
       "S": "8"
       }
for r in rep:
    text = text.replace(r, rep[r])

Done: text now holds the final result.

7025
0195
7039
6716
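
To make Steps 2 and 3 concrete, the sketch below reproduces them in plain Python on a handful of pixels, without PIL. It assumes the ITU-R 601-2 weights that PIL documents for convert("L"); the helper names to_gray, build_table and binarize and the sample pixel values are made up for illustration, and PIL's own rounding may differ slightly from this integer arithmetic.

```python
# Plain-Python illustration of Step 2 (grayscale) and Step 3 (lookup table).
# to_gray, build_table, binarize and the sample pixels are hypothetical.

def to_gray(pixel):
    """Weighted luminance of an (R, G, B) pixel, ITU-R 601-2 weights."""
    r, g, b = pixel
    return (r * 299 + g * 587 + b * 114) // 1000

def build_table(threshold):
    """Lookup table: gray levels below the threshold map to 0, the rest to 1."""
    return [0 if level < threshold else 1 for level in range(256)]

def binarize(gray_values, table):
    """Look every gray value up in the table, one pixel at a time."""
    return [table[v] for v in gray_values]

pixels = [(0, 0, 0), (255, 255, 255), (200, 30, 30), (120, 160, 140)]
gray = [to_gray(p) for p in pixels]
bits = binarize(gray, build_table(140))
print(gray)
print(bits)
```

The table is built once and then only indexed, which is why the per-pixel work can be handed to a library function.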

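Step 3 says the threshold can be read off the histogram. One common way to do that automatically is Otsu's method, which picks the split that best separates dark and light pixels. This is a swapped-in technique, not how the value 140 was obtained; otsu_threshold is a hypothetical helper and the histogram below is synthetic. With PIL, a real 256-entry histogram for a grayscale image should be available from imgry.histogram().

```python
# Otsu's method on a 256-bin grayscale histogram (a sketch under the
# assumptions stated above). otsu_threshold is a hypothetical helper.

def otsu_threshold(hist):
    """Return t so that splitting gray levels into [0, t) and [t, 255]
    maximizes the between-class variance."""
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with level < t
    sum0 = 0    # sum of the levels of those pixels
    for t in range(1, 256):
        w0 += hist[t - 1]
        sum0 += (t - 1) * hist[t - 1]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, no split to evaluate
        mean0 = sum0 / float(w0)
        mean1 = (sum_all - sum0) / float(w1)
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

# Synthetic histogram: dark text near level 50, light background near 200.
hist = [0] * 256
hist[50] = 300
hist[200] = 700
t = otsu_threshold(hist)
print(t)
```

The returned value can be used in place of threshold when building the lookup table, since both follow the same convention: levels below it map to 0.
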
The program requires the PIL and pytesser libraries. Finally, the complete program looks like this:

from PIL import Image
from pytesser import *

# Binarization
threshold = 140
table = []
for i in range(256):
    if i < threshold:
        table.append(0)
    else:
        table.append(1)

# The CAPTCHA contains only digits, so characters
# recognized as letters are corrected using this table
rep = {"O": "0",
       "I": "1", "L": "1",
       "Z": "2",
       "S": "8"
       }

def getverify1(name):
    # Open the image
    im = Image.open(name)
    # Convert to luminance
    imgry = im.convert("L")
    imgry.save("g" + name)
    # Binarize
    out = imgry.point(table, "1")
    out.save("b" + name)
    # Recognize
    text = image_to_string(out)
    # Is the recognition right? Clean up the result first
    text = text.strip()
    text = text.upper()
    for r in rep:
        text = text.replace(r, rep[r])
    # out.save(text + ".jpg")
    print(text)
    return text

getverify1("v1.jpg")
getverify1("v2.jpg")
getverify1("v3.jpg")
getverify1("v4.jpg")
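
The correction table and the "is the recognition right?" question can be handled separately from the OCR call, so they can be tried without any images. A minimal sketch: the names correct and looks_valid, and the assumption that a valid code is exactly four digits (suggested by the four sample results), are illustrative and not part of the original program.

```python
# Post-processing of OCR output, separated from the OCR call itself.
# correct() and looks_valid() are hypothetical helpers.

REP = {"O": "0", "I": "1", "L": "1", "Z": "2", "S": "8"}

def correct(text):
    """Strip whitespace, uppercase, then replace letters that stand for digits."""
    text = text.strip().upper()
    for wrong, right in REP.items():
        text = text.replace(wrong, right)
    return text

def looks_valid(text, length=4):
    """Assumed sanity check: exactly `length` characters, all of them digits."""
    return len(text) == length and text.isdigit()

for raw in ["7O2S\n", " o195 ", "7O3g"]:
    fixed = correct(raw)
    print(fixed, looks_valid(fixed))
```

A result that fails the check, such as the last sample, could be retried with a different threshold or a fresh CAPTCHA instead of being submitted.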