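When automating tests with QTP, I kept running into annoying CAPTCHAs, so I tried to tackle them from a technical angle. For some reason QTP's OCR got noticeably worse after QTP 10, so I had no choice but to look for another approach. After searching a lot of sites today I finally have a small success. It still only reads simple CAPTCHAs, though: when a CAPTCHA contains a lot of noise, recognition still falls short. Here is the code plus a test snippet: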
```python
from PIL import Image
from pytesser import image_file_to_string


def captcha(input_pic):
    img = Image.open(input_pic)  # the CAPTCHA image to recognize
    img = img.convert("RGBA")
    pixdata = img.load()
    width, height = img.size

    # Make the letters bolder for easier recognition.
    # Pass 1: pixels with a dark red channel become black.
    for y in range(height):
        for x in range(width):
            if pixdata[x, y][0] < 90:
                pixdata[x, y] = (0, 0, 0, 255)

    # Pass 2: pixels with a dark green channel become black.
    for y in range(height):
        for x in range(width):
            if pixdata[x, y][1] < 136:
                pixdata[x, y] = (0, 0, 0, 255)

    # Pass 3: any pixel that still has a blue component (i.e. was not
    # blackened above) becomes white background.
    for y in range(height):
        for x in range(width):
            if pixdata[x, y][2] > 0:
                pixdata[x, y] = (255, 255, 255, 255)

    # Raw strings keep the Windows backslashes from being read as escapes.
    img.save(r"c:\input-black.gif", "GIF")

    # Perform OCR using the tesseract-ocr engine via pytesser
    return image_file_to_string(r"c:\input-black.gif")


if __name__ == "__main__":
    print(captcha(r"c:\untitled.bmp"))
```
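Because the three passes run one after another on the same image, they collapse into a single per-pixel rule: pass 2 cannot undo pass 1 (a blackened pixel has green 0), and pass 3 never touches black pixels (their blue is 0). The sketch below states that rule in pure Python on plain RGBA tuples so it can be tried without PIL. `binarize_pixel` and `binarize` are hypothetical names, not part of the original script; the thresholds 90 and 136 are simply the ones used above.

```python
# Hypothetical helpers: the same thresholds as captcha(), applied per pixel.
BLACK = (0, 0, 0, 255)
WHITE = (255, 255, 255, 255)


def binarize_pixel(rgba):
    """Return the colour captcha()'s three passes would leave at this pixel."""
    r, g, b, a = rgba
    if r < 90 or g < 136:
        # Passes 1 and 2: dark red or dark green channel -> black (text).
        return BLACK
    if b > 0:
        # Pass 3: any remaining pixel with some blue -> white (background).
        return WHITE
    # Pixels with r >= 90, g >= 136 and b == 0 are left unchanged.
    return rgba


def binarize(pixels):
    """Apply binarize_pixel to a flat list of RGBA tuples."""
    return [binarize_pixel(p) for p in pixels]


if __name__ == "__main__":
    sample = [(30, 200, 200, 255), (200, 100, 50, 255),
              (200, 200, 200, 255), (255, 255, 0, 255)]
    print(binarize(sample))
```

Inside `captcha()` the same rule could replace the three loops with one, reading each pixel once instead of three times. The thresholds appear to be tuned to the author's sample image, so other CAPTCHAs will probably need different values.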
Note: the script depends on PIL and pytesser.
PIL: http://www.pythonware.com/products/pil/
pyTesser: http://code.google.com/p/pytesser/
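As mentioned at the start, heavy noise still defeats recognition. One common mitigation, which is not part of the original script, is to clear isolated dark specks after thresholding and before OCR. The sketch below assumes the thresholded image has already been turned into a 2D list of 0/1 values (1 = black ink); `remove_isolated` and its `min_neighbours` parameter are hypothetical names.

```python
def remove_isolated(grid, min_neighbours=2):
    """Return a copy of grid with isolated ink pixels cleared.

    grid is a list of rows of 0/1 values (1 = ink). An ink pixel is turned
    into background when fewer than min_neighbours of its 8 neighbours are
    ink. Neighbour counts are always read from the original grid, so the
    result does not depend on scan order.
    """
    height = len(grid)
    width = len(grid[0]) if height else 0
    out = [row[:] for row in grid]  # copy so the input stays untouched
    for y in range(height):
        for x in range(width):
            if grid[y][x] != 1:
                continue
            count = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue  # skip the pixel itself
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width and grid[ny][nx] == 1:
                        count += 1
            if count < min_neighbours:
                out[y][x] = 0
    return out


if __name__ == "__main__":
    noisy = [
        [1, 0, 0, 0, 0],
        [0, 0, 1, 1, 0],
        [0, 0, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    for row in remove_isolated(noisy):
        print(row)
```

Raising `min_neighbours` removes more speckle but also starts eating the ends of thin strokes, so it is a trade-off to tune per CAPTCHA style. In `captcha()` such a grid could be built from `pixdata` by testing each pixel for the black value `(0, 0, 0, 255)`.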