Learning Python: The urllib and urllib2 Modules

I. The urllib module

The urllib module can open arbitrary URLs.

1. urlopen()

urlopen() opens a URL and returns a file-like object that supports the usual file operations.

    In [308]: import urllib
    In [309]: file=urllib.urlopen("...")
    In [310]: file.readline()
    Out[310]: '<!DOCTYPE html><!--STATUS OK--><html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><link rel="dns-prefetch" href="//s1.bdstatic.com"/><link rel="dns-prefetch" href="//t1.baidu.com"/><link rel="dns-prefetch" href="//t2.baidu.com"/><link rel="dns-prefetch" href="//t3.baidu.com"/><link rel="dns-prefetch" href="//t10.baidu.com"/><link rel="dns-prefetch" href="//t11.baidu.com"/><link rel="dns-prefetch" href="//t12.baidu.com"/><link rel="dns-prefetch" href="//b1.bdstatic.com"/><title>\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8
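The returned object also supports read(), readlines(), fileno() and close(). In addition it has three methods specific to URL responses: info() returns the response headers, getcode() returns the HTTP status code, and geturl() returns the URL that was actually retrieved.

    In [337]: file.info()
    Out[337]: <httplib.HTTPMessage instance at 0x2394a70>
    In [338]: file.getcode()
    Out[338]: 200
    In [339]: file.geturl()
    Out[339]: 'http://www.baidu.com/'

2. urlretrieve()

urlretrieve() saves the HTML page behind a URL to a file. It returns a tuple holding the local file name and the response headers.

    In [404]: filename=urllib.urlretrieve("http://www.baidu.com/",filename="/tmp/baidu.html")
    In [405]: type(filename)
    Out[405]: <type 'tuple'>
    In [406]: filename[0]
    Out[406]: '/tmp/baidu.html'
    In [407]: filename
    Out[407]: ('/tmp/baidu.html', <httplib.HTTPMessage instance at 0x23ba878>)
    In [408]: filename[1]
    Out[408]: <httplib.HTTPMessage instance at 0x23ba878>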
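The file-like behaviour of urlopen() and the tuple returned by urlretrieve() can be tried without network access. The following is a minimal sketch under stated assumptions: a temporary local file, exposed through a file: URL, stands in for the remote page (the file names and page content are made up for illustration), and the import falls back to urllib.request, where Python 3 keeps these functions. A file: URL has no HTTP status, so getcode() is left out.

```python
import os
import tempfile

try:  # Python 2, as used in this article
    from urllib import pathname2url, urlopen, urlretrieve
except ImportError:  # Python 3 keeps these functions in urllib.request
    from urllib.request import pathname2url, urlopen, urlretrieve

# Made-up page content, used only for this illustration.
PAGE = b"<!DOCTYPE html>\n<html><body>hello</body></html>\n"

# Stand-in for a remote page: a local file reachable through a file: URL.
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "source.html")
with open(source_path, "wb") as fh:
    fh.write(PAGE)
source_url = "file:" + pathname2url(source_path)

# urlopen() returns a file-like object: readline(), read() and close() work.
response = urlopen(source_url)
first_line = response.readline()
rest = response.read()
response.close()

# urlretrieve() copies the resource to disk and returns (filename, headers).
target_path = os.path.join(workdir, "copy.html")
saved_path, headers = urlretrieve(source_url, filename=target_path)

with open(saved_path, "rb") as fh:
    saved_bytes = fh.read()

print(repr(first_line))
print(saved_path == target_path)
print(saved_bytes == PAGE)
```

Against a real HTTP URL the same calls apply; only the source of the bytes changes.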
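3. urlcleanup()

urlcleanup() clears the cache left behind by urlretrieve().

    In [454]: filename=urllib.urlretrieve("http://www.baidu.com/",filename="/tmp/baidu.html")
    In [455]: urllib.urlcleanup()

4. urllib.quote() and urllib.quote_plus()

Both functions percent-encode a URL. quote() leaves "/" unescaped by default, while quote_plus() escapes "/" as well and turns spaces into "+".

    In [483]: urllib.quote("http://www.baidu.com")
    Out[483]: 'http%3A//www.baidu.com'
    In [484]: urllib.quote_plus("http://www.baidu.com")
    Out[484]: 'http%3A%2F%2Fwww.baidu.com'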
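The difference between the two encoders shows up more clearly on a string that contains spaces and query characters. A small sketch, assuming a made-up sample string; the import falls back to urllib.parse, where Python 3 keeps these functions:

```python
try:  # Python 2, as used in this article
    from urllib import quote, quote_plus
except ImportError:  # Python 3 keeps these functions in urllib.parse
    from urllib.parse import quote, quote_plus

# Made-up sample string containing spaces and reserved characters.
text = "/path with spaces/?q=a&b"

default_quoted = quote(text)          # "/" is kept, space becomes %20
plus_quoted = quote_plus(text)        # "/" is escaped, space becomes "+"
strict_quoted = quote(text, safe="")  # like quote(), but "/" is escaped too

print(default_quoted)
print(plus_quoted)
print(strict_quoted)
```

quote() suits path segments, where "/" is meaningful; quote_plus() suits query-string values, where "+" is the conventional encoding of a space.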
5. urllib.unquote() and urllib.unquote_plus()

These functions decode a percent-encoded URL. unquote_plus() additionally turns "+" back into a space.

    In [514]: urllib.unquote("http%3A//www.baidu.com")
    Out[514]: 'http://www.baidu.com'
    In [515]: urllib.unquote_plus("http%3A%2F%2Fwww.baidu.com")
    Out[515]: 'http://www.baidu.com'
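The two decoders only differ when the encoded text contains "+". A short round-trip sketch, assuming a made-up sample string, shows that unquote_plus() is the inverse of quote_plus() while unquote() leaves a literal "+" untouched:

```python
try:  # Python 2, as used in this article
    from urllib import quote_plus, unquote, unquote_plus
except ImportError:  # Python 3 keeps these functions in urllib.parse
    from urllib.parse import quote_plus, unquote, unquote_plus

# Made-up sample string with a space and literal plus signs.
original = "q=hello world&lang=C++"

encoded = quote_plus(original)
decoded_plus = unquote_plus(encoded)  # "+" becomes a space again
decoded_plain = unquote(encoded)      # "+" stays a plus sign

print(encoded)
print(decoded_plus)
print(decoded_plain)
```

The practical rule is to pair quote() with unquote() and quote_plus() with unquote_plus().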
6. urllib.urlencode()

urlencode() turns a dictionary of key/value pairs into a query string in which the pairs are joined by "&". Combined with urlopen(), it can be used for both GET and POST requests.

    In [560]: import urllib
    In [561]: params=urllib.urlencode({"spam":1,"eggs":2,"bacon":0})
    In [562]: f=urllib.urlopen("http://python.org/query?%s" %params)
    In [563]: f.readline()
    Out[563]: '<!doctype html> '
    In [564]: f.readlines()
    Out[564]:
    ['<!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]--> ',
     '<!--[if IE 7]> <html class="no-js ie7 lt-ie8 lt-ie9"> <![endif]--> ',
     '<!--[if IE 8]> <html class="no-js ie8 lt-ie9"> <![endif]--> ',
     '<!--[if gt IE 8]><!--><html class="no-js" lang="en" dir="ltr"> <!--<![endif]--> ',
     ' ',

II. The urllib2 module

urllib2 offers more than urllib, for example basic authentication, redirect handling and cookies. See:

    https://docs.python.org/2/library/urllib2.html
    https://docs.python.org/2/howto/urllib2.html

    In [566]: import urllib2
    In [567]: f=urllib2.urlopen("http://www.python.org/")
    In [568]: print f.read(100)
    --------> print(f.read(100))
    <!doctype html>
    <!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->
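As an offline illustration of the urlencode() step from item 6, the sketch below builds a query string and shows how the same string serves as the tail of a GET URL or as a POST body. It assumes a list of pairs instead of a dictionary so that the order of the fields is predictable; the second parameter list is made up, and the import falls back to urllib.parse, where Python 3 keeps urlencode().

```python
try:  # Python 2, as used in this article
    from urllib import urlencode
except ImportError:  # Python 3 keeps urlencode() in urllib.parse
    from urllib.parse import urlencode

# A list of pairs keeps the field order fixed.
params = [("spam", 1), ("eggs", 2), ("bacon", 0)]
query = urlencode(params)

# GET: the query string is appended to the URL after "?".
get_url = "http://python.org/query?%s" % query

# POST: the same string is sent as the request body (bytes in Python 3).
post_body = query.encode("ascii")

# Values are escaped the way quote_plus() does it (made-up values).
escaped = urlencode([("q", "a b&c"), ("lang", "C++")])

print(query)
print(get_url)
print(escaped)
```

Because each value is escaped, characters such as "&" inside a value cannot be mistaken for the separator between pairs.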
The last urllib2 session (In [566] to In [568]) opens the Python home page and prints its first 100 bytes.

HTTP is built on requests and responses: the client sends a request and the server answers it. urllib2 represents the outgoing request as a Request object, and passing that object to urlopen() returns a response object. The response is a file-like object and can be handled like a file.

    In [630]: import urllib2
    In [631]: req=urllib2.Request("http://www.baidu.com")
    In [632]: response=urllib2.urlopen(req)
    In [633]: the_page=response.read()
    In [634]: the_page
    Out[634]: '<!DOCTYPE html><!--STATUS OK--><html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><link rel="dns-prefetch" href="//s1.bdstatic.com"/><link rel="dns-prefetch" href="//t1.baidu.com"/><link rel="dns-prefetch" href="//t2.baidu.com"/><link rel="dns-prefetch" href="//t3.baidu.

Often you need to send data to a URL with POST.

    In [763]: import urllib
    In [764]: import urllib2
    In [765]: url="http://xxxxxx/login.php"
    In [766]: values={"ver" : "1.7.1", "email" : "xxxxx", "password" : "xxxx", "mac" : "111111111111"}
    In [767]: data=urllib.urlencode(values)
    In [768]: req=urllib2.Request(url,data)
    In [769]: response=urllib2.urlopen(req)
    In [770]: the_page=response.read()
    In [771]: the_page

If no data argument is passed to urllib2.Request(), urllib2 sends a GET request. The difference between the two is that POST requests often have side effects: they change the state of the system in some way. Data can also be sent with a GET request by appending it to the URL.

    In [55]: import urllib2
    In [56]: import urllib
    In [57]: url="http://xxx/login.php"
    In [58]: values={"ver" : "xxx", "email" : "xxx", "password" : "xxx", "mac" : "xxx"}
    In [59]: data=urllib.urlencode(values)
    In [60]: full_url=url + "?" + data
    In [61]: the_page=urllib2.urlopen(full_url)
    In [63]: the_page.read()
    Out[63]: '{"result":0,"data":0}'

By default urllib2 identifies itself as Python-urllib/x.y, where x.y is the Python version (Python-urllib/2.6 here). To present a different client type, add a User-Agent HTTP header.

    In [107]: import urllib
    In [108]: import urllib2
    In [109]: url="http://xxx/login.php"
    In [110]: user_agent="Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"
    In [111]: values={"ver" : "xxx", "email" : "xxx", "password" : "xxx", "mac" : "xxxx"}
    In [112]: headers={"User-Agent" : user_agent}
    In [114]: data=urllib.urlencode(values)
    In [115]: req=urllib2.Request(url,data,headers)
    In [116]: response=urllib2.urlopen(req)
    In [117]: the_page=response.read()
    In [118]: the_page

When the given URL cannot be reached, urlopen() raises URLError. When the server is reached but cannot serve the requested content, urlopen() raises HTTPError.

    #!/usr/bin/python
    from urllib2 import Request,urlopen,URLError,HTTPError
    req=Request("http://10.10.41.42/index.html")
    try:
        response=urlopen(req)
    except HTTPError as e:
        print "The server couldn't fulfill the request."
        print "Error code:",e.code
    except URLError as e:
        print "We failed to fetch a server."
        print "Reason:",e.reason
    else:
        print "Everything is fine"

Note that the HTTPError clause must come before the URLError clause. HTTPError is a subclass of URLError, so an except URLError clause placed first would catch both kinds of error.

    #!/usr/bin/python
    from urllib2 import Request,urlopen,URLError,HTTPError
    req=Request("http://10.10.41.42")
    try:
        response=urlopen(req)
    except URLError as e:
        if hasattr(e,"reason"):
            print "We failed to fetch a server."
            print "Reason:",e.reason
        elif hasattr(e,"code"):
            print "The server couldn't fulfill the request."
            print "Error code:",e.code
    else:
        print "Everything is fine"

The hasattr() function checks whether an object has the given attribute.
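Two points from this section can be checked without sending anything over the network: how a Request decides between GET and POST, and why the order of the except clauses matters. The sketch below is an illustration under stated assumptions, not the article's own code: the endpoint, form fields and the helper names right_order() and wrong_order() are hypothetical, the exceptions are constructed by hand instead of coming from a real server, and the imports fall back to the Python 3 module names.

```python
try:  # Python 2, as used in this article
    from urllib import urlencode
    from urllib2 import HTTPError, Request, URLError
except ImportError:  # Python 3 module names
    from urllib.error import HTTPError, URLError
    from urllib.parse import urlencode
    from urllib.request import Request

# Hypothetical endpoint and form fields; nothing is sent over the network.
url = "http://example.invalid/login.php"
data = urlencode([("email", "user@example.invalid"), ("ver", "1.7.1")]).encode("ascii")
headers = {"User-Agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"}

get_req = Request(url)                  # no data: GET
post_req = Request(url, data, headers)  # data present: POST

print(get_req.get_method())
print(post_req.get_method())
# Request stores header names capitalized, hence "User-agent".
print(post_req.get_header("User-agent"))


def right_order(exc):
    """Same clause order as the first script: HTTPError before URLError."""
    try:
        raise exc
    except HTTPError as e:
        return "http error %d" % e.code
    except URLError as e:
        return "url error: %s" % e.reason


def wrong_order(exc):
    """URLError first: the HTTPError clause below can never be reached."""
    try:
        raise exc
    except URLError:
        return "url error"
    except HTTPError as e:
        return "http error %d" % e.code


# Hand-built exceptions standing in for what urlopen() would raise.
http_exc = HTTPError(url, 404, "Not Found", None, None)
url_exc = URLError("connection refused")

print(right_order(http_exc))
print(right_order(url_exc))
print(wrong_order(http_exc))
```

The last line prints the generic label even for the 404, which is the situation the ordering rule is meant to avoid.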
Permanent link to this article: http://www.linuxidc.com/Linux/2014-07/104697.htm