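Advanced Python Training: String Processing Techniques

How do you split a string that contains several different delimiters?

Practical example

Solution

Call the split() method repeatedly, handling one delimiter on each pass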

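# Written for Python 2; the explicit inner loop also runs on Python 3,
# where map() is lazy and would never execute a side-effecting lambda
def mySplit(s, ds):
    res = [s]
    for d in ds:
        t = []
        # split every piece collected so far on the current delimiter
        for x in res:
            t.extend(x.split(d))
        res = t
    # drop the empty strings produced by adjacent delimiters
    return [x for x in res if x]

s = "asd;aad|dasd|dasd,sdasd|asd,,Adas|sdasd;Asdasd,d|asd"
result = mySplit(s, ";,|\t")
print(result)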

C:\Users\Administrator>C:\Python\Python27\python.exe E:\python-intensive-training\s2.py
['asd', 'aad', 'dasd', 'dasd', 'sdasd', 'asd', 'Adas', 'sdasd', 'Asdasd', 'd', 'asd']

Use the regular-expression function re.split() to split the string in a single pass

>>> import re
>>> re.split(r"[,;\t|]+", "asd;aad|dasd|dasd,sdasd|asd,,Adas|sdasd;Asdasd,d|asd")
['asd', 'aad', 'dasd', 'dasd', 'sdasd', 'asd', 'Adas', 'sdasd', 'Asdasd', 'd', 'asd']
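The re.split() approach can be wrapped in a small reusable helper. The following is only a sketch for Python 3: the name split_multi is hypothetical, and it assumes every delimiter is a single character and that empty pieces should be discarded, as in the examples above.

```python
import re

def split_multi(text, delimiters):
    """Split text on any single character in delimiters, dropping empty pieces."""
    # re.escape guards characters such as "|" or "]" that are special in a regex.
    pattern = "[" + re.escape(delimiters) + "]+"
    return [piece for piece in re.split(pattern, text) if piece]

s = "asd;aad|dasd|dasd,sdasd|asd,,Adas|sdasd;Asdasd,d|asd"
parts = split_multi(s, ";,|\t")
print(parts)
```

The final filter matters when the text begins or ends with a delimiter, because re.split() then returns an empty string at that end.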

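How do you check whether string a starts or ends with string b?

Practical example

Suppose a directory contains the following files:

quicksort.c
graph.py
heap.java
install.sh
stack.cpp
......

We now need to grant execute permission to the files whose names end in .sh or .py.

Solution

Use the string methods startswith() and endswith()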

>>> import os, stat
>>> os.listdir("./")
['heap.java', 'quicksort.c', 'stack.cpp', 'install.sh', 'graph.py']
>>> [name for name in os.listdir("./") if name.endswith((".sh", ".py"))]
['install.sh', 'graph.py']
>>> os.chmod("install.sh", os.stat("install.sh").st_mode | stat.S_IXUSR)

[root@iZ28i253je0Z t]# ls -l install.sh
-rwxr--r-- 1 root root 0 Sep 15 18:13 install.sh
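The session above changes one file by hand. A possible way to apply the same steps to every matching file is sketched below for Python 3; the function name make_executable and its parameters are hypothetical, and it assumes a POSIX system where the owner-execute bit is meaningful.

```python
import os
import stat

def make_executable(directory, suffixes=(".sh", ".py")):
    """Add the owner-execute bit to regular files in directory whose names
    end with one of suffixes, and return the names that were changed."""
    changed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # endswith() accepts a tuple of suffixes, but not a list.
        if os.path.isfile(path) and name.endswith(tuple(suffixes)):
            mode = os.stat(path).st_mode
            os.chmod(path, mode | stat.S_IXUSR)
            changed.append(name)
    return changed
```

Calling make_executable("./") in the directory listed above would be expected to return the .py and .sh file names. The existing permission bits are kept because the new bit is OR-ed into the current mode.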

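How do you change the format of text inside a string?

Practical example

A program's log file uses the date format yyyy-mm-dd:

2016-09-15 18:27:26 statu unpacked python3-pip:all
2016-09-15 19:27:26 statu half-configured python3-pip:all
2016-09-15 20:27:26 statu installd python3-pip:all
2016-09-15 21:27:26 configure asdasdasdas:all python3-pip:all

We need to convert the dates to the US format mm/dd/yyyy, for example 2016-09-15 --> 09/15/2016. How can this be done?

Solution

Use the regular-expression function re.sub() to do the string replacement

Use capture groups in the regular expression to capture each part of the date, then rearrange the order of the groups in the replacement string.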

>>> log = "2016-09-15 18:27:26 statu unpacked python3-pip:all"
>>> import re
# Refer to the groups by position
>>> re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\2/\3/\1", log)
'09/15/2016 18:27:26 statu unpacked python3-pip:all'
# Refer to the groups by name
>>> re.sub(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", r"\g<month>/\g<day>/\g<year>", log)
'09/15/2016 18:27:26 statu unpacked python3-pip:all'
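Because re.sub() replaces every match, the same pattern can convert a whole multi-line log in one call. The sketch below assumes Python 3; the names DATE and to_us_dates are illustrative only, and the pattern does not check that the digits form a valid date.

```python
import re

# Named groups keep the replacement readable when the order changes.
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

def to_us_dates(text):
    """Rewrite every yyyy-mm-dd date in text as mm/dd/yyyy."""
    return DATE.sub(r"\g<month>/\g<day>/\g<year>", text)

log = (
    "2016-09-15 18:27:26 statu unpacked python3-pip:all\n"
    "2016-09-15 19:27:26 statu half-configured python3-pip:all\n"
)
converted = to_us_dates(log)
print(converted)
```

Compiling the pattern once is a convenience when the function is called for many lines or files.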

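How do you join many small strings into one large string?

Practical example

While designing a network program, we defined a custom UDP-based protocol that passes a series of parameters to the server in a fixed order:

hwDetect: "<0112>"
gxDepthBits: "<32>"
gxResolution: "<1024x768>"
gxRefresh: "<60>"
fullAlpha: "<1>"
lodDist: "<100.0>"
DistCull: "<500.0>"

In the program we collect the parameters, in order, into a list:

["<0112>","<32>","<1024x768>","<60>","<1>","<100.0>","<500.0>"]

Finally we need to concatenate the parameters into a single packet to send:

"<0112><32><1024x768><60><1><100.0><500.0>"

Solution

Iterate over the list and append each string in turn with the "+" operator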

>>> result = ""
>>> for n in ["<0112>","<32>","<1024x768>","<60>","<1>","<100.0>","<500.0>"]:
...     result += n
...
>>> result
'<0112><32><1024x768><60><1><100.0><500.0>'

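Use the str.join() method to concatenate all the strings in the list more quickly: repeated "+" generally creates a new intermediate string on each iteration, whereas join() assembles the result in a single call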

>>> result = "".join(["<0112>","<32>","<1024x768>","<60>","<1>","<100.0>","<500.0>"])
>>> result
'<0112><32><1024x768><60><1><100.0><500.0>'

If the list contains numbers, a generator expression can convert them to strings first:

>>> hello = [222, "sd", 232, "2e", 0.2]
>>> "".join(str(x) for x in hello)
'222sd2322e0.2'
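Combining join() with a generator expression, the packet from the example could be built directly from the raw values. This is a minimal sketch for Python 3: the helper build_packet and the assumption that each value is simply wrapped in "<" and ">" are illustrative, not part of the protocol described above.

```python
def build_packet(values):
    """Wrap each value in angle brackets and concatenate them in order."""
    # format() converts numbers to text, so mixed lists are accepted.
    return "".join("<{}>".format(value) for value in values)

values = ["0112", 32, "1024x768", 60, 1, 100.0, 500.0]
packet = build_packet(values)
print(packet)
```

The generator expression avoids building a temporary list of formatted strings before joining them.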

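How do you left-align, right-align, and center strings?

Practical example

A dictionary stores a set of attribute values:

{"ip": "127.0.0.1", "blog": "www.anshengme.com", "title": "Hello world", "port": "80"}

In the program we want to print its contents in the following format. How can this be done?

ip    : 127.0.0.1
blog  : www.anshengme.com
title : Hello world
port  : 80

Solution

Use the string methods str.ljust(), str.rjust(), and str.center() for left, right, and center alignment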

>>> info = {"ip": "127.0.0.1", "blog": "www.anshengme.com", "title": "Hello world", "port": "80"}
# Length of the longest key in the dictionary
>>> max(map(len, info.keys()))
5
>>> w = max(map(len, info.keys()))
>>> for k in info:
...     print(k.ljust(w), ":", info[k])
...
# Result
port  : 80
blog  : www.anshengme.com
ip    : 127.0.0.1
title : Hello world

Use the format() function with a format spec such as "<20", ">20", or "^20" to accomplish the same task

>>> for k in info:
...     print(format(k, "^" + str(w)), ":", info[k])
...
port  : 80
blog  : www.anshengme.com
 ip   : 127.0.0.1
title : Hello world
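The three alignment characters can also be used inside a str.format() template. The sketch below assumes Python 3.7 or later, where dictionaries keep insertion order; the helper name aligned_lines is hypothetical.

```python
info = {"ip": "127.0.0.1", "blog": "www.anshengme.com", "title": "Hello world", "port": "80"}
width = max(len(key) for key in info)

def aligned_lines(mapping, width, align="<"):
    """Return 'key : value' lines with each key padded to width.

    align is '<' (left), '>' (right) or '^' (center).
    """
    # Builds a template such as "{:>5} : {}".
    template = "{:" + align + str(width) + "} : {}"
    return [template.format(key, value) for key, value in mapping.items()]

for line in aligned_lines(info, width, ">"):
    print(line)
```

Passing "<" gives the same left-aligned layout as ljust(), and "^" centers each key.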

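How do you remove unwanted characters from a string?

Practical example

Strip the extra whitespace before and after user input: anshengm.com@gmail.com
Filter out the "\r" in text edited on Windows: hello word
Remove the Unicode combining marks (tone marks) from text: "ní hǎo, chī fàn"

Solution

Use the string methods strip(), lstrip(), and rstrip() to remove characters from the ends of a string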

>>> email = "  anshengm.com@gmail.com  "
>>> email.strip()
'anshengm.com@gmail.com'
>>> email.lstrip()
'anshengm.com@gmail.com  '
>>> email.rstrip()
'  anshengm.com@gmail.com'
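The strip family also accepts an optional argument listing the characters to remove, which is useful when the ends hold something other than whitespace. A small sketch for Python 3 follows; the sample strings are made up for illustration.

```python
raw = "---  anshengm.com@gmail.com  +++"

# With an argument, strip() removes any of the listed characters from both
# ends and stops at the first character that is not in the set.
cleaned = raw.strip("-+ ")

# Characters in the middle of the string are never touched.
inner = "-a-b-".strip("-")

print(cleaned)
print(inner)
```

The argument is treated as a set of characters, not as a prefix or suffix string, so their order does not matter.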

To delete a character at a fixed position, use slicing plus concatenation

>>> s = "abc:123"  # assumed sample value
>>> s[:3] + s[4:]
'abc123'

Use the string replace() method or the regular-expression function re.sub() to delete characters at any position

>>> s = " abc 123 xyz"
>>> s.replace(" ", "")
'abc123xyz'

Use re.sub() to delete several different characters at once

>>> import re
>>> text = "\tabc\t123\txyz\ropq\r"  # assumed sample value
>>> re.sub(r"[\t\r]", "", text)
'abc123xyzopq'

The string translate() method can map or delete several different characters at once

# Python 2: string.maketrans() builds a character mapping table
>>> import string
>>> s = "abc123xyz"
>>> s.translate(string.maketrans("abcxyz", "xyzabc"))
'xyz123abc'

# Python 2: the second argument lists the characters to delete
>>> s = "\rasd\t23Ads"
>>> s.translate(None, "\r\t")
'asd23Ads'

# Python 2.7
>>> i = u"ni\u0301 ha\u030co, chi\u0304 fa\u0300n"
>>> i
u'ni\u0301 ha\u030co, chi\u0304 fa\u0300n'
>>> i.translate(dict.fromkeys([0x0301, 0x030c, 0x0304, 0x0300]))
u'ni hao, chi fan'
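The translate() examples above are written for Python 2. A rough Python 3 equivalent is sketched below: str.maketrans() takes the place of string.maketrans(), and its third argument lists the characters to delete. The helper remove_combining_marks is hypothetical; it uses the standard unicodedata module to drop every combining mark instead of listing four code points by hand, which is a different technique from the dictionary used above.

```python
import unicodedata

# Map characters one-to-one.
swap = str.maketrans("abcxyz", "xyzabc")
swapped = "abc123xyz".translate(swap)

# The third argument to str.maketrans() lists characters to delete.
drop_controls = str.maketrans("", "", "\r\t")
stripped = "\rasd\t23Ads".translate(drop_controls)

def remove_combining_marks(text):
    """Decompose text (NFD) and drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

plain = remove_combining_marks("ni\u0301 ha\u030co, chi\u0304 fa\u0300n")
print(swapped, stripped, plain)
```

Normalizing to NFD first means precomposed letters such as "í" are split into a base letter plus a mark, so they are handled as well.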

Permanent link to this article: http://www.linuxidc.com/Linux/2016-09/135215.htm