Python 4-1-2: Regular Expressions Summarized in One Chart, with Implementation Details


ben1949 2017-01-18 08:50:55


Regular expressions must be kept strictly apart from Linux shell wildcards, otherwise the two are easily confused.
Linux shell wildcards: * ? [a-z] {"a","x"} [!a-z] \
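To make the difference concrete, here is a minimal sketch that applies the same text, ab*, once as a shell-style wildcard (through the standard fnmatch module) and once as a regular expression. The sample string is an assumption chosen for illustration.

```python
import fnmatch
import re

name = "abbbc"

# Shell-style wildcard: * stands for any run of characters,
# so "ab*" covers the whole name.
glob_hit = fnmatch.fnmatch(name, "ab*")

# Regular expression: * repeats the preceding character (here b),
# so the match stops before the c.
regex_hit = re.match("ab*", name).group()

print("wildcard ab* accepts the whole name: %s" % glob_hit)
print("regex ab* matched only: %s" % regex_hit)
```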

# -*- coding: utf-8 -*-

import re
from string import ljust  # Python 2 only; on Python 3 call the str.ljust method instead

1.1 re.match starts matching at the beginning of the string. If nothing matches there it returns None, and calling m.group() on that result raises AttributeError.

regx = re.compile("abc")
m11 = re.match(regx, "abcdefg")
print("m11 match is %s" % m11.group())
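Since a failed re.match returns None, it is usually safer to test the result before calling group(). A minimal sketch, where first_match is a hypothetical helper name introduced only for this example:

```python
import re

def first_match(pattern, text):
    """Return the matched text, or None when re.match finds nothing."""
    m = re.match(pattern, text)
    if m is None:
        return None
    return m.group()

hit = first_match("abc", "abcdefg")    # pattern found at the start
miss = first_match("abc", "xabcdefg")  # pattern is present, but not at the start

print("hit is %r, miss is %r" % (hit, miss))
```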

1.2 . matches any character except \n; to match newlines as well, use (.|\n)

m12 = re.match("ab(.|\n)cd", "ab\ncd")
print("m12 match is %s" % m12.group())
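As an alternative to the (.|\n) workaround, the re.DOTALL flag makes . match newlines too. A small sketch comparing the two behaviours on the same input:

```python
import re

text = "ab\ncd"

# Without the flag, . refuses to match the newline.
without_flag = re.match("ab.cd", text)

# With re.DOTALL, . matches any character, including the newline.
with_flag = re.match("ab.cd", text, re.DOTALL)

print("without flag: %r" % without_flag)
print("with DOTALL: %r" % with_flag.group())
```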

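1.3 \ is the escape character; a literal * can be written either as \* or as [*]

m13 = re.match(r"\*", "*abc")   # escaped with a backslash
m13 = re.match("[*]", "*abc")   # same result using a character set
print("m13 match is %s" % m13.group())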

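1.4 [...] is a character set and matches any single character listed in it; a leading ^, as in [^abc], negates the set so that it matches any character not listed

m14 = re.match("[^abcdef]", "zabcxyz")
print("m14 match is %s" % m14.group())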

2.1 Predefined class \d matches a digit; for ASCII text it is equivalent to [0-9]

m21 = re.match(r"\d", "321a")
print("m21 match is %s" % m21.group())

2.2 Predefined class \D matches a non-digit, equivalent to [^0-9]

m22 = re.match(r"\D", "a321")
print("m22 match is %s" % m22.group())

2.3 \s matches a whitespace character, equivalent to [ \t\r\n\f\v]

m23 = re.match(r"\s", "    abc")
print("m23 match is %s" % m23.group())

2.4 \S matches a non-whitespace character, equivalent to [^\s]

m24 = re.match(r"\S", "abc")
print("m24 match is %s" % m24.group())

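2.5 \w matches a word character; for ASCII text it is equivalent to [A-Za-z0-9_] (note that the underscore is included)

m25 = re.match(r"\w", "abc")
print("m25 match is %s" % m25.group())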

2.6 \W matches a non-word character, equivalent to [^\w]

m26 = re.match(r"\W", "?abc")
print("m26 match is %s" % m26.group())

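3.1 * matches the preceding character zero or more times; the Linux shell wildcard *, by contrast, stands for any characters in any number

m31 = re.match("ab*", "abbbbbbbbbbc")
print("m31 match is %s" % m31.group())
m311 = re.match("ab*", "a")
print("m311 match is %s" % m311.group())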

3.2 + matches the preceding character one or more times

m32 = re.match("ab+", "abbbbbbbbbbc")
print("m32 match is %s" % m32.group())
m321 = re.match("ab+", "ab")
print("m321 match is %s" % m321.group())

3.3 ? matches the preceding character zero times or once

m33 = re.match("ab?", "a")
print("m33 match is %s" % m33.group())

3.4 {m} matches the preceding character exactly m times

m34 = re.match("a{5}", "aaaaab")
print("m34 match is %s" % m34.group())

3.5 {m,n} matches the preceding character at least m times and at most n times (m <= n)

m35 = re.match("a{3,5}", "aaabbb")
print("m35 match is %s" % m35.group())

4.1 ^ anchors the match at the start of the string: ^abc matches strings that begin with abc

m41 = re.match("^abc", "abcdef")
print("m41 match is %s" % m41.group())

4.2 $ anchors the match at the end of the string: c$ matches strings that end with c

m42 = re.search(r"c$", "2abc")
print("m42 match is %s" % m42.group())

4.3 \A matches only at the very start of the string: \Aa matches strings that begin with a

m43 = re.match(r"\Aa", "abc")
print("m43 match is %s" % m43.group())

4.4 \Z matches only at the very end of the string: c\Z matches strings that end with c

m44 = re.search(r"c\Z", "abc")
print("m44 match is %s" % m44.group())
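The two end anchors $ and \Z differ when the string ends with a newline, and ^ and $ change meaning under re.MULTILINE. A small sketch with made-up input strings:

```python
import re

text = "abc\n"

# $ also matches just before a trailing newline.
dollar = re.search(r"c$", text)

# \Z matches only at the absolute end of the string, which is after the newline.
strict = re.search(r"c\Z", text)

# With re.MULTILINE, ^ and $ anchor at the start and end of every line.
lines = re.findall(r"^\w+$", "abc\ndef", re.MULTILINE)

print("dollar: %r, strict: %r, lines: %r" % (dollar.group(), strict, lines))
```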

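4.5 \b matches the empty position between a \w character and a \W character (a word boundary)

# The boundary sits between a (a word character) and ? (a non-word character).
m45 = re.match(r"a\b.c", "a?c")
print("m45 match is %s" % m45.group())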
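A more typical use of \b is whole-word matching. The sketch below also shows why the pattern should be a raw string: in an ordinary string literal, "\b" is the backspace character rather than a word boundary. The sample sentence is invented for illustration.

```python
import re

sentence = "cat concat cat! category"

# Only the standalone occurrences of cat have a boundary on both sides.
whole_words = re.findall(r"\bcat\b", sentence)

# Without the r prefix, "\b" is a single backspace character,
# so this pattern looks for backspace followed by cat and finds nothing.
no_raw = re.search("\bcat", sentence)

print("whole words: %r" % whole_words)
print("non-raw pattern result: %r" % no_raw)
```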

4.6 [^\b]

5.1 | matches either the expression on its left or the one on its right

m51 = re.match("abc|abd", "abdxyz")
print("m51 match is %s" % m51.group())

5.2 () groups a sub-pattern so that it is matched and captured as a unit

m52 = re.match("(abc)", "abcxabc")
print("m52 match is %s" % m52.group())

5.3 (?P<name>...) is a group that, in addition to its number, gets the alias name

m53 = re.match("(?P<name>123)", "123")
print("m53 match is %s" % m53.group())
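A named group can be read back by name, by number, or all at once through groupdict(). A minimal sketch; the group names year and month are chosen only for this example:

```python
import re

m = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})", "2017-01")

by_name = m.group("year")   # look the group up by its alias
by_number = m.group(1)      # the same group still has its number
as_dict = m.groupdict()     # every named group as a dictionary

print("by name: %s, by number: %s" % (by_name, by_number))
print("as dict: %r" % as_dict)
```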

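5.4 \number is a backreference, e.g. r"(abc)-(\1)": it matches the same text that the group with that number matched

m54 = re.match("(abc)-\\1", "abc-abc")      # ordinary string: the backslash must be doubled
print("m54 match is %s" % m54.group())
m541 = re.match(r"(abc)-(\1)", "abc-abc")   # raw string: a single backslash is enough
print("m541 match is %s" % m541.group())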

5.5 (?P<name>...)(?P=name) is a backreference by name: (?P=name) matches the same text that the group named name matched

m55 = re.match("(?P<name>abc)-(?P=name)", "abc-abc")
print("m55 match is %s" % m55.group())

6.1 (?#...) everything after the # inside the parentheses is a comment and is ignored by the matcher

m61 = re.match("asb(?#iambaby)123", "asb123")
print("m61 match is %s" % m61.group())

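6.2 (?=...) positive lookahead: the match succeeds at this position only if ... matches the text that follows, and that text is not consumed

6.3 (?!...) negative lookahead: the match succeeds only if ... does not match the text that follows

6.4 (?<=...) positive lookbehind: the match succeeds only if the text just before this position matches ...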

6.5 (?
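The lookaround entries above have no code of their own, so here is a hedged sketch of how they behave. It assumes that the truncated entry 6.5 refers to negative lookbehind, (?<!...); the sample strings are invented for illustration.

```python
import re

# Positive lookahead: digits only when px follows; px itself is not consumed.
ahead = re.findall(r"\d+(?=px)", "10px 20em 30px")

# Negative lookahead: foo only when bar does not follow.
not_ahead = re.search(r"foo(?!bar)", "foobar foobaz").start()

# Positive lookbehind: digits only when a dollar sign comes right before.
behind = re.findall(r"(?<=\$)\d+", "cost $42, id 7")

# Negative lookbehind (assumed meaning of 6.5): whole numbers
# that are not preceded by a dollar sign.
not_behind = re.findall(r"(?<!\$)\b\d+", "cost $42, id 7")

print("ahead: %r, not_ahead starts at: %d" % (ahead, not_ahead))
print("behind: %r, not_behind: %r" % (behind, not_behind))
```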

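6.6 (?(id/name)yes-pattern|no-pattern) conditional: if the group with the given id or name matched, yes-pattern is tried, otherwise no-pattern (which may be omitted)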
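A small sketch of the conditional construct: a number may be wrapped in parentheses, but a closing parenthesis is required only if the opening one was matched. The pattern and test strings are assumptions made up for this example, and the no-pattern branch is left out.

```python
import re

# Group 1 is the optional opening parenthesis.
# (?(1)\)) demands a closing parenthesis only when group 1 took part in the match.
balanced = re.compile(r"^(\()?\d+(?(1)\))$")

samples = ["(123)", "123", "(123", "123)"]
results = [bool(balanced.match(s)) for s in samples]

for sample, ok in zip(samples, results):
    print("%-6s -> %s" % (sample, ok))
```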

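7.1 Greedy mode: m71.group(1) prints 6-789-123 instead of the expected 123456-789-123, because the greedy .+ consumes as much as it can and leaves only the single digit 6 for the first \d+

m71 = re.match(r".+(\d+-\d+-\d+)", "abcedfasdfa;lasdfjasdfkasdf::123456-789-123")
print("m71 match is %s" % m71.group(1))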

7.2 Non-greedy mode: appending ? to a quantifier (+?, *?, ??, {m,n}?) makes it non-greedy; with .+? the output becomes 123456-789-123

m72 = re.match(r".+?(\d+-\d+-\d+)", "abcedfasdfa;lasdfjasdfkasdf::123456-789-123")
print("m72 match is %s" % m72.group(1))

8.1 re.compile(pattern) returns a pattern object

p = re.compile("abc")
m81 = re.match(p, "abc")
print("m81 match is %s" % m81.group())
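A compiled pattern object also carries match, search and findall as methods, so the pattern does not have to be passed to the module-level functions each time. A minimal sketch; date_re and the sample strings are names and data invented for this example:

```python
import re

date_re = re.compile(r"\d{4}-\d{2}-\d{2}")

at_start = date_re.match("2017-01-18 08:50:55")            # anchored at the start
anywhere = date_re.search("posted 2017-01-18")             # scans the whole string
all_dates = date_re.findall("2017-01-18 and 2017-02-01")   # every occurrence

print("match: %s" % at_start.group())
print("search: %s" % anywhere.group())
print("findall: %r" % all_dates)
```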

8.2 re.match(pattern, string, flags)

m82 = re.match(r"(ben1949)-(\d{4}-\d{2}-\d{2})-(\1)", "ben1949-2017-01-18-ben1949xxxyyywww")
print("m82 match is %s %s" % (m82.group(1), m82.group(2)))

8.3 re.search(pattern, string, flags)

m83 = re.search(r"(ben1949)-(\d{4}-\d{2}-\d{2})-(\1)", "aaaaaaben1949-2017-01-18-ben1949xxxyyywwwaaaaaa")
print("m83 match is %s %s" % (m83.group(1), m83.group(2)))

8.4 re.split(pattern, string)

str1 = "ben1949-2017-01-18-ben1949xxxyyywww"
m84 = re.split("-", str1)
print("m84 is %s" % m84)
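re.split has a few options worth knowing: maxsplit limits the number of cuts, a capturing group keeps the separators in the result, and a character set splits on several delimiters at once. A short sketch reusing the string from above plus two invented inputs:

```python
import re

str1 = "ben1949-2017-01-18-ben1949xxxyyywww"

# Cut only at the first dash.
limited = re.split("-", str1, maxsplit=1)

# A capturing group in the pattern keeps the separators in the result.
with_separators = re.split("(-)", "a-b-c")

# A character set splits on dashes, colons and whitespace alike.
fields = re.split(r"[-:\s]+", "2017-01-18 08:50:55")

print("limited: %r" % limited)
print("with separators: %r" % with_separators)
print("fields: %r" % fields)
```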

8.5 re.findall(pattern, string, flags)

m85 = re.findall("-", str1)
print("m85 is %s" % m85)
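What re.findall returns depends on the groups in the pattern: with no groups it returns the matched strings, with several groups it returns one tuple per match. A sketch on an invented price list:

```python
import re

prices = "english 100.0, china 120.0"

# No groups: a list of the matched substrings.
numbers = re.findall(r"\d+\.\d+", prices)

# Two groups: a list of (title, price) tuples.
pairs = re.findall(r"(\w+)\s+(\d+\.\d+)", prices)

print("numbers: %r" % numbers)
print("pairs: %r" % pairs)
```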

8.6 re.finditer(pattern, string, flags) returns an iterator of match objects (each one prints like <_sre.SRE_Match object at 0x0251F480>)

m86 = re.finditer("-", str1)
for m in m86:
    print("m86 is %s" % m.group())
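Because re.finditer yields full match objects rather than strings, it also gives access to the position of each match. A minimal sketch on the same string:

```python
import re

str1 = "ben1949-2017-01-18-ben1949xxxyyywww"

# start() reports the index at which each dash was found.
positions = [m.start() for m in re.finditer("-", str1)]

print("dash positions: %r" % positions)
```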

8.7 re.sub(pattern, repl, string): raise the price of every book by 2.02

def func(m):
    # Called once per match: group 1 is the title, group 2 the price.
    price = float(m.group(2))
    price += 2.02
    # Left-justify the title to 10 characters, then append the new price.
    return "%s%s" % (m.group(1).strip().ljust(10), price)

str3 = ["english  100.0", "china       120.0"]
m87 = []
for line in str3:
    m87.append(re.sub(r"(\w+\s+)(\d+\.?\d?)", func, line))
for line in m87:
    print("m87 is %s" % line)

8.8 re.subn(pattern, repl, string): like re.sub, but returns a tuple (new_string, number_of_substitutions)

m88 = []
for line in str3:
    m88.append(re.subn(r"(\w+\s+)(\d+\.?\d?)", func, line))
for new_line, count in m88:
    print("m88 is %s (%d substitution made)" % (new_line, count))
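The tuple returned by re.subn is easiest to see on a tiny input. The sketch below also uses the count argument, which caps the number of replacements; the input string is made up for illustration.

```python
import re

# Replace every run of digits and report how many replacements were made.
new_text, replaced = re.subn(r"\d+", "#", "a1b22c333")

# count=1 stops after the first replacement.
first_only, replaced_once = re.subn(r"\d+", "#", "a1b22c333", count=1)

print("%s after %d replacements" % (new_text, replaced))
print("%s after %d replacement" % (first_only, replaced_once))
```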

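8.9 (?P<name>...): named groups can be referenced in the replacement string as \g<name>, e.g. pattern "(?P<year>\d{4})-(?P<month>\d{2})-(?P<date>\d{2})", replacement "\g<date>-\g<month>-\g<year>", string "2017-01-18"

print(re.sub(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<date>\d{2})", r"\g<date>-\g<month>-\g<year>", "2017-01-18"))
print(re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\2/\3/\1", "2017-01-18"))

Because \2 is an escape sequence in an ordinary string literal, the replacement is written as a raw string (r prefix), so that the backslash reaches re.sub unchanged instead of being turned into a control character first.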
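The effect of the r prefix can be checked directly. In this sketch the same replacement is written three ways: as a raw string, with doubled backslashes, and as an ordinary string, where \2, \3 and \1 silently become control characters.

```python
import re

pattern = r"(\d{4})-(\d{2})-(\d{2})"
date = "2017-01-18"

raw = re.sub(pattern, r"\2/\3/\1", date)         # raw string: backreferences work
doubled = re.sub(pattern, "\\2/\\3/\\1", date)   # doubled backslashes: same result
plain = re.sub(pattern, "\2/\3/\1", date)        # ordinary string: control characters

print("raw: %r" % raw)
print("doubled: %r" % doubled)
print("plain: %r" % plain)
```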