Python Basics 13: Regular Expressions

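Regular expressions come up constantly in ops work, most often together with the grep command to pull the content we need out of some text. Python ships with a regular expression module that you can call directly, so this post covers how to use regular expressions in Python. A systematic treatment of regular expressions could fill a book, and since this series is about Python, I will only cover how to use them from Python rather than explaining what regular expressions are or walking through all their rules. That said, I will explain every pattern used in this post, so you should be able to follow it without trouble.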

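Python's regular expression module is called re, so all we need is import re, after which the module's functions are available. Let's start with re.match.

re.match tries to match the pattern at the start of the string and returns a match object if it succeeds. Here is an example: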

>>> import re
>>> line = "please think it"
>>> m = re.match('please', line)
>>> m.group()
'please'

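Here m is the returned match object, and group() is a method of that object which returns the whole matched string. As mentioned, match only matches at the beginning of the string, so here is an example where the match fails: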

>>> line = "think it please"
>>> m = re.match('please', line)
>>> m.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> print(m)
None

As you can see, when the match fails, m is None.
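
Since a failed match returns None, it is usually safer to check the result before calling group(). A minimal sketch of that check follows; the helper name match_prefix is made up for this illustration and is not part of the re module:

```python
import re

def match_prefix(word, line):
    """Return the matched text if `line` starts with `word`, else None.

    Hypothetical helper for illustration only.
    """
    # re.escape makes sure `word` is treated as literal text, not as a pattern
    m = re.match(re.escape(word), line)
    # re.match returns None on failure, so test before calling group()
    if m is None:
        return None
    return m.group()

print(match_prefix("please", "please think it"))   # please
print(match_prefix("please", "think it please"))   # None
```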

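In practice, the string we are looking for usually sits somewhere in the middle of a larger string, and match cannot handle that. That is what search() is for: it works like match, except that it scans the whole string and succeeds as soon as the pattern matches at any position. Using the same example: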

>>> line = "think it please"
>>> m = re.search('please', line, re.M|re.I)     
>>> m.group()
'please'

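The match succeeds. While we are here, a quick word on the two flags:

re.M: ^ and $ match at the start and end of every line in the target string, not only at the start and end of the whole string.

re.I: matching is case-insensitive.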
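
The example above does not actually depend on either flag, so here is a small sketch where they make a difference. The multi-line sample text is invented for the illustration:

```python
import re

# Made-up sample: the word we want starts the second line, capitalized
text = "first line\nPlease think it\nlast line"

# Without flags, ^ only matches at the very start of the whole string
no_flags = re.search(r'^please', text)

# With re.M, ^ also matches right after each newline; re.I ignores case
with_flags = re.search(r'^please', text, re.M | re.I)

print(no_flags)             # None
print(with_flags.group())   # Please
```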

search() looks for the first place in the string where the pattern matches, so let's tweak the example above slightly:

>>> line = "think it please, think it please"
>>> m = re.search('please', line, re.M|re.I)
>>> m.group()
'please'

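There are clearly two occurrences of please, so why was only one matched? That is correct behavior, as the description above explains: search() stops at the first match. To get them all we need another function, findall(), which finds every non-overlapping occurrence of the pattern in the string and returns them as a list. Note that it returns a plain list directly, not a match object. Same example again: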

>>> m = re.findall('please', line, re.M|re.I)
>>> m
['please', 'please']
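
findall() only gives back the matched text. If you also need to know where each match sits, re.finditer is one option (it is not covered elsewhere in this post): it yields a match object per hit. A short sketch using the same line:

```python
import re

line = "think it please, think it please"

# finditer yields one match object per non-overlapping match,
# so the position of each hit is available via start() and end()
positions = [(m.start(), m.end()) for m in re.finditer('please', line, re.I)]

print(positions)   # [(9, 15), (26, 32)]
```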

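Besides searching and matching, there is also a very useful splitting function, re.split. Strings have their own split method too, so how do the two differ? If you are not using any special regular expression syntax, they behave the same, but re.split can handle much more complex splitting. Let's look at an example of what re.split can do: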

>>> line = 'There is a cat on the table'
>>> print(line.split(' '))
['There', 'is', 'a', 'cat', 'on', 'the', 'table']
>>> print(re.split(r'\s+', 'There is a cat on the table'))
['There', 'is', 'a', 'cat', 'on', 'the', 'table']

Seen this way the two do basically the same thing. In the pattern, \s matches any whitespace character and + means one or more. (Older Python versions also accepted \s* here, but since Python 3.7 re.split splits on empty matches as well, so \s* would cut the string between every character.) Things change when the string is less regular, for example when the separator is a varying run of characters or digits. Take line = 'abc12def34ghi' and suppose we want to split on each pair of digits. This is where re.split is needed:

>>> line = 'abc12def34ghi'
>>> print(re.split(r'\d\d', line))
['abc', 'def', 'ghi']

In a pattern, \d matches any decimal digit.
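
Two variations that may be handy, sketched on a slightly different made-up string where the digit runs have different lengths: \d+ splits on a run of any length, and wrapping the pattern in a capturing group keeps the separators in the result.

```python
import re

# Made-up input: one separator has two digits, the other has three
line = 'abc12def345ghi'

# \d+ splits on runs of one or more digits, whatever their length
parts = re.split(r'\d+', line)
print(parts)            # ['abc', 'def', 'ghi']

# A capturing group around the pattern keeps the separators in the list
parts_with_sep = re.split(r'(\d+)', line)
print(parts_with_sep)   # ['abc', '12', 'def', '345', 'ghi']
```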

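That is all for the functions of the re module. Next, let's look at the difference between the group and groups methods of a match object, since these two tend to confuse people. Here is an example found online: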

>>> m = re.match(r"(..)+", "a1b2c3")
>>> m.group(1) 
'c3'
>>> m.group(0)
'a1b2c3'
>>> m.groups()
('c3',)

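There are two things to understand in this example:

1. match matches from the beginning, so why does group(1) give c3?
2. group(0) is the whole match, so why is it missing from groups()?

Here is my explanation. match does start at the beginning, and group(0) shows that the whole string a1b2c3 was matched. Passing a number to group() returns that particular subgroup instead, and because the group (..) is repeated by +, it only keeps the text from its last repetition, which is c3. As for the second question, groups() lists all the groups starting from number 1, so group(0) is never included. There is only one group in this pattern, so the tuple contains just c3. The official documentation says: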

Return a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern. The default argument is used for groups that did not participate in the match; it defaults to None.
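
To make the quoted description more concrete, here is a small sketch with two groups, one of which does not take part in the match. The pattern and input are invented for the illustration:

```python
import re

# Hypothetical pattern: two capturing groups, the second one optional
m = re.match(r'(\w+)@(\w+)?', 'user@')

whole = m.group(0)              # the entire match: 'user@'
first = m.group(1)              # text captured by group 1: 'user'
all_groups = m.groups()         # groups 1..n; unmatched groups become None
with_default = m.groups('n/a')  # same, but with a custom default value

print(all_groups)     # ('user', None)
print(with_default)   # ('user', 'n/a')

# A repeated group such as (..)+ keeps only its last repetition;
# findall with the non-repeated pattern returns every pair instead
pairs = re.findall(r'..', 'a1b2c3')
print(pairs)          # ['a1', 'b2', 'c3']
```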

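To wrap up: grep is an essential tool for filtering out the information we need, so let's write a small script that implements a simple grep. The code is as follows: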

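import re
import argparse


def Main():
    # Command-line arguments: the pattern to look for and the file to scan
    parser = argparse.ArgumentParser()
    parser.add_argument('word', help='specify word to search for')
    parser.add_argument('fname', help='specify file to search')
    args = parser.parse_args()

    # The with block closes the file even if an error occurs
    with open(args.fname) as searchFile:
        # enumerate numbers the lines starting from 1
        for lineNum, line in enumerate(searchFile, 1):
            line = line.strip('\n\r')
            # args.word is treated as a regular expression, matched case-insensitively
            searchResult = re.search(args.word, line, re.M | re.I)
            if searchResult:
                print(str(lineNum) + ': ' + line)


if __name__ == '__main__':
    Main()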

Together with the argparse module, this script takes a keyword and a file to search, and prints every matching line.
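
If you want to reuse the matching logic outside the command line, for example to test it, one possible variation is to pull it into a function that works on any iterable of lines. This is only a sketch; the name grep_lines and its ignore_case parameter are my own invention, not something the script defines:

```python
import re

def grep_lines(pattern, lines, ignore_case=True):
    """Return (line_number, text) pairs for every line matching `pattern`.

    Hypothetical helper: `lines` can be an open file or a list of strings.
    """
    flags = re.I if ignore_case else 0
    # Compile once so the pattern is not re-parsed for every line
    regex = re.compile(pattern, flags)
    results = []
    for num, line in enumerate(lines, 1):
        text = line.rstrip('\r\n')
        if regex.search(text):
            results.append((num, text))
    return results

sample = ["Think it\n", "please think it\n", "PLEASE\n"]
print(grep_lines('please', sample))   # [(2, 'please think it'), (3, 'PLEASE')]
```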

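By now we have covered most of Python's basic features, and anyone who has followed along should be able to write some scripts of their own. That wraps up the basics. Starting with the next post we will look at object-oriented programming in Python. See you then.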
