[Python] Regular Expressions

Language: Python
Package: re
Official re documentation

Online debuggers:
Debuggex
regex101

Purpose: searching for and matching patterns in strings

import re
re.search(pattern, string)

Syntax

-------------------------------------------------------------------------------------

'.'

Matches any single character except "\n".
Example: a.c against abc1234 => abc

'^'

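Matches the start of the input string.
With the MULTILINE flag (re.M), it also matches immediately after each "\n".
Example: ^ab against abc1234 => ab

'$'

Matches the end of the input string, or just before a "\n" that ends the string.
With the MULTILINE flag (re.M), it also matches immediately before each "\n".
Example: 34$ against abc1234 => 34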
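
To see how the two anchors behave with and without MULTILINE, here is a minimal sketch. The sample string text is made up for illustration.

```python
import re

text = "abc1234\nab99"

# Without re.MULTILINE, ^ matches only at the very start of the string.
starts_default = re.findall(r"^ab", text)

# With re.MULTILINE, ^ also matches right after each newline.
starts_multiline = re.findall(r"^ab", text, re.MULTILINE)

# Without re.MULTILINE, $ matches only at the end of the string.
ends_default = re.findall(r"\d\d$", text)

# With re.MULTILINE, $ also matches just before each newline.
ends_multiline = re.findall(r"\d\d$", text, re.MULTILINE)

print(starts_default, starts_multiline)
print(ends_default, ends_multiline)
```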

'*'

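Matches the preceding element zero or more times.
Example: ac* against abc1234 => a

'+'

Matches the preceding element one or more times.
Example: ab.+ against abc1234 => abc1234

'?'

Matches the preceding element zero or one time.
Example: ab? against abc1234 => ab

'*?' '+?' '??'

Makes the quantifier non-greedy, so it matches as few characters as possible. Quantifiers are greedy by default.
The trailing ? can follow *, +, ?, {n}, {n,} and {n,m}.
Example: a+? against aaaaa => a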
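
The difference between greedy and non-greedy matching is easiest to see on a string with repeated delimiters. The following is a small sketch; the sample string html is an assumption chosen for illustration.

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy: .+ grabs as much as it can, so the match runs to the last </b>.
greedy = re.search(r"<b>.+</b>", html).group()

# Non-greedy: .+? stops at the first </b> that lets the pattern succeed.
lazy = re.search(r"<b>.+?</b>", html).group()

print(greedy)
print(lazy)
```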

{m}

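Matches the preceding element exactly m times.
Example: a{3} against aaaaa => aaa

{m,n}

Matches the preceding element between m and n times, as many as possible. Omitting n ({m,}) means no upper limit.
Example: a{1,3} against aaaaa => aaa

'\'

Escapes a special character so that it is matched literally.
Example: a\+ against a+aaaa => a+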

[]

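Matches any one character from a set.
- denotes a range from one character to another, e.g. [a-z]
^ negates the set, e.g. [^a-z]. It must be the first character in the set; anywhere else it is a literal '^'.
Example: [a\-z]b against a-baaa => -b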
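
The rules for ranges, negation, and literal '^' or '-' inside a set can be checked with re.findall. This is a minimal sketch; the sample strings are made up.

```python
import re

sample = "aB-c^"

# A range: any lowercase ASCII letter.
lower = re.findall(r"[a-z]", sample)

# ^ as the first character negates the set.
not_lower = re.findall(r"[^a-z]", sample)

# ^ anywhere else in the set is just a literal caret.
literal_caret = re.findall(r"[a^]", sample)

# An escaped hyphen is a literal '-', not a range operator.
literal_hyphen = re.findall(r"[a\-z]", "a-z b")

print(lower, not_lower, literal_caret, literal_hyphen)
```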

'|'

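Alternation: matches the expression on either side.
Example: (a|b)aa against a-baaa => baa

(...)

Captures the matched substring and stores it in a group.
Example: (a|b)aa against a-baaa => baa, \1 = b (match.group(0) = baa, match.group(1) = b)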

(?aiLmsux)

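Sets matching flags inline.
(?a) makes \w, \W, \b, \B, \d, \D, \s and \S match ASCII only
(?i) ignores case
(?L) makes \w, \W, \b, \B, \s and \S depend on the current locale (since Python 3.6 it is only allowed with bytes patterns)
(?m) makes ^ and $ match at the start and end of every line
(?s) makes '.' match every character, including \n
(?u) Unicode matching (redundant in Python 3, where str patterns match Unicode by default; kept only for backward compatibility)
(?x) ignores whitespace in the pattern and allows # comments, so a pattern can be spread over several lines in a """triple-quoted string"""

Example: (?x)t es #test against test => tes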

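Flags can be combined. Example: (?imx)^aa against a-bA\n#AA => None, because the second line starts with '#', not aa ('#' starts a comment only inside the pattern, not in the string being searched)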
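
A few of these inline flags can be tried side by side. The sketch below uses made-up sample strings and puts each inline flag at the very start of the pattern, which is where recent Python versions require global inline flags to be.

```python
import re

# Inline (?i) and the re.I argument should give the same result.
inline = re.search(r"(?i)hello", "Say HELLO").group()
by_flag = re.search(r"hello", "Say HELLO", re.I).group()

# Without (?s), '.' does not match a newline; with it, '.' matches "\n" too.
without_dotall = re.search(r"a.c", "a\nc")
with_dotall = re.search(r"(?s)a.c", "a\nc").group()

# (?x) ignores whitespace in the pattern and treats "# ..." as a comment.
verbose = re.search(r"(?x)t es  # whitespace and this comment are ignored", "test").group()

print(inline, by_flag, without_dotall, repr(with_dotall), verbose)
```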

(?:...)

Groups the expression without capturing the matched substring.
Example: (?:a|b|c)1 against abc1234 => c1
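
Whether a group captures or not changes what groups() and re.findall return. Here is a short sketch reusing the example patterns from this section.

```python
import re

capturing = re.search(r"(a|b|c)1", "abc1234")
non_capturing = re.search(r"(?:a|b|c)1", "abc1234")

# Both match the same text, but only the capturing version stores a group.
print(capturing.group(0), capturing.groups())
print(non_capturing.group(0), non_capturing.groups())

# findall returns the group contents when the pattern has a capturing group,
# and the whole match when it has none.
found_with_group = re.findall(r"(a|b)aa", "a-baaa")
found_without_group = re.findall(r"(?:a|b)aa", "a-baaa")
print(found_with_group, found_without_group)
```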

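(?P<name>...)

Captures the group and gives it a name.
Example: (?P<aa>a|b|c)1 against abc1234 => c1, group('aa') = c

(?P=name)

Matches the same text that the named group matched earlier.
Example: (?P<aa>a|b|c)1234(?P=aa) against abc1234c => c1234c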
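
Named groups and backreferences are often used to require that an opening and closing delimiter agree. The sketch below is one such use; the group name quote and the sample strings are made up for illustration.

```python
import re

# The closing quote must be the same character the named group captured.
quoted = re.search(r"(?P<quote>['\"]).*?(?P=quote)", 'say "hi" now')
print(quoted.group(), quoted.group("quote"))

# A numbered backreference works the same way: \1 repeats group 1.
doubled = re.search(r"(\w)\1", "hello").group()
print(doubled)
```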

(?#...)

A comment; its content is ignored.
Example: (?:a|b|c)(?#comment)1234 against abc1234 => c1234

(?=...)

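Lookahead: the text that follows must match, but it is not consumed and not captured in a group.
Example: abc(?=1234)123 against abc1234 => abc123

(?!...)

Negative lookahead: the text that follows must not match; it is not consumed and not captured in a group.
Example: ab(?!\d).123 against abc1234 => abc123

(?<=...)

Lookbehind: the text that precedes must match, but it is not consumed and not captured in a group.
Example: c(?<=abc)123 against abc1234 => c123

(?<!...)

Negative lookbehind: the text that precedes must not match; it is not consumed and not captured in a group.
Example: (?<!\d)c123 against abc1234 => c123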
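
Because lookarounds do not consume text, the matched result leaves out whatever they checked. The following sketch illustrates all four forms; the sample strings are assumptions.

```python
import re

# Lookahead: "abc" only if followed by "1234"; the digits are not part of the match.
ahead = re.search(r"abc(?=1234)", "abc1234").group()

# Negative lookahead: "abc" only if NOT followed by "1234".
not_ahead = re.search(r"abc(?!1234)", "abc1234")

# Lookbehind: digits only if preceded by "USD".
behind = re.findall(r"(?<=USD)\d+", "USD100 EUR200 USD300")

# Negative lookbehind: digit runs not preceded by a word character.
not_behind = re.findall(r"(?<!\w)\d+", "a1 22 b3 44")

print(ahead, not_ahead, behind, not_behind)
```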

(?(id/name)yes-pattern|no-pattern)

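If the group with the given id or name matched, yes-pattern is used; otherwise no-pattern is used (no-pattern may be omitted).
Example: (\d)?abc(?(1)\d|) matches all of 1abc1 and all of abc. In 1abc and abc1, a search finds only the abc part.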
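
The conditional pattern from the example can be checked against the four inputs. This sketch uses fullmatch to show which inputs match entirely and search to show what is found otherwise.

```python
import re

pattern = re.compile(r"(\d)?abc(?(1)\d|)")
inputs = ["1abc1", "abc", "1abc", "abc1"]

# fullmatch: a leading digit requires a trailing digit, and vice versa.
full = {s: pattern.fullmatch(s) is not None for s in inputs}

# search: the engine may skip the leading digit and match just "abc".
found = {s: pattern.search(s).group() for s in inputs}

print(full)
print(found)
```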

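\A

Matches only at the start of the input string, regardless of MULTILINE.
Example: (?m)\Aabc against de\nabc => None

\b

Matches at a word boundary, i.e. the start or end of a word.
In an ordinary Python string literal (and inside a character class) \b means backspace, so write the pattern as a raw string r'...' or double the backslash (\\b).
Example: \\bhi\\b.* against history, hi a => hi a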
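
The backspace pitfall is easy to reproduce. Below is a minimal sketch comparing a plain string literal, a raw string, and doubled backslashes on the sample text from the example.

```python
import re

text = "history, hi a"

# In a plain string literal "\b" is the backspace character (\x08),
# so this pattern looks for backspace + "hi" + backspace and finds nothing.
plain = re.search("\bhi\b", text)

# A raw string passes the backslash through to the regex engine.
raw = re.search(r"\bhi\b", text).group()

# Doubling the backslash in a plain literal has the same effect.
doubled = re.search("\\bhi\\b", text).group()

print(plain, raw, doubled)
```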

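\B

Matches at a position that is not a word boundary.
Example: \B1234\B. against 1234a1234c => 1234c

\d

Matches any digit. For Unicode (str) patterns this covers every Unicode decimal digit; with (?a) it is the same as [0-9].
Example: 1\d against 1234 => 12

\D

Matches any character that is not a digit; with (?a) it is the same as [^0-9].
Example: a\D against abc => ab

\s

Matches any whitespace character. For Unicode (str) patterns this covers all Unicode whitespace; with (?a) it is the same as [ \t\n\r\f\v].
Example: a\s against a\nbc => a\n

\S

Matches any character that is not whitespace; with (?a) it is the same as [^ \t\n\r\f\v].
Example: b\S against a\nbc => bc

\w

Matches any word character. For Unicode (str) patterns this covers letters and digits of any language plus the underscore; with (?a) it is the same as [a-zA-Z0-9_].
Example: b\w against a\nbc => bc

\W

Matches any character that is not a word character; with (?a) it is the same as [^a-zA-Z0-9_].
Example: a\W against a\nbc => a\n

\Z

Matches only at the end of the input string, regardless of MULTILINE.
Example: (?m)de\Z against de\nabc => None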
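
The contrast between \A and \Z and the line anchors ^ and $ under MULTILINE can be sketched on the same sample string used in the examples.

```python
import re

text = "de\nabc"

# With (?m), ^ and $ work per line...
line_start = re.search(r"(?m)^abc", text).group()
line_end = re.search(r"(?m)de$", text).group()

# ...while \A and \Z still refer to the whole string.
string_start = re.search(r"(?m)\Aabc", text)
string_end = re.search(r"(?m)de\Z", text)

print(line_start, line_end, string_start, string_end)
```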

re Module

-------------------------------------------------------------------------------------
Simple example

import re
 
# Compile the pattern into a Pattern object
pattern = re.compile(r'hello')

# Try to match at the start of the string; returns None if there is no match
match = pattern.match('hello world!')

if match:
    # Print the matched text
    print(match.group())

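Flags: several can be used together, combined with |

re.A (re.ASCII): same as (?a)
re.I (re.IGNORECASE): same as (?i)
re.L (re.LOCALE): same as (?L)
re.M (re.MULTILINE): same as (?m)
re.S (re.DOTALL): same as (?s)
re.U (re.UNICODE): same as (?u)
re.X (re.VERBOSE): same as (?x)
re.DEBUG: prints debug information about the compiled pattern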

re.compile(pattern, flags=0)

import re
pattern = re.compile(r'hello', re.I | re.M)

pattern = re.compile(r"""\d +  # integer part
                         \.    # decimal point
                         \d *  # fractional part""", re.X)

re.escape(string)

import re
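text = '(123)'
# Backslash-escapes the characters that are special in a regular expression
# (before Python 3.7: every character except ASCII letters, digits and '_').
# Useful when a variable holds text that should be matched literally.
pattern = re.escape(text) # '\\(123\\)'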
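
A typical reason to call re.escape is to search for text that comes from a variable. The sketch below assumes a hypothetical user_input value containing a regex metacharacter.

```python
import re

# Hypothetical user-supplied text that should be matched literally.
user_input = "1+1=2"
haystack = "so 1+1=2 holds"

# Unescaped, '+' is a quantifier, so the pattern means "one or more 1, then 1=2".
unescaped = re.search(user_input, haystack)

# Escaped, the special character is matched literally.
escaped = re.search(re.escape(user_input), haystack)

print(unescaped)
print(escaped.group())
```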

re.search(pattern, string, flags=0)

import re
match = re.search(r'world', 'hello world!')

re.match(pattern, string, flags=0)

import re
# match() only looks at the start of the string, so the result is None
match = re.match(r'world', 'hello world!')

re.fullmatch(pattern, string, flags=0)

import re
# The whole string must match the pattern
match = re.fullmatch(r'hello world', 'hello world')
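
The three functions differ only in where the match is allowed to start and end. A small side-by-side sketch on the same sample string:

```python
import re

text = "hello world!"

# search: the pattern may match anywhere in the string.
found_anywhere = re.search(r"world", text)

# match: the pattern must match starting at index 0.
at_start_fail = re.match(r"world", text)
at_start_ok = re.match(r"hello", text)

# fullmatch: the pattern must cover the entire string.
whole_fail = re.fullmatch(r"hello", text)
whole_ok = re.fullmatch(r"hello world!", text)

print(found_anywhere.span(), at_start_fail, at_start_ok.span())
print(whole_fail, whole_ok.span())
```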

re.split(pattern, string, maxsplit=0, flags=0)

import re
# >> ['ab', 'cd', 'd']
parts = re.split(r'\d', 'ab2cd5d')
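
Two details of re.split are worth a quick sketch: a capturing group in the pattern keeps the separators in the result, and maxsplit limits the number of splits.

```python
import re

text = "ab2cd5d"

# Plain split: separators are dropped.
plain = re.split(r"\d", text)

# With a capturing group, the matched separators are kept in the list.
with_separators = re.split(r"(\d)", text)

# maxsplit=1 splits only at the first match; the rest stays together.
first_only = re.split(r"\d", text, maxsplit=1)

print(plain, with_separators, first_only)
```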

re.findall(pattern, string, flags=0)

import re
# >> ['2', '5']
digits = re.findall(r'\d', 'ab2cd5d')

re.finditer(pattern, string, flags=0)

import re
# Returns an iterator that yields Match objects
matches = re.finditer(r'\d', 'ab2cd5d')
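
Unlike findall, finditer gives access to each Match object, so positions are available as well as the text. A minimal sketch:

```python
import re

# Collect each matched digit together with its start index.
positions = [(m.group(), m.start()) for m in re.finditer(r"\d", "ab2cd5d")]

print(positions)
```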

re.sub(pattern, repl, string, count=0, flags=0)
In repl, use \id, \g<id> or \g<name> to refer to a group.
\10 is read as group 10; to write group 1 followed by a literal '0', use \g<1>0.

import re
# >> 'ab_cd_d'
result = re.sub(r'\d', '_', 'ab2cd5d')
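
The group references described above, plus the count argument and a callable repl, can be sketched as follows. The sample strings and the helper function double are made up for illustration.

```python
import re

# \2 \1 swaps the two captured words.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")

# \g<1>0 means "group 1, then a literal 0" (plain \10 would mean group 10).
times_ten = re.sub(r"(\d)", r"\g<1>0", "a1b2")

# \g<name> refers to a named group.
bracketed = re.sub(r"(?P<d>\d)", r"[\g<d>]", "a1b2")

# count limits how many replacements are made.
first_only = re.sub(r"\d", "_", "ab2cd5d", count=1)


def double(match):
    """Hypothetical helper: replace a number with twice its value."""
    return str(int(match.group()) * 2)


# repl may also be a function that receives each Match object.
doubled = re.sub(r"\d+", double, "3 apples, 10 pears")

print(swapped, times_ten, bracketed, first_only, doubled)
```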

re.subn(pattern, repl, string, count=0, flags=0)
In repl, use \id, \g<id> or \g<name> to refer to a group.
\10 is read as group 10; to write group 1 followed by a literal '0', use \g<1>0.

import re
# Same as sub, but returns (new_string, number_of_substitutions) >> ('abc2dd5', 2)
result = re.subn(r'(\d)(.)', r'\2\1', 'ab2cd5d')

re Match Objects

-------------------------------------------------------------------------------------
match.expand(template)
In template, use \id, \g<id> or \g<name> to refer to a group.
\10 is read as group 10; to write group 1 followed by a literal '0', use \g<1>0.

import re 
pattern = re.compile(r'(\w*) (\w*)')
match = pattern.match('hello world!')
# >> world hello
print(match.expand(r'\2 \1'))

match.group([group1, ...])

import re 
pattern = re.compile(r'(\w*) (\w*)(?P<tt>.*)')
match = pattern.match('hello world!!!')
# group(0) is the whole match >> hello world!!!
print(match.group(0))
# >> hello
print(match.group(1))
# >> world
print(match.group(2))
# >> !!!
print(match.group(3))
# Same group, accessed by name >> !!!
print(match.group('tt'))

match.groups(default=None)

import re 
pattern = re.compile(r'(\w*) (\w*)(?P<tt>.*)')
match = pattern.match('hello world!!!')
#>> ('hello', 'world', '!!!')
print(match.groups())

match.groupdict(default=None)

import re 
pattern = re.compile(r'(\w*) (\w*)(?P<tt>.*)')
match = pattern.match('hello world!!!')
#>> {'tt': '!!!'}
print(match.groupdict())
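
The default argument of groups() and groupdict() only matters when a group did not take part in the match. Here is a small sketch with an optional group; the pattern and the group names key and value are assumptions for illustration.

```python
import re

# The "=value" part is optional, so the value group may stay unmatched.
pattern = re.compile(r"(?P<key>\w+)(?:=(?P<value>\w+))?")

with_value = pattern.match("level=3")
without_value = pattern.match("debug")

# Unmatched groups are reported as None unless a default is given.
print(with_value.groupdict())
print(without_value.groupdict())
print(without_value.groupdict(default=""))
print(without_value.groups(default="n/a"))
```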

match.start([group])
match.end([group])
match.span([group])

import re 
pattern = re.compile(r'(\d*) (\d*)(?P<tt>.*)')
match = pattern.match('012345 789!!')
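# Index of the first character of group 2 >> 7
print(match.start(2))
# Index of the last character of group 1, plus 1 >> 6
print(match.end(1))
# (start, end) >> (10, 12)
print(match.span(3))
# Same group, accessed by name >> (10, 12)
print(match.span('tt'))
# Defaults to group 0, the whole match >> (0, 12)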
print(match.span())
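
Since end() is one past the last matched character, a span can be used directly as a slice. The sketch below also shows what is reported for a group that did not take part in the match; the second pattern is made up for illustration.

```python
import re

text = "012345 789!!"
match = re.match(r"(\d*) (\d*)(?P<tt>.*)", text)

# Slicing the original string with a span gives back the group's text.
start, end = match.span(2)
sliced = text[start:end]
print(sliced, match.group(2))

# A group that exists but did not participate in the match reports -1.
optional = re.match(r"(a)?b", "b")
print(optional.start(1), optional.span(1))
```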

