Cheat Sheet for Python Regular Expressions
Regular expressions (regex) are a powerful tool for searching and processing text using specific patterns. In Python, you can work with regex through the re module.
import re
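To escape special characters, you can use:
- A backslash (\) before the character, for example \. to match a literal dot
- Raw strings (r"..."), so that Python passes backslashes to the regex engine unchanged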
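As a minimal sketch of both escaping approaches (the sample strings are made up for illustration), the snippet below compares an escaped and an unescaped dot, shows that a raw string saves doubling the backslash, and uses re.escape() from the standard library to escape a whole literal:

```python
import re

text = "Total: 3.50 (approx.)"

# An escaped dot matches only a literal "."; an unescaped dot matches any character.
escaped = re.findall(r"\d\.\d+", text)
unescaped = re.findall(r"\d.\d+", "3x50")
print(escaped)    # ['3.50']
print(unescaped)  # ['3x50']

# A normal string needs "\\d" to produce the two characters \d;
# a raw string lets you write the backslash once.
same_pattern = ("\\d+" == r"\d+")
print(same_pattern)  # True

# re.escape() escapes every special character in a literal string.
literal = re.escape("(approx.)")
found = re.search(literal, text).group()
print(found)  # (approx.)
```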
Main Functions for Working with Regex
- re.match(r'pattern', string) - Matches the pattern at the beginning of the string.
- re.search() - Finds the first occurrence and returns a match object (or None if nothing matches).
- match.span() - Returns a tuple with the start and end positions of the match.
- match.string - Holds the string passed to re.search() (an attribute, not a method).
- match.group() - Returns the matched substring.
- re.findall() - Finds all non-overlapping matches and returns them as a list.
- re.split() - Splits the string by the given pattern.
- re.sub() - Replaces each matched substring with another string.
- re.compile() - Compiles a regex pattern into a regex object for reuse.
Here match is the object returned by re.match() or re.search(); span(), string, and group() belong to that object, not to the re module.
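The functions above can be tried together in one short sketch. The sample sentence is an arbitrary choice, and m is just an illustrative name for the match object returned by re.search():

```python
import re

text = "The rain in Spain"

# re.match() only succeeds at the very beginning of the string.
at_start = re.match(r"The", text) is not None
not_at_start = re.match(r"rain", text) is None
print(at_start, not_at_start)  # True True

# re.search() returns a match object for the first occurrence.
m = re.search(r"ai", text)
print(m.span())    # (5, 7)
print(m.group())   # ai
print(m.string)    # The rain in Spain

# findall, split, and sub process every match in the string.
all_matches = re.findall(r"ai", text)
parts = re.split(r"\s", text)
replaced = re.sub(r"\s", "_", text)
print(all_matches)  # ['ai', 'ai']
print(parts)        # ['The', 'rain', 'in', 'Spain']
print(replaced)     # The_rain_in_Spain

# A compiled pattern exposes the same operations as methods.
compiled = re.compile(r"ai")
compiled_matches = compiled.findall(text)
print(compiled_matches)  # ['ai', 'ai']
```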
Basic Syntax Elements
- . - Any character except a newline.
- ^ - Start of the string.
- $ - End of the string.
- * - 0 or more occurrences.
- + - 1 or more occurrences.
- ? - 0 or 1 occurrence.
- {n} - Exactly n occurrences.
- {n,m} - Between n and m occurrences.
- [] - Character set.
- \ - Escape character.
- | - Logical OR.
- () - Grouping expressions.
- [^...] - Any character except the ones listed in the brackets.
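A few of these elements in action, as a small sketch with made-up sample strings. The variable names are only for illustration:

```python
import re

# Quantifiers: *, +, ?, and {n,m}
star = re.findall(r"ab*", "a ab abbb")
plus = re.findall(r"ab+", "a ab abbb")
optional = re.findall(r"colou?r", "color colour")
counted = re.findall(r"\d{2,3}", "1 22 333 4444")
print(star)      # ['a', 'ab', 'abbb']
print(plus)      # ['ab', 'abbb']
print(optional)  # ['color', 'colour']
print(counted)   # ['22', '333', '444']

# Alternation, negated sets, and anchors
either = re.findall(r"cat|dog", "cat, dog, bird")
negated = re.findall(r"[^0-9]", "a1b2")
first_word = re.findall(r"^\w+", "hello world")
last_word = re.findall(r"\w+$", "hello world")
print(either)      # ['cat', 'dog']
print(negated)     # ['a', 'b']
print(first_word)  # ['hello']
print(last_word)   # ['world']

# Grouping: (?:...) repeats a whole sequence without capturing it.
grouped = re.findall(r"(?:ha)+", "ha haha")
print(grouped)  # ['ha', 'haha']
```

Note that with a capturing group, such as (ha)+, re.findall() returns the group contents rather than the whole match, which is why the non-capturing form is used here.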
Special Characters
- \w - Any letter, digit, or underscore.
- \W - Anything except letters, digits, and underscores.
- \d - Any digit (0-9).
- \D - Any non-digit character.
- \s - Any whitespace character (space, tab, newline, and so on).
- \S - Anything except whitespace.
- \A - Start of the string.
- \Z - End of the string.
- \b - Word boundary.
- \B - Non-word boundary.
- \n, \t, \r - Newline, tab, carriage return.
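The following sketch illustrates several of these sequences on invented sample strings. It also contrasts \A with ^ under re.M, since \A matches only at the very start of the string:

```python
import re

# \b limits the match to the standalone word; \B requires being inside a word.
whole_words = re.findall(r"\bcat\b", "cat concat cats")
inner_pos = re.search(r"\Bcat", "cat concat").start()
print(whole_words)  # ['cat']
print(inner_pos)    # 7

# \s covers spaces, tabs, and newlines alike.
tokens = re.split(r"\s+", "a b\tc\nd")
print(tokens)  # ['a', 'b', 'c', 'd']

# \d and \W
digits = re.findall(r"\d", "a1b22")
non_word = re.findall(r"\W", "hi, you!")
print(digits)    # ['1', '2', '2']
print(non_word)  # [',', ' ', '!']

# \A ignores re.M, while ^ matches at the start of every line.
lines = "line one\nline two"
string_start = re.findall(r"\Aline", lines, re.M)
line_starts = re.findall(r"^line", lines, re.M)
print(string_start)  # ['line']
print(line_starts)   # ['line', 'line']
```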
Regular Expression Flags
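In Python, flags are passed as constants from the re module (for example re.findall(pattern, text, re.I)) rather than as letters after the pattern:
- g (global) - Not a flag in Python. re.findall(), re.finditer(), and re.sub() already continue searching for matches after the first one.
- re.M or re.MULTILINE (multi-line) - Allows ^ and $ to match the start and end of each line in multi-line text.
- re.I or re.IGNORECASE (insensitive) - Makes the regex case-insensitive.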
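In Python, these flags are spelled re.M (re.MULTILINE) and re.I (re.IGNORECASE) and are passed as an extra argument. A short sketch with an invented three-line string shows their effect, including combining them with |:

```python
import re

text = "First line\nsecond line\nThird line"

# Without re.M, ^ matches only at the start of the whole string.
default_start = re.findall(r"^\w+", text)
multiline_start = re.findall(r"^\w+", text, re.M)
print(default_start)    # ['First']
print(multiline_start)  # ['First', 'second', 'Third']

# re.I ignores case.
case_sensitive = re.findall(r"third", text)
case_insensitive = re.findall(r"third", text, re.I)
print(case_sensitive)    # []
print(case_insensitive)  # ['Third']

# Flags are combined with the bitwise OR operator.
combined = re.findall(r"^t\w+", text, re.I | re.M)
print(combined)  # ['Third']
```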
Simple Regular Expression Examples
1. Finding words starting with "a" and ending with "z"
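import re

pattern = r'\ba\w*z\b'
text = "apple, azure, abuzz, puzzle, jazz"
result = re.findall(pattern, text)
print(result) # ['abuzz']
- \b - word boundary, so the match has to cover a whole word.
- \w* - matches any number of word characters (letters, digits, or underscores) between "a" and "z".
Without \b, the pattern would also return fragments of words, such as 'az' (from "azure") and 'azz' (from "jazz").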
2. Finding all numbers in a string
pattern = r'\d+'
text = "Prices: 500 dollars, 3000 euros, and 45 pounds."
result = re.findall(pattern, text)
print(result) # ['500', '3000', '45']
- \d+ - matches one or more digits.
3. Finding words containing the letter "e"
pattern = r'\b\w*e\w*\b'
text = "Hello there, welcome to the world of regex!"
result = re.findall(pattern, text)
print(result) # ['Hello', 'there', 'welcome', 'the', 'regex']
- \b - word boundary.
- \w* - matches zero or more word characters on either side of the "e".
4. Finding words of a specific length
pattern = r'\b\w{5}\b'
text = "Apple is a fruit, and tiger is an animal."
result = re.findall(pattern, text)
print(result) # ['Apple', 'fruit', 'tiger']
- \w{5} - matches exactly five word characters; the surrounding \b anchors limit the match to whole five-character words.
Complex Regular Expression Examples
1. Phone number matching and validation (format: +1 (999) 123-4567)
pattern = r'\+1\s?\(\d{3}\)\s?\d{3}-\d{4}'
text = "My number is +1 (999) 123-4567, and another one is +1(888)987-6543."
result = re.findall(pattern, text)
print(result) # ['+1 (999) 123-4567', '+1(888)987-6543']
- \+1 - matches the country code.
- \s? - matches an optional whitespace character.
- \(\d{3}\) - matches the area code in parentheses (three digits).
- \d{3}-\d{4} - matches the phone number part with hyphen.
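re.findall() locates numbers inside longer text; for validating a single input, re.fullmatch() is a closer fit because it requires the whole string to match. The sketch below assumes the same +1 (999) 123-4567 format and adds capturing groups to the pattern. The names PHONE, is_valid_phone, and area_codes are hypothetical helpers, not part of the re module:

```python
import re

# Same shape as the pattern above, with capturing groups added
# around the area code and the two halves of the number.
PHONE = re.compile(r'\+1\s?\((\d{3})\)\s?(\d{3})-(\d{4})')

def is_valid_phone(value):
    """Return True only if the entire string is a phone number."""
    return PHONE.fullmatch(value) is not None

def area_codes(text):
    """Extract the area code (group 1) of every number found in text."""
    return [m.group(1) for m in PHONE.finditer(text)]

print(is_valid_phone("+1 (999) 123-4567"))               # True
print(is_valid_phone("My number is +1 (999) 123-4567"))  # False

text = "My number is +1 (999) 123-4567, and another one is +1(888)987-6543."
print(area_codes(text))  # ['999', '888']
```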
2. Email validation
pattern = r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+'
# Placeholder addresses for illustration
text = "My emails: user.name@example.com, info@example.co.uk"
result = re.findall(pattern, text)
print(result) # ['user.name@example.com', 'info@example.co.uk']
- [a-zA-Z0-9_.+-]+ - matches the username, allowing letters, numbers, underscores, dots, plus signs, and dashes.
- @[a-zA-Z0-9-]+ - matches the domain before the dot.
- \.[a-zA-Z0-9-.]+ - matches the domain extension, such as .com or .co.uk. Because this set includes the dot, a period placed directly after an address would become part of the match.
3. Finding IP addresses (format: 192.168.0.1)
pattern = r'\b(?:\d{1,3}\.){3}\d{1,3}\b'
text = "Servers: 192.168.0.1, 10.0.0.255, 8.8.8.8."
result = re.findall(pattern, text)
print(result) # ['192.168.0.1', '10.0.0.255', '8.8.8.8']
- \d{1,3} - matches 1 to 3 digits.
- (?:\d{1,3}\.){3} - repeats the block of digits and dot three times.
- \d{1,3} - the final block of digits.
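The pattern checks only the shape of an address, so it also accepts out-of-range values such as 999.1.1.1. One possible way to tighten this, sketched below, is to keep the regex for finding candidates and check each block numerically afterwards. The names IP_SHAPE and find_valid_ipv4 are hypothetical:

```python
import re

IP_SHAPE = re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}\b')

def find_valid_ipv4(text):
    """Return candidates whose four blocks are all in the range 0-255."""
    candidates = IP_SHAPE.findall(text)
    return [
        ip for ip in candidates
        if all(int(block) <= 255 for block in ip.split("."))
    ]

text = "Servers: 192.168.0.1, 999.1.1.1, 8.8.8.8."
print(IP_SHAPE.findall(text))  # ['192.168.0.1', '999.1.1.1', '8.8.8.8']
print(find_valid_ipv4(text))   # ['192.168.0.1', '8.8.8.8']
```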
4. Finding dates in the format (dd.mm.yyyy)
pattern = r'\b\d{2}\.\d{2}\.\d{4}\b'
text = "The event is on 25.12.2024, the last one was on 01.01.2023."
result = re.findall(pattern, text)
print(result) # ['25.12.2024', '01.01.2023']
- \d{2} - matches the day and month (two digits each).
- \d{4} - matches the year (four digits).
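Like the IP pattern, this one matches the format only: a string such as 31.02.2023 passes even though the date does not exist. A possible follow-up step, shown here as a sketch, is to hand each candidate to datetime.strptime() from the standard library. DATE_SHAPE and find_real_dates are illustrative names:

```python
import re
from datetime import datetime

DATE_SHAPE = re.compile(r'\b\d{2}\.\d{2}\.\d{4}\b')

def find_real_dates(text):
    """Return dd.mm.yyyy strings from text that are actual calendar dates."""
    real = []
    for candidate in DATE_SHAPE.findall(text):
        try:
            # strptime raises ValueError for impossible dates such as 31.02.2023
            datetime.strptime(candidate, "%d.%m.%Y")
        except ValueError:
            continue
        real.append(candidate)
    return real

text = "The event is on 25.12.2024, not on 31.02.2023."
print(DATE_SHAPE.findall(text))  # ['25.12.2024', '31.02.2023']
print(find_real_dates(text))     # ['25.12.2024']
```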
5. Removing all HTML tags from a text
pattern = r'<.*?>'
html_text = "<div><h1>Title</h1><p>Article text here</p></div>"
clean_text = re.sub(pattern, '', html_text)
print(clean_text) # TitleArticle text here
- <.*?> - matches anything between angle brackets (HTML tags).
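The ? after .* makes the quantifier lazy, which matters here. A brief comparison on the same input shows what the greedy form does instead:

```python
import re

html_text = "<div><h1>Title</h1><p>Article text here</p></div>"

# Greedy: .* runs to the last ">" in the string, so one match swallows everything.
greedy = re.sub(r'<.*>', '', html_text)
print(repr(greedy))  # ''

# Lazy: .*? stops at the first ">" it can, so each tag is matched separately.
lazy = re.sub(r'<.*?>', '', html_text)
tags = re.findall(r'<.*?>', html_text)
print(lazy)  # TitleArticle text here
print(tags)  # ['<div>', '<h1>', '</h1>', '<p>', '</p>', '</div>']
```

This approach suits simple, well-formed snippets; for arbitrary HTML, a parser is generally a safer choice than a regex.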
These examples demonstrate how you can use regular expressions to solve both simple and complex tasks, such as validating input and processing text. You can modify these patterns based on your specific needs.
Useful Links
- Python Regular Expression Documentation
- regexlearn.com - Learn regex step by step.
- regex101.com - Test regular expressions online.