Cheat Sheet for Python Regular Expressions

Regular expressions (regex) are a powerful tool for searching and processing text using specific patterns. In Python, you can work with regex through the re module.

import re

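To match special characters literally, combine:

  • A backslash (\) before the character; for example, \. matches a literal dot.
  • Raw strings (r"..."), so that Python passes backslashes to the regex engine unchanged.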
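
A minimal sketch of both points, using made-up sample strings: the first two calls show what the backslash changes, and the last line shows why the raw-string prefix matters.

```python
import re

# Without escaping, "." matches any character; with a backslash it matches only a dot
unescaped = re.findall(r'3.5', "3.5 3x5")
escaped = re.findall(r'3\.5', "3.5 3x5")
print(unescaped)  # ['3.5', '3x5']
print(escaped)    # ['3.5']

# A raw string keeps the backslash, so the regex engine receives \b (word boundary).
# In a normal string literal, '\b' is a single backspace character instead.
print(len(r'\b'), len('\b'))  # 2 1
```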

Main Functions for Working with Regex

  • re.match(r'pattern', string) - Matches the pattern only at the beginning of the string; returns a match object or None.
  • re.search() - Scans the string, finds the first occurrence, and returns a match object (or None).
  • match.span() - Method of a match object; returns a tuple with the start and end positions of the match.
  • match.string - Attribute of a match object; holds the string passed to re.match() or re.search().
  • match.group() - Method of a match object; returns the matched substring.
  • re.findall() - Finds all non-overlapping matches and returns them as a list.
  • re.split() - Splits the string by the given pattern.
  • re.sub() - Replaces every matched substring with another string.
  • re.compile() - Compiles a regex pattern into a pattern object for reuse.
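
The following sketch runs each of these calls on a short made-up string. Note that span(), group(), and string are read from the match object returned by re.match() or re.search(), not from the re module itself.

```python
import re

text = "cat bat rat"  # sample text chosen for illustration

# re.match succeeds only at the start of the string
at_start = re.match(r'cat', text)
not_at_start = re.match(r'bat', text)   # None: "bat" is not at position 0

# re.search scans the whole string and returns the first match object
m = re.search(r'[br]at', text)
print(m.group(), m.span(), m.string == text)  # bat (4, 7) True

all_words = re.findall(r'\w+', text)          # ['cat', 'bat', 'rat']
parts = re.split(r'\s+', text)                # ['cat', 'bat', 'rat']
replaced = re.sub(r'[cbr]at', 'hat', text)    # 'hat hat hat'

# A compiled pattern offers the same operations as methods
word = re.compile(r'\w+')
count = len(word.findall(text))               # 3
```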

Basic Syntax Elements

  • . - Any character except a newline.
  • ^ - Start of the string.
  • $ - End of the string (or just before a trailing newline).
  • * - 0 or more occurrences.
  • + - 1 or more occurrences.
  • ? - 0 or 1 occurrence.
  • {n} - Exactly n occurrences.
  • {n,m} - Between n and m occurrences (inclusive).
  • [] - Character set.
  • \ - Escape character.
  • | - Logical OR.
  • () - Grouping expressions.
  • [^...] - Any character except the ones listed in the brackets.
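
A few of these elements in action, as a small sketch on invented sample strings:

```python
import re

optional = re.findall(r'colou?r', "color colour")     # ? makes "u" optional
counted = re.findall(r'\d{2,3}', "7 42 123")          # {2,3}: two or three digits
char_set = re.findall(r'gr[ae]y', "gray grey groy")   # [ae]: either "a" or "e"
negated = re.findall(r'[^,]+', "a,b,c")               # [^,]: anything but a comma
either = re.findall(r'(?:cat|dog)s?', "cats and dog") # | inside a (non-capturing) group
anchored = bool(re.search(r'^a.*z$', "abcz"))         # ^ and $ pin both ends

print(optional)  # ['color', 'colour']
print(counted)   # ['42', '123']
print(char_set)  # ['gray', 'grey']
print(negated)   # ['a', 'b', 'c']
print(either)    # ['cats', 'dog']
print(anchored)  # True
```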

Special Characters

  • \w - Any letter, digit, or underscore.
  • \W - Anything except letters, digits, and underscores.
  • \d - Any digit (0-9).
  • \D - Any non-digit character.
  • \s - Any whitespace character (space, tab, newline, and so on).
  • \S - Anything except whitespace.
  • \A - Start of the string.
  • \Z - End of the string.
  • \b - Word boundary.
  • \B - Non-word boundary.
  • \n, \t, \r - Newline, tab, carriage return.
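
The difference between \b and \B, and between a class and its negation, is easiest to see side by side. A short sketch with made-up inputs:

```python
import re

sample = "cat concat cats"

# \b requires a word boundary on each side, so only the standalone word matches
standalone = re.findall(r'\bcat\b', sample)

# \B requires a non-boundary, so only the "cat" inside "concat" is replaced
inside = re.sub(r'\Bcat', 'CAT', sample)

fields = re.split(r'\s+', "a b\tc\nd")             # \s covers space, tab, newline
digits = re.sub(r'\D', '', "+1 (999) 123-4567")    # \D removes every non-digit
word_chars = re.findall(r'\w+', "snake_case, x2!") # \w includes the underscore

print(standalone)  # ['cat']
print(inside)      # cat conCAT cats
print(fields)      # ['a', 'b', 'c', 'd']
print(digits)      # 19991234567
print(word_chars)  # ['snake_case', 'x2']
```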

Regular Expression Flags

  • g (global) - Not a flag in Python. re.findall() and re.finditer() already return every match, and re.sub() replaces all matches by default.
  • re.MULTILINE or re.M (multi-line) - Allows ^ and $ to match the start and end of each line in multi-line text.
  • re.IGNORECASE or re.I (insensitive) - Makes the regex case-insensitive.
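
In Python, flags are constants from the re module passed as an extra argument, and several can be combined with |. A brief sketch on an invented log snippet:

```python
import re

log = "Error: disk full\nerror: retry\nOK"  # sample text for illustration

case_sensitive = re.findall(r'error', log)
case_insensitive = re.findall(r'error', log, re.IGNORECASE)

first_word_only = re.findall(r'^\w+', log)               # ^ matches at string start only
first_word_per_line = re.findall(r'^\w+', log, re.MULTILINE)

# Flags are combined with the bitwise OR operator
combined = re.findall(r'^error', log, re.I | re.M)

print(case_sensitive)       # ['error']
print(case_insensitive)     # ['Error', 'error']
print(first_word_only)      # ['Error']
print(first_word_per_line)  # ['Error', 'error', 'OK']
print(combined)             # ['Error', 'error']
```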

Simple Regular Expression Examples

1. Finding substrings starting with "a" and ending with "z"

import re

pattern = r'a\w*z'
text = "apple, azure, amazing, puzzle, jazz"
result = re.findall(pattern, text)
print(result)  # ['az', 'amaz', 'azz']
  • \w* - matches any number of word characters (letters, digits, or underscores) between "a" and "z".
  • The pattern has no word boundaries, so it also matches inside words. r'\ba\w*z\b' would restrict it to whole words, and no word in this text qualifies.
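
If only whole words are wanted, \b can be added at both ends. The sketch below uses a different, made-up sample text that actually contains such words:

```python
import re

whole_word = r'\ba\w*z\b'
sample = "abuzz azure topaz jazz alcatraz"  # hypothetical sample text

whole_words = re.findall(whole_word, sample)
print(whole_words)  # ['abuzz', 'alcatraz']
```

"azure" starts with "a" but does not end with "z", while "topaz" and "jazz" end with "z" but do not start with "a", so all three are skipped.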

2. Finding all numbers in a string

pattern = r'\d+'
text = "Prices: 500 dollars, 3000 euros, and 45 pounds."
result = re.findall(pattern, text)
print(result)  # ['500', '3000', '45']
  • \d+ - matches one or more digits.

3. Finding words containing the letter "e"

pattern = r'\b\w*e\w*\b'
text = "Hello there, welcome to the world of regex!"
result = re.findall(pattern, text)
print(result)  # ['Hello', 'there', 'welcome', 'the', 'regex']
  • \b - word boundary; \w* - matches zero or more word characters on either side of the "e".

4. Finding words of a specific length

pattern = r'\b\w{5}\b'
text = "Apple is a fruit, and tiger is an animal."
result = re.findall(pattern, text)
print(result)  # ['Apple', 'fruit', 'tiger']
  • \w{5} - matches exactly 5 word characters; the surrounding \b anchors limit matches to whole words of that length.

Complex Regular Expression Examples

1. Phone number matching and validation (format: +1 (999) 123-4567)

pattern = r'\+1\s?\(\d{3}\)\s?\d{3}-\d{4}'
text = "My number is +1 (999) 123-4567, and another one is +1(888)987-6543."
result = re.findall(pattern, text)
print(result)  # ['+1 (999) 123-4567', '+1(888)987-6543']
  • \+1 - matches the country code.
  • \(\d{3}\) - matches the area code in parentheses (three digits).
  • \d{3}-\d{4} - matches the phone number part with hyphen.
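
re.findall() extracts numbers from longer text. To validate a single input, the whole string has to match, which re.fullmatch() checks. A minimal sketch, where is_valid_phone is a hypothetical helper name:

```python
import re

PHONE = re.compile(r'\+1\s?\(\d{3}\)\s?\d{3}-\d{4}')

def is_valid_phone(value):
    """Return True only if the entire string is a phone number in the expected format."""
    return PHONE.fullmatch(value) is not None

print(is_valid_phone("+1 (999) 123-4567"))       # True
print(is_valid_phone("+1(888)987-6543"))         # True
print(is_valid_phone("call +1 (999) 123-4567"))  # False: extra text before the number
print(is_valid_phone("+1 999 123-4567"))         # False: area code lacks parentheses
```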

2. Email validation

pattern = r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+'
text = "My emails are [email protected] and [email protected]."
result = re.findall(pattern, text)
print(result)  # ['[email protected]', '[email protected]']
  • [a-zA-Z0-9_.+-]+ - matches the username, allowing letters, numbers, underscores, dots, plus signs, and dashes.
  • @[a-zA-Z0-9-]+ - matches the domain before the dot.
  • \.[a-zA-Z0-9-.]+ - matches the domain extension, such as .com or .uk.
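
The same pattern can be tried on concrete input. The addresses below are hypothetical placeholders on example domains, not the ones from the original text. The second call illustrates one limitation: the last character class allows dots, so a sentence-ending period directly after an address is captured as well.

```python
import re

pattern = r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+'

# Hypothetical addresses for illustration only
text = "Write to alice@example.com or bob.smith+news@mail.example.co.uk today"
found = re.findall(pattern, text)
print(found)  # ['alice@example.com', 'bob.smith+news@mail.example.co.uk']

# A trailing period becomes part of the match
with_period = re.findall(pattern, "Ask carol@example.org.")
print(with_period)  # ['carol@example.org.']
```

One possible workaround is to strip trailing dots from each result with str.rstrip('.') before using it.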

3. Finding IP addresses (format: 192.168.0.1)

pattern = r'\b(?:\d{1,3}\.){3}\d{1,3}\b'
text = "Servers: 192.168.0.1, 10.0.0.255, 8.8.8.8."
result = re.findall(pattern, text)
print(result)  # ['192.168.0.1', '10.0.0.255', '8.8.8.8']
  • \d{1,3} - matches 1 to 3 digits.
  • (?:\d{1,3}\.){3} - repeats the block of digits and dot three times.
  • \d{1,3} - the final block of digits.
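
Because \d{1,3} accepts any one to three digits, the pattern also matches strings such as 999.1.1.1. One way to handle this, sketched here with a made-up server list, is to filter the candidates in plain Python afterwards:

```python
import re

pattern = r'\b(?:\d{1,3}\.){3}\d{1,3}\b'
text = "Servers: 192.168.0.1, 999.1.1.1, 8.8.8.8."  # hypothetical sample text

candidates = re.findall(pattern, text)

# Keep only addresses whose four parts are all in the range 0-255
valid = [ip for ip in candidates
         if all(int(part) <= 255 for part in ip.split('.'))]

print(candidates)  # ['192.168.0.1', '999.1.1.1', '8.8.8.8']
print(valid)       # ['192.168.0.1', '8.8.8.8']
```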

4. Finding dates in the format (dd.mm.yyyy)

pattern = r'\b\d{2}\.\d{2}\.\d{4}\b'
text = "The event is on 25.12.2024, the last one was on 01.01.2023."
result = re.findall(pattern, text)
print(result)  # ['25.12.2024', '01.01.2023']
  • \d{2} - matches the day and month (two digits each).
  • \d{4} - matches the year (four digits).
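
The pattern checks only the shape of a date, so 31.02.2023 matches too. A possible follow-up step, sketched here with a hypothetical helper is_real_date and made-up sample text, is to pass each candidate through datetime.strptime, which rejects impossible dates:

```python
import re
from datetime import datetime

pattern = r'\b\d{2}\.\d{2}\.\d{4}\b'
text = "Valid: 25.12.2024. Impossible: 31.02.2023."  # hypothetical sample text

def is_real_date(value):
    """Return True if value (dd.mm.yyyy) is an actual calendar date."""
    try:
        datetime.strptime(value, "%d.%m.%Y")
        return True
    except ValueError:
        return False

candidates = re.findall(pattern, text)
real_dates = [d for d in candidates if is_real_date(d)]

print(candidates)  # ['25.12.2024', '31.02.2023']
print(real_dates)  # ['25.12.2024']
```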

5. Removing all HTML tags from a text

pattern = r'<.*?>'
html_text = "<div><h1>Title</h1><p>Article text here</p></div>"
clean_text = re.sub(pattern, '', html_text)
print(clean_text)  # TitleArticle text here
  • <.*?> - matches anything between angle brackets (HTML tags).
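
The ? after .* makes the quantifier non-greedy, which is what keeps each tag a separate match. The sketch below compares it with the greedy form on the same input:

```python
import re

html_text = "<div><h1>Title</h1><p>Article text here</p></div>"

# Greedy: .* runs from the first "<" to the last ">", swallowing the whole string
greedy = re.sub(r'<.*>', '', html_text)

# Non-greedy: .*? stops at the nearest ">", so only the tags are removed
lazy = re.sub(r'<.*?>', '', html_text)

print(repr(greedy))  # ''
print(repr(lazy))    # 'TitleArticle text here'
```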

These examples demonstrate how you can use regular expressions to solve both simple and complex tasks, such as validating input and processing text. You can modify these patterns based on your specific needs.
