Python Regular Expressions (Regex)

Regular Expressions (Regex) are powerful patterns used for searching, extracting, replacing, and manipulating text data. Python provides a built-in module called re for using regex effectively.

This tutorial covers:

  • What is a Regular Expression?
  • Python’s re module.
  • Basic Regex patterns and syntax.
  • Common Regex functions (match(), search(), findall(), sub(), etc.)
  • Advanced regex concepts with practical examples.
  • Regex flags for modifying matching behavior.

1. Introduction to Regular Expressions

Regular Expressions are special text strings that describe search patterns. They help you search text efficiently, validate data, parse logs, extract structured information, and more.

Examples of Regex Usage:
  • Validate email addresses or phone numbers.
  • Extract information from web pages.
  • Search through log files.
  • Replace or format strings.

2. Python’s Regex Module (re)

Python has a built-in module named re, specifically designed for handling regular expressions.

Importing the Module:

import re

3. Basic Regex Patterns

Commonly used regex characters and syntax:
Pattern  Meaning                                  Example
.        Any single character except a newline    a.b matches “acb”, “adb”
^        Start of the string                      ^hello matches strings starting with “hello”
$        End of the string                        world$ matches strings ending with “world”
\d       Any digit (0-9)                          \d\d\d matches “123”
\D       Any non-digit character                  \D\D matches “ab”
\w       Any letter, digit, or underscore         \w\w matches “A1”
\W       Any character not matched by \w          \W matches “!”
\s       Any whitespace character                 hello\sWorld matches “hello World”
\S       Any non-whitespace character             \S\S matches “ab”
[abc]    Any one of a, b, or c                    [aeiou] matches any single vowel
[0-9]    Any digit from 0 to 9                    [1-5] matches “2” or “4”
( )      Grouping                                 (ab)+ matches “ab”, “abab”, “ababab”
+        One or more occurrences                  a+ matches “a”, “aa”, “aaa”
*        Zero or more occurrences                 ab* matches “a”, “ab”, “abb”, “abbb”
?        Zero or one occurrence                   ab? matches “a” or “ab”
{n}      Exactly n occurrences                    a{3} matches “aaa”
{n,m}    Between n and m occurrences (inclusive)  a{2,4} matches “aa”, “aaa”, or “aaaa”
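
A quick way to check a few of these patterns is to run them against the sample strings. The sketch below uses re.fullmatch(), which this tutorial does not otherwise cover; it succeeds only when the whole string matches. The examples list is just an illustrative name.

```python
import re

# Each entry pairs a pattern from the table with a string it should match in full.
examples = [
    (r"a.b", "acb"),
    (r"\d\d\d", "123"),
    (r"hello\sWorld", "hello World"),
    (r"(ab)+", "ababab"),
    (r"ab*", "abbb"),
    (r"a{2,4}", "aaa"),
]

for pattern, candidate in examples:
    # re.fullmatch returns a match object on success and None otherwise.
    matched = re.fullmatch(pattern, candidate) is not None
    print(f"{pattern!r} matches {candidate!r}: {matched}")

# Counter-examples: strings these patterns should reject.
print(re.fullmatch(r"a{3}", "aa"))     # None, only two a's are present
print(re.fullmatch(r"[aeiou]", "x"))   # None, x is not a vowel
```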

4. Python Regex Functions

Common methods of the re module:
  • re.match() – Checks for a match only at the beginning of the string.
  • re.search() – Searches throughout the entire string and returns the first match.
  • re.findall() – Returns a list of all matches.
  • re.finditer() – Returns an iterator yielding match objects.
  • re.sub() – Replaces matches with specified text.
  • re.split() – Splits strings based on a regex pattern.
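
The difference between these functions is easiest to see side by side. The following is a minimal sketch on a made-up sample string, not an exhaustive tour of the module:

```python
import re

text = "Order 66 shipped; order 77 pending."

# match() only looks at the start of the string, so a digit pattern fails here.
print(re.match(r"\d+", text))          # None

# search() scans forward and stops at the first match.
first = re.search(r"\d+", text)
print(first.group(), first.span())     # 66 (6, 8)

# findall() returns plain strings.
print(re.findall(r"\d+", text))        # ['66', '77']

# finditer() yields match objects, which also carry positions.
for m in re.finditer(r"\d+", text):
    print(m.group(), m.start())
```

finditer() is usually the better choice when you need positions or groups for each match, while findall() is enough when only the matched text matters.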

5. Practical Examples with Regex

Example 1: Searching for a pattern
import re

text = "Hello, my number is 123-456-7890."
pattern = r"\d{3}-\d{3}-\d{4}"

match = re.search(pattern, text)

if match:
    print("Phone number found:", match.group())
else:
    print("No match found.")

Explanation:

  • The regex pattern \d{3}-\d{3}-\d{4} matches three digits, a hyphen, three digits, a hyphen, and four digits, which is a common US phone number format.
  • re.search() returns the first occurrence of this pattern.

Output:

Phone number found: 123-456-7890

Example 2: Extracting Multiple Matches (findall())
import re

text = "Emails: alice@example.com, bob@gmail.com, carol@outlook.com"
pattern = r"\w+@\w+\.\w+"

emails = re.findall(pattern, text)
print(emails)

Explanation:

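  • Finds all email addresses in the provided text. The pattern is deliberately simple: \w+ does not match dots or hyphens, so addresses such as first.last@example.co.uk would only be matched in part.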

Output:

['alice@example.com', 'bob@gmail.com', 'carol@outlook.com']

Example 3: Replacing Text (sub())
import re

text = "Today is 2025-03-28"
pattern = r"\d{4}-\d{2}-\d{2}"

new_text = re.sub(pattern, "DATE", text)
print(new_text)

Explanation:

  • Replaces the date format with the word “DATE”.

Output:

Today is DATE

Example 4: Splitting Text (split())
import re

text = "one,two;three four"
pattern = r"[,;\s]"  # split on comma, semicolon, or whitespace

words = re.split(pattern, text)
print(words)

Explanation:

  • Splits the text at each comma, semicolon, or whitespace character.

Output:

['one', 'two', 'three', 'four']

6. Regex Flags

Regex flags modify the behavior of regex matching:

  • re.I or re.IGNORECASE: Ignore uppercase/lowercase distinctions.
  • re.M or re.MULTILINE: ^ and $ match the start/end of each line.
  • re.S or re.DOTALL: . matches newline characters as well.
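
The first two flags can be sketched on a small multi-line string. The log text below is invented for illustration:

```python
import re

log = "error: disk full\nWARNING: retrying\nError: disk full"

# re.I makes the literal "error" match regardless of case.
print(re.findall(r"error", log, re.I))           # ['error', 'Error']

# Without re.M, ^ anchors only at the very start of the string.
print(re.findall(r"^\w+", log))                  # ['error']

# With re.M, ^ also anchors right after each newline.
print(re.findall(r"^\w+", log, re.M))            # ['error', 'WARNING', 'Error']

# Flags can be combined with the | operator.
print(re.findall(r"^error", log, re.I | re.M))   # ['error', 'Error']
```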

Example Using Flags:

import re

text = "Hello\nWorld"
pattern = r".+"

match = re.match(pattern, text, re.S)
print(match.group())

Output:

Hello
World

Without re.S, . stops at the newline and only “Hello” is matched.

7. Advanced Concepts: Grouping and Capturing

Using parentheses () in patterns creates groups that can be extracted separately.

Example: Grouping

import re

text = "My name is Alice and I am 30"
pattern = r"My name is (\w+) and I am (\d+)"

match = re.search(pattern, text)

if match:
    name = match.group(1)
    age = match.group(2)
    print(f"Name: {name}, Age: {age}")
    

Output:

Name: Alice, Age: 30
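
Groups can also be given names, which tends to read better than numeric indexes once a pattern has more than a couple of groups. The sketch below reworks the same sentence with the (?P<name>...) syntax; the group names name and age are chosen here for illustration.

```python
import re

text = "My name is Alice and I am 30"

# (?P<name>...) gives a group a name in addition to its number.
pattern = r"My name is (?P<name>\w+) and I am (?P<age>\d+)"
match = re.search(pattern, text)

if match:
    print(match.groups())        # ('Alice', '30')
    print(match.group("name"))   # Alice
    print(match.groupdict())     # {'name': 'Alice', 'age': '30'}
    print(match.group(0))        # the whole matched text
```

Note that captured values are always strings, so the age would need int() before any arithmetic.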

8. Tips & Best Practices

  • Always test your regex patterns.
  • Use raw strings (r"pattern") to avoid escaping (\) issues.
  • Regex patterns can become complex; keep them readable, and use re.VERBOSE to spread a pattern over several lines with comments.
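
As a rough illustration of the last two tips, the sketch below rewrites the phone number pattern from Example 1 with re.VERBOSE and then shows what goes wrong without a raw string. It uses re.compile(), which is not covered above, to build a reusable pattern object; the variable name phone is arbitrary.

```python
import re

# Under re.VERBOSE, whitespace inside the pattern is ignored and # starts a comment.
phone = re.compile(
    r"""
    \d{3}   # area code
    -       # separator
    \d{3}   # exchange
    -       # separator
    \d{4}   # line number
    """,
    re.VERBOSE,
)

print(phone.search("Call 123-456-7890 today").group())   # 123-456-7890

# Raw strings keep backslashes literal: "\b" is a backspace character,
# while r"\b" is the regex word-boundary token.
print(re.findall("\bcat\b", "cat concat"))    # []
print(re.findall(r"\bcat\b", "cat concat"))   # ['cat']
```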
