Regular Expressions (Regex) are powerful patterns used for searching, extracting, replacing, and manipulating text data. Python provides a built-in module called re for using regex effectively.
This tutorial covers:
- What is a Regular Expression?
- Python’s re module.
- Basic Regex patterns and syntax.
- Common Regex functions (match(), search(), findall(), sub(), etc.).
- Advanced regex concepts with practical examples.
- Regex flags for modifying matching behavior.
1. Introduction to Regular Expressions
Regular Expressions are special text strings that describe search patterns. They let you search text efficiently, validate data, parse logs, extract structured information, and more.
Examples of Regex Usage:
- Validate email addresses or phone numbers.
- Extract information from web pages.
- Search through log files.
- Replace or format strings.
2. Python’s Regex Module (re)
Python has a built-in module named re, specifically designed for handling regular expressions.
Importing the Module:
import re
3. Basic Regex Patterns
Commonly used regex characters and syntax:
Regex Pattern | Meaning | Example |
---|---|---|
. | Any single character | a.b matches “acb”, “adb” |
^ | Start of a string | ^hello matches strings starting with “hello” |
$ | End of a string | world$ matches strings ending with “world” |
\d | Any digit (0-9) | \d\d\d matches “123” |
\D | Any non-digit character | \D\D matches “ab” |
\w | Any word character (alphanumeric or underscore) | \w\w matches “A1” |
\W | Any non-word character (neither alphanumeric nor underscore) | \W matches “!” |
\s | Any whitespace | hello\sWorld matches “hello World” |
\S | Any non-whitespace | \S\S matches “ab” |
[abc] | Any character a, b, or c | [aeiou] matches vowels |
[0-9] | Any digit from 0 to 9 | [1-5] matches “2” or “4” |
( ) | Grouping | (ab)+ matches “ab”, “abab”, “ababab” |
+ | One or more occurrences | a+ matches “a”, “aa”, “aaa” |
* | Zero or more occurrences | ab* matches “a”, “ab”, “abb”, “abbb” |
? | Zero or one occurrence | ab? matches “a” or “ab” |
{n} | Exactly n occurrences | a{3} matches “aaa” |
{n,m} | Between n and m occurrences | a{2,4} matches “aa”, “aaa”, or “aaaa” |
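The table rows can be checked directly in Python. The snippet below is a small sketch, not part of the original tutorial: the `samples` list and the variable names are chosen here for illustration. It uses `re.search()` to test a few table entries and `re.fullmatch()` to show that `{n,m}` really does cap the repetition count.

```python
import re

# Each entry pairs a pattern from the table with a sample string (illustrative data).
samples = [
    (r"a.b", "acb"),
    (r"^hello", "hello there"),
    (r"world$", "hello world"),
    (r"\d\d\d", "123"),
    (r"(ab)+", "ababab"),
    (r"a{2,4}", "aaa"),
]

# re.search() returns a match object, or None when nothing matches.
results = {pattern: re.search(pattern, text) is not None for pattern, text in samples}

for pattern, matched in results.items():
    print(f"{pattern!r}: {matched}")

# re.fullmatch() requires the whole string to match, so five "a"s
# are rejected by a{2,4}, which allows at most four.
too_long = re.fullmatch(r"a{2,4}", "aaaaa")
print(too_long)
```

Note that `re.search(r"a{2,4}", "aaaaa")` would still succeed, because it is allowed to match just the first four characters; `re.fullmatch()` is the stricter check.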
4. Python Regex Functions
Common functions of the re module:
- re.match() – Checks for a match only at the beginning of the string.
- re.search() – Searches throughout the entire string and returns the first match.
- re.findall() – Returns a list of all non-overlapping matches.
- re.finditer() – Returns an iterator yielding match objects.
- re.sub() – Replaces matches with specified text.
- re.split() – Splits strings based on a regex pattern.
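The difference between these functions is easiest to see side by side. The following is a minimal sketch on a made-up sample string (the text and variable names are assumptions for illustration):

```python
import re

text = "cat bat rat"

# re.match() only looks at the start of the string, so "bat" is not found.
at_start = re.match(r"bat", text)

# re.search() scans the whole string and returns the first match object.
anywhere = re.search(r"bat", text)

# re.findall() returns the matched substrings as a list.
all_words = re.findall(r"\w+at", text)

# re.finditer() yields match objects, which also carry positions.
positions = [(m.group(), m.start()) for m in re.finditer(r"\w+at", text)]

print(at_start)
print(anywhere.group(), anywhere.start())
print(all_words)
print(positions)
```

`re.finditer()` is usually the better choice when you need match positions or when the input is large, since it does not build the full list up front.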
5. Practical Examples with Regex
Example 1: Searching for a pattern
import re
text = "Hello, my number is 123-456-7890."
pattern = r"\d{3}-\d{3}-\d{4}"
match = re.search(pattern, text)
if match:
    print("Phone number found:", match.group())
else:
    print("No match found.")
Explanation:
- The regex pattern \d{3}-\d{3}-\d{4} looks for a common US phone number format: three digits, three digits, and four digits, separated by hyphens.
- re.search() returns the first occurrence of this pattern.
Output:
Phone number found: 123-456-7890
Example 2: Extracting Multiple Matches (findall())
import re
text = "Emails: alice@example.com, bob@gmail.com, carol@outlook.com"
pattern = r"\w+@\w+\.\w+"
emails = re.findall(pattern, text)
print(emails)
Explanation:
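- Finds every substring shaped like name@domain.tld in the provided text. The pattern is deliberately simple: \w does not match dots or hyphens, so an address such as first.last@example.com would only be partially matched.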
Output:
['alice@example.com', 'bob@gmail.com', 'carol@outlook.com']
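The pattern in this example works for the sample text, but `\w` does not match dots or hyphens, so dotted names and multi-part domains are cut short. The sketch below compares it with a slightly broader pattern. The `broader` pattern and the sample address are assumptions made for illustration; it is still not a full email validator.

```python
import re

text = "Contact: first.last@mail.example.com"

# Pattern from the example above: stops at the first dot on either side.
simple = r"\w+@\w+\.\w+"

# A somewhat broader, still simplified pattern (an assumption, not a standard):
# allows ., + and - in the local part and multiple dot-separated domain labels.
broader = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"

simple_hits = re.findall(simple, text)
broader_hits = re.findall(broader, text)

print(simple_hits)
print(broader_hits)
```

The group in `broader` is written as `(?:...)`, a non-capturing group, so that `re.findall()` keeps returning whole matches rather than the group contents.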
Example 3: Replacing Text (sub())
import re
text = "Today is 2025-03-28"
pattern = r"\d{4}-\d{2}-\d{2}"
new_text = re.sub(pattern, "DATE", text)
print(new_text)
Explanation:
- Replaces every substring in YYYY-MM-DD form with the word “DATE”.
Output:
Today is DATE
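re.sub() can also reuse parts of the match instead of substituting a fixed word. The sketch below assumes the same sample text and shows two options: backreferences (`\1`, `\2`, `\3`) in the replacement string, and a replacement function. The helper name `mask_year` is hypothetical.

```python
import re

text = "Today is 2025-03-28"
date_pattern = r"(\d{4})-(\d{2})-(\d{2})"

# Backreferences in the replacement refer to the captured groups,
# here reordering year-month-day into day/month/year.
reordered = re.sub(date_pattern, r"\3/\2/\1", text)
print(reordered)

# The replacement may also be a function that receives each match object.
def mask_year(match):
    # Keep month and day, hide the year.
    return "YYYY-" + match.group(2) + "-" + match.group(3)

masked = re.sub(date_pattern, mask_year, text)
print(masked)
```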
Example 4: Splitting Text (split())
import re
text = "one,two;three four"
pattern = r"[,;\s]" # split on comma, semicolon, or whitespace
words = re.split(pattern, text)
print(words)
Explanation:
- Splits the text at each comma, semicolon, or whitespace character.
Output:
['one', 'two', 'three', 'four']
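One caveat with this pattern: because the character class matches a single delimiter at a time, consecutive delimiters (such as a comma followed by a space) produce empty strings in the result. A small sketch on an assumed, messier input shows the effect and how adding `+` avoids it:

```python
import re

messy = "one, two;  three"

# One delimiter per split point: adjacent delimiters leave empty strings behind.
single = re.split(r"[,;\s]", messy)
print(single)

# Adding + treats a run of delimiters as a single separator.
runs = re.split(r"[,;\s]+", messy)
print(runs)
```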
6. Regex Flags
Regex flags modify the behavior of regex matching:
- re.I or re.IGNORECASE: Ignore uppercase/lowercase distinctions.
- re.M or re.MULTILINE: ^ and $ match the start/end of each line.
- re.S or re.DOTALL: . matches newline characters as well.
Example Using Flags:
import re
text = "Hello\nWorld"
pattern = r".+"
match = re.match(pattern, text, re.S)
print(match.group())
Output:
Hello
World
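The other two flags can be sketched the same way. The sample text and variable names below are assumptions for illustration; the last line also shows that flags can be combined with the `|` operator.

```python
import re

text = "First line\nsecond line\nTHIRD line"

# re.I: the lowercase pattern matches the uppercase word.
case_insensitive = re.search(r"third", text, re.I)
print(case_insensitive.group())

# Without re.M, ^ only matches at the very start of the string.
first_only = re.findall(r"^\w+", text)
print(first_only)

# With re.M, ^ matches at the start of every line.
line_starts = re.findall(r"^\w+", text, re.M)
print(line_starts)

# Flags can be combined with the bitwise OR operator.
combined = re.findall(r"^[a-z]+", text, re.I | re.M)
print(combined)
```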
7. Advanced Concepts: Grouping and Capturing
Using parentheses () in patterns creates groups that can be extracted separately.
Example: Grouping
import re
text = "My name is Alice and I am 30"
pattern = r"My name is (\w+) and I am (\d+)"
match = re.search(pattern, text)
if match:
    name = match.group(1)
    age = match.group(2)
    print(f"Name: {name}, Age: {age}")
Output:
Name: Alice, Age: 30
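Groups can also be read all at once, and they change what re.findall() returns. The sketch below goes slightly beyond the example above by using Python's named-group syntax `(?P<name>...)`, which the tutorial does not cover; the sample text and group names are assumptions for illustration.

```python
import re

text = "Alice is 30, Bob is 25"

# Named groups label each captured part; numbered access still works.
pattern = r"(?P<name>[A-Z]\w*) is (?P<age>\d+)"

first = re.search(pattern, text)
print(first.groups())        # all captured groups as a tuple
print(first.group("name"))   # access by group name
print(first.groupdict())     # groups as a dictionary

# When a pattern has several groups, findall() returns one tuple per match.
pairs = re.findall(pattern, text)
print(pairs)
```

Captured values are always strings, so the age needs `int(...)` before any arithmetic.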
8. Tips & Best Practices
- Always test your regex patterns.
- Use raw strings (r"pattern") to avoid escaping (\) issues.
- Regex can become complex; keep patterns readable and add comments (re.VERBOSE).
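The last two tips can be sketched briefly. The first part shows why raw strings matter: in a normal string, "\b" is a backspace character, not the regex word boundary. The second part rewrites the phone number pattern from Example 1 using re.VERBOSE, which ignores whitespace in the pattern and allows # comments. The variable names are chosen here for illustration.

```python
import re

# Raw string: \b reaches the regex engine as a word boundary.
raw_hit = re.search(r"\bcat\b", "a cat sat")

# Normal string: Python turns "\b" into a backspace character first,
# so the pattern looks for literal backspaces and finds nothing.
plain_hit = re.search("\bcat\b", "a cat sat")

print(raw_hit is not None, plain_hit is not None)

# re.VERBOSE lets the pattern span several lines with inline comments.
phone_pattern = re.compile(
    r"""
    \d{3}   # first three digits
    -       # separator
    \d{3}   # next three digits
    -       # separator
    \d{4}   # last four digits
    """,
    re.VERBOSE,
)

match = phone_pattern.search("Hello, my number is 123-456-7890.")
print(match.group())
```

In verbose mode a literal space or `#` in the pattern has to be escaped or placed inside a character class, since both are otherwise ignored.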