What is Regex?
Regular expressions (regex) are patterns for matching text. They're powerful for:
- Searching and finding
- Validating formats
- Extracting data
- Search and replace
Basic Syntax
Literal Characters
Most characters match themselves:
Pattern: hello
Matches: "hello"
Special Characters
These have special meaning (escape with \ to match literally):
. ^ $ * + ? { } [ ] \ | ( )
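When the text you want to match literally comes from a variable, Python's `re.escape` can add the backslashes for you. Here is a minimal sketch; the sample string is made up for illustration:

```python
import re

# re.escape backslash-escapes every regex metacharacter in the input
literal = "price (USD): $5.00?"
pattern = re.escape(literal)

# The escaped pattern matches the text literally, so "." no longer means "any character"
print(bool(re.fullmatch(pattern, literal)))                # True
print(bool(re.fullmatch(pattern, "price (USD): $5X00?")))  # False
```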
Character Classes
Basic Classes
. Any character (except newline)
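\d Digit [0-9] (in Python 3, any Unicode decimal digit)
\D Non-digit
\w Word character [a-zA-Z0-9_] (in Python 3, Unicode letters too)
\W Non-word character
\s Whitespace (space, tab, newline, etc.)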
\S Non-whitespace
Custom Classes
[abc] a, b, or c
[a-z] a through z
[A-Z] A through Z
[0-9] 0 through 9
[^abc] NOT a, b, or c
Examples
Pattern: \d\d\d
Matches: "123", "456", "789"
Pattern: [aeiou]
Matches: any vowel
Pattern: [A-Za-z]
Matches: any letter
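You can try these classes with Python's `re.findall`, which returns every non-overlapping match. The sample strings are illustrative:

```python
import re

three_digits = re.findall(r"\d\d\d", "Order 123 shipped to Apt 4B on day 789")
vowels = re.findall(r"[aeiou]", "regex")
non_vowels = re.findall(r"[^aeiou]", "regex")  # negated class
letters = re.findall(r"[A-Za-z]", "a1B2")

print(three_digits)  # ['123', '789']
print(vowels)        # ['e', 'e']
print(non_vowels)    # ['r', 'g', 'x']
print(letters)       # ['a', 'B']
```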
Quantifiers
How Many
* 0 or more
+ 1 or more
? 0 or 1
{n} Exactly n
{n,} n or more
{n,m} Between n and m
Examples
Pattern: \d+
Matches: "1", "42", "12345"
Pattern: \d{3}
Matches: "123" (exactly 3 digits)
Pattern: \d{2,4}
Matches: "12", "123", "1234"
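A short Python check shows two details that the table leaves implicit. `+` rejects the empty string, and quantifiers are greedy, taking as many characters as they can. The inputs are illustrative:

```python
import re

# fullmatch succeeds only if the entire string fits the pattern
checks = {s: bool(re.fullmatch(r"\d+", s)) for s in ["1", "42", "12345", ""]}
print(checks)  # {'1': True, '42': True, '12345': True, '': False}

# Greedy: {2,4} grabs as many digits as allowed when searching
longest = re.search(r"\d{2,4}", "12345").group()
print(longest)  # 1234

# As a whole-string check, five digits is one too many
print(bool(re.fullmatch(r"\d{2,4}", "12345")))  # False

# ? makes the preceding character optional
spellings = re.findall(r"colou?r", "color colour")
print(spellings)  # ['color', 'colour']
```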
Anchors
Position Markers
^ Start of string (or of each line in multiline mode)
$ End of string (or of each line in multiline mode)
\b Word boundary
\B Not word boundary
Examples
Pattern: ^hello
Matches: "hello world" (starts with hello)
No match: "say hello"
Pattern: world$
Matches: "hello world" (ends with world)
Pattern: \bcat\b
Matches: "the cat sat" (whole word)
No match: "category"
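In Python, whether `^` and `$` see line breaks depends on the `re.MULTILINE` flag. This sketch contrasts the two modes and also checks `\b`. The text is illustrative:

```python
import re

text = "first line\nsecond line"

# By default ^ anchors only at the start of the whole string
first_only = re.findall(r"^\w+", text)
# With re.MULTILINE it also anchors after each newline
per_line = re.findall(r"^\w+", text, re.MULTILINE)

print(first_only)  # ['first']
print(per_line)    # ['first', 'second']

# \b sits between a word character and a non-word character (or string edge)
print(bool(re.search(r"\bcat\b", "the cat sat")))  # True
print(bool(re.search(r"\bcat\b", "category")))     # False
```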
Groups and Alternation
Groups
(abc) Capturing group
(?:abc) Non-capturing group
Alternation
a|b a OR b
(cat|dog) cat OR dog
Examples
Pattern: (Mr|Mrs|Ms)\. \w+
Matches: "Mr. Smith", "Mrs. Jones", "Ms. Wilson"
Pattern: (\d{3})-(\d{4})
Captures: "555-1234" → Group 1: "555", Group 2: "1234"
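In Python, captured groups come back through the match object. Note that `re.findall` returns group tuples instead of whole matches when the pattern has more than one group. The sample strings are illustrative:

```python
import re

m = re.search(r"(\d{3})-(\d{4})", "Call 555-1234 today")
print(m.group(0))  # 555-1234   (the whole match)
print(m.groups())  # ('555', '1234')

# (?:...) groups the alternation but adds nothing to groups()
m2 = re.search(r"(?:Mr|Mrs|Ms)\. (\w+)", "Dear Mrs. Jones,")
print(m2.groups())  # ('Jones',)

# With two or more groups, findall yields one tuple per match
pairs = re.findall(r"(\d{3})-(\d{4})", "555-1234, 555-9876")
print(pairs)  # [('555', '1234'), ('555', '9876')]
```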
Common Patterns
Email
[\w.-]+@[\w.-]+\.\w+
Matches: user@example.com, name.last@domain.org
Phone Number
\d{3}[-.]?\d{3}[-.]?\d{4}
Matches: 555-123-4567, 555.123.4567, 5551234567
URL
https?://[\w.-]+(?:/[\w./-]*)?
Matches: http://example.com, https://site.org/path
Date (YYYY-MM-DD)
\d{4}-\d{2}-\d{2}
Matches: 2025-02-01 (checks the format only; 2025-13-45 also matches)
IP Address
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
Matches: 192.168.1.1 (also accepts out-of-range values such as 999.999.999.999)
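To reject octets above 255, one option is to let the regex check the shape and ordinary code check the range. This is a sketch under that assumption, not a standard recipe, and `looks_like_ipv4` is a hypothetical helper name:

```python
import re

IPV4_SHAPE = re.compile(r"(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})")

def looks_like_ipv4(s):
    """Hypothetical helper: regex for the dotted shape, int() for the 0-255 range."""
    m = IPV4_SHAPE.fullmatch(s)
    return m is not None and all(0 <= int(octet) <= 255 for octet in m.groups())

print(looks_like_ipv4("192.168.1.1"))      # True
print(looks_like_ipv4("999.999.999.999"))  # False
print(looks_like_ipv4("1.2.3"))            # False
```

For production code, Python's standard `ipaddress` module (`ipaddress.ip_address`) is a sturdier choice than a hand-written check.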
Hex Color
#[0-9A-Fa-f]{6}
Matches: #FF5500, #aabbcc
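A quick way to confirm that these patterns accept the examples listed for them is a small table-driven check. This sketch copies the patterns and sample inputs from this section and uses `re.fullmatch` so that a partial match does not count:

```python
import re

# Patterns and sample inputs taken from the "Common Patterns" section
patterns = {
    "email": r"[\w.-]+@[\w.-]+\.\w+",
    "phone": r"\d{3}[-.]?\d{3}[-.]?\d{4}",
    "url": r"https?://[\w.-]+(?:/[\w./-]*)?",
    "date": r"\d{4}-\d{2}-\d{2}",
    "ip": r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}",
    "hex_color": r"#[0-9A-Fa-f]{6}",
}
samples = {
    "email": ["user@example.com", "name.last@domain.org"],
    "phone": ["555-123-4567", "555.123.4567", "5551234567"],
    "url": ["http://example.com", "https://site.org/path"],
    "date": ["2025-02-01"],
    "ip": ["192.168.1.1"],
    "hex_color": ["#FF5500", "#aabbcc"],
}

# True for a pattern only if every one of its samples matches in full
results = {
    name: all(re.fullmatch(patterns[name], s) for s in inputs)
    for name, inputs in samples.items()
}
print(results)
```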
Using Regex
In Python
import re
# Search
match = re.search(r'\d+', 'abc123def')
if match:
    print(match.group())  # "123"
# Find all
numbers = re.findall(r'\d+', 'a1 b2 c3')
# ['1', '2', '3']
# Replace
result = re.sub(r'\d', 'X', 'a1b2c3')
# 'aXbXcX'
# Split
parts = re.split(r'\s+', 'hello world')
# ['hello', 'world']
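Two more parts of Python's `re` module are worth knowing. `re.compile` builds a reusable pattern object, and `finditer` yields match objects that carry positions as well as text. The input string is illustrative:

```python
import re

# Compile once when the same pattern is used repeatedly
number = re.compile(r"\d+")

# Each match object knows its text and its (start, end) span
found = [(m.group(), m.span()) for m in number.finditer("a1 b22 c333")]
print(found)  # [('1', (1, 2)), ('22', (4, 6)), ('333', (8, 11))]
```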
In JavaScript
// Search
const match = 'abc123def'.match(/\d+/);
// ["123"]
// Find all
const numbers = 'a1 b2 c3'.match(/\d+/g);
// ["1", "2", "3"]
// Replace
const result = 'a1b2c3'.replace(/\d/g, 'X');
// 'aXbXcX'
// Test
/^\d+$/.test('123'); // true
In Bash
# grep
echo "abc123" | grep -oE '[0-9]+'
# 123
# sed
echo "a1b2c3" | sed 's/[0-9]/X/g'
# aXbXcX
Practical Examples
Validate Input
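import re

def is_valid_email(email):
    pattern = r'^[\w.-]+@[\w.-]+\.\w+$'
    return bool(re.match(pattern, email))

def is_valid_phone(phone):
    pattern = r'^\d{3}[-.]?\d{3}[-.]?\d{4}$'
    return bool(re.match(pattern, phone))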
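One subtlety for validation in Python: `$` also matches just before a trailing newline, so a `^...$` check accepts input such as `'555-123-4567\n'`. `re.fullmatch` requires the pattern to cover the entire string (the `\Z` anchor is another option). A sketch using the phone pattern above:

```python
import re

PHONE = r"\d{3}[-.]?\d{3}[-.]?\d{4}"

# re.match with ^...$ tolerates one trailing newline, because $ can match before it
match_dollar = re.match("^" + PHONE + "$", "555-123-4567\n")
# re.fullmatch does not
full_newline = re.fullmatch(PHONE, "555-123-4567\n")
full_clean = re.fullmatch(PHONE, "555-123-4567")

print(bool(match_dollar))  # True
print(bool(full_newline))  # False
print(bool(full_clean))    # True
```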
Extract Data
text = "Contact: john@example.com or call 555-1234"
email = re.search(r'[\w.-]+@[\w.-]+\.\w+', text)
# john@example.com
phone = re.search(r'\d{3}-\d{4}', text)
# 555-1234
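`re.search` stops at the first hit. To collect every occurrence, `re.findall` returns all non-overlapping matches as a list. The sample text below is invented for illustration:

```python
import re

text = "Write to john@example.com or jane.doe@example.org; call 555-1234 or 555-9876"

# Same patterns as above, but collecting every match instead of the first
emails = re.findall(r"[\w.-]+@[\w.-]+\.\w+", text)
phones = re.findall(r"\d{3}-\d{4}", text)

print(emails)  # ['john@example.com', 'jane.doe@example.org']
print(phones)  # ['555-1234', '555-9876']
```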
Clean Text
# Remove extra whitespace
text = re.sub(r'\s+', ' ', '  too   many   spaces  ').strip()
# "too many spaces"
# Remove non-alphanumeric
text = re.sub(r'[^\w\s]', '', 'Hello, World!')
# "Hello World"
Parse Logs
log = '2025-02-01 10:30:45 ERROR Connection failed'
pattern = r'(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (.+)'
match = re.match(pattern, log)
# Groups: date, time, level, message
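Applied line by line, the same pattern can feed simple aggregation. This sketch uses made-up log lines and `collections.Counter` to tally log levels, skipping lines that do not fit the expected layout:

```python
import re
from collections import Counter

LOG_LINE = re.compile(r"(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (.+)")

logs = [
    "2025-02-01 10:30:45 ERROR Connection failed",
    "2025-02-01 10:31:02 INFO Retrying",
    "not a log line",
    "2025-02-01 10:31:05 ERROR Connection failed again",
]

levels = Counter()
for line in logs:
    m = LOG_LINE.match(line)
    if m is None:
        continue  # skip lines that don't match the expected format
    level = m.group(3)  # groups are date, time, level, message
    levels[level] += 1

print(levels)  # Counter({'ERROR': 2, 'INFO': 1})
```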
Common Mistakes
Not Escaping
# Wrong - . matches any character
re.search(r'file.txt', 'fileXtxt') # Matches!
# Right
re.search(r'file\.txt', 'file.txt')
Greedy Matching
text = '<tag>content</tag>'
# Greedy - matches too much
re.search(r'<.*>', text) # '<tag>content</tag>'
# Non-greedy
re.search(r'<.*?>', text) # '<tag>'
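With several tags in one string the difference is easier to see. A negated class such as `[^>]*` expresses "up to the next `>`" without relying on laziness. The HTML snippet is illustrative; for real HTML, a parser such as Python's `html.parser` is more reliable than a regex:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.*>", html)     # one match spanning first "<" to last ">"
lazy = re.findall(r"<.*?>", html)      # stops at the first ">" each time
negated = re.findall(r"<[^>]*>", html)  # same result without a lazy quantifier

print(greedy)   # ['<b>bold</b> and <i>italic</i>']
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
print(negated)  # ['<b>', '</b>', '<i>', '</i>']
```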
Forgetting Anchors
# Without anchors - matches partial
re.search(r'\d{3}', '12345') # Matches '123'
# With anchors - exact match
re.match(r'^\d{3}$', '12345')  # No match
re.match(r'^\d{3}$', '123')    # Matches
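Python's three main entry points differ in where they anchor. `re.search` scans the whole string, `re.match` tries only at position 0, and `re.fullmatch` requires the entire string to match. A quick comparison on illustrative input:

```python
import re

s = "abc123"

found = re.search(r"\d+", s)         # scans: finds "123" at positions 3-6
at_start = re.match(r"\d+", s)       # only tries at index 0, where "a" is not a digit
whole = re.fullmatch(r"\d+", s)      # the whole string is not all digits
whole_ok = re.fullmatch(r"\d+", "123")

print(found.group())     # 123
print(at_start)          # None
print(whole)             # None
print(whole_ok.group())  # 123
```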
Tips
Keep It Simple
Complex regex is hard to debug. Break into parts:
# Instead of one monster regex
# Build and test incrementally
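One way to build a pattern incrementally is to split it into named pieces, test each piece, and then join them. The sketch below does this for the email pattern from earlier; the piece names `USER`, `DOMAIN`, and `TLD` are hypothetical:

```python
import re

# Each piece is small enough to test on its own
USER = r"[\w.-]+"
DOMAIN = r"[\w.-]+"
TLD = r"\w+"

assert re.fullmatch(USER, "name.last")
assert re.fullmatch(TLD, "org")

# Combine the tested pieces into the full pattern
EMAIL = USER + "@" + DOMAIN + r"\." + TLD

print(bool(re.fullmatch(EMAIL, "name.last@domain.org")))  # True
print(bool(re.fullmatch(EMAIL, "name.last@domain")))      # False: no dot and TLD
```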
Use Raw Strings
In Python, use r'' to avoid escaping backslashes:
r'\d+' # Good
'\\d+' # Also works but uglier
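Raw strings matter most where Python's own escapes collide with regex escapes. In a normal string literal, `"\b"` is a backspace character, not a word boundary. Python 3.12 also emits a SyntaxWarning for unrecognized escapes such as `"\d"` in normal strings. A small demonstration with illustrative text:

```python
import re

# "\b" in a normal string is the backspace character (U+0008)
print("\b" == "\x08")  # True

plain = re.search("\bcat\b", "the cat sat")  # looks for literal backspaces
raw = re.search(r"\bcat\b", "the cat sat")   # word boundaries, as intended

print(plain)        # None
print(raw.group())  # cat

# Both spellings produce the same three-character pattern string
print(r"\d+" == "\\d+")  # True
```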
Test Thoroughly
Check how a pattern handles edge cases such as:
- Empty strings
- Very long strings
- Special characters
- Unicode
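Here is a small sketch of such checks for the pattern `\w+`. It covers the empty string, a non-ASCII letter (written as `\u00e9` so the source encoding cannot interfere), and a hyphen. It also shows how the `re.ASCII` flag changes what `\w` means in Python 3:

```python
import re

pattern = r"\w+"

empty = re.fullmatch(pattern, "")                           # + needs at least one character
unicode_default = re.fullmatch(pattern, "caf\u00e9")        # "café"
unicode_ascii = re.fullmatch(pattern, "caf\u00e9", re.ASCII)
hyphenated = re.fullmatch(pattern, "well-known")

print(empty)                  # None
print(bool(unicode_default))  # True: str patterns are Unicode-aware by default
print(bool(unicode_ascii))    # False: re.ASCII limits \w to [a-zA-Z0-9_]
print(bool(hyphenated))       # False: "-" is not a word character
```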
Comment Complex Patterns
pattern = r'''
^ # Start
[\w.-]+ # Username
@ # At sign
[\w.-]+ # Domain
\. # Dot
\w+ # TLD
$ # End
'''
email = 'user@example.com'
re.match(pattern, email, re.VERBOSE)  # Matches
Conclusion
Regex is a powerful tool for text processing. Master the basics:
- Character classes: \d, \w, \s, [...]
- Quantifiers: *, +, ?, {n,m}
- Anchors: ^, $, \b
- Groups and alternation: (...), |
Start simple, test thoroughly, and build complexity gradually.
Next: Shell Scripting Basics - Automating with bash