String Basics
Common Operations
s = "Hello, World!"
# Case
s.lower() # "hello, world!"
s.upper() # "HELLO, WORLD!"
s.title() # "Hello, World!"
# Whitespace
s.strip() # Remove leading/trailing
s.lstrip() # Remove leading
s.rstrip() # Remove trailing
# Search
s.find("World") # 7 (index) or -1
s.index("World") # 7 (raises if not found)
"World" in s # True
s.count("o") # 2
# Replace
s.replace("o", "0") # "Hell0, W0rld!"
# Split/Join
s.split(", ") # ["Hello", "World!"]
", ".join(["a", "b"]) # "a, b"
Parsing
Splitting Data
line = "name,email,age"
parts = line.split(",") # ["name", "email", "age"]
# With limit
"a,b,c,d".split(",", 2) # ["a", "b", "c,d"]
Key-Value Parsing
line = "key=value"
key, value = line.split("=", 1)
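Unpacking the result of `split("=", 1)` raises `ValueError` when the line has no `=`. One way around that is `str.partition`, which always returns three parts. A minimal sketch, assuming a hypothetical helper named `split_pair` that is not part of the original text:

```python
def split_pair(line):
    """Split 'key=value' into (key, value); value is None if '=' is missing."""
    # str.partition always returns (head, separator, tail), so unpacking cannot fail.
    key, sep, value = line.partition("=")
    return key.strip(), (value.strip() if sep else None)


print(split_pair("key=value"))     # ('key', 'value')
print(split_pair("url=a=b"))       # ('url', 'a=b')
print(split_pair("no_delimiter"))  # ('no_delimiter', None)
```

Only the first `=` is treated as the delimiter, so values that contain `=` survive intact.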
Multi-line
text = """line1
line2
line3"""
lines = text.splitlines() # ["line1", "line2", "line3"]
Regular Expressions
Basic Patterns
import re
text = "Contact: john@example.com or call 555-1234"
# Find first match
match = re.search(r'\d{3}-\d{4}', text)
if match:
    print(match.group()) # "555-1234"
# Find all matches
emails = re.findall(r'[\w.-]+@[\w.-]+\.\w+', text)
# Replace
clean = re.sub(r'\d', 'X', text)
# Split on pattern
parts = re.split(r'\s+', text)
Common Patterns
r'\d+' # One or more digits
r'\w+' # Word characters
r'\s+' # Whitespace
r'[a-z]+' # Lowercase letters
r'^Start' # Starts with
r'end$' # Ends with
r'a|b' # a or b
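To see how these patterns behave, it can help to run them against one short string. The sketch below uses an invented `sample` string purely for illustration:

```python
import re

sample = "Start 42 apples at the end"

digits = re.findall(r'\d+', sample)
lowercase = re.findall(r'[a-z]+', sample)
starts = bool(re.search(r'^Start', sample))
ends = bool(re.search(r'end$', sample))
either = re.findall(r'a|b', "cab")

print(digits)     # ['42']
print(lowercase)  # ['tart', 'apples', 'at', 'the', 'end']
print(starts)     # True
print(ends)       # True
print(either)     # ['a', 'b']
```

Note that `[a-z]+` skips the capital `S`, which is why the first match is `tart` rather than `Start`.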
Extraction
Between Delimiters
text = "Hello [World] and [Python]"
matches = re.findall(r'\[(.*?)\]', text)
# ["World", "Python"]
Capture Groups
text = "Name: John, Age: 30"
match = re.search(r'Name: (\w+), Age: (\d+)', text)
if match:
    name = match.group(1) # "John"
    age = match.group(2) # "30"
Extract Numbers
text = "Order 123 has 5 items for $99.99"
numbers = re.findall(r'\d+\.?\d*', text)
# ["123", "5", "99.99"]
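`re.findall` returns strings, so the matches usually need converting before any arithmetic. The sketch below does that, and also shows a stricter pattern as one possible variant. The name `SIGNED_NUMBER` and the second sample sentence are assumptions made for illustration:

```python
import re

text = "Order 123 has 5 items for $99.99"

# Convert the matched strings to floats for arithmetic.
values = [float(n) for n in re.findall(r'\d+\.?\d*', text)]
print(values)  # [123.0, 5.0, 99.99]

# Hypothetical stricter variant: optional minus sign, and a decimal point
# only when digits follow it (so a sentence-ending "." is not captured).
SIGNED_NUMBER = r'-?\d+(?:\.\d+)?'
signed = re.findall(SIGNED_NUMBER, "Temp fell to -3.5 from 12.")
print(signed)  # ['-3.5', '12']
```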
Transformation
Clean Whitespace
text = "  too   many   spaces  "
clean = " ".join(text.split()) # "too many spaces"
Remove Characters
# Remove punctuation (keeps letters, digits, underscores, and whitespace)
clean = re.sub(r'[^\w\s]', '', text)
# Remove digits
clean = re.sub(r'\d', '', text)
Normalize
# Lowercase and strip
normalized = text.lower().strip()
# Remove accents
import unicodedata
def remove_accents(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
    )
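The normalization steps are often combined into a single comparison key. A minimal sketch, assuming a hypothetical helper named `normalize_key` (not from the original text) that lowercases, trims, and strips accents in one pass:

```python
import unicodedata


def normalize_key(s):
    """Lowercase, trim, and strip accents so 'Café ' and 'cafe' compare equal."""
    # NFD splits an accented letter into a base letter plus a combining
    # mark (category 'Mn'); dropping the marks leaves the base letters.
    decomposed = unicodedata.normalize('NFD', s.strip().lower())
    return ''.join(c for c in decomposed if unicodedata.category(c) != 'Mn')


print(normalize_key("  Café "))                         # cafe
print(normalize_key("Crème Brûlée") == "creme brulee")  # True
```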
Format Conversion
CSV to Dict
import csv
from io import StringIO
csv_text = "name,age\nAlice,30\nBob,25"
reader = csv.DictReader(StringIO(csv_text))
for row in reader:
    print(row) # {"name": "Alice", "age": "30"}
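`csv.DictReader` yields every field as a string, so numeric columns need an explicit conversion. A short sketch of one way to do that, reusing the same sample data:

```python
import csv
from io import StringIO

csv_text = "name,age\nAlice,30\nBob,25"

# Build a list of dicts, converting the "age" column to int on the way.
rows = [
    {"name": row["name"], "age": int(row["age"])}
    for row in csv.DictReader(StringIO(csv_text))
]
print(rows)  # [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]
```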
JSON Parsing
import json
text = '{"name": "Alice", "age": 30}'
data = json.loads(text)
# Back to string
text = json.dumps(data, indent=2)
Key-Value File
def parse_env(text):
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if line and '=' in line:
            key, value = line.split('=', 1)
            result[key] = value
    return result
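Real key-value files often contain comments and spaces around the `=`. The sketch below is one possible variant that handles both. The name `parse_env_lines` and the sample input are assumptions for illustration, not part of the original:

```python
def parse_env_lines(text):
    """Variant of parse_env that skips '#' comments and trims whitespace."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without a delimiter.
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, value = line.split('=', 1)
        result[key.strip()] = value.strip()
    return result


sample = """
# database settings
HOST = localhost
PORT=5432
URL=postgres://localhost/db?sslmode=require
"""
config = parse_env_lines(sample)
print(config["HOST"])  # localhost
print(config["URL"])   # postgres://localhost/db?sslmode=require
```

Splitting with a limit of 1 keeps the `=` inside the URL value intact.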
Validation
Email
def is_valid_email(email):
    pattern = r'^[\w.-]+@[\w.-]+\.\w+$'
    return bool(re.match(pattern, email))
URL
def is_valid_url(url):
    pattern = r'^https?://[\w.-]+\.[a-z]{2,}'
    return bool(re.match(pattern, url, re.IGNORECASE))
Phone
def is_valid_phone(phone):
    pattern = r'^\d{3}[-.]?\d{3}[-.]?\d{4}$'
    return bool(re.match(pattern, phone))
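One subtlety with `$`-anchored validators: `$` also matches just before a trailing newline, so `re.match` accepts `"555-123-4567\n"`. A sketch of a stricter alternative using `re.fullmatch`, with the hypothetical names `PHONE_RE` and `is_valid_phone_strict`:

```python
import re

# Compiled once; fullmatch must consume the whole string, so no ^ or $ is needed.
PHONE_RE = re.compile(r'\d{3}[-.]?\d{3}[-.]?\d{4}')


def is_valid_phone_strict(phone):
    return PHONE_RE.fullmatch(phone) is not None


print(is_valid_phone_strict("555-123-4567"))    # True
print(is_valid_phone_strict("555.123.4567"))    # True
print(is_valid_phone_strict("555-123-4567\n"))  # False
print(is_valid_phone_strict("555-1234"))        # False
```

The same approach applies to the email and URL checks when trailing whitespace should be rejected.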
Template Processing
F-strings
name = "Alice"
f"Hello, {name}!"
Template String
from string import Template
t = Template("Hello, $name!")
result = t.substitute(name="Alice")
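`substitute` raises `KeyError` when a placeholder has no value, while `safe_substitute` leaves it in place. A brief sketch of the difference, using an invented two-placeholder template:

```python
from string import Template

t = Template("Hello, $name! You have $count new messages.")

# substitute is strict: a missing placeholder raises KeyError.
try:
    t.substitute(name="Alice")
    missing = None
except KeyError as exc:
    missing = exc.args[0]
print(missing)  # count

# safe_substitute leaves unknown placeholders untouched.
partial = t.safe_substitute(name="Alice")
print(partial)  # Hello, Alice! You have $count new messages.
```

`safe_substitute` suits templates filled in several passes, while `substitute` is better at catching typos in placeholder names.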
Format Method
"Hello, {name}!".format(name="Alice")
"{0} and {1}".format("Alice", "Bob")
Encoding
UTF-8
# Encode string to bytes
b = "Hello".encode('utf-8')
# Decode bytes to string
s = b.decode('utf-8')
Handle Errors
# Replace invalid characters
bytes_data = b"caf\xe9" # Example input: Latin-1 bytes, not valid UTF-8
s = bytes_data.decode('utf-8', errors='replace')
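The `errors` argument accepts several handlers, and which one fits depends on whether losing data silently is acceptable. A small sketch comparing three of them on an invented byte string:

```python
# Latin-1 encoding of "café": the final byte 0xE9 is not valid UTF-8 here.
data = "café".encode("latin-1")

replaced = data.decode("utf-8", errors="replace")  # bad byte becomes U+FFFD
ignored = data.decode("utf-8", errors="ignore")    # bad byte is dropped

# The default handler, "strict", raises instead of guessing.
try:
    data.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(len(replaced))  # 4
print(ignored)        # caf
print(strict_failed)  # True
```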
Conclusion
Text processing essentials:
- String methods for basic operations
- Regex for pattern matching
- Proper parsing for structured data
- Validation for input checking
Master these for effective text handling.
Next: Web Scraping Basics - Extracting web data