redis-expert
Expert knowledge of Redis data structures, eviction policies, Lua scripting, messaging patterns, cluster topologies, memory optimization, and advanced caching strategies. Trigger phrases: when using Redis, Redis data structure selection, distributed locks with Redis,
Redis Expert
Redis is simultaneously a cache, message broker, session store, leaderboard engine, and stream processor — but only if you choose the right data structure. The biggest Redis mistakes are: using Strings when a Hash would save memory, using blocking operations in application hot paths, and not planning for eviction. Redis is single-threaded for command execution (I/O is multi-threaded since 6.0), so O(n) commands like KEYS, SMEMBERS on large sets, and LRANGE 0 -1 can block the server.
Core Mental Model
Every Redis data structure solves a different problem class. Strings are versatile but wasteful at scale. Hashes are memory-efficient objects. Sorted Sets are the Swiss army knife for ranking, scheduling, and range queries. Streams are the correct answer for durable messaging — not pub/sub. Memory is finite and Redis will evict or OOM if you don't plan TTLs and eviction policy. Cluster adds horizontal scale but complicates multi-key operations and Lua scripts.
Data Structure Selection Guide
| Structure | Use For | Avoid When |
|---|---|---|
| String | Simple KV, counters, locks, small serialized objects | Many fields per key (use Hash) |
| Hash | Objects, session data, user profiles | More than a few thousand fields |
| List | Work queues, activity feeds (bounded), stacks | Large random-access needs |
| Set | Unique membership, tagging, intersection/union | Ordered access needed |
| Sorted Set | Leaderboards, rate limiting, scheduling, priority queues | Pure unordered membership |
| Stream | Durable event log, consumer groups, audit trail | Fire-and-forget (use pub/sub) |
| HyperLogLog | Approximate unique counts (±0.81% error) | Exact counts needed |
| Bloom Filter | "Definitely not in set" checks (RedisBloom) | Membership must be certain |
| Geo | Distance queries, nearby search | Complex polygon queries |
| Bitmap | Bit-level flags, DAU counting | Non-integer keys |
# Memory comparison: 1000 user objects
# 1000 individual Strings (JSON serialized)
SET user:1 '{"id":1,"name":"Alice","email":"[email protected]","score":42}'
# ~120 bytes per key × 1000 = ~120KB + overhead per key = ~200KB total
# vs 1 Hash per user (listpack-encoded if fields < hash-max-listpack-entries; "ziplist" before Redis 7)
HSET user:1 name Alice email [email protected] score 42
# ~65 bytes × 1000 = ~65KB — nearly 3x more memory-efficient
TTL and Eviction Policies
Eviction Policy Selection
maxmemory-policy options (set in redis.conf or CONFIG SET):
noeviction → Return error when memory full. Use for: queues, data you can't afford to lose
allkeys-lru → Evict least recently used from ALL keys. Use for: general cache
volatile-lru → Evict LRU from keys WITH expiry only. Use for: cache + persistent mix
allkeys-lfu → Evict least frequently used. Use for: Zipf-distributed access patterns
volatile-lfu → LFU from keys with expiry
allkeys-random → Random eviction. Rarely correct
volatile-ttl → Evict keys with shortest remaining TTL first
Recommendation:
Pure cache: allkeys-lru or allkeys-lfu
Cache + durable data: volatile-lru (set TTL on cache keys, not on durable keys)
Queue / stream (no eviction): noeviction + monitor memory
Hot/cold access patterns: allkeys-lfu (LFU handles Zipf better than LRU)
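To see why LFU suits skewed access, here is a toy in-memory simulation, not Redis itself (real Redis approximates both policies by sampling `maxmemory-samples` keys), comparing strict LRU and LFU on a hot-key workload interrupted by a cold scan:

```python
from collections import Counter, OrderedDict

def simulate(policy: str, accesses: list[str], capacity: int = 2) -> set[str]:
    """Toy cache with strict LRU/LFU eviction (illustrative only)."""
    cache = OrderedDict()  # key -> None; order tracks recency
    freq = Counter()       # lifetime access counts, for LFU
    for key in accesses:
        freq[key] += 1
        if key in cache:
            cache.move_to_end(key)  # refresh recency on hit
            continue
        if len(cache) >= capacity:
            if policy == "lru":
                cache.popitem(last=False)  # evict least recently used
            else:
                victim = min(cache, key=lambda k: freq[k])  # least frequent
                del cache[victim]
        cache[key] = None
    return set(cache)

# "hot" dominates traffic, then a scan of cold keys arrives
workload = ["hot"] * 50 + ["cold1", "cold2", "cold3"]
print(simulate("lru", workload))  # {'cold2', 'cold3'}: scan flushed the hot key
print(simulate("lfu", workload))  # hot survives thanks to its frequency count
```

The cold scan evicts the hot key under LRU but not under LFU, which is the Zipf argument in miniature.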
# redis.conf
maxmemory 4gb
maxmemory-policy allkeys-lfu
maxmemory-samples 10 # LRU/LFU approximation sample size (higher = more accurate, more CPU)
# Runtime change
CONFIG SET maxmemory-policy allkeys-lfu
CONFIG SET maxmemory 4gb
# Check eviction stats (run from a shell: piping to grep works with redis-cli, not inside the REPL)
redis-cli INFO stats | grep evicted_keys
redis-cli INFO memory | grep used_memory_human
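INFO returns `field:value` lines grouped under `# Section` headers, which makes it easy to post-process. A minimal parser, with an illustrative sample rather than output from a live server:

```python
def parse_info(raw: str) -> dict[str, str]:
    """Parse redis-server INFO output: field:value lines, '#' marks sections."""
    stats: dict[str, str] = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        field, _, value = line.partition(":")
        stats[field] = value
    return stats

sample = """# Stats
evicted_keys:1024
keyspace_hits:99000
keyspace_misses:1000"""

info = parse_info(sample)
hit_rate = int(info["keyspace_hits"]) / (
    int(info["keyspace_hits"]) + int(info["keyspace_misses"]))
print(info["evicted_keys"], f"{hit_rate:.0%}")  # 1024 99%
```

A rising `evicted_keys` alongside a falling hit rate usually means maxmemory is too small for the working set.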
Distributed Lock (Redlock)
# Single-instance lock (sufficient for most use cases)
import redis
import uuid
import time

def acquire_lock(r: redis.Redis, lock_name: str, timeout_ms: int = 30000) -> str | None:
    """Returns lock token if acquired, None if lock is held."""
    token = str(uuid.uuid4())
    acquired = r.set(
        f"lock:{lock_name}",
        token,
        px=timeout_ms,  # expiry in milliseconds
        nx=True         # only set if Not eXists
    )
    return token if acquired else None

def release_lock(r: redis.Redis, lock_name: str, token: str) -> bool:
    """Atomic release — only release if we own the lock."""
    script = """
    if redis.call('get', KEYS[1]) == ARGV[1] then
        return redis.call('del', KEYS[1])
    else
        return 0
    end
    """
    result = r.eval(script, 1, f"lock:{lock_name}", token)
    return bool(result)

# Usage (r is a connected redis.Redis client)
token = acquire_lock(r, "payment_processor_user_42", timeout_ms=10000)
if token:
    try:
        process_payment(user_id=42)
    finally:
        release_lock(r, "payment_processor_user_42", token)
else:
    raise Exception("Could not acquire lock — another process is running")
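The compare-token release matters because a slow worker's lock can expire and be re-acquired by another process; a plain DEL would then delete the new owner's lock. An in-memory mock (a dict standing in for Redis, for illustration only) of the same semantics:

```python
store: dict[str, str] = {}  # stands in for Redis

def mock_release(lock_key: str, token: str) -> bool:
    """Same contract as the Lua script: delete only if we still own the lock."""
    if store.get(lock_key) == token:
        del store[lock_key]
        return True
    return False

store["lock:job"] = "token-A"   # worker A holds the lock
store["lock:job"] = "token-B"   # A's TTL expired; worker B re-acquired it

assert not mock_release("lock:job", "token-A")  # A's stale release is refused
assert mock_release("lock:job", "token-B")      # B releases its own lock
```

A plain DEL at the first release would have silently broken B's critical section.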
// Redlock for multi-node Redis (true distributed lock)
import Redlock from "redlock";
const redlock = new Redlock([client1, client2, client3], {
retryCount: 3,
retryDelay: 200, // ms between retries
retryJitter: 100, // random jitter to prevent thundering herd
driftFactor: 0.01 // clock drift tolerance
});
const lock = await redlock.acquire(["lock:payment:user:42"], 10000);
try {
await processPayment(userId);
} finally {
await lock.release();
}
Rate Limiter
# Sliding window rate limiter using Sorted Set
def is_rate_limited(r: redis.Redis, user_id: str,
                    limit: int = 100, window_seconds: int = 60) -> bool:
    key = f"rate:{user_id}"
    now = time.time()
    window_start = now - window_seconds
    pipe = r.pipeline()
    # Remove entries outside the window
    pipe.zremrangebyscore(key, 0, window_start)
    # Count requests already in the window (before this one)
    pipe.zcard(key)
    # Add current request (score = timestamp); note that rejected requests
    # still record an entry, which keeps heavy abusers blocked longer
    pipe.zadd(key, {str(uuid.uuid4()): now})
    # Set TTL to clean up idle keys
    pipe.expire(key, window_seconds + 1)
    _, count, _, _ = pipe.execute()
    return count >= limit  # True means rate limited
# Fixed window counter (simpler, tiny thundering herd at window boundary)
def fixed_window_limit(r: redis.Redis, user_id: str,
                       limit: int = 100, window_seconds: int = 60) -> bool:
    key = f"ratelimit:{user_id}:{int(time.time() // window_seconds)}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, window_seconds)
    return count > limit
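The boundary problem, quantified: with limit=100 per 60s window, a client can send 100 requests at t=59 and 100 more at t=61 (a fresh window), so 200 requests land within about two seconds. A pure-Python model of the same INCR-per-window logic, with an injectable clock and no Redis involved:

```python
def fixed_window_allowed(counters: dict, user: str, now: float,
                         limit: int = 100, window: int = 60) -> bool:
    """In-memory model of the fixed-window INCR limiter."""
    key = (user, int(now // window))   # one counter per (user, window) bucket
    counters[key] = counters.get(key, 0) + 1
    return counters[key] <= limit

counters: dict = {}
# 150 attempts just before the boundary, 150 just after it
burst1 = sum(fixed_window_allowed(counters, "u1", now=59.0) for _ in range(150))
burst2 = sum(fixed_window_allowed(counters, "u1", now=61.0) for _ in range(150))
print(burst1, burst2)  # 100 100 -> 200 requests allowed in ~2 seconds
```

If that burst is unacceptable, use the sliding-window Sorted Set version above.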
Leaderboard with Sorted Set
# Add/update scores
ZADD leaderboard 1500 "player:alice"
ZADD leaderboard 2300 "player:bob"
ZADD leaderboard 890 "player:carol"
ZINCRBY leaderboard 100 "player:alice" # atomic increment
# Top N players (descending score)
ZREVRANGEBYSCORE leaderboard +inf -inf WITHSCORES LIMIT 0 10
# Player's rank (0-indexed, use ZREVRANK for highest-first)
ZREVRANK leaderboard "player:alice" # → 1 (0-based, so rank 2)
# Players in score range
ZRANGEBYSCORE leaderboard 1000 2000 WITHSCORES
# Player's score
ZSCORE leaderboard "player:alice"
# Window around player (show neighbors): get the rank, then fetch rank-radius..rank+radius
ZREVRANK leaderboard "player:alice"      # → 1
ZREVRANGE leaderboard 0 3 WITHSCORES     # radius 2: max(0, 1-2) .. 1+2
# Full leaderboard service
class Leaderboard:
    def __init__(self, r: redis.Redis, key: str):
        self.r = r
        self.key = key

    def add_score(self, player_id: str, score: float):
        self.r.zadd(self.key, {player_id: score})

    def increment_score(self, player_id: str, delta: float) -> float:
        return self.r.zincrby(self.key, delta, player_id)

    def get_rank(self, player_id: str) -> int | None:
        rank = self.r.zrevrank(self.key, player_id)
        return rank + 1 if rank is not None else None  # 1-indexed

    def get_top(self, n: int = 10) -> list[dict]:
        entries = self.r.zrevrangebyscore(self.key, "+inf", "-inf",
                                          withscores=True, start=0, num=n)
        return [{"player": p.decode(), "score": s, "rank": i + 1}
                for i, (p, s) in enumerate(entries)]

    def get_around_player(self, player_id: str, radius: int = 5) -> list[dict]:
        rank = self.r.zrevrank(self.key, player_id)
        if rank is None:
            return []
        start = max(0, rank - radius)
        stop = rank + radius
        entries = self.r.zrevrange(self.key, start, stop, withscores=True)
        return [{"player": p.decode(), "score": s, "rank": start + i + 1}
                for i, (p, s) in enumerate(entries)]
Pub/Sub vs Streams vs Lists
| Pattern | Delivery | History | Consumer Groups | Persistence | Use For |
|---|---|---|---|---|---|
| Pub/Sub | Fire-and-forget | ❌ None | ❌ | ❌ | Real-time notifications, live updates |
| List (LPUSH/BRPOP) | At-least-once | ❌ (consumed) | ❌ | AOF/RDB | Simple work queue |
| Streams | At-least-once + ACK | ✅ | ✅ Consumer groups | AOF/RDB | Durable event log, multi-consumer |
# Redis Streams: durable event processing with consumer groups
# Producer
r.xadd("orders", {
"order_id": "ord_123",
"customer": "alice",
"total": "99.99",
"status": "new"
})
# Create consumer group (read from beginning: '0', or latest: '$')
r.xgroup_create("orders", "order_processors", id="0", mkstream=True)
# Consumer (in worker process)
while True:
    # XREADGROUP: read up to 10 messages, block 2s if empty
    messages = r.xreadgroup(
        groupname="order_processors",
        consumername="worker-1",
        streams={"orders": ">"},  # ">" = new undelivered messages
        count=10,
        block=2000
    )
    for stream_name, entries in messages or []:
        for msg_id, fields in entries:
            try:
                process_order(fields)
                r.xack("orders", "order_processors", msg_id)  # ACK = processed
            except Exception as e:
                log_error(e)
                # Message stays in PEL (pending entry list) for retry/DLQ

# Claim stale messages (messages pending > 60s — worker may have crashed)
stale = r.xautoclaim("orders", "order_processors", "worker-1",
                     min_idle_time=60000, start_id="0-0")
Lua Scripting for Atomic Operations
-- redis-lua: atomic check-and-set with complex logic
-- KEYS[1] = counter key, ARGV[1] = limit, ARGV[2] = ttl_seconds
local current = redis.call('GET', KEYS[1])
if current and tonumber(current) >= tonumber(ARGV[1]) then
return 0 -- rate limited
end
local new_val = redis.call('INCR', KEYS[1])
if new_val == 1 then
redis.call('EXPIRE', KEYS[1], ARGV[2])
end
return 1 -- allowed
# Load and execute Lua script (cached by SHA)
rate_limit_script = r.register_script("""
local current = redis.call('GET', KEYS[1])
if current and tonumber(current) >= tonumber(ARGV[1]) then
return 0
end
local new_val = redis.call('INCR', KEYS[1])
if new_val == 1 then
redis.call('EXPIRE', KEYS[1], ARGV[2])
end
return 1
""")
allowed = rate_limit_script(keys=[f"rate:{user_id}"], args=[limit, window_seconds])
Cache Stampede Prevention
# Problem: cache expires, 1000 requests all miss and query DB simultaneously
# Solution 1: Probabilistic early recomputation (XFetch algorithm)
import math
import random

def get_with_xfetch(r: redis.Redis, key: str, ttl: int,
                    fetch_fn, beta: float = 1.0):
    """Proactively recompute before expiry using probabilistic early refresh."""
    data = r.get(key)
    if data:
        value, expiry = deserialize(data)
        remaining = expiry - time.time()
        delta = fetch_fn.last_duration  # cost of the last recomputation
        # log(random()) is negative, so early refresh becomes increasingly
        # likely as `remaining` shrinks relative to delta * beta
        if remaining + delta * beta * math.log(random.random()) < 0:
            return refresh(r, key, ttl, fetch_fn)  # early refresh
        return value
    return refresh(r, key, ttl, fetch_fn)
# Solution 2: Lock-based (only one refresh, others wait)
def get_or_compute(r: redis.Redis, key: str, ttl: int, compute_fn):
    value = r.get(key)
    if value:
        return deserialize(value)
    lock_key = f"{key}:computing"
    lock_token = acquire_lock(r, lock_key, timeout_ms=5000)
    if lock_token:
        try:
            # Double-check after acquiring lock
            value = r.get(key)
            if value:
                return deserialize(value)
            result = compute_fn()
            r.setex(key, ttl, serialize(result))
            return result
        finally:
            release_lock(r, lock_key, lock_token)
    else:
        # Another worker is computing — wait briefly and retry
        time.sleep(0.1)
        return get_or_compute(r, key, ttl, compute_fn)
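The XFetch condition has a closed-form refresh probability: P(remaining + delta * beta * ln U < 0) for U ~ Uniform(0, 1) is exp(-remaining / (delta * beta)). It is near zero while the key is fresh and ramps toward 1 as expiry approaches or as the recompute cost (delta) grows. A quick check of that form:

```python
import math

def xfetch_refresh_probability(remaining: float, delta: float,
                               beta: float = 1.0) -> float:
    """P(early refresh) = P(remaining + delta*beta*ln(U) < 0), U ~ Uniform(0,1)."""
    if remaining <= 0:
        return 1.0  # already expired: always refresh
    return math.exp(-remaining / (delta * beta))

# 2s fetch cost: the probability ramps up as expiry approaches
for remaining in (60, 10, 4, 1):
    print(remaining, round(xfetch_refresh_probability(remaining, delta=2.0), 3))
```

Raising beta makes refreshes happen earlier; lowering it tolerates more risk of a stampede at expiry.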
Memory Optimization
# Check encoding of a key
OBJECT ENCODING mykey
# Possible values: int, embstr, raw, ziplist, listpack, hashtable, skiplist, quicklist
# redis.conf thresholds for compact encoding
hash-max-listpack-entries 128 # Hash uses listpack if ≤ 128 fields
hash-max-listpack-value 64 # and all values ≤ 64 bytes
zset-max-listpack-entries 128 # Sorted Set uses listpack if ≤ 128 members
zset-max-listpack-value 64
set-max-intset-entries 512 # Set uses intset if all members are integers and entry count ≤ 512
# Memory analysis
MEMORY USAGE mykey # bytes for a specific key
MEMORY DOCTOR # recommendations
DEBUG OBJECT mykey # encoding + serialized length
# Find large keys (use SCAN, never KEYS in production)
redis-cli --bigkeys # scans and reports largest keys per type
redis-cli --memkeys # reports memory usage per key
# SCAN instead of KEYS
SCAN 0 MATCH "user:*" COUNT 100 # cursor-based, non-blocking
# Iterate until cursor returns 0
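The client-side cursor loop looks like this, sketched against an illustrative mock_scan over an in-memory keyspace (real redis-py code can simply use r.scan_iter(match="user:*"), which hides the cursor):

```python
import fnmatch

def mock_scan(keys: list[str], cursor: int, match: str, count: int = 100):
    """Illustrative SCAN stand-in: returns (next_cursor, batch); 0 means done."""
    batch = [k for k in keys[cursor:cursor + count] if fnmatch.fnmatch(k, match)]
    next_cursor = cursor + count
    return (0 if next_cursor >= len(keys) else next_cursor), batch

keyspace = [f"user:{i}" for i in range(250)] + ["config:app"]
cursor, found = 0, []
while True:                 # the canonical SCAN loop
    cursor, batch = mock_scan(keyspace, cursor, match="user:*")
    found.extend(batch)
    if cursor == 0:         # cursor 0 signals a complete iteration
        break
print(len(found))  # 250
```

Each round trip is bounded by COUNT, so the server never blocks the way KEYS does.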
Cluster vs Sentinel vs Standalone
Standalone: Single node. Simple ops. Zero HA. Dev/test only.
Sentinel: HA with automatic failover. 3+ sentinel processes.
Primary + replicas. Reads can go to replicas.
No horizontal scaling. Good for < ~25GB, moderate throughput.
Cluster: Horizontal scaling + HA. 3+ primary nodes.
Data automatically sharded across nodes (16384 hash slots).
Multi-key ops require keys on same slot (use hash tags: {user}.profile, {user}.session)
Lua scripts limited to keys on same slot.
Use for: large datasets, high throughput needs.
Hash tags for cluster co-location:
MSET {user:42}.profile "..." {user:42}.session "..." ✅ same slot
MSET user:42:profile "..." user:42:session "..." ❌ potentially different slots
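Slot assignment is CRC16 (the CCITT/XMODEM variant) mod 16384, hashing only the first non-empty {tag} substring when present. A self-contained sketch of that computation, useful for checking co-location without a cluster:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the checksum Redis Cluster uses for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021 if crc & 0x8000 else crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Hash only the first non-empty {tag} substring if present, else the whole key."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end > start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

assert crc16_xmodem(b"123456789") == 0x31C3              # standard check value
assert key_slot("{user:42}.profile") == key_slot("{user:42}.session")  # same slot
print(key_slot("user:42:profile"), key_slot("user:42:session"))  # usually differ
```

Without the braces, the two keys hash independently and MSET across them fails in cluster mode with a CROSSSLOT error.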
Anti-Patterns
# ❌ KEYS in production (blocks server while scanning ALL keys)
KEYS user:*
# ✅ SCAN with cursor
SCAN 0 MATCH "user:*" COUNT 100
# ❌ Large collections without pagination
SMEMBERS huge_set # O(N) — blocks if N is large
LRANGE mylist 0 -1 # entire list
# ✅ Paginate
SSCAN myset 0 COUNT 100
LRANGE mylist 0 99 # page 1
# ❌ Storing large blobs (> 100KB) per key
SET user:42:avatar [500KB binary]
# ✅ Store in object storage (S3/GCS), store URL in Redis
# ❌ No TTL on cache keys (memory fills, eviction kicks in unpredictably)
SET cache:user:42 "..."
# ✅ Always set TTL
SETEX cache:user:42 3600 "..."
# ❌ pub/sub for reliable messaging (messages lost if subscriber is down)
PUBLISH notifications '{"event":"payment_complete"}'
# ✅ Streams for reliability
XADD notifications * event payment_complete user_id 42
# ❌ String for every field of an object (1 key per field)
SET user:42:name "Alice"
SET user:42:email "[email protected]"
# ✅ Hash for objects
HSET user:42 name Alice email [email protected]
Quick Reference
Data Structure Decision:
Simple KV / counter / flag → String
Object with multiple fields → Hash
Work queue / stack → List (LPUSH/BRPOP)
Unique membership / tag sets → Set
Ranking / scheduling / ranges → Sorted Set
Durable event log / multi-consumer → Stream
Approx unique count → HyperLogLog
"Definitely not present" check → Bloom Filter (RedisBloom)
Eviction Policy:
Pure cache → allkeys-lru or allkeys-lfu
Mixed cache + persistent → volatile-lru
Queue / stream (no loss allowed) → noeviction + alerting
Distributed Lock:
Single node → SET NX PX + Lua release
Multi-node (true distributed) → Redlock (3+ nodes)
Rate Limiting:
Sliding window (accurate) → ZADD + ZREMRANGEBYSCORE
Fixed window (simple) → INCR + EXPIRE
Topology:
Dev / test → Standalone
HA, < 25GB → Sentinel (3 nodes)
Scale out, > 25GB → Cluster (6+ nodes: 3 primary + 3 replica)