redis-expert
Expert knowledge of Redis data structures, eviction policies, Lua scripting, messaging patterns, cluster topologies, memory optimization, and advanced caching strategies. Trigger phrases: when using Redis, Redis data structure selection, distributed locks with Redis,
Redis Expert
Redis is simultaneously a cache, message broker, session store, leaderboard engine, and stream processor — but only if you choose the right data structure. The biggest Redis mistakes are: using Strings when a Hash would save memory, using blocking operations in application hot paths, and not planning for eviction. Redis is single-threaded for command execution (I/O is multi-threaded since 6.0), so O(n) commands like KEYS, SMEMBERS on large sets, and LRANGE 0 -1 can block the server.
Core Mental Model
Every Redis data structure solves a different problem class. Strings are versatile but wasteful at scale. Hashes are memory-efficient objects. Sorted Sets are the Swiss army knife for ranking, scheduling, and range queries. Streams are the correct answer for durable messaging — not pub/sub. Memory is finite and Redis will evict or OOM if you don't plan TTLs and eviction policy. Cluster adds horizontal scale but complicates multi-key operations and Lua scripts.
Data Structure Selection Guide
| Structure | Use For | Avoid When |
|---|---|---|
| String | Simple KV, counters, locks, small serialized objects | Many fields per key (use Hash) |
| Hash | Objects, session data, user profiles | More than a few thousand fields |
| List | Work queues, activity feeds (bounded), stacks | Large random-access needs |
| Set | Unique membership, tagging, intersection/union | Ordered access needed |
| Sorted Set | Leaderboards, rate limiting, scheduling, priority queues | Pure unordered membership |
| Stream | Durable event log, consumer groups, audit trail | Fire-and-forget (use pub/sub) |
| HyperLogLog | Approximate unique counts (±0.81% error) | Exact counts needed |
| Bloom Filter | "Definitely not in set" checks (RedisBloom) | Membership must be certain |
| Geo | Distance queries, nearby search | Complex polygon queries |
| Bitmap | Bit-level flags, DAU counting | Non-integer keys |
# Memory comparison: 1000 user objects
# 1000 individual Strings (JSON serialized)
SET user:1 '{"id":1,"name":"Alice","email":"[email protected]","score":42}'
# ~120 bytes per key × 1000 = ~120KB + overhead per key = ~200KB total
# vs 1 Hash per user (listpack-encoded if fields < hash-max-listpack-entries; "ziplist" before Redis 7)
HSET user:1 name Alice email [email protected] score 42
# ~65 bytes × 1000 = ~65KB — nearly 3x more memory-efficient
TTL and Eviction Policies
Eviction Policy Selection
maxmemory-policy options (set in redis.conf or CONFIG SET):
noeviction → Return error when memory full. Use for: queues, data you can't afford to lose
allkeys-lru → Evict least recently used from ALL keys. Use for: general cache
volatile-lru → Evict LRU from keys WITH expiry only. Use for: cache + persistent mix
allkeys-lfu → Evict least frequently used. Use for: Zipf-distributed access patterns
volatile-lfu → LFU from keys with expiry
allkeys-random → Random eviction. Rarely correct
volatile-ttl → Evict keys with shortest remaining TTL first
Recommendation:
Pure cache: allkeys-lru or allkeys-lfu
Cache + durable data: volatile-lru (set TTL on cache keys, not on durable keys)
Queue / stream (no eviction): noeviction + monitor memory
Hot/cold access patterns: allkeys-lfu (LFU handles Zipf better than LRU)
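To see why LFU suits skewed access, here is a toy in-memory simulation, not Redis itself (real Redis approximates both policies by sampling `maxmemory-samples` keys), comparing strict LRU and LFU on a hot-key workload interrupted by a cold scan:

```python
from collections import Counter, OrderedDict

def simulate(policy: str, accesses: list[str], capacity: int = 2) -> set[str]:
    """Toy cache with strict LRU/LFU eviction (illustrative only)."""
    cache = OrderedDict()  # key -> None; order tracks recency
    freq = Counter()       # lifetime access counts, for LFU
    for key in accesses:
        freq[key] += 1
        if key in cache:
            cache.move_to_end(key)  # refresh recency on hit
            continue
        if len(cache) >= capacity:
            if policy == "lru":
                cache.popitem(last=False)  # evict least recently used
            else:
                victim = min(cache, key=lambda k: freq[k])  # least frequent
                del cache[victim]
        cache[key] = None
    return set(cache)

# "hot" dominates traffic, then a scan of cold keys arrives
workload = ["hot"] * 50 + ["cold1", "cold2", "cold3"]
print(simulate("lru", workload))  # {'cold2', 'cold3'}: scan flushed the hot key
print(simulate("lfu", workload))  # hot survives thanks to its frequency count
```

The cold scan evicts the hot key under LRU but not under LFU, which is the Zipf argument in miniature.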
# redis.conf
maxmemory 4gb
maxmemory-policy allkeys-lfu
maxmemory-samples 10 # LRU/LFU approximation sample size (higher = more accurate, more CPU)
# Runtime change
CONFIG SET maxmemory-policy allkeys-lfu
CONFIG SET maxmemory 4gb
# Check eviction stats (run from a shell: piping to grep works with redis-cli, not inside the REPL)
redis-cli INFO stats | grep evicted_keys
redis-cli INFO memory | grep used_memory_human
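INFO returns `field:value` lines grouped under `# Section` headers, which makes it easy to post-process. A minimal parser, with an illustrative sample rather than output from a live server:

```python
def parse_info(raw: str) -> dict[str, str]:
    """Parse redis-server INFO output: field:value lines, '#' marks sections."""
    stats: dict[str, str] = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        field, _, value = line.partition(":")
        stats[field] = value
    return stats

sample = """# Stats
evicted_keys:1024
keyspace_hits:99000
keyspace_misses:1000"""

info = parse_info(sample)
hit_rate = int(info["keyspace_hits"]) / (
    int(info["keyspace_hits"]) + int(info["keyspace_misses"]))
print(info["evicted_keys"], f"{hit_rate:.0%}")  # 1024 99%
```

A rising `evicted_keys` alongside a falling hit rate usually means maxmemory is too small for the working set.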
Distributed Lock (Redlock)
# Single-instance lock (sufficient for most use cases)
import redis
import uuid
import time

def acquire_lock(r: redis.Redis, lock_name: str, timeout_ms: int = 30000) -> str | None:
    """Returns lock token if acquired, None if lock is held."""
    token = str(uuid.uuid4())
    acquired = r.set(
        f"lock:{lock_name}",
        token,
        px=timeout_ms,  # expiry in milliseconds
        nx=True         # only set if Not eXists
    )
    return token if acquired else None

def release_lock(r: redis.Redis, lock_name: str, token: str) -> bool:
    """Atomic release — only release if we own the lock."""
    script = """
    if redis.call('get', KEYS[1]) == ARGV[1] then
        return redis.call('del', KEYS[1])
    else
        return 0
    end
    """
    result = r.eval(script, 1, f"lock:{lock_name}", token)
    return bool(result)

# Usage (r is a connected redis.Redis client)
token = acquire_lock(r, "payment_processor_user_42", timeout_ms=10000)
if token:
    try:
        process_payment(user_id=42)
    finally:
        release_lock(r, "payment_processor_user_42", token)
else:
    raise Exception("Could not acquire lock — another process is running")
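The compare-token release matters because a slow worker's lock can expire and be re-acquired by another process; a plain DEL would then delete the new owner's lock. An in-memory mock (a dict standing in for Redis, for illustration only) of the same semantics:

```python
store: dict[str, str] = {}  # stands in for Redis

def mock_release(lock_key: str, token: str) -> bool:
    """Same contract as the Lua script: delete only if we still own the lock."""
    if store.get(lock_key) == token:
        del store[lock_key]
        return True
    return False

store["lock:job"] = "token-A"   # worker A holds the lock
store["lock:job"] = "token-B"   # A's TTL expired; worker B re-acquired it

assert not mock_release("lock:job", "token-A")  # A's stale release is refused
assert mock_release("lock:job", "token-B")      # B releases its own lock
```

A plain DEL at the first release would have silently broken B's critical section.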
// Redlock for multi-node Redis (true distributed lock)
import Redlock from "redlock";
const redlock = new Redlock([client1, client2, client3], {
retryCount: 3,
retryDelay: 200, // ms between retries
retryJitter: 100, // random jitter to prevent thundering herd
driftFactor: 0.01 // clock drift tolerance
});
const lock = await redlock.acquire(["lock:payment:user:42"], 10000);
try {
await processPayment(userId);
} finally {
await lock.release();
}
Rate Limiter
# Sliding window rate limiter using Sorted Set
def is_rate_limited(r: redis.Redis, user_id: str,
                    limit: int = 100, window_seconds: int = 60) -> bool:
    key = f"rate:{user_id}"
    now = time.time()
    window_start = now - window_seconds
    pipe = r.pipeline()
    # Remove entries outside the window
    pipe.zremrangebyscore(key, 0, window_start)
    # Count requests already in the window (before this one)
    pipe.zcard(key)
    # Add current request (score = timestamp); note that rejected requests
    # still record an entry, which keeps heavy abusers blocked longer
    pipe.zadd(key, {str(uuid.uuid4()): now})
    # Set TTL to clean up idle keys
    pipe.expire(key, window_seconds + 1)
    _, count, _, _ = pipe.execute()
    return count >= limit  # True means rate limited
# Fixed window counter (simpler, tiny thundering herd at window boundary)
def fixed_window_limit(r: redis.Redis, user_id: str,
                       limit: int = 100, window_seconds: int = 60) -> bool:
    key = f"ratelimit:{user_id}:{int(time.time() // window_seconds)}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, window_seconds)
    return count > limit
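The boundary problem, quantified: with limit=100 per 60s window, a client can send 100 requests at t=59 and 100 more at t=61 (a fresh window), so 200 requests land within about two seconds. A pure-Python model of the same INCR-per-window logic, with an injectable clock and no Redis involved:

```python
def fixed_window_allowed(counters: dict, user: str, now: float,
                         limit: int = 100, window: int = 60) -> bool:
    """In-memory model of the fixed-window INCR limiter."""
    key = (user, int(now // window))   # one counter per (user, window) bucket
    counters[key] = counters.get(key, 0) + 1
    return counters[key] <= limit

counters: dict = {}
# 150 attempts just before the boundary, 150 just after it
burst1 = sum(fixed_window_allowed(counters, "u1", now=59.0) for _ in range(150))
burst2 = sum(fixed_window_allowed(counters, "u1", now=61.0) for _ in range(150))
print(burst1, burst2)  # 100 100 -> 200 requests allowed in ~2 seconds
```

If that burst is unacceptable, use the sliding-window Sorted Set version above.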
Leaderboard with Sorted Set
# Add/update scores
ZADD leaderboard 1500 "player:alice"
ZADD leaderboard 2300 "player:bob"
ZADD leaderboard 890 "player:carol"
ZINCRBY leaderboard 100 "player:alice" # atomic increment
# Top N players (descending score)
ZREVRANGEBYSCORE leaderboard +inf -inf WITHSCORES LIMIT 0 10
# Player's rank (0-indexed, use ZREVRANK for highest-first)
ZREVRANK leaderboard "player:alice" # → 1 (0-based, so rank 2)
# Players in score range
ZRANGEBYSCORE leaderboard 1000 2000 WITHSCORES
# Player's score
ZSCORE leaderboard "player:alice"
# Window around player (show neighbors): get the rank, then fetch rank-radius..rank+radius
ZREVRANK leaderboard "player:alice"      # → 1
ZREVRANGE leaderboard 0 3 WITHSCORES     # radius 2: max(0, 1-2) .. 1+2
# Full leaderboard service
class Leaderboard:
    def __init__(self, r: redis.Redis, key: str):
        self.r = r
        self.key = key

    def add_score(self, player_id: str, score: float):
        self.r.zadd(self.key, {player_id: score})

    def increment_score(self, player_id: str, delta: float) -> float:
        return self.r.zincrby(self.key, delta, player_id)

    def get_rank(self, player_id: str) -> int | None:
        rank = self.r.zrevrank(self.key, player_id)
        return rank + 1 if rank is not None else None  # 1-indexed

    def get_top(self, n: int = 10) -> list[dict]:
        entries = self.r.zrevrangebyscore(self.key, "+inf", "-inf",
                                          withscores=True, start=0, num=n)
        return [{"player": p.decode(), "score": s, "rank": i + 1}
                for i, (p, s) in enumerate(entries)]

    def get_around_player(self, player_id: str, radius: int = 5) -> list[dict]:
        rank = self.r.zrevrank(self.key, player_id)
        if rank is None:
            return []
        start = max(0, rank - radius)
        stop = rank + radius
        entries = self.r.zrevrange(self.key, start, stop, withscores=True)
        return [{"player": p.decode(), "score": s, "rank": start + i + 1}
                for i, (p, s) in enumerate(entries)]
Pub/Sub vs Streams vs Lists
| Pattern | Delivery | History | Consumer Groups | Persistence | Use For |
|---|---|---|---|---|---|
| Pub/Sub | Fire-and-forget | ❌ None | ❌ | ❌ | Real-time notifications, live updates |
| List (LPUSH/BRPOP) | At-least-once | ❌ (consumed) | ❌ | AOF/RDB | Simple work queue |
| Streams | At-least-once + ACK | ✅ | ✅ Consumer groups | AOF/RDB | Durable event log, multi-consumer |
# Redis Streams: durable event processing with consumer groups
# Producer
r.xadd("orders", {
"order_id": "ord_123",
"customer": "alice",
"total": "99.99",
"status": "new"
})
# Create consumer group (read from beginning: '0', or latest: '$')
r.xgroup_create("orders", "order_processors", id="0", mkstream=True)
# Consumer (in worker process)
while True:
    # XREADGROUP: read up to 10 messages, block 2s if empty
    messages = r.xreadgroup(
        groupname="order_processors",
        consumername="worker-1",
        streams={"orders": ">"},  # ">" = new undelivered messages
        count=10,
        block=2000
    )
    for stream_name, entries in messages or []:
        for msg_id, fields in entries:
            try:
                process_order(fields)
                r.xack("orders", "order_processors", msg_id)  # ACK = processed
            except Exception as e:
                log_error(e)
                # Message stays in PEL (pending entry list) for retry/DLQ

# Claim stale messages (messages pending > 60s — worker may have crashed)
stale = r.xautoclaim("orders", "order_processors", "worker-1",
                     min_idle_time=60000, start_id="0-0")
Lua Scripting for Atomic Operations
-- redis-lua: atomic check-and-set with complex logic
-- KEYS[1] = counter key, ARGV[1] = limit, ARGV[2] = ttl_seconds
local current = redis.call('GET', KEYS[1])
if current and tonumber(current) >= tonumber(ARGV[1]) then
return 0 -- rate limited
end
local new_val = redis.call('INCR', KEYS[1])
if new_val == 1 then
redis.call('EXPIRE', KEYS[1], ARGV[2])
end
return 1 -- allowed
# Load and execute Lua script (cached by SHA)
rate_limit_script = r.register_script("""
local current = redis.call('GET', KEYS[1])
if current and tonumber(current) >= tonumber(ARGV[1]) then
return 0
end
local new_val = redis.call('INCR', KEYS[1])
if new_val == 1 then
redis.call('EXPIRE', KEYS[1], ARGV[2])
end
return 1
""")
allowed = rate_limit_script(keys=[f"rate:{user_id}"], args=[limit, window_seconds])
Cache Stampede Prevention
# Problem: cache expires, 1000 requests all miss and query DB simultaneously
# Solution 1: Probabilistic early recomputation (XFetch algorithm)
import math
import random

def get_with_xfetch(r: redis.Redis, key: str, ttl: int,
                    fetch_fn, beta: float = 1.0):
    """Proactively recompute before expiry using probabilistic early refresh."""
    data = r.get(key)
    if data:
        value, expiry = deserialize(data)
        remaining = expiry - time.time()
        delta = fetch_fn.last_duration  # cost of the last recomputation
        # log(random()) is negative, so early refresh becomes increasingly
        # likely as `remaining` shrinks relative to delta * beta
        if remaining + delta * beta * math.log(random.random()) < 0:
            return refresh(r, key, ttl, fetch_fn)  # early refresh
        return value
    return refresh(r, key, ttl, fetch_fn)
# Solution 2: Lock-based (only one refresh, others wait)
def get_or_compute(r: redis.Redis, key: str, ttl: int, compute_fn):
    value = r.get(key)
    if value:
        return deserialize(value)
    lock_key = f"{key}:computing"
    lock_token = acquire_lock(r, lock_key, timeout_ms=5000)
    if lock_token:
        try:
            # Double-check after acquiring lock
            value = r.get(key)
            if value:
                return deserialize(value)
            result = compute_fn()
            r.setex(key, ttl, serialize(result))
            return result
        finally:
            release_lock(r, lock_key, lock_token)
    else:
        # Another worker is computing — wait briefly and retry
        time.sleep(0.1)
        return get_or_compute(r, key, ttl, compute_fn)
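The XFetch condition has a closed-form refresh probability: P(remaining + delta * beta * ln U < 0) for U ~ Uniform(0, 1) is exp(-remaining / (delta * beta)). It is near zero while the key is fresh and ramps toward 1 as expiry approaches or as the recompute cost (delta) grows. A quick check of that form:

```python
import math

def xfetch_refresh_probability(remaining: float, delta: float,
                               beta: float = 1.0) -> float:
    """P(early refresh) = P(remaining + delta*beta*ln(U) < 0), U ~ Uniform(0,1)."""
    if remaining <= 0:
        return 1.0  # already expired: always refresh
    return math.exp(-remaining / (delta * beta))

# 2s fetch cost: the probability ramps up as expiry approaches
for remaining in (60, 10, 4, 1):
    print(remaining, round(xfetch_refresh_probability(remaining, delta=2.0), 3))
```

Raising beta makes refreshes happen earlier; lowering it tolerates more risk of a stampede at expiry.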
Memory Optimization
# Check encoding of a key
OBJECT ENCODING mykey
# Possible values: int, embstr, raw, ziplist, listpack, hashtable, skiplist, quicklist
# redis.conf thresholds for compact encoding
hash-max-listpack-entries 128 # Hash uses listpack if ≤ 128 fields
hash-max-listpack-value 64 # and all values ≤ 64 bytes
zset-max-listpack-entries 128 # Sorted Set uses listpack if ≤ 128 members
zset-max-listpack-value 64
set-max-intset-entries 512 # Set uses intset if all members are integers and entry count ≤ 512
# Memory analysis
MEMORY USAGE mykey # bytes for a specific key
MEMORY DOCTOR # recommendations
DEBUG OBJECT mykey # encoding + serialized length
# Find large keys (use SCAN, never KEYS in production)
redis-cli --bigkeys # scans and reports largest keys per type
redis-cli --memkeys # reports memory usage per key
# SCAN instead of KEYS
SCAN 0 MATCH "user:*" COUNT 100 # cursor-based, non-blocking
# Iterate until cursor returns 0
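The client-side cursor loop looks like this, sketched against an illustrative mock_scan over an in-memory keyspace (real redis-py code can simply use r.scan_iter(match="user:*"), which hides the cursor):

```python
import fnmatch

def mock_scan(keys: list[str], cursor: int, match: str, count: int = 100):
    """Illustrative SCAN stand-in: returns (next_cursor, batch); 0 means done."""
    batch = [k for k in keys[cursor:cursor + count] if fnmatch.fnmatch(k, match)]
    next_cursor = cursor + count
    return (0 if next_cursor >= len(keys) else next_cursor), batch

keyspace = [f"user:{i}" for i in range(250)] + ["config:app"]
cursor, found = 0, []
while True:                 # the canonical SCAN loop
    cursor, batch = mock_scan(keyspace, cursor, match="user:*")
    found.extend(batch)
    if cursor == 0:         # cursor 0 signals a complete iteration
        break
print(len(found))  # 250
```

Each round trip is bounded by COUNT, so the server never blocks the way KEYS does.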
Cluster vs Sentinel vs Standalone
Standalone: Single node. Simple ops. Zero HA. Dev/test only.
Sentinel: HA with automatic failover. 3+ sentinel processes.
Primary + replicas. Reads can go to replicas.
No horizontal scaling. Good for < ~25GB, moderate throughput.
Cluster: Horizontal scaling + HA. 3+ primary nodes.
Data automatically sharded across nodes (16384 hash slots).
Multi-key ops require keys on same slot (use hash tags: {user}.profile, {user}.session)
Lua scripts limited to keys on same slot.
Use for: large datasets, high throughput needs.
Hash tags for cluster co-location:
MSET {user:42}.profile "..." {user:42}.session "..." ✅ same slot
MSET user:42:profile "..." user:42:session "..." ❌ potentially different slots
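Slot assignment is CRC16 (the CCITT/XMODEM variant) mod 16384, hashing only the first non-empty {tag} substring when present. A self-contained sketch of that computation, useful for checking co-location without a cluster:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the checksum Redis Cluster uses for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021 if crc & 0x8000 else crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Hash only the first non-empty {tag} substring if present, else the whole key."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end > start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

assert crc16_xmodem(b"123456789") == 0x31C3              # standard check value
assert key_slot("{user:42}.profile") == key_slot("{user:42}.session")  # same slot
print(key_slot("user:42:profile"), key_slot("user:42:session"))  # usually differ
```

Without the braces, the two keys hash independently and MSET across them fails in cluster mode with a CROSSSLOT error.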
Anti-Patterns
# ❌ KEYS in production (blocks server while scanning ALL keys)
KEYS user:*
# ✅ SCAN with cursor
SCAN 0 MATCH "user:*" COUNT 100
# ❌ Large collections without pagination
SMEMBERS huge_set # O(N) — blocks if N is large
LRANGE mylist 0 -1 # entire list
# ✅ Paginate
SSCAN myset 0 COUNT 100
LRANGE mylist 0 99 # page 1
# ❌ Storing large blobs (> 100KB) per key
SET user:42:avatar [500KB binary]
# ✅ Store in object storage (S3/GCS), store URL in Redis
# ❌ No TTL on cache keys (memory fills, eviction kicks in unpredictably)
SET cache:user:42 "..."
# ✅ Always set TTL
SETEX cache:user:42 3600 "..."
# ❌ pub/sub for reliable messaging (messages lost if subscriber is down)
PUBLISH notifications '{"event":"payment_complete"}'
# ✅ Streams for reliability
XADD notifications * event payment_complete user_id 42
# ❌ String for every field of an object (1 key per field)
SET user:42:name "Alice"
SET user:42:email "[email protected]"
# ✅ Hash for objects
HSET user:42 name Alice email [email protected]
Quick Reference
Data Structure Decision:
Simple KV / counter / flag → String
Object with multiple fields → Hash
Work queue / stack → List (LPUSH/BRPOP)
Unique membership / tag sets → Set
Ranking / scheduling / ranges → Sorted Set
Durable event log / multi-consumer → Stream
Approx unique count → HyperLogLog
"Definitely not present" check → Bloom Filter (RedisBloom)
Eviction Policy:
Pure cache → allkeys-lru or allkeys-lfu
Mixed cache + persistent → volatile-lru
Queue / stream (no loss allowed) → noeviction + alerting
Distributed Lock:
Single node → SET NX PX + Lua release
Multi-node (true distributed) → Redlock (3+ nodes)
Rate Limiting:
Sliding window (accurate) → ZADD + ZREMRANGEBYSCORE
Fixed window (simple) → INCR + EXPIRE
Topology:
Dev / test → Standalone
HA, < 25GB → Sentinel (3 nodes)
Scale out, > 25GB → Cluster (6+ nodes: 3 primary + 3 replica)