LangChain & LangGraph
LangChain is powerful when used for what it's actually good at: composable retrieval pipelines, standardized tool interfaces, and evaluation infrastructure. It's frequently misused as a wrapper around simple API calls, adding 10 layers of abstraction to something that should be 20 lines of code. This skill covers when to use it and how to use it well.
Core Mental Model
LangChain's value is in composition (chaining retrievers, parsers, and LLMs into testable pipelines) and standardization (swapping models or vector stores without rewriting logic). LangGraph's value is in state machines (multi-step agents where control flow depends on intermediate results). If your use case is a single API call with a system prompt, use the raw API. If you're building a pipeline with multiple steps, fallbacks, or branching logic, that's where these frameworks earn their keep.
When to Use Each
# RAW API: Single call, simple output — don't reach for LangChain
response = anthropic.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,  # required by the Messages API
    messages=[{"role": "user", "content": prompt}],
)
# LANGCHAIN: Multi-step pipeline with retrieval + generation
chain = retriever | prompt_template | llm | output_parser
# LANGGRAPH: Stateful agent that loops, branches, or uses tools
# (agent that decides whether to search, use a calculator, or ask clarification)
LCEL: LangChain Expression Language
LCEL uses the | pipe operator to compose Runnables. Every component implements the same interface: invoke(), stream(), batch(), and their async counterparts (ainvoke(), astream(), abatch()).
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
llm = ChatAnthropic(model="claude-sonnet-4-5", temperature=0)
# Basic chain
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Answer based on the context provided."),
    ("user", "Context: {context}\n\nQuestion: {question}"),
])
chain = prompt | llm | StrOutputParser()
# RunnableParallel: run multiple chains concurrently
parallel_chain = RunnableParallel(
    answer=chain,
    sources=retriever,  # Run retriever in parallel with generation
)
# RunnablePassthrough: pass input through unchanged
rag_chain = (
    RunnableParallel(
        context=retriever,
        question=RunnablePassthrough(),  # Pass question unchanged to next step
    )
    | prompt
    | llm
    | StrOutputParser()
)
# With fallbacks (graceful degradation)
primary_chain = prompt | ChatAnthropic(model="claude-opus-4-5") | StrOutputParser()
fallback_chain = prompt | ChatAnthropic(model="claude-haiku-4-5") | StrOutputParser()
chain_with_fallback = primary_chain.with_fallbacks([fallback_chain])
# Retry on failure
from anthropic import RateLimitError, APITimeoutError

chain_with_retry = chain.with_retry(
    retry_if_exception_type=(RateLimitError, APITimeoutError),
    wait_exponential_jitter=True,
    stop_after_attempt=3,
)
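Under the hood, `|` works because every Runnable implements `__or__`. A framework-free sketch of the idea (the `Step` class here is illustrative, not a real LangChain class):

```python
class Step:
    """Illustrative stand-in for a Runnable: wraps a function, supports | composition."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, x):
        return self.fn(x)

    def __or__(self, other):
        # A | B returns a new Step that feeds A's output into B
        return Step(lambda x: other.invoke(self.invoke(x)))


# Compose three "components" the way LCEL composes prompt | llm | parser
template = Step(lambda q: f"Q: {q}")
fake_llm = Step(lambda p: p.upper())
parser = Step(lambda s: s.strip())

chain = template | fake_llm | parser
print(chain.invoke("hello"))  # Q: HELLO
```

This is why any object with the Runnable interface, including a plain function coerced via RunnableLambda, can slot into a chain.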
Advanced Retriever Types
from langchain.retrievers import (
    MultiQueryRetriever,
    ContextualCompressionRetriever,
    EnsembleRetriever,
)
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_community.retrievers import BM25Retriever
# Base vector store retriever
vector_retriever = vectorstore.as_retriever(
    search_type="mmr",  # Max Marginal Relevance: diversity + relevance
    search_kwargs={"k": 10, "fetch_k": 30, "lambda_mult": 0.7},
)
# MultiQueryRetriever: generates multiple query variations to combat query phrasing issues
multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=vector_retriever,
    llm=ChatAnthropic(model="claude-haiku-4-5"),
    # Generates 3 different phrasings of the question, retrieves for each, deduplicates
)
# ContextualCompressionRetriever: extract only relevant passages from retrieved docs
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vector_retriever,
)
# EnsembleRetriever: hybrid BM25 (keyword) + vector (semantic) search
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = 5
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.4, 0.6],  # Weight toward semantic, but preserve keyword matching
)
# Usage pattern for RAG pipeline
def build_rag_chain(retriever, llm):
    prompt = ChatPromptTemplate.from_template(
        "Use the following context to answer the question.\n"
        "Context: {context}\n"
        "Question: {question}\n"
        "Answer:"
    )

    def format_docs(docs):
        return "\n\n---\n\n".join(
            f"Source: {doc.metadata.get('source', 'unknown')}\n{doc.page_content}"
            for doc in docs
        )

    return (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
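EnsembleRetriever merges the ranked lists from each retriever with weighted Reciprocal Rank Fusion. A stdlib sketch of that scoring, simplified: string IDs stand in for Documents, and c=60 is the conventional RRF constant:

```python
def weighted_rrf(result_lists, weights, c=60):
    """Weighted Reciprocal Rank Fusion: score(doc) = sum of weight / (c + rank)."""
    scores = {}
    for docs, weight in zip(result_lists, weights):
        for rank, doc in enumerate(docs, start=1):
            scores[doc] = scores.get(doc, 0.0) + weight / (c + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


bm25_hits = ["doc_a", "doc_b", "doc_c"]      # keyword ranking
vector_hits = ["doc_b", "doc_d", "doc_a"]    # semantic ranking
merged = weighted_rrf([bm25_hits, vector_hits], weights=[0.4, 0.6])
print(merged)
```

Documents appearing high in both lists win: here `doc_b` outranks `doc_a` because its top semantic position outweighs `doc_a`'s top keyword position under the 0.4/0.6 weighting.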
Memory Management
from langchain.memory import ConversationSummaryBufferMemory
from langchain_core.messages import HumanMessage, AIMessage

# AVOID: ConversationBufferMemory — grows unbounded, hits context limit
# from langchain.memory import ConversationBufferMemory  # BAD for long conversations

# USE: ConversationSummaryBufferMemory — keeps recent turns verbatim, summarizes old ones
# (plain ConversationSummaryMemory summarizes everything and has no token limit)
memory = ConversationSummaryBufferMemory(
    llm=ChatAnthropic(model="claude-haiku-4-5"),
    return_messages=True,
    max_token_limit=1000,  # Summarize when history exceeds this
)
# Better for production: manage conversation history yourself
class ConversationManager:
    def __init__(self, max_tokens=8000):
        self.messages = []
        self.max_tokens = max_tokens

    def add_message(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})
        self._trim_if_needed()

    def _trim_if_needed(self):
        """Sliding window: drop the oldest non-system messages (index 0 holds the system prompt)"""
        # Rough token estimate: 4 chars ≈ 1 token
        total_chars = sum(len(m["content"]) for m in self.messages)
        while total_chars > self.max_tokens * 4 and len(self.messages) > 2:
            removed = self.messages.pop(1)  # pop(0) would evict the system prompt
            total_chars -= len(removed["content"])
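The trimming logic is easy to unit-test in isolation. A self-contained check of the sliding-window behavior, re-implemented as a plain function that keeps the system prompt at index 0:

```python
def trim(messages, max_chars):
    """Sliding window: drop oldest non-system messages until under budget."""
    total = sum(len(m["content"]) for m in messages)
    while total > max_chars and len(messages) > 2:
        removed = messages.pop(1)  # index 0 is the system prompt
        total -= len(removed["content"])
    return messages


history = [
    {"role": "system", "content": "You are terse."},
    {"role": "user", "content": "a" * 50},
    {"role": "assistant", "content": "b" * 50},
    {"role": "user", "content": "c" * 10},
]
trimmed = trim(history, max_chars=80)
print([m["role"] for m in trimmed])  # ['system', 'assistant', 'user']
```

The oldest user turn is evicted; the system prompt and the most recent exchange survive.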
Structured Output with Pydantic
from langchain_core.output_parsers import PydanticOutputParser
from langchain_anthropic import ChatAnthropic
from pydantic import BaseModel, Field
from typing import List
class ExtractedEntity(BaseModel):
    name: str = Field(description="Entity name")
    type: str = Field(description="Entity type: person, organization, location, product")
    confidence: float = Field(description="Confidence score 0-1", ge=0, le=1)
    context: str = Field(description="Surrounding context where entity was found")

class ExtractionResult(BaseModel):
    entities: List[ExtractedEntity]
    summary: str = Field(description="One sentence summary of the text")
# Method 1: Pydantic parser (works with most models)
parser = PydanticOutputParser(pydantic_object=ExtractionResult)
prompt = ChatPromptTemplate.from_messages([
    ("system", "Extract entities from the text.\n{format_instructions}"),
    ("user", "{text}"),
]).partial(format_instructions=parser.get_format_instructions())
chain = prompt | llm | parser
# Method 2: with_structured_output (preferred — uses native tool calling)
structured_llm = ChatAnthropic(model="claude-sonnet-4-5").with_structured_output(ExtractionResult)
result: ExtractionResult = structured_llm.invoke("Anthropic was founded by Dario Amodei in San Francisco.")
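Both methods come down to the same contract: the model emits JSON matching a schema, and the parser validates it before your code sees it. A stripped-down stdlib sketch of that validation step, with a dataclass instead of Pydantic and a hard-coded string standing in for model output:

```python
import json
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    type: str
    confidence: float


def parse_entity(raw: str) -> Entity:
    """Validate model output: required keys present, confidence in [0, 1]."""
    data = json.loads(raw)  # raises if the model emitted malformed JSON
    entity = Entity(**{k: data[k] for k in ("name", "type", "confidence")})
    if not 0.0 <= entity.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {entity.confidence}")
    return entity


model_output = '{"name": "Anthropic", "type": "organization", "confidence": 0.97}'
entity = parse_entity(model_output)
print(entity.name, entity.confidence)
```

This is also why `with_structured_output` is preferred: native tool calling makes the model far less likely to produce output that fails this validation step in the first place.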
LangSmith Tracing
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "ls_..."
os.environ["LANGCHAIN_PROJECT"] = "my-rag-app"
# Now all chain invocations are automatically traced
# View at: https://smith.langchain.com
# Manual evaluation
from langsmith import Client
from langsmith.evaluation import evaluate
client = Client()
def evaluate_rag_answer(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    """Custom evaluator for RAG quality"""
    question = inputs["question"]
    generated = outputs["answer"]
    expected = reference_outputs["answer"]

    # LLM-as-judge
    judge_prompt = f"""
Question: {question}
Expected: {expected}
Generated: {generated}

Rate the generated answer on a scale of 1-5 for accuracy.
Respond with just the number.
"""
    score = int(llm.invoke(judge_prompt).content.strip())
    return {"accuracy": score / 5, "passed": score >= 3}
results = evaluate(
    lambda inputs: {"answer": chain.invoke(inputs["question"])},
    data="my-rag-dataset",  # Dataset name in LangSmith
    evaluators=[evaluate_rag_answer],
    experiment_prefix="rag-v2",
)
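`int(reply.strip())` is brittle: judge models sometimes wrap the number in prose ("I'd rate this a 3/5"). A defensive stdlib parser worth using instead (illustrative helper, not a LangSmith API):

```python
import re


def parse_judge_score(reply: str, low: int = 1, high: int = 5) -> int:
    """Extract the first integer from an LLM judge reply and clamp it to the scale."""
    match = re.search(r"-?\d+", reply)
    if match is None:
        raise ValueError(f"no score found in judge reply: {reply!r}")
    return max(low, min(high, int(match.group())))


print(parse_judge_score("4"))                     # clean reply
print(parse_judge_score("I'd rate this a 3/5."))  # number embedded in prose
print(parse_judge_score("Score: 9"))              # out of range, clamped to 5
```

Raising on a missing number (rather than defaulting) keeps silent scoring failures out of your evaluation results.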
LangGraph: Stateful Agent Flows
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from langchain_core.messages import HumanMessage, ToolMessage
from typing import TypedDict, Annotated
# 1. Define state schema
class AgentState(TypedDict):
    messages: Annotated[list, operator.add]  # Append-only via operator.add
    tool_calls: list
    final_answer: str | None
# 2. Define nodes (pure functions that transform state)
def call_llm(state: AgentState) -> dict:
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

def run_tools(state: AgentState) -> dict:
    last_message = state["messages"][-1]
    results = []
    for tool_call in last_message.tool_calls:
        tool = tools_by_name[tool_call["name"]]
        result = tool.invoke(tool_call["args"])
        results.append(ToolMessage(content=str(result), tool_call_id=tool_call["id"]))
    return {"messages": results}

def should_continue(state: AgentState) -> str:
    """Routing function — determines next node"""
    last_message = state["messages"][-1]
    if last_message.tool_calls:
        return "run_tools"
    return END
# 3. Build graph
workflow = StateGraph(AgentState)
workflow.add_node("call_llm", call_llm)
workflow.add_node("run_tools", run_tools)
workflow.set_entry_point("call_llm")
workflow.add_conditional_edges(
    "call_llm",
    should_continue,  # Router function
    {"run_tools": "run_tools", END: END},
)
workflow.add_edge("run_tools", "call_llm") # Always go back to LLM after tools
# 4. Compile with checkpointer (enables conversation history)
memory = MemorySaver()
agent = workflow.compile(checkpointer=memory)
# 5. Run with thread_id for persistent state
config = {"configurable": {"thread_id": "user-session-abc123"}}
result = agent.invoke(
    {"messages": [HumanMessage(content="What's the weather in Tokyo?")]},
    config=config,
)
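Stripped of the framework, the graph above is just a loop: call the LLM, run any requested tools, feed results back, stop when no tools are requested. A self-contained sketch with stub objects (the stub LLM, tool registry, and message dicts are all illustrative):

```python
def run_agent(llm, tools, messages, max_steps=10):
    """ReAct-style loop: LLM -> tools -> LLM until no tool calls remain."""
    for _ in range(max_steps):
        response = llm(messages)  # stub contract: {"content": str, "tool_calls": list}
        messages.append(response)
        if not response["tool_calls"]:
            return response["content"]  # no tools requested: final answer
        for call in response["tool_calls"]:
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "content": str(result)})
    raise RuntimeError("agent exceeded max_steps")


# Stub LLM: requests the weather tool once, then answers from the tool result
def stub_llm(messages):
    if not any(m.get("role") == "tool" for m in messages):
        return {"content": "", "tool_calls": [{"name": "get_weather", "args": {"city": "Tokyo"}}]}
    return {"content": "It is sunny in Tokyo.", "tool_calls": []}


tools = {"get_weather": lambda city: f"sunny in {city}"}
answer = run_agent(stub_llm, tools, [{"role": "user", "content": "Weather in Tokyo?"}])
print(answer)  # It is sunny in Tokyo.
```

What LangGraph adds on top of this loop is the part worth paying for: checkpointed state per thread_id, interrupts for human review, and streaming of intermediate events.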
Streaming
# Stream with LCEL
async def stream_response(question: str):
    async for chunk in chain.astream({"question": question}):
        yield chunk
# Stream with LangGraph (events)
async for event in agent.astream_events(
    {"messages": [HumanMessage(content="Research AI trends")]},
    version="v2",  # v2 event schema; v1 is legacy
):
    kind = event["event"]
    if kind == "on_chat_model_stream":
        content = event["data"]["chunk"].content
        if content:
            print(content, end="", flush=True)
    elif kind == "on_tool_start":
        print(f"\n[Tool: {event['name']}]")
Custom Tools
from langchain_core.tools import BaseTool, tool
from pydantic import BaseModel, Field
from typing import Any
import json
# Simple decorator style
@tool
def search_database(query: str) -> str:
    """Search the product database. Use for product lookups, inventory checks."""
    results = db.search(query)
    return json.dumps(results[:5])
# Structured input with schema
class DatabaseSearchInput(BaseModel):
    query: str = Field(description="Search query")
    limit: int = Field(default=5, description="Max results", ge=1, le=20)
    category: str = Field(default="all", description="Product category filter")

@tool(args_schema=DatabaseSearchInput)
def search_database_structured(query: str, limit: int, category: str) -> str:
    """Search the product database with optional category filter."""
    results = db.search(query, limit=limit, category=category)
    return json.dumps(results)
# Class-based for complex tools with state
class DatabaseTool(BaseTool):
    name: str = "database_search"
    description: str = "Search the company database"
    db_client: Any = Field(exclude=True)  # Not serialized

    def _run(self, query: str) -> str:
        return self.db_client.search(query)

    async def _arun(self, query: str) -> str:
        return await self.db_client.asearch(query)
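Whichever style you pick, the agent needs a name-to-tool lookup like the `tools_by_name` dict used in the LangGraph section. A stdlib sketch of building one from plain functions, pulling descriptions from docstrings the way the `@tool` decorator does (the `make_registry` helper is illustrative, not a LangChain API):

```python
def make_registry(*funcs):
    """Map tool name -> function and description, taken from __name__ and __doc__."""
    return {
        fn.__name__: {"fn": fn, "description": (fn.__doc__ or "").strip()}
        for fn in funcs
    }


def get_weather(city: str) -> str:
    """Return current weather for a city."""
    return f"sunny in {city}"


registry = make_registry(get_weather)
entry = registry["get_weather"]
print(entry["description"])
print(entry["fn"]("Tokyo"))  # sunny in Tokyo
```

The description matters as much as the code: it is the only thing the model sees when deciding which tool to call.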
Anti-Patterns
❌ Using LangChain for simple single-turn completions
A 10-line OpenAI/Anthropic call doesn't need LangChain. The abstraction overhead is not free — it adds latency, debugging complexity, and version lock-in.
❌ Using ConversationBufferMemory in production
It grows without bound until you hit context limits. Always use windowed or summary memory, or manage conversation history explicitly.
❌ Putting business logic inside LangGraph nodes
Nodes should be thin wrappers that transform state. Business logic belongs in separate, testable functions called by nodes.
❌ Ignoring LangSmith during development
You cannot optimize what you cannot observe. Enable tracing from day 1 — debugging chain failures without traces is guesswork.
❌ Not handling streaming errors
Network errors mid-stream leave partial responses. Always wrap streaming in try/except and signal error to the UI.
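A minimal shape for that wrapper (illustrative; `stream_fn` stands in for any token generator, and `on_error` for your UI's error channel):

```python
def safe_stream(stream_fn, on_error):
    """Yield chunks from a token stream; surface mid-stream failures instead of truncating silently."""
    received = []
    try:
        for chunk in stream_fn():
            received.append(chunk)
            yield chunk
    except Exception as exc:
        # Tell the UI the response is partial rather than passing it off as complete
        on_error(f"stream failed after {len(received)} chunks: {exc}")


def flaky_stream():
    yield "Hello, "
    yield "wor"
    raise ConnectionError("connection reset")


errors = []
chunks = list(safe_stream(flaky_stream, errors.append))
print("".join(chunks), errors)
```

The caller still receives every chunk that arrived, but also gets an explicit signal that the response is incomplete.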
Quick Reference
LCEL operator chaining:
A | B | C → Sequential (B receives A's output)
A.with_fallbacks([B]) → Try A, fallback to B on error
RunnableParallel(a=A, b=B) → Concurrent execution
Retriever choice:
Simple semantic search → vector_store.as_retriever()
Query phrasing sensitivity → MultiQueryRetriever
Keyword + semantic needed → EnsembleRetriever (BM25 + vector)
Long documents, need excerpts → ContextualCompressionRetriever
LangGraph patterns:
Linear pipeline → StateGraph with sequential edges
LLM + tools (ReAct) → Conditional edge on tool_calls present
Human-in-the-loop → interrupt_before=["human_review"] node
Multi-agent → Subgraph per agent, supervisor graph above