LangChain & LangGraph
LangChain is powerful when used for what it's actually good at: composable retrieval pipelines, standardized tool interfaces, and evaluation infrastructure. It's frequently misused as a wrapper around simple API calls, adding 10 layers of abstraction to something that should be 20 lines of code. This skill covers when to use it and how to use it well.
Core Mental Model
LangChain's value is in composition (chaining retrievers, parsers, and LLMs into testable pipelines) and standardization (swapping models or vector stores without rewriting logic). LangGraph's value is in state machines (multi-step agents where control flow depends on intermediate results). If your use case is a single API call with a system prompt, use the raw API. If you're building a pipeline with multiple steps, fallbacks, or branching logic, that's where these frameworks earn their keep.
When to Use Each
# RAW API: Single call, simple output — don't reach for LangChain
response = anthropic.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,  # required by the Messages API
    messages=[{"role": "user", "content": prompt}],
)
# LANGCHAIN: Multi-step pipeline with retrieval + generation
chain = retriever | prompt_template | llm | output_parser
# LANGGRAPH: Stateful agent that loops, branches, or uses tools
# (agent that decides whether to search, use a calculator, or ask clarification)
LCEL: LangChain Expression Language
LCEL uses the | pipe operator to compose Runnables. Every component implements the same interface: invoke(), stream(), batch(), and their async counterparts (ainvoke(), astream(), abatch()).
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
llm = ChatAnthropic(model="claude-sonnet-4-5", temperature=0)
# Basic chain
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Answer based on the context provided."),
    ("user", "Context: {context}\n\nQuestion: {question}"),
])
chain = prompt | llm | StrOutputParser()
# RunnableParallel: run multiple chains concurrently
parallel_chain = RunnableParallel(
    answer=chain,
    sources=retriever,  # Run retriever in parallel with generation
)
# RunnablePassthrough: pass input through unchanged
rag_chain = (
    RunnableParallel(
        context=retriever,
        question=RunnablePassthrough(),  # Pass question unchanged to next step
    )
    | prompt
    | llm
    | StrOutputParser()
)
# With fallbacks (graceful degradation)
primary_chain = prompt | ChatAnthropic(model="claude-opus-4-5") | StrOutputParser()
fallback_chain = prompt | ChatAnthropic(model="claude-haiku-4-5") | StrOutputParser()
chain_with_fallback = primary_chain.with_fallbacks([fallback_chain])
# Retry on failure
from anthropic import RateLimitError, APITimeoutError

chain_with_retry = chain.with_retry(
    retry_if_exception_type=(RateLimitError, APITimeoutError),
    wait_exponential_jitter=True,
    stop_after_attempt=3,
)
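Under the hood, `|` works because every Runnable implements `__or__`. A framework-free sketch of the idea (the `Step` class here is illustrative, not a real LangChain class):

```python
class Step:
    """Illustrative stand-in for a Runnable: wraps a function, supports | composition."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, x):
        return self.fn(x)

    def __or__(self, other):
        # A | B returns a new Step that feeds A's output into B
        return Step(lambda x: other.invoke(self.invoke(x)))


# Compose three "components" the way LCEL composes prompt | llm | parser
template = Step(lambda q: f"Q: {q}")
fake_llm = Step(lambda p: p.upper())
parser = Step(lambda s: s.strip())

chain = template | fake_llm | parser
print(chain.invoke("hello"))  # Q: HELLO
```

This is why any object with the Runnable interface, including a plain function coerced via RunnableLambda, can slot into a chain.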
Advanced Retriever Types
from langchain.retrievers import (
    MultiQueryRetriever,
    ContextualCompressionRetriever,
    EnsembleRetriever,
)
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_community.retrievers import BM25Retriever
# Base vector store retriever
vector_retriever = vectorstore.as_retriever(
    search_type="mmr",  # Max Marginal Relevance: diversity + relevance
    search_kwargs={"k": 10, "fetch_k": 30, "lambda_mult": 0.7},
)
# MultiQueryRetriever: generates multiple query variations to combat query phrasing issues
multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=vector_retriever,
    llm=ChatAnthropic(model="claude-haiku-4-5"),
    # Generates 3 different phrasings of the question, retrieves for each, deduplicates
)
# ContextualCompressionRetriever: extract only relevant passages from retrieved docs
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vector_retriever,
)
# EnsembleRetriever: hybrid BM25 (keyword) + vector (semantic) search
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = 5
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.4, 0.6],  # Weight toward semantic, but preserve keyword matching
)
# Usage pattern for RAG pipeline
def build_rag_chain(retriever, llm):
    prompt = ChatPromptTemplate.from_template(
        "Use the following context to answer the question.\n"
        "Context: {context}\n"
        "Question: {question}\n"
        "Answer:"
    )

    def format_docs(docs):
        return "\n\n---\n\n".join(
            f"Source: {doc.metadata.get('source', 'unknown')}\n{doc.page_content}"
            for doc in docs
        )

    return (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
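EnsembleRetriever merges the ranked lists from each retriever with weighted Reciprocal Rank Fusion. A stdlib sketch of that scoring, simplified: string IDs stand in for Documents, and c=60 is the conventional RRF constant:

```python
def weighted_rrf(result_lists, weights, c=60):
    """Weighted Reciprocal Rank Fusion: score(doc) = sum of weight / (c + rank)."""
    scores = {}
    for docs, weight in zip(result_lists, weights):
        for rank, doc in enumerate(docs, start=1):
            scores[doc] = scores.get(doc, 0.0) + weight / (c + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


bm25_hits = ["doc_a", "doc_b", "doc_c"]      # keyword ranking
vector_hits = ["doc_b", "doc_d", "doc_a"]    # semantic ranking
merged = weighted_rrf([bm25_hits, vector_hits], weights=[0.4, 0.6])
print(merged)
```

Documents appearing high in both lists win: here `doc_b` outranks `doc_a` because its top semantic position outweighs `doc_a`'s top keyword position under the 0.4/0.6 weighting.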
Memory Management
from langchain.memory import ConversationSummaryBufferMemory
from langchain_core.messages import HumanMessage, AIMessage

# AVOID: ConversationBufferMemory — grows unbounded, hits context limit
# from langchain.memory import ConversationBufferMemory  # BAD for long conversations

# USE: ConversationSummaryBufferMemory — keeps recent turns verbatim, summarizes old ones
# (plain ConversationSummaryMemory summarizes everything and has no token limit)
memory = ConversationSummaryBufferMemory(
    llm=ChatAnthropic(model="claude-haiku-4-5"),
    return_messages=True,
    max_token_limit=1000,  # Summarize when history exceeds this
)
# Better for production: manage conversation history yourself
class ConversationManager:
    def __init__(self, max_tokens=8000):
        self.messages = []
        self.max_tokens = max_tokens

    def add_message(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})
        self._trim_if_needed()

    def _trim_if_needed(self):
        """Sliding window: drop the oldest non-system messages (index 0 holds the system prompt)"""
        # Rough token estimate: 4 chars ≈ 1 token
        total_chars = sum(len(m["content"]) for m in self.messages)
        while total_chars > self.max_tokens * 4 and len(self.messages) > 2:
            removed = self.messages.pop(1)  # pop(0) would evict the system prompt
            total_chars -= len(removed["content"])
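The trimming logic is easy to unit-test in isolation. A self-contained check of the sliding-window behavior, re-implemented as a plain function that keeps the system prompt at index 0:

```python
def trim(messages, max_chars):
    """Sliding window: drop oldest non-system messages until under budget."""
    total = sum(len(m["content"]) for m in messages)
    while total > max_chars and len(messages) > 2:
        removed = messages.pop(1)  # index 0 is the system prompt
        total -= len(removed["content"])
    return messages


history = [
    {"role": "system", "content": "You are terse."},
    {"role": "user", "content": "a" * 50},
    {"role": "assistant", "content": "b" * 50},
    {"role": "user", "content": "c" * 10},
]
trimmed = trim(history, max_chars=80)
print([m["role"] for m in trimmed])  # ['system', 'assistant', 'user']
```

The oldest user turn is evicted; the system prompt and the most recent exchange survive.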
Structured Output with Pydantic
from langchain_core.output_parsers import PydanticOutputParser
from langchain_anthropic import ChatAnthropic
from pydantic import BaseModel, Field
from typing import List
class ExtractedEntity(BaseModel):
    name: str = Field(description="Entity name")
    type: str = Field(description="Entity type: person, organization, location, product")
    confidence: float = Field(description="Confidence score 0-1", ge=0, le=1)
    context: str = Field(description="Surrounding context where entity was found")

class ExtractionResult(BaseModel):
    entities: List[ExtractedEntity]
    summary: str = Field(description="One sentence summary of the text")
# Method 1: Pydantic parser (works with most models)
parser = PydanticOutputParser(pydantic_object=ExtractionResult)
prompt = ChatPromptTemplate.from_messages([
    ("system", "Extract entities from the text.\n{format_instructions}"),
    ("user", "{text}"),
]).partial(format_instructions=parser.get_format_instructions())
chain = prompt | llm | parser
# Method 2: with_structured_output (preferred — uses native tool calling)
structured_llm = ChatAnthropic(model="claude-sonnet-4-5").with_structured_output(ExtractionResult)
result: ExtractionResult = structured_llm.invoke("Anthropic was founded by Dario Amodei in San Francisco.")
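Both methods come down to the same contract: the model emits JSON matching a schema, and the parser validates it before your code sees it. A stripped-down stdlib sketch of that validation step, with a dataclass instead of Pydantic and a hard-coded string standing in for model output:

```python
import json
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    type: str
    confidence: float


def parse_entity(raw: str) -> Entity:
    """Validate model output: required keys present, confidence in [0, 1]."""
    data = json.loads(raw)  # raises if the model emitted malformed JSON
    entity = Entity(**{k: data[k] for k in ("name", "type", "confidence")})
    if not 0.0 <= entity.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {entity.confidence}")
    return entity


model_output = '{"name": "Anthropic", "type": "organization", "confidence": 0.97}'
entity = parse_entity(model_output)
print(entity.name, entity.confidence)
```

This is also why `with_structured_output` is preferred: native tool calling makes the model far less likely to produce output that fails this validation step in the first place.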
LangSmith Tracing
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "ls_..."
os.environ["LANGCHAIN_PROJECT"] = "my-rag-app"
# Now all chain invocations are automatically traced
# View at: https://smith.langchain.com
# Manual evaluation
from langsmith import Client
from langsmith.evaluation import evaluate
client = Client()
def evaluate_rag_answer(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    """Custom evaluator for RAG quality"""
    question = inputs["question"]
    generated = outputs["answer"]
    expected = reference_outputs["answer"]

    # LLM-as-judge
    judge_prompt = f"""
Question: {question}
Expected: {expected}
Generated: {generated}

Rate the generated answer on a scale of 1-5 for accuracy.
Respond with just the number.
"""
    score = int(llm.invoke(judge_prompt).content.strip())
    return {"accuracy": score / 5, "passed": score >= 3}
results = evaluate(
    lambda inputs: {"answer": chain.invoke(inputs["question"])},
    data="my-rag-dataset",  # Dataset name in LangSmith
    evaluators=[evaluate_rag_answer],
    experiment_prefix="rag-v2",
)
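`int(reply.strip())` is brittle: judge models sometimes wrap the number in prose ("I'd rate this a 3/5"). A defensive stdlib parser worth using instead (illustrative helper, not a LangSmith API):

```python
import re


def parse_judge_score(reply: str, low: int = 1, high: int = 5) -> int:
    """Extract the first integer from an LLM judge reply and clamp it to the scale."""
    match = re.search(r"-?\d+", reply)
    if match is None:
        raise ValueError(f"no score found in judge reply: {reply!r}")
    return max(low, min(high, int(match.group())))


print(parse_judge_score("4"))                     # clean reply
print(parse_judge_score("I'd rate this a 3/5."))  # number embedded in prose
print(parse_judge_score("Score: 9"))              # out of range, clamped to 5
```

Raising on a missing number (rather than defaulting) keeps silent scoring failures out of your evaluation results.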
LangGraph: Stateful Agent Flows
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from langchain_core.messages import HumanMessage, ToolMessage
from typing import TypedDict, Annotated
# 1. Define state schema
class AgentState(TypedDict):
    messages: Annotated[list, operator.add]  # Append-only via operator.add
    tool_calls: list
    final_answer: str | None
# 2. Define nodes (pure functions that transform state)
def call_llm(state: AgentState) -> dict:
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

def run_tools(state: AgentState) -> dict:
    last_message = state["messages"][-1]
    results = []
    for tool_call in last_message.tool_calls:
        tool = tools_by_name[tool_call["name"]]
        result = tool.invoke(tool_call["args"])
        results.append(ToolMessage(content=str(result), tool_call_id=tool_call["id"]))
    return {"messages": results}

def should_continue(state: AgentState) -> str:
    """Routing function — determines next node"""
    last_message = state["messages"][-1]
    if last_message.tool_calls:
        return "run_tools"
    return END
# 3. Build graph
workflow = StateGraph(AgentState)
workflow.add_node("call_llm", call_llm)
workflow.add_node("run_tools", run_tools)
workflow.set_entry_point("call_llm")
workflow.add_conditional_edges(
    "call_llm",
    should_continue,  # Router function
    {"run_tools": "run_tools", END: END},
)
workflow.add_edge("run_tools", "call_llm") # Always go back to LLM after tools
# 4. Compile with checkpointer (enables conversation history)
memory = MemorySaver()
agent = workflow.compile(checkpointer=memory)
# 5. Run with thread_id for persistent state
config = {"configurable": {"thread_id": "user-session-abc123"}}
result = agent.invoke(
    {"messages": [HumanMessage(content="What's the weather in Tokyo?")]},
    config=config,
)
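Stripped of the framework, the graph above is just a loop: call the LLM, run any requested tools, feed results back, stop when no tools are requested. A self-contained sketch with stub objects (the stub LLM, tool registry, and message dicts are all illustrative):

```python
def run_agent(llm, tools, messages, max_steps=10):
    """ReAct-style loop: LLM -> tools -> LLM until no tool calls remain."""
    for _ in range(max_steps):
        response = llm(messages)  # stub contract: {"content": str, "tool_calls": list}
        messages.append(response)
        if not response["tool_calls"]:
            return response["content"]  # no tools requested: final answer
        for call in response["tool_calls"]:
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "content": str(result)})
    raise RuntimeError("agent exceeded max_steps")


# Stub LLM: requests the weather tool once, then answers from the tool result
def stub_llm(messages):
    if not any(m.get("role") == "tool" for m in messages):
        return {"content": "", "tool_calls": [{"name": "get_weather", "args": {"city": "Tokyo"}}]}
    return {"content": "It is sunny in Tokyo.", "tool_calls": []}


tools = {"get_weather": lambda city: f"sunny in {city}"}
answer = run_agent(stub_llm, tools, [{"role": "user", "content": "Weather in Tokyo?"}])
print(answer)  # It is sunny in Tokyo.
```

What LangGraph adds on top of this loop is the part worth paying for: checkpointed state per thread_id, interrupts for human review, and streaming of intermediate events.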
Streaming
# Stream with LCEL
async def stream_response(question: str):
    async for chunk in chain.astream({"question": question}):
        yield chunk
# Stream with LangGraph (events)
async for event in agent.astream_events(
    {"messages": [HumanMessage(content="Research AI trends")]},
    version="v2",  # v2 event schema; v1 is legacy
):
    kind = event["event"]
    if kind == "on_chat_model_stream":
        content = event["data"]["chunk"].content
        if content:
            print(content, end="", flush=True)
    elif kind == "on_tool_start":
        print(f"\n[Tool: {event['name']}]")
Custom Tools
from langchain_core.tools import BaseTool, tool
from pydantic import BaseModel, Field
from typing import Any
import json
# Simple decorator style
@tool
def search_database(query: str) -> str:
    """Search the product database. Use for product lookups, inventory checks."""
    results = db.search(query)
    return json.dumps(results[:5])
# Structured input with schema
class DatabaseSearchInput(BaseModel):
    query: str = Field(description="Search query")
    limit: int = Field(default=5, description="Max results", ge=1, le=20)
    category: str = Field(default="all", description="Product category filter")

@tool(args_schema=DatabaseSearchInput)
def search_database_structured(query: str, limit: int, category: str) -> str:
    """Search the product database with optional category filter."""
    results = db.search(query, limit=limit, category=category)
    return json.dumps(results)
# Class-based for complex tools with state
class DatabaseTool(BaseTool):
    name: str = "database_search"
    description: str = "Search the company database"
    db_client: Any = Field(exclude=True)  # Not serialized

    def _run(self, query: str) -> str:
        return self.db_client.search(query)

    async def _arun(self, query: str) -> str:
        return await self.db_client.asearch(query)
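Whichever style you pick, the agent needs a name-to-tool lookup like the `tools_by_name` dict used in the LangGraph section. A stdlib sketch of building one from plain functions, pulling descriptions from docstrings the way the `@tool` decorator does (the `make_registry` helper is illustrative, not a LangChain API):

```python
def make_registry(*funcs):
    """Map tool name -> function and description, taken from __name__ and __doc__."""
    return {
        fn.__name__: {"fn": fn, "description": (fn.__doc__ or "").strip()}
        for fn in funcs
    }


def get_weather(city: str) -> str:
    """Return current weather for a city."""
    return f"sunny in {city}"


registry = make_registry(get_weather)
entry = registry["get_weather"]
print(entry["description"])
print(entry["fn"]("Tokyo"))  # sunny in Tokyo
```

The description matters as much as the code: it is the only thing the model sees when deciding which tool to call.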
Anti-Patterns
❌ Using LangChain for simple single-turn completions
A 10-line OpenAI/Anthropic call doesn't need LangChain. The abstraction overhead is not free — it adds latency, debugging complexity, and version lock-in.
❌ Using ConversationBufferMemory in production
It grows without bound until you hit context limits. Always use windowed or summary memory, or manage conversation history explicitly.
❌ Putting business logic inside LangGraph nodes
Nodes should be thin wrappers that transform state. Business logic belongs in separate, testable functions called by nodes.
❌ Ignoring LangSmith during development
You cannot optimize what you cannot observe. Enable tracing from day 1 — debugging chain failures without traces is guesswork.
❌ Not handling streaming errors
Network errors mid-stream leave partial responses. Always wrap streaming in try/except and signal error to the UI.
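A minimal shape for that wrapper (illustrative; `stream_fn` stands in for any token generator, and `on_error` for your UI's error channel):

```python
def safe_stream(stream_fn, on_error):
    """Yield chunks from a token stream; surface mid-stream failures instead of truncating silently."""
    received = []
    try:
        for chunk in stream_fn():
            received.append(chunk)
            yield chunk
    except Exception as exc:
        # Tell the UI the response is partial rather than passing it off as complete
        on_error(f"stream failed after {len(received)} chunks: {exc}")


def flaky_stream():
    yield "Hello, "
    yield "wor"
    raise ConnectionError("connection reset")


errors = []
chunks = list(safe_stream(flaky_stream, errors.append))
print("".join(chunks), errors)
```

The caller still receives every chunk that arrived, but also gets an explicit signal that the response is incomplete.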
Quick Reference
LCEL operator chaining:
A | B | C → Sequential (B receives A's output)
A.with_fallbacks([B]) → Try A, fallback to B on error
RunnableParallel(a=A, b=B) → Concurrent execution
Retriever choice:
Simple semantic search → vector_store.as_retriever()
Query phrasing sensitivity → MultiQueryRetriever
Keyword + semantic needed → EnsembleRetriever (BM25 + vector)
Long documents, need excerpts → ContextualCompressionRetriever
LangGraph patterns:
Linear pipeline → StateGraph with sequential edges
LLM + tools (ReAct) → Conditional edge on tool_calls present
Human-in-the-loop → interrupt_before=["human_review"] node
Multi-agent → Subgraph per agent, supervisor graph above