
product-analytics

Data-driven product decision making with expert-level analytics methodology. Covers north star metrics, funnel and cohort analysis, A/B testing, event taxonomy design, attribution modeling, session recording patterns, and the balance between data-informed and data-driven decision making.


Installation

npx clawhub@latest install product-analytics

View the full skill documentation and source below.

Documentation

Product Analytics

Analytics without a measurement strategy is just expensive data hoarding. Most product teams track dozens of metrics but can't tell you what their north star is or whether it moved last quarter. This skill covers how to instrument products correctly, analyze data rigorously, and — critically — know when to override the data with judgment.

Core Mental Model

Metrics are proxies for value, not value itself. Optimizing a metric blindly causes Goodhart's Law failures: "When a measure becomes a target, it ceases to be a good measure." Daily active users can be gamed with dark patterns. Conversion rates can be inflated with low-quality sign-ups. The key is building a metric tree where improving each leaf metric reflects genuine value creation for users.

Layer your metrics:

  • North Star Metric — single metric most correlated with long-term value delivery

  • Leading Indicators — metrics that predict north star movement (can act on now)

  • Guardrail Metrics — metrics you cannot degrade while chasing the north star

  • Diagnostic Metrics — help you understand why the north star moved

    North Star Metric

    The north star is the ONE metric that best captures the value your product delivers to customers at scale.

    Characteristics of a Good North Star

    • Reflects customer value received, not activity
    • Predictive of long-term revenue
    • Lagging enough to matter, leading enough to act on
    • Understandable by the whole company
    • One number (not a composite)

    Examples:
    Company          North Star
    ---------------------------------
    Airbnb           Nights booked
    Spotify          Time spent listening
    Slack            Messages sent within a workspace
    Facebook         Daily Active Users (their original, now controversial)
    Stripe           Total payment volume
    Duolingo         Daily active learners
    HubSpot          Weekly active teams using ≥3 features
    
    Anti-patterns:
    Revenue          (lagging; can obscure user value loss before churn)
    Pageviews        (activity, not value)
    Sign-ups         (output, not engagement)
    App installs     (output)

    North Star Metric Framework

    Step 1: List the value moments in your product
            "User gets value when they: send first message / complete a project / 
             receive first payment / reach their goal"
    
    Step 2: Find the metric that best proxies that moment at scale
            "Weekly projects completed" captures the value moment
    
    Step 3: Stress-test it
            - Can it be gamed without delivering real value? 
            - Does it degrade if we compromise quality?
            - Does it grow when our best customers engage more?
    
    Step 4: Build the metric tree under it
            North Star: "Weekly projects completed"
            └── Activation: % who complete first project in 7 days
            └── Engagement: Projects/user/week
            └── Retention: % users active week-over-week
            └── Expansion: Teams inviting 2+ members
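The metric tree from Step 4 can live as plain data that dashboards and alerting code walk. A minimal Python sketch (all metric names are illustrative, mirroring the example tree above):

```python
# A metric tree as plain data, so dashboards and alerts can walk it.
# All metric names are illustrative, mirroring the example tree above.
metric_tree = {
    "north_star": "weekly_projects_completed",
    "inputs": {
        "activation": "pct_first_project_within_7d",
        "engagement": "projects_per_user_per_week",
        "retention": "pct_active_week_over_week",
        "expansion": "teams_inviting_2plus_members",
    },
}

def north_star_estimate(weekly_active_users, projects_per_user_per_week):
    """Back-of-envelope decomposition: north star ~= WAU x projects/user/week."""
    return weekly_active_users * projects_per_user_per_week

print(north_star_estimate(10_000, 1.4))  # 14000.0
```

Keeping the tree in one place also makes it obvious when a dashboard metric has no parent — a sign it may be a vanity metric.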

    Funnel Analysis

    Funnels show conversion rates between sequential steps. Use them to find where users drop off and prioritize optimization.

    Funnel Construction

    -- Example: Signup funnel (Mixpanel/Amplitude SQL equivalent)
    SELECT
      step,
      COUNT(DISTINCT user_id) as users,
      COUNT(DISTINCT user_id) * 100.0 / MAX(COUNT(DISTINCT user_id)) OVER() as pct_of_top
    FROM (
      SELECT user_id, 'visited_landing' as step, created_at FROM page_views WHERE path = '/'
      UNION ALL
      SELECT user_id, 'clicked_signup', created_at FROM events WHERE event = 'signup_clicked'
      UNION ALL
      SELECT user_id, 'completed_signup', created_at FROM events WHERE event = 'signup_completed'
      UNION ALL
      SELECT user_id, 'completed_onboarding', created_at FROM events WHERE event = 'onboarding_finished'
    ) AS funnel_steps
    GROUP BY step
    ORDER BY users DESC;
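One caveat: the UNION-based query counts users per step without requiring the steps to happen in order, so a user who finished onboarding before ever hitting the landing page still counts at every step. A strict ordered funnel is easier to express in application code; a small sketch over hypothetical (user_id, event, timestamp) rows:

```python
from collections import defaultdict

# Hypothetical event log: (user_id, event, timestamp)
events = [
    (1, "visited_landing", 10), (1, "signup_clicked", 20),
    (1, "signup_completed", 30), (1, "onboarding_finished", 40),
    (2, "visited_landing", 10), (2, "signup_clicked", 15),
    (3, "visited_landing", 12),
]

STEPS = ["visited_landing", "signup_clicked", "signup_completed", "onboarding_finished"]

def ordered_funnel(events, steps):
    """Count users who completed each step in order: a step only counts
    if it happened at or after the user's previous funnel step."""
    by_user = defaultdict(list)
    for user_id, event, ts in sorted(events, key=lambda e: e[2]):
        by_user[user_id].append((event, ts))

    counts = [0] * len(steps)
    for user_events in by_user.values():
        step_idx, last_ts = 0, float("-inf")
        for event, ts in user_events:
            if step_idx < len(steps) and event == steps[step_idx] and ts >= last_ts:
                counts[step_idx] += 1
                step_idx, last_ts = step_idx + 1, ts
    return counts

print(ordered_funnel(events, STEPS))  # [3, 2, 1, 1]
```

Amplitude and Mixpanel funnel reports enforce this ordering for you; the unordered SQL variant is fine only when step order is guaranteed by the product flow itself.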

    Interpreting Drop-Off

    Funnel:
    Visited landing:        100,000  (100%)
    Clicked sign-up:         22,000  (22%) ← 78% drop here
    Completed sign-up:       15,000  (68% of prev, 15% total)
    Completed onboarding:     6,000  (40% of prev, 6% total)
    Activated (week 1):       3,200  (53% of prev, 3.2% total)
    
    Analysis:
    - Biggest absolute drop: landing → click (78K users lost)
      → A/B test headline, CTA, value prop
    - Biggest in-product % drop: sign-up → onboarding (60% lost)
      → Investigate: too many steps? Email verification blocking?
    - Highest leverage: improving landing→click by 5pp adds ~5K clicks
      (~3.4K sign-ups at the current 68% completion rate)

    Statistical Significance for Funnel Changes

    Before declaring a funnel improvement: did the change cause it?
    from scipy import stats
    
    # Chi-squared test for conversion rate changes
    control_conversions = 150
    control_visitors = 1000
    test_conversions = 175
    test_visitors = 1000
    
    contingency = [
        [control_conversions, control_visitors - control_conversions],
        [test_conversions, test_visitors - test_conversions],
    ]
    chi2, p_value, dof, expected = stats.chi2_contingency(contingency)
    
    print(f"p-value: {p_value:.4f}")  # < 0.05 = statistically significant

    Cohort Analysis

    Cohorts group users by when they first performed an action. Essential for understanding retention and the impact of product changes on different user groups.

    Retention Cohort Table

    Sign-up cohort   Day 1  Day 7  Day 14  Day 30  Day 60  Day 90
    Jan 2025         45%    28%    22%     18%     15%     14%  ← flattening
    Feb 2025         48%    30%    24%     19%     16%     -
    Mar 2025         52%    35%    28%     22%     -       -
    Apr 2025         55%    38%    30%     -       -       -
    
    Reading: Jan's 14% D90 retention means 14% of Jan signups 
             were still active 90 days later.
    
    Sign of PMF: Retention flattens (stops declining) — means 
                 you have a core audience that keeps coming back.
    Sign of trouble: Retention approaches 0% — bleeding all users.
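Computing a cell of that table is mechanical. A minimal pure-Python sketch over hypothetical signup/activity data — note "retained at day N" is simplified here to "active on or after day N"; real tools usually use a window around day N:

```python
from datetime import date

# Hypothetical data: user_id -> (signup_date, [dates the user was active])
users = {
    "u1": (date(2025, 1, 1), [date(2025, 1, 2), date(2025, 1, 31)]),
    "u2": (date(2025, 1, 1), [date(2025, 1, 2)]),
    "u3": (date(2025, 2, 1), [date(2025, 2, 8)]),
}

def retention(users, day_n, cohort_month):
    """% of a signup cohort still active on or after day_n post-signup."""
    cohort = [(signup, acts) for signup, acts in users.values()
              if (signup.year, signup.month) == cohort_month]
    if not cohort:
        return 0.0
    retained = sum(
        1 for signup, acts in cohort
        if any((active - signup).days >= day_n for active in acts)
    )
    return 100.0 * retained / len(cohort)

print(retention(users, 30, (2025, 1)))  # 50.0 — only u1 was active at D30+
```

The key design choice is how you define "active" (any event? a value event?) — a loose definition flatters every cohort.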

    Cohort Analysis for Feature Impact

    Scenario: Did the new onboarding (shipped March 15) improve retention?
    
    Pre-onboarding cohorts (Jan-Mar): D30 retention = 18% avg
    Post-onboarding cohorts (Apr-Jun): D30 retention = 24% avg
    
    Is this the feature? Check:
    1. Did other things change? (marketing channel, seasonality)
    2. Are cohort sizes similar? (mix shift can distort)
    3. Is the difference statistically significant? (run t-test on user-level data)
    4. Is the new cohort large enough? (wait 30 days for D30 data)
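For check 3, retention is a proportion, so a two-proportion z-test on the cohort counts works without user-level modeling. A sketch using only the standard library (cohort sizes and retained counts are illustrative):

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Two-proportion z-test: is retention x2/n2 different from x1/n1?"""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-tailed p-value from the normal CDF (via the error function)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Pre-onboarding cohort: 18% of 2,000 retained at D30
# Post-onboarding cohort: 24% of 2,000 retained at D30
z, p = two_proportion_z(360, 2000, 480, 2000)
print(f"z = {z:.2f}, p = {p:.6f}")
```

A significant p-value here still doesn't settle checks 1 and 2 — seasonality or channel mix can produce a perfectly "significant" difference the feature didn't cause.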

    A/B Testing Fundamentals

    Before You Test — Sample Size

    Calculate required sample size BEFORE running the experiment. Running until it "looks significant" inflates false positive rates.

    import math
    
    def required_sample_size(baseline_rate, min_detectable_effect, power=0.8, significance=0.05):
        """
        baseline_rate:        current conversion rate (e.g., 0.05 for 5%)
        min_detectable_effect: smallest relative change worth detecting (e.g., 0.10 for 10%)
        power:                probability of detecting a real effect (0.8 = 80%)
        significance:         acceptable false positive rate (0.05 = 5%)
        """
        p1 = baseline_rate
        p2 = baseline_rate * (1 + min_detectable_effect)
        
        z_alpha = 1.96  # for 95% confidence (two-tailed)
        z_beta  = 0.842 # for 80% power
        
        n = (z_alpha * math.sqrt(2 * p1 * (1-p1)) + z_beta * math.sqrt(p1*(1-p1) + p2*(1-p2))) ** 2 / (p2 - p1) ** 2
        return math.ceil(n)
    
    # Example: 5% baseline, want to detect 10% lift
    n = required_sample_size(0.05, 0.10)
    print(f"Required per variant: {n:,}")  # ~30,000 per variant
    # Total duration = (n × 2) / daily_traffic

    Interpreting Results

    p-value: probability of seeing this result if null hypothesis is true
             p < 0.05 → reject null (likely real effect)
             p > 0.05 → don't reject null (effect may not be real)
    
    Confidence interval: [lower, upper] — if it doesn't cross 0, the effect is unlikely to be noise
    
    Practical significance: Is the lift large enough to matter?
      Statistically significant 0.1% lift on checkout: not worth shipping complexity
      Statistically significant 5% lift on checkout: ship immediately
    
    Novelty effect: New features often show inflated early results.
      Run tests for at least 2 full business cycles (2 weeks minimum).
      Segment: "new users during test" vs "existing users during test"
      — existing users show the novelty effect; new users show steady state.
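For the confidence-interval check, a normal-approximation CI on the difference of two conversion rates covers most product tests. A sketch, reusing the 150/1000 vs 175/1000 numbers from the funnel significance example:

```python
import math

def diff_ci(conv_control, n_control, conv_test, n_test, z=1.96):
    """95% CI for the difference in conversion rates (test - control)."""
    p_c, p_t = conv_control / n_control, conv_test / n_test
    se = math.sqrt(p_c * (1 - p_c) / n_control + p_t * (1 - p_t) / n_test)
    diff = p_t - p_c
    return diff - z * se, diff + z * se

lo, hi = diff_ci(150, 1000, 175, 1000)
print(f"lift CI: [{lo:+.3f}, {hi:+.3f}]")
```

For those numbers the interval crosses 0: a 2.5pp observed lift on 1,000 users per arm is not yet distinguishable from noise, which is exactly why sample size comes first.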

    A/B Test Decision Framework

    Result           Action
    -----------      --------
    Significant positive  →  Ship (verify guardrails didn't degrade)
    Significant negative  →  Drop + analyze why
    Inconclusive          →  Assess: extend runtime? Increase sample? Simplify hypothesis?
    "Directionally positive" → Be skeptical. Either extend or run a bigger bet.
    
    Multi-armed bandit: Use for content/messaging experiments where you 
                        want to exploit winning variants quickly.
                        Use classic A/B for feature experiments where you need 
                        clean before/after attribution.
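A multi-armed bandit in its simplest form is epsilon-greedy: explore a random variant a small fraction of the time, otherwise exploit the best performer so far. A self-contained simulation sketch (the conversion rates are made up):

```python
import random

def epsilon_greedy(true_rates, pulls=10_000, eps=0.1, seed=42):
    """Minimal epsilon-greedy bandit: with probability eps show a random
    variant (explore); otherwise show the variant with the best observed
    conversion rate so far (exploit)."""
    rng = random.Random(seed)
    shown = [0] * len(true_rates)
    converted = [0] * len(true_rates)
    for _ in range(pulls):
        if rng.random() < eps:
            arm = rng.randrange(len(true_rates))
        else:
            arm = max(range(len(true_rates)),
                      key=lambda i: converted[i] / shown[i] if shown[i] else 0.0)
        shown[arm] += 1
        converted[arm] += rng.random() < true_rates[arm]
    return shown

# Variant B truly converts 3x better; traffic shifts toward it over time.
print(epsilon_greedy([0.05, 0.15]))
```

This is why bandits suit messaging experiments: they minimize regret while running, at the cost of uneven sample sizes that make clean post-hoc significance testing harder.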

    Event Taxonomy Design

    Well-designed events are the foundation of all analytics. Bad taxonomy = unmaintainable, uninterpretable data.

    Noun-Verb Convention

    Format: {object}_{action}  (snake_case, past tense)
    
    ✅ Good events:
    user_signed_up
    project_created
    payment_completed
    team_member_invited
    feature_flag_enabled
    file_exported
    search_performed
    onboarding_step_completed
    
    ❌ Bad events:
    button_clicked          (what button? what did it do?)
    page_viewed             (use consistent noun: dashboard_viewed)
    action_performed        (meaningless)
    UserSignedUp            (wrong casing convention)
    sign up complete        (spaces, ambiguous)
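Conventions only survive if they're enforced. A minimal lint sketch you could run in CI against new event names — it checks format only (snake_case, two-plus words, something that looks like a past-tense verb) and cannot catch semantic problems like button_clicked:

```python
import re

# Format-only lint for the {object}_{action} convention.
SNAKE = re.compile(r"^[a-z]+(_[a-z0-9]+)+$")

def lint_event_name(name):
    """Return a list of format violations (empty list = passes)."""
    problems = []
    if not SNAKE.match(name):
        problems.append("not snake_case with >= 2 words")
    if not any(part.endswith("ed") for part in name.split("_")):
        problems.append("no past-tense verb (heuristic: a part ending in 'ed')")
    return problems

print(lint_event_name("user_signed_up"))  # []
print(lint_event_name("UserSignedUp"))    # two violations
```

Pair the lint with a reviewed allowlist of nouns and verbs; the regex keeps names parseable, the allowlist keeps them meaningful.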

    Event Properties (the real value is in properties)

    // user_signed_up event with rich properties
    analytics.track('user_signed_up', {
      // Identity
      user_id: 'usr_abc123',
      email_domain: 'acme.com',    // not full email — privacy
      
      // Acquisition
      signup_source: 'organic',    // 'organic' | 'paid' | 'referral' | 'direct'
      utm_campaign: 'spring-sale',
      utm_medium: 'email',
      referrer_user_id: 'usr_xyz', // if invited
      
      // Context
      signup_method: 'google_oauth', // 'email_password' | 'google_oauth' | 'github'
      plan_selected: 'pro',
      
      // Experiment
      experiment_variant: 'onboarding_v2', // which A/B variant they saw
      
      // Timing
      time_from_landing_to_signup_seconds: 142,
    })

    Event Taxonomy Schema (Amplitude/Mixpanel)

    Object Type → Action → Properties
    ------------------------------------
    User        → signed_up, logged_in, profile_updated, deleted_account
    Project     → created, renamed, archived, deleted, shared, exported
    Content     → created, published, edited, deleted, viewed, liked, shared
    Payment     → initiated, completed, failed, refunded, subscription_created
    Team        → created, member_invited, member_removed, role_changed
    Feature     → enabled, disabled, usage (with feature_name property)
    Error       → api_error, validation_error (with error_code, error_message)

    Amplitude/Mixpanel Implementation Patterns

    // Mixpanel: identify user with traits
    mixpanel.identify(user.id)
    mixpanel.people.set({
      '$email':    user.email,
      '$name':     user.name,
      'plan':      user.plan,
      'company':   user.company,
      'created_at': user.createdAt,
    })
    
    // Amplitude: set user properties
    amplitude.setUserId(user.id)
    const identify = new amplitude.Identify()
      .set('plan', user.plan)
      .set('company_size', user.companySize)
      .setOnce('signup_date', user.createdAt)  // setOnce prevents overwrites
    amplitude.identify(identify)
    
    // Group analytics (org-level metrics)
    mixpanel.set_group('company', user.companyId)
    mixpanel.get_group('company', user.companyId).set({
      'plan': org.plan,
      'seat_count': org.seats,
    })

    Attribution Modeling

    How do you credit conversions across multiple touchpoints?

    Customer journey:
    Day 1: Saw Twitter ad          → Ad spend: $0.50
    Day 3: Read blog post (organic)
    Day 7: Clicked Google Search ad → Ad spend: $2.00
    Day 8: Opened welcome email
    Day 10: Converted (paid $99)
    
    Attribution models:
    First-touch: Twitter ad gets 100% credit ($99)
    Last-touch:  Google Search ad gets 100% credit ($99) ← classic last-click default (GA4 now defaults to data-driven)
    Linear:      Each touchpoint gets $24.75 (4 touchpoints)
    Time-decay:  More credit to touchpoints closer to conversion
      Google Search: ~40%, Email: ~30%, Blog: ~20%, Twitter: ~10%
    Data-driven:  ML model based on historical patterns (requires lots of data)
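The rule-based models are just different weighting schemes over the same touchpoint list. A sketch crediting the $99 journey above (touchpoint names are shorthand; time-decay uses one common choice, a 7-day half-life, so the exact shares differ from the illustrative percentages above):

```python
# Rule-based attribution as weighting schemes over the journey above.
# Conversion was on day 10, so each touchpoint has a days-before offset.
touchpoints = ["twitter_ad", "blog_post", "search_ad", "welcome_email"]
days_before_conversion = [9, 7, 3, 2]
revenue = 99.0

first_touch = {t: revenue * (i == 0) for i, t in enumerate(touchpoints)}
last_touch = {t: revenue * (i == len(touchpoints) - 1)
              for i, t in enumerate(touchpoints)}
linear = {t: revenue / len(touchpoints) for t in touchpoints}

# Time decay: a touchpoint's weight halves for every 7 days before conversion.
half_life = 7.0
weights = [0.5 ** (d / half_life) for d in days_before_conversion]
time_decay = {t: revenue * w / sum(weights)
              for t, w in zip(touchpoints, weights)}

print(linear["twitter_ad"])               # 24.75
print(round(time_decay["welcome_email"], 2))
```

With a 7-day half-life the welcome email (closest to conversion) earns the largest share; shorter half-lives skew credit toward the last touch even harder, which is the knob to sanity-check before trusting any time-decay report.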

    What to use when:

    • First-touch: Understanding top-of-funnel channel value

    • Last-touch: SEM/paid campaigns where click → convert is the model

    • Linear/Time-decay: Content marketing attribution

    • Data-driven: Large companies with enough events for ML models


    Session Recording Analysis

    Tools: Hotjar, FullStory, Microsoft Clarity (free)

    Patterns to Look For

    Rage clicks:     User clicks same area repeatedly → something isn't working
    Dead clicks:     Clicking non-interactive elements → perceived affordance mismatch
    Scroll depth:    Where do users stop reading? → CTA placement optimization
    U-turns:         Back-and-forth between two pages → navigation confusion
    Form abandonment: Which field causes drop-off? → form friction analysis
    
    Hotjar heatmap reading:
      Dark red spots → most attention/interaction
      Cold blue spots → ignored content
      
      Common findings:
      - Hero image gets more clicks than CTA
      - Users try to click non-link text
      - Footer links have surprising engagement

    Data-Informed vs Data-Driven

    The most important distinction in analytics philosophy.

    Data-driven: The data makes the decision. The metrics determine the action. No override.

    Data-informed: Data is a critical input, but judgment, strategy, and ethics also inform the decision.

    When to trust the data over judgment:
    - A/B test with sufficient power and clear result
    - Funnel drop-off is obvious and unambiguous
    - Retention cohort shows clear inflection from product change
    
    When to override data with judgment:
    - Metric being optimized conflicts with long-term user trust
      ("notification click rates are up" — but we're burning goodwill)
    - Small n — your sample is too small to be even directionally reliable
    - Survivorship bias — data only reflects users who stayed
    - The strategy requires investment before metrics improve
      (new market expansion looks "bad" in data before it gets good)
    - Ethical concerns about a tactic that "works" in the data

    Anti-Patterns

    Vanity metrics — pageviews, downloads, Twitter followers. They move easily, don't predict revenue, and create a false sense of progress.

    Peeking at A/B tests — checking results before hitting your required sample size inflates false positives dramatically (up to 3x more false positives).

    One-size-fits-all metrics — different user segments should have segment-specific KPIs. Power users and casual users have different value patterns.

    Event names that change — signup_complete becomes registration_finished in v2. Now you can't compare cohorts. Lock naming conventions and enforce them.

    Tracking everything — 500 events with no ownership creates a graveyard. Each event should answer a specific question. Delete events not used in 6 months.

    Treating correlation as causation — "Sign-ups increased the week we shipped X" is not evidence X caused it. You need controlled experiments.
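The peeking anti-pattern is easy to demonstrate: simulate A/A tests (no real difference) and compare how often "stop at the first significant peek" versus "test once at the planned sample size" wrongly declares a winner. A sketch using only the standard library:

```python
import math
import random

def z_significant(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Two-proportion z-test at the 5% significance level."""
    p = (conv_a + conv_b) / (n_a + n_b)
    if p == 0 or p == 1:
        return False
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return abs(conv_b / n_b - conv_a / n_a) / se > z_crit

def false_positive_rates(rate=0.05, n=2000, peeks=10, trials=400, seed=0):
    """Simulate A/A tests (no true difference). Compare the false positive
    rate of peeking at 10 checkpoints vs one fixed-horizon test."""
    rng = random.Random(seed)
    peeking_fp = fixed_fp = 0
    for _ in range(trials):
        conv_a = conv_b = 0
        peeked_sig = False
        for i in range(1, n + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if i % (n // peeks) == 0 and z_significant(conv_a, i, conv_b, i):
                peeked_sig = True  # would have stopped and "shipped" here
        peeking_fp += peeked_sig
        fixed_fp += z_significant(conv_a, n, conv_b, n)
    return peeking_fp / trials, fixed_fp / trials

peeking, fixed = false_positive_rates()
print(f"peeking: {peeking:.1%}  fixed-horizon: {fixed:.1%}")
```

Fixed-horizon testing holds the false positive rate near the nominal 5%; peeking at every checkpoint multiplies it. If you must peek, use sequential methods designed for it rather than re-running the fixed-horizon test.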

    Quick Reference

    A/B Test Checklist

    • Hypothesis written (If X, then Y, because Z)
    • Primary metric defined before launch
    • Guardrail metrics defined (won't degrade)
    • Sample size calculated (not based on time, based on events)
    • Test runs minimum 2 full business cycles
    • Segment analysis planned (new vs existing users)
    • Ship/no-ship threshold defined upfront

    North Star Metric Check

    • Reflects value delivered to customers (not activity)
    • Predictive of long-term revenue
    • Can be decomposed into leading indicators
    • Can't be easily gamed without delivering real value
    • Whole team understands what it means and how to move it

    Event Taxonomy Rules

    Format:      {noun}_{verb}  (past tense, snake_case)
    Properties:  Always include user_id, timestamp (auto), experiment_variant
    Avoid:       PII (email, phone), button/UI names (use semantic names)
    Test in:     Dev environment before production
    Review:      Analytics review every 6 months — delete unused events