
voice-note-to-midi

Convert voice notes, humming, and melodic audio recordings to quantized MIDI files using ML-based pitch detection.


Installation

npx clawhub@latest install voice-note-to-midi

View the full skill documentation and source below.

Documentation

🎵 Voice Note to MIDI

Transform your voice memos, humming, and melodic recordings into clean, quantized MIDI files ready for your DAW.

What It Does

This skill provides a complete audio-to-MIDI conversion pipeline that:

  • Stem Separation - Uses HPSS (Harmonic-Percussive Source Separation) to isolate melodic content from drums, noise, and background sounds

  • ML-Powered Pitch Detection - Leverages Spotify's Basic Pitch model for accurate fundamental frequency extraction

  • Key Detection - Automatically detects the musical key of your recording using Krumhansl-Kessler key profiles

  • Intelligent Quantization - Snaps notes to a configurable timing grid with optional key-aware pitch correction

  • Post-Processing - Applies octave pruning, overlap-based harmonic removal, and legato note merging for clean output

Pipeline Architecture

    Audio Input (WAV/M4A/MP3)
        ↓
    ┌─────────────────────────────────────┐
    │ Step 1: Stem Separation (HPSS)      │
    │ - Isolate harmonic content          │
    │ - Remove drums/percussion           │
    │ - Noise gating                      │
    └─────────────────────────────────────┘
        ↓
    ┌─────────────────────────────────────┐
    │ Step 2: Pitch Detection             │
    │ - Basic Pitch ML model (Spotify)    │
    │ - Polyphonic note detection         │
    │ - Onset/offset estimation           │
    └─────────────────────────────────────┘
        ↓
    ┌─────────────────────────────────────┐
    │ Step 3: Analysis                    │
    │ - Pitch class distribution          │
    │ - Key detection                     │
    │ - Dominant note identification      │
    └─────────────────────────────────────┘
        ↓
    ┌─────────────────────────────────────┐
    │ Step 4: Quantization & Cleanup      │
    │ - Timing grid snap                  │
    │ - Key-aware pitch correction        │
    │ - Octave pruning (harmonic removal) │
    │ - Overlap-based pruning             │
    │ - Note merging (legato)             │
    │ - Velocity normalization            │
    └─────────────────────────────────────┘
        ↓
    MIDI Output (Standard MIDI File)

    Setup

    Prerequisites

    • Python 3.11+ (Python 3.14+ recommended)
    • FFmpeg (for audio format support)
    • pip

    Installation

    Quick Install (Recommended):

    cd /path/to/voice-note-to-midi
    ./setup.sh

    This automated script will:

    • Check Python 3.11+ is installed

    • Create the ~/melody-pipeline directory

    • Set up the virtual environment

    • Install all dependencies (basic-pitch, librosa, music21, etc.)

    • Download and configure the hum2midi script

    • Add melody-pipeline to your PATH


    Manual Install:

    If you prefer manual setup:

    mkdir -p ~/melody-pipeline
    cd ~/melody-pipeline
    python3 -m venv venv-bp
    source venv-bp/bin/activate
    pip install basic-pitch librosa soundfile mido music21
    chmod +x ~/melody-pipeline/hum2midi

    Add to your PATH (optional):

    echo 'export PATH="$HOME/melody-pipeline:$PATH"' >> ~/.bashrc
    source ~/.bashrc

    Verify Installation

    cd ~/melody-pipeline
    ./hum2midi --help

    Usage

    Basic Usage

    Convert a voice memo to MIDI:

    ./hum2midi my_humming.wav

    This creates my_humming.mid with 16th-note quantization.

    Specify Output File

    ./hum2midi input.wav output.mid

    Command-Line Options

    Option          Description                                       Default
    --grid          Quantization grid: 1/4, 1/8, 1/16, 1/32           1/16
    --min-note      Minimum note duration in milliseconds             50
    --no-quantize   Skip quantization (output raw Basic Pitch MIDI)   disabled
    --key-aware     Enable key-aware pitch correction                 disabled
    --no-analysis   Skip pitch analysis and key detection             disabled

    Usage Examples

    Quantize to eighth notes

    ./hum2midi melody.wav --grid 1/8

    Key-aware quantization (recommended for tonal music)

    ./hum2midi song.wav --key-aware
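Key-aware correction presumably snaps out-of-scale pitches to the nearest scale tone. A minimal sketch of that idea (illustrative only, not the shipped hum2midi code; ties prefer the lower pitch):

```python
# Major scale intervals as pitch classes relative to the tonic
MAJOR_SCALE = {0, 2, 4, 5, 7, 9, 11}

def snap_to_scale(pitch, tonic_pc=0, scale=MAJOR_SCALE):
    """Move a MIDI pitch to the nearest pitch whose class is in the scale."""
    if (pitch - tonic_pc) % 12 in scale:
        return pitch  # already in key
    candidates = [p for p in range(pitch - 6, pitch + 7)
                  if (p - tonic_pc) % 12 in scale]
    # Smallest distance wins; on a tie, take the lower pitch
    return min(candidates, key=lambda p: (abs(p - pitch), p))

print(snap_to_scale(61))              # C#4 in C major → 60
print(snap_to_scale(66, tonic_pc=7))  # F#4 is in G major → 66
```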

    Require longer minimum notes

    ./hum2midi humming.wav --min-note 100

    Skip analysis for faster processing

    ./hum2midi quick.wav --no-analysis

    Combine options

    ./hum2midi recording.wav output.mid --grid 1/8 --key-aware --min-note 80

    Processing MIDI Input

    You can also process existing MIDI files through the quantization pipeline:

    ./hum2midi input.mid output.mid --grid 1/16 --key-aware

    This skips the audio processing steps and goes directly to analysis and quantization.
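Grid quantization itself amounts to snapping note boundaries to multiples of the grid size in ticks. A hedged sketch with hypothetical (start_tick, end_tick, pitch) tuples, not the actual hum2midi internals:

```python
def quantize_notes(notes, ppq=480, grid="1/16"):
    """Snap (start_tick, end_tick, pitch) notes to the nearest grid line.
    A 1/16 grid at 480 PPQ is 120 ticks; the sample output's 240-tick grid
    implies a 960 PPQ file."""
    denom = int(grid.split("/")[1])   # e.g. "1/16" -> 16
    step = ppq * 4 // denom           # ticks per grid cell (4 quarters per bar unit)
    snap = lambda t: round(t / step) * step
    out = []
    for start, end, pitch in notes:
        s, e = snap(start), snap(end)
        if e <= s:                    # keep at least one grid cell of length
            e = s + step
        out.append((s, e, pitch))
    return out

print(quantize_notes([(5, 130, 60), (250, 255, 62)]))
# → [(0, 120, 60), (240, 360, 62)]
```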

    Sample Output

    ═══════════════════════════════════════════════════════════════
      hum2midi - Melody-to-MIDI Pipeline (Basic Pitch Edition)
      [Key-Aware Mode Enabled]
    ═══════════════════════════════════════════════════════════════
    
    Input:  my_humming.wav
    Output: my_humming.mid
    
    → Step 1: Stem Separation (HPSS)
      Isolating melodic content...
      Loaded: 5.23s @ 44100Hz
      ✓ Melody stem extracted → 5.23s
    
    → Step 2: Audio-to-MIDI Conversion (Basic Pitch)
      Running Spotify's Basic Pitch ML model on melody stem...
      ✓ Raw MIDI generated (Basic Pitch)
    
    → Step 3: Pitch Analysis & Key Detection
      Notes detected: 42 total, 7 unique
      Note range: C3 - G4
      Pitch classes: C3, E3, G3, A3, C4, D4, G4
      Dominant note: G3 (23.8% of notes)
      Detected key: G major
    
    → Step 4: Quantization & Cleanup
      Octave pruning: removed 3 harmonic notes above 67 (median+12)
      Overlap pruning: removed 2 harmonic notes at overlapping positions
      Note merging: merged 5 staccato chunks into legato notes (gap<=60 ticks)
      Grid:   240 ticks (1/16)
      Notes:  38 notes
      Key:    G major
      Key-aware: 2 notes corrected to scale
      Tempo:  120 BPM
      ✓ Quantized MIDI saved
    
    ═══════════════════════════════════════════════════════════════
      ✓ Done! Output: my_humming.mid
    ═══════════════════════════════════════════════════════════════
    
    📊 ANALYSIS SUMMARY
    ─────────────────────────────────────────────────────────────
      Detected Notes: C3, E3, G3, A3, C4, D4, G4
      Detected Key:   G major
      Quantization:   Key-aware mode (notes snapped to scale)
    
    MIDI Info: 38 notes, 7 unique pitches, 120 BPM
    Pitches: C3, E3, G3, A3, C4, D4, G4
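The note-merging pass reported in the log above (gap <= 60 ticks) can be sketched as: consecutive notes of the same pitch separated by a small gap are fused into one longer note. This is an illustration, not the shipped code:

```python
def merge_legato(notes, max_gap=60):
    """Merge same-pitch (start, end, pitch) notes whose gap is <= max_gap ticks."""
    merged = []
    for note in sorted(notes):
        if (merged and merged[-1][2] == note[2]
                and note[0] - merged[-1][1] <= max_gap):
            # Small gap, same pitch: extend the previous note instead
            prev = merged.pop()
            note = (prev[0], max(prev[1], note[1]), note[2])
        merged.append(note)
    return merged

chunks = [(0, 100, 60), (140, 240, 60), (260, 400, 60), (520, 640, 60)]
print(merge_legato(chunks))  # → [(0, 400, 60), (520, 640, 60)]
```

The first three staccato chunks fall within the gap threshold and collapse into one legato note; the fourth is 120 ticks away and stays separate.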

    Notes & Limitations

    Audio Quality Matters

    • Clear, loud melody produces the best results
    • Background noise can cause false note detection
    • Reverb and effects may confuse pitch detection
    • Close-mic'd vocals work significantly better than room recordings

    Musical Considerations

    • Monophonic sources work best (single melody line)
    • Polyphonic audio (chords, multiple instruments) will produce messy results
    • Vibrato and pitch bends may be quantized to stepped pitches
    • Rapid note passages may be missed or merged

    Technical Limitations

    • Tempo is fixed at 120 BPM in output (time positions are preserved, but tempo may need adjustment in your DAW)
    • Note velocities are normalized but may need manual adjustment
    • Very short notes (<50ms) may be filtered out by default
    • Extreme pitch ranges may cause octave detection issues

    Post-Processing Recommendations

    After generating MIDI, you may want to:

  • Import into your DAW and adjust tempo to match your original recording

  • Quantize further if stricter timing is needed

  • Adjust note velocities for dynamics

  • Apply swing/groove templates if the rigid grid sounds too mechanical

  • Edit individual notes that were misdetected (common with fast runs)

Supported Audio Formats

    Input formats supported via FFmpeg:

    • WAV, AIFF, FLAC (uncompressed, best quality)

    • MP3, M4A, AAC (compressed, acceptable)

    • OGG, OPUS (open source formats)

    • Most other formats FFmpeg supports


    Troubleshooting

    No notes detected

    • Check that input file isn't silent or corrupted
    • Try lowering the --min-note threshold so very short notes aren't filtered out
    • Verify audio has clear melodic content (not just noise)

    Too many notes / messy output

    • Enable octave pruning and overlap pruning (on by default)
    • Use --key-aware to constrain to musical scale
    • Check for background noise in source audio

    Wrong key detected

    • Key detection works best with at least 8-10 measures of music
    • Chromatic passages may confuse the detector
    • Manually review and adjust in your DAW if needed

    Notes in wrong octave

    • Basic Pitch sometimes detects harmonics instead of fundamentals
    • The pipeline includes pruning, but some may slip through
    • Use your DAW's transpose function for simple octave shifts
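The octave-pruning pass drops suspected harmonics sitting more than an octave above the median pitch, matching the "median+12" cutoff shown in the sample output. A minimal sketch of that heuristic:

```python
from statistics import median

def prune_octave_harmonics(notes):
    """Drop (start, end, pitch) notes more than an octave above the median
    pitch; these are likely harmonics detected instead of fundamentals."""
    cutoff = median(p for _, _, p in notes) + 12
    return [n for n in notes if n[2] <= cutoff]

notes = [(0, 100, 55), (100, 200, 55), (200, 300, 60), (0, 100, 79)]
print(prune_octave_harmonics(notes))
# → [(0, 100, 55), (100, 200, 55), (200, 300, 60)]
```

Here the median pitch is 57.5, so the cutoff is 69.5 and the stray note at 79 (two octaves above the melody) is removed.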

    References

    • Basic Pitch - Spotify's polyphonic pitch detection model
    • librosa HPSS - Harmonic-Percussive Source Separation
    • Krumhansl-Kessler Key Profiles - Key detection algorithm