voice-note-to-midi
Convert voice notes, humming, and melodic audio recordings to quantized MIDI files using ML-based pitch detection.
Installation
npx clawhub@latest install voice-note-to-midi
View the full skill documentation and source below.
Documentation
🎵 Voice Note to MIDI
Transform your voice memos, humming, and melodic recordings into clean, quantized MIDI files ready for your DAW.
What It Does
This skill provides a complete audio-to-MIDI conversion pipeline: stem separation, ML-based pitch detection, pitch and key analysis, and quantization with cleanup.
Pipeline Architecture
Audio Input (WAV/M4A/MP3)
                   ↓
┌─────────────────────────────────────┐
│ Step 1: Stem Separation (HPSS)      │
│ - Isolate harmonic content          │
│ - Remove drums/percussion           │
│ - Noise gating                      │
└─────────────────────────────────────┘
                   ↓
┌─────────────────────────────────────┐
│ Step 2: Pitch Detection             │
│ - Basic Pitch ML model (Spotify)    │
│ - Polyphonic note detection         │
│ - Onset/offset estimation           │
└─────────────────────────────────────┘
                   ↓
┌─────────────────────────────────────┐
│ Step 3: Analysis                    │
│ - Pitch class distribution          │
│ - Key detection                     │
│ - Dominant note identification      │
└─────────────────────────────────────┘
                   ↓
┌─────────────────────────────────────┐
│ Step 4: Quantization & Cleanup      │
│ - Timing grid snap                  │
│ - Key-aware pitch correction        │
│ - Octave pruning (harmonic removal) │
│ - Overlap-based pruning             │
│ - Note merging (legato)             │
│ - Velocity normalization            │
└─────────────────────────────────────┘
                   ↓
MIDI Output (Standard MIDI File)
Setup
Prerequisites
- Python 3.11+ (Python 3.14+ recommended)
- FFmpeg (for audio format support)
- pip
Installation
Quick Install (Recommended):
cd /path/to/voice-note-to-midi
./setup.sh
This automated script will:
- Check Python 3.11+ is installed
- Create the ~/melody-pipeline directory
- Set up the virtual environment
- Install all dependencies (basic-pitch, librosa, music21, etc.)
- Download and configure the hum2midi script
- Add melody-pipeline to your PATH
Manual Install:
If you prefer manual setup:
mkdir -p ~/melody-pipeline
cd ~/melody-pipeline
python3 -m venv venv-bp
source venv-bp/bin/activate
pip install basic-pitch librosa soundfile mido music21
chmod +x ~/melody-pipeline/hum2midi
echo 'export PATH="$HOME/melody-pipeline:$PATH"' >> ~/.bashrc
source ~/.bashrc
Verify Installation
cd ~/melody-pipeline
./hum2midi --help
Usage
Basic Usage
Convert a voice memo to MIDI:
./hum2midi my_humming.wav
This creates my_humming.mid with 16th-note quantization.
Specify Output File
./hum2midi input.wav output.mid
Command-Line Options
| Option | Description | Default |
| --grid | Quantization grid: 1/4, 1/8, 1/16, 1/32 | 1/16 |
| --min-note | Minimum note duration in milliseconds | 50 |
| --no-quantize | Skip quantization (output raw Basic Pitch MIDI) | disabled |
| --key-aware | Enable key-aware pitch correction | disabled |
| --no-analysis | Skip pitch analysis and key detection | disabled |
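To make the --grid option concrete, here is a minimal sketch of snapping tick positions to a grid. It assumes 960 ticks per quarter note, which is inferred from the "Grid: 240 ticks (1/16)" line in the sample output; the function names are hypothetical and hum2midi's own implementation may differ.

```python
from fractions import Fraction

PPQ = 960  # assumed ticks per quarter note (gives 240 ticks per 1/16)

def grid_ticks(grid: str, ppq: int = PPQ) -> int:
    """Convert a grid string such as '1/16' into a step size in ticks."""
    # A whole note spans 4 quarter notes, so 1/16 -> 4 * ppq / 16.
    return int(Fraction(grid) * 4 * ppq)

def snap(tick: int, step: int) -> int:
    """Round a tick position to the nearest grid line (ties round up)."""
    return ((tick + step // 2) // step) * step

step = grid_ticks("1/16")
print(step)             # 240
print(snap(250, step))  # 240
print(snap(370, step))  # 480
```

A coarser grid such as 1/8 doubles the step to 480 ticks, so more of the original timing detail is rounded away.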
Usage Examples
Quantize to eighth notes
./hum2midi melody.wav --grid 1/8
Key-aware quantization (recommended for tonal music)
./hum2midi song.wav --key-aware
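As an illustration of what key-aware correction could look like, the sketch below moves any out-of-scale MIDI pitch to the nearest scale tone. It assumes a major key and resolves ties downward; both choices, and the name snap_to_scale, are assumptions for illustration rather than hum2midi's documented behavior.

```python
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets of a major scale

def snap_to_scale(pitch: int, tonic_pc: int, steps=MAJOR_STEPS) -> int:
    """Return the scale tone nearest to a MIDI pitch (ties resolve downward)."""
    allowed = {(tonic_pc + s) % 12 for s in steps}
    # Try the pitch itself first, then neighbors at increasing distance.
    for delta in (0, -1, 1, -2, 2):
        if (pitch + delta) % 12 in allowed:
            return pitch + delta
    return pitch

G = 7  # pitch class of G
print(snap_to_scale(65, G))  # 64: F4 is outside G major, so it moves to E4
print(snap_to_scale(67, G))  # 67: G4 is already in the scale
```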
Require longer minimum notes
./hum2midi humming.wav --min-note 100
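The effect of --min-note can be sketched as a duration filter. The example assumes notes are stored as (start_tick, end_tick, pitch) tuples, a fixed 120 BPM, and 960 ticks per quarter note (inferred from the sample output); the helper names are hypothetical, and the real script may apply the threshold at a different stage.

```python
def ms_to_ticks(ms: float, bpm: float = 120.0, ppq: int = 960) -> int:
    """Convert a duration in milliseconds to ticks at a fixed tempo."""
    ms_per_quarter = 60_000.0 / bpm
    return round(ms / ms_per_quarter * ppq)

def drop_short_notes(notes, min_ms, bpm=120.0, ppq=960):
    """Keep only (start_tick, end_tick, pitch) notes lasting at least min_ms."""
    min_ticks = ms_to_ticks(min_ms, bpm, ppq)
    return [n for n in notes if n[1] - n[0] >= min_ticks]

notes = [(0, 90, 60), (240, 480, 62), (480, 600, 64)]
print(ms_to_ticks(50))               # 96
print(drop_short_notes(notes, 100))  # [(240, 480, 62)]
```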
Skip analysis for faster processing
./hum2midi quick.wav --no-analysis
Combine options
./hum2midi recording.wav output.mid --grid 1/8 --key-aware --min-note 80
Processing MIDI Input
You can also process existing MIDI files through the quantization pipeline:
./hum2midi input.mid output.mid --grid 1/16 --key-aware
This skips the audio processing steps and goes directly to analysis and quantization.
Sample Output
═══════════════════════════════════════════════════════════════
hum2midi - Melody-to-MIDI Pipeline (Basic Pitch Edition)
[Key-Aware Mode Enabled]
═══════════════════════════════════════════════════════════════
Input: my_humming.wav
Output: my_humming.mid
→ Step 1: Stem Separation (HPSS)
Isolating melodic content...
Loaded: 5.23s @ 44100Hz
✓ Melody stem extracted → 5.23s
→ Step 2: Audio-to-MIDI Conversion (Basic Pitch)
Running Spotify's Basic Pitch ML model on melody stem...
✓ Raw MIDI generated (Basic Pitch)
→ Step 3: Pitch Analysis & Key Detection
Notes detected: 42 total, 7 unique
Note range: C3 - G4
Pitch classes: C3, E3, G3, A3, C4, D4, G4
Dominant note: G3 (23.8% of notes)
Detected key: G major
→ Step 4: Quantization & Cleanup
Octave pruning: removed 3 harmonic notes above 67 (median+12)
Overlap pruning: removed 2 harmonic notes at overlapping positions
Note merging: merged 5 staccato chunks into legato notes (gap<=60 ticks)
Grid: 240 ticks (1/16)
Notes: 38 notes
Key: G major
Key-aware: 2 notes corrected to scale
Tempo: 120 BPM
✓ Quantized MIDI saved
═══════════════════════════════════════════════════════════════
✓ Done! Output: my_humming.mid
═══════════════════════════════════════════════════════════════
📊 ANALYSIS SUMMARY
─────────────────────────────────────────────────────────────
Detected Notes: C3, E3, G3, A3, C4, D4, G4
Detected Key: G major
Quantization: Key-aware mode (notes snapped to scale)
MIDI Info: 38 notes, 7 unique pitches, 120 BPM
Pitches: C3, E3, G3, A3, C4, D4, G4
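The "Note merging" line in the output above mentions joining staccato chunks separated by at most 60 ticks. A minimal sketch of that idea follows, assuming notes are (start_tick, end_tick, pitch) tuples and that only notes of the same pitch are merged; merge_legato is a hypothetical name and the actual merge rule may be more involved.

```python
def merge_legato(notes, max_gap=60):
    """Merge same-pitch notes whose gap is at most max_gap ticks.

    notes: list of (start_tick, end_tick, pitch) tuples.
    """
    merged = []
    last_by_pitch = {}  # pitch -> index in `merged` of its most recent note
    for start, end, pitch in sorted(notes):
        idx = last_by_pitch.get(pitch)
        if idx is not None and start - merged[idx][1] <= max_gap:
            # Close enough to the previous note of this pitch: extend it.
            prev_start, prev_end, _ = merged[idx]
            merged[idx] = (prev_start, max(prev_end, end), pitch)
        else:
            last_by_pitch[pitch] = len(merged)
            merged.append((start, end, pitch))
    return merged

chunks = [(0, 200, 55), (240, 460, 55), (480, 700, 57), (900, 1100, 55)]
print(merge_legato(chunks))
# [(0, 460, 55), (480, 700, 57), (900, 1100, 55)]
```

The first two G3 chunks are 40 ticks apart and become one note; the last one is 440 ticks later and stays separate.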
Notes & Limitations
Audio Quality Matters
- Clear, loud melody produces the best results
- Background noise can cause false note detection
- Reverb and effects may confuse pitch detection
- Close-mic'd vocals work significantly better than room recordings
Musical Considerations
- Monophonic sources work best (single melody line)
- Polyphonic audio (chords, multiple instruments) will produce messy results
- Vibrato and pitch bends may be quantized to stepped pitches
- Rapid note passages may be missed or merged
Technical Limitations
- Tempo is fixed at 120 BPM in output (time positions are preserved, but tempo may need adjustment in your DAW)
- Note velocities are normalized but may need manual adjustment
- Very short notes (<50ms) may be filtered out by default
- Extreme pitch ranges may cause octave detection issues
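Because the output tempo is fixed at 120 BPM, changing the tempo later means rescaling tick positions if the notes should stay at the same wall-clock time. The arithmetic can be sketched as follows, assuming 960 ticks per quarter note (inferred from the sample output); the function names are hypothetical.

```python
def ticks_to_seconds(tick, bpm=120.0, ppq=960):
    """Wall-clock time of a tick position at a constant tempo."""
    return tick / ppq * 60.0 / bpm

def retime(tick, new_bpm, old_bpm=120.0):
    """Rescale a tick so it lands at the same wall-clock time at new_bpm."""
    return round(tick * new_bpm / old_bpm)

print(ticks_to_seconds(960))          # 0.5
print(retime(960, 90))                # 720
print(ticks_to_seconds(720, bpm=90))  # 0.5
```

A note one quarter note in at 120 BPM (0.5 s) sits three quarters of a beat in once the file is retimed to 90 BPM, which is why the grid may no longer line up after a tempo change.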
Post-Processing Recommendations
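After generating MIDI, you may want to adjust the tempo and note velocities in your DAW, since the output uses a fixed 120 BPM and normalized velocities.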
Supported Audio Formats
Input formats supported via FFmpeg:
- WAV, AIFF, FLAC (lossless, best quality)
- MP3, M4A, AAC (lossy compression, acceptable)
- OGG, OPUS (open formats)
- Most other formats FFmpeg supports
Troubleshooting
No notes detected
- Check that input file isn't silent or corrupted
- Try lowering the --min-note threshold so that short notes are not filtered out
- Verify audio has clear melodic content (not just noise)
Too many notes / messy output
- Enable octave pruning and overlap pruning (on by default)
- Use --key-aware to constrain notes to the detected scale
- Check for background noise in source audio
Wrong key detected
- Key detection works best with at least 8-10 measures of music
- Chromatic passages may confuse the detector
- Manually review and adjust in your DAW if needed
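To see why short or chromatic passages can mislead the detector, it helps to look at how profile-based key finding works. The sketch below correlates a pitch-class histogram with the Krumhansl-Kessler major and minor profiles for all 24 keys. It is a simplified stand-in, not hum2midi's code: it counts notes rather than weighting by duration, and detect_key and pearson are hypothetical names.

```python
import math

# Krumhansl-Kessler key profiles, indexed by semitones above the tonic.
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0

def detect_key(pitches):
    """Return the best-correlating key name for a list of MIDI pitches."""
    hist = [0.0] * 12
    for p in pitches:
        hist[p % 12] += 1.0
    best_score, best_key = -2.0, None
    for tonic in range(12):
        # Rotate so that index 0 corresponds to the candidate tonic.
        rotated = hist[tonic:] + hist[:tonic]
        for mode, profile in (("major", KK_MAJOR), ("minor", KK_MINOR)):
            score = pearson(rotated, profile)
            if score > best_score:
                best_score, best_key = score, f"{NAMES[tonic]} {mode}"
    return best_key

melody = [60, 64, 67, 72, 67, 64, 60, 62, 64, 65, 67, 69, 71, 72]
print(detect_key(melody))  # C major
```

With only a handful of notes the histogram is sparse, so several keys can score almost equally and the winner can flip on a single stray pitch.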
Notes in wrong octave
- Basic Pitch sometimes detects harmonics instead of fundamentals
- The pipeline includes pruning, but some may slip through
- Use your DAW's transpose function for simple octave shifts
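The sample output describes octave pruning as removing notes above "median+12". A minimal sketch of that rule follows, assuming notes are (start_tick, end_tick, pitch) tuples; the function name is hypothetical and the real pruning may include further checks.

```python
from statistics import median

def prune_octave_harmonics(notes, span=12):
    """Drop notes pitched more than `span` semitones above the median pitch.

    notes: list of (start_tick, end_tick, pitch) tuples.
    """
    if not notes:
        return []
    ceiling = median(n[2] for n in notes) + span
    return [n for n in notes if n[2] <= ceiling]

notes = [(0, 240, 55), (240, 480, 57), (480, 720, 55),
         (480, 720, 79), (720, 960, 52)]
print(prune_octave_harmonics(notes))
# [(0, 240, 55), (240, 480, 57), (480, 720, 55), (720, 960, 52)]
```

Here the median pitch is 55 (G3), so the ceiling is 67 and the stray 79 is removed. A harmonic exactly one octave above the median would survive this rule, which is one way wrong-octave notes can slip through.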
References
- Basic Pitch - Spotify's polyphonic pitch detection model
- librosa HPSS - Harmonic-Percussive Source Separation
- Krumhansl-Kessler Key Profiles - Key detection algorithm