Web Audio API · Drum Synthesis
Every drum in your sample library is a mathematical operation on air pressure. This is what's actually inside those WAV files — and how to build it from nothing, in the browser, right now.
I've been producing music since I was 14. For the first several years, a drum sound was a file. You dragged a kick.wav into your DAW, you lined it up on the grid, you hit play. The sound came out. You didn't ask why. You just used it — the same way most people use electricity without knowing what an electron is. I want to break that habit, because once you understand what a kick drum actually is at a physical level, you gain an entirely different kind of control over it.
A kick drum is a membrane (stretched hide or plastic) being hit. When it's struck, it vibrates at a fundamental frequency that decays rapidly. The pitch starts higher at the moment of impact and drops fast, following an exponential curve. That's it. That's the whole kick drum. In the Web Audio API, that's an OscillatorNode whose frequency starts at around 150Hz and ramps to 50Hz in about 60 milliseconds, with a GainNode that decays from full volume to near silence over a longer window (half a second by default in the code that follows). A snare drum is a similar tonal body combined with the rattle of metal wires against the bottom head, which is noise. Filtered white noise. A hi-hat is a thin metal disc with no clear pitch whatsoever, which in synthesis terms means several inharmonically tuned oscillators running simultaneously, pushed through a high-pass filter. A clap is not one hit: it's several noise bursts triggered a few milliseconds apart, simulating the staggered timing of multiple hands clapping at once.
Five chapters. Four drum sounds built from zero. One working sequencer at the end. Every concept from the first blog applies here — and by the end, you'll understand why your favourite kick sounds like it does at a level no sample pack tutorial ever taught you.
A kick drum is deceptively simple. A membrane is hit. It vibrates — loudly at first, then quickly dying away. The pitch starts high at the moment of impact and drops fast as the membrane tension relaxes. In synthesis terms, that's two things: a gain envelope (loud → silence) and a frequency envelope (high pitch → low pitch), both triggered simultaneously.
The body of the kick lives in the 50–80Hz range — that's the thump you feel in your chest. The initial click comes from the transient: a brief burst of higher frequencies right at the moment of impact. A harder transient gives punch. A softer one gives weight. This is why different kicks feel different even at the same volume.
function kick(ctx, time, { pitch = 150, body = 50, decay = 0.5, click = 0.8 } = {}) {
  // ─ Tonal body: oscillator with pitch drop
  const osc = ctx.createOscillator();
  const gain = ctx.createGain();
  osc.type = 'sine';

  // Frequency envelope: pitch → body over 60ms
  osc.frequency.setValueAtTime(pitch, time);
  osc.frequency.exponentialRampToValueAtTime(body, time + 0.06);

  // Gain envelope: sharp attack, exponential decay
  gain.gain.setValueAtTime(1, time);
  gain.gain.exponentialRampToValueAtTime(0.001, time + decay);

  osc.connect(gain);
  gain.connect(ctx.destination);
  osc.start(time);
  osc.stop(time + decay);

  // ─ Click transient: short noise burst
  const buf = ctx.createBuffer(1, ctx.sampleRate * 0.02, ctx.sampleRate);
  const data = buf.getChannelData(0);
  for (let i = 0; i < data.length; i++) data[i] = (Math.random() * 2 - 1);

  const noise = ctx.createBufferSource();
  const nGain = ctx.createGain();
  noise.buffer = buf;
  nGain.gain.setValueAtTime(click, time);
  nGain.gain.exponentialRampToValueAtTime(0.001, time + 0.02);

  noise.connect(nGain);
  nGain.connect(ctx.destination);
  noise.start(time);
  noise.stop(time + 0.02);
}
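Both envelopes in kick() lean on exponentialRampToValueAtTime, so it helps to see the curve that call traces out. The sketch below reproduces it as a plain function you can run anywhere, no AudioContext needed. The names expRampValue and kickPitchAt are mine, made up for illustration. It also shows why the gain ramps target 0.001 rather than 0: the curve multiplies by a ratio, so it can approach zero but never land on it, and the real API rejects a target of exactly 0.

```javascript
// Value of an exponential ramp from v0 (at time t0) to v1 (at time t1),
// sampled at time t. Outside the ramp it holds the start or end value.
function expRampValue(v0, v1, t0, t1, t) {
  if (t <= t0) return v0;
  if (t >= t1) return v1;
  return v0 * Math.pow(v1 / v0, (t - t0) / (t1 - t0));
}

// Pitch of the default kick (150 Hz down to 50 Hz over 60 ms),
// `ms` milliseconds after the hit. Hypothetical helper for illustration.
const kickPitchAt = (ms) => expRampValue(150, 50, 0, 0.06, ms / 1000);

// Print the sweep at a few points to see how front-loaded the drop is.
for (const ms of [0, 10, 20, 30, 60]) {
  console.log(`${ms} ms: ${kickPitchAt(ms).toFixed(1)} Hz`);
}
```

Most of the pitch drop happens in the first third of the sweep, which is what gives the kick its "punch then thump" shape.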
A snare is more complex than a kick because it's two things happening simultaneously. There's a tonal body — the head resonating, like a small tom-tom. And there's the snare wire: a coil of metal stretched across the bottom head that buzzes and rattles when the drum is hit. Remove the snare wire and you have a tom. Add it back and you have a snare. That rattling buzz is filtered white noise.
In synthesis: path one is a sine wave at around 200Hz with a very fast decay (the body). Path two is white noise through a highpass filter around 2kHz with a slightly longer decay (the snare buzz). Give each path its own amplitude envelope, mix both into a shared gain node, and you have a snare.
function snare(ctx, time, { tone = 200, snap = 0.15, buzz = 0.25, mix = 0.6 } = {}) {
  const master = ctx.createGain();
  master.connect(ctx.destination);

  // ─ Path 1: Tonal body
  const osc = ctx.createOscillator();
  const oGain = ctx.createGain();
  osc.type = 'sine';
  osc.frequency.value = tone;
  oGain.gain.setValueAtTime(1 - mix, time);
  oGain.gain.exponentialRampToValueAtTime(0.001, time + snap);
  osc.connect(oGain);
  oGain.connect(master);
  osc.start(time);
  osc.stop(time + snap);

  // ─ Path 2: Noise (snare wire buzz)
  const buf = ctx.createBuffer(1, ctx.sampleRate * buzz, ctx.sampleRate);
  const data = buf.getChannelData(0);
  for (let i = 0; i < data.length; i++) data[i] = Math.random() * 2 - 1;

  const noise = ctx.createBufferSource();
  const hp = ctx.createBiquadFilter();
  const nGain = ctx.createGain();
  noise.buffer = buf;
  hp.type = 'highpass';
  hp.frequency.value = 2000;
  nGain.gain.setValueAtTime(mix, time);
  nGain.gain.exponentialRampToValueAtTime(0.001, time + buzz);

  noise.connect(hp);
  hp.connect(nGain);
  nGain.connect(master);
  noise.start(time);
  noise.stop(time + buzz);
}
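The kick, the snare, and later the clap all fill a buffer with white noise using the same loop. One possible way to tidy that up is a pair of small helpers, sketched here. fillWhiteNoise and framesFor are hypothetical names of mine, not part of the Web Audio API. They work on any Float32Array, so you can check them without a browser.

```javascript
// Fill a Float32Array with uniform white noise in the range -1 to 1.
// `random` is injectable so the output can be made repeatable in tests.
function fillWhiteNoise(data, random = Math.random) {
  for (let i = 0; i < data.length; i++) {
    data[i] = random() * 2 - 1;
  }
  return data;
}

// Number of whole sample frames needed to hold `seconds` of audio.
// Math.ceil keeps the length an integer for any sample rate.
function framesFor(sampleRate, seconds) {
  return Math.ceil(sampleRate * seconds);
}

// In the browser this would be used roughly as:
//   const buf = ctx.createBuffer(1, framesFor(ctx.sampleRate, 0.25), ctx.sampleRate);
//   fillWhiteNoise(buf.getChannelData(0));
```

Generating a fresh buffer per hit, as the drum functions here do, means no two snares are sample-identical. Caching one buffer and reusing it would be cheaper, at the cost of that variation.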
A hi-hat is a thin metal disc. Unlike a drum membrane which has a clear fundamental resonance, metal has complex, inharmonic overtones — frequencies that don't sit in nice integer ratios. This is why a hi-hat has no clear pitch. To synthesize this, we use multiple square wave oscillators at mathematically unrelated frequencies, mix them together, then run the result through a highpass filter to remove any low-frequency content. What remains is that characteristic metallic shimmer.
The difference between a closed hi-hat and an open hi-hat is just the decay time. Closed: very short, sharp cut. Open: let it ring. Same synthesis, different envelope.
const HAT_FREQS = [205.3, 369.4, 427.5, 511.1, 772.5, 989.0]; // inharmonic

function hihat(ctx, time, { open = false, decay = 0.08, tone = 1 } = {}) {
  // Closed hats use 30% of the decay time; open hats ring for all of it
  const length = open ? decay : decay * 0.3;

  const hp = ctx.createBiquadFilter();
  hp.type = 'highpass';
  hp.frequency.value = 7000 * tone;
  const mGain = ctx.createGain();

  // 6 detuned square oscillators
  HAT_FREQS.forEach(f => {
    const osc = ctx.createOscillator();
    const gain = ctx.createGain();
    osc.type = 'square';
    osc.frequency.value = f * tone;
    gain.gain.value = 1 / HAT_FREQS.length;
    osc.connect(gain);
    gain.connect(hp);
    osc.start(time);
    osc.stop(time + length + 0.01);
  });

  mGain.gain.setValueAtTime(0.7, time);
  mGain.gain.exponentialRampToValueAtTime(0.001, time + length);
  hp.connect(mGain);
  mGain.connect(ctx.destination);
}
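"Inharmonic" is easy to say and harder to picture, so here is a quick way to check it for the six hi-hat frequencies. This is a small sketch with helper names I made up (ratiosToLowest, distanceFromInteger). It divides each frequency by the lowest one and measures how far the result sits from a whole number. A harmonic series would give ratios of exactly 1, 2, 3 and so on.

```javascript
// Same values as HAT_FREQS in hihat(), copied so this block runs on its own.
const freqs = [205.3, 369.4, 427.5, 511.1, 772.5, 989.0];

// Ratio of each partial to the lowest one.
function ratiosToLowest(list) {
  const base = Math.min(...list);
  return list.map((f) => f / base);
}

// Distance of a ratio from the nearest whole number (0 means perfectly harmonic).
function distanceFromInteger(ratio) {
  return Math.abs(ratio - Math.round(ratio));
}

const ratios = ratiosToLowest(freqs);
const offsets = ratios.map(distanceFromInteger);

ratios.forEach((r, i) => {
  console.log(`${freqs[i]} Hz: ratio ${r.toFixed(3)}, off by ${offsets[i].toFixed(3)}`);
});
```

None of the upper five ratios lands on a whole number, so the ear has no single harmonic series to lock onto and hears metal rather than a note.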
A handclap is not a single sound. When multiple people clap simultaneously, their timing is slightly off — by 5 to 30 milliseconds. This micro-timing creates a smearing effect that gives a clap its distinctive texture. A single perfect clap sounds thin and fake. Multiple imperfect claps sound real.
To synthesize a clap: trigger 3–5 short bursts of bandpass-filtered noise at tiny staggered offsets, followed by a slightly longer tail with a slower decay. The bandpass filter sits around 1–2kHz to give it that mid-frequency crack rather than a low thud or high hiss.
function clap(ctx, time, { room = 0.15, crack = 1200, bursts = 4 } = {}) {
  function noiseBurst(t, dur, vol) {
    const len = Math.ceil(ctx.sampleRate * dur);
    const buf = ctx.createBuffer(1, len, ctx.sampleRate);
    const data = buf.getChannelData(0);
    for (let i = 0; i < len; i++) data[i] = Math.random() * 2 - 1;

    const src = ctx.createBufferSource();
    const bp = ctx.createBiquadFilter();
    const gain = ctx.createGain();
    src.buffer = buf;
    bp.type = 'bandpass';
    bp.frequency.value = crack;
    bp.Q.value = 0.5;
    gain.gain.setValueAtTime(vol, t);
    gain.gain.exponentialRampToValueAtTime(0.001, t + dur);

    src.connect(bp);
    bp.connect(gain);
    gain.connect(ctx.destination);
    src.start(t);
    src.stop(t + dur + 0.01);
  }

  // Multiple staggered bursts → the "smear"
  for (let i = 0; i < bursts; i++) {
    const offset = i * (0.012 / bursts);
    // Each burst is a little quieter; the floor keeps the gain positive for large burst counts
    noiseBurst(time + offset, 0.02, Math.max(0.1, 0.8 - i * 0.1));
  }

  // Room tail: longer decaying noise
  noiseBurst(time + 0.015, room, 0.3);
}
// Set bursts to 1 and you get something closer to a rimshot than a clap. Push room up and you're in a cathedral.
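The clap() function spaces its bursts on an even grid. Real hands are messier than that, so one variation worth trying is to nudge each burst by a small random amount. The sketch below only computes the start times. clapBurstTimes, spread, and jitter are hypothetical names and defaults of mine, chosen so the numbers stay close to the original spacing.

```javascript
// Start times for `bursts` noise bursts beginning at `time` (seconds).
// Each burst sits on an even grid across `spread` seconds, then is nudged
// by up to +/- `jitter` seconds. `random` is injectable for repeatable tests.
function clapBurstTimes(time, { bursts = 4, spread = 0.012, jitter = 0.002, random = Math.random } = {}) {
  const times = [];
  for (let i = 0; i < bursts; i++) {
    const grid = i * (spread / bursts);
    // Keep the first hit exactly on time so the clap still lands on the beat.
    const nudge = i === 0 ? 0 : (random() * 2 - 1) * jitter;
    // Never schedule a burst before the requested start time.
    times.push(time + Math.max(0, grid + nudge));
  }
  return times;
}

console.log(clapBurstTimes(1.0));
```

Inside clap(), the loop could then become clapBurstTimes(time, { bursts }).forEach((t, i) => noiseBurst(t, 0.02, 0.8 - i * 0.1)). Because the offsets differ on every call, no two claps smear in quite the same way.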
The instinct for a sequencer is to use setInterval: fire a callback every N milliseconds, trigger the next beat. This is wrong. JavaScript timers are imprecise. They get delayed by garbage collection, rendering, and anything else on the main thread. A timer set to fire every 500ms can land 5–20ms late in practice. At 120 BPM, where a 16th note lasts only 125ms, a 10ms error is audible. Your groove falls apart.
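To put a number on why a few milliseconds matter, here is the arithmetic as a tiny sketch. The function names sixteenthMs and errorAsFractionOfStep are mine, used only for illustration.

```javascript
// Duration of one 16th-note step in milliseconds at a given tempo.
function sixteenthMs(bpm) {
  const msPerBeat = 60000 / bpm; // one quarter note
  return msPerBeat / 4;
}

// A timing error expressed as a fraction of one 16th-note step.
function errorAsFractionOfStep(errorMs, bpm) {
  return errorMs / sixteenthMs(bpm);
}

console.log(sixteenthMs(120));               // length of one step at 120 BPM
console.log(errorAsFractionOfStep(10, 120)); // share of that step a 10 ms error eats
```

At 120 BPM a step is 125 ms long, so a 10 ms error is 8% of the gap between neighbouring hi-hats. The faster the tempo, the bigger that share gets.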
The Web Audio API gives us AudioContext.currentTime — a high-precision clock measured in seconds, running independently of JavaScript. The correct approach: use a fast setTimeout as a lookahead scheduler that schedules audio events slightly ahead in time. Events are scheduled with exact timestamps. The audio engine fires them at those exact times. No drift.
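// Assumed to exist elsewhere: audioCtx (an AudioContext) and scheduleStep(step, time)
const LOOK_AHEAD = 0.1; // schedule 100ms ahead
const INTERVAL = 25;    // check every 25ms

let bpm = 120;          // tempo
let currentStep = 0;    // position on the 16-step grid
let nextNoteTime = 0;   // when the next step is due, in AudioContext seconds
let timerID = null;     // handle for clearTimeout

function scheduler() {
  // Keep scheduling notes until we're ahead enough
  while (nextNoteTime < audioCtx.currentTime + LOOK_AHEAD) {
    scheduleStep(currentStep, nextNoteTime);
    advanceStep();
  }
  timerID = setTimeout(scheduler, INTERVAL);
}

function advanceStep() {
  const secondsPerBeat = 60.0 / bpm;
  nextNoteTime += secondsPerBeat / 4; // 16th note grid
  currentStep = (currentStep + 1) % 16;
}

// Start from "now" so the loop doesn't try to catch up on steps in the past
function start() {
  currentStep = 0;
  nextNoteTime = audioCtx.currentTime;
  scheduler();
}

function stop() {
  clearTimeout(timerID);
}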
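It's worth convincing yourself that the lookahead loop really is immune to sloppy timers. The sketch below replays the same while-loop against a fake clock, with no audio involved. simulateScheduler and timerDelays are hypothetical names of mine. Each entry in timerDelays is how long one setTimeout actually took to fire, in seconds, so you can feed it deliberately uneven values.

```javascript
// Simulate the lookahead loop against a fake clock (all times in seconds).
// `timerDelays` lists how long each timer callback actually took to fire.
function simulateScheduler({ bpm = 120, lookAhead = 0.1, timerDelays = [] } = {}) {
  const stepDur = 60 / bpm / 4; // 16th-note grid, as in advanceStep()
  let now = 0;                  // stands in for audioCtx.currentTime
  let nextNoteTime = 0;
  const scheduled = [];         // one { time, scheduledAt } entry per step

  const tick = () => {
    while (nextNoteTime < now + lookAhead) {
      scheduled.push({ time: nextNoteTime, scheduledAt: now });
      nextNoteTime += stepDur;
    }
  };

  tick(); // the first call happens immediately
  for (const delay of timerDelays) {
    now += delay; // the timer fires whenever it fires
    tick();
  }
  return scheduled;
}

// Timers nominally 25 ms apart, but arriving anywhere from 27 to 60 ms.
const run = simulateScheduler({
  timerDelays: [0.03, 0.045, 0.031, 0.06, 0.027, 0.05, 0.028, 0.04],
});
console.log(run);
```

The scheduledAt values wander with the timer, but every time value sits exactly on the 125 ms grid. The guarantee holds as long as no single timer delay exceeds the lookahead window. If one does, a step gets scheduled after its due time, which is the trade-off behind choosing a 100 ms lookahead over something tighter.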
// Every sound in the sequencer uses the synthesis functions from the previous four chapters. The kick sweeps pitch. The snare mixes noise and tone. The hi-hat runs six detuned oscillators. The clap fires multiple staggered bursts. Nothing here is a sample. All of it is math.
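The scheduler calls scheduleStep(step, time) without saying what it does. One plausible shape for it, sketched under my own assumptions, is a lookup into a 16-step pattern grid that fires every drum whose row is switched on at that step. The pattern contents, the row names, and the makeScheduleStep factory are all hypothetical. The only thing taken from the drum functions is their (ctx, time) call signature.

```javascript
// 16-step pattern: 1 = hit, 0 = rest. Rows and rhythm are made up for illustration.
const pattern = {
  kick:  [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
  snare: [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
  hihat: [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
  clap:  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
};

// Build a scheduleStep(step, time) that calls every voice whose row is on at `step`.
// `voices` maps row names to functions that take (ctx, time).
function makeScheduleStep(ctx, grid, voices) {
  return function scheduleStep(step, time) {
    for (const [name, row] of Object.entries(grid)) {
      if (row[step] && voices[name]) voices[name](ctx, time);
    }
  };
}

// In the browser, with the four drum functions in scope:
//   const scheduleStep = makeScheduleStep(audioCtx, pattern, { kick, snare, hihat, clap });
```

Passing the voices in as a plain object keeps the sequencer independent of how each sound is made, so swapping a synthesized kick for a different one is a one-line change.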