RIPPLE WAVE
v3 DOCS
Deep technical internals of the extension — audio pipeline, engine architectures, CSP bypass strategy, and storage schema.
How It Works
Ripple Wave v3 is a Chrome Manifest v3 extension that intercepts the page's <video> element (YouTube or Reddit) and routes its audio through a real-time processing chain before it reaches your speakers — without touching the network or any server.
YouTube / Reddit <video> element
│
▼ MediaElementSource (Web Audio API)
┌───────────────────────────────────┐
│ Your chosen filter engine: │
│ EQ Lite / RNNoise / │
│ DeepFilterNet3 │
└───────────────────────────────────┘
│
▼ AudioContext.destination
Your speakers / headphones

The key insight: the Web Audio API's MediaElementSourceNode lets you tap the audio stream from any HTML media element. Once tapped, the stream can be processed by any combination of native AudioNodes or custom AudioWorkletProcessors before it plays out.
Extension Architecture
Built on Manifest v3, the extension is structured into four distinct layers that communicate via Chrome's messaging APIs and shared storage.
chrome-extension/
├── manifest.json MV3 declaration
│
├── content_script.js Injected into every YouTube / Reddit tab
│ ├── Hooks <video> element via MutationObserver
│ ├── Constructs AudioContext + chosen filter chain
│ └── Listens for settings changes via chrome.storage.onChanged
│
├── background.js Service worker (persistent-ish)
│ ├── Handles model download for DeepFilterNet3
│ ├── Routes large fetch() calls to bypass site CSP
│ └── Manages extension lifecycle events
│
├── popup/ Extension popup UI
│ ├── popup.html + popup.js
│ ├── Engine selector, intensity slider, presets
│ └── Writes to chrome.storage.sync → triggers content_script
│
└── worklets/
├── rnnoise-worklet.js AudioWorklet wrapping RNNoise WASM
└── deepfilter-worklet.js AudioWorklet wrapping DeepFilterNet3

Content Script Injection
The content script runs at document_idle on all youtube.com/* and reddit.com/* URLs. It uses a MutationObserver to watch for client-side navigation (both YouTube and Reddit are SPAs — the DOM mutates rather than triggering full page loads). When a <video> element appears or changes, the hook re-attaches automatically.
// Simplified hook logic in content_script.js
const observer = new MutationObserver(() => {
const video = document.querySelector('video')
if (video && !video.__rwHooked) {
attachFilterChain(video)
video.__rwHooked = true
}
})
observer.observe(document.body, { childList: true, subtree: true })

Settings Flow
When you change engine or intensity in the popup, chrome.storage.sync.set() is called. The content script listens to chrome.storage.onChanged and immediately swaps the active AudioNode graph — no page reload required.
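That round trip can be sketched in a few lines. `saveEngine`, `deltaOf`, and `watchSettings` are illustrative helper names, not the extension's actual function names:

```javascript
// popup.js side (hypothetical helper): persist the new engine choice
function saveEngine(engine) {
  return chrome.storage.sync.set({ engine })
}

// pure helper: flatten a storage change record into { key: newValue }
function deltaOf(changes) {
  const out = {}
  for (const [key, change] of Object.entries(changes)) out[key] = change.newValue
  return out
}

// content_script.js side: react to sync changes without a reload
function watchSettings(onDelta) {
  chrome.storage.onChanged.addListener((changes, area) => {
    if (area === 'sync') onDelta(deltaOf(changes)) // e.g. rebuild the node graph
  })
}
```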
Web Audio Pipeline
All three engines share the same entry and exit points in the AudioContext graph. Only the middle processing nodes differ.
const ctx = new AudioContext({ sampleRate: 48000 })
// Source: tap the video element
const src = ctx.createMediaElementSource(videoElement)
// ──── Engine nodes go here ────
// (BiquadFilterNodes / AudioWorkletNode)
// Sink: play through speakers
processedNode.connect(ctx.destination)

Sample Rate
The context is created at 48000 Hz — matching YouTube's delivery format. Both RNNoise and DeepFilterNet3 expect 48 kHz input natively, avoiding any resampling overhead.
AudioWorklet vs ScriptProcessorNode
The ML engines use AudioWorkletNode (not the deprecated ScriptProcessorNode). Worklets run in a dedicated audio rendering thread, separate from the main JS thread, so UI interactions never cause audio glitches or dropouts.
// Registering the worklet module
await ctx.audioWorklet.addModule(
chrome.runtime.getURL('worklets/rnnoise-worklet.js')
)
const workletNode = new AudioWorkletNode(ctx, 'rnnoise-processor')

EQ Lite Engine
The EQ engine uses a chain of native BiquadFilterNodes — natively implemented by the browser, running in optimised C++ with effectively zero latency. Keyboard clicks concentrate energy in the 1–6 kHz range with short transient spikes, which is exactly what parametric EQ can surgically remove.
Filter Chain
src
├─► BiquadFilter { type: 'peaking', frequency: 1200, gain: -Gdyn, Q: 2.5 }
├─► BiquadFilter { type: 'peaking', frequency: 2400, gain: -Gdyn, Q: 2.5 }
├─► BiquadFilter { type: 'peaking', frequency: 3800, gain: -Gdyn, Q: 3.0 }
├─► BiquadFilter { type: 'peaking', frequency: 5500, gain: -Gdyn, Q: 3.5 }
└─► DynamicsCompressorNode { threshold: -24, knee: 8, ratio: 8, attack: 0.003 }
└─► ctx.destination
Gdyn = intensity slider value mapped to [0 dB … 18 dB]

Presets
The four presets map intensity to calibrated gain and compressor settings:
LIGHT → Gdyn = 6 dB,  ratio = 4:1,  threshold = -18 dBFS
MED   → Gdyn = 10 dB, ratio = 6:1,  threshold = -22 dBFS
HEAVY → Gdyn = 14 dB, ratio = 8:1,  threshold = -26 dBFS
NUKE  → Gdyn = 18 dB, ratio = 20:1, threshold = -32 dBFS
Trade-offs
Aggressive EQ can colour speech at the targeted frequencies. The compressor helps catch transients the EQ misses, but very short attacks (sub-5 ms) may clip musical content. LIGHT mode is recommended for music-heavy videos; NUKE for pure talking-head content.
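Assembling the chain from the diagram above takes only native nodes. A minimal sketch, where `buildEqChain` and `gdynFor` are illustrative names and the frequencies, Q values, and compressor settings mirror the documented chain:

```javascript
// Map the 0–100 intensity slider onto the documented 0–18 dB Gdyn range
function gdynFor(intensity) {
  return (intensity / 100) * 18
}

// Build the four peaking filters plus compressor (needs a browser AudioContext)
function buildEqChain(ctx, src, intensity) {
  const bands = [[1200, 2.5], [2400, 2.5], [3800, 3.0], [5500, 3.5]]
  let node = src
  for (const [freq, q] of bands) {
    const biq = ctx.createBiquadFilter()
    biq.type = 'peaking'
    biq.frequency.value = freq
    biq.Q.value = q
    biq.gain.value = -gdynFor(intensity) // cut, not boost
    node.connect(biq)
    node = biq
  }
  const comp = ctx.createDynamicsCompressor()
  comp.threshold.value = -24
  comp.knee.value = 8
  comp.ratio.value = 8
  comp.attack.value = 0.003
  node.connect(comp)
  comp.connect(ctx.destination)
  return comp
}
```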
RNNoise Engine
RNNoise is a recurrent neural network noise suppressor originally developed at Mozilla. It uses a Gated Recurrent Unit (GRU) architecture trained on a large corpus of speech + noise pairs. The WASM build (~150 KB) is bundled directly inside the extension — no download needed.
Architecture
Input frame: 480 samples @ 48 kHz = 10 ms window
│
▼
Bark-scale feature extraction (22 bands)
│
▼
3 × GRU layers (96 units each)
│
▼
Gain curve per Bark band → applied via FFT/IFFT
│
▼
Output frame: 480 samples (noise-suppressed)

AudioWorklet Integration
The worklet processor accumulates samples into 480-sample frames, passes them through the WASM module synchronously, and emits the processed frames to the output buffer. This introduces one frame of algorithmic latency (~10 ms) plus a small buffering delay (~5 ms), totalling ~15 ms end-to-end.
// Inside rnnoise-worklet.js (simplified)
class RNNoiseProcessor extends AudioWorkletProcessor {
  constructor() {
    super()
    this.buffer = []    // incoming samples awaiting a full 480-sample frame
    this.outQueue = []  // processed samples awaiting output
  }
  process(inputs, outputs) {
    const input = inputs[0][0]   // Float32Array, 128 samples
    const output = outputs[0][0]
    this.buffer.push(...input)
    while (this.buffer.length >= 480) {
      const frame = this.buffer.splice(0, 480)
      const clean = rnnoiseWasm.processFrame(frame)
      this.outQueue.push(...clean)
    }
    if (this.outQueue.length >= 128) {
      output.set(this.outQueue.splice(0, 128)) // emit silence until primed
    }
    return true
  }
}

Frequency Coverage
Unlike the EQ engine, which only targets 1–6 kHz, RNNoise operates across the full audible band (0–24 kHz in the Bark domain). It suppresses keyboard clicks, fan hum, and broadband room noise simultaneously — treating them all as "not speech."
DeepFilterNet3 Engine
DeepFilterNet3 is a full deep-learning speech enhancement model built on a dual-stage architecture: a Temporal Convolutional Network (TCN) for broad noise estimation, and an Enhancement GAN stage for waveform refinement. It is compiled to WASM via ONNX Runtime Web.
Model Architecture
Input: 20 ms frames @ 48 kHz (960 samples)
│
▼
STFT → Complex spectrogram (481 bins × 2)
│
▼
Encoder (5× depthwise conv blocks, dim=256)
│
├─► TCN branch: coarse noise mask estimation
│
└─► GRU branch (512 units): temporal refinement
│
▼
Multiplicative mask application in frequency domain
│
▼
Overlap-add iSTFT → wideband clean audio

WASM / ONNX Runtime
The model is serialised as an ONNX graph and loaded via onnxruntime-web. The WASM backend runs multi-threaded inference using SharedArrayBuffer when available (requires COOP/COEP headers — which the extension sets via its own service worker).
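A minimal loader sketch, assuming onnxruntime-web is available as the global `ort`; the thread-count gate on `crossOriginIsolated` is an assumption about how the extension decides when SharedArrayBuffer threading is safe to enable:

```javascript
// Hypothetical model loader built on onnxruntime-web's public API
async function loadOrtSession(modelBuffer) {
  // SharedArrayBuffer-backed threads only work when the page is cross-origin isolated
  ort.env.wasm.numThreads = globalThis.crossOriginIsolated ? 4 : 1
  return ort.InferenceSession.create(new Uint8Array(modelBuffer), {
    executionProviders: ['wasm'],
  })
}
```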
Processing Latency Breakdown
Frame size:         20 ms (960 samples @ 48 kHz)
Model inference:    ~8 ms (on modern hardware)
Buffering overhead: ~5 ms
STFT/iSTFT:         ~2 ms
─────────────────────────────
Total:              ~25 ms (imperceptible on videos)
IndexedDB Caching
The ~2 MB model blob is downloaded once and stored in IndexedDB under the key deepfilter_v3_model. On subsequent activations the background service worker serves it from cache, making activation near-instant even on slow connections.
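The cache-or-fetch path against the `ripplewave-cache` store described in the storage section could be sketched as follows; `openCache` and `getOrFetchModel` are illustrative helper names:

```javascript
// Open (or create) the documented "ripplewave-cache" IndexedDB database
function openCache() {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open('ripplewave-cache', 1)
    req.onupgradeneeded = () => req.result.createObjectStore('assets')
    req.onsuccess = () => resolve(req.result)
    req.onerror = () => reject(req.error)
  })
}

// Serve the model from cache, falling back to a one-time network fetch
async function getOrFetchModel(url) {
  const db = await openCache()
  const cached = await new Promise((resolve) => {
    const req = db.transaction('assets').objectStore('assets').get('deepfilter_v3_model')
    req.onsuccess = () => resolve(req.result)
    req.onerror = () => resolve(undefined)
  })
  if (cached) return cached
  const buf = await (await fetch(url)).arrayBuffer()
  db.transaction('assets', 'readwrite').objectStore('assets').put(buf, 'deepfilter_v3_model')
  return buf
}
```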
CSP Bypass & Model Downloads
YouTube (and Reddit) enforce Content Security Policies that block extension scripts from making arbitrary fetch() calls to external origins. Downloading the DeepFilterNet3 model directly from the content script would therefore be blocked.
Service Worker Proxy
The MV3 background service worker is not subject to the page's CSP. The content script requests the model via Chrome's messaging API; the service worker performs the actual fetch and transfers the ArrayBuffer back via message:
// content_script.js
chrome.runtime.sendMessage({ type: 'FETCH_MODEL' }, (response) => {
const modelBuffer = response.arrayBuffer
loadOrtSession(modelBuffer)
})
// background.js (service worker — no CSP)
chrome.runtime.onMessage.addListener((msg, _, sendResponse) => {
if (msg.type === 'FETCH_MODEL') {
fetch('https://cdn.example.com/deepfilter-v3.ort')
.then(r => r.arrayBuffer())
.then(buf => sendResponse({ arrayBuffer: buf }))
return true // async response
}
})

declarativeNetRequest
MV3 removes access to webRequestBlocking. Header modifications (required for SharedArrayBuffer — COOP/COEP) are instead declared statically via declarativeNetRequest rules in manifest.json, which Chrome applies before the page sees the response.
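A static rule file for the COOP/COEP injection might look like the following sketch; the rule id and urlFilter are illustrative, not the extension's shipped rules:

```json
[
  {
    "id": 1,
    "priority": 1,
    "action": {
      "type": "modifyHeaders",
      "responseHeaders": [
        { "header": "Cross-Origin-Opener-Policy", "operation": "set", "value": "same-origin" },
        { "header": "Cross-Origin-Embedder-Policy", "operation": "set", "value": "require-corp" }
      ]
    },
    "condition": {
      "urlFilter": "youtube.com",
      "resourceTypes": ["main_frame"]
    }
  }
]
```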
Configuration & Storage
chrome.storage.sync Schema
User settings are persisted via chrome.storage.sync (synced across the user's Chrome profile):
{
"engine": "eq" | "rnn" | "deep", // active engine
"intensity": 0–100, // suppression %
"preset": "light"|"med"|"heavy"|"nuke", // EQ only
"enabled": true | false, // global on/off
"autoStart": true | false // re-attach on navigation
}

IndexedDB Schema
Large binary assets (DeepFilterNet3 model, RNNoise WASM) are cached in IndexedDB database ripplewave-cache:
DB: "ripplewave-cache" version: 1
ObjectStore: "assets"
key: "deepfilter_v3_model" → ArrayBuffer (~2 MB)
key: "rnnoise_wasm" → ArrayBuffer (~150 KB, redundant backup)
key: "ort_wasm_simd"        → ArrayBuffer (~4 MB, ONNX runtime)

Live Settings Updates
Changes in the popup propagate to active tabs via chrome.storage.onChanged — no message passing required. The content script handles the delta and hot-swaps the filter graph within a single audio render quantum (≤ 128 samples).
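The hot-swap itself can be as small as reconnecting three edges of the graph; `swapEngine` is an illustrative name for that step:

```javascript
// Replace the active engine node between source and destination in place
function swapEngine(ctx, src, oldNode, newNode) {
  src.disconnect(oldNode)   // detach the previous engine's input edge
  oldNode.disconnect()      // and its output edge to the destination
  src.connect(newNode)
  newNode.connect(ctx.destination)
}
```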
Privacy & Security
Ripple Wave processes audio entirely inside your browser. Here is the complete data flow audit:
Audio data:
  ✓ Stays in-browser (Web Audio API, local only)
  ✗ Never sent to any server
  ✗ Never recorded or buffered beyond one audio frame

Model download (DeepFilterNet3 only):
  ✓ One-time fetch from a static CDN
  ✓ Cached in local IndexedDB after first download
  ✗ Only the model weights are downloaded, no audio data

chrome.storage.sync:
  ✓ Only stores engine preference + intensity slider
  ✗ No browsing history, no URLs, no audio

Permissions in manifest.json:
  "permissions": ["storage", "activeTab", "scripting"]
  "host_permissions": ["*://*.youtube.com/*"]
Open Source
Every line of code is public. Review the full source, the filter implementations, and the WASM build scripts on GitHub ↗.
Third-Party Components
RNNoise — BSD-2-Clause (Mozilla / Jean-Marc Valin) DeepFilterNet — MIT License (Hendrik Schröter et al.) onnxruntime-web — MIT License (Microsoft) All bundled as WASM — no external runtime calls