TECHNICAL REFERENCE

RIPPLE WAVE
v3 DOCS

Deep technical internals of the extension — audio pipeline, engine architectures, CSP bypass strategy, and storage schema.

OVERVIEW

How It Works

Ripple Wave v3 is a Chrome Manifest v3 extension that intercepts the page's <video> element (YouTube or Reddit) and routes its audio through a real-time processing chain before it reaches your speakers — without touching the network or any server.

YouTube / Reddit <video> element
    │
    ▼  MediaElementSource (Web Audio API)
┌───────────────────────────────────┐
│   Your chosen filter engine:      │
│   EQ Lite  /  RNNoise  /          │
│   DeepFilterNet3                  │
└───────────────────────────────────┘
    │
    ▼  AudioContext.destination
Your speakers / headphones

The key insight: the Web Audio API's MediaElementAudioSourceNode lets you tap the audio stream from any HTML media element. Once tapped, the stream can be processed by any combination of native AudioNodes or custom AudioWorkletProcessors before it plays out.


ARCHITECTURE

Extension Architecture

Built on Manifest v3, the extension is structured into four distinct layers that communicate via Chrome's messaging APIs and shared storage.

chrome-extension/
├── manifest.json          MV3 declaration
│
├── content_script.js      Injected into every YouTube / Reddit tab
│   ├── Hooks <video> element via MutationObserver
│   ├── Constructs AudioContext + chosen filter chain
│   └── Listens for settings changes via chrome.storage.onChanged
│
├── background.js          Service worker (event-driven, not persistent)
│   ├── Handles model download for DeepFilterNet3
│   ├── Routes large fetch() calls to bypass site CSP
│   └── Manages extension lifecycle events
│
├── popup/                 Extension popup UI
│   ├── popup.html + popup.js
│   ├── Engine selector, intensity slider, presets
│   └── Writes to chrome.storage.sync → triggers content_script
│
└── worklets/
    ├── rnnoise-worklet.js    AudioWorklet wrapping RNNoise WASM
    └── deepfilter-worklet.js AudioWorklet wrapping DeepFilterNet3

Content Script Injection

The content script runs at document_idle on all youtube.com/* and reddit.com/* URLs. It uses a MutationObserver to watch for client-side navigation (both YouTube and Reddit are SPAs — the DOM mutates rather than triggering full page loads). When a <video> element appears or changes, the hook re-attaches automatically.

// Simplified hook logic in content_script.js
const observer = new MutationObserver(() => {
  const video = document.querySelector('video')
  if (video && !video.__rwHooked) {
    attachFilterChain(video)
    video.__rwHooked = true
  }
})
observer.observe(document.body, { childList: true, subtree: true })

Settings Flow

When you change engine or intensity in the popup, chrome.storage.sync.set() is called. The content script listens to chrome.storage.onChanged and immediately swaps the active AudioNode graph — no page reload required.
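A minimal sketch of that listener, with the pure delta-merge separated out for clarity; `rebuildFilterChain` is a hypothetical helper name standing in for the graph-swap logic in content_script.js:

```javascript
// Pure helper: fold a chrome.storage.onChanged delta into the current settings
function applyDelta(settings, changes) {
  const next = { ...settings }
  for (const [key, { newValue }] of Object.entries(changes)) {
    next[key] = newValue
  }
  return next
}

// Browser wiring (guarded so the pure helper can be exercised outside Chrome)
if (typeof chrome !== 'undefined' && chrome.storage) {
  let settings = { engine: 'eq', intensity: 50, enabled: true }
  chrome.storage.onChanged.addListener((changes, area) => {
    if (area !== 'sync') return
    settings = applyDelta(settings, changes)
    rebuildFilterChain(settings)  // hypothetical: tears down and rebuilds the AudioNode graph
  })
}
```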


WEB AUDIO API

Web Audio Pipeline

All three engines share the same entry and exit points in the AudioContext graph. Only the middle processing nodes differ.

const ctx = new AudioContext({ sampleRate: 48000 })

// Source: tap the video element
const src = ctx.createMediaElementSource(videoElement)

// ──── Engine nodes go here ────
// (BiquadFilterNodes / AudioWorkletNode)

// Sink: play through speakers
processedNode.connect(ctx.destination)

Sample Rate

The context is created at 48000 Hz — matching YouTube's delivery format. Both RNNoise and DeepFilterNet3 expect 48 kHz input natively, avoiding any resampling overhead.

AudioWorklet vs ScriptProcessorNode

The ML engines use AudioWorkletNode (not the deprecated ScriptProcessorNode). Worklets run in a dedicated audio rendering thread, separate from the main JS thread, so UI interactions never cause audio glitches or dropouts.

// Registering the worklet module
await ctx.audioWorklet.addModule(
  chrome.runtime.getURL('worklets/rnnoise-worklet.js')
)
const workletNode = new AudioWorkletNode(ctx, 'rnnoise-processor')

ENGINE 01

EQ Lite Engine

LATENCY   ~0 ms
CPU       NEGLIGIBLE
DOWNLOAD  NONE
TYPE      EQ + COMP

The EQ engine uses a chain of BiquadFilterNodes, implemented natively by the browser in optimised C++ with effectively zero added latency. Keyboard clicks concentrate energy in the 1–6 kHz range as short transient spikes, which is exactly what parametric EQ can surgically remove.

Filter Chain

src
 ├─► BiquadFilter { type: 'peaking', frequency: 1200, gain: -Gdyn, Q: 2.5 }
 ├─► BiquadFilter { type: 'peaking', frequency: 2400, gain: -Gdyn, Q: 2.5 }
 ├─► BiquadFilter { type: 'peaking', frequency: 3800, gain: -Gdyn, Q: 3.0 }
 ├─► BiquadFilter { type: 'peaking', frequency: 5500, gain: -Gdyn, Q: 3.5 }
 └─► DynamicsCompressorNode { threshold: -24, knee: 8, ratio: 8, attack: 0.003 }
     └─► ctx.destination

Gdyn = intensity slider value mapped to [0 dB … 18 dB]

Presets

The four presets map intensity to calibrated gain and compressor settings:

LIGHT  → Gdyn = 6 dB,   ratio = 4:1,  threshold = -18 dBFS
MED    → Gdyn = 10 dB,  ratio = 6:1,  threshold = -22 dBFS
HEAVY  → Gdyn = 14 dB,  ratio = 8:1,  threshold = -26 dBFS
NUKE   → Gdyn = 18 dB,  ratio = 20:1, threshold = -32 dBFS
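The preset table above can be encoded as a plain lookup feeding a chain builder that mirrors the filter diagram. This is a sketch under those published parameters, not the extension's actual code; `buildEqChain` is a hypothetical name:

```javascript
// Preset table from the docs: Gdyn gain (dB cut), compressor ratio, threshold (dBFS)
const PRESETS = {
  light: { gain: 6,  ratio: 4,  threshold: -18 },
  med:   { gain: 10, ratio: 6,  threshold: -22 },
  heavy: { gain: 14, ratio: 8,  threshold: -26 },
  nuke:  { gain: 18, ratio: 20, threshold: -32 },
}

// Build the four peaking filters + compressor between src and the destination
function buildEqChain(ctx, src, preset) {
  const { gain, ratio, threshold } = PRESETS[preset]
  const bands = [
    { frequency: 1200, Q: 2.5 },
    { frequency: 2400, Q: 2.5 },
    { frequency: 3800, Q: 3.0 },
    { frequency: 5500, Q: 3.5 },
  ]
  let node = src
  for (const band of bands) {
    const biquad = ctx.createBiquadFilter()
    biquad.type = 'peaking'
    biquad.frequency.value = band.frequency
    biquad.Q.value = band.Q
    biquad.gain.value = -gain  // negative: cut the click band, don't boost it
    node.connect(biquad)
    node = biquad
  }
  const comp = ctx.createDynamicsCompressor()
  comp.threshold.value = threshold
  comp.knee.value = 8
  comp.ratio.value = ratio
  comp.attack.value = 0.003
  node.connect(comp)
  comp.connect(ctx.destination)
  return comp
}
```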

Trade-offs

Aggressive EQ can colour speech at the targeted frequencies. The compressor helps catch transients the EQ misses, but very short attacks (sub-5 ms) may clip musical content. LIGHT mode is recommended for music-heavy videos; NUKE for pure talking-head content.


ENGINE 02

RNNoise Engine

LATENCY     ~15 ms
CPU         LOW–MED
DOWNLOAD    BUNDLED
MODEL SIZE  150 KB

RNNoise is a recurrent neural network noise suppressor originally developed at Mozilla. It uses a Gated Recurrent Unit (GRU) architecture trained on a large corpus of speech + noise pairs. The WASM build (~150 KB) is bundled directly inside the extension — no download needed.

Architecture

Input frame: 480 samples @ 48 kHz = 10 ms window
    │
    ▼
Bark-scale feature extraction (22 bands)
    │
    ▼
3 × GRU layers (96 units each)
    │
    ▼
Gain curve per Bark band → applied via FFT/IFFT
    │
    ▼
Output frame: 480 samples (noise-suppressed)

AudioWorklet Integration

The worklet processor accumulates samples into 480-sample frames, passes them through the WASM module synchronously, and emits the processed frames to the output buffer. This introduces one frame of algorithmic latency (~10 ms) plus a small buffering delay (~5 ms), totalling ~15 ms end-to-end.

// Inside rnnoise-worklet.js (simplified)
class RNNoiseProcessor extends AudioWorkletProcessor {
  constructor() {
    super()
    this.buffer = []    // samples waiting to fill a 480-sample frame
    this.outQueue = []  // processed samples waiting to be emitted
  }

  process(inputs, outputs) {
    const input  = inputs[0][0]   // Float32Array, 128 samples
    const output = outputs[0][0]

    this.buffer.push(...input)
    while (this.buffer.length >= 480) {
      const frame = Float32Array.from(this.buffer.splice(0, 480))
      const clean = rnnoiseWasm.processFrame(frame)
      this.outQueue.push(...clean)
    }
    output.set(this.outQueue.splice(0, output.length))
    return true
  }
}
registerProcessor('rnnoise-processor', RNNoiseProcessor)

Frequency Coverage

Unlike the EQ engine, which only targets 1–6 kHz, RNNoise operates across the full audible band (0–24 kHz, modelled in the Bark domain). It suppresses keyboard clicks, fan hum, and broadband room noise simultaneously — treating them all as "not speech."


ENGINE 03

DeepFilterNet3 Engine

LATENCY   ~25 ms
CPU       MODERATE
DOWNLOAD  ~2 MB ONCE
QUALITY   STATE OF THE ART

DeepFilterNet3 is a full deep-learning speech enhancement model built on a dual-stage architecture: a Temporal Convolutional Network (TCN) for broad noise estimation, and an Enhancement GAN stage for waveform refinement. It is compiled to WASM via ONNX Runtime Web.

Model Architecture

Input: 20 ms frames @ 48 kHz (960 samples)
    │
    ▼
STFT → Complex spectrogram (481 bins × 2)
    │
    ▼
Encoder (5× depthwise conv blocks, dim=256)
    │
    ├─► TCN branch: coarse noise mask estimation
    │
    └─► GRU branch (512 units): temporal refinement
            │
            ▼
    Multiplicative mask application in frequency domain
            │
            ▼
    Overlap-add iSTFT → wideband clean audio

WASM / ONNX Runtime

The model is serialised as an ONNX graph and loaded via onnxruntime-web. The WASM backend runs multi-threaded inference using SharedArrayBuffer when available (this requires COOP/COEP response headers, which the extension injects via declarativeNetRequest rules — see CSP & Downloads below).
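A hedged sketch of the session loader. `ort.env.wasm.numThreads` and `ort.InferenceSession.create` are real onnxruntime-web entry points (the `ort` global comes from the bundled runtime script); the 4-thread cap and the `pickThreadCount` helper are illustrative choices, not the extension's actual logic:

```javascript
// Choose a WASM thread count: multi-threading needs SharedArrayBuffer,
// otherwise fall back to single-threaded inference
function pickThreadCount(hasSharedArrayBuffer, cores) {
  return hasSharedArrayBuffer ? Math.min(cores, 4) : 1  // cap of 4 is an illustrative choice
}

// Create the ONNX Runtime session from the cached model bytes
async function loadOrtSession(modelBuffer) {
  ort.env.wasm.numThreads = pickThreadCount(
    typeof SharedArrayBuffer !== 'undefined',
    navigator.hardwareConcurrency || 1
  )
  return ort.InferenceSession.create(modelBuffer, {
    executionProviders: ['wasm'],
  })
}
```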

Processing Latency Breakdown

Frame size:          20 ms    (960 samples @ 48 kHz)
Model inference:     ~8 ms    (on modern hardware)
Buffering overhead:  ~5 ms
STFT/iSTFT:          ~2 ms
─────────────────────────────
Total:               ~25 ms   (imperceptible on videos)

IndexedDB Caching

The ~2 MB model blob is downloaded once and stored in IndexedDB under the key deepfilter_v3_model. On subsequent activations the background service worker serves it from cache, making activation near-instant even on slow connections.
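A sketch of the cache-or-fetch path, assuming the DB/store/key names from the storage schema (`openCache` and `getCachedModel` are hypothetical helper names):

```javascript
const DB_NAME = 'ripplewave-cache'
const STORE = 'assets'
const MODEL_KEY = 'deepfilter_v3_model'

// Open (and on first run, create) the IndexedDB cache
function openCache() {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open(DB_NAME, 1)
    req.onupgradeneeded = () => req.result.createObjectStore(STORE)
    req.onsuccess = () => resolve(req.result)
    req.onerror = () => reject(req.error)
  })
}

// Return the cached model, or download it once via the supplied fetcher
async function getCachedModel(fetchModel) {
  const db = await openCache()
  const cached = await new Promise((resolve, reject) => {
    const req = db.transaction(STORE).objectStore(STORE).get(MODEL_KEY)
    req.onsuccess = () => resolve(req.result)
    req.onerror = () => reject(req.error)
  })
  if (cached) return cached        // ArrayBuffer from a previous activation
  const buf = await fetchModel()   // e.g. via the service-worker proxy
  db.transaction(STORE, 'readwrite').objectStore(STORE).put(buf, MODEL_KEY)
  return buf
}
```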


CSP & DOWNLOADS

CSP Bypass & Model Downloads

YouTube and Reddit enforce Content Security Policies that block extension scripts from making arbitrary fetch() calls to external origins, so downloading the DeepFilterNet3 model directly from the content script would be blocked.

Service Worker Proxy

The MV3 background service worker is not subject to the page's CSP. The content script requests the model via Chrome's messaging API; the service worker performs the actual fetch and transfers the model bytes back in its response:

// content_script.js
chrome.runtime.sendMessage({ type: 'FETCH_MODEL' }, (response) => {
  // Rebuild the ArrayBuffer from the serialisable byte array
  const modelBuffer = Uint8Array.from(response.bytes).buffer
  loadOrtSession(modelBuffer)
})

// background.js (service worker — not subject to the page's CSP)
chrome.runtime.onMessage.addListener((msg, _sender, sendResponse) => {
  if (msg.type === 'FETCH_MODEL') {
    fetch('https://cdn.example.com/deepfilter-v3.ort')
      .then(r => r.arrayBuffer())
      // Extension messages are JSON-serialised, so a raw ArrayBuffer would
      // arrive as {} — send a plain array (or base64) instead
      .then(buf => sendResponse({ bytes: Array.from(new Uint8Array(buf)) }))
    return true  // keep the message channel open for the async response
  }
})

declarativeNetRequest

MV3 removes access to webRequestBlocking. Header modifications (required for SharedArrayBuffer — COOP/COEP) are instead declared statically via declarativeNetRequest rules in manifest.json, which Chrome applies before the page sees the response.
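The COOP/COEP header rules have roughly this shape. Static rules live in a JSON file referenced from manifest.json's declarative_net_request.rule_resources; the same objects work with the real chrome.declarativeNetRequest.updateDynamicRules API, shown here as a JS literal for readability. The rule id, priority, and urlFilter are illustrative:

```javascript
// One modifyHeaders rule setting the COOP/COEP pair SharedArrayBuffer requires
const coopCoepRules = [
  {
    id: 1,
    priority: 1,
    action: {
      type: 'modifyHeaders',
      responseHeaders: [
        { header: 'Cross-Origin-Opener-Policy',   operation: 'set', value: 'same-origin' },
        { header: 'Cross-Origin-Embedder-Policy', operation: 'set', value: 'require-corp' },
      ],
    },
    condition: {
      urlFilter: '||youtube.com',
      resourceTypes: ['main_frame', 'sub_frame'],
    },
  },
]

// Dynamic-rule registration (browser only; static manifest rules need no code)
if (typeof chrome !== 'undefined' && chrome.declarativeNetRequest) {
  chrome.declarativeNetRequest.updateDynamicRules({
    removeRuleIds: coopCoepRules.map(r => r.id),
    addRules: coopCoepRules,
  })
}
```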


STORAGE

Configuration & Storage

chrome.storage.sync Schema

User settings are persisted via chrome.storage.sync (synced across the user's Chrome profile):

{
  "engine":    "eq" | "rnn" | "deep",              // active engine
  "intensity": 0–100,                              // suppression %
  "preset":    "light" | "med" | "heavy" | "nuke", // EQ only
  "enabled":   true | false,                       // global on/off
  "autoStart": true | false                        // re-attach on navigation
}
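Reading the schema back with sane fallbacks is a one-line merge; `withDefaults` and the DEFAULTS values below are illustrative, not the extension's actual defaults:

```javascript
// Hypothetical defaults matching the schema's shape
const DEFAULTS = {
  engine: 'eq',
  intensity: 50,
  preset: 'med',
  enabled: true,
  autoStart: true,
}

// Merge whatever chrome.storage.sync returns over the defaults
function withDefaults(stored) {
  return { ...DEFAULTS, ...stored }
}

// Browser usage: chrome.storage.sync.get(null, s => init(withDefaults(s)))
```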

IndexedDB Schema

Large binary assets (DeepFilterNet3 model, RNNoise WASM) are cached in IndexedDB database ripplewave-cache:

DB: "ripplewave-cache"  version: 1
  ObjectStore: "assets"
    key: "deepfilter_v3_model"  → ArrayBuffer (~2 MB)
    key: "rnnoise_wasm"         → ArrayBuffer (~150 KB, redundant backup)
    key: "ort_wasm_simd"        → ArrayBuffer (~4 MB, ONNX runtime)

Live Settings Updates

Changes in the popup propagate to active tabs via chrome.storage.onChanged — no explicit message passing required. The content script applies the delta and hot-swaps the filter graph within a single audio render quantum (128 samples, under 3 ms at 48 kHz).


PRIVACY & SECURITY

Privacy & Security

Ripple Wave processes audio entirely inside your browser. Here is the complete data flow audit:

Audio data:
  ✓ Stays in-browser (Web Audio API, local only)
  ✗ Never sent to any server
  ✗ Never recorded or buffered beyond one audio frame

Model download (DeepFilterNet3 only):
  ✓ One-time fetch from a static CDN
  ✓ Cached in local IndexedDB after first download
  ✗ Only the model weights are downloaded, no audio data

chrome.storage.sync:
  ✓ Only stores engine preference + intensity slider
  ✗ No browsing history, no URLs, no audio

Permissions in manifest.json:
  "permissions": ["storage", "activeTab", "scripting"]
  "host_permissions": ["*://*.youtube.com/*", "*://*.reddit.com/*"]

Open Source

Every line of code is public. Review the full source, the filter implementations, and the WASM build scripts on GitHub.

Third-Party Components

RNNoise      — BSD-2-Clause (Mozilla / Jean-Marc Valin)
DeepFilterNet — MIT License  (Hendrik Schröter et al.)
onnxruntime-web — MIT License (Microsoft)
All bundled as WASM — no external runtime calls
