This document describes the architecture of the web implementation of the audio recording system in expo-audio-studio.
The web implementation of expo-audio-studio provides a complete audio recording solution that works in web browsers. It supports:
- Recording compressed and uncompressed audio
- Real-time audio analysis
- Streaming audio data to clients
- Various audio formats and quality settings
The web implementation consists of several key components:
- AudioStudioWeb: Main entry point that provides the public API
- WebRecorder: Core recording implementation for web platform
- AudioWorklet: Low-level audio processing in a dedicated thread
- FeatureExtractor: Audio analysis for visualization and features
┌────────────────┐ ┌─────────────────┐ ┌────────────────────┐
│ │ │ │ │ │
│ Browser │ │ WebRecorder │ │ AudioStudio │
│ Audio API ├──────►│ (processing) ├──────►│ (events/API) │
│ │ │ │ │ │
└────────────────┘ └─────────────────┘ └────────────────────┘
│ │ │
│ │ │
▼ ▼ ▼
┌────────────────┐ ┌─────────────────┐ ┌────────────────────┐
│ │ │ │ │ │
│ AudioWorklet │ │ MediaRecorder │ │ Client App │
│ (PCM data) │ │ (Compression) │ │ (Consuming API) │
│ │ │ │ │ │
└────────────────┘ └─────────────────┘ └────────────────────┘
- src/AudioStudio.web.ts: Main module implementation
- src/WebRecorder.web.ts: Core recording implementation
- src/workers/inlineAudioWebWorker.web.tsx: AudioWorklet processor
- src/workers/InlineFeaturesExtractor.web.ts: Audio analysis worker
- src/hooks/useAudioRecorder.tsx: React hook for easy API consumption
This is the main class that implements the Expo module interface for web. It:
- Provides the public API for recording operations
- Manages the recording lifecycle
- Emits events to the client application
- Handles compressed and uncompressed audio data
Key methods:
- `startRecording()`: Begins a recording session
- `stopRecording()`: Ends recording and finalizes audio
- `pauseRecording()`: Temporarily suspends recording
- `resumeRecording()`: Continues a paused recording
- `emitAudioEvent()`: Sends audio data to clients
This class handles the actual recording implementation:
- Creates and manages the AudioContext and audio worklet
- Processes audio data from the microphone
- Implements compression via MediaRecorder API
- Handles device switching and interruptions
- Creates WAV files from PCM data
Key methods:
- `init()`: Sets up the audio worklet
- `start()`: Begins audio capture
- `stop()`: Stops recording and finalizes audio
- `createWavFromPcmData()`: Converts PCM to WAV format
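As a rough sketch of what `createWavFromPcmData()` does (a standalone, illustrative version, not the library's actual code), converting 32-bit float samples to a 16-bit mono WAV file looks like this:

```typescript
// Sketch: convert 32-bit float PCM samples to a 16-bit PCM WAV file.
// Illustrates the standard 44-byte RIFF/WAVE header; the real logic lives
// in WebRecorder.createWavFromPcmData() and may differ in detail.
function float32ToWav(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeString = (offset: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };

  writeString(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);                // RIFF chunk size
  writeString(8, "WAVE");
  writeString(12, "fmt ");
  view.setUint32(16, 16, true);                          // fmt chunk size
  view.setUint16(20, 1, true);                           // format: PCM
  view.setUint16(22, 1, true);                           // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);                          // bits per sample
  writeString(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each float sample to [-1, 1] and scale to signed 16-bit.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

Because the in-memory path is 32-bit float, this conversion is lossy by one step (float to int16), which is why analysis runs on the float data before storage.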
This component runs in a separate thread to process audio efficiently:
- Receives raw audio samples from the microphone
- Processes audio in real-time
- Handles sample rate conversion and bit depth
- Emits chunks of audio data at regular intervals
The AudioWorklet is implemented as an inline script that's injected at runtime to avoid CORS issues.
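The sample rate conversion step can be sketched as a linear-interpolation resampler. This standalone function is illustrative only; the actual processor in `inlineAudioWebWorker.web.tsx` runs inside the AudioWorklet thread:

```typescript
// Sketch: linear-interpolation resampling, as a worklet might apply when the
// hardware rate (e.g. 48 kHz) differs from the requested recording rate.
// Illustrative; not the library's actual implementation.
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number
): Float32Array {
  if (fromRate === toRate) return input.slice();
  const ratio = fromRate / toRate;
  const outLength = Math.floor(input.length / ratio);
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;
    const idx = Math.floor(pos);
    const frac = pos - idx;
    const next = Math.min(idx + 1, input.length - 1);
    // Interpolate between the two nearest input samples.
    output[i] = input[idx] * (1 - frac) + input[next] * frac;
  }
  return output;
}
```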
This worker extracts audio features for visualization:
- Analyzes audio for amplitude, frequency, etc.
- Generates data for waveform visualization
- Runs in a separate thread for performance
- Capture: Browser's MediaDevices API captures audio from the microphone
- Processing: AudioWorklet processes raw PCM data in real-time
- Compression (optional): MediaRecorder API compresses audio chunks
- Analysis: FeatureExtractor analyzes audio for visualization
- Event Emission: AudioStudioWeb emits events with audio data
- Storage: Audio data is stored in memory and/or as files
The system supports two parallel audio paths:

1. Uncompressed PCM:
   - Format: 32-bit float PCM (in memory), 16-bit WAV (for storage)
   - Source: AudioWorklet
   - Use: High-quality processing, visualization, analysis

2. Compressed Audio:
   - Format: Opus (WebM container) or AAC
   - Source: MediaRecorder API
   - Use: Efficient storage, network transmission
The system provides several ways to access audio data:

1. Event-based access (during recording):
   - `onAudioStream` callback in the recording configuration
   - Events contain both PCM data and optional compressed chunks
   - Real-time access as chunks are generated

2. File-based access (after recording):
   - `stopRecording()` returns a result with file URIs
   - Access to both compressed (`compression.compressedFileUri`) and uncompressed (`fileUri`) formats
   - Complete audio files suitable for playback or storage

3. Raw data access:
   - PCM chunks as Float32Array
   - Compressed chunks as Blob objects
| Aspect | Uncompressed (PCM) | Compressed (Opus/AAC) |
|---|---|---|
| Source | AudioWorklet processor | MediaRecorder API |
| Chunk Size | Small (typically ~500ms intervals) | Larger (1-2s intervals) |
| Frequency | High frequency chunks | Lower frequency chunks |
| Intermediate | Often includes micro-chunks (64 samples) | No intermediate chunks |
| Memory Usage | Higher (32-bit floats) | Lower (compressed data) |
| Latency | Lower latency | Higher latency due to compression |
| Quality | Lossless | Lossy compression |
The system employs several mechanisms to synchronize the two audio paths:

1. Position Tracking:
   - Each audio chunk includes a `position` value (in seconds)
   - Timestamps allow aligning chunks between compressed and uncompressed paths
   - WebRecorder maintains a global position counter

2. Data Association:
   - `pendingCompressedChunk` mechanism in WebRecorder associates each compressed chunk with corresponding PCM data
   - Emits both in a single event when both are available

3. Shared Start/Stop Control:
   - Both MediaRecorder and AudioWorklet are initialized together
   - Start/pause/resume/stop operations affect both simultaneously
   - Ensures timeline alignment between formats
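The data-association step can be sketched as a small pairing buffer. This is illustrative of the `pendingCompressedChunk` idea, not the library's exact code; the type names and position tolerance are assumptions:

```typescript
// Sketch of a pendingCompressedChunk-style pairing buffer: the latest
// compressed chunk waits until PCM data near the same position arrives,
// then both are emitted together in a single event payload.
// Names and the 1-second tolerance are illustrative assumptions.
interface PcmChunk { position: number; data: Float32Array }
interface CompressedChunk { position: number; data: Uint8Array }

class ChunkPairer {
  private pending: CompressedChunk | null = null;
  constructor(private toleranceSec: number = 1.0) {}

  onCompressed(chunk: CompressedChunk): void {
    this.pending = chunk; // newest compressed chunk waits for matching PCM
  }

  onPcm(chunk: PcmChunk): { pcm: PcmChunk; compressed: CompressedChunk | null } {
    let compressed: CompressedChunk | null = null;
    if (
      this.pending &&
      Math.abs(this.pending.position - chunk.position) <= this.toleranceSec
    ) {
      compressed = this.pending;
      this.pending = null; // emit each compressed chunk at most once
    }
    return { pcm: chunk, compressed };
  }
}
```

Pairing by position rather than arrival order matters because MediaRecorder delivers chunks on its own schedule, independent of the worklet's emission interval.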
To ensure both audio paths start and stop at exactly the same time:

1. Initialization:

   ```typescript
   // Both paths initialized simultaneously in WebRecorder.init()
   this.audioWorkletNode = new AudioWorkletNode(audioContext, 'recorder-processor');
   this.compressedMediaRecorder = new MediaRecorder(source.mediaStream);
   ```

2. Starting:

   ```typescript
   // Both paths started in sequence in WebRecorder.start()
   this.source.connect(this.audioWorkletNode);
   this.audioWorkletNode.connect(this.audioContext.destination);
   this.compressedMediaRecorder.start(interval);
   ```

3. Stopping:

   ```typescript
   // Stop compressed recording first
   this.compressedMediaRecorder.stop();
   // Allow time for final compressed chunks
   await new Promise(resolve => setTimeout(resolve, 100));
   // Final cleanup and disconnection
   this.cleanup();
   ```
There are several challenges in perfect synchronization:
- Timing differences: MediaRecorder and AudioWorklet operate at different timescales
- Browser variations: Different browsers implement MediaRecorder with different timing
- Compression latency: Encoding introduces variable delay
To address these challenges:
- Time alignment: Both sources use the same timeline base
- Chunk correlation: Compressed chunks are paired with PCM data by position
- Final processing delay: Stop sequence includes a delay to ensure all data is processed
- Metadata preservation: Position data is preserved through the entire pipeline
When working with both audio paths:
-
Choosing the right format:
- Use compressed for efficient storage and transmission
- Use uncompressed for highest quality processing and visualization
-
Handling chunk misalignment:
- Sort chunks by position before concatenation or processing
- Use the position value for alignment rather than arrival order
-
End-of-recording handling:
- Wait for both paths to complete before finalizing
- Check for trailing micro-chunks in the uncompressed path
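The chunk-alignment advice above can be sketched as a helper that sorts chunks by their `position` metadata before joining them (the chunk shape is illustrative):

```typescript
// Sketch: reorder PCM chunks by their position metadata before joining them,
// so late-arriving chunks do not corrupt the reconstructed timeline.
function concatByPosition(
  chunks: { position: number; data: Float32Array }[]
): Float32Array {
  const sorted = [...chunks].sort((a, b) => a.position - b.position);
  const total = sorted.reduce((n, c) => n + c.data.length, 0);
  const out = new Float32Array(total);
  let offset = 0;
  for (const c of sorted) {
    out.set(c.data, offset);
    offset += c.data.length;
  }
  return out;
}
```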
Audio data is streamed to clients via events:
- `AudioData`: Contains PCM data and optional compressed chunks
- `AudioAnalysis`: Contains visualization and analysis data
Events include metadata such as:
- Position (timestamp)
- File information
- Size and format details
- Compression information
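A sketch of what such an event payload might look like (field names here are illustrative assumptions, not the library's exported types; `fileUri` and `compression.compressedFileUri` appear in the real result API):

```typescript
// Illustrative shape of a streamed audio event payload.
// Field names are assumptions for the sketch, not the exact exported types.
interface AudioStreamEventSketch {
  position: number;          // seconds from the start of the recording
  fileUri?: string;          // uncompressed WAV being written
  totalSize: number;         // bytes accumulated so far
  mimeType: string;          // e.g. "audio/wav"
  compression?: {
    mimeType: string;        // e.g. "audio/webm;codecs=opus"
    bitrate: number;         // bits per second
    totalSize: number;       // compressed bytes accumulated so far
  };
}

const example: AudioStreamEventSketch = {
  position: 1.5,
  totalSize: 132300,
  mimeType: "audio/wav",
  compression: {
    mimeType: "audio/webm;codecs=opus",
    bitrate: 64000,
    totalSize: 12000,
  },
};
```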
The system includes mechanisms to prevent duplicate audio chunks:
- Position-based detection in WebRecorder (removed to prevent data loss)
- Time-based throttling in AudioWorklet
- Synchronization between compressed and uncompressed paths
Note: Position-based duplicate detection was removed because it could incorrectly filter out legitimate audio chunks, causing significant data loss (up to 48%) in reconstructed audio from onAudioStream events.
The system handles device disconnections and other interruptions:
- Detecting disconnection events
- Optional fallback to other devices
- Graceful pause/resume functionality
Compression is configurable with:
- Format selection (Opus, AAC)
- Bitrate control
- Enable/disable options
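An illustrative configuration sketch covering those three options (the option names are assumptions; consult the library's recording configuration types for the real ones):

```typescript
// Illustrative compression options; exact option names in the library's
// recording configuration may differ.
interface CompressionOptionsSketch {
  enabled: boolean;          // enable/disable the compressed path
  format: "opus" | "aac";    // container/codec selection
  bitrate: number;           // target bitrate, bits per second
}

const compression: CompressionOptionsSketch = {
  enabled: true,
  format: "opus",
  bitrate: 64000,
};
```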
The useAudioRecorder hook provides a React-friendly interface:
```typescript
const {
  isRecording,
  isPaused,
  startRecording,
  stopRecording,
  pauseRecording,
  resumeRecording,
  // Additional state and controls...
} = useAudioRecorder();
```

- Audio processing is done in a separate thread via AudioWorklet
- Feature extraction runs in a Web Worker
- Memory usage is managed by limiting buffer sizes
- Compressed chunks reduce storage requirements
The implementation supports modern browsers with Web Audio API support:
- Chrome/Edge (best support)
- Firefox (good support)
- Safari (limited support in some versions)
The code includes fallbacks for browser differences and feature detection.
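Feature detection for the compressed path can be sketched as a preference-list lookup. The support predicate is injected here so the logic is testable anywhere; in a browser it would be `MediaRecorder.isTypeSupported`. The preference list is an illustrative assumption:

```typescript
// Sketch: pick the first MIME type the current browser can record,
// falling back through a preference list. Returning null means the
// compressed path is unavailable; the PCM path still works.
function pickRecordingMimeType(
  isTypeSupported: (mime: string) => boolean,
  preferences: string[] = [
    "audio/webm;codecs=opus", // Chrome, Edge, Firefox
    "audio/webm",
    "audio/mp4",              // Safari: AAC in MP4
  ]
): string | null {
  for (const mime of preferences) {
    if (isTypeSupported(mime)) return mime;
  }
  return null;
}
```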