Web audio --WebSocket--> FastAPI Server.
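As a rough sketch of the browser side of this pipeline, the snippet below converts Web Audio Float32 samples to 16-bit PCM and sends them over a WebSocket as binary frames. The function names, the 16-bit PCM wire format, and the example URL (including the /ws path) are assumptions for illustration; the server may expect a different format.

```javascript
// Convert Web Audio samples (floats in [-1, 1]) to 16-bit signed PCM.
function floatTo16BitPCM(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp out-of-range values
    pcm[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  }
  return pcm;
}

// Send one chunk as a binary WebSocket frame, if the socket is open.
// `socket` is a WebSocket (or any object with readyState and send()).
function sendAudioChunk(socket, samples) {
  const OPEN = 1; // numeric value of WebSocket.OPEN
  if (socket.readyState !== OPEN) return false;
  socket.send(floatTo16BitPCM(samples).buffer);
  return true;
}

// Hypothetical usage in the browser (URL and path are assumptions):
// const socket = new WebSocket('wss://localhost:8000/ws');
// socket.binaryType = 'arraybuffer';
// processor.onaudioprocess = (e) =>
//   sendAudioChunk(socket, e.inputBuffer.getChannelData(0));
```

Passing the socket in as a parameter keeps the conversion and sending logic independent of the capture code, so either ScriptProcessorNode or AudioWorklet can feed it.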
getUserMedia only works in a secure context, so serve the page over HTTPS when it is accessed from another host (localhost is exempt):
uvicorn src.main:app --host=0.0.0.0 --reload --ssl-keyfile=./key.pem --ssl-certfile=./cert.pem
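On the client, a quick check can tell whether capture is possible before calling getUserMedia. This is a minimal sketch; the helper name canCaptureAudio is made up for illustration, and it relies on the standard isSecureContext and navigator.mediaDevices properties.

```javascript
// Returns true when microphone capture can be attempted in this environment.
// `win` is the global window object (passed in so the check is easy to test).
function canCaptureAudio(win) {
  const devices = win.navigator && win.navigator.mediaDevices;
  return Boolean(
    win.isSecureContext && devices && typeof devices.getUserMedia === 'function'
  );
}

// Hypothetical usage in the browser:
// if (!canCaptureAudio(window)) {
//   console.warn('getUserMedia unavailable: open the page over HTTPS or on localhost.');
// }
```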
deprecated.
uvicorn src.main:app --reload
The API is based on the manipulation of a MediaStream object representing a flux of audio- or video-related data. See an example in Get the video.
A MediaStream consists of zero or more MediaStreamTrack objects, representing various audio or video tracks. Each MediaStreamTrack may have one or more channels. The channel represents the smallest unit of a media stream, such as an audio signal associated with a given speaker, like left or right in a stereo audio track.
MediaStream objects have a single input and a single output. A MediaStream object generated by getUserMedia() is called local, and has as its source input one of the user's cameras or microphones. A non-local MediaStream may represent a media element, like <video> or <audio>, a stream originating over the network and obtained via the WebRTC RTCPeerConnection API, or a stream created using the Web Audio API MediaStreamAudioSourceNode.
The output of the MediaStream object is linked to a consumer. It can be a media element, like <audio> or <video>, the WebRTC RTCPeerConnection API, or a Web Audio API MediaStreamAudioSourceNode.
https://developer.mozilla.org/en-US/docs/Web/API/Media_Streams_API
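To make the stream/track relationship concrete, the sketch below lists the tracks of a MediaStream. The helper name describeTracks is hypothetical; getTracks(), kind, label, and enabled are standard MediaStream and MediaStreamTrack members.

```javascript
// Summarize the tracks of a MediaStream as plain objects.
function describeTracks(stream) {
  return stream.getTracks().map((track) => ({
    kind: track.kind, // 'audio' or 'video'
    label: track.label, // human-readable device name
    enabled: track.enabled, // false when the track is muted by the page
  }));
}

// Hypothetical usage in the browser:
// navigator.mediaDevices.getUserMedia({ audio: true })
//   .then((stream) => console.table(describeTracks(stream)));
```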
- navigator.mediaDevices.getUserMedia: reads the microphone stream.
- context.createScriptProcessor: processes the audio buffer, though it is deprecated.
const handleSuccess = function (stream) {
  const context = new AudioContext();
  const source = context.createMediaStreamSource(stream);
  const processor = context.createScriptProcessor(1024, 1, 1);
  source.connect(processor);
  processor.connect(context.destination);
  processor.onaudioprocess = function (e) {
    // Do something with the data, e.g. convert it to WAV
    console.log(e.inputBuffer);
  };
};
navigator.mediaDevices.getUserMedia({ audio: true, video: false })
  .then(handleSuccess);
Because the getUserMedia sampleRate constraint does not take effect, resample with one of the following methods:
// `sourceAudioBuffer` is an AudioBuffer instance of the source audio
// at the original sample rate.
const DESIRED_SAMPLE_RATE = 16000;
const offlineCtx = new OfflineAudioContext(
  sourceAudioBuffer.numberOfChannels,
  Math.ceil(sourceAudioBuffer.duration * DESIRED_SAMPLE_RATE), // length in frames must be an integer
  DESIRED_SAMPLE_RATE
);
const cloneBuffer = offlineCtx.createBuffer(sourceAudioBuffer.numberOfChannels, sourceAudioBuffer.length, sourceAudioBuffer.sampleRate);
// Copy the source data into the offline AudioBuffer
for (let channel = 0; channel < sourceAudioBuffer.numberOfChannels; channel++) {
  cloneBuffer.copyToChannel(sourceAudioBuffer.getChannelData(channel), channel);
}
// Play it from the beginning.
const source = offlineCtx.createBufferSource();
source.buffer = cloneBuffer;
source.connect(offlineCtx.destination);
offlineCtx.oncomplete = function (e) {
  // `resampledAudioBuffer` contains an AudioBuffer resampled at 16000 Hz.
  // Use resampledAudioBuffer.getChannelData(x) to get a Float32Array for channel x.
  const resampledAudioBuffer = e.renderedBuffer;
  console.log(resampledAudioBuffer);
};
source.start(0);
offlineCtx.startRendering();
https://stackoverflow.com/a/55427982/974526
navigator.mediaDevices.getUserMedia({ audio: true })
  .then((stream) => {
    const context = new AudioContext();
    const bufSize = 4096;
    const microphone = context.createMediaStreamSource(stream);
    const processor = context.createScriptProcessor(bufSize, 1, 1);
    // Resampler comes from the library linked below
    const res = new Resampler(context.sampleRate, 16000, 1, bufSize);
    const bufferArray = [];
    // onaudioprocess only fires while the node is connected
    microphone.connect(processor);
    processor.connect(context.destination);
    processor.onaudioprocess = (event) => {
      console.log('onaudioprocess');
      // const right = event.inputBuffer.getChannelData(1);
      const outBuf = res.resample(event.inputBuffer.getChannelData(0));
      bufferArray.push.apply(bufferArray, outBuf);
    };
  });
https://github.com/felix307253927/resampler
Although the following MediaTrackConstraints (mediaStreamConstraints) are passed to navigator.mediaDevices.getUserMedia, the stream still has a sample rate of 48000 Hz, because the Chrome browser I use only supports a sample rate of 48000 Hz.
const mediaStreamConstraints = {
  audio: {
    channelCount: 1,
    sampleRate: 16000,
    sampleSize: 16
  }
};
// set constraints at the beginning
navigator.mediaDevices.getUserMedia(mediaStreamConstraints)
  .then((stream) => {
    const track = stream.getAudioTracks()[0];
    // the audio track constraints can also be updated here
    return track.applyConstraints(mediaStreamConstraints.audio)
      .then(() => {
        console.log(track.getCapabilities());
        // audio recorded as Blob
        // and the binary data are sent via socketio to a nodejs server
        // that stores the blob as a file (e.g. audio/inp/audiofile.webm)
      });
  })
  .catch((err) => serverlog(`ERROR mediaDevices.getUserMedia: ${err}`));
To check which capabilities the audio track actually supports:
let stream = await navigator.mediaDevices.getUserMedia({audio: true});
let track = stream.getAudioTracks()[0];
console.log(track.getCapabilities());
output:
{autoGainControl: Array(2), channelCount: {…}, deviceId: "default", echoCancellation: Array(2), groupId: "1e76386ad54f9ad3548f6f6c14c08e7eff6753f9362d93d8620cc48f546604f5", …}
autoGainControl: (2) [true, false]
channelCount: {max: 2, min: 1}
deviceId: "default"
echoCancellation: (2) [true, false]
groupId: "1e76386ad54f9ad3548f6f6c14c08e7eff6753f9362d93d8620cc48f546604f5"
latency: {max: 0.01, min: 0.01}
noiseSuppression: (2) [true, false]
sampleRate: {max: 48000, min: 48000}
sampleSize: {max: 16, min: 16}
__proto__: Object
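Given a capabilities object like the one above, a small helper can decide whether a requested sample rate is natively supported or whether resampling is needed. This is a sketch; the helper name supportsSampleRate is made up, and it assumes sampleRate is reported as a {min, max} range as in the output shown.

```javascript
// Checks whether `rate` lies inside the sampleRate range reported by
// track.getCapabilities(). Returns null when no range is reported.
function supportsSampleRate(capabilities, rate) {
  const range = capabilities.sampleRate;
  if (!range || range.min === undefined || range.max === undefined) return null;
  return rate >= range.min && rate <= range.max;
}

// Hypothetical usage in the browser:
// const caps = track.getCapabilities();
// if (supportsSampleRate(caps, 16000) !== true) {
//   // fall back to resampling in JavaScript
// }
```

With the capabilities printed above (min and max both 48000), a request for 16000 Hz is reported as unsupported, which matches the observed behaviour.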
https://developer.mozilla.org/en-US/docs/Web/API/Media_Streams_API/Constraints
The legacy ScriptProcessorNode was asynchronous and required thread hops (between UI thread and user thread), which could produce an unstable audio output. The AudioWorklet object provides a new synchronous JavaScript execution context which allows developers to programmatically control audio without additional latency and with higher stability in the output audio.
You can see example code in action along with other examples at Google Chrome Labs.
https://blog.chromium.org/2018/03/chrome-66-beta-css-typed-object-model.html
Safari did not support AudioWorklet when this was written; support was added in Safari 14.1.
https://developer.mozilla.org/en-US/docs/Web/API/AudioWorklet
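A minimal sketch of an AudioWorklet-based capture module follows. The file name capture-processor.js, the processor name, the ChunkCollector helper, and the 4096-sample chunk size are all assumptions for illustration. Since process() typically receives 128 frames per call, the helper groups them into larger chunks before posting them to the main thread.

```javascript
// capture-processor.js (hypothetical file name)

// Collects small sample blocks into fixed-size chunks.
class ChunkCollector {
  constructor(chunkSize) {
    this.chunk = new Float32Array(chunkSize);
    this.offset = 0;
  }

  // Appends samples; returns an array of completed chunks (possibly empty).
  push(samples) {
    const completed = [];
    let read = 0;
    while (read < samples.length) {
      const n = Math.min(samples.length - read, this.chunk.length - this.offset);
      this.chunk.set(samples.subarray(read, read + n), this.offset);
      this.offset += n;
      read += n;
      if (this.offset === this.chunk.length) {
        completed.push(this.chunk);
        this.chunk = new Float32Array(this.chunk.length); // start a fresh chunk
        this.offset = 0;
      }
    }
    return completed;
  }
}

// AudioWorkletProcessor only exists inside an AudioWorkletGlobalScope.
if (typeof AudioWorkletProcessor !== 'undefined') {
  class CaptureProcessor extends AudioWorkletProcessor {
    constructor() {
      super();
      this.collector = new ChunkCollector(4096);
    }

    process(inputs) {
      const channel = inputs[0][0]; // first input, first channel
      if (channel) {
        for (const chunk of this.collector.push(channel)) {
          this.port.postMessage(chunk, [chunk.buffer]); // transfer, no copy
        }
      }
      return true; // keep the processor alive
    }
  }
  registerProcessor('capture-processor', CaptureProcessor);
}

// Hypothetical main-thread usage:
// await context.audioWorklet.addModule('capture-processor.js');
// const node = new AudioWorkletNode(context, 'capture-processor');
// node.port.onmessage = (e) => { /* e.data is a Float32Array chunk */ };
// source.connect(node).connect(context.destination);
```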
The Web Audio API provides a powerful and versatile system for controlling audio on the Web, allowing developers to choose audio sources, add effects to audio, create audio visualizations, apply spatial effects (such as panning) and much more.
A brief history of browser audio:
Flash audio playback -> <audio> element -> Web Audio API (processing off the main thread)
https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API
- Use the native Web Audio API.
- Use recorder.js, although it is no longer actively maintained. (It cannot stream buffers; the audio is only available after recording stops.)
- Use RecordRTC.js, which is actively maintained and supports almost all browsers. (It cannot stream buffers; the audio is only available after recording stops.)
Audio glitches are caused by an interruption of the normal continuous audio stream, resulting in loud clicks and pops. It is considered to be a catastrophic failure of a multi-media system and MUST be avoided. It can be caused by problems with the threads responsible for delivering the audio stream to the hardware, such as scheduling latencies caused by threads not having the proper priority and time-constraints. It can also be caused by the audio DSP trying to do more work than is possible in real-time given the CPU’s speed.
The ScriptProcessorNode is constructed with a bufferSize which MUST be one of the following values: 256, 512, 1024, 2048, 4096, 8192, 16384. This value controls how frequently the onaudioprocess event is dispatched and how many sample-frames need to be processed each call. onaudioprocess events are only dispatched if the ScriptProcessorNode has at least one input or one output connected. Lower numbers for bufferSize will result in a lower (better) latency. Higher numbers will be necessary to avoid audio breakup and glitches.
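The trade-off can be put in numbers: one buffer spans bufferSize / sampleRate seconds, which is also the interval between onaudioprocess events. The sketch below computes that span; the function name is made up, and the result covers only the buffering delay, not any additional device or network latency.

```javascript
const VALID_BUFFER_SIZES = [256, 512, 1024, 2048, 4096, 8192, 16384];

// Duration of one ScriptProcessorNode buffer in milliseconds.
function bufferLatencyMs(bufferSize, sampleRate) {
  if (!VALID_BUFFER_SIZES.includes(bufferSize)) {
    throw new RangeError(`bufferSize must be one of ${VALID_BUFFER_SIZES.join(', ')}`);
  }
  return (bufferSize / sampleRate) * 1000;
}

console.log(bufferLatencyMs(1024, 48000).toFixed(2)); // 21.33
console.log(bufferLatencyMs(4096, 48000).toFixed(2)); // 85.33
```

At 48000 Hz, the 4096-frame buffer used in the resampling example therefore adds roughly 85 ms per callback, four times as much as a 1024-frame buffer.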
Use mkcert to make certificates.
mkcert: A simple zero-config tool to make locally trusted development certificates with any names you'd like.
mkcert -key-file key.pem -cert-file cert.pem localhost <host ip>
There are several ways to downsample audio on the web:
- OfflineAudioContext (native code with built-in downsampling); this is what is currently used.
- A Web Worker running a custom downsampling implementation, such as JavaScript or WebAssembly code.
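For the Web Worker option, a custom downsampler could look like the sketch below. It is a deliberately naive method that averages each window of input samples, not the method used by this project; it has no proper low-pass filter, so some aliasing is expected. The function name and the worker message shape ({ samples, fromRate, toRate }) are assumptions.

```javascript
// Downsample by averaging each window of fromRate / toRate input samples.
function downsampleByAveraging(input, fromRate, toRate) {
  if (toRate > fromRate) {
    throw new RangeError('toRate must not exceed fromRate');
  }
  const ratio = fromRate / toRate;
  const outLength = Math.floor(input.length / ratio);
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const start = Math.floor(i * ratio);
    const end = Math.min(input.length, Math.floor((i + 1) * ratio));
    let sum = 0;
    for (let j = start; j < end; j++) sum += input[j];
    output[i] = sum / (end - start);
  }
  return output;
}

// When loaded as a classic Web Worker, answer downsampling requests.
if (typeof importScripts === 'function') {
  self.onmessage = (e) => {
    const { samples, fromRate, toRate } = e.data; // hypothetical message shape
    const out = downsampleByAveraging(samples, fromRate, toRate);
    self.postMessage(out, [out.buffer]); // transfer the result without copying
  };
}
```

Running this in a worker keeps the per-chunk loop off the main thread, at the cost of one message round trip per chunk.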
Known issues:
- WebSocket over WSS with a self-signed certificate doesn't work on iOS Safari.
- Audio in Chrome on Ubuntu 18.04 sometimes becomes discontinuous (https://stackoverflow.com/questions/54794052/how-can-i-prevent-breakup-choppiness-glitches-when-using-an-audioworklet-to-stre)