This document defines a set of ECMAScript APIs in WebIDL to extend the [[mediacapture-streams]] specification.

This is an unofficial proposal.

Introduction

This document contains proposed extensions and modifications to the [[mediacapture-streams]] specification.

New features and modifications to existing features proposed here may be considered for addition into the main specification post Recommendation. Deciding factors will include maturity of the extension or modification, consensus on adding it, and implementation experience.

A concrete long-term goal is reducing the fingerprinting surface of {{MediaDevices/enumerateDevices()}} by deprecating exposure of the device {{MediaDeviceInfo/label}} in its results. This requires relieving applications of the burden of building user interfaces to select cameras and microphones in-content, by offering this in user agents as part of {{MediaDevices/getUserMedia()}} instead.

Miscellaneous other smaller features are under consideration as well, such as constraints to control multi-channel audio beyond stereo.

Terminology

This document uses the definitions {{MediaDevices}}, {{MediaStreamTrack}}, {{MediaStreamConstraints}}, {{ConstrainablePattern}}, {{MediaTrackSupportedConstraints}}, {{MediaTrackCapabilities}}, {{MediaTrackConstraintSet}}, {{MediaTrackSettings}} and {{ConstrainBoolean}} from [[!mediacapture-streams]].

The terms [=permission state=], [=request permission to use=], and [=prompt the user to choose=] are defined in [[!permissions]].

In-browser camera and microphone picker

The existing {{MediaDevices/enumerateDevices()}} function exposes camera and microphone {{MediaDeviceInfo/label}}s to let applications build in-content user interfaces for camera and microphone selection. Applications have had to do this because {{MediaDevices/getUserMedia()}} did not offer a web compatible in-agent device picker. This specification aims to rectify that.

Due to the significant fingerprinting vector caused by device {{MediaDeviceInfo/label}}s, and the well-established nature of the existing APIs, the scope of this particular effort is limited to removing {{MediaDeviceInfo/label}}, leaving the overall constraints-based model intact. This helps ensure a migration path that is more viable than migrating to a less-powerful API.

This specification augments the existing {{MediaDevices/getUserMedia()}} function instead of introducing a new less-powerful API to compete with it, for that reason as well.

getUserMedia "user-chooses" semantics

This specification introduces slightly altered semantics for the {{MediaDevices/getUserMedia()}} function, called "user-chooses", that guarantee a picker will be shown to the user in cases where the user agent would otherwise choose for the user (that is: when application constraints do not narrow down the choices to a single device). This is orthogonal to permission, and offers a better and more consistent user experience across applications and user agents.

Unfortunately, since the "user-chooses" semantics may produce user agent prompts at different times and in different situations compared to the old semantics, they are somewhat incompatible with expectations in some existing web applications that tend to call {{MediaDevices/getUserMedia()}} repeatedly and lazily instead of using e.g. stream.clone().
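
For instance, an application that needs the same camera in a second surface can clone a live track instead of calling {{MediaDevices/getUserMedia()}} again. A minimal, informative sketch (not part of the normative text):

// Inside an async function or module script.
const stream = await navigator.mediaDevices.getUserMedia({video: true});
const [track] = stream.getVideoTracks();
// clone() yields a second track backed by the same source,
// without triggering another prompt under "user-chooses".
const secondStream = new MediaStream([track.clone()]);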

Web compatibility and migration

User agents are encouraged to provide the new semantics as opt-in initially for web compatibility. User agents MUST deprecate (remove) {{MediaDeviceInfo/label}} from {{MediaDeviceInfo}} over time, though specific migration strategies are left to user agents. User agents SHOULD migrate to offering the new semantics by default (opt-out) over time.

Since the constraints-model remains intact, web compatibility problems are expected to be limited in scope.

MediaDevices Interface Extensions

partial interface MediaDevices {
  readonly attribute GetUserMediaSemantics defaultSemantics;
};

Attributes

defaultSemantics of type GetUserMediaSemantics, readonly

The default semantics of {{MediaDevices/getUserMedia()}} in this user agent.

User agents SHOULD default to "browser-chooses" for backwards compatibility, until a transition plan has been enacted where a majority of user agents collectively switch their defaults to "user-chooses" for improved user privacy, and usage metrics suggest this transition is feasible without major breakage.
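
For example, an application could inspect this attribute to decide whether it needs to pass an explicit semantics value. A minimal sketch, assuming an implementation of this proposal:

// Feature-detect the attribute before relying on it.
if ("defaultSemantics" in navigator.mediaDevices) {
  const semantics = navigator.mediaDevices.defaultSemantics;
  console.log(`getUserMedia() defaults to "${semantics}"`);
}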

MediaStreamConstraints dictionary extensions

partial dictionary MediaStreamConstraints {
  GetUserMediaSemantics semantics;
};

Dictionary {{MediaStreamConstraints}} Members

semantics of type {{GetUserMediaSemantics}}

In cases where the specified constraints do not narrow multiple choices between devices down to one per kind, specifies how the final determination of which devices to pick from the remaining choices MUST be made. If not specified, then the defaultSemantics are used.

GetUserMediaSemantics enum

enum GetUserMediaSemantics {
  "browser-chooses",
  "user-chooses"
};
GetUserMediaSemantics Enumeration description

browser-chooses

When application-specified constraints do not narrow multiple choices between devices down to one per kind, the user agent is allowed to make the final determination between the remaining choices.

user-chooses

When application-specified constraints do not narrow multiple choices between devices down to one per kind, the user agent MUST prompt the user to choose between the remaining choices, even if the application already has permission to some or all of them.

Algorithms

When the {{MediaDevices/getUserMedia()}} method is invoked, run the following steps before invoking the {{MediaDevices/getUserMedia()}} algorithm:

  1. Let mediaDevices be the object on which this method was invoked.

  2. Let constraints be the method's first argument.

  3. Let semanticsPresent be true if constraints.semantics [= map/exists =], otherwise false.

  4. Let semantics be constraints.semantics if semanticsPresent is true, or the value of mediaDevices.defaultSemantics otherwise.

  5. Replace step 6.5.1. of the {{MediaDevices/getUserMedia()}} algorithm in its entirety with the following two steps:

    1. Let descriptor be a {{PermissionDescriptor}} with its {{PermissionDescriptor/name}} member set to the permission name associated with kind (e.g. {{PermissionName/"camera"}} for "video", {{PermissionName/"microphone"}} for "audio"), and, optionally, its {{DevicePermissionDescriptor/deviceId}} member set to the deviceId of any appropriate device.

    2. If the number of unique devices sourcing tracks of media type kind in candidateSet is greater than 1 and semantics is "user-chooses", then prompt the user to choose a device with descriptor, resulting in provided media. Otherwise, request permission to use a device with descriptor, while considering all devices being attached to a live and same-permission MediaStreamTrack in the current [=browsing context=] to mean having permission status {{PermissionState/"granted"}}, resulting in provided media.

      Same-permission in this context means a {{MediaStreamTrack}} that required the same level of permission to obtain as what is being requested.

      When asking the user’s permission, the user agent MUST disclose whether permission will be granted only to the device chosen, or to all devices of that kind.

      Let track be the provided media, which MUST be precisely one track of type kind from finalSet. If semantics is "browser-chooses" then the decision of which track to choose from finalSet is up to the User Agent, which MAY use the value of the computed "fitness distance" from the SelectSettings algorithm, the value of semanticsPresent, or any other internally-available information about the devices, as inputs to its decision. If semantics is "user-chooses", and the application has not narrowed down the choices to one, then the user agent MUST ask the user to make the final selection.

      Once selected, the source of the {{MediaStreamTrack}} MUST NOT change.

      User Agents are encouraged to choose, or present as the default choice, a device based primarily on fitness distance, and secondarily on the user's primary or system default device for kind (when possible). User Agents MAY allow users to use any media source, including pre-recorded media files.

Examples

This example shows a setup with a start button and a camera selector using the new semantics (the microphone is not shown for brevity but is equivalent).

<button id="start">Start</button>
<button id="chosenCamera" disabled>Camera: none</button>
<script>

let cameraTrack = null;

// Start with the camera persisted from a previous session, if any;
// the user agent prompts only as needed.
start.onclick = async () => {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({
      video: {deviceId: localStorage.cameraId}
    });
    setCameraTrack(stream.getVideoTracks()[0]);
  } catch (err) {
    console.error(err);
  }
};

// Let the user pick a different camera; "user-chooses" guarantees a picker.
chosenCamera.onclick = async () => {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({
      video: true,
      semantics: "user-chooses"
    });
    setCameraTrack(stream.getVideoTracks()[0]);
  } catch (err) {
    console.error(err);
  }
};

// Remember the chosen camera and reflect it in the UI.
function setCameraTrack(track) {
  cameraTrack = track;
  const {deviceId, label} = track.getSettings();
  localStorage.cameraId = deviceId;
  chosenCamera.innerText = `Camera: ${label}`;
  chosenCamera.disabled = false;
}
</script>

Transferable MediaStreamTrack

A {{MediaStreamTrack}} is a transferable object. This allows manipulating real-time media outside the context it was requested or created in, for instance in workers or third-party iframes.

To preserve the existing privacy and security infrastructure, in particular for capture tracks, the track source lifetime management remains tied to the context that created it. The transfer algorithm MUST ensure the following behaviors:

  1. The context named originalContext that created a track named originalTrack remains in control of the originalTrack source, named trackSource, even when originalTrack is transferred into transferredTrack.

  2. In particular, originalContext remains the proxy to privacy indicators of trackSource. transferredTrack and any of its clones are considered tracks using trackSource, as if they were tracks created in and controlled by originalContext.

  3. When originalContext goes away, trackSource is ended, and thus transferredTrack is ended.

  4. When originalContext would have muted/unmuted originalTrack, transferredTrack gets muted/unmuted.

  5. If transferredTrack is cloned in transferredTrackClone, transferredTrackClone is tied to trackSource. It is not tied to originalTrack in any way.

  6. If transferredTrack is transferred into transferredAgainTrack, transferredAgainTrack is tied to trackSource. It is not tied to transferredTrack or originalTrack in any way.

The WebIDL changes are the following:

[Exposed=(Window,Worker), Transferable]
partial interface MediaStreamTrack {
};

At creation of a {{MediaStreamTrack}} object, called track, run the following steps:

  1. Initialize track.`[[IsDetached]]` to false.

The {{MediaStreamTrack}} transfer steps, given value and dataHolder, are:

  1. If value.`[[IsDetached]]` is true, throw a "DataCloneError" DOMException.

  2. Set dataHolder.`[[id]]` to value.{{MediaStreamTrack/id}}.

  3. Set dataHolder.`[[kind]]` to value.{{MediaStreamTrack/kind}}.

  4. Set dataHolder.`[[label]]` to value.{{MediaStreamTrack/label}}.

  5. Set dataHolder.`[[readyState]]` to value.{{MediaStreamTrack/readyState}}.

  6. Set dataHolder.`[[enabled]]` to value.{{MediaStreamTrack/enabled}}.

  7. Set dataHolder.`[[muted]]` to value.{{MediaStreamTrack/muted}}.

  8. Set dataHolder.`[[source]]` to value underlying source.

  9. Set dataHolder.`[[constraints]]` to value active constraints.

  10. Set value.`[[IsDetached]]` to true.

  11. Set value.{{MediaStreamTrack/[[ReadyState]]}} to "ended" (without stopping the underlying source or firing an `ended` event).

{{MediaStreamTrack}} transfer-receiving steps, given dataHolder and track, are:

  1. Initialize track.{{MediaStreamTrack/id}} to dataHolder.`[[id]]`.

  2. Initialize track.{{MediaStreamTrack/kind}} to dataHolder.`[[kind]]`.

  3. Initialize track.{{MediaStreamTrack/label}} to dataHolder.`[[label]]`.

  4. Initialize track.{{MediaStreamTrack/readyState}} to dataHolder.`[[readyState]]`.

  5. Initialize track.{{MediaStreamTrack/enabled}} to dataHolder.`[[enabled]]`.

  6. Initialize track.{{MediaStreamTrack/muted}} to dataHolder.`[[muted]]`.

  7. [=MediaStreamTrack/Initialize the underlying source=] of track to dataHolder.`[[source]]`.

  8. Set track's constraints to dataHolder.`[[constraints]]`.

The underlying source is expected to be kept alive between the transfer and transfer-receiving steps, or for as long as the data holder is alive. In effect, between these steps, the data holder behaves as if it were a track attached to the underlying source.
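
As an illustration, transferring a capture track to a worker might look as follows. A minimal sketch, assuming an implementation of these transfer steps (the worker script name is hypothetical):

// Main page, inside an async function or module script.
const stream = await navigator.mediaDevices.getUserMedia({audio: true});
const [track] = stream.getAudioTracks();
const worker = new Worker("audio-worker.js"); // hypothetical worker script
// Listing track in the transfer list detaches it in this context;
// the page remains in control of the source and its privacy indicators.
worker.postMessage({track}, [track]);

// audio-worker.js
onmessage = ({data: {track}}) => {
  // track is live here; it ends if the creating context goes away.
};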

The powerEfficientPixelFormat constraint

MediaTrackSupportedConstraints dictionary extensions

partial dictionary MediaTrackSupportedConstraints {
  boolean powerEfficientPixelFormat = true;
};

Dictionary {{MediaTrackSupportedConstraints}} Members

powerEfficientPixelFormat of type {{boolean}}, defaulting to true

See powerEfficientPixelFormat for details.

MediaTrackCapabilities dictionary extensions

partial dictionary MediaTrackCapabilities {
  sequence<boolean> powerEfficientPixelFormat;
};

Dictionary {{MediaTrackCapabilities}} Members

powerEfficientPixelFormat of type sequence<{{boolean}}>

If the source only has power efficient pixel formats, a single true is reported. If the source only has power inefficient pixel formats, a single false is reported. If the script can control the feature, the source reports a list with both true and false as possible values. See powerEfficientPixelFormat for additional details.

MediaTrackSettings dictionary extensions

partial dictionary MediaTrackSettings {
  boolean powerEfficientPixelFormat;
};

Dictionary {{MediaTrackSettings}} Members

powerEfficientPixelFormat of type {{boolean}}

See powerEfficientPixelFormat for details.

Constrainable Properties

The constrainable properties in this document are defined below.

Property Name: powerEfficientPixelFormat
Values: {{ConstrainBoolean}}
Notes:

Compressed pixel formats often need to be decoded, for instance for display purposes or when being encoded during a video call. The user agent SHOULD label compressed pixel formats that incur a significant power penalty when decoded as power inefficient. The labeling is up to the user agent, but decoding MJPEG in software is an example of an expensive mode. Pixel formats that have not been labeled power inefficient by the user agent are, for the purposes of this API, considered power efficient.

As a constraint, setting it to true allows filtering out inefficient pixel formats and setting it to false allows filtering out efficient pixel formats.

As a setting, this reflects whether or not the current pixel format is considered power efficient by the user agent.
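
For example, an application optimizing for battery life could express a preference for power efficient formats without mandating them. A minimal sketch, assuming an implementation of this constraint:

// Inside an async function or module script.
const stream = await navigator.mediaDevices.getUserMedia({
  video: {powerEfficientPixelFormat: {ideal: true}}
});
const [track] = stream.getVideoTracks();
// Reflects whether the chosen format is considered power efficient.
console.log(track.getSettings().powerEfficientPixelFormat);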

Exposing MediaStreamTrack source background blur support

Some platforms or User Agents may provide built-in support for background blurring of video frames, in particular for camera video streams. Web applications may want to control background blur, or at least be aware that it is applied at the source level. This may for instance allow the web application to update its UI or to not apply background blur on its own. For that reason, we extend {{MediaStreamTrack}} with the following properties.

The WebIDL changes are the following:

partial dictionary MediaTrackSupportedConstraints {
  boolean backgroundBlur = true;
};

partial dictionary MediaTrackConstraintSet {
  ConstrainBoolean backgroundBlur;
};

partial dictionary MediaTrackSettings {
  boolean backgroundBlur;
};

partial dictionary MediaTrackCapabilities {
  sequence<boolean> backgroundBlur;
};
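
Usage mirrors the examples in the sections below; a minimal sketch that enables blur only when the capability is reported, assuming an implementation of this constraint:

// Inside an async function or module script.
const stream = await navigator.mediaDevices.getUserMedia({video: true});
const [videoTrack] = stream.getVideoTracks();
if ((videoTrack.getCapabilities().backgroundBlur || []).includes(true)) {
  await videoTrack.applyConstraints({advanced: [{backgroundBlur: true}]});
} else {
  // Background blur is not supported by the platform or the camera.
  // Consider applying blur in-content instead.
}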

Exposing MediaStreamTrack source eye gaze correction support

Some platforms or User Agents may provide built-in support for human eye gaze correction, in particular for camera video streams. Web applications may want to control eye gaze correction, or at least be aware that it is applied at the source level. This may for instance allow the web application to update its UI or to not apply eye gaze correction on its own. For that reason, we extend {{MediaStreamTrack}} with the following properties.

The WebIDL changes are the following:

partial dictionary MediaTrackSupportedConstraints {
  boolean eyeGazeCorrection = true;
};

partial dictionary MediaTrackCapabilities {
  sequence<boolean> eyeGazeCorrection;
};

partial dictionary MediaTrackConstraintSet {
  ConstrainBoolean eyeGazeCorrection;
};

partial dictionary MediaTrackSettings {
  boolean eyeGazeCorrection;
};

Processing considerations

When the "eyeGazeCorrection" setting is set to true by the ApplyConstraints algorithm, the UA will attempt to correct human eye gaze.

When the "eyeGazeCorrection" setting is set to false by the ApplyConstraints algorithm, the UA will not correct human eye gaze.

Examples

<video></video>
<script type="module">
// Open camera.
const stream = await navigator.mediaDevices.getUserMedia({video: true});
const [videoTrack] = stream.getVideoTracks();

// Try to correct eye gaze.
const videoCapabilities = videoTrack.getCapabilities();
if ((videoCapabilities.eyeGazeCorrection || []).includes(true)) {
  await videoTrack.applyConstraints({
    advanced: [{eyeGazeCorrection: true}]
  });
} else {
  // Eye gaze correction is not supported by the platform or by the camera.
  // Consider falling back to some other method.
}

// Show to user.
const videoElement = document.querySelector("video");
videoElement.srcObject = stream;
</script>

Exposing MediaStreamTrack source automatic face framing support

Some platforms or User Agents may provide built-in support for automatic framing based on the position of human faces within the field of view, in particular for camera video streams. Web applications may want to control automatic face framing, or at least be aware that it is applied at the source level. This may for instance allow the web application to update its UI or to not apply face framing on its own. For that reason, we extend {{MediaStreamTrack}} with the following properties.

The WebIDL changes are the following:

partial dictionary MediaTrackSupportedConstraints {
  boolean faceFraming = true;
};

partial dictionary MediaTrackCapabilities {
  sequence<boolean> faceFraming;
};

partial dictionary MediaTrackConstraintSet {
  ConstrainBoolean faceFraming;
};

partial dictionary MediaTrackSettings {
  boolean faceFraming;
};

Processing considerations

When the "faceFraming" setting is set to true by the ApplyConstraints algorithm, the UA will attempt to improve framing by cropping to human faces.

When the "faceFraming" setting is set to false by the ApplyConstraints algorithm, the UA will not crop to human faces.

Examples

<video></video>
<script type="module">
// Open camera.
const stream = await navigator.mediaDevices.getUserMedia({video: true});
const [videoTrack] = stream.getVideoTracks();

// Try to improve framing.
const videoCapabilities = videoTrack.getCapabilities();
if ((videoCapabilities.faceFraming || []).includes(true)) {
  await videoTrack.applyConstraints({
    advanced: [{faceFraming: true}]
  });
} else {
  // Face framing is not supported by the platform or by the camera.
  // Consider falling back to some other method.
}

// Show to user.
const videoElement = document.querySelector("video");
videoElement.srcObject = stream;
</script>

Exposing MediaStreamTrack source lighting correction support

Some platforms or User Agents may provide built-in support for human face lighting correction, in particular for camera video streams. Web applications may want to control lighting correction, or at least be aware that it is applied at the source level. This may for instance allow the web application to update its UI or to not apply lighting correction on its own. For that reason, we extend {{MediaStreamTrack}} with the following properties.

The WebIDL changes are the following:

partial dictionary MediaTrackSupportedConstraints {
  boolean lightingCorrection = true;
};

partial dictionary MediaTrackCapabilities {
  sequence<boolean> lightingCorrection;
};

partial dictionary MediaTrackConstraintSet {
  ConstrainBoolean lightingCorrection;
};

partial dictionary MediaTrackSettings {
  boolean lightingCorrection;
};

Processing considerations

When the "lightingCorrection" setting is set to true by the ApplyConstraints algorithm, the UA will attempt to correct human face and background lighting balance so that human faces are not underexposed.

When the "lightingCorrection" setting is set to false by the ApplyConstraints algorithm, the UA will not correct human face and background lighting balance.

Examples

<video></video>
<script type="module">
// Open camera.
const stream = await navigator.mediaDevices.getUserMedia({video: true});
const [videoTrack] = stream.getVideoTracks();

// Try to correct lighting.
const videoCapabilities = videoTrack.getCapabilities();
if ((videoCapabilities.lightingCorrection || []).includes(true)) {
  await videoTrack.applyConstraints({
    advanced: [{lightingCorrection: true}]
  });
} else {
  // Lighting correction is not supported by the platform or by the camera.
  // Consider falling back to some other method.
}

// Show to user.
const videoElement = document.querySelector("video");
videoElement.srcObject = stream;
</script>

VoiceIsolation constraint

Some platforms offer functionality for voice isolation: attempting to remove all parts of an audio track that do not correspond to a human voice. Some platforms even attempt to remove extraneous voices, leaving the "main voice" as the dominant component of the audio. The exact methods used may vary between implementations.

This constraint permits the platform to turn on that functionality, with the desired result being that the "main voice" in the audio signal is the dominant component of the audio.

This will have large effects on audio that is captured for purposes other than transmitting voice (for instance music or ambient sounds), so it needs to be off by default.

This constraint is a stronger version of noise suppression, which means that if "voiceIsolation" is set to true, the value of the "noiseSuppression" constraint is ignored.

This constraint has no such relationship with any other constraint; in particular it does not affect echoCancellation.

The WebIDL changes are the following:

partial dictionary MediaTrackSupportedConstraints {
  boolean voiceIsolation = true;
};

partial dictionary MediaTrackConstraintSet {
  ConstrainBoolean voiceIsolation;
};

partial dictionary MediaTrackSettings {
  boolean voiceIsolation;
};

partial dictionary MediaTrackCapabilities {
  sequence<boolean> voiceIsolation;
};

Processing considerations

When the "voiceIsolation" setting is set to true by the ApplyConstraints algorithm, the UA will attempt to remove the components of the audio track that do not correspond to a human voice. If a dominant voice can be identified, the UA will attempt to enhance that voice.

When the "voiceIsolation" constraint setting is set to false by the ApplyConstraints algorithm, the UA will process the audio according to other settings in its normal fashion.

Exposing change of MediaStreamTrack configuration

The configuration (capabilities, constraints or settings) of a {{MediaStreamTrack}} may be changed dynamically outside the control of web applications. One example is when a user decides to switch on background blur through the operating system. Web applications might want to know that the configuration of a particular {{MediaStreamTrack}} has changed. For that purpose, a new event is defined below.

partial interface MediaStreamTrack {
  attribute EventHandler onconfigurationchange;
};

When the [=User Agent=] detects a change of configuration in a track's underlying source, the [=User Agent=] MUST run the following steps:

  1. If track.{{MediaStreamTrack/muted}} is true, wait for track.{{MediaStreamTrack/muted}} to become false or track.{{MediaStreamTrack/readyState}} to be "ended".

  2. If track.{{MediaStreamTrack/readyState}} is "ended", abort these steps.

  3. If track's capabilities, constraints and settings match the source's configuration, abort these steps.

  4. Update track's capabilities, constraints and settings according to track's underlying source.

  5. [=Fire an event=] named configurationchange on track.

These events are potentially triggered simultaneously on documents of different origins. [=User Agents=] MAY add fuzzing to the timing of events to avoid cross-origin activity correlation.
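
A minimal sketch of a page keeping its UI in sync with source-level changes, assuming an implementation of this event (the backgroundBlur setting is used for illustration):

// Inside an async function or module script.
const stream = await navigator.mediaDevices.getUserMedia({video: true});
const [track] = stream.getVideoTracks();
track.onconfigurationchange = () => {
  // Re-read settings; they were updated before the event fired.
  const {backgroundBlur} = track.getSettings();
  console.log(`Background blur is now ${backgroundBlur ? "on" : "off"}`);
};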