Browser Softphone Tutorial

In this tutorial we will explain how to create the basic functionality for a browser-based software phone (softphone) using the Jabra library in conjunction with WebRTC. WebRTC is a set of native browser APIs that provides a relatively simple way of doing real-time peer-to-peer communication with audio and video.

Prerequisites

  • This tutorial assumes some knowledge of how to use the Jabra library, as covered in other documentation such as the Managing Calls section of the Developer's Guide.
  • The tutorial refers to a device object, assuming that the selection of an active Jabra device is handled in the UI.

Summary

We will go through the following steps:

  • Control audio tracks in conjunction with the Jabra library (the softphone flow).
  • Initialize an audio stream with the selected Jabra device.
  • Initialize a peer connection (for demo purposes, both the local and the remote connection live in the same webpage).
  • Play back the "remote" audio.

The tutorial will focus on audio, but most concepts can be used with video as well.

Softphone flow

A softphone consists of three key elements: local audio (your audio), remote audio (audio of the person(s) you are talking to) and the connection between the two. Luckily, the browser does most of the heavy lifting for all three elements, so our primary job is to handle the flow correctly.

Create an ICallControl object

The first step of performing any call control-related functionality on a device is to initialize the Call Control module. This is done by creating a CallControlFactory object and using it to create an ICallControl object associated with a Jabra device.

See the Initializing a Module section of the Initialize page of the Developer's Guide for more information on the above.

[Note] For the rest of this tutorial, we will assume the ICallControl instance for device "X" is called callControl.

Start a call

The first step is to start a call. In our example we assume that the user presses a start-call button in the UI to trigger this intent.

As explained in the call control tutorial, we need to acquire a call lock from the device in order to use call-control APIs.

const gotLock = await callControl.takeCallLock();

if (gotLock) {
  // proceed
}

If we get the lock, we can proceed with initializing the local audio stream, initializing the peer connection and, lastly, playing back the remote audio stream. We will go into depth with these procedures in later sections - for now we just show the flow.

// Get microphone permission and init audio stream
const stream = await initAudioStream(
  callControl.device.browserLabel,
  callControl.device.name
);
const localTrack = stream.getTracks()[0];

// Try to establish peer connection
const connection = await initPeerConnection(stream);

// Set device to "in call" state
callControl.offHook(true);

// Playback remote audio
const remoteAudio = playbackRemoteAudio(connection.remoteStream);
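If a step fails after the lock has been taken (for example, the user denies microphone access), the lock should be released again so the device is not left blocked. The following is a minimal sketch of one way to wrap the flow, not the library's own method. `startCall` and the `steps` parameter are hypothetical names; `steps` simply carries the three helper functions described later in this tutorial, so the flow can be read on its own.

```javascript
// Sketch only: `startCall` and `steps` are illustrative names, not part of the Jabra library.
// `steps` is expected to hold { initAudioStream, initPeerConnection, playbackRemoteAudio }.
async function startCall(callControl, steps) {
  const gotLock = await callControl.takeCallLock();
  if (!gotLock) {
    // Another application holds the device, so no call can be started
    return null;
  }

  let inCall = false;
  try {
    const stream = await steps.initAudioStream(
      callControl.device.browserLabel,
      callControl.device.name
    );
    const localTrack = stream.getTracks()[0];
    const connection = await steps.initPeerConnection(stream);

    callControl.offHook(true);
    inCall = true;

    const remoteAudio = steps.playbackRemoteAudio(connection.remoteStream);
    return { stream, localTrack, connection, remoteAudio };
  } catch (error) {
    // Setup failed: restore the device state and give the lock back
    if (inCall) {
      callControl.offHook(false);
    }
    callControl.releaseCallLock();
    throw error;
  }
}
```

The function returns `null` when the lock is unavailable, so the UI can tell "device busy" apart from an actual error.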

[Note] In this tutorial, we route our own audio back to ourselves as the remote peer's audio for testing purposes. Hence, when running the demo you should be able to hear your own voice with a slight delay.

[Note] In a real world example, you would also need to handle accept/reject scenarios from the remote peer.

[Note] If the device is a child device, i.e. it is connected via a Jabra dongle device, then you should use the dongle device's name and browser label when calling initAudioStream. This is because Chrome does not have a way of recognizing the "child" (i.e. the headset). The following snippet shows how to easily use the dongle:

let stream;
if (callControl.device.parent) {
  stream = await initAudioStream(
    callControl.device.parent.browserLabel,
    callControl.device.parent.name
  );
} else {
  stream = await initAudioStream(
    callControl.device.browserLabel,
    callControl.device.name
  );
}
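The branch above can also be reduced to a single lookup. This is only a sketch; `getAudioSourceDevice` is a hypothetical helper name, and it relies on nothing more than the `parent` property used in the snippet above.

```javascript
// Sketch only: `getAudioSourceDevice` is an illustrative helper name.
// Returns the device the browser can actually see: the dongle if there is one,
// otherwise the device itself.
function getAudioSourceDevice(device) {
  return device.parent ?? device;
}

// Usage inside the start-call flow:
// const source = getAudioSourceDevice(callControl.device);
// const stream = await initAudioStream(source.browserLabel, source.name);
```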

### Mute/unmute

Some Jabra devices will actively stop audio from the device when set to mute state, but other devices *only* signal mute by turning on LEDs, leaving it up to the integrator to handle the actual audio suspension.

Therefore, when muting *your* device we also need to disable the local audio track - and vice versa for unmute - like this:

```javascript
// Mute
callControl.mute(true);
localTrack.enabled = false;

// Unmute
callControl.mute(false);
localTrack.enabled = true;
```

[Note] The mute state can be triggered through the library as we are doing here, but it can also be triggered by pressing the physical mute button that most devices have.

Hold/resume call

WebRTC offers different ways of holding a call, but the simplest one is to mute both the local and the remote audio streams. This method also makes it possible to replace the local audio stream with a music track to provide "waiting music".

Put call on hold:

// Set device to hold state
callControl.hold(true);

// Suspend remote audio context
remoteAudio.context.suspend();

// Disable local audio track
localTrack.enabled = false;

Resume call:

// Set device back to "in call" state
callControl.hold(false);

// Resume remote audio context
remoteAudio.context.resume();

// Enable local audio track
localTrack.enabled = true;
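One subtlety: the resume snippet re-enables the local track unconditionally, which would also unmute a user who had muted before the call was put on hold. The sketch below tracks both flags and derives the track state from them. `createCallAudioState` and its method names are hypothetical, and it assumes the application wants the mute state to survive a hold. Whether the device keeps its own mute indicator across a hold is not covered here and should be verified on your device.

```javascript
// Sketch only: `createCallAudioState` and its method names are illustrative.
function createCallAudioState(callControl, localTrack, remoteAudio) {
  let muted = false;
  let onHold = false;

  // The microphone is live only when the call is neither muted nor on hold
  const applyLocalTrack = () => {
    localTrack.enabled = !muted && !onHold;
  };

  return {
    setMuted(value) {
      muted = value;
      callControl.mute(value);
      applyLocalTrack();
    },
    async setHold(value) {
      onHold = value;
      callControl.hold(value);
      applyLocalTrack();
      // suspend() and resume() both return promises
      if (value) {
        await remoteAudio.context.suspend();
      } else {
        await remoteAudio.context.resume();
      }
    },
  };
}
```

With this in place, resuming a call that was muted before the hold leaves the local track disabled until `setMuted(false)` is called.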

End call

Clean up when ending a call.

// Close local and remote peer connection
connection.localPeer.close();
connection.remotePeer.close();

// Stop the audio track and tear down playback to free up resources
localTrack.stop();
remoteAudio.source.disconnect();
remoteAudio.context.close();
remoteAudio.element.srcObject = null;

// Set device state to "not in call / idling"
callControl.offHook(false);

// Release call lock so other apps can use the device
callControl.releaseCallLock();
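If one of the cleanup steps throws, the device could be left in the "in call" state with the lock still held. As a hedged sketch, the teardown can be wrapped so the device is always reset. `endCall` is a hypothetical helper name, and `call` is assumed to bundle the `connection`, `localTrack` and `remoteAudio` objects created when the call was started.

```javascript
// Sketch only: `endCall` is an illustrative helper name.
async function endCall(callControl, call) {
  const { connection, localTrack, remoteAudio } = call;
  try {
    // Close local and remote peer connection
    connection.localPeer.close();
    connection.remotePeer.close();

    // Stop capturing and detach playback
    localTrack.stop();
    remoteAudio.source.disconnect();
    remoteAudio.element.srcObject = null;

    // close() returns a promise that resolves once the context has released its resources
    await remoteAudio.context.close();
  } finally {
    // Runs even if a cleanup step above throws
    callControl.offHook(false);
    callControl.releaseCallLock();
  }
}
```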

Initialize local audio stream

In the section "Start a call" we initialized a local audio stream:

const stream = await initAudioStream(
  callControl.device.browserLabel,
  callControl.device.name
);
const localTrack = stream.getTracks()[0];

In this section we will cover how to do that.

Get microphone permission

Google Chrome requires the user to actively grant permission before a webpage can use the microphone. The permission only needs to be granted once per origin. Trigger the dialog like this:

const stream = await navigator.mediaDevices.getUserMedia({ video: false, audio: true });
// Stop the stream immediately; we set it up again later with the correct device
stream.getTracks().forEach(track => track.stop());

[Note] Unfortunately, a browser quirk means that we need to call getUserMedia to get microphone permission before we can get a list of connected devices, and then call it again to get the stream of the correct device. The Permissions API will be a better way of handling this when the spec is out of draft.
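Where the browser already supports querying the `microphone` permission (an assumption to verify per browser), the Permissions API can tell you whether the prompt has been answered before. The sketch below is illustrative only. `getMicrophonePermissionState` is a hypothetical helper name, and the `permissions` argument is expected to be `navigator.permissions`.

```javascript
// Sketch only: `getMicrophonePermissionState` is an illustrative helper name.
// Pass in `navigator.permissions` when running in a browser.
async function getMicrophonePermissionState(permissions) {
  if (!permissions || typeof permissions.query !== 'function') {
    return 'unknown';
  }
  try {
    const status = await permissions.query({ name: 'microphone' });
    return status.state; // 'granted', 'denied' or 'prompt'
  } catch (error) {
    // Some browsers reject permission names they do not support
    return 'unknown';
  }
}

// Usage in a browser:
// const state = await getMicrophonePermissionState(navigator.permissions);
// if (state === 'denied') { /* explain to the user how to re-enable the microphone */ }
```

Treating every failure as `'unknown'` keeps the getUserMedia call as the fallback path, so the flow described in this section still works where the query is unsupported.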

Get Jabra device from browser

The browser holds a list of connected audio devices, and after getting microphone permission we can get this list by calling navigator.mediaDevices.enumerateDevices(). Most laptops have built-in microphones so we need to filter the list to only return our selected Jabra device. We can do that by using the browserLabel property on the device object returned from the Jabra library.

async function getBrowserDevice(browserLabel) {
  const devices = await navigator.mediaDevices.enumerateDevices();
  const audioDevices = devices.filter(device => device.kind === 'audioinput');
  const browserDevice = audioDevices.find(a => a.label.indexOf(browserLabel) > -1);
  if (!browserDevice) throw new Error('Could not find any Jabra devices from the browser\'s list of devices');
  return browserDevice;
}

Initialize stream for Jabra device

Lastly, we initialize the stream for the selected Jabra device.

const stream = await navigator.mediaDevices.getUserMedia({'video': false, 'audio': { deviceId: browserDevice.deviceId }});

This returns a stream object that captures audio without playing it back (we do that in a later step).
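Putting the three steps together, `initAudioStream` could look roughly like the sketch below. It is one possible assembly, not the demo's actual implementation. The third parameter, `mediaDevices`, is an addition for illustration (it defaults to `navigator.mediaDevices`), and using `name` only in the error message is an assumption. The sketch also uses the `exact` form of the `deviceId` constraint, so the request fails instead of silently falling back to another microphone.

```javascript
// Sketch only: one way to assemble `initAudioStream` from the steps above.
// `mediaDevices` is an illustrative extra parameter; in a browser it defaults to navigator.mediaDevices.
async function initAudioStream(browserLabel, name, mediaDevices = navigator.mediaDevices) {
  // 1. Trigger the permission prompt, then release the temporary stream
  const probe = await mediaDevices.getUserMedia({ video: false, audio: true });
  probe.getTracks().forEach(track => track.stop());

  // 2. Find the Jabra device in the browser's list of audio inputs
  const devices = await mediaDevices.enumerateDevices();
  const browserDevice = devices.find(
    device => device.kind === 'audioinput' && device.label.includes(browserLabel)
  );
  if (!browserDevice) {
    // Assumption: `name` is only used to make the error message readable
    throw new Error(`Could not find "${name}" among the browser's audio input devices`);
  }

  // 3. Open the stream for exactly that device
  return mediaDevices.getUserMedia({
    video: false,
    audio: { deviceId: { exact: browserDevice.deviceId } }
  });
}
```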

Initialize a peer connection

The WebRTC specification lets you set up a communication channel between peers via an ICE (Interactive Connectivity Establishment) server. We will not go into details about this part but advise reading Getting started with peer connections from webrtc.org.

In our example we set up a simple peer connection within the same webpage, routing the outgoing signal back to ourselves.

// We pass in the stream obtained from our Jabra device in the previous step
async function initPeerConnection(stream) {
  // Set up both ends of the connection
  const localPeer = new RTCPeerConnection();
  const remotePeer = new RTCPeerConnection();

  // Both peers live in the same page, so ICE candidates are handed over directly
  localPeer.addEventListener('icecandidate', ({ candidate }) => candidate && remotePeer.addIceCandidate(candidate));
  remotePeer.addEventListener('icecandidate', ({ candidate }) => candidate && localPeer.addIceCandidate(candidate));

  // Add the local audio track to localPeer, to be sent to remotePeer when the handshake completes
  localPeer.addTrack(stream.getTracks()[0], stream);

  // Resolves with the remote stream once remotePeer receives the track
  const remoteStreamReceived = new Promise(resolve => {
    remotePeer.addEventListener('track', ({ streams: [remoteStream] }) => resolve(remoteStream), { once: true });
  });

  // Offer/answer handshake. Because this is a plain async function, a failure
  // here rejects the returned promise instead of being silently lost.
  const offer = await localPeer.createOffer({
    offerToReceiveAudio: true,
    offerToReceiveVideo: false
  });

  await localPeer.setLocalDescription(offer);
  await remotePeer.setRemoteDescription(offer);

  const answer = await remotePeer.createAnswer();

  await remotePeer.setLocalDescription(answer);
  await localPeer.setRemoteDescription(answer);

  const remoteStream = await remoteStreamReceived;
  return { localPeer, remotePeer, remoteStream };
}
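The loopback example creates both peers without any configuration, which works because they live in the same page. For a real remote peer you would normally pass ICE server details to the `RTCPeerConnection` constructor. The sketch below only builds that configuration object. `createPeerConfig` is a hypothetical helper, and the server URL in the usage comment is a placeholder, not a real service.

```javascript
// Sketch only: `createPeerConfig` is an illustrative helper name.
// `stunUrls` is an array of STUN URLs; `turnServer` is optional and has the
// shape { urls, username, credential }.
function createPeerConfig(stunUrls, turnServer) {
  const iceServers = [];
  if (stunUrls.length > 0) {
    // STUN lets a peer discover its public address
    iceServers.push({ urls: stunUrls });
  }
  if (turnServer) {
    // TURN relays media when no direct path can be found, and requires credentials
    iceServers.push({
      urls: turnServer.urls,
      username: turnServer.username,
      credential: turnServer.credential
    });
  }
  return { iceServers };
}

// Usage in a browser (placeholder URL):
// const peer = new RTCPeerConnection(createPeerConfig(['stun:stun.example.org:3478']));
```

A real setup also needs a signaling channel to exchange the offer, answer and ICE candidates between the two peers, which the same-page example sidesteps.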

Playback remote audio

The last step is to play back the remote audio via an AudioContext.

function playbackRemoteAudio(remoteStream) {
  // Muted audio element (works around the Chrome bug described in the note below)
  const element = new Audio();
  element.srcObject = remoteStream;
  element.muted = true;
  element.play();

  // Audio context that actually outputs the remote audio
  const context = new AudioContext();
  const source = context.createMediaStreamSource(remoteStream);
  source.connect(context.destination);

  return { context, source, element };
}

[Note] A Chrome bug requires us to set up a muted audio element alongside the audio context in order to play back sound sent over RTC. Please see this issue on Stack Overflow and this bug report on the Chromium project.
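Another thing to watch for: under Chrome's autoplay policy, an AudioContext created without a prior user gesture can start in the `suspended` state and stay silent. In this tutorial the call is started from a button press, so this should not normally happen, but a small guard can make the behaviour explicit. This is a sketch only, and `ensureContextRunning` is a hypothetical helper name.

```javascript
// Sketch only: `ensureContextRunning` is an illustrative helper name.
// Call it right after playbackRemoteAudio(), not while a call is on hold,
// since hold suspends the context on purpose.
async function ensureContextRunning(context) {
  if (context.state === 'suspended') {
    // resume() returns a promise that resolves when audio processing restarts
    await context.resume();
  }
  return context.state;
}

// Usage in a browser:
// const remoteAudio = playbackRemoteAudio(connection.remoteStream);
// await ensureContextRunning(remoteAudio.context);
```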

Wrapping up

These are the steps needed to create the basic functionality for a browser-based softphone.

The full flow and how everything is tied together should become clearer when trying out and reading the source code of the call simulation demo, which is based on the concepts of this tutorial.

A production-ready softphone needs several other features, such as handling incoming calls, rejecting calls, handling multiple calls, reacting to device signals and setting up a real remote connection. Please see webrtc.org and MDN for more in-depth knowledge about the WebRTC specification. This list of WebRTC samples is also a useful resource.