So, how accurate is passive voice biometrics? If built correctly – very accurate!

Bradleigh Scott

OneVault Executive: Data & Product

B.Comm, MBA

Building better background models to drive accuracy in passive voice biometrics

“What is the accuracy like?” is a question I hear consistently from existing and prospective clients evaluating a voice authentication solution. I thought it would be worthwhile to share some of the insights I have gathered over the years on voice biometric accuracy:

1) Voice biometrics relies heavily on a background model*1, which influences the accuracy of verifications against a voiceprint. Most voice biometric software installations come with built-in background models that are sufficient for proofs of concept or initial production installations, but over time these background models need to be calibrated with audio collected from the client’s actual environment. Models built with a customer’s own data always perform better than any “factory-fitted” or “out-of-the-box” background model.

2) Better performance from custom background models is only achieved with large volumes of data. By data, I mean audio recordings and associated metadata (like customer ID, channel, etc.), which are either obtained from the client’s telephony platform or collected within the deployed voice biometric software. It is very rare, however, that background models can be built using audio files collected from the client’s existing telephony system, largely because:
a. the audio is usually recorded in mono, with both the customer and the agent mixed into the same recording
b. the audio is most often highly compressed or in the wrong audio file format
c. the audio cannot easily be associated with a customer ID.

Because of these limitations, most of the background models I have built have used audio collected after the initial deployment of a voice biometric solution, since that audio represents the legitimate customer environment.

To give you a sense of what is required, a standard custom background model generally requires one audio recording from each of about 300 people per channel (e.g. landline, cellular or smartphone application). For better background model performance, multiple calls from thousands (or tens of thousands) of people are required.
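As a rough illustration of the guideline above, the following Python sketch counts distinct speakers per channel in a set of collected recordings and reports which channels still fall short of the roughly 300-speaker figure. The function names, the `(customer_id, channel)` record layout and the channel labels are assumptions made for this example; they do not come from any particular voice biometric product.

```python
from collections import defaultdict

# Approximate per-channel figure quoted in the text; treat it as a guideline.
MIN_SPEAKERS_PER_CHANNEL = 300


def speakers_per_channel(recordings):
    """Count distinct customers per channel.

    `recordings` is an iterable of (customer_id, channel) pairs, a
    hypothetical layout for the metadata collected alongside the audio.
    """
    seen = defaultdict(set)
    for customer_id, channel in recordings:
        seen[channel].add(customer_id)
    return {channel: len(ids) for channel, ids in seen.items()}


def channels_short_of_target(recordings, target=MIN_SPEAKERS_PER_CHANNEL):
    """Return {channel: speakers still needed} for channels below target."""
    counts = speakers_per_channel(recordings)
    return {ch: target - n for ch, n in counts.items() if n < target}


if __name__ == "__main__":
    sample = [
        ("cust-001", "landline"),
        ("cust-001", "landline"),   # repeat call, same speaker
        ("cust-002", "landline"),
        ("cust-001", "cellular"),
    ]
    print(speakers_per_channel(sample))
    print(channels_short_of_target(sample))
```

Counting distinct speakers rather than recordings matters here, because several calls from the same person add depth for that speaker but do not widen the population the model is built from.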

3) Biometric accuracy is heavily influenced by cross-channel transactions. Cross-channel refers to a customer enrolling on one device and verifying on another, e.g. the customer enrols through a smartphone application but verifies their voice on a landline or via a Bluetooth cellular connection in their car. Biometric accuracy is always highest when a customer’s transactions occur on the same device. However, the good news is that cross-channel performance can be improved by building background models with audio recordings from speakers who have used multiple channels to transact. Again, the more data the better.
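To make the cross-channel idea concrete, here is a minimal Python sketch that picks out the speakers who have transacted on more than one channel, since those are the recordings described above as most useful for improving cross-channel performance. The record layout and the function name are hypothetical and only serve the illustration.

```python
from collections import defaultdict


def cross_channel_speakers(recordings, min_channels=2):
    """Return {customer_id: sorted channels} for multi-channel speakers.

    `recordings` is an iterable of (customer_id, channel) pairs
    (an assumed metadata layout, not a vendor format).
    """
    channels_by_speaker = defaultdict(set)
    for customer_id, channel in recordings:
        channels_by_speaker[customer_id].add(channel)
    return {
        customer_id: sorted(channels)
        for customer_id, channels in channels_by_speaker.items()
        if len(channels) >= min_channels
    }


if __name__ == "__main__":
    sample = [
        ("cust-001", "smartphone_app"),
        ("cust-001", "landline"),
        ("cust-002", "cellular"),
        ("cust-002", "cellular"),
    ]
    # Only cust-001 has used more than one channel.
    print(cross_channel_speakers(sample))
```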

4) Voice biometric brochureware usually claims that the technology analyses around 150 unique points of a person’s voice to create a voiceprint. Whether this is fact or a marketing exaggeration, I have found, in creating and evaluating dozens of different background models, that race, gender, age, family relation (e.g. twins, sons/daughters, sisters/brothers) and channel make a big difference in determining whether the audio recording (or what we commonly refer to as a voiceprint) is a match or a mismatch. The biggest influence, however, is often channel. Let me explain: if customer A enrols and verifies on the same mobile device, he will get better biometric accuracy than customer B, who uses different devices. However, should customer B try to verify from customer A’s device and claim to be customer A, there is a strong possibility that customer B could be incorrectly authenticated. The probability of a false match can increase if customer B happens to be the same gender or race as customer A, or belongs to the same family.
Over time, with more background model audio data, the models can better discriminate between true and imposter callers, as noted above, but there will always be exceptions. Voice biometrics is not a foolproof system, but it does offer better protection against fraudster calls than any traditional knowledge-based authentication solution. At OneVault, we always advocate that clients supplement voice biometric match decisions with other factors, like SMS/message notifications to the customer’s cellular number on file. This mitigates the risk, at least in the early phase of an implementation, when limited data is available.
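The notions of false match and mismatch used above are commonly summarised as a false accept rate (imposters wrongly matched) and a false reject rate (true callers wrongly rejected) at a chosen decision threshold. The Python sketch below shows that arithmetic under stated assumptions: the scores are made-up numbers, a higher score is taken to mean a closer match, and the function name is hypothetical rather than part of any vendor API.

```python
def error_rates(genuine_scores, imposter_scores, threshold):
    """Return (false_accept_rate, false_reject_rate) at `threshold`.

    Assumes a higher score means a closer match and that a score at or
    above the threshold is treated as a match.
    """
    if not genuine_scores or not imposter_scores:
        raise ValueError("both score lists must be non-empty")
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    false_accepts = sum(1 for s in imposter_scores if s >= threshold)
    far = false_accepts / len(imposter_scores)
    frr = false_rejects / len(genuine_scores)
    return far, frr


if __name__ == "__main__":
    # Illustrative scores only; not measurements from a real deployment.
    genuine = [0.9, 0.8, 0.4, 0.7]    # true caller vs own voiceprint
    imposter = [0.1, 0.6, 0.3, 0.2]   # imposter vs claimed voiceprint
    for threshold in (0.3, 0.5, 0.7):
        far, frr = error_rates(genuine, imposter, threshold)
        print(f"threshold={threshold}: FAR={far:.2f} FRR={frr:.2f}")
```

Raising the threshold lowers the false accept rate at the cost of rejecting more true callers, which is one reason a second factor such as a notification to the number on file is a sensible complement while data is still limited.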

For me it is critical that customers understand that voice biometrics remains a GREAT solution but needs to be continually managed and optimised. It is not something that can be implemented and forgotten. My personal experience is that, when implemented correctly, passive voice biometrics provides clients with an authentication solution that:
a) Offers customers a better call centre experience
b) Reduces overall contact centre fraud
c) Gives the customer more confidence in the account security
d) Provides an optimised business process for the client organisation

Voice biometrics requires IP and expertise, and the creation of robust and relevant background models is critical to delivering a successful voice biometric implementation. We would be happy to help you craft a solution that achieves your business imperatives. At OneVault we have experience of many different environments and are happy to share our insight with you.

*1 Background Model definition: this is a codified representation of a client’s customers and the audio channels (cellular, landline, smartphone app) they use to engage with the client. The background model is used as a template when enrolling audio to create voiceprints. It is also instrumental in authenticating an audio sample against a claimed voiceprint, thus discriminating between the true caller and an imposter.

