ASR v Human Transcription: Which Is Best For Multi-Voice?

A blog post by OutSec, the UK's leading online transcription company

For many people investing in transcription services, security and quality are the top priorities. Whilst automated voice recognition offers cheap pricing and quick turnarounds, it does not offer quality control. Basically, you get what you get. If your audio is low quality or is multi-voice (has many speakers), you may end up with a transcript that is simply gibberish. And in terms of data protection and security, do you know how and where your data is being handled?

What is Automated Speech Recognition (ASR)?

ASR is a technology that converts speech into text. It enables applications such as voice assistants, speech analytics, voice biometrics and speech translation.

Quality: The Facts Speak for Themselves

There are issues with ASR around audio quality, accents, speaking patterns and colloquial language. ASR can struggle to detect words when the audio is quiet or of poor quality, whereas we humans can fare much better. It has issues with accents, which can be a problem in a country like the UK where language is diverse. ASR also struggles with slang words, technical terms and acronyms. As a result, any transcript produced by ASR may be of poor quality.

Can you Afford to Lose out on Quality?

Even under ideal circumstances, automated voice recognition struggles to hit accuracy rates above 80%.
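To make a figure like "80% accuracy" concrete, here is a minimal sketch of one common way to score a transcript: word error rate (WER), with accuracy taken as one minus WER. This is an assumption for illustration only; the post does not say how the 80% figure was measured, and the function names and sample sentences below are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the hypothesis
                curr[j - 1] + 1,     # extra word in the hypothesis
                prev[j - 1] + cost,  # match or substituted word
            ))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 - WER, floored at zero."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


# Hypothetical example: a human-checked reference against an automated draft.
reference = "the patient was seen in clinic today"
draft = "the patient was seem in clinic"
print(f"accuracy: {word_accuracy(reference, draft):.0%}")
```

In this made-up example one word is wrong and one is missing out of seven, so two short errors are enough to pull the score well below 80%.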

ASR is no Match for the Human Ear!

Without quality control mechanisms, poor audio means that you will spend more time reviewing and editing the content. A human ‘ear’, however, can detect slang with relative ease. It can understand different accents and handle poor audio quality with less difficulty.

Fundamentally, we are not in a place where voice recognition can match the language processing power of the human brain.

Human Transcription Services Offer More Variety

A top-quality human transcription service will also give you variety around the service. You have a choice:

  • Intelligent Verbatim transcripts clean up the audio for you, removing ‘ums’, ‘errs’ and false starts. They keep the content on track so you receive something that is easier to read and ‘flows’.
  • Summary/Notes transcripts give an overview of a recording. These allow you to digest the meaning of the content without having to read every single word.
  • Verbatim transcripts deliver every single detail. You get a word-for-word transcript that also includes any ‘ums’ and ‘ahs’, pauses, and so on. If required, they can also include typist notes regarding tone and laughter, for instance.

Data Protection & Security

When you are paying for a transcript, you need to be certain that all the information is completely safe and secure.

With both human and automated services, you need to look for evidence of data security. For example,

  • Where is your data stored?
  • Does the provider use end-to-end encryption?

With human transcription services, data security can make all the difference! If you need security guarantees, look for answers to the questions above. You can also ask for a Non-Disclosure Agreement or evidence of confidentiality statements from the person(s) producing the transcripts. With Automated Speech Recognition (ASR) this is much harder, and you do need to read the fine print.

GDPR Issues Using ASR

Using ASR also poses some challenges and risks for data security and privacy, especially in the context of the General Data Protection Regulation (GDPR), which is a law that protects the personal data of individuals in the European Union (EU) and the UK.

One of the main issues with ASR is that voice data is considered as personal data under the GDPR, as it can reveal information about the speaker’s identity, location, health, preferences, opinions, and emotions. Therefore, ASR users need to obtain explicit and informed consent from the speakers before collecting, processing, or storing their voice data. They also need to inform the speakers about the purpose and duration of the data processing, as well as their rights to access, correct, or delete their data.

Another issue with ASR is that voice data may be processed by third-party services or cloud providers, which may not comply with the GDPR or have adequate security measures to protect the data from unauthorised access or breach. For example, some ASR services may use text translation or text-to-speech features that involve sending the voice data to other services or locations. This may expose the data to potential risks of interception, modification, or leakage. Therefore, ASR users need to ensure that they have a valid legal basis and a clear contract with the service providers that specify the terms and conditions of the data processing, as well as the security and privacy guarantees.

A third issue with ASR is that voice data may contain sensitive or special categories of personal data under the GDPR, such as racial or ethnic origin, political opinions, religious beliefs, health conditions, or biometric identifiers. These types of data require a higher level of protection and a stricter legal basis for processing, such as explicit consent or substantial public interest. Therefore, ASR users need to be careful about what kind of voice data they collect and process, and avoid collecting or processing any unnecessary or excessive data that may infringe on the speakers’ rights and freedoms.

ASR Data Breaches

There have been a number of ASR data breaches too.

It was reported by The Guardian that “Google workers can listen to what is said on its AI home devices”. This became apparent after certain recordings were leaked.

The Guardian also asked whether Amazon could be invading privacy in its article ‘Alexa, are you invading my privacy?’. In the US, Amazon is being sued over recordings of children.

Even Apple, which prides itself on data privacy and security, said sorry that workers listened to Siri voice recordings.

It raises the question of whether the big tech companies can actually be trusted to be open and transparent in relation to data and privacy.

Here are a couple of articles which deal with GDPR data and security concerns:

  • Voice Recognition Tech Privacy and Cybersecurity Concerns. This article discusses the challenges and risks of using voice recognition technology in various applications and domains, especially in the context of the GDPR and other relevant laws and regulations.
  • Data, privacy, and security for Speech to text. This article provides some high-level details on how speech-to-text processes data provided by customers, and on the privacy and security obligations that apply. It also reminds people that they are responsible for obtaining all necessary permissions for processing the data, including any licenses, permissions or other proprietary rights required for the content they input into the speech-to-text service.

Summary

Looking at the price of human transcription services, it is easy to think that they are expensive. But the more you weigh up the pros and cons, the more you start to realise that human transcription services are efficient. They are also incredibly affordable and probably the easiest option to nail down on quality, data protection and security.

Essentially, everything boils down to the quality of service. A human transcription service will always provide a higher degree of accuracy.

Remember, you get what you pay for. So, if you want the best service possible from the start, then be prepared to invest in it.

About OutSec

OutSec is the UK’s leading online transcription company whose business has grown substantially since 2002. We are one of the most successful transcription companies in the United Kingdom.

OutSec provides secure outsourced transcription services to the medical, legal, property and surveying, universities, media and interviews, advisory boards, conferences & seminars, inventories, financial, corporate, HR, recruitment and Executive Search sectors.

Why is Dictation More Efficient than Typing?

Well, the simple fact is that we can all speak considerably faster than we can physically type:

“The average person types between 38 and 40 words per minute”.

A “good rate of speech ranges between 140-160 words per minute”.

In other words, dictation is up to four times faster than we can type. Therefore, simply dictating a document is more cost-efficient, giving you more time to dedicate your efforts elsewhere in your business.
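As a rough check on that comparison, the sketch below works the ratio out from the two quoted ranges. The 1,000-word document length is a hypothetical figure chosen only for illustration, and the function names are our own.

```python
# Words-per-minute ranges quoted in the post.
TYPING_WPM = (38, 40)
SPEAKING_WPM = (140, 160)


def speedup_range(typing, speaking):
    """Return the (lowest, highest) speed-up of speaking over typing."""
    lowest = min(speaking) / max(typing)   # slow speaker vs fast typist
    highest = max(speaking) / min(typing)  # fast speaker vs slow typist
    return lowest, highest


def minutes_needed(words, wpm):
    """Minutes needed to produce a document of `words` at `wpm`."""
    return words / wpm


low, high = speedup_range(TYPING_WPM, SPEAKING_WPM)
print(f"Dictation is roughly {low:.1f} to {high:.1f} times faster")

# Hypothetical 1,000-word document, best case for each method.
print(f"Typing:    {minutes_needed(1000, max(TYPING_WPM)):.1f} minutes")
print(f"Dictating: {minutes_needed(1000, max(SPEAKING_WPM)):.1f} minutes")
```

On these figures the speed-up falls between about 3.5 and 4.2 times, which is in line with the "up to four times" estimate, and a 1,000-word document drops from about 25 minutes of typing to a little over 6 minutes of dictation.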

So why not add OutSec as a business continuity option for your business? Accounts are free and you pay per minute (rounded to the nearest minute) on a pay-as-you-go basis, with no contracts or minimum spend. What do you have to lose? Why not open an account today!

Picture Attribution:

Ai Brain Vectors by Vecteezy