Re-visiting Captions and Automatic Speech Recognition results.

YouTube video in German with automatic speech recognition captions translated into English https://youtu.be/2c9qaNqSOS0

Around this time last year (November 2021) we discussed the challenges that are linked to the evaluation of captions generated by the use of Automatic Speech Recognition (ASR) in two blogs ‘Collaboration and Captioning’ and ‘Transcripts from Captions?’

We mentioned other methods in comparison to word error rates where any mistakes in the caption are caught and provided as a percentage. This does not take into account how much is actually understandable or whether the output would make sense if it was translated or even used in a transcript.

“Word Error Rate (WER) is a common metric for measuring speech-to-text accuracy of automatic speech recognition (ASR) systems. Microsoft claims to have a word error rate of 5.1%. Google boasts a WER of 4.9%. For comparison, human transcriptionists average a word error rate of 4%. “

Does Word Error Rate Matter? HELENA CHEN Jan 2021

Helena Chen in her blog goes on to explain how to calculate WER

Word Error Rate = (Substitutions + Insertions + Deletions) / Number of Words Spoken

Where errors are:

  • Substitution: when a word is replaced (for example, “shipping” is transcribed as “sipping”)
  • Insertion: when a word is added that wasn’t said (for example, “hostess” is transcribed as “host is”)
  • Deletion: when a word is omitted from the transcript (for example, “get it done” is transcribed as “get done”)”

She describes how acoustics and background noises affect the results, although we may be able to cope with a certain amount of distraction and understand what is being said. She highlights issues with homophones and accents that may fail with ASR, as well as cross talk where ASR can omit one speaker’s comments, that could affect the results. Finally, there is the complexity of the content and as we know with STEM subjects and medicine this may mean specialist glossaries are needed.

Improvements have been made by the use of natural language understanding with the use of Artificial Intelligence (AI). However, it seems that as well as the acoustic checks, we need to delve into the criteria needed for improved comprehension and this may be to do with the consistency of errors that can be omitted in some situations. For an example a transcript of a science lecture may not need all the lecturer’s ‘ums’ and ‘aahs’, unless there is the need to add an emotional feel to the output. These would be counted as word errors, but actually do not necessarily help understanding.

There may also be the need to flag up the amount of effort needed to understand the captions. This involves the need to check output for automatic translations, as well as general language support.

Image by Katrin B. from Pixabay

Roof us, the dog caught the bull and bought it back to his Mrs

Dak ons, de hond ving de stier en kocht hem terug aan zijn Mrs  

Babelbar for reading aloud and changing the look of Facebook, Twitter and Google docs

“Babelbar works with Facebook, Twitter and Google docs.  It is useful if you do not have your own text to speech program.”

babelbar

 

google docs

Babelbar is an extension or add-on for internet Explorer, Firefox, Chrome and Safari and appears over the web page.  You need to highlight the text first then it will speak.  It can change the colour background, font size and translate text.

 

 

 

Attendee at Accessing the Higher Ground Conference

Google translator for unknown words

Google translate

English is not my first language so I use Google translator to help me quickly find the translation for unknown words.  It is not always right but you can use the dictionary and there is way of listening to the word.  I sometimes put the word back into the left side to see what happens – Google can be set to automatically recognise the language you want and remembers your choice so when you return to the page it is very quick.

You can also add Google translate as a bookmark just for your language.

Web Science Student