Moving on with Transcripts

Image: Laptop and notepad on the laps of students in a lecture

Over the years researchers have shown that it is possible to have live, interactive, highlighted transcripts without the character or line restrictions that captions require. This is only possible when using technology, but with many more students using tablets and mobile phones during lectures it is surprising to find how few lecture capture systems offer this option.

It has been shown that physically writing notes by hand can aid retention, and using laptops and similar devices in lectures means there is access to all the other distractions such as social media and emails! However, having an interactive transcript available allows key points to be selected, and annotation improves retention for those students who find it hard to take notes, whether by hand or using technology (Wald, 2018).

Systems that also offer transcript annotation linked to the presentation slides, integrated with the ability to make personal notes alongside the synchronised text, are hard to find. Correcting words as you hear or see them can also be difficult where there are subject complexities.

As was described in our last blog, the corrections needed tend to be measured by different forms of accuracy, whether by the number of incorrect words, omissions or substitutions. Further work on the NLive transcript has also shown that, where English is not a first language, those manually making corrections may falter when contractions and the conditional tense are used, and if the speaker is not a fluent English speaker, corrections can take up to five times longer (according to a recent discussion held by the Disabled Students’ Commission on 6th December).

Difficulties with subject-related words have been addressed by CaptionEd with related glossaries, as is the case with many specialist course captioning offerings where companies have been employed to provide accurate outputs. Other tools, such as Otter.ai and Microsoft Teams, automatically offer named speaker options, which is also helpful.

Professor Mike Wald has produced a series of interesting figures as a possible sample of what can happen when students just see an uncorrected transcript, rather than actually listen to the lecture. This is important as not all students can hear the lecture or even attend in person or virtually. It is also often the case that the transcript of a lecture is used long after the event. The group of students he was working with six years ago found that:

  • Word Error Rate counts all errors (deletions, substitutions and insertions, in the classical scientific way used by speech scientists): WER was 22% for a 2,715-word transcript. (A minimal sketch of how this is typically calculated appears after this list.)
  • Concept Error Rate counts errors of meaning: this was 15% assuming previous knowledge of content (i.e. ignoring errors that would be obvious if the student knew the topic) but 30% assuming no previous knowledge of content.
  • Guessed error rate counts errors AFTER the student has tried to correct the transcript by ‘guessing’ whether words have errors or not: there was little change in Word Error Rate, as words guessed correctly were balanced by words guessed incorrectly (i.e. correct words that the student thought were incorrect and changed).
  • Perceived error rate asks the student to estimate the percentage of errors: student readers’ perception of Word Error Rate varied from 30% – 50% overall and 11% – 70% for important/key words; readers thought there were more errors than there really were and so found it difficult and frustrating.
  • Key Errors (i.e. errors that change meaning/understanding) were 16% of the total errors and would therefore only require 5 corrections per minute to improve the Concept Error Rate from 15% to 0% (the speaking rate was 142 wpm and there were approximately 31 errors per minute), but it is important to note that this only improves the scientifically calculated Word Error Rate from 22% to 18%.
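For those interested in how the headline figure in the first bullet is typically arrived at, here is a minimal sketch of the classical Word Error Rate calculation: the automatic transcript is aligned word by word against a reference transcript, and the number of substitutions, deletions and insertions is divided by the number of reference words. The sentences and the word_error_rate function below are illustrative only; they are not taken from the study data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Classical WER: (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                                  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                                  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,           # deletion
                          d[i][j - 1] + 1,           # insertion
                          d[i - 1][j - 1] + cost)    # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# Illustrative sentences (not from the lecture data):
reference = "students can annotate the synchronised transcript during the lecture"
hypothesis = "students can annotate the synchronised transcripts during lecture"
print(f"WER: {word_error_rate(reference, hypothesis):.0%}")
# One substitution and one deletion in 9 reference words ≈ 22%,
# the same headline figure reported for the 2,715-word transcript above.

# Reproducing the arithmetic in the last bullet:
errors_per_minute = 142 * 0.22                       # ≈ 31 errors per minute at 142 wpm
key_errors_per_minute = errors_per_minute * 0.16     # ≈ 5 corrections per minute
```

Because WER weights every word equally, fixing only the key errors barely moves it, which is why the last bullet shows the Concept Error Rate falling to 0% while the Word Error Rate only drops from 22% to 18%.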

This is such an important challenge for many universities and colleges at the moment, so to follow on from this blog you may be interested to read the transcript of the Disabled Students’ Commission Roundtable debate held on 6th December. One of the summary comments highlighted the importance of getting the technology right as well as providing manual support, but overriding all of this was the importance of listening to the student voice.

Finally, if you ever wonder why speech recognition for automated captioning and transcription still fails to work for us all, have a look at a presentation by Speechmatics about AI bias, inclusion and diversity in speech recognition. It is an interesting talk about word error rates, AI and building models from many hours of audio with different phonetic structures, to develop language models that are more representative of the voices heard across society.

Guidance for captioning rich media from Advance HE (26/02/2021)