Over the years, captioning has become easier for non-professionals, with guidance available on many platforms, including YouTube, and in a blog post from Amara, “Free Tools & Subtitle Software to Make Your Video Captioning Process Easier”. This does not mean that we are good at it, nor that it does not take time! However, artificial intelligence (AI) and speech recognition can help with the process.
Nevertheless, as Professor Mike Wald said only 11 months ago in an article titled “Are universities finally complying with EU directive on accessibility?”: “Finding an affordable way to provide high quality captions and transcripts for recordings is proving very difficult for universities and using automatic speech recognition with students correcting any errors would appear to be a possible solution.”
The possibility of collaborating to improve the automated output at no cost is appealing, and we saw it happening with Synote over ten years ago! AI alongside speech recognition has improved accuracy, and Verbit advertise their offerings as being “99%+Accuracy”, but sadly do not provide prices on their website!
Meanwhile, Blackboard Collaborate, as part of its ‘Ultra experience’, offers attendees the chance to collaborate on captioning when managed by a moderator, although at present it does not include automated live captioning. Many services, such as Otter.ai, can be added to online video meeting platforms to support captioning. Developers can also make use of options from Google, Microsoft, IBM, and Amazon. TechRepublic describes 5 speech recognition apps that auto-caption videos on mobiles. Table 1 shows the options available in three platforms often used in higher education.
Caption Options | Zoom | Microsoft Teams | Blackboard Collaborate |
Captions – automated | Yes | Yes | Has to be added |
Captions – live manual correction | When set up | When set up | When set up |
Captions – live collaborative corrections | No | No | No |
Captions Text Colour adaptations | Size only | Some options | Set sizes |
Caption Window resizing | No | Suggested, not implemented | Set sizes |
Compliance – WCAG 2.1 AA | Yes | Yes | Yes |
It is important to ask: if automated live captioning is used with collaborative manual intervention, who is checking the errors? Automated captioning is only around 60–80% accurate, depending on content complexity, audio quality and speaker enunciation. Even 3Play Media, in an article on “The Current State of Automatic Speech Recognition”, admits that human intervention is paramount when total accuracy is required.
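For anyone wanting to check accuracy claims against their own recordings, one common way to put a number on caption accuracy is word-level accuracy (one minus the word error rate), comparing the automated output with a human-corrected transcript. The sources quoted here do not say how their figures were calculated, so the sketch below is only an assumption about a reasonable method; the function name `word_accuracy` is made up for illustration, and punctuation is left attached to words for simplicity.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Return word-level accuracy (1 - word error rate), floored at 0.

    reference:  the human-corrected transcript
    hypothesis: the automated caption text
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # Word-level edit distance, keeping only one row of the table at a time.
    # prev[j] = edits needed to turn the reference words seen so far into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from captions
                curr[j - 1] + 1,     # extra word inserted in captions
                prev[j - 1] + cost,  # match, or one word substituted
            ))
        prev = curr

    word_error_rate = prev[-1] / len(ref)
    return max(0.0, 1.0 - word_error_rate)


if __name__ == "__main__":
    corrected = "the quick brown fox jumps over the lazy dog today"
    automated = "the quick brown box jumps over the hazy dog today"
    print(f"{word_accuracy(corrected, automated):.0%}")
```

With two of ten words wrong, this example lands at the top of the 60–80% range mentioned above, which gives a feel for how many corrections a viewer or editor would face in every sentence.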
The recent ‘Guidance for captioning rich media’ for Advance HE highlights the fact that the Web Content Accessibility Guidelines 2.1 (AA) require “100% accurate captioning as well as audio description.” They acknowledge the cost entailed, but perhaps this can be reduced with the increasing accuracy of automated processes in English, with error correction completed through expert checks. It also seems to make sense to require those who have the knowledge of a subject to take more care when the initial video is created! This is suggested alongside the Advance HE good practice bullet points, such as:
“…ensure the narrator describes important visual content in rich media. The information will then feature in the captions and reduces the need for additional audio description services, benefiting everyone.”
Let’s see how far we can go with these ideas – expert correctors, proficient narrators and willing student support!