Automated Captioning for Enhanced Education

Aims and Scope of Project

Aims and Scope of Project: Elaboration

Our proposed automated captioning project and research area seeks to deliver efficient and cost-effective captioning of audio and video materials, in order to enhance learning for everyone and to increase the availability and usability of both new and already produced or archived content.

Transcription is one of the bottlenecks in producing high-quality captioning. High-quality manual transcription can cost up to $200 an hour, depends on the availability of contracted stenographers, and is prone to errors when transcriptionists are unfamiliar with specialized vocabularies.

Our proposed project investigates the potential for accurate transcription using voice recognition software. It also proposes workflows for producing and delivering captioned content and suggests possible "best practices" delivery methods, while making use of available campus technologies and resources.

We would like to begin work on the project over Summer 2006. We hope to attract a few instructors who are willing to participate and to have some of their in-class presentations recorded in audio and video. We will experiment to determine the most effective way to use voice recognition software to generate easily editable transcriptions of presentations. In workshops this Summer and in the Fall, the WAC will experiment both with recording for later automated transcription and with live captioning.

We will look at workflows for moving from transcription to captioned content and try to hone one or more model solutions, leveraging local resources such as OSU Streaming Media.

The project group will investigate possibilities for searching captioned archives and try out a number of templates to determine an optimal web delivery format. Development and testing of delivery formats can begin shortly. Search capabilities and results delivery methods will take more time to develop.

Alongside the in-house solutions mentioned above, the group will try to solicit interest in hosting a trial deployment of IBM ViaScribe. We hope to bring an IBM representative to campus in June 2006. We believe it makes sense to test in-house and vendor solutions at the same time so that we can weigh their costs and benefits.

Why Automated Captioning?

Why Now?

How Much?

How much will it cost? This is a difficult question to answer. A pilot test run in Spring quarter 2006 had mixed results. Ultimately, the pilot was cheaper than paying a service for transcription and production of captioned video, but only slightly: all labor and materials for three hours of captioned video cost $800. Professionally transcribed and produced, the same video would have cost a minimum of $900 and probably closer to $1200. As our methods for recording, transcribing, editing, time-coding, and delivering settle into a rigorous workflow, we expect these costs to go down significantly. There is also the possibility of incorporating a vendor solution. The initial costs would be high if we went this route, but over time the reduced student, faculty, or staff labor could make it economically convincing. It is also likely that the more polished, less labor-intensive vendor solution would see a greater rate of adoption.
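To make the comparison concrete, the pilot figures can be reduced to a per-hour cost and a percentage saving. The sketch below uses only the dollar amounts quoted above; the function and constant names are our own and purely illustrative.

```python
# Figures from the Spring 2006 pilot described above.
PILOT_HOURS = 3                # hours of captioned video produced
PILOT_COST = 800               # all in-house labor and materials, in dollars
PROFESSIONAL_COST_LOW = 900    # quoted minimum for the same three hours
PROFESSIONAL_COST_HIGH = 1200  # the more likely professional cost


def per_hour(total_cost, hours):
    """Return the cost per hour of captioned video."""
    return total_cost / hours


def savings_percent(in_house, professional):
    """Return the in-house saving as a percentage of the professional cost."""
    return 100 * (professional - in_house) / professional


if __name__ == "__main__":
    print(f"In-house:     ${per_hour(PILOT_COST, PILOT_HOURS):.2f} per hour")
    print(f"Professional: ${per_hour(PROFESSIONAL_COST_LOW, PILOT_HOURS):.2f} "
          f"to ${per_hour(PROFESSIONAL_COST_HIGH, PILOT_HOURS):.2f} per hour")
    print(f"Saving:       {savings_percent(PILOT_COST, PROFESSIONAL_COST_HIGH):.1f}% "
          f"down to {savings_percent(PILOT_COST, PROFESSIONAL_COST_LOW):.1f}%")
```

On these numbers the pilot cost roughly $267 per captioned hour against $300 to $400 professionally, a saving of between about 11% and 33%.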

How much time and effort will be required to produce the captioning? We don't know yet, but an ideal goal would be two hours of transcription editing for each hour of in-class presentation.

An in-house solution would require significantly more manual labor than a vendor solution, both from the lecturer and in post-lecture editing and media file creation. For this reason, we believe adoption, continued usage, and user satisfaction would likely be greater with a vendor solution than with an in-house solution. For example, IBM ViaScribe, which we consider on the vendor solutions page, synchronizes captioning with audio, presentation, and video at the time of lecture delivery. This radically cuts post-lecture work for the lecturer. Production of the media files for delivery on the web (or elsewhere) is also almost fully automated.

How Might We Proceed?

IBM's ViaScribe licensing model is based on hours of captioned output. Pricing schedules are negotiated per enterprise. IBM claims it will significantly undercut the cost of other equivalent solutions. Hence, we believe it would be a good idea to pilot ViaScribe simultaneously with our in-house solutions to determine the real costs of each.

As we mentioned, an in-house solution could be piloted to come up with a rigorous workflow. Staff would be allocated to run the project. It would probably make the most sense to shadow a small cohort of instructors, attempting to caption all or a significant portion of their lectures and/or workshops over a quarter. This would produce data from which we could extrapolate final costs, primarily in terms of staff hours, because software costs are negligible for in-house transcription and captioning solutions. Dragon Preferred Edition (Windows) and iListen (Mac), the voice recognition software that would be used for automated transcription, both retail for less than $200, and education and bulk purchase discounts are available for Dragon. The free tool MAGpie (from NCAM) could be used to generate the QuickTime timecode, SMIL, and SAMI files used for delivering content. QuickTime Pro, which is required for use of MAGpie, retails for $30.
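For readers unfamiliar with these caption formats, a SAMI file is plain text in which each caption sits in a SYNC block stamped with its start time in milliseconds. The following is a minimal sketch of that structure, written by hand rather than taken from MAGpie; the function name, the ENUSCC class name, and the sample captions are illustrative assumptions, and the files MAGpie actually produces may differ in detail.

```python
from html import escape


def to_sami(captions, title="Lecture captions"):
    """Build a minimal SAMI document from (start_ms, text) pairs."""
    lines = [
        "<SAMI>",
        "<HEAD>",
        f"<TITLE>{escape(title)}</TITLE>",
        '<STYLE TYPE="text/css"><!--',
        # ENUSCC is an arbitrary class name chosen for this sketch.
        ".ENUSCC { Name: English; lang: en-US; SAMIType: CC; }",
        "--></STYLE>",
        "</HEAD>",
        "<BODY>",
    ]
    # Emit captions in time order; Start is in milliseconds.
    for start_ms, text in sorted(captions):
        lines.append(
            f"<SYNC Start={start_ms}><P Class=ENUSCC>{escape(text)}</P></SYNC>"
        )
    lines += ["</BODY>", "</SAMI>"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical captions, not taken from any real lecture.
    print(to_sami([(0, "Welcome."), (4000, "Today: cells & tissues.")]))
```

Because the format is this simple, an edited transcript with time codes could in principle be converted to a deliverable caption file by a short script as well as by hand in MAGpie.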

Search, one of the key benefits of the captioning project, would likely need to be a research project within the larger captioning project. OSU would need to author its own search feature, as we have yet to see any commercial product that would suffice. In industry, both Google and Yahoo! have captioned content search initiatives. Neither links to specific points within the video or audio content, however. Pinpoint keyword search (a search that deep-links into a video or audio file so that playback starts at the searched-for keywords) is highly desirable, and we know it is technically possible. In addition, video and audio stored in OSU archives would have associated searchable metadata. The cost of such functionality would be measured in development time.
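The idea behind pinpoint search can be sketched briefly: because every caption carries a start time, an index from words to caption start times is enough to produce links that begin playback at the keyword. The sketch below is an illustration under stated assumptions, not a proposed design; the media identifiers, the sample captions, and the URL scheme (media.example.org with id and start parameters) are all hypothetical.

```python
import re
from collections import defaultdict


def build_index(captions):
    """Map each lowercased word to the (media_id, start_seconds) captions containing it."""
    index = defaultdict(list)
    for media_id, start_seconds, text in captions:
        # Use a set so a word repeated within one caption is indexed once.
        for word in set(re.findall(r"[a-z0-9']+", text.lower())):
            index[word].append((media_id, start_seconds))
    return index


def search(index, keyword):
    """Return deep links for every caption that contains the keyword."""
    hits = sorted(index.get(keyword.lower(), []))
    # Hypothetical URL scheme; a real media player would define its own.
    return [f"https://media.example.org/play?id={m}&start={s}" for m, s in hits]


if __name__ == "__main__":
    # Hypothetical captions: (media id, start time in seconds, caption text).
    captions = [
        ("bio101-wk1", 0, "Welcome to Biology 101."),
        ("bio101-wk1", 95, "Mitosis has four phases."),
        ("bio101-wk2", 30, "Recall mitosis from last week."),
    ]
    for link in search(build_index(captions), "mitosis"):
        print(link)
```

A production feature would also need phrase matching, ranking, and a player that honors the start offset, which is where most of the development time would go.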

ViaScribe output that targets Internet Explorer and Windows Media Player lets a user highlight a passage in the caption text and have the audio skip to that point. This functionality would likely make implementing pinpoint search relatively easy. However, implementing it only for IE users is problematic.