Automated Captioning for Enhanced Education
Aims and Scope of Project
- Help to solve the transcription bottleneck in production of high-quality captioned content through use of voice recognition technology
- Create flexible and effective workflows for the production of captioned content, leveraging OSU resources and minimizing the imposition on faculty, staff, and other content providers
- Promote "best practices" for delivery of captioned content—on the web, on static media, etc.
- Create highly effective methods for searching captioned content
Aims and Scope of Project: Elaboration
Our proposed automated captioning project and research area seeks to deliver efficient and cost-effective captioning of audio and video materials, in order to enhance learning for everyone and increase the availability and usability of both new and previously produced or archived content.
Transcription is one of the bottlenecks in producing high-quality captioning. High-quality manual transcription can cost up to $200 an hour, depends on the availability of contracted stenographers, and is prone to errors when transcriptionists are unfamiliar with specialized vocabularies.
Our proposed project investigates the potential for accurate transcription via use of voice recognition software. It also proposes workflows for production and delivery of captioned content and suggests possible "best practices" delivery methods, while making use of available campus technologies and resources.
We would like to do initial work on the project over Summer 2006. We hope to attract a few instructors who are willing to participate and have some of their in-class presentations audio and video recorded. We will experiment to determine the most effective way to use voice recognition software to generate easily editable transcriptions of presentations. In workshops this Summer and in the Fall, the WAC will experiment with both recording for later automated transcription and live captioning.
We will look at workflows for moving from transcription to captioned content and try to hone one or more model solutions, leveraging local resources such as OSU Streaming Media.
The project group will investigate possibilities for searching on captioned archives and try out a number of templates to determine an optimal web delivery format. Delivery formats can begin to be developed and tested shortly. Search capabilities and results delivery methods will take more time to develop.
Simultaneous with the in-house solutions mentioned above, the group will try to solicit interest in hosting a trial deployment of IBM ViaScribe. We hope to bring an IBM representative to campus in June, 2006. We believe it makes sense to simultaneously test in-house and vendor solutions to weigh costs and benefits.
Why Automated Captioning?
- Increased Access to Educational Materials
- For deaf and hearing-impaired instructors and learners
- For learners in noisy venues
- For learners in venues where quiet must be maintained
- For "multi-modal" learners
- For enhancing language acquisition among non-native and low-literacy learners
- For traditional students who miss course content or who inaccurately record or misplace notes
- Study aid for all students
- Improved Usability: Searchable and Indexed Content
- Text transcripts can be indexed and linked to video and audio clips, maximizing availability of educational resources both inside and outside of the university
- Search query results can link to specific phrases in audio and video content, allowing users to pinpoint and play parts of particular interest within longer clips or search on meta-data descriptions of video or audio content
- Maximized Search Engine Optimization (SEO)
- Within the university, this means greater accuracy in searching for materials
- Outside the university, this means greater Return on Investment (ROI), as educational materials become more efficiently available to a wider audience
- The "Right Thing To Do"
- Provides essential accommodation for people with hearing impairments
- Satisfies legal and policy requirements for accessibility of educational materials on campus and for distance learning
- Takes Advantage of Underutilized Resources
- Captioning of archived video and audio materials. Automated transcription and development of rigorous captioning workflows and delivery would make it feasible to caption, over time and with established priorities, the vast library of uncaptioned video, audio, and film in the OSU archives. OSU Streaming services estimates that less than 4% of the material on the streaming server is captioned. Making this material available to search could be invaluable to research and would improve re-use of learning objects.
Why Now?
- Captioning is a crucial delivery mode as the electronic delivery of educational materials grows (think: "Podcasting") and distance learning becomes more prevalent
- Captioning is a key element in the Social Responsibility (SR) profile of the university, increasing the visibility of OSU as a premier institution
- Captioning enhances OSU's reputation as a technology innovator
How Much?
How much will it cost? This is a difficult question to answer. A pilot test run in Spring quarter 2006 had mixed results. Ultimately, the pilot was cheaper than paying a service for transcription and production of captioned video, but only slightly cheaper: all labor and materials for three hours of captioned video cost $800. Professionally transcribed and produced, the same would have cost a minimum of $900 and probably closer to $1200. As our methods for recording, transcribing, editing, time-coding, and delivering are formalized into a workflow, these costs will go down significantly. There is also the possibility of incorporating a vendor solution. The initial costs would be high if we went this route, but, over time, the reduction in student, faculty, or staff labor could make it economically convincing. It is also likely that there would be a greater rate of adoption with the more polished, less labor-intensive vendor solution.
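To put the pilot figures on a per-hour basis, a small worked calculation may help. The sketch below uses only the numbers quoted above ($800 in-house, $900 to $1200 for a professional service, three hours of captioned video); the function name is hypothetical and chosen for illustration.

```python
def cost_per_hour(total_cost, hours):
    """Return the cost per hour of captioned video."""
    return total_cost / hours


HOURS = 3  # hours of captioned video in the Spring 2006 pilot

in_house = cost_per_hour(800, HOURS)       # all labor and materials, in-house
vendor_low = cost_per_hour(900, HOURS)     # minimum professional estimate
vendor_high = cost_per_hour(1200, HOURS)   # more likely professional estimate

print(f"In-house pilot: ${in_house:.2f} per hour")
print(f"Professional service: ${vendor_low:.2f} to ${vendor_high:.2f} per hour")
print(f"Savings: ${vendor_low - in_house:.2f} to ${vendor_high - in_house:.2f} per hour")
```

On these figures the in-house pilot comes to roughly $267 per captioned hour, against $300 to $400 for a professional service.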
How much time and effort will be required to produce the captioning? We don't know yet. But an idealized goal would be two hours of transcription editing for each hour of in-class presentation.
An in-house solution would require significantly more manual labor than a vendor solution, both from the lecturer and in post-lecture editing and media file creation. For this reason, we believe adoption, continued usage, and user satisfaction would likely be greater with a vendor solution than with an in-house solution. For example, IBM ViaScribe, which we consider on the vendor solutions page, synchronizes captioning with audio, presentation, and video at the time of lecture delivery. This radically cuts post-lecture work for the lecturer. Also, production of the media files for delivery on the web (or elsewhere) is almost fully automated.
How Might We Proceed?
IBM's ViaScribe licensing model is based on hours of captioned output. Pricing schedules are negotiated per enterprise. IBM claims it will significantly undercut the cost of other equivalent solutions. Hence, we believe it would be a good idea to pilot ViaScribe simultaneously with our in-house solutions to determine the real costs of each.
As we mentioned, an in-house solution could be piloted to come up with a rigorous workflow. Staff would be allocated to run the project. It probably would make the most sense to shadow a small cohort of instructors, attempting to caption all or a significant portion of their lectures and/or workshops over a quarter. This would produce data from which we could extrapolate final costs, primarily in terms of staff hours, since software costs are negligible for in-house transcription and captioning solutions. Dragon Preferred Edition (Windows) and iListen (Mac) voice recognition software, which would be used for automated transcription, both retail for less than $200, and education and bulk purchase discounts are available for Dragon. The free tool MAGpie (from NCAM) could be used for generating QuickTime timecode files, SMIL files, and SAMI files for delivering content. QuickTime Pro, required for use of MAGpie, retails for $30.
Search, one of the key benefits of the captioning project, would likely need to be a research project within the larger captioning project. OSU would need to author its own search feature. We have yet to see any commercial products that would suffice. In industry, both Google and Yahoo! have captioned content search initiatives. Neither links to specific points within the video or audio content, however. Pinpoint keyword search (search that deep-links into a video or audio file to start at the searched-for keywords) is highly desirable. We know it is technically possible, as well. In addition, video and audio stored in OSU archives would have associated searchable meta-data. Cost for such functionality would be measured in development time.
ViaScribe output that targets Internet Explorer and Windows Media Player allows a user to highlight a passage in the caption text and have the audio skip to that point. This functionality would likely make implementing pinpoint search relatively easy. However, implementing it only for IE users is problematic.
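As a rough illustration of how pinpoint keyword search could work in principle, the sketch below builds a simple word index over time-coded caption segments and returns the clip and start time for each match. It is a minimal sketch under stated assumptions, not a design for the OSU search feature: the caption data, clip identifiers, and function names are all hypothetical, and a real system would also need stemming, phrase matching, and a player that can seek to the returned time.

```python
import re
from collections import defaultdict

# Hypothetical time-coded caption segments: (clip id, start time in seconds, text).
CAPTIONS = [
    ("lecture-01", 0.0, "Welcome to the introduction to soil science."),
    ("lecture-01", 42.5, "Soil texture depends on sand, silt, and clay."),
    ("lecture-02", 10.0, "Today we return to clay minerals."),
]


def build_index(captions):
    """Map each lowercased word to the (clip, start time) pairs where it is spoken."""
    index = defaultdict(list)
    for clip_id, start, text in captions:
        # Use a set so a word repeated within one segment is indexed once.
        for word in set(re.findall(r"[a-z']+", text.lower())):
            index[word].append((clip_id, start))
    return index


def search(index, keyword):
    """Return deep-link targets for a keyword, ordered by clip and then by time."""
    return sorted(index.get(keyword.lower(), []))


index = build_index(CAPTIONS)
for clip_id, start in search(index, "clay"):
    print(f"{clip_id} at {start:.1f}s")
```

Each result pairs a clip with an offset, which is the information a player would need in order to begin playback at the searched-for keyword rather than at the start of the file.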