Enterprise-Level Vendor Solution: IBM ViaScribe
IBM ViaScribe is an enterprise-level automated media captioning solution. The instructor voice-trains ViaScribe, much as one would train Dragon or another speech-to-text program, and then uses it in the classroom. Its voice recognition can also be trained to understand archived audio and video materials, potentially allowing for transcription and captioning of OSU's extensive video and audio lecture archives.
Off the shelf, ViaScribe can output PowerPoint presentations into web pages, with synchronized and sequentially highlighted captioning. As it transcribes, it automatically breaks recognized speech into chunks appropriate for display within caption fields. Output of synced presentations to web format and to a variety of media player formats occurs with a "touch of a button."
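To make the idea of caption chunking concrete, here is a minimal sketch in Python. It is not ViaScribe's algorithm, which we have not seen. It assumes the recognizer supplies each word with start and end times in seconds, and the names `Cue`, `chunk_words`, `max_chars`, and `max_gap` are hypothetical. A new caption starts whenever the text would overflow the caption field or the speaker pauses.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical input shape: (word, start_seconds, end_seconds)
TimedWord = Tuple[str, float, float]


@dataclass
class Cue:
    start: float  # seconds from the beginning of the recording
    end: float
    text: str


def _flush(group: List[TimedWord]) -> Cue:
    """Turn a run of timed words into one caption cue."""
    return Cue(group[0][1], group[-1][2], " ".join(w for w, _, _ in group))


def chunk_words(words: List[TimedWord], max_chars: int = 32,
                max_gap: float = 1.0) -> List[Cue]:
    """Group timed words into cues that fit a caption field.

    A cue is closed when adding the next word would exceed max_chars,
    or when the pause before the next word is longer than max_gap seconds.
    """
    cues: List[Cue] = []
    current: List[TimedWord] = []
    for word in words:
        if current:
            candidate = " ".join(w for w, _, _ in current) + " " + word[0]
            pause = word[1] - current[-1][2]
            if len(candidate) > max_chars or pause > max_gap:
                cues.append(_flush(current))
                current = []
        current.append(word)
    if current:
        cues.append(_flush(current))
    return cues
```

Because each cue carries its own start and end time, the same list could in principle drive both the caption display and the sequential highlighting described above.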
Purchase of ViaScribe is usually a two-step process. First, the institution buys a limited-time license with a restricted number of physical deployments (for example, a six-month license with three or four software seats). The institution uses this period to assess the utility and adoption of the software. If the institution approves purchase of a longer-term license, terms are negotiated with IBM. IBM charges per hour of captioned output, and it negotiates that rate along with other service and equipment fees in an attempt to undercut the cost of competing solutions.
Advantages Over In-House Solution
- Automated synchronization. Chunking of transcription and synchronization of transcript to video or audio is handled by the program without after-the-fact manual syncing.
- Simpler training. Editing and training tools appear simpler to use than those in Dragon. However, this may be functionality borrowed from IBM's speech recognition software, ViaVoice. (We have not tested ViaVoice yet. This point will be updated to reflect our findings.)
- Fully automated export. The export of result files is completely automated, and ViaScribe exports to a very wide variety of media player and other formats. The ability to export directly into a web-viewable "presentation" format is available only with ViaScribe (see, in the sidebar, the unedited, frames-based slide show and ViaScribe output from the OSU demonstration given on June 19, 2006, by IBM researcher Guido Corona).
- Highlighted caption text. Synchronized highlighting of text is potentially beneficial to learners with attention disorders and reading disabilities, as well as being helpful to non-native speakers learning the language.
- Simplicity for end-user. ViaScribe is an all-in-one solution, from the perspective of the instructor. A Dragon-based or other voice recognition software-based solution would need to tie together a variety of applications.
- Short time to production. ViaScribe has a quicker time to production because of its high level of automation. This would probably increase adoption both by the lecturers who use ViaScribe to caption their presentations and by students and other content consumers, especially in cases where timeliness of content delivery is essential.
- Simplifies phrase/pinpoint search. The Internet Explorer-based presentation allows the user to simply highlight a passage in the caption content and immediately skip to that point in the presentation. This facility would likely make it easier to implement useful search features.
- Matching. The latest versions of ViaScribe include a feature, which IBM is still refining, that syncs an existing transcript with an untrained voice track (from a video or audio file) after the fact. The feature performs "matching", as opposed to recognition, of speech. This could be highly useful in situations where a transcript is available prior to the presentation: for example, conference papers and other prepared speeches, or films for which a transcription exists.
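The "matching" idea in the last point can be illustrated with a small Python sketch. This is not IBM's method, which is not documented here. It swaps in a simple word-level sequence alignment from Python's standard `difflib` module, and it assumes a rough recognition pass has already produced words with start times. The names `normalize` and `match_transcript` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def normalize(word: str) -> str:
    """Lowercase a word and drop punctuation so comparisons are forgiving."""
    return "".join(ch for ch in word.lower() if ch.isalnum())


def match_transcript(
    transcript: List[str],
    recognized: List[Tuple[str, float]],
) -> List[Tuple[str, Optional[float]]]:
    """Copy start times from rough recognizer output onto a prepared transcript.

    transcript: the known, correct words (for example, a conference paper).
    recognized: (word, start_seconds) pairs from an error-prone recognition pass.
    Words that cannot be aligned keep a start time of None.
    """
    ref = [normalize(w) for w in transcript]
    hyp = [normalize(w) for w, _ in recognized]
    times: List[Optional[float]] = [None] * len(transcript)
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    for block in matcher.get_matching_blocks():
        # Each block is a run of words that agree in both sequences.
        for offset in range(block.size):
            times[block.a + offset] = recognized[block.b + offset][1]
    return list(zip(transcript, times))
```

The point of the sketch is that the prepared transcript supplies the text while the audio only has to supply timing, so misrecognized words leave a gap in the timestamps rather than an error in the caption.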
Disadvantages Compared to In-House Solution
- Cost. ViaScribe would cost more initially than an in-house solution. (In the long run?)
- Some features optimized only for IE. Current web-based presentation output with highlighting is an IE-only proposition, cutting out most Mac and Linux users. (They would access the content through a different player, such as QuickTime or Real.) IBM says they will work to target other browsers in subsequent releases.
Other Potential Problems
- Best for particular pedagogical modes. This concern applies equally to all solutions rooted in voice recognition, but because voice recognition is central to the ViaScribe product, we list it here. Voice recognition is most appropriate for a certain style of lecturing: the software tends to do best when the speaker speaks slowly and clearly. In many cases, this is an idealization of the reality of lecturing. Also, when audience members ask questions, the speaker must repeat each question so that voice recognition can capture it. Though there are certainly pedagogical advantages to this style, it may be unrealistic to expect it as a common mode of delivery, and it does not fit all teaching styles.
- Optimal for use with PowerPoint-based lectures. One of the major benefits of ViaScribe is its ability to work with PowerPoint, but not all lectures are delivered in this fashion. Dragon NaturallySpeaking, iListen (for Macintosh), and IBM's own ViaVoice all do the recognition part equally well. In fact, ViaVoice is the recognition engine for ViaScribe. Considered in that light, the real value-adds of ViaScribe are its ability to synchronize captioning, sync with PowerPoint, and automate output to the various media players. These are potentially big value-adds, but they need to be weighed against the cost of the software.
- Training commitment. Like any voice recognition software, ViaScribe must be trained to be truly effective, not only to the speaker's voice but also to "terms of art." ViaScribe would need to learn lexicons appropriate to the lecturer's subject matter. This requires a significant time commitment from the lecturer.
- Too much automation? To some degree, automating the process of captioning distances the producer from her product. If everything is seemingly handled by the tool, the producer may be less likely to examine the end result closely. After-the-fact editing will be a reality for almost all uses of ViaScribe, as it would be for any other solution. Will ViaScribe users be willing to perform the necessary corrections, or will they expect more from the tool than it can in fact offer? In addition, the automatic (or seemingly automatic) mode of production may desensitize users and cause them to ignore usability and accessibility concerns. For example, cuing the viewer to non-speech information, such as speaker changes, background sounds, and pauses or silence, is very important to both usability and accessibility.