Process Overviews

Sketch of a Potential Workflow

(adapted from Eric Todd's "Captioning Streaming Media: Anticipating 508 Compliance Needs")

  1. A presentation is videotaped, or its audio alone is captured on a digital recorder. The presenter wears a noise-cancelling headset so that the audio is clean and of high quality.
  2. The presenter, before or after the presentation, trains Dragon using the same equipment as used for the presentation. (Consistent audio helps with the transcription process.)
  3. Video is edited and output to its final form. A resolution of 320x240 is often used. In our experience, the Sorenson codec gives the best video compression.
  4. A preliminary transcript is produced using Dragon. The transcript is then corrected, using the Dragon editing facilities. (Correcting inside of Dragon helps refine the voice profile so accuracy increases.)
  5. The transcript is put into some combination of MS Word, PDF, text, and HTML formats. HTML format is most easily read by screen readers and so should be a favored target format.
  6. The transcript is "chunked": line breaks are inserted into the text file to mark each screen of captioning. CPC automates this somewhat. We are also developing a simple "chunking" program.
  7. MAGpie or CPC is used to synchronize the captions with the SMPTE time code on the video or audio.
  8. MAGpie or CPC is used to export both SAMI and QT/Real timecode files.
  9. SMIL, ASX, and Flash-player files are created to be used by the various players. (SMIL, ASX, SAMI, and the Flash-player files are all XML or XML-like.)
  10. The ASX (for WMP), SMIL, and Flash player XML files "glue together" the time code and video files.
  11. The final files and meta-data file are uploaded to the server and the presentation is registered in the server's database.
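The "chunking" in step 6 can be sketched in a few lines. This is only an illustration, not the program mentioned above: the function name chunk_transcript and the defaults of 32 characters per line and 2 lines per screen are assumptions chosen for the example, and should be adjusted to whatever the captioning tool expects.

```python
import textwrap


def chunk_transcript(text, chars_per_line=32, lines_per_screen=2):
    """Split a transcript into caption "screens".

    chars_per_line and lines_per_screen are assumed defaults for this
    sketch, not values taken from MAGpie or CPC.
    """
    # Collapse all runs of whitespace, then wrap to the caption width.
    lines = textwrap.wrap(" ".join(text.split()), width=chars_per_line)
    # Group consecutive lines into screens of lines_per_screen lines each.
    return [
        "\n".join(lines[i:i + lines_per_screen])
        for i in range(0, len(lines), lines_per_screen)
    ]


def write_chunked(text, **kwargs):
    """Return the transcript with a blank line between caption screens."""
    return "\n\n".join(chunk_transcript(text, **kwargs))
```

Calling write_chunked on the corrected transcript yields a text file in which each blank-line-separated block is one screen of captioning, ready to be timed in the synchronization step.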

A Way To Implement Search

Jon Udell, writing about "MP3 Sound Bites" on the O'Reilly Network web site, discusses a means of loading audio files from some point beyond the beginning of the file, for example starting in the middle of a song or podcast.

An idea similar to Udell's could be used to implement search within an archive of captioned video and audio. A user would search on a phrase or word within the archive. Instead of simply returning the matching text snippet, the search engine would also return the captions at the timecodes immediately preceding and following it, so that each result appears in the context of the timecoded snippets surrounding the searched-for phrase. Below the text snippet, in which the searched-for words are highlighted, there would be one or more links to the movie or audio clip, cued to start playback at the timecode of the snippet.
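A minimal sketch of such a search follows, assuming the caption file has already been parsed into a list of (start time in seconds, caption text) pairs in playback order. The function name search_captions, the result keys, and the context parameter are all hypothetical names introduced for this example.

```python
def search_captions(captions, phrase, context=1):
    """Find captions containing phrase and return them with context.

    captions: list of (start_seconds, text) tuples in playback order
              (an assumed in-memory form of a parsed caption file).
    context:  number of captions to include before and after each match.
    """
    needle = phrase.lower()
    hits = []
    for i, (start, text) in enumerate(captions):
        if needle in text.lower():
            lo = max(0, i - context)
            hi = min(len(captions), i + context + 1)
            hits.append({
                # Cue playback at the first caption of the context window.
                "cue_start": captions[lo][0],
                # Timecode of the caption that actually matched.
                "match_start": start,
                # Matching caption joined with its neighbours.
                "snippet": " ".join(t for _, t in captions[lo:hi]),
            })
    return hits
```

Each hit carries both the text to display and the time at which a link to the clip should start playback. Note that this simple version only matches a phrase that falls entirely within one caption; a phrase split across two screens would need the captions to be joined before matching.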

In addition to the facility to search on the text contents of video or audio, the user would also be able to search on meta-data stored for each clip. Meta-data would include the title of the presentation or lecture, its date, its quarter, the speaker, course name and number, location, etc. This information could be stored either in a database or, perhaps more efficiently, in XML files in the same directory as the movie or audio files (one directory per clip). The search indexing engine would build its indexes from caption files and meta-data files.
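As an illustration of the XML option, the sketch below reads one per-clip meta-data file into a dictionary that an indexer could consume. The element names (clip, title, date, quarter, speaker, course, location) and the sample values are invented for this example; no particular schema is implied.

```python
import xml.etree.ElementTree as ET

# Hypothetical meta-data file; element names and values are placeholders.
SAMPLE_METADATA = """<clip>
  <title>Example Lecture Title</title>
  <date>YYYY-MM-DD</date>
  <quarter>Autumn</quarter>
  <speaker>Example Speaker</speaker>
  <course number="101">Example Course</course>
  <location>Example Hall</location>
</clip>"""


def load_metadata(xml_text):
    """Flatten a per-clip meta-data document into a dict of strings."""
    root = ET.fromstring(xml_text)
    # One dictionary entry per child element, keyed by tag name.
    meta = {child.tag: (child.text or "").strip() for child in root}
    # The course number is stored as an attribute in this assumed layout.
    course = root.find("course")
    if course is not None:
        meta["course_number"] = course.get("number", "")
    return meta
```

Because each clip lives in its own directory, an indexer could walk the archive, call load_metadata on each meta-data file it finds, and index the resulting fields alongside the caption text.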

The search implementation would need to do some parsing of the caption files to extract the captions for display. It would also produce, on the fly, a temporary, truncated caption file and the SMIL/SAMI and ASX files that are used for serving QuickTime, Real, and Windows Media content.

The pinpoint clip search might work like so:

  1. A server-side program determines the duration of the movie file from its associated caption file and maps the extracted start and end times to a byte range.
  2. The program uses the HTTP 1.1 Range header to ask the remote server to reach the appropriate distance into the requested movie file and stream back the indicated range.
  3. A file handle for this custom clip is inserted, on the fly, into a temporary SMIL or ASX file.
  4. The time code sections of interest in the caption file are extracted and their time stamps are rebased, assuming a start time of 00.00.00:00.
  5. A temporary caption file is generated and inserted into the temporary SMIL or ASX file.
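Steps 1, 2, and 4 above can be sketched as follows. This is a rough illustration under a strong assumption: the time-to-byte mapping treats the file as having a constant bit rate, which real movie files (with container headers and variable bit rates) generally do not, so a production version would need format-aware seeking. All function names here are hypothetical.

```python
def time_to_byte_range(start_s, end_s, duration_s, file_size):
    """Map a time window to an inclusive byte range.

    Assumes a constant bit rate across the whole file (a simplification).
    """
    if not 0 <= start_s < end_s <= duration_s:
        raise ValueError("need 0 <= start < end <= duration")
    bytes_per_second = file_size / duration_s
    first = int(start_s * bytes_per_second)
    # HTTP byte ranges are inclusive, so the last byte is one before the end.
    last = max(first, min(file_size, int(end_s * bytes_per_second)) - 1)
    return first, last


def range_header(first, last):
    """Build the HTTP/1.1 Range request header for the byte range."""
    return {"Range": f"bytes={first}-{last}"}


def rebase_captions(captions, start_s, end_s):
    """Keep captions inside the window and shift them to start at zero.

    captions: list of (start_seconds, text) tuples (assumed parsed form).
    """
    return [
        (round(t - start_s, 3), text)
        for t, text in captions
        if start_s <= t < end_s
    ]
```

The header returned by range_header would be sent with the request for the movie file, and the output of rebase_captions would be written out as the temporary caption file referenced from the temporary SMIL or ASX file.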