Captioning and Audio Description Tips

The following is not a tutorial on captioning and audio description. The information below is meant to familiarize users of our JWPC player with these basic accessibility features, offer some general guidelines, and point to further information and tools. Still, if you are new to captioning and audio description, you may find useful material here.



JWPC can read a text file and display it as pop-on captions. The text file must be in one of two timed-text formats: SubRip (SRT) or Timed Text Markup Language (TTML), the XML format developed by the W3C. For a working example, see the skate-or-die.xml file in the media directory of the JWPC download.

TTML contains one or more DIV elements that wrap a series of P (paragraph) elements, each of which contains a timed segment (or “chunk”) of the transcript. We recommend no more than two lines per chunk. You can break a paragraph into two lines by inserting a BR tag into it. Here is a snippet:

<p begin="0:01:03.17" end="0:01:07.05">And you get points<br/>for the variety of your routine,</p>

You will not have to type TTML yourself. There are tools to help you create it, covered below.
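
For readers curious about what the timing attributes in the snippet above mean, here is a minimal Python sketch that reads one P element, splits its text at the BR tag, and converts the begin and end values to seconds. It assumes the h:mm:ss.ff clock format shown in the snippet; the function names are ours for illustration and are not part of JWPC.

```python
import xml.etree.ElementTree as ET

SNIPPET = (
    '<p begin="0:01:03.17" end="0:01:07.05">'
    'And you get points<br/>for the variety of your routine,</p>'
)

def clock_to_seconds(clock):
    """Convert an h:mm:ss.ff clock value to seconds (assumed format)."""
    hours, minutes, seconds = clock.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

def caption_lines(p):
    """Return the text lines of a P element, splitting at each BR tag."""
    lines = [p.text or ""]
    for child in p:
        # Ignore any namespace prefix when looking for BR elements.
        if child.tag.split("}")[-1] == "br":
            lines.append(child.tail or "")
    return lines

p = ET.fromstring(SNIPPET)
begin = clock_to_seconds(p.get("begin"))
end = clock_to_seconds(p.get("end"))
print(caption_lines(p))
print(f"shown for {end - begin:.2f} seconds")
```

A check like this can be handy for spotting chunks that stay on screen for too short a time to read.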

The basic steps to getting captions to work in JWPC are:

  1. Get a transcript of the movie audio
  2. Chunk the transcript into logical and more or less semantically complete lines
  3. Add in speaker changes and sound-effect cues
  4. Convert your media
  5. Synchronize the chunked transcript with the audio and export to TTML
  6. Link up the TTML file with your video

1. Get a Transcript

To add captions to video, you first need an accurate transcript.

We prefer using speech recognition software to help dictate a transcript. The dominant speech recognition packages are Dragon Naturally Speaking (commercial, Windows), Windows 7 Speech Recognition (built into Windows 7), and MacSpeech Dictate (Mac). A person who has trained the software to recognize her voice echoes the spoken content of the video, and the software converts the speech to text. Compared to unassisted typing, the software enables rapid and accurate dictation.

Difficulty controlling audio playback can make SR-assisted transcription frustrating. To smooth the process, we use Express Scribe (Windows and Mac, free), playback software for transcription that gives you simple, single-key controls for pausing, backing up, and advancing the audio. It will also slow down or speed up playback without distorting pitch. If you get serious about SR-assisted transcription, Express Scribe can be set up to use foot pedals to control audio playback.

Two additional hints for SR-assisted transcribing: turn off automatic punctuation and use a wired microphone headset. You will tend to pause often when you are transcribing, and the automatic punctuation features of speech recognition systems will interpret those pauses as punctuation. Wireless headsets can be great for Skype or other VoIP, but we have found that their audio quality is usually not good enough for speech recognition. Just use a high-quality wired headset.

Another possibility is to farm out your audio to a transcription service, such as Casting Words, which, for a fee, will return a transcript in a couple of days.

In an academic setting especially, a highly accurate transcript is key to effective captioning. So it may be best to produce the transcript yourself, rather than relying on a commercial service that may be unfamiliar with the vocabulary and has less at stake.

However it is obtained, the transcript should be complete, accurate, and clean. Do not transcribe “uh”s, “um”s, and other meaningless verbal fillers (unless they are essential to the meaning or atmosphere), but do leave grammar and syntax as in the original; that is, do not “proofread” to correct grammar and syntax mistakes. In essence, a transcript is verbatim, minus fillers.
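
If you want help finding fillers in a long transcript, a small script can flag them for review. The sketch below is only an illustration: the filler list is our assumption, and it reports fillers rather than deleting them, since some may be essential to the meaning or atmosphere.

```python
import re

# Assumed filler list; extend it to suit your speakers.
FILLER_PATTERN = re.compile(r"\b(?:uh|um)\b", re.IGNORECASE)

def find_fillers(text):
    """Return (word, character offset) for each filler found in text."""
    return [(m.group(0), m.start()) for m in FILLER_PATTERN.finditer(text)]

sample = "Um, so the, uh, dinosaurs hummed."
for word, offset in find_fillers(sample):
    print(f"{offset:3d}: {word}")
```

The word-boundary markers keep the pattern from matching inside ordinary words such as “hummed”.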

2. Chunk the Transcript

Most information on captioning recommends line lengths of between 32 and 40 characters.

Line breaks should occur at logical places, so that each line is as semantically complete as possible to make for easy reading.

Example of Good and Poor Transcript Chunking

Good:

  The greatest trick
  the devil ever pulled

  was convincing the world
  he didn't exist.

Poor:

  The greatest trick the
  devil ever pulled was

  convincing the world he
  didn't exist.

Good:

  Take your stinking paws off me,
  you damn dirty ape.

Poor:

  Take your stinking
  paws off me, you

  damn dirty ape.
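
A script can produce a rough first pass at chunking, which you then adjust by hand. The Python sketch below greedily wraps a transcript to a maximum line length and groups the lines into two-line chunks. The function name and defaults are our own choices for illustration. Note that a greedy wrap knows nothing about meaning, so it tends to produce the kind of breaks labeled Poor above; treat its output as a draft only.

```python
import textwrap

def draft_chunks(transcript, max_chars=32, lines_per_chunk=2):
    """Greedy first pass: wrap to max_chars, then group lines into chunks."""
    lines = textwrap.wrap(transcript, width=max_chars)
    return [
        lines[i:i + lines_per_chunk]
        for i in range(0, len(lines), lines_per_chunk)
    ]

text = (
    "The greatest trick the devil ever pulled "
    "was convincing the world he didn't exist."
)
for chunk in draft_chunks(text):
    print("\n".join(chunk))
    print()  # blank line between chunks
```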

3. Add Speaker Changes and Cues

There are a number of conventions, though no standards, for signaling the introduction of or a change in speaker and for indicating non-spoken “cues”: sound effects such as laughter, a dog barking, or a door creaking, and indicators of how something is spoken, such as giggling. Below are examples of how we do it.

For speaker change or introduction:

>> PROF. SMITHE: Good day, class.
Today we're talking about dinosaurs...

For a sound-effect cue:

[wind moans outside]

For a speech indicator cue:

>> BILL [snickering]:
That gun's made of rubber!
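
If you prepare transcripts with a script, small helpers can keep these conventions consistent. The following Python sketch reproduces the three formats shown above; the function names are hypothetical and the conventions are simply the ones we use, not a standard.

```python
def speaker_label(name, manner=None):
    """Build a speaker-change label, optionally with a speech indicator."""
    label = f">> {name.upper()}"
    if manner:
        label += f" [{manner}]"
    return label + ":"

def sound_cue(description):
    """Wrap a sound-effect description in square brackets."""
    return f"[{description}]"

print(speaker_label("Prof. Smithe"), "Good day, class.")
print(sound_cue("wind moans outside"))
print(speaker_label("Bill", manner="snickering"))
print("That gun's made of rubber!")
```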

4. Convert Media

JWPC needs video in Flash FLV or MP4 formats. The tools we use to produce TTML files need movies in Apple QuickTime MOV format. But you may have media only in WMV, 3GP, AVI or some other format. You also may need to convert to and from various audio formats to create audio descriptions or for other reasons, such as playback in Express Scribe (which will only read MP3 or WAV format).

Many free, open source, and commercial products will convert to and from various media formats. The quality of their interfaces and output varies greatly.

We suggest you experiment with some of the available installable programs and scripts. For Windows users we recommend Any Video Converter. On Mac, try ffmpegX or HandBrake. All are free and rely on open source conversion libraries and are regularly updated.
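
If you are comfortable with scripts, conversion can also be automated. The Python sketch below builds a command for the ffmpeg command-line tool, which is our assumption here rather than one of the programs named above, and runs it only if ffmpeg is installed and the source file exists. The file name lecture.wmv is a placeholder.

```python
import shutil
import subprocess
from pathlib import Path

def build_convert_command(source, target_ext):
    """Return an ffmpeg argument list that converts source to target_ext."""
    source = Path(source)
    target = source.with_suffix(target_ext)
    # ffmpeg picks the output container from the target file extension.
    return ["ffmpeg", "-i", str(source), str(target)]

command = build_convert_command("lecture.wmv", ".mp4")
print(" ".join(command))

# Only run the conversion if ffmpeg is installed and the source exists.
if shutil.which("ffmpeg") and Path("lecture.wmv").exists():
    subprocess.run(command, check=True)
```

The default encoding settings may not suit every player, so check the converted file in JWPC before converting a whole batch.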

5. Synchronize the Transcript and Export to TTML

Aside from getting a transcript, the most painstaking and labor-intensive part of the captioning process is synchronizing (aligning) the transcript with the movie audio. We recommend two applications that can help: MAGpie (for Windows and Mac, free) and MovCaptioner (Mac only, $100).

(Note: Intel Mac users will need to run MAGpie under Windows via Parallels, VMware Fusion, VirtualBox, or Boot Camp, since MAGpie version 2 and above has problems with Java on the Mac.)

Both MovCaptioner and MAGpie allow you to load a MOV file and your caption text. During movie playback you click a button to mark when in the movie your caption chunk should be displayed. The programs record the start and end times for each caption chunk. Both programs allow you to review the accuracy of your alignments and then export the caption track in a variety of formats, one of which is TTML.
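
To illustrate what the export step produces, here is a minimal Python sketch that turns a list of timed chunks into a TTML document. It is not how MAGpie or MovCaptioner work internally. The namespace is the one from the W3C TTML recommendation, and whether JWPC expects that namespace or an older one is an assumption you should check against the sample file shipped with the player. The function names are ours.

```python
import xml.etree.ElementTree as ET

# Assumption: the W3C TTML namespace. Your player may expect another one.
TTML_NS = "http://www.w3.org/ns/ttml"
ET.register_namespace("", TTML_NS)

def seconds_to_clock(seconds):
    """Format seconds as h:mm:ss.ff (hours, minutes, seconds, hundredths)."""
    total_cs = round(seconds * 100)  # work in hundredths of a second
    hours, rest = divmod(total_cs, 360000)
    minutes, rest = divmod(rest, 6000)
    secs, cs = divmod(rest, 100)
    return f"{hours}:{minutes:02d}:{secs:02d}.{cs:02d}"

def build_ttml(chunks):
    """chunks: list of (begin_seconds, end_seconds, [line, ...]) tuples."""
    tt = ET.Element(f"{{{TTML_NS}}}tt")
    body = ET.SubElement(tt, f"{{{TTML_NS}}}body")
    div = ET.SubElement(body, f"{{{TTML_NS}}}div")
    for begin, end, lines in chunks:
        p = ET.SubElement(div, f"{{{TTML_NS}}}p", {
            "begin": seconds_to_clock(begin),
            "end": seconds_to_clock(end),
        })
        p.text = lines[0]
        for line in lines[1:]:
            br = ET.SubElement(p, f"{{{TTML_NS}}}br")
            br.tail = line  # text that follows the line break
    return ET.tostring(tt, encoding="unicode")

chunks = [
    (63.17, 67.05, ["And you get points", "for the variety of your routine,"]),
]
print(build_ttml(chunks))
```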

To get a sense of the workflow, review the help documentation for using MAGpie.

6. Link Up the TTML File

Once you have the TTML file containing your captions and are satisfied with the line breaks, chunk breaks, and timings, simply change the attribute in the DIV for your JWPC instance, so that it points to your TTML file:


Audio Description

Audio description is an audio-only track that plays synchronously with the video, describing necessary visual content in order to help comprehension for users who cannot see the video.

Some things that might be described to enhance and clarify comprehension are:

In general, try to speak descriptions during pauses in the primary audio track, but speak over the primary track when that is needed for understanding the video. The narrator's voice should be easily distinguishable from the primary audio.

Joe Clark has developed a strong set of standard techniques in audio description.

JWPC supports loading an audio description track in MP3 format. The track should be the same length as the video. Our advice is to record audio descriptions using a headset or high-quality microphone in a sound-isolated area. Descriptions can be recorded as separate tracks and then, for the purpose of loading them with the movie, merged into a single track using Audacity or other sound recording/editing software, which will generate silence for the pauses between audio descriptions.
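
Before merging, it can help to work out how much silence belongs between the recorded descriptions so that the finished track matches the video length. The Python sketch below does only that arithmetic; it does not touch audio files, and the timings in the example are made up.

```python
def plan_track(clips, video_length):
    """Interleave silence with description clips to fill video_length.

    clips: list of (start_seconds, duration_seconds), sorted by start time.
    Returns a list of ("silence" | "clip", duration_seconds) segments.
    """
    segments = []
    position = 0.0
    for start, duration in clips:
        if start < position:
            raise ValueError(f"clip at {start}s overlaps the previous clip")
        if start > position:
            segments.append(("silence", start - position))
        segments.append(("clip", duration))
        position = start + duration
    if position > video_length:
        raise ValueError("descriptions run past the end of the video")
    if position < video_length:
        segments.append(("silence", video_length - position))
    return segments

# Hypothetical timings: two descriptions in a 60-second video.
for kind, seconds in plan_track([(5, 4), (30, 6)], 60):
    print(f"{kind:8s}{seconds:6.1f}s")
```

The segment durations always add up to the video length, which is a quick way to confirm the merged track will stay in sync.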

Once you have the audio description, change the attribute value in the DIV for your instance of JWPC to point to your MP3 file:

  jwpc_audiodescfile = "/path/to/my/audio-description.mp3">