HTML5 Video

December 23, 2010

<video>, Accessibility and HTML5 Today

[Photo: <video> Accessibility]

Back in July of 2009, I wrote a blog post spurred on by a dinner conversation with my friend Bruce Lawson. Since then, I’ve seen a few instances where people have pointed to that posting as important to understanding the issue of accessibility and video in HTML5. A lot has changed since I wrote that piece, however, and I’ve been meaning to update that information for some time now. A recent email thread amongst some friends crystallized that requirement, and the following is adapted from the email note I wrote to that thread.

Accessibility in Video and Audio is tricky – we are dealing with multi-media content here, so ‘full’ accessibility becomes significantly more complex, as ‘accessibility’ for one user/user-group is not the same as the requirements for all user-groups: obvious when stated, but best kept in mind as we discuss media in HTML5. The following is *my* understanding of where things currently stand.

User Needs:

The media sub-team of the Accessibility Task Force for HTML5 (of which I am the Co-Chair) have (hopefully) identified all of the user-needs we could come up with, and it is an extensive list. The following URLs document that progress:

Which then led to us ‘grouping’ these needs into two basic constituencies:

See also:

It is important to note that this work is a collection of both author requirements as well as player (browser/user-agent) requirements. We have asked on multiple occasions for further comments (to ensure we’ve not missed anything) with no other feedback, so at this time it is presumed ‘complete’, however please let me know if you believe something is missing.

It is also important to underscore that “fallback” for <video> and <audio> is not the same as ensuring ‘accessibility’ for video or audio – it is simply intended for legacy browsers that would not support HTML5’s new elements. The best that we can say (when ‘teaching’ about this) is that it is similar to the old <noscript> construct (minus the need for an element), or the ‘fallback’ we once used to provide for Frames (and please, not “Your browser sucks, get a better browser”). While this fallback should be informative and ‘accessible’ – it is not intended to meet accessibility requirements or needs.

The <track> Element:

The <track> element, as a child element of <video> and <audio>, is the means by which authors can specify alternatives, or more correctly supporting content, to the multi-media content. In the sub-team, we have taken to informally referring to the ‘video’ as the Primary Media Resource, with the alternative content being known as the Alternative Media Resource.

For now, <track> is the way we can reference time-stamped texts [I will return to that in a bit] that could/would include captions, but also sub-titles (i18n), extended text descriptions, and other potential timed text files; <track> ‘inserts’ the supplemental content into the DOM tree as child elements of <video>/<audio>. (If you have ever worked with Flash-based video, this is a similar authoring pattern to the <param> elements used in that environment.)

The <track> element takes the following attributes: SRC, KIND (captions, subtitles, etc.), SRCLANG, CHARSET, and LABEL, with the KIND attribute being the most important for accessibility needs. It is unclear at this time whether <track> will/could also reference supplemental multimedia content such as audio described content/extended described content, ‘picture-in-picture’ sign language files, etc. – this is on our current working agenda to be resolved.
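To make the authoring pattern a little more concrete, here is a minimal sketch (in TypeScript) that generates the kind of markup described above. Everything specific in it is an assumption for illustration only: the file names, the `TrackDescriptor` shape and the helper function names are hypothetical, CHARSET is left out for brevity, and the exact attribute set may well change while the spec is still in flux.

```typescript
// Hypothetical descriptor for one <track> child. The field names mirror the
// attributes listed above (CHARSET is omitted to keep the sketch short).
interface TrackDescriptor {
  src: string;     // URL of the timed-text file (placeholder paths below)
  kind: string;    // e.g. "captions" or "subtitles"
  srclang: string; // language of the track text
  label: string;   // human-readable name for the track
}

// Escape the characters that would break a double-quoted attribute value.
function escapeAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Render a single <track> element from its descriptor.
function renderTrackElement(track: TrackDescriptor): string {
  return (
    `<track src="${escapeAttribute(track.src)}" kind="${escapeAttribute(track.kind)}" ` +
    `srclang="${escapeAttribute(track.srclang)}" label="${escapeAttribute(track.label)}">`
  );
}

// Render a <video> element with its <track> children, one per line.
function renderVideoMarkup(videoSrc: string, tracks: TrackDescriptor[]): string {
  const lines: string[] = [`<video src="${escapeAttribute(videoSrc)}" controls>`];
  for (const track of tracks) {
    lines.push("  " + renderTrackElement(track));
  }
  lines.push("</video>");
  return lines.join("\n");
}

// Placeholder file names, purely illustrative.
const exampleMarkup = renderVideoMarkup("talk.webm", [
  { src: "talk.en.srt", kind: "captions", srclang: "en", label: "English captions" },
  { src: "talk.fr.srt", kind: "subtitles", srclang: "fr", label: "Sous-titres" },
]);
console.log(exampleMarkup);
```

The point of the sketch is simply that each supplemental text file becomes one child element, and that KIND is what tells the user agent (and the end user) what the file is for.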

For those who may already know this, my apologies, but a quick primer of the current HTML5 media formats is important at this time.

We currently have 3 media formats that are being discussed: MP4, OGV, and WebM. These are ‘wrapper’ formats that contain the encoded videos (using H.264 for MP4, Theora for OGV, and VP8 for WebM), but inside those wrapper formats other ‘content’ can be enclosed, including those timed-text files, meta-data files, etc. We can already do this today, and this is in fact how captioned video for iPhones, iPads, etc. is currently provided: the ‘bundling’ is a post-production process done in tools such as Final Cut Pro, QuickTime Pro, etc.

For video content that is ‘complete’ in this post-production process there is an API which “looks” inside the wrapper, and extracts/maps the bundled supplemental content to the <track> DOM node(s). For accessibility, this is also the ‘better’ authoring practice, as it ensures that the supplemental content remains bundled with the video (when re-purposed or shared by other sites). However, since this post-production is not always viable for all content authors (in part because there are not a lot of tools that make this simple to do today), actually authoring <track src=> (etc., etc.) is the means to associate the supplemental content as child elements of <video>, as it is being ‘hand-authored’ into the DOM. (The reason this is less optimal is that secondary users might ‘capture’ the video file, but not bother to capture the supplemental files when “copy/pasting”, thus degrading the general accessibility of those media files).

At this time, it is envisioned that a ‘menu’ of all track content would be expose-able to the end user in a fashion *similar to* an unordered menu list: again, since the <track>s are child elements of <video> (and <audio>) this appears fairly uncontroversial, although it is not yet implemented in any browser. (One possible solution is a focusable ‘drop-down’ included in the video controls, along with the basic start, stop and volume controls.) There are already a few examples authored by Silvia Pfeiffer that illustrate this method. (Note: As these are Proof-of-Concept examples, and Silvia produced them under contract to Mozilla, they work best in Firefox.)
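As a rough illustration of the ‘menu’ idea, the following TypeScript sketch turns a list of tracks into an unordered list of focusable buttons. To be clear, this is not how any browser does it today, nor is it taken from Silvia’s examples: the `MenuTrack` and `TrackMenuItem` shapes, the “Off” entry, the `data-track` attribute and the function names are all my own assumptions.

```typescript
// Hypothetical, simplified view of a track for menu purposes.
interface MenuTrack {
  kind: string;
  srclang: string;
  label: string;
}

// One entry in the menu; trackIndex -1 stands for "no track selected".
interface TrackMenuItem {
  text: string;
  trackIndex: number;
}

// Build the menu entries: an "Off" choice first, then one entry per track.
// When a track has no label, fall back to its kind and language.
function buildTrackMenu(tracks: MenuTrack[]): TrackMenuItem[] {
  const items: TrackMenuItem[] = [{ text: "Off", trackIndex: -1 }];
  tracks.forEach((track, index) => {
    const text = track.label !== "" ? track.label : `${track.kind} (${track.srclang})`;
    items.push({ text: text, trackIndex: index });
  });
  return items;
}

// Escape text so it can be placed safely inside an element.
function escapeText(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Render the entries as an unordered list of buttons (buttons are
// keyboard-focusable by default, which is the property we care about).
function renderTrackMenu(items: TrackMenuItem[]): string {
  const rows = items.map(
    (item) =>
      `  <li><button type="button" data-track="${item.trackIndex}">${escapeText(item.text)}</button></li>`
  );
  return ["<ul>"].concat(rows, ["</ul>"]).join("\n");
}

const menuItems = buildTrackMenu([
  { kind: "captions", srclang: "en", label: "English captions" },
  { kind: "subtitles", srclang: "fr", label: "" },
]);
console.log(renderTrackMenu(menuItems));
```

A real player would of course wire each button up to enable or disable the matching track; the sketch only shows how naturally the list of <track> children maps onto a list-style menu.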

Timestamp Formats (WebSRT, TTML, etc.):

Just as the codec ‘issue’ is still working itself out, so too, it appears, is the time-stamp format issue. I have long predicted that this was going to be the trickiest issue we would face, and sadly it appears I was correct. We currently have two main formats being contemplated, and each has its strengths and weaknesses: neither format is deemed ‘complete’ as measured against our User Requirements – another exercise we undertook and have reported:

This is further complicated by the fact that the browsers today (for the most part) are leaning towards WebSRT (a viable candidate if it can be modified to fill the existing gaps identified) while commercial content producers are already moving towards a profile of TTML called SMPTE-TT. There is also a suggestion (currently being alluded to by Microsoft) that browsers could/should support more than one time-stamp format.
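For readers who have never looked inside one of these files, the core of an SRT-style format is a timing line of the form “start --> end” followed by the text to display. The TypeScript sketch below parses just that timing line. It is a simplification under stated assumptions: I accept both a full stop and a comma before the milliseconds (SRT files traditionally use the comma), I require the hours field, and I ignore any cue settings that a format may allow after the end time. The function names and the `CueTiming` shape are hypothetical.

```typescript
// Start and end of one cue, in milliseconds from the start of the media.
interface CueTiming {
  startMs: number;
  endMs: number;
}

// Convert "hh:mm:ss.mmm" (or SRT-style "hh:mm:ss,mmm") to milliseconds.
function parseTimestamp(text: string): number {
  const match = /^(\d{2,}):(\d{2}):(\d{2})[.,](\d{3})$/.exec(text.trim());
  if (match === null) {
    throw new Error(`Unrecognised timestamp: ${text}`);
  }
  const hours = Number(match[1]);
  const minutes = Number(match[2]);
  const seconds = Number(match[3]);
  const millis = Number(match[4]);
  if (minutes > 59 || seconds > 59) {
    throw new Error(`Timestamp out of range: ${text}`);
  }
  return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis;
}

// Parse a timing line such as "00:00:01.000 --> 00:00:04.500".
function parseCueTiming(line: string): CueTiming {
  const parts = line.split("-->");
  if (parts.length !== 2) {
    throw new Error(`Expected "start --> end": ${line}`);
  }
  const [startText = "", endText = ""] = parts;
  const startMs = parseTimestamp(startText);
  const endMs = parseTimestamp(endText);
  if (endMs <= startMs) {
    throw new Error(`Cue ends before it starts: ${line}`);
  }
  return { startMs: startMs, endMs: endMs };
}

const exampleTiming = parseCueTiming("00:00:01.000 --> 00:00:04.500");
console.log(exampleTiming); // startMs is 1000, endMs is 4500
```

TTML, being XML-based, expresses the same begin/end information as attributes on elements rather than as a plain-text line, which is part of why supporting both formats is more than a trivial exercise.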

As a side-bar, I have a particular frustration with the Society of Motion Picture and Television Engineers (SMPTE) for not being involved in our discussions even though they were likely aware of them, and frustratingly, the full SMPTE-TT spec is not ‘freely’ available – they are charging $75.00 for a copy. It is both arrogant and self-restricting, as it further distances their efforts from the non-commercial communities of web video producers. So much for an open web!

The status of all of this is very much in rapid flux, so today I can’t state which format (if any) will be included as a base-line format in HTML5 – it may in fact be punted altogether in a fashion similar to the codec ‘issue’ – hardly the best option but one that still exists. I am working with others to ensure that this is not the case, but fear that lines are hardening here as well… (One important consideration at this time is that WebSRT is not a W3C technology per se, although there is no reason that this could not change, and in fact there is some tentative talk of chartering a new Working Group to do something like this – however it is too early to state if in fact this will happen, and there might be some whining about it from some WHATWG-ers along the way).

Other Outstanding Issues:

At this time there are a few other issues that we are aware of, but have not teased out fully. They mostly center around support for extended descriptions, whether audio or text based. The questions mostly surround the controlling time-line (to which all other content is synced) and a means of ‘pausing’ the video to allow for the execution of this described video. There are a few ideas floating around, but further discussion is ongoing at this time. There is also some discussion on whether or not media queries may have a role to play when offering up the appropriate supplemental files, with no clear consensus here either.

Content navigation by content structure is also an issue which requires more work. While the current spec suggests a means to navigate content via “Chapters” (a decidedly WebSRT concept), this construct is but one level deep. The need for a hierarchical means of navigation (think H1, H2, H3) has been identified. While this type of richer structure can be achieved with TTML, it has not been fully explored as to how this would/could work in HTML5, and it might require modifications to the basic JavaScript API used for those fully ‘wrapped’ media files.
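To show what ‘hierarchical’ would mean in practice, here is a small TypeScript sketch of a nested chapter structure, together with a helper that flattens it into an H1/H2/H3-style outline and another that finds the most specific chapter at a given playback position. None of this comes from the spec or from either time-stamp format: the `ChapterNode` and `OutlineEntry` shapes, the function names and the sample chapter titles are all hypothetical, and the second helper assumes chapters are listed in playback order.

```typescript
// A chapter that may contain sub-chapters (hypothetical structure).
interface ChapterNode {
  title: string;
  startMs: number;
  children: ChapterNode[];
}

// One row of the flattened outline; level 1 is the top level (think H1).
interface OutlineEntry {
  level: number;
  title: string;
  startMs: number;
}

// Walk the tree depth-first and record each chapter with its nesting level.
function flattenChapters(nodes: ChapterNode[], level: number = 1): OutlineEntry[] {
  const entries: OutlineEntry[] = [];
  for (const node of nodes) {
    entries.push({ level: level, title: node.title, startMs: node.startMs });
    for (const entry of flattenChapters(node.children, level + 1)) {
      entries.push(entry);
    }
  }
  return entries;
}

// Return the last outline entry that starts at or before the playhead.
// Assumes the chapters are listed in playback order.
function currentChapter(nodes: ChapterNode[], positionMs: number): OutlineEntry | null {
  let current: OutlineEntry | null = null;
  for (const entry of flattenChapters(nodes)) {
    if (entry.startMs <= positionMs) {
      current = entry;
    }
  }
  return current;
}

// Sample data, purely illustrative.
const sampleChapters: ChapterNode[] = [
  {
    title: "Introduction",
    startMs: 0,
    children: [
      { title: "Welcome", startMs: 0, children: [] },
      { title: "Agenda", startMs: 30000, children: [] },
    ],
  },
  {
    title: "Demo",
    startMs: 120000,
    children: [{ title: "Setup", startMs: 120000, children: [] }],
  },
];

for (const entry of flattenChapters(sampleChapters)) {
  console.log("  ".repeat(entry.level - 1) + entry.title);
}
```

A single-level “Chapters” list can only represent the level 1 rows of such an outline; the nesting is exactly the information that would be lost.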

As well, I am working on a Change Proposal (due early January) that would have the video ‘poster’ have the ability to take on @alt that would be different from a textual description of the media file – there is some pushback on this topic from some of the engineers, but I believe, based upon other feedback, that the engineers are not fully understanding the real issue. Mark this one as a giant question mark.

Finally, in the near future, work will need to focus on creating good author guidance and instruction, not only for mainstream publication, but also written in a way so that it can be added to the WCAG Techniques For Success documentation. I hope to be able to help with this work as well, which I suspect will need to start being addressed by this summer at the latest.

CC BY-NC-SA 4.0 <video>, Accessibility and HTML5 Today by John Foliot is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Posted by John

I am a 16 year veteran of Web Accessibility, living and working in Austin, Texas. Currently Principal Accessibility Strategist at Deque Systems Inc., I have previously held accessibility related positions at JPMorgan Chase and Stanford University. I am also actively involved with the W3C - the international internet standards body - where I attempt to stir the pot, fight hard for accessibility on the web, and am currently co-chairing a subcommittee on the accessibility of media elements in HTML5.
