The most pressing Accessibility issue in HTML5 today? <video>
Over a wonderfully authentic Thai dinner the other night with my friend, Opera Web Evangelist and HTML5 Doctor Bruce Lawson, conversation naturally turned to the outstanding accessibility issues that still need to be addressed in HTML5. While we both agree that finding a solution for <canvas> is going to be both vexing and messy, I suggested to Bruce that completing the <video> element to properly address accessibility before Last Call (currently scheduled for October, 2009) was likely even more pressing. Here’s why:
On June 26, 2009, Massachusetts Representative Edward Markey (D) introduced "The Twenty-first Century Communications and Video Accessibility Act of 2009" (H.R. 3101) [Download the PDF]. From 1987 to 2008, Rep. Markey served either as the Chairman or the Ranking Member of the House Energy and Commerce Committee’s Subcommittee on Telecommunications. (The proposed bill is also co-sponsored by California Reps. Linda Sanchez (D) and Barbara Lee (D)). If (when?) enacted, this comprehensive disabilities communications legislation will amend the United States Communications Act to ensure that new Internet-enabled telephone and television products and services are accessible to and usable by people with disabilities. It will also close existing disability gaps in telecommunications law. The bill in part proposes:
- Requiring caption decoder circuitry or display capability in all video programming devices, including PDAs, computers, iPods, cell phones, DVD players, TiVo devices and battery-operated TVs
- Extending closed captioning obligations to television-type video programming distributed over the Internet: covers web-based video services that offer television programs, movies, web clips, and live video streaming
- Requiring easy access to closed captions via remote control and on-screen menus
Before the Chicken Little crowd has a chance to start squealing, the bill would exempt user-generated content such as family videos and other personal videos on YouTube, etc. However, for large content providers (including those that may still choose to use delivery platforms such as YouTube and iTunes) this will be a critical business issue. Finally, I believe that for developers of the tools that act as the user interface (i.e. browsers), failing to have a solution (API) to this requirement could place them squarely in the sights of organizations such as the American Association of People with Disabilities (AAPD), the National Association of the Deaf (NAD) and/or other advocacy groups, or at the very least render the <video> element a developmental eunuch, given that current proprietary plug-in solutions do offer this functionality.
Who Doesn’t Caption Online
- A&E
- ABC Family (owned by ABC)
- AMC (Rainbow Media)
- Animal Planet (Discovery networks)
- BBC America
- BET
- Biography (A&E)
- Blinkx (retransmitter)
- Boomerang (Turner)
- Bravo
- Cartoon Network (Turner)
- CBS
- Cinemax (HBO)
- CMT (MTV)
- Comedy Central (MTV)
- CNBC (NBC)
- CNN (Turner)
- CNN en Espanol (Turner)
- CNN International (Turner)
- Discovery (Discovery networks)
- Discovery Health (Discovery networks)
- Discovery Kids (Discovery networks)
- Disney (ABC)
- DIY
- E!
- ESPN
- Fancast
- FitTV (Discovery networks)
- Food Network
- Fox Movie Channel
- Fox News Channel
- FX
- GSN
- Hallmark
- HBO
- HD Theater (Discovery networks)
- HGTV
- History Channel (A&E)
- HLN (Turner)
- IFC (Rainbow Media)
- IMDB.com (retransmitter)
- Investigation Discovery (Discovery networks)
- Joost (retransmitter)
- Lifetime
- Military Channel (Discovery networks)
- MSNBC
- MTV (also CMT)
- National Geographic Channel
- Nickelodeon
- Online TV (retransmitter)
- Oxygen
- PBS
- QVC
- Science Channel (Discovery networks)
- Sci Fi
- Showtime
- Speed
- Spike
- Starz
- Style
- Sundance (Rainbow Media)
- TBS (Turner)
- TCM (Turner)
- The Weather Channel
- Tidal TV (retransmitter)
- TLC (Discovery networks)
- TNT (Turner)
- Travel Channel
- TruTV (Turner)
- Truveo
- TV.com (retransmitter, owned by CBS.com)
- TVLand
- USA
- Veoh (retransmitter)
- VH1 (MTV)
- Voom (Rainbow Media)
- We (Rainbow Media)
Who Captions Online
- ABC
- CNET
- Fox
- Hulu (retransmitter)
- NBC
Source: Caption Action 2
And that is the key point: current proprietary plug-ins can render this functionality, although each solution requires content expressly created for that player, and the way in which caption files are associated to the media differs from player to player (and in the case of captioning for the iPhone, requires arcane binary files burned into the media asset – providing on-screen captioning to the deaf but seeing users, but continuing to shut out deaf/blind users and making searchability problematic).
A Tangled Mess
I suspect that one of the reasons why this issue has not emerged more prominently is due to the current impasse surrounding a standardized codec that the <video> element should support natively, with two firmly entrenched front-runners (Ogg/Theora and H.264) and outside third-parties further arguing for support of any codec. With confusion and disharmony around simply how to implement a common media stream natively within the <video> element, there is little wonder that how to further support captioning has taken a back seat in the discussion. The problem is further compounded by questions surrounding time-stamping formats for the transcripts (DFXP, SRT, SCC, others), and in-band vs. out-band delivery of the transcript to the user-agent/user interface; in-band captioning ensures one single file that includes the captioning is ‘burned in’ (so that when media assets are re-distributed the captioning remains), whilst out-band more easily allows re-purposing the transcript file to alternative user-agents – for example to Braille output devices, or for fuller indexing by search engines, etc. (Google today can index an external caption file and use that data to improve SEO results of videos). However, externally referenced files can be separated from the media file quite accidentally, so there are issues with how to ensure that media and captioning remain ‘bound’.
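To make the out-band case concrete, here is a minimal sketch of reading an external SRT file (one of the time-stamping formats named above) into a list of timed cues. The names `Cue`, `parseSrtTime` and `parseSrt` are my own illustrative choices, not part of any specification or library, and the parser assumes well-formed input with `HH:MM:SS,mmm` timestamps.

```typescript
// One caption cue: start and end in seconds, plus the text to display.
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Convert an SRT timestamp such as "00:01:02,500" into seconds.
function parseSrtTime(stamp: string): number {
  const m = /^(\d+):(\d{2}):(\d{2}),(\d{3})$/.exec(stamp.trim());
  if (!m) {
    throw new Error("Bad SRT timestamp: " + stamp);
  }
  return Number(m[1]) * 3600 + Number(m[2]) * 60 + Number(m[3]) + Number(m[4]) / 1000;
}

// Parse the full text of an SRT file into cues.
// Assumption: blocks are separated by blank lines, and each block has
// an optional counter line, one timing line, then one or more text lines.
function parseSrt(srt: string): Cue[] {
  const cues: Cue[] = [];
  const blocks = srt.replace(/\r/g, "").trim().split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n");
    // Locate the timing line; skip the block if there is none.
    let timingIndex = -1;
    for (let i = 0; i < lines.length; i++) {
      if (lines[i].indexOf("-->") !== -1) {
        timingIndex = i;
        break;
      }
    }
    if (timingIndex === -1) {
      continue;
    }
    const parts = lines[timingIndex].split("-->");
    if (parts.length !== 2) {
      continue;
    }
    cues.push({
      start: parseSrtTime(parts[0]),
      end: parseSrtTime(parts[1]),
      text: lines.slice(timingIndex + 1).join("\n"),
    });
  }
  return cues;
}
```

Because the transcript stays a plain text file, the same cue list could in principle be handed to a Braille device or a search indexer as easily as to an on-screen overlay, which is the re-purposing advantage described above.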
In a June 24, 2009 posting to the HTML WG mailing list, Silvia Pfeiffer noted that: … it has been decided that the first version of HTML5 <video> (and <audio>) will not have an in-built solution for captions, audio annotations and the like, because it is possible to do such with javascript and external files.
(Who actually did this deciding remains a problematic political issue, as surely the W3C WAI PFWG would not put forth such a recommendation.)
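For readers wondering what "javascript and external files" means in practice, the core of it is a lookup from the current playback time to the caption that should be on screen. The sketch below is only that: `Cue`, `activeCue` and `captionTextAt` are hypothetical names of mine, the cue list is assumed to have been loaded from an external caption file already, and the browser wiring in the trailing comment assumes a page with elements I have invented for illustration.

```typescript
// One caption cue: start and end in seconds, plus the text to display.
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Return the cue whose time range covers `time`, or null if none does.
// A cue is treated as active from `start` (inclusive) to `end` (exclusive).
function activeCue(cues: Cue[], time: number): Cue | null {
  for (const cue of cues) {
    if (time >= cue.start && time < cue.end) {
      return cue;
    }
  }
  return null;
}

// Convenience wrapper: the text to show at `time`, or "" between cues.
function captionTextAt(cues: Cue[], time: number): string {
  const cue = activeCue(cues, time);
  return cue === null ? "" : cue.text;
}

// Assumed browser wiring (not executed here; element ids are invented):
//   const video = document.getElementById("movie") as HTMLVideoElement;
//   const overlay = document.getElementById("captions") as HTMLElement;
//   video.addEventListener("timeupdate", () => {
//     overlay.textContent = captionTextAt(cues, video.currentTime);
//   });
```

Every author would have to ship something like this (plus the file fetching and parsing) with every page, which is exactly the per-site burden a native solution would remove.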
While I’ve seen such experimental proofs of concept demonstrated, there remain interoperability issues: the linked example relies on one codec delivery (in this case Ogg/Theora), and sadly the example only works in one browser (Mozilla). Given that the whole point of the <video> element was to make it simple for content authors to embed video into their web pages, yet authors today need to supply multiple streams (Theora/H.264) plus ‘roll-your-own’ JavaScript (I am unaware of any libraries that facilitate extraction of caption files), one has to wonder why content creators wouldn’t simply continue to use existing proprietary solutions, especially since embeddable media players such as the Flash-based JW FLV Player deliver what might soon be legislated functionality.
No, if those working on the HTML5 specification truly want to see real-world commercial and institutional uptake of the <video> element become a reality, then solving the captioning issue prior to Last Call should be imperative; failing to do so will doom <video> to the ‘great ideas that didn’t catch on’ pile of the world wide web.
Read More:
- Accessibility/Video Accessibility Study ‘08 – Mozilla wiki
- Progress on captions for HTML5 video – Silvia Pfeiffer
- Multimedia Accessibility <Audio> <Video> – W3C ESW Wiki
- Caption Action 2 – Grassroots Advocacy
- Legislation Would Make Online Video Accessible to the Hearing- and Vision-Impaired – StreamingMedia.com
Hey! I'm John Foliot, and this is my personal blog.
July 26th, 2009 at 5:52 pm
If I went to a site that had an html5 video, would I be able to stop it half-loaded by pressing the stop button on my browser?
If not, that’s something that needs to be implemented, too (and it’s also accessibility, after a fashion). I can stop big pictures from loading, so I should be able to stop big videos.
Meanwhile… I have a feeling that some vendors just aren’t going to go ahead with the video tag. When you get right down to it, that’s what we’re waiting on.
If the vendors can all agree on the codec and decide that they’ll implement video in their next browser (those who haven’t already), we might see more of a push towards getting the rest of it completed.
July 26th, 2009 at 9:50 pm
@Michael – exactly right: until we have a common carrier, we can’t standardize on a common method of ensuring captioning. However the current “..we’ll defer to later” position is unacceptable.
Re: your question – I don’t know, but will look to test this week – stand by for results (unless somebody already knows…)
July 28th, 2009 at 2:36 am
The problem does not lie directly with HTML5. The problem lies, in very large part, with the lack of an agreed-upon standard timed text format for captions/subtitles that works in various container formats (even when looking at just Ogg and MP4), and which adequately addresses all of the requirements that captioning and subtitling entail, including language selection, styling (fonts, colours, positioning, etc.) and whatever else.
These things take time and experience to develop, and we simply can’t rush into it with HTML5. There are people working on finding a suitable solution, but if we were to add something to the spec now, it would very likely be an inferior solution. It is far better to work on developing a good solution than simply whinging about it currently being absent from the spec.
July 28th, 2009 at 3:13 pm
“Burned in,” John? This isn’t hardsubbing we’re talking about.
July 29th, 2009 at 2:59 pm
@lachlan The W3C has released a Working Draft for a Timed Text format called DFXP (http://www.w3.org/TR/2009/WD-ttaf1-dfxp-20090511 ) which has been worked on by such groups as Apple, the BBC, Microsoft, RealNetworks and WGBH’s National Center for Accessible Media (to name some contributing authors). It is scheduled to go to Last Call very soon, so if you have ideas and/or concerns about this W3C emergent standard, please do lodge your issues with the appropriate Working Group.
My understanding is that the two front-running codecs (Ogg and H.264) have no issue with DFXP per se (and in fact, David Singer of Apple was part of the author group, and Apple is the main proponent for H.264…), so that particular argument rings a tad hollow as a barrier for captioning implementation (unless it is once again a WHAT WG N.I.H. argument).
You don’t want to “rush into” anything in HTML5, then why is HTML5 “rushing” into the <video> element when the element is half complete? You suggest that adding something to the draft spec now would likely be an inferior solution, yet in many other areas of the draft specification there are existing but inferior suggestions already – items that are flagged as “…close but until we can return to it…”, so from my perspective better an inferior solution than no solution, which is what we have now.
@Joe there was a reason why I used single quotes on that term Joe… Technically files such as .M4V (which is the preferred file format for the iPhone) are re-processed (re-burned?) to include the binary .SCC caption file as part of the single downloaded file, making further extraction of this data nearly impossible, and thus inaccessible to Adaptive Technology such as Braille Output devices.
Do you have any concrete solution to propose as a way forward? Your years of experience in both captioning and web accessibility make you well positioned to offer valuable input – what say you?
July 30th, 2009 at 8:31 pm
I think the Timed Text Working Group, or whatever it’s called now, is run by a spook (intel agencies want to mark up voice-recognized text and intercepted messages); I haven’t looked at its specs (beyond once two weeks ago for five minutes) in like six years; AndrewWK at Adobe (a WGBH alum), along with WGBH and its many acolytes, will push the thing through as a spec right away.
YouTube will continue to ignore it and continue to publish self-congratulatory blog points about their atrocious “solution.”
dotSub and Subply and TED and the other crowdsourcing acolytes will continue to devalue the actual practice of captioning, a decline hastened by – wait for it – WGBH and NCI when they laid off their own staff circa three years ago.
Also, pretty much anything I say is going to get ignored anyway, or get criticized as an assault on the free market by redhead Objectivists. And I’m not being paid to work on this. Everybody else is. Even Pfeiffer.
So that’s my valuable input. I don’t know whether or not DXFP or whatever it’s called will be any good, but in every other facet of this issue my aphorism, varied slightly here, holds true: Everybody is walking around thinking the 4/5 of the problem they understand is the whole problem, and they’re about to announce a miracle solution for it.