
#1230: Captioning for XR Accessibility with W3C’s Michael Cooper

From Voices of VR

Length: 25 minutes
Released: Jul 13, 2023
Format: Podcast episode

Description

Michael Cooper works for the World Wide Web Consortium's Web Accessibility Initiative, and he was attending the XR Access Symposium to learn more about existing XR accessibility efforts and to moderate a break-out session about captions in XR. One of Cooper's big takeaways is that there is no magical, one-size-fits-all solution to captioning in XR: people have different needs, different preferences, and different contexts, which means there is a need for frameworks that make captions easily customizable. Potential customization options for spatial captions include distance from the speaker, text size, text color, font weight, layout, the size of the caption box, whether the box is transparent, preventing the captions from occluding objects, whether the caption moves with the speaker or not, and how to handle off-screen speakers.
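To make that list of options more concrete, here is a minimal sketch of what a user-adjustable spatial caption profile could look like in a WebXR context. All of the names, types, and default values below are illustrative assumptions on my part, not part of any published standard or framework discussed in the episode.

```typescript
// Hypothetical sketch of a customizable spatial caption profile.
// None of these names come from an existing spec; they simply mirror
// the option categories mentioned above.
interface SpatialCaptionPreferences {
  distanceFromSpeakerMeters: number;   // how far the caption floats from the speaker
  fontSizePoints: number;              // text size
  textColor: string;                   // e.g. "#FFFFFF"
  fontWeight: "normal" | "bold";       // text weight
  maxBoxWidthMeters: number;           // size of the caption box
  boxOpacity: number;                  // 0 = fully transparent background, 1 = opaque
  avoidOccludingObjects: boolean;      // reposition captions so they don't cover scene objects
  followSpeaker: boolean;              // whether the caption moves with the speaker
  offscreenSpeakerIndicator: "arrow" | "edgeLabel" | "none"; // how to handle off-screen speakers
}

// One possible default profile; a real framework would let users save and
// swap profiles per context (e.g. social VR versus a narrative 360 video).
const defaultPreferences: SpatialCaptionPreferences = {
  distanceFromSpeakerMeters: 1.5,
  fontSizePoints: 18,
  textColor: "#FFFFFF",
  fontWeight: "bold",
  maxBoxWidthMeters: 1.0,
  boxOpacity: 0.6,
  avoidOccludingObjects: true,
  followSpeaker: true,
  offscreenSpeakerIndicator: "arrow",
};
```

The point of such a profile is that the same caption stream could be rendered very differently for different users, which is exactly why Cooper argues for frameworks rather than a single fixed presentation.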



Gallaudet University's Christian Vogler warned XR Access participants about the dangers of translating 2D captioning conventions one-to-one into 3D, since there are other modalities, like haptics, that could help reduce information overload. One of the demos shown at the XR Access Symposium implemented a wide range of these spatial caption options, and so there is a need to develop a captioning framework for the different game engines and for the open web with WebXR, as well as an opportunity to implement one at the platform level, as in the Apple Vision Pro or Meta Quest ecosystems.



Cooper told me in this interview, "I do think that we need design guidance. There are a lot of good ways to do captions in XR. There are some bad ways to do it, and so we need people to know about that. Going down the road, I think that we are going to need to develop semantic formats for the captions and for the objects that they represent. So there's a lot of excitement about that. But again, there's a big sense of caution that the space is so early that we don't want to overstandardize. And as a person who works for a standards organization, that's a big takeaway that I have to take."



It's again worth bringing up what Khronos Group President Neil Trevett told me about the process of standardization, “The number one golden rule of standardization is don’t do R&D by standardization committee… Until we have multiple folks doing the awesome necessary work of Darwinian experimentation, until we have multiple examples of a needed technology and everyone is agreeing that it’s needed and how we would do it, but we’re just doing it in annoyingly different ways. That’s the point at which standardization can help.”



It's still very early days for this type of Darwinian experimentation, with Owlchemy Labs' captioning innovations starting with Vacation Simulator in October 2019, as well as the captioning experiments and accessibility features by ILM Immersive (formerly ILMxLAB) within Star Wars: Tales from the Galaxy’s Edge. The live captioning within the social VR platform AltSpaceVR was also pretty groundbreaking (RIP AltSpaceVR), and VRChat has had a number of Speech-to-Text implementations, including ones that can be integrated into an avatar, such as VRCstt (and RabidCrab's TTS Patreon), VRCWizard's TTS-Voice-Wizard, and VRC STT System. There were also a number of unofficial, community-made accessibility mods, such as VRC-CC and VRC Live Captions Mod, before VRChat's Easy Anti-Cheat change eliminated all quality-of-life mods. There have also been a number of different strategies within 360 videos over the years that burn in captions at one, two, or three different locations: the more locations the captions appear in, the more one can look around the environment without missing any of the action while still being able to read the captions. At Laval Virtual 2023, I saw some integrations of OpenAI's Whisper to do live transcription and captioning as they were feeding text into ChatGPT 3.5.
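As a rough illustration of the kind of Whisper-based captioning pipeline described above, here is a hedged sketch that transcribes short audio chunks with OpenAI's hosted Whisper model and hands the text to a caption display. The demos at Laval Virtual may well have run Whisper locally instead; the chunking strategy, file names, and the caption output here are my own hypothetical placeholders.

```typescript
import fs from "node:fs";
import OpenAI from "openai";

// Reads OPENAI_API_KEY from the environment.
const openai = new OpenAI();

// Transcribe one pre-recorded audio chunk into caption text.
async function captionChunk(audioPath: string): Promise<string> {
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream(audioPath),
    model: "whisper-1",
  });
  return transcription.text;
}

async function main() {
  // In a live setting these chunks would come from the microphone in near
  // real time; here they are just example files for illustration.
  const chunks = ["chunk-001.wav", "chunk-002.wav"];
  for (const chunk of chunks) {
    const text = await captionChunk(chunk);
    // A real XR app would render this text in the 3D scene
    // (e.g. using a profile like the one sketched earlier).
    console.log(`[caption] ${text}`);
  }
}

main().catch(console.error);
```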
