Text-to-speech (TTS) technologies read written text aloud, providing essential accommodations for users with certain impairments. Modern TTS engines are capable of synthesizing natural, human-like audio from virtually any piece of text, with applications ranging from in-home assistants (like Alexa and Google Home) to language-learning tools and screen readers.
While TTS is often promoted as an adaptive technology for people with vision impairments, it can also be useful for:
- People with dyslexia and other cognitive impairments
- People who speak, but cannot read, the website’s language
- People who are multitasking and not looking at the page
- People who prefer to listen for other reasons
Many of these users rely on screen readers, text-to-speech programs that interpret and read on-screen text, for everyday interactions with websites. Screen readers can distinguish a site’s structure, identify images, and highlight text while it’s being read.
However, software can only follow its programming. When a website has severe accessibility issues, such as unmarked headings or images without text alternatives, text-to-speech software may leave users feeling confused or frustrated.
When websites are built for accessibility, these types of issues occur less frequently. As we’ve pointed out in other blog posts, accessible websites provide a better experience for all users, not just people with disabilities. By understanding the capabilities and limits of TTS, it’s easier to see the benefits of an accessible approach.
Semantic HTML helps text-to-speech technologies work.
To create a website that’s text-to-speech accessible, web developers should recognize the limits of current technologies. Screen readers are great at reading articles; they’re less adept at navigating forms, sorting through a complex HTML structure, and interpreting multimedia.
Semantic HTML markup gives the screen reader the information it needs to perform more effectively. Essentially, semantic HTML is a set of elements that allow a website to identify and define its structure. The elements don’t do anything that directly affects the typical on-page experience — they simply provide context, which allows assistive text-to-speech technologies to provide their users with options.
For example, many people prefer to scan a web page’s headings to find the content they need. If headings aren’t marked up with HTML heading elements (h1 through h6), screen readers can’t find them, even when the text looks like a heading on screen. If the site’s structure is detailed via semantic HTML, the technology works more effectively, and users can browse easily without reading every word on the page.
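To make the heading example concrete, here is a minimal sketch in Python that builds the kind of heading list a screen reader might offer its users. It uses only the standard library's `html.parser`. The class name `HeadingOutline`, the helper `heading_outline`, and both sample pages are our own illustrations, and real screen readers work from the browser's accessibility tree rather than from raw markup, so treat this as an approximation.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collects (level, text) pairs for h1-h6 elements.

    Hypothetical helper for illustration only; it approximates the
    heading list that assistive technology can build from semantic HTML.
    """

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []   # collected (level, text) pairs
        self._level = None   # level of the heading currently open, if any
        self._parts = []     # text fragments seen inside the open heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._level is not None:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._parts).split())
            self.headings.append((self._level, text))
            self._level = None


def heading_outline(html):
    """Return the list of (level, text) headings found in an HTML string."""
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    return parser.headings


# Sample page that marks up its headings semantically.
SEMANTIC = """
<h1>Pricing</h1>
<p>Intro text.</p>
<h2>Monthly plans</h2>
<p>Details.</p>
"""

# Sample page that may look the same on screen, but uses styled divs.
NON_SEMANTIC = """
<div class="big-title">Pricing</div>
<p>Intro text.</p>
<div class="sub-title">Monthly plans</div>
<p>Details.</p>
"""

print(heading_outline(SEMANTIC))      # [(1, 'Pricing'), (2, 'Monthly plans')]
print(heading_outline(NON_SEMANTIC))  # []
```

The second page may look identical to sighted visitors, but it yields an empty outline, so a listener has nothing to jump between.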
Multimedia can present challenges to screen readers.
Webmasters should always provide text alternatives for non-text content. That includes multimedia, images, form controls, charts, and anything else that appears on the page.
For example, if a site doesn’t properly identify a table or form, the screen reader will simply read the text out loud; the user may not understand why they’re suddenly listening to a long string of numbers or words. If a site uses pictures to convey information without alternative text explaining what each picture conveys, the user won’t be able to access that content.
Providing alt text for images, captions or transcripts for videos, and labels for form controls keeps text-to-speech software on the right track. The TTS engine has the information it needs to present the page correctly, and the user won’t miss out on context.
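As a rough illustration of the image case, the Python sketch below scans a page for `img` elements that have no `alt` attribute at all. It relies only on the standard library's `html.parser`. The names `MissingAltFinder` and `images_missing_alt` and the sample markup are assumptions made for this example. An empty `alt=""` is deliberately not flagged, since that is the usual way to mark a purely decorative image.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Records the src of every <img> that lacks an alt attribute.

    Hypothetical helper for illustration; a full accessibility audit
    checks far more than this single attribute.
    """

    def __init__(self):
        super().__init__()
        self.missing = []   # src values of images with no alt attribute

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # alt="" is allowed (decorative image); only a missing attribute is flagged.
        if "alt" not in attr_map:
            self.missing.append(attr_map.get("src", "(no src)"))


def images_missing_alt(html):
    """Return the src of each image in the HTML string that has no alt attribute."""
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


# Sample markup: one described image, one decorative image, one with no alt.
PAGE = """
<img src="sales-chart.png" alt="Bar chart of monthly sales, rising from January to April">
<img src="divider.png" alt="">
<img src="team-photo.jpg">
"""

print(images_missing_alt(PAGE))  # ['team-photo.jpg']
```

A check like this only confirms that a text alternative exists. Whether the description actually conveys what the picture shows still needs a human review.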
Proper language identification is a crucial part of text-to-speech technology.
Modern text-to-speech engines are capable of reading text naturally, using the context of the sentence to articulate words realistically. Of course, the engine needs to know which language it’s speaking.
As we’ve discussed in other articles, developers need to use the HTML lang attribute (along with xml:lang in XHTML) to identify both the language and the language variant of the content. British English sounds quite different from American English, and explicitly marking a page as American English (for example, en-US) lets a text-to-speech engine choose a matching voice instead of guessing, which can greatly improve the listening experience.
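To show what that language information looks like in markup, here is a small Python sketch that reports the language declared on the `html` element plus any inline overrides. It uses the standard library's `html.parser`. The names `LangFinder` and `declared_languages` and the sample page are hypothetical, and how a given TTS engine reacts to these values varies by product.

```python
from html.parser import HTMLParser


class LangFinder(HTMLParser):
    """Collects the page-level lang value and any inline lang overrides.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.page_lang = None    # lang on the <html> element, if declared
        self.inline_langs = []   # (tag, lang) pairs for elements that override it

    def handle_starttag(self, tag, attrs):
        lang = dict(attrs).get("lang")
        if tag == "html":
            self.page_lang = lang
        elif lang:
            self.inline_langs.append((tag, lang))


def declared_languages(html):
    """Return (page_lang, inline_overrides) for an HTML string."""
    finder = LangFinder()
    finder.feed(html)
    finder.close()
    return finder.page_lang, finder.inline_langs


# Sample page: American English overall, with one French word marked inline.
PAGE = """
<html lang="en-US">
  <body>
    <p>The French word for "thank you" is <span lang="fr">merci</span>.</p>
  </body>
</html>
"""

print(declared_languages(PAGE))
# ('en-US', [('span', 'fr')])

print(declared_languages("<html><body><p>Hi</p></body></html>"))
# (None, [])
```

The second call shows a page with no declared language. In that case the engine has to guess, typically from user settings or its own detection.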
When a site is built for accessibility, it’s much more likely to provide a pleasant experience for users who rely on TTS. It’s also more likely to appeal to search engines, which can interpret well-structured on-page content more easily. And practically, an accessible approach to web design will benefit all users, regardless of the technologies they use to interact with the content.