I thought I might start a blog to discuss how and why Rhapsode works the way it does. And what better place to start than “why is Rhapsode an auditory web browser?”
The blind, amongst numerous others, deserve as excellent a computing experience as the rest of us! Yet web designers far too often don’t consider them, and web developers far too often exclude them in favour of visual slickness.
Anyone who can’t operate a mouse, keyboard, or touchscreen, anyone who can’t see well or at all, anyone who can’t afford the latest hardware is being excluded from our conversations online. A crossfade is not worth this loss!
Currently the blind are reliant on “screenreaders” to describe the webpages, and applications, they’re interacting with. Screenreaders in turn rely on webpages to inform them of the semantics being communicated visually, which they rarely do.
But even if those semantics were communicated, screenreaders would still offer a poor experience, as they retrofit auditory output onto an inherently visual experience.
It’s unfortunately not considered cool to show disabled people the dignity they deserve.
But you know what is considered cool? Voice assistants! Or at least that’s what Silicon Valley wants us to believe as they sell us Siri, Cortana, Alexa, and other privacy-invasive cloud-centric services.
Guess what? These feminine voices are accessible to many people otherwise excluded from modern computing! Maybe voice assistants can make web accessibility cool? Maybe I can deliver an alternative web experience people will want to use even if they don’t need to?
On a visual display you can show multiple items onscreen at the same time for your eyes to choose where to focus their attention moment-to-moment. You can even update those items live without confusing anyone!
In contrast, in auditory communication information is positioned in time rather than space, whilst what you say (or type) is limited by your memory rather than screen real estate.
Visual and auditory user experiences are two totally different beasts, and that makes developing a voice assistant platform interesting!
Webpages in general are still mostly text. Text can be rendered to audio output just as (if not more) readily as it can be rendered to visual output. HTML markup can be naturally communicated via tone-of-voice. And links can become voice commands! A natural match!
You may be surprised to learn it’s actually simpler for me to start my browser developments with an auditory offering like Rhapsode! This is because laying out text on a one-dimensional timeline is trivial, whilst laying it out in 2-dimensional space absolutely isn’t. Especially when considering the needs of languages other than English!
Once a webpage is downloaded (along with its CSS and sound effects), rendering it essentially amounts to applying a specially-designed CSS stylesheet! This yields data that can be passed almost directly to basically any text-to-speech engine, like eSpeak NG.
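To give a flavour of what such a stylesheet might look like, here’s a hypothetical fragment using properties from the W3C’s CSS Speech Module. This is only an illustrative sketch, not Rhapsode’s actual useragent stylesheet:

```css
/* A sketch in the style of the CSS Speech Module —
   an assumption for illustration, not Rhapsode's real stylesheet. */
h1 {
    voice-pitch: low;      /* headings read in a deeper voice */
    pause-before: strong;  /* with a pause to set them apart */
}
em {
    voice-rate: slow;      /* emphasis slows the delivery down */
}
a {
    cue-before: url(link.wav); /* links announced with a sound effect */
}
```

Tone-of-voice, pauses, and audio cues here play the role that fonts, whitespace, and colour play visually.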
Input meanwhile, whether from the keyboard or a speech-to-text engine like CMU Sphinx, is handled through string comparisons against the links extracted from the webpage.
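That string-comparison step can be sketched in a few lines. The function name, link format, and choice of fuzzy matcher below are all my own assumptions for illustration, not Rhapsode’s actual (Haskell) code:

```python
# Hypothetical sketch: match a spoken (or typed) command against
# link labels extracted from a webpage, picking the closest one.
# The matching strategy is an assumption, not Rhapsode's implementation.
from difflib import SequenceMatcher

def best_link(command, links):
    """Return the (label, href) pair whose label best matches the command."""
    def score(link):
        label, _href = link
        return SequenceMatcher(None, command.lower(), label.lower()).ratio()
    return max(links, key=score)

links = [("Home", "/"), ("Contact us", "/contact"), ("Blog archive", "/archive")]
print(best_link("contact", links))
```

A real browser would also need a confidence threshold, so that mumbled input prompts the user to repeat themselves rather than following a wrong link.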
I could discuss how the efficiency gained from the aforementioned simplicity is important because CPUs are no longer getting any faster, only gaining more cores. But that would imply that it was a valid strategy to wait for the latest hardware rather than invest time in optimization.
Because performant software is good for the environment!
Not only because speed loosely correlates with energy efficiency, but also because if our slow software pushes others to buy new hardware (which, again, they might not be able to afford), manufacturing that new computer incurs significant environmental cost.