Rendering a webpage in Rhapsode takes little more than applying a useragent stylesheet, in addition to any installed userstyles and optionally author styles, to decide how the page’s semantics should be communicated.
Once the CSS has been applied, Rhapsode sends the styled text to eSpeak NG to be converted into the sounds you hear. So how does Rhapsode apply that CSS?
Parser implementations differ mainly in what they implement rather than how. They repeatedly look at the next character(s) in the input stream to decide how to represent it in-RAM. Often there’ll be a “lexing” step (for which I use Haskell CSS Syntax) to categorize consecutive characters into “tokens”, thereby simplifying the main parser.
My choice to use Haskell, however, does change things
a little. In Haskell there can be no side effects;
all outputs must be returned.
So in addition to the parsed tree, each part of the parser must return the rest
of the text that still needs to be parsed by another sub-parser. This yields a type of
:: [Token] -> (a, [Token]),
and Haskell makes it easy to combine these subparsers together into what are
called “parser combinators”.
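To make that concrete, here is a minimal sketch of the idea. The Token type and every function name below are hypothetical, made up for illustration; the real lexer has a much richer token type, and this is not Rhapsode's actual parser.

```haskell
-- Hypothetical token type; the real lexer's is far richer.
data Token = Ident String | Colon | Semicolon | Whitespace
    deriving (Eq, Show)

-- A parser consumes some tokens, returning a result plus the unparsed rest.
type Parser a = [Token] -> (a, [Token])

-- Skip any leading whitespace tokens.
skipSpace :: Parser ()
skipSpace (Whitespace : rest) = skipSpace rest
skipSpace tokens = ((), tokens)

-- Parse a single identifier, if that is what comes next.
ident :: Parser (Maybe String)
ident (Ident name : rest) = (Just name, rest)
ident tokens = (Nothing, tokens)

-- Combinator: run one parser, then feed its leftover tokens to the next.
andThen :: Parser a -> Parser b -> Parser (a, b)
andThen p q tokens =
    let (x, rest) = p tokens
        (y, rest') = q rest
    in ((x, y), rest')

-- Combinator: apply a parser repeatedly until it stops yielding results.
manyOf :: Parser (Maybe a) -> Parser [a]
manyOf p tokens = case p tokens of
    (Just x, rest) -> let (xs, rest') = manyOf p rest in (x : xs, rest')
    (Nothing, _) -> ([], tokens)

-- Parse a whitespace-separated run of identifiers.
identList :: Parser [String]
identList = manyOf (\tokens -> ident (snd (skipSpace tokens)))
```

Every function here is pure: the only way a sub-parser can tell the next one where to continue is by returning the remaining tokens.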
Once each style rule is parsed, a method is called on a StyleSheet
to return a modified datastructure containing the new rule. A different method
is called to parse any at-rules.
Many of my
StyleSheet implementations handle only certain aspects of CSS,
handing off to another implementation to perform the rest.
For example most pseudoclasses (ignoring interactive aspects I have no plans to
implement) can be re-written into simpler selectors. So I added a configurable
StyleSheet decorator just
to do that!
This pass also resolves any namespaces,
to be parsed as pseudoelements.
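The decorator pattern described here can be sketched roughly as follows. Everything in this block is a hypothetical simplification: the class has a single assumed method called addRule, the selector type is a toy, and the rewrite of :link into an attribute test is only an illustrative choice, not necessarily what Rhapsode does.

```haskell
-- Hypothetical, heavily simplified types; not Rhapsode's actual definitions.
data Selector = Tag String | PseudoClass String | HasAttr String
    deriving (Eq, Show)

data StyleRule = StyleRule [Selector] [(String, String)]
    deriving (Eq, Show)

-- Anything that can accumulate parsed style rules.
class StyleSheet s where
    addRule :: s -> StyleRule -> s

-- The simplest implementation: collect rules in source order.
newtype RuleList = RuleList [StyleRule]
    deriving (Eq, Show)

instance StyleSheet RuleList where
    addRule (RuleList rules) rule = RuleList (rules ++ [rule])

-- A decorator: rewrites pseudoclasses using a configurable table,
-- then hands the rule off to the stylesheet it wraps.
data Rewriter s = Rewriter [(String, Selector)] s

instance StyleSheet s => StyleSheet (Rewriter s) where
    addRule (Rewriter table inner) (StyleRule sels props) =
        Rewriter table (addRule inner (StyleRule (map rewrite sels) props))
      where
        rewrite (PseudoClass name) | Just sel <- lookup name table = sel
        rewrite sel = sel
```

Because the decorator is itself a StyleSheet, decorators can be stacked in any order, each handling one aspect of CSS before delegating the rest.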
CSS defines a handful of at-rules which can control whether contained style rules will be applied:
@document allows user & useragent stylesheets to apply style rules only for certain (X)HTML documents & URLs. An interesting Rhapsode-specific feature is @document unstyled, which applies only if no author styles have already been parsed.
@media applies its style rules only if the given media query evaluates to true. Whilst in Rhapsode only the -rhapsode mediatypes are supported, I’ve implemented a full caller-extensible Shunting Yard interpreter.
@import fetches & parses the given URL if the given mediatype evaluates to true when you call loadImports. As a privacy protection for future browsers, callers may avoid hardware details leaking to the webserver by being more vague in this pass.
@supports applies style rules only if the given CSS property or selector syntax parses successfully.
Since media queries might need to be rechecked when, say, the window has been resized,
@media (and downloaded
@import) rules are resolved to populate a new
StyleSheet implementation only when the
corresponding function is called. Though again this is overengineered for Rhapsode’s uses:
since it renders pages to an infinite auditory timeline instead of a window, media queries
are barely useful here.
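For readers unfamiliar with the Shunting Yard algorithm mentioned above, here is a small sketch of it applied to boolean queries. The token type and the grammar (not, and, or, parentheses) are simplifying assumptions for illustration; real media query syntax differs, and this is not Rhapsode's implementation. The caller supplies the function that decides whether each feature holds, which is one way to make it caller-extensible.

```haskell
-- Hypothetical tokens for a simplified boolean query language.
data Tok = Feature String | Not | And | Or | Open | Close
    deriving (Eq, Show)

-- Operator precedence: higher binds tighter.
prec :: Tok -> Int
prec Not = 3
prec And = 2
prec Or = 1
prec _ = 0

-- Shunting Yard: reorder infix tokens into postfix using an operator stack.
-- Unbalanced parentheses are not fully detected in this sketch.
toPostfix :: [Tok] -> [Tok]
toPostfix = go []
  where
    go ops [] = ops  -- flush the remaining operators, top of stack first
    go ops (Feature f : rest) = Feature f : go ops rest
    go ops (Open : rest) = go (Open : ops) rest
    go ops (Close : rest) =
        let (popped, remaining) = break (== Open) ops
        in popped ++ go (drop 1 remaining) rest
    go ops (op : rest) =
        -- Prefix "not" never pops; binary operators pop anything that binds as tightly.
        let shouldPop o = op /= Not && prec o >= prec op
            (popped, remaining) = span shouldPop ops
        in popped ++ go (op : remaining) rest

-- Evaluate the postfix form with a value stack; Nothing means malformed input.
evalPostfix :: (String -> Bool) -> [Tok] -> Maybe Bool
evalPostfix test = go []
  where
    go [result] [] = Just result
    go stack (Feature f : rest) = go (test f : stack) rest
    go (x : stack) (Not : rest) = go (not x : stack) rest
    go (y : x : stack) (And : rest) = go ((x && y) : stack) rest
    go (y : x : stack) (Or : rest) = go ((x || y) : stack) rest
    go _ _ = Nothing
```

A caller would evaluate a query with something like evalPostfix (== "speech") (toPostfix tokens), re-running it whenever the answer might have changed.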
Ultimately Rhapsode parses CSS style rules to be stored in a hashmap (or rather a Hash Array Mapped Trie) indexed under the right-most selector if any. This dramatically cuts down on how many style rules have to be considered for each element being styled.
So for each element needing styling, it looks up just those style rules which match its name, attributes, IDs, and/or classes. However this only considers a single test from each rule’s selector, so we need a…
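A rough sketch of such an index is below. It uses Data.Map from the containers package as a stand-in for a Hash Array Mapped Trie, and the Test and Rule types and the function names are hypothetical simplifications, not Rhapsode's own.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical simplified selector tests.
data Test = Tag String | Class String | Id String
    deriving (Eq, Ord, Show)

-- A rule: its selector's tests (right-most last) plus a name standing in for its properties.
data Rule = Rule { selector :: [Test], name :: String }
    deriving (Eq, Show)

-- Rules keyed under their right-most test; rules without any test go under Nothing.
type Index = Map.Map (Maybe Test) [Rule]

-- Add a rule to the index, keeping source order within each bucket.
addRule :: Index -> Rule -> Index
addRule index rule = Map.insertWith (flip (++)) key [rule] index
  where
    key = case selector rule of
        [] -> Nothing
        tests -> Just (last tests)

-- Candidate rules for an element: the unindexed rules plus one lookup per test it passes.
candidates :: Index -> [Test] -> [Rule]
candidates index elementTests =
    concatMap (\key -> Map.findWithDefault [] key index) (Nothing : map Just elementTests)
```

The candidates returned are only possible matches: each still has to have its full selector evaluated against the element.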
To truly determine whether an element matches a CSS selector, we need to actually evaluate that selector! I’ve implemented this in 3 parts:
Whether there’s actually any compilation happening is another question for the Glasgow Haskell Compiler, but regardless I find it a convenient way to write and think about it.
Selectors are interpreted from right to left, as that tends to short-circuit sooner, upon an alternate inversely-linked representation of the element tree parsed by XML Conduit.
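As an illustration of right-to-left matching, here is a minimal sketch. The Element type (which links each element to its parent only), the Selector representation, and the restriction to tagname tests are all assumptions made to keep the example short; they are not Rhapsode's actual types.

```haskell
-- An element that links up to its parent rather than down to its children.
data Element = Element { tag :: String, parent :: Maybe Element }

data Combinator = Child | Descendant
    deriving (Eq, Show)

-- The selector "ul > li a" stored right to left:
-- Selector "a" [(Descendant, "li"), (Child, "ul")]
data Selector = Selector String [(Combinator, String)]

-- All ancestors of an element, nearest first.
ancestors :: Element -> [Element]
ancestors e = case parent e of
    Just p -> p : ancestors p
    Nothing -> []

-- Test the element itself first, so most non-matching rules fail immediately.
matches :: Selector -> Element -> Bool
matches (Selector subject rest) el = tag el == subject && go rest el
  where
    go [] _ = True
    go ((Child, name) : more) e = case parent e of
        Just p -> tag p == name && go more p
        Nothing -> False
    -- Any ancestor may satisfy a descendant combinator, so try each in turn.
    go ((Descendant, name) : more) e =
        any (\a -> tag a == name && go more a) (ancestors e)
```

The first comparison is against the element being styled, which is why the upward links are all this traversal needs.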
NOTE In webapp-capable browser engines
tends to use a slightly different selector interpreter because there we know
the ancestor element. This makes it more efficient to interpret those selectors
Style rules should be sorted by a “selector specificity”, which is computed by counting tests on IDs, classes, & tagnames, with ties broken by which comes first in the source code and by whether the stylesheet came from the browser, user, or webpage.
This is implemented as a decorator around the interpreter & (in turn) indexer.
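The sorting can be sketched as below. The types and field names are hypothetical, and the exact order in which the tie-breakers are applied (specificity, then origin, then source order) is an assumption made for this example rather than something taken from Rhapsode's code.

```haskell
import Data.List (sortOn)

-- Hypothetical simplified selector tests.
data Test = Id String | Class String | Tag String
    deriving (Eq, Show)

-- (IDs, classes, tagnames): tuples compare left to right, so IDs dominate.
type Specificity = (Int, Int, Int)

specificity :: [Test] -> Specificity
specificity tests = (count isId, count isClass, count isTag)
  where
    count p = length (filter p tests)
    isId (Id _) = True
    isId _ = False
    isClass (Class _) = True
    isClass _ = False
    isTag (Tag _) = True
    isTag _ = False

-- Where a stylesheet came from; later constructors win ties.
data Origin = Browser | User | Author
    deriving (Eq, Ord, Show)

data Rule = Rule
    { origin :: Origin
    , selector :: [Test]
    , sourceOrder :: Int
    , label :: String
    }

-- Lowest priority first, so that later rules can override earlier ones.
sortRules :: [Rule] -> [Rule]
sortRules = sortOn (\r -> (specificity (selector r), origin r, sourceOrder r))
```

Relying on tuple ordering keeps the comparison to a single line, and changing the tie-breaking policy only means reordering the tuple.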
Another decorator strips !important
off the end of any relevant CSS property values, generating new style rules with
higher priority. Once !important is stripped off, the embedding application is given a chance
to check whether the syntax is valid &, as such, whether it should participate
in the CSS cascade. Invalid properties are discarded.
At the same time the embedding application can expand CSS shorthand properties
into one or more longhand properties. E.g. convert
border-left: thin solid black;
into
border-left-width: thin; border-left-style: solid; border-left-color: black;
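A sketch of what such an expansion hook might look like is below. The function name expand, the Token type, and the restriction to keyword values in a fixed width, style, color order are all simplifying assumptions; real CSS accepts these values in any order and allows some to be omitted.

```haskell
-- Hypothetical simplified token type for property values.
data Token = Ident String | Number Double
    deriving (Eq, Show)

widths, styles :: [String]
widths = ["thin", "medium", "thick"]
styles = ["none", "hidden", "dotted", "dashed", "solid", "double",
          "groove", "ridge", "inset", "outset"]

-- Expand a shorthand into its longhands; Nothing means the value was invalid
-- and the property should be discarded.
expand :: String -> [Token] -> Maybe [(String, [Token])]
expand "border-left" [width@(Ident w), style@(Ident s), color@(Ident _)]
    | w `elem` widths, s `elem` styles = Just
        [ ("border-left-width", [width])
        , ("border-left-style", [style])
        , ("border-left-color", [color])
        ]
expand "border-left" _ = Nothing
-- Anything else is treated as a longhand and passed through unchanged.
expand name value = Just [(name, value)]
```

Returning Nothing for an invalid value gives the validation and the expansion a single shared code path.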
This was trivial to implement! Once you have a list of style rules sorted by specificity, just load all their properties into a hashmap & back out!
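To show just how little is involved, here is a minimal sketch using Data.Map from the containers package. The Property type and the function name cascade are made up for this example, and it assumes the rules arrive lowest priority first.

```haskell
import qualified Data.Map.Strict as Map

-- A property name paired with its (unparsed) value.
type Property = (String, String)

-- Rules are given lowest priority first. Map.fromList keeps the last value
-- seen for each key, so higher-priority declarations override earlier ones.
cascade :: [[Property]] -> [Property]
cascade sortedRules = Map.toList (Map.fromList (concat sortedRules))
```

For example, cascade [[("pitch", "low"), ("volume", "soft")], [("pitch", "high")]] keeps the later pitch and the only volume.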
Maybe I’ll write a little blogpost about how many webdevs seem to be scared of the cascade…
After cascade, methods are called on a given PropertyParser
to parse each longhand property into an in-memory representation that’s easier
to process. This typeclass also has useful decorators, though few are needed
for the small handful of speech-related properties.
Haskell’s pattern matching syntax makes the tedious work of parsing the sheer variety of CSS properties absolutely trivial. I didn’t have to implement a DSL like other browser engines do! This is the reason why I chose Haskell!
In CSS3, any property prefixed with --
will participate in CSS cascade to specify what tokens the
var() function should
substitute in. If the property no longer parses successfully after this substitution
it is ignored. A bit of a gotcha for webdevs,
but makes it quite trivial for me to implement!
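The substitution can be sketched as follows. This is a simplification under stated assumptions: a hypothetical Var token stands in for a whole var() call, fallback values and variables referencing other variables are ignored, and the isValid callback stands in for whatever property parsing the embedding application does.

```haskell
import Data.List (isPrefixOf, partition)

-- Hypothetical tokens; Var "--x" stands in for the function call var(--x).
data Token = Ident String | Number Double | Var String
    deriving (Eq, Show)

type Props = [(String, [Token])]

-- Split cascaded properties into custom ("--"-prefixed) ones and the rest.
splitCustom :: Props -> (Props, Props)
splitCustom = partition (isPrefixOf "--" . fst)

-- Replace each variable reference with the tokens of the matching custom property.
-- An undefined variable makes the whole value invalid (Nothing).
substitute :: Props -> [Token] -> Maybe [Token]
substitute vars = fmap concat . mapM expandTok
  where
    expandTok (Var name) = lookup name vars
    expandTok tok = Just [tok]

-- Substitute, then re-validate with the caller's check; failures are dropped.
resolve :: (String -> [Token] -> Bool) -> Props -> Props
resolve isValid props =
    [ (name, value')
    | (name, value) <- normal
    , Just value' <- [substitute custom value]
    , isValid name value'
    ]
  where
    (custom, normal) = splitCustom props
```

Validation runs only after substitution, which is exactly where the gotcha comes from: a declaration that looked fine when written can silently vanish.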
In fact, beyond prioritizing extraction of
properties prefixed with --, I needed little
more than a trivial
There’s a handful of CSS properties
which alter the text parsed from the HTML document, predominantly by including
counters. Which I use to render
elements. Or to generate marker labels for the arrow keys to jump to.
To implement these I added a
abstraction to hold the relationship between all parsed
objects & aid tree traversals. From there I implemented a second
PropertyParser decorator with two tree traversals: one
to collapse whitespace & the other
to track counter values before substituting them (as strings) in-place of any
In most browser engines any resource references (via the
url() function, which
incidentally requires special effort to lex correctly & to resolve any relative links)
are resolved after the page has been fully styled. I opted to do this prior to
styling instead, as a privacy measure I found just as easy to implement as it
would be not to do so.
Granted this does lead to impaired functionality of the
attribute, but please don’t use that anyways!
This was implemented as a pair of
StyleSheet implementations: one to extract
relevant URLs from the stylesheet, and the other to substitute in the filepaths
where they were downloaded. eSpeak NG will parse these
files when it’s ready to play these sound effects.
Future browser engines of mine will handle this differently, but for Rhapsode I simply reformat the style tree into an SSML document to hand straight to eSpeak NG.
eSpeak NG (running in-process) will then parse this XML with the aid of a stack to convert it into control codes within the text its later stages will gradually convert to sound.
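As a rough idea of what that reformatting involves, here is a small sketch that serializes a toy style tree into SSML. The StyleTree type and its two properties are hypothetical simplifications; only the speak and prosody elements and the pitch and rate attributes are taken from SSML itself.

```haskell
-- Hypothetical style tree: text leaves, and nodes carrying optional
-- pitch and rate values alongside their children.
data StyleTree
    = Text String
    | Node (Maybe String) (Maybe String) [StyleTree]

-- Escape characters that are special in XML text and attribute values.
escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c = [c]

-- Render an attribute only if the property was actually set.
attr :: String -> Maybe String -> String
attr name (Just value) = " " ++ name ++ "=\"" ++ escape value ++ "\""
attr _ Nothing = ""

-- Recursively serialize the tree into nested prosody elements.
toSSML :: StyleTree -> String
toSSML (Text txt) = escape txt
toSSML (Node pitch rate kids) =
    "<prosody" ++ attr "pitch" pitch ++ attr "rate" rate ++ ">"
        ++ concatMap toSSML kids
        ++ "</prosody>"

-- Wrap the serialized tree in the SSML root element.
document :: StyleTree -> String
document tree = "<speak>" ++ toSSML tree ++ "</speak>"
```

Since the style tree is already a tree, the output falls out of a straightforward recursive traversal.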
While all this is useful to webdevs wanting to give a special feel to their webpages (which, within reason, I don’t object to), my main incentive to implement CSS was for my own sake in designing Rhapsode’s useragent stylesheet. And that stylesheet takes advantage of most of the above.
Sure there are features (like support for CSS variables or most pseudoclasses) I decided to implement just because they were easy, but the only thing I’d consider extra complexity beyond the needs of an auditory browser engine are media queries. But I’m sure I’ll find a use for those in future browser engines.
Otherwise all this code would have to be in Rhapsode in some form or other to give a better auditory experience than eSpeak NG can deliver itself!