How Does CSS Work?

Rendering a webpage in Rhapsode takes little more than applying a useragent stylesheet, along with any installed userstyles and (optionally) author styles, to decide how the page’s semantics should be communicated.

Once the CSS has been applied, Rhapsode sends the styled text to eSpeak NG to be converted into the sounds you hear. So how does Rhapsode apply that CSS?


Parser implementations differ mainly in what they implement rather than how. They repeatedly look at the next character(s) in the input stream to decide how to represent it in-RAM. Often there’ll be a “lexing” step (for which I use Haskell CSS Syntax) to categorize consecutive characters into “tokens”, thereby simplifying the main parser.

My choice to use Haskell, however, does change things a little. In Haskell there can be no side effects; all outputs must be returned. So in addition to the parsed tree, each part of the parser must return the rest of the text that still needs to be parsed by another sub-parser. This yields a type signature of :: [Token] -> (a, [Token]), and Haskell lets you combine these sub-parsers together in what are called “parser combinators”.
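To make that type signature concrete, here is a minimal sketch of two sub-parsers and one combinator. The Token constructors and the names andThen, propertyName, untilSemicolon, and declaration are assumptions made for illustration; the real lexer has a much richer token type.

```haskell
-- Hypothetical, heavily simplified token type (an assumption for this sketch).
data Token = Ident String | Colon | Semicolon | Number Double
  deriving (Eq, Show)

-- A sub-parser returns its result plus the tokens it did not consume.
type Parser a = [Token] -> (a, [Token])

-- Combinator: run two sub-parsers in sequence, threading the leftover tokens.
andThen :: Parser a -> Parser b -> Parser (a, b)
andThen p q toks =
  let (x, rest)  = p toks
      (y, rest') = q rest
  in ((x, y), rest')

-- Read a property name followed by a colon, if that is what comes next.
propertyName :: Parser (Maybe String)
propertyName (Ident name : Colon : rest) = (Just name, rest)
propertyName toks = (Nothing, toks)

-- Collect tokens up to the next semicolon, and consume that semicolon.
untilSemicolon :: Parser [Token]
untilSemicolon toks =
  let (value, rest) = break (== Semicolon) toks
  in (value, drop 1 rest)

-- A declaration is just the two sub-parsers glued together.
declaration :: Parser (Maybe String, [Token])
declaration = propertyName `andThen` untilSemicolon
```

Because every sub-parser hands back the unconsumed tokens, nothing needs to mutate a shared input position; sequencing is ordinary function composition.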

Once each style rule is parsed, a method is called on a StyleSheet “typeclass” to return a modified datastructure containing the new rule. A different method is called to parse any at-rules.
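As a rough sketch of what such a typeclass could look like (the method names, the StyleRule shape, and the RuleList instance are assumptions for illustration, not the actual definitions):

```haskell
-- Hypothetical, simplified rule: a selector string plus (property, value) pairs.
data StyleRule = StyleRule String [(String, String)]
  deriving (Eq, Show)

-- Assumed shape of the typeclass: each method returns a modified stylesheet.
class StyleSheet s where
  addRule   :: s -> StyleRule -> s
  -- At-rules get their name and raw body; by default they are ignored.
  addAtRule :: s -> String -> String -> s
  addAtRule sheet _ _ = sheet

-- Simplest possible implementation: collect rules in source order.
newtype RuleList = RuleList [StyleRule]
  deriving (Eq, Show)

instance StyleSheet RuleList where
  addRule (RuleList rules) rule = RuleList (rules ++ [rule])

-- What a parser would do as it finishes each style rule.
loadRules :: StyleSheet s => s -> [StyleRule] -> s
loadRules = foldl addRule
```

Since the parser only depends on the typeclass, any datastructure with an instance can receive the parsed rules.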


Many of my StyleSheet implementations handle only certain aspects of CSS, handing off to another implementation to perform the rest.

For example most pseudoclasses (ignoring interactive aspects I have no plans to implement) can be re-written into simpler selectors. So I added a configurable StyleSheet decorator just to do that!

This pass also resolves any namespaces, and corrects :before & :after to be parsed as pseudoelements.
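The decorator idea can be sketched as a StyleSheet instance that wraps another one, rewrites each selector, and passes the result along. Everything here is hypothetical: the Test type, the one-method typeclass, and the example rewrite of :link into an attribute test are assumptions chosen to keep the sketch short.

```haskell
-- Hypothetical selector tests (an assumption for this sketch).
data Test = Tag String | Class String | HasAttr String | Pseudo String
  deriving (Eq, Show)

-- A rule is a flat list of tests plus an unparsed body.
type Rule = ([Test], String)

class StyleSheet s where
  addRule :: s -> Rule -> s

-- Innermost implementation: just collect whatever arrives.
newtype Collected = Collected [Rule]
  deriving (Eq, Show)

instance StyleSheet Collected where
  addRule (Collected rules) rule = Collected (rules ++ [rule])

-- Decorator: a configurable rewrite table wrapped around another stylesheet.
data Rewriting inner = Rewriting [(String, [Test])] inner

instance StyleSheet inner => StyleSheet (Rewriting inner) where
  addRule (Rewriting table inner) (sel, body) =
    Rewriting table (addRule inner (concatMap rewrite sel, body))
    where
      -- Known pseudoclasses are replaced by simpler tests...
      rewrite (Pseudo name)
        | Just tests <- lookup name table = tests
      -- ...and everything else passes through untouched.
      rewrite test = [test]
```

With a table such as [("link", [HasAttr "href"])], a rule on a:link reaches the wrapped stylesheet as a rule on a[href], so the inner implementation never needs to know about that pseudoclass.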

Media Queries & @import

CSS defines a handful of at-rules which can control whether contained style rules will be applied:

Since media queries might need to be rechecked when, say, the window has been resized, @media (and downloaded @import) rules are resolved to populate a new StyleSheet implementation only when the resolve function is called. Though again this is overengineered for Rhapsode’s uses: instead of a window it renders pages to an infinite auditory timeline, so media queries are barely useful here.


Ultimately Rhapsode parses CSS style rules to be stored in a hashmap (or rather a Hash Array Mapped Trie) indexed under the right-most selector if any. This dramatically cuts down on how many style rules have to be considered for each element being styled.
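A minimal sketch of such an index, using Data.Map from the containers package as a stand-in for the Hash Array Mapped Trie. The Test, Rule, and RuleIndex types and all function names are assumptions for illustration.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical selector tests (an assumption for this sketch).
data Test = Tag String | Id String | Class String
  deriving (Eq, Ord, Show)

-- A selector is a list of compound selectors, left to right.
data Rule = Rule { selector :: [[Test]], body :: String }
  deriving (Eq, Show)

-- Rules with a usable key are bucketed; the rest must always be considered.
data RuleIndex = RuleIndex
  { keyed   :: M.Map Test [Rule]
  , unkeyed :: [Rule]
  }

emptyIndex :: RuleIndex
emptyIndex = RuleIndex M.empty []

-- Pick one test from the right-most compound selector, if it has any.
indexKey :: Rule -> Maybe Test
indexKey rule = case selector rule of
  []        -> Nothing
  compounds -> case last compounds of
    (test : _) -> Just test
    []         -> Nothing

insertRule :: Rule -> RuleIndex -> RuleIndex
insertRule rule idx = case indexKey rule of
  Just key -> idx { keyed = M.insertWith (++) key [rule] (keyed idx) }
  Nothing  -> idx { unkeyed = rule : unkeyed idx }

-- Look up only the buckets for the tests this element could satisfy.
candidates :: [Test] -> RuleIndex -> [Rule]
candidates elementTests idx =
  concatMap (\t -> M.findWithDefault [] t (keyed idx)) elementTests
    ++ unkeyed idx
```

The lookup returns candidates only: a rule for div em lands in the em bucket, and whether the element really sits inside a div still has to be checked separately.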

For each element needing styling, it then looks up just those style rules which match its name, attributes, IDs, and/or classes. However this only considers a single test from each rule’s selector, so we need a…


To truly determine whether an element matches a CSS selector, we need to actually evaluate that selector! I’ve implemented this in 3 parts:

Whether there’s actually any compilation happening is another question for the Glasgow Haskell Compiler, but regardless I find it a convenient way to write and think about it.

Selectors are interpreted from right-to-left, as that tends to shortcircuit sooner, upon an alternate inversely-linked representation of the element tree parsed by XML Conduit.
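Right-to-left matching over a parent-linked tree can be sketched as follows. The Element, Test, Combinator, and Selector types are assumptions for illustration, and only the child and descendant combinators are covered.

```haskell
-- An element that links to its parent rather than to its children.
data Element = Element
  { name    :: String
  , classes :: [String]
  , parent  :: Maybe Element
  }

-- Hypothetical selector tests (an assumption for this sketch).
data Test = Tag String | Class String
  deriving (Eq, Show)

data Combinator = Child | Descendant

-- Stored right-most first: the tests for the element itself, then each
-- (combinator, tests) pair moving leftwards through the selector.
data Selector = Selector [Test] [(Combinator, [Test])]

matchesCompound :: [Test] -> Element -> Bool
matchesCompound tests el = all check tests
  where
    check (Tag t)   = name el == t
    check (Class c) = c `elem` classes el

ancestors :: Element -> [Element]
ancestors el = case parent el of
  Nothing -> []
  Just p  -> p : ancestors p

matches :: Selector -> Element -> Bool
matches (Selector own rest) el = matchesCompound own el && go rest el
  where
    go [] _ = True
    -- Child: only the immediate parent may match.
    go ((Child, tests) : more) e = case parent e of
      Just p  -> matchesCompound tests p && go more p
      Nothing -> False
    -- Descendant: try each ancestor in turn, backtracking on failure.
    go ((Descendant, tests) : more) e =
      any (\a -> matchesCompound tests a && go more a) (ancestors e)
```

If the element itself fails the right-most tests, no ancestors are ever visited, which is where the early shortcircuit comes from.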

NOTE In webapp-capable browser engines querySelectorAll tends to use a slightly different selector interpreter because there we know the ancestor element. This makes it more efficient to interpret those selectors left-to-right.


Style rules should be sorted by a “selector specificity”, which is computed by counting tests on IDs, classes, & tagnames, with ties broken by which comes first in the source code and whether the stylesheet came from the browser, user, or webpage.
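A minimal sketch of that ordering, assuming a flat list of tests per rule and leaving the stylesheet origin out for brevity (it would be one more component in the sort key). All names here are hypothetical.

```haskell
import Data.List (sortOn)

-- Hypothetical selector tests (an assumption for this sketch).
data Test = Id String | Class String | Tag String
  deriving (Eq, Show)

-- (IDs, classes, tagnames); tuples compare left to right, so one ID
-- outranks any number of classes.
type Specificity = (Int, Int, Int)

specificity :: [Test] -> Specificity
specificity tests = (count isId, count isClass, count isTag)
  where
    count p = length (filter p tests)
    isId (Id _) = True
    isId _      = False
    isClass (Class _) = True
    isClass _         = False
    isTag (Tag _) = True
    isTag _       = False

data Rule = Rule
  { ruleTests   :: [Test]
  , sourceOrder :: Int     -- position in the stylesheet
  , body        :: String
  }

-- Lowest priority first, with source order breaking ties, so that
-- later entries can simply overwrite earlier ones.
byPriority :: [Rule] -> [Rule]
byPriority = sortOn (\r -> (specificity (ruleTests r), sourceOrder r))
```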

This is implemented as a decorator around the interpreter & (in turn) indexer. Another decorator strips !important off the end of any relevant CSS property values, generating new style rules with higher priority.
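Stripping !important can be sketched as splitting a rule's declarations into a normal group and an important group. The Token type and function names are assumptions, and whitespace between ! and important is not handled here.

```haskell
-- Hypothetical, simplified token type (an assumption for this sketch).
data Token = Ident String | Delim Char | Whitespace
  deriving (Eq, Show)

type Declaration = (String, [Token])

-- Remove a trailing "!important", reporting whether it was there.
stripImportant :: [Token] -> ([Token], Bool)
stripImportant toks =
  case dropWhile (== Whitespace) (reverse toks) of
    Ident "important" : Delim '!' : rest ->
      (reverse (dropWhile (== Whitespace) rest), True)
    _ -> (toks, False)

-- Split declarations into (normal, important); the important ones
-- would go into a new rule that sorts with higher priority.
splitImportant :: [Declaration] -> ([Declaration], [Declaration])
splitImportant decls = (normal, important)
  where
    tagged    = [ (prop, stripImportant value) | (prop, value) <- decls ]
    normal    = [ (prop, value) | (prop, (value, False)) <- tagged ]
    important = [ (prop, value) | (prop, (value, True))  <- tagged ]
```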


Once !important is stripped off, the embedding application is given a chance to validate whether the syntax is valid &, as such, whether it should participate in the CSS cascade. Invalid properties are discarded.

At the same time the embedding application can expand CSS shorthands into one or more longhand properties. E.g. convert border-left: thin solid black; into border-left-width: thin; border-left-style: solid; border-left-color: black;.
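The border-left example can be sketched as a function that either returns the longhand declarations or rejects the property as invalid. The keyword lists are deliberately tiny, values are plain strings rather than tokens, and duplicate components are not detected; the function name is hypothetical.

```haskell
-- Expand one declaration into longhands, or return Nothing so that the
-- caller discards it as invalid.
expandShorthand :: String -> [String] -> Maybe [(String, String)]
expandShorthand "border-left" values = mapM classify values
  where
    -- Decide which longhand each component belongs to by its keyword.
    classify v
      | v `elem` ["thin", "medium", "thick"] =
          Just ("border-left-width", v)
      | v `elem` ["none", "solid", "dashed", "dotted", "double"] =
          Just ("border-left-style", v)
      | v `elem` ["black", "white", "red", "green", "blue"] =
          Just ("border-left-color", v)
      | otherwise = Nothing
-- Assumption for this sketch: any single-valued property passes through.
expandShorthand prop [value] = Just [(prop, value)]
expandShorthand _ _ = Nothing
```

A single unrecognised component makes the whole declaration invalid, so it never reaches the cascade.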

CSS Cascade

This was trivial to implement! Once you have a list of style rules sorted by specificity, just load all their properties into a hashmap & back!
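As a sketch of just how little is involved, assuming rules arrive lowest priority first and using Data.Map from containers in place of a hashmap (the function name is hypothetical):

```haskell
import qualified Data.Map.Strict as M

-- Each rule is a list of (property, value) declarations.
-- M.fromList keeps the last value seen for a duplicate key, so
-- higher-priority rules overwrite lower-priority ones.
cascade :: [[(String, String)]] -> [(String, String)]
cascade rules = M.toList (M.fromList (concat rules))
```

For example, cascade [[("pitch", "low"), ("volume", "soft")], [("pitch", "high")]] gives [("pitch", "high"), ("volume", "soft")].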

Maybe I’ll write a little blogpost about how many webdevs seem to be scared of the cascade.

After cascade, methods are called on a given PropertyParser to parse each longhand property into an in-memory representation that’s easier to process. This typeclass also has useful decorators, though few are needed for the small handful of speech-related properties.

Haskell’s pattern matching syntax makes the tedious work of parsing the sheer variety of CSS properties absolutely trivial. I didn’t have to implement a DSL like other browser engines do! This is the reason why I chose Haskell!
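To illustrate the pattern-matching style, here is a sketch covering a small subset of three CSS Speech properties. The SpeechStyle record, the Token type, and the longhand signature are assumptions for illustration rather than the real typeclass.

```haskell
-- Hypothetical, simplified token type (an assumption for this sketch).
data Token = Ident String | Percentage Double
  deriving (Eq, Show)

data Volume = Silent | XSoft | Soft | Medium | Loud | XLoud
  deriving (Eq, Show)

-- In-memory representation that is easier to process than raw tokens.
data SpeechStyle = SpeechStyle
  { speak            :: Bool
  , voiceVolume      :: Volume
  , voiceRatePercent :: Double
  } deriving (Eq, Show)

defaultStyle :: SpeechStyle
defaultStyle = SpeechStyle True Medium 100

volumes :: [(String, Volume)]
volumes =
  [ ("silent", Silent), ("x-soft", XSoft), ("soft", Soft)
  , ("medium", Medium), ("loud", Loud), ("x-loud", XLoud) ]

-- One clause per accepted syntax; anything else is invalid.
longhand :: SpeechStyle -> String -> [Token] -> Maybe SpeechStyle
longhand style "speak" [Ident "never"]  = Just style { speak = False }
longhand style "speak" [Ident "always"] = Just style { speak = True }
longhand style "voice-volume" [Ident kw]
  | Just v <- lookup kw volumes = Just style { voiceVolume = v }
longhand style "voice-rate" [Percentage p]
  | p >= 0 = Just style { voiceRatePercent = p }
longhand _ _ _ = Nothing
```

Each clause reads almost like the grammar in the specification, and the final catch-all clause rejects every declaration that no earlier clause accepted.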

CSS Variables var()

In CSS3, any property prefixed with -- will participate in CSS cascade to specify what tokens the var() function should substitute in. If the property no longer parses successfully after this substitution it is ignored. A bit of a gotcha for webdevs, but makes it quite trivial for me to implement!

In fact, beyond prioritizing extraction of properties prefixed with --, I needed little more than a trivial PropertyParser decorator.
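The substitution step can be sketched like this, assuming the custom properties have already been extracted into a lookup table. The Token type and function name are hypothetical, and var() fallback values and var() nested inside other functions are not handled.

```haskell
-- Hypothetical, simplified token type (an assumption for this sketch).
data Token = Ident String | Number Double | Function String [Token]
  deriving (Eq, Show)

-- Replace each var(--name) with that custom property's tokens.
-- An undefined variable yields Nothing, so the property is ignored.
substituteVars :: [(String, [Token])] -> [Token] -> Maybe [Token]
substituteVars vars = fmap concat . mapM expand
  where
    expand (Function "var" [Ident name]) = lookup name vars
    expand tok = Just [tok]
```

The substituted tokens are then handed to the ordinary property parser, which rejects them if they no longer form a valid value.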


There’s a handful of CSS properties which alter the text parsed from the HTML document, predominantly by including counters, which I use to render <ol> elements or to generate marker labels for the arrow keys to jump to.

To implement these I added a StyleTree abstraction to hold the relationship between all parsed PropertyParser style objects & aid tree traversals. From there I implemented a second PropertyParser decorator with two tree traversals: one to collapse whitespace & the other to track counter values before substituting them (as strings) in-place of any counter() or counters() functions.
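The counter traversal can be sketched as a document-order walk that threads the current counter values through the tree. The StyleTree shape here is an assumption and far simpler than the real abstraction; counter-reset and counter scoping are left out, as is the whitespace-collapsing traversal.

```haskell
import Data.List (mapAccumL)
import qualified Data.Map.Strict as M

-- Generated content is a mix of literal text and counter() references.
data Piece = Text String | Counter String

-- Hypothetical style tree node (an assumption for this sketch).
data StyleTree = StyleTree
  { increments :: [String]     -- counters bumped by counter-increment
  , content    :: [Piece]
  , children   :: [StyleTree]
  }

-- The same tree with every counter replaced by a plain string.
data Resolved = Resolved String [Resolved]
  deriving (Eq, Show)

type Counters = M.Map String Int

-- Visit nodes in document order, returning the updated counter values
-- alongside the resolved subtree.
resolve :: Counters -> StyleTree -> (Counters, Resolved)
resolve counters node = (counters'', Resolved text kids)
  where
    counters' = foldr (\c -> M.insertWith (+) c 1) counters (increments node)
    text = concatMap render (content node)
    render (Text s)    = s
    render (Counter c) = show (M.findWithDefault 0 c counters')
    (counters'', kids) = mapAccumL resolve counters' (children node)
```

Since there is no mutable state, the counter map is passed into each node and returned from it, with mapAccumL doing the threading across siblings.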


In most browser engines any resource references (via the url() function, which incidentally requires special effort to lex correctly & resolve any relative links) are resolved after the page has been fully styled. I opted to do this prior to styling instead, as a privacy measure I found just as easy to implement as it would be not to do so.

Granted this does lead to impaired functionality of the style attribute, but please don’t use that anyways!

This was implemented as a pair of StyleSheet implementations: one to extract relevant URLs from the stylesheet, and the other to substitute in the filepaths where they were downloaded. eSpeak NG will parse these .wav files when it’s ready to play these sound effects.

CSS Inheritance

Future browser engines of mine will handle this differently, but for Rhapsode I simply reformat the style tree into an SSML document to hand straight to eSpeak NG.
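A minimal sketch of that reformatting step, assuming each styled node carries only a list of prosody attributes. The Node type and function names are hypothetical, and real SSML output would involve more elements than prosody.

```haskell
-- Hypothetical styled tree: text, or an element with prosody attributes.
data Node = TextNode String | Element [(String, String)] [Node]

-- Escape characters that are special in XML text and attribute values.
escape :: Char -> String
escape '<' = "&lt;"
escape '>' = "&gt;"
escape '&' = "&amp;"
escape '"' = "&quot;"
escape c   = [c]

toSSML :: Node -> String
toSSML (TextNode t) = concatMap escape t
-- Unstyled elements contribute no markup of their own.
toSSML (Element [] kids) = concatMap toSSML kids
toSSML (Element attrs kids) =
  "<prosody" ++ concatMap attr attrs ++ ">"
    ++ concatMap toSSML kids
    ++ "</prosody>"
  where
    attr (k, v) = " " ++ k ++ "=\"" ++ concatMap escape v ++ "\""

-- Wrap the whole tree in the SSML root element.
document :: Node -> String
document root = "<speak>" ++ toSSML root ++ "</speak>"
```

Because nested prosody elements apply to everything inside them, the nesting of the output document carries styles down to child elements without any extra bookkeeping in this step.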

eSpeak NG (running in-process) will then parse this XML with the aid of a stack to convert it into control codes within the text its later stages will gradually convert to sound.

While all this is useful to webdevs wanting to give a special feel to their webpages (which, within reason, I don’t object to), my main incentive to implement CSS was for my own sake in designing Rhapsode’s useragent stylesheet. And that stylesheet takes advantage of most of the above.

Sure there are features (like support for CSS variables or most pseudoclasses) I decided to implement just because they were easy, but the only thing I’d consider extra complexity beyond the needs of an auditory browser engine are media queries. But I’m sure I’ll find a use for those in future browser engines.

Otherwise all this code would have to be in Rhapsode in some form or other to give a better auditory experience than eSpeak NG can deliver itself!