Haskset: Deckset presentations in reveal.js7 minutes read | 1286 words by Ruben Berenguel
After my happy experience with Haskell rewriting the ticketiser from Python to Haskell, I moved next on my list of rewrites.
A year or so ago I wrote a script in AWK that transformed a Markdown presentation formatted for Deckset into a Markdown presentation that Pandoc can convert into a reveal.js presentation, awkrdeck.
I wrote the script to be able to share my presentations with animated
GIFs (which don’t show in the
Deckset). As such, it is very tied to how I format my presentations, and only offers features I use. With it I was able to export the initial versions of my PySpark talk.
The AWK code was not ready to handle variable footers and several features I had to use for my presentation at Spark+AI summit 2020. After writing a Markdown parser for bear-note-graph (using custom parser combinators), it was a no-brainer to rewrite this AWK mess into an extendable system.
I really love AWK. AWK is one of the best languages for quick-and-dirty exploration, even better than Python in some cases.
hticketiser (my previous Haskell project) made me re-think, so instead of reusing my Markdown parser in Python I decided to rewrite everything in Haskell from scratch. As before, it was an interesting problem, involving parsing and evaluation. A good excuse to learn to use parser combinators in Haskell.
I had already used parser combinators to parse user stories in the format “As a _, I want to _ so that _” in
hticketiser, since I knew I was going to rewrite
awkrdeck in Haskell. I have used the Parsec library. I’m curious to see how to use attoparsec, which is supposed to be faster.
Parsec is fast enough for this code.
The problem to solve
Pandoc can convert presentations written in Markdown and using
--- as slide separator into dynamic presentations that can be presented using
reveal.js. Done quickly, they won’t look as good as the
Deckset uses beautiful themes and custom configuration settings that can be embedded in Markdown.
The problem is reformatting the Markdown as written for
Deckset into Markdown (possibly with HTML or CSS embedded) that
Pandoc can convert into a nice
reveal.js presentation. This also involves writing some CSS. That is kind of another problem I won’t address here, it was not fun.
The fundamental features I wanted to enable were:
- Split-images: If you add $N$ images in a slide, they will automatically split vertically. It works great for mixing concepts, or for the introductory slide
- Half-images: You can choose an image to have at the left or right, with text on the other. Makes slides more fluid, preventing too-much-text syndrome.
The AWK solution
The AWK version is at its core an (almost) one-pass parser-compiler, markdown-markdown. In other words, it reads the input line by line and emits new lines that already have the format adapted. It is partially split into a parsing and compiling step, since compiling needs lookback (or lookahead depending on how you see it): When you have $N$ images to use as a background, you need to know how many they are to choose the correct CSS classes for the split.
Internally, parsing generates an array of lines, that compiling rewrites into Markdown.
You can check the main parser/compiler here.
Problems with the AWK solution
There are no testing frameworks for AWK, and this is problem has complex testing issues: the final test is confirming, as a human that the generated presentation looks good.
Extending the parser/compiler is very hard. This is not because AWK is line noise, but because rolling a parser by hand with sparse documentation is problematic. Extending it is so much more.
The Haskell solution
The Haskell implementation is a more proper parser-compiler. It still needs improvement, but the parser is built using parser combinators, that emit
Object instances. A
Slide is formed by a list of
Objects. An object can be a Markdown header, a text block, a list…
You can find the parser here and the “compiler” here.
Here’s one of the parsers for clarity:
headerParser :: Parsec.ParsecT String () Identity Object headerParser = do Parsec.spaces header <- pack <$> Parsec.many1 (Parsec.char '#') Parsec.char ' ' title <- pack <$> Parsec.manyTill Parsec.anyChar newLineTryParser return (Header header title)
Change the import for
import qualified Text.Parsec as P to avoid writing Parsec so many times. Or just import unqualified.
In the “compiler”, a function
format takes an
Object (and some formatting directives, needed for some cases) and emits
Text with the correct formatting for
Pandoc. As easy as that.
Here’s part of this function. Now that I have a clearer idea, I should rewrite this area, though.
format :: [Directive] -> Object -> T.Text format _ (Header level text) = T.concat [level <> "#", " ", reformatFootnote (reformatInlinedEmph text)] format _ Blank = "\n" -- continues…
Here you can see already a couple of the “special stuff” that needs to be handled. Footnotes are not objects (due to how slides are created in the final formatting, they can’t be, at least not yet), thus need to be processed separately. Additionally, Markdown emphasis markers in headers are not (properly?) handled by
Pandoc, thus need to be wrapped in spaces for them to be rendered. This is
reformatInlinedEmph. Both this and footnotes are handled by parsers-rewriters, in one go.
This is what one of them looks like:
emphParser :: Parsec.ParsecT String () Identity String emphParser = do Parsec.choice [Parsec.try (Parsec.string "__"), Parsec.try (Parsec.string "_")] headerInlinedEmph :: Parsec.ParsecT String () Identity T.Text headerInlinedEmph = do start <- Parsec.many Parsec.alphaNum emph1 <- emphParser middle <- Parsec.many Parsec.alphaNum emph2 <- emphParser end <- Parsec.manyTill Parsec.alphaNum Parsec.eof return (pack (start ++ " " ++ emph1 ++ middle ++ emph2 ++ " " ++ end)) reformatInlinedEmph :: T.Text -> T.Text reformatInlinedEmph text = T.intercalate " " (map rightParse (splitOn " " text)) where rightParse text = case (Parsec.parse (Parsec.choice [Parsec.try headerInlinedEmph, Parsec.try anyUntilEof]) "" (unpack text)) of Right thing -> thing Left _ -> "[Failed parsing inlined formats]"
As you can see in the
Left case at the bottom, I’m not throwing or emitting errors. I prefer outputting errors directly into the slide. The reason is simple: I’d rather have slides with some text errors I can fix by editing the HTML than having to rewrite parts of the code because the error stops the process. Imagine being in a hurry and needing the
reveal.js version now.
newtype Parser = P.ParsecT String () Identity a or whatever makes me able to rewrite all those function headers, they are too verbose. Also use
concat for that
Simon Peyton-Jones has said
Haskell is the finest imperative programming language.
Maybe half-jokingly, but the
do block in
headerInlinedEmph looks very clear, step-wise imperative. Readable.
Also, for a very typed, supposedly painful programming language, there is a surprisingly small amount of type annotations. Only the function headers. That doesn’t sound so bad, doesn’t it? Usually I don’t even type those, but just write the body of the function and the Haskell language server infers some options for them, you can choose any and they are automatically added:
Does it work?
Check for yourself:
You can view the full
reveal.js presentation for this talk in this repository (last link in the README). You can compare with the exported PDF.
Obviously I still need to tweak the CSS settings, and many other things (I have a pretty long todo listed in the README), but it already works 💪.
Next Haskell exploration will be about web frameworks and database connectors.