Reducing Quarto file sizes with gzip compression in R and Observable JS

How easily you can do quite complex things in R again and again baffles me. In this post, I share the 12 lines of git2r and ggplot code that it takes to create a wordcloud of your Git commits.

Long time, no content. This is how it is when you get wrapped up in non-profit work and pick up new (D&D!) and old (soccer/football!) hobbies. 🤷 But the purpose of this space was never to blog consistently but rather to share things occasionally. Even if “occasionally” means every two years or so. :)

Today, after 3-4 months of using it and after several discussions with my friend Ilja (@fubits), I am looking at Quarto. Quarto is an “open-source scientific and technical publishing system built on Pandoc”.[1] While it certainly still has its annoying childhood quirks and bugs, it’s a promising technology in my opinion. Do preprocessing and wrangling in R or Python and then use the power of Observable JS for interactivity, user input and data visualization.[2]

How R and Observable JS work together

When we use R (or Python I suppose) and Observable JS (OJS) together in Quarto, the flow is as follows:

  1. Load, preprocess and wrangle data in R/Python
  2. Hand the data over to Observable JS
  3. Do interactive things in Observable JS

As far as I understand, what we can’t do is “hand data back” from OJS to R, e.g. user input such as filters or selections. This makes sense because the R parts are knitted by the knitr engine - and unless we have quarto preview running, the R parts of our Quarto document won’t re-knit. So once we’ve handed over our data to JavaScript, we’re in JavaScript.[3]

But back to the topic at hand: Today we look at Step 2, the “handover”. How does R hand over the data to Observable JS?

On a code-level, the answer is easy: we use the ojs_define function. E.g.

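# `cars` is a built-in R dataset; the assignment only makes it explicit
cars <- cars
# make the data frame available to OJS cells under the name cars_ojs
ojs_define(cars_ojs = cars)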

We can then use the cars_ojs object in our OJS cell ({ojs}) - this blog is still in distill so we unfortunately can’t see the output ;).

// ojs_define hands data frames over column-wise; transpose() turns them into rows
Inputs.table(transpose(cars_ojs))

What happens behind the scenes? Basically the data is converted to a JSON string and then is added to a <script type='ojs-define'> HTML tag in the head of the HTML output document so that Observable JS can access it. We can see this in action:

Screenshot of the inspector in Firefox. It shows the script type=‘ojs-define’ tag described earlier. The contents of the tag are the mtcars data in JSON form.
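
To make the handover a bit more tangible, here is a minimal sketch in plain JavaScript (runnable in Node, not an OJS cell) of a column-oriented data frame serialised to JSON and turned back into row objects. The `{contents: [{name, value}]}` wrapper is my assumption about how the payload inside the tag is shaped, and `toRows` is a hypothetical helper that plays the role of `transpose` in OJS.

```javascript
// Column-oriented data, the way a data frame is laid out (first rows of `cars`).
const cars = { speed: [4, 4, 7], dist: [2, 10, 4] };

// Assumed shape of the payload inside <script type="ojs-define">.
const payload = JSON.stringify({ contents: [{ name: "cars_ojs", value: cars }] });

// Hypothetical helper: turn named columns into an array of row objects.
function toRows(columns) {
  const names = Object.keys(columns);
  const n = names.length === 0 ? 0 : columns[names[0]].length;
  return Array.from({ length: n }, (_, i) =>
    Object.fromEntries(names.map((name) => [name, columns[name][i]]))
  );
}

// Reading side: parse the JSON string and convert columns to rows.
const parsed = JSON.parse(payload).contents[0].value;
const rows = toRows(parsed);
console.log(payload.length, rows[0]);
```

The point to take away is that every value ends up as text in the HTML file, so the payload grows with every row and column we hand over.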

The problem

While this is a simple and effective approach, it also means that when we hand over larger datasets, we’ll significantly increase the size of the HTML output document. This is not a big problem if we just want to use Quarto locally for ourselves but once we start sharing/deploying our content, we should probably be a bit mindful about this.

So what can we do?

  1. not use Observable JS: that’s obvious. But we also might need certain features - after all, user interactivity can significantly improve the UX of our reports.
  2. be mindful of how we use ojs_define: are there any “handovers” that are not used? Do we need all variables of the dataset in Observable?
  3. try making the “handover” more efficient.
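
For the second option, a rough sketch of how much dropping an unused variable can save. It is written in plain JavaScript for Node so it can be run on its own; in a real document the selection would happen on the R side before calling ojs_define. The dataset and the `pick` helper are hypothetical and only serve the illustration.

```javascript
// Hypothetical dataset with one column the OJS side never uses.
const n = 1000;
const full = {
  id: Array.from({ length: n }, (_, i) => i),
  value: Array.from({ length: n }, (_, i) => (i * 7) % 100),
  comment: Array.from({ length: n }, (_, i) => `free text note number ${i}`),
};

// Hypothetical helper: keep only the columns named in `keep`.
function pick(columns, keep) {
  return Object.fromEntries(keep.map((name) => [name, columns[name]]));
}

const slim = pick(full, ["id", "value"]);

// Size of the JSON text that would end up in the HTML document.
const fullBytes = Buffer.byteLength(JSON.stringify(full));
const slimBytes = Buffer.byteLength(JSON.stringify(slim));
console.log({ fullBytes, slimBytes });
```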

I have tried the last approach, using gzip to compress the data in R and uncompress it in OJS.

gzip compression
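
As a rough sketch of the idea (not necessarily the exact setup used for this post): serialise the data to JSON, gzip it, and base64-encode the bytes so they survive as a string; on the other side, decode, gunzip and parse. The example below runs both halves in Node with its built-in zlib module so the round trip can be checked in one place. The data is made up, and the "R side" / "OJS side" labels only mark where each half would live.

```javascript
const zlib = require("zlib");

// Hypothetical data standing in for a larger data frame.
const data = {
  speed: Array.from({ length: 5000 }, (_, i) => i % 25),
  dist: Array.from({ length: 5000 }, (_, i) => (i * 3) % 120),
};

// "R side": serialise, gzip, and base64-encode so the bytes survive as text.
const json = JSON.stringify(data);
const encoded = zlib.gzipSync(Buffer.from(json, "utf8")).toString("base64");

// "OJS side": decode base64, gunzip, and parse back into an object.
const decoded = JSON.parse(
  zlib.gunzipSync(Buffer.from(encoded, "base64")).toString("utf8")
);

// Compare the length of the plain JSON with the compressed, encoded string.
console.log({ jsonChars: json.length, encodedChars: encoded.length });
```

Two caveats: base64 adds roughly a third on top of the compressed bytes, so the gain depends on how repetitive the data is. And Node's zlib is not available in the browser, so an OJS cell would need something else for the second half, for example a library such as pako or the browser's DecompressionStream API.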

Useful things I learned along the way


  1. Source: https://quarto.org/

  2. Some might say now “but we have Shiny!”… I am not a big fan of Shiny2 and the existing options for RMarkdown (e.g. crosstalk) are fiddly and not a universal solution.

  3. While this seems daunting, there are libraries for data wrangling in JavaScript, e.g. arquero or tidyjs.

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://gitlab.com/friep/blog, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".