pr130.dev: Reducing Quarto file sizes with gzip compression in R and Observable JS

Long time, no content. This is how it goes when you get wrapped up in non-profit work and pick up new (D&D!) and old (soccer/football!) hobbies. 🤷 But the purpose of this space was never consistent blogging, but rather sharing things occasionally. Even if “occasionally” means every two years or so. :)

Today, after 3-4 months of using it and after several discussions with my friend Ilja (@fubits), I am looking at Quarto. Quarto is an “open-source scientific and technical publishing system built on Pandoc”¹. While it certainly still has its annoying childhood quirks and bugs, it’s a promising technology in my opinion. The idea: do preprocessing and wrangling in R or Python and then use the power of Observable JS for interactivity, user input and data visualization.²

How R and Observable JS work together

When we use R (or Python I suppose) and Observable JS (OJS) together in Quarto, the flow is as follows:

Load, preprocess and wrangle data in R/Python
Hand the data over to Observable JS
Do interactive things in Observable JS

As far as I understand, what we can’t do is “hand data back” from OJS to R, e.g. user input such as filters or selections. This makes sense because the R parts are knitted by the knitr engine - and unless we have quarto preview running, the R parts of our Quarto document won’t re-knit. So once we’ve handed over our data to JavaScript, we’re in JavaScript.³

But back to the topic at hand: Today we look at Step 2, the “handover”. How does R hand over the data to Observable JS?

On a code-level, the answer is easy: we use the ojs_define function. E.g.

cars <- cars
ojs_define(cars_ojs = cars)

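We can then use the cars_ojs object in our OJS cell ({ojs}) - this blog is still in distill so we unfortunately can’t see the output ;). Note that ojs_define hands over data frames column by column, so we wrap the object in transpose() to get the rows that Inputs.table expects.

Inputs.table(transpose(cars_ojs))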

What happens behind the scenes? Basically, the data is converted to a JSON string, which is then added to a <script type='ojs-define'> HTML tag in the head of the HTML output document so that Observable JS can access it. We can see this in action:

Go to this little example document by Ilja
Right-click anywhere in the document
Click Inspect
Navigate to the head and look for the script tag

Screenshot of the inspector in Firefox. It shows the <script type='ojs-define'> tag described earlier. The contents of the tag are the mtcars data in JSON form.
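To make the handover a bit more tangible, here is a minimal JavaScript sketch of what reading such a tag could look like. It is not Quarto's actual runtime code: the payload shape (a contents array with name and value entries) is an assumption, the three rows are just sample values, and readDefinitions and columnsToRows are hypothetical helper names.

```javascript
// Hypothetical payload, shaped like the JSON that ojs_define() is assumed to write
// into the <script type="ojs-define"> tag (column-oriented: one array per variable).
const scriptTagContent = JSON.stringify({
  contents: [
    { name: "cars_ojs", value: { speed: [4, 4, 7], dist: [2, 10, 4] } }
  ]
});

// Read the payload back into a name -> value lookup (hypothetical helper).
function readDefinitions(json) {
  const definitions = {};
  for (const { name, value } of JSON.parse(json).contents) {
    definitions[name] = value;
  }
  return definitions;
}

// Turn column-oriented data into an array of row objects,
// similar in spirit to the transpose() helper in Quarto's OJS cells.
function columnsToRows(columns) {
  const names = Object.keys(columns);
  const length = names.length === 0 ? 0 : columns[names[0]].length;
  return Array.from({ length }, (_, i) =>
    Object.fromEntries(names.map((name) => [name, columns[name][i]]))
  );
}

const { cars_ojs } = readDefinitions(scriptTagContent);
const rows = columnsToRows(cars_ojs);
console.log(rows);
```

The point of the sketch is only that everything travels as JSON text inside the HTML file, so every value we hand over is written out character by character.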

The problem

While this is a simple and effective approach, it also means that when we hand over larger datasets, we’ll significantly increase the size of the HTML output document. This is not a big problem if we just want to use Quarto locally for ourselves but once we start sharing/deploying our content, we should probably be a bit mindful about this.
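A rough way to get a feeling for this growth is to measure the JSON text directly. The following JavaScript sketch uses a made-up dataset (makeColumns and jsonBytes are hypothetical helpers, not part of Quarto) and prints how many bytes the serialized data would add to the document.

```javascript
// Hypothetical dataset generator: n rows in column-oriented form.
function makeColumns(n) {
  return {
    id: Array.from({ length: n }, (_, i) => i),
    value: Array.from({ length: n }, (_, i) => (i * 37) % 1000)
  };
}

// Size in bytes of the JSON text that would end up inside the HTML document.
function jsonBytes(data) {
  return new TextEncoder().encode(JSON.stringify(data)).length;
}

for (const n of [100, 10000, 100000]) {
  console.log(n, "rows:", jsonBytes(makeColumns(n)), "bytes");
}
```

The exact numbers depend entirely on the data, but the size grows with every row and every column we hand over.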

So what can we do?

Not use Observable JS: that’s the obvious one. But we might need certain features - after all, user interactivity can significantly improve the UX of our reports.
Be mindful of how we use ojs_define: are there any “handovers” that are not used? Do we need all variables of the dataset in Observable?
Try making the “handover” more efficient.

I have tried the last approach, using gzip to compress the data in R and uncompress it in OJS.
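The general idea of such a round trip can be sketched in plain JavaScript. This is a stand-in under stated assumptions, not the implementation from this post: it runs in Node.js and uses Node's built-in zlib module for both directions, whereas in the real setup the compression happens in R and the decompression in an OJS cell in the browser (which would need a browser-side gzip implementation, for example a library such as pako or fflate). Encoding the compressed bytes as base64 so they can travel as a string is also an assumption, and pack and unpack are hypothetical names.

```javascript
const zlib = require("node:zlib");

// Producer side (done in R in the post): JSON -> gzip -> base64 text,
// so that the compressed bytes can be embedded as a plain string.
function pack(data) {
  const json = JSON.stringify(data);
  return zlib.gzipSync(Buffer.from(json, "utf8")).toString("base64");
}

// Consumer side (done in OJS in the post): base64 text -> gunzip -> JSON.
function unpack(base64) {
  const json = zlib.gunzipSync(Buffer.from(base64, "base64")).toString("utf8");
  return JSON.parse(json);
}

// Hypothetical, fairly repetitive sample data in column-oriented form.
const data = {
  group: Array.from({ length: 5000 }, (_, i) => "group_" + (i % 5)),
  value: Array.from({ length: 5000 }, (_, i) => i % 100)
};

const plain = JSON.stringify(data);
const packed = pack(data);
console.log("plain JSON:", plain.length, "characters");
console.log("gzip + base64:", packed.length, "characters");
console.log("round trip ok:", JSON.stringify(unpack(packed)) === plain);
```

How much this saves depends on how repetitive the data is, and base64 adds roughly a third on top of the compressed bytes, so it is worth measuring with the real dataset.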

gzip compression

Useful things I learned along the way

¹ Source: https://quarto.org/
² Some might say now “but we have Shiny!”… I am not a big fan of Shiny² and the existing options for RMarkdown (e.g. crosstalk) are fiddly and not a universal solution.
³ While this seems daunting, there are libraries for data wrangling in JavaScript, e.g. arquero or tidyjs.
