Quick XLSX automation

There are a bunch of commercial spreadsheet automation tools out there. Many of them are enterprise-level, and all of them are complex. A lot of them require people to write code. So if you’ve got to write some code, why not just do it all yourself? Here’s a quick way to run a sheet using containers and LibreOffice.

Let’s start by running a podman/docker container with Ubuntu, and installing LibreOffice. I’m pretty sure you can use OpenOffice as well, and it might be better maintained, but the scripts I’ve been running use LO.

# start podman (because docker seems to be getting worse by the day)
podman run \
  --interactive \
  --tty \
  --security-opt label=disable \
  --volume /tmp:/tmp \
  --name common \ 
  --env DISPLAY ubuntu:18.04

# install libreoffice
apt-get update
apt-get -y install software-properties-common
add-apt-repository ppa:libreoffice/ppa
apt update
apt -y install libreoffice
soffice --help

Our input file is just going to be something dead simple: a single XLSX file with one page that has the first cell (A1) as =NOW(). At save time it looks like this.

A1 is just a temporal formula, for testing.

The goal here is just to have a timestamp that is one value at save time, to prove that it’s different when we run the sheet.

Now we’re just going to run the LO command line tool, and convert an XLSX to XLSX. In the conversion process, LO runs all formulas.

soffice \
  --convert-to xlsx \
  --outdir /tmp/demo/out \ 

Checking our results is as simple as converting the resulting XLSX file to CSV. We can’t cat it because XLSX files are just zip of XML.

Check our results by converting the second XLSX file to CSV.

soffice \
  --convert-to csv \
  --outdir /tmp/demo/observed \

cat /tmp/demo/observed/test.csv

And we see 05/29/2022 12:35:54 as the output, which is different from our initially saved XLSX calculation, which was around 05/29/2022 07:49:00.

The part about this solution that I most like is that you can build whatever you want in Excel, or LO, or OO, and then use whatever scripting tool you’d to put your data in it (as long as you parameterize your sheet properly!) and you can then just run it. As far as I can tell, this is basically how a huge number of companies do their reporting. Someone builds an example in Excel, someone else checks it into Git, and then wraps it in a container, and automates it to run and email a VP with some numbers once a week.

Some weird things to note about this solution:

  • Mostly get the compatability that you want, but you’re not running Excel, so there are differences like CEILING and INFO.
  • If the spreadsheet is mostly data, this is not the fastest – you’re copying the entire thing just to run a few calculations.
spreadsheets | snippets | programming