Quick XLSX automation
There are a bunch of commercial spreadsheet automation tools out there. Many of them are enterprise-level, and all of them are complex. A lot of them require people to write code. So if you’ve got to write some code, why not just do it all yourself? Here’s a quick way to run a sheet using containers and LibreOffice.
Let’s start by running a podman/docker container with Ubuntu, and installing LibreOffice. I’m pretty sure you can use OpenOffice as well, and it might be better maintained, but the scripts I’ve been running use LO.
# start podman (because docker seems to be getting worse by the day) podman run \ --interactive \ --tty \ --security-opt label=disable \ --volume /tmp:/tmp \ --name common \ --env DISPLAY ubuntu:18.04 # install libreoffice apt-get update apt-get -y install software-properties-common add-apt-repository ppa:libreoffice/ppa apt update apt -y install libreoffice soffice --help
Our input file is just going to be something dead simple: a single XLSX file with one page that has
the first cell (A1) as
=NOW(). At save time it looks like this.
The goal here is just to have a timestamp that is one value at save time, to prove that it’s different when we run the sheet.
Now we’re just going to run the LO command line tool, and convert an XLSX to XLSX. In the conversion process, LO runs all formulas.
soffice \ --convert-to xlsx \ --outdir /tmp/demo/out \ /tmp/demo/test.xlsx
Checking our results is as simple as converting the resulting XLSX file to CSV. We can’t
it because XLSX files are just zip of XML.
Check our results by converting the second XLSX file to CSV.
soffice \ --convert-to csv \ --outdir /tmp/demo/observed \ /tmp/demo/out/test.xlsx cat /tmp/demo/observed/test.csv
And we see
05/29/2022 12:35:54 as the output, which is different from our initially saved XLSX
calculation, which was around
The part about this solution that I most like is that you can build whatever you want in Excel, or LO, or OO, and then use whatever scripting tool you’d to put your data in it (as long as you parameterize your sheet properly!) and you can then just run it. As far as I can tell, this is basically how a huge number of companies do their reporting. Someone builds an example in Excel, someone else checks it into Git, and then wraps it in a container, and automates it to run and email a VP with some numbers once a week.
Some weird things to note about this solution:
- Mostly get the compatability that you want, but you’re not running Excel, so there are differences
- If the spreadsheet is mostly data, this is not the fastest – you’re copying the entire thing just to run a few calculations.