Postgres notes

Some notes on Postgres. Part of a personal wiki I’m working on.


RLS is nice, and for things like PostgREST it’s pretty useful. But there are a lot of things to think about if you’re exposing your database over the open internet with only RLS protecting it, as Supabase does.

One big thing is simply column-level security. If we’re going to use RLS to say some users can only select/update/insert/delete some rows, it follows that you might want to say that only some users should be able to select or update some columns.

There aren’t a lot of good ways to do this. The sloppy way is to use CREATE VIEW based on a table, and omit some columns from it. But inserts only work if the view is simple enough to be auto-updatable. A more complete way is to use GRANT to control access to the columns themselves.

---  revoke all
revoke all privileges on objects from authenticated;
---  explicitly grant by action and columns
grant select (id, title, size, created_at) on objects to authenticated;
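
The same mechanism covers writes. A sketch, assuming the same objects table and authenticated role as above:

---  writers can only touch the title column
grant update (title) on objects to authenticated;
---  an UPDATE that sets any other column now fails with a permission error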

A much weirder alternative is to use column encryption. (Suggested here https://www.enterprisedb.com/postgres-tutorials/how-implement-column-and-row-level-security-postgresql.) But then you end up having to load the key for specific queries, which is just about as much work as using CREATE VIEW.

On the whole there aren’t many rock-solid solutions here. RLS + CLS are pretty enterprise-ish features, so it makes sense, as big organizations would be the ones willing to pay the tax to use them. Still, it seems very natural to want to do CLS, and expose some subset of your database to users/clients/partners without defining an entire API. For all its faults, SQL is far more expressive than any API/RPC/GraphQL wrapper.


For RLS policies, the USING expression filters which existing rows are visible or modifiable, while WITH CHECK validates new or updated rows before they’re written.
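
A sketch of both clauses together, assuming a hypothetical notes table with an owner column matched against the database role:

create policy "notes.owner"
    on notes for all
    using (owner = current_user)        -- filters existing rows you can see/modify
    with check (owner = current_user);  -- new or updated rows must still pass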


NOTIFY is neat.

Here (https://coussej.github.io/2015/09/15/Listening-to-generic-JSON-notifications-from-PostgreSQL-in-Go) is a pretty good post on how to use notify/listen logic. You basically create a trigger function on a table, and use pg_notify to send some value over a channel. Then, using external application code, like Go or something, you create a listener using the API, and listen on a specific channel name.

It seems like best practices are to create a custom record type or use JSON as the payload, but only include the data that your listener needs to know about. Then in your notification loop, you either use just that data, or you query to enrich your data. This is because your channel payload is limited to 8000 bytes.

Something like:

create or replace function notify_event() returns trigger as
$$
begin
    -- This is where you put your logic to build the event.
    -- Doesn't have to be JSON, could be whatever you'd like.
    -- Could even use table for a queuing system.
    perform pg_notify('events', '{"key":21921}');
    return null;
end;
$$ language plpgsql;
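
The function alone does nothing until it’s attached to a table. A sketch, assuming a table named objects (on pre-11 versions the last line is EXECUTE PROCEDURE instead):

create trigger objects_notify
    after insert or update or delete
    on objects
    for each row
execute function notify_event();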

PostgREST (https://postgrest.org/en/stable/) has become more popular over the last couple of years. When combined with functions, it may be one of the best ways to move quickly with pg. I’d never expose it to the public internet without doing really thorough testing of RLS, policies, and permissions. But for an internal service? Seems about right.


You can write pg functions in SQL, C, PL/pgSQL, Tcl, Perl, and Python: https://www.postgresql.org/docs/current/xplang.html. My take on the PL languages is this: SQL for functions makes sense, C is just a way to allow for external languages to be used, PL/pgSQL is basically just SQL that lets you do for-loops and variables, but Tcl, Perl, and Python all seem like a bridge too far. What are we doing here? Who let these clowns in?


Writing functions in an external C-ish language is possible. Good blog post about it here: https://www.opsdash.com/blog/postgresql-triggers-golang.html.

You basically write an isolated function, use GCC to compile it to a .so file, and then register it in pg by loading it via its path and telling pg to consider it written in C.

It’s a little rough: you have to do all the error handling yourself inside your code, and memory allocation can be an issue depending on how you do it, but it’s possible.
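
The registration side looks something like this (the .so path, function name, and signature here are placeholders — the second string must match the compiled symbol in the shared library):

create function add_one(integer) returns integer
    as '/usr/lib/postgresql/add_one.so', 'add_one'
    language c strict;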


Supabase is a solid way to work with PG locally, with a focus on a sort of “monolith-in-a-box” set of features. Comes with pgsodium, pg_stat_statements, pgcrypto, pgjwt, uuid-ossp, and pg_graphql installed. Also has an okay UI for running queries and goofing around if you don’t want to install DBeaver.


Backups are generally done with https://www.postgresql.org/docs/current/app-pgdump.html, or https://github.com/wal-g/wal-g. pg_dump gives you a consistent snapshot as of dump time, while WAL-G continuously archives WAL, which is what enables true point-in-time recovery. I’ve heard from a couple of DBAs that WAL-G is the more robust solution that companies are using if they’re running their own DBs. Managed PG providers go lower than that, and just use block-level snapshots of a disk, because the instances are attached to logical volumes on physical SSDs.


Running cron tasks in pg seems to be done mostly with https://github.com/citusdata/pg_cron, which runs as a background worker inside the server, a sort of built-in “side-car” scheduler. Many managed services that offer pg have this extension available, but maybe not enabled, by default.
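
Usage is simple once the extension is enabled. A sketch, assuming a hypothetical sessions table you want to clean out nightly:

create extension if not exists pg_cron;

---  run at 03:00 every day; the job name is arbitrary
select cron.schedule('nightly-cleanup', '0 3 * * *',
                     $$delete from sessions where expires_at < now()$$);

---  remove the job again by name
select cron.unschedule('nightly-cleanup');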

I love nearly all of pg’s superpowers, but this is one where I question the requirement. Seems like there ought to be a better way to do scheduling stuff for pg than a cheek-by-jowl service emulating a unix scheduler.

Keeping an eye on this space.


Monitoring the progress of a query seems super difficult. There are crude tricks, like storing a total up front and incrementing a counter towards it. More on that here: https://www.brianlikespostgres.com/postgres-query-progress-bar.html.

There’s nothing like sqlite’s sqlite3_progress_handler, but that makes sense as pg is not an embedded db. (What would that even look like? Something inside like NOTIFY/LISTEN?)

There are more complex things you can do, some of them outlined here: https://postgres.ai/blog/20220114-progress-bar-for-postgres-queries-lets-dive-deeper. The tl;dr is that it’s somewhat possible, but you need to do a lot of weird things, possibly patch core, and even then there are still a lot of caveats.
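
One thing that does exist out of the box: progress views for certain maintenance commands (VACUUM, CREATE INDEX, ANALYZE, and a few others), though nothing for arbitrary queries:

---  how far along any currently running vacuums are
select pid, phase, heap_blks_scanned, heap_blks_total
from pg_stat_progress_vacuum;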

I’m keeping an eye on this space as well.


Foreign data wrappers are interesting. I think of them as roughly webhooks for postgres. postgres_fdw appears to be the most common – people have multiple databases or clusters per service, and they want to run queries across all of them. Although, as long as a service properly conforms to pg’s wire protocol, you could point postgres_fdw at almost anything. (I’m sure there’s a company out there doing exactly this to offer a BI or data lake service.)
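
Setup is only a few statements. A sketch — the host, dbname, and credentials are placeholders:

create extension if not exists postgres_fdw;

create server analytics_srv
    foreign data wrapper postgres_fdw
    options (host 'analytics.internal', port '5432', dbname 'analytics');

create user mapping for current_user
    server analytics_srv
    options (user 'reader', password 'secret');

---  pull the remote tables into a local schema, then query them like any other
create schema if not exists analytics;
import foreign schema public
    from server analytics_srv
    into analytics;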


When using VIEW and MATERIALIZED VIEW, one is stored, and the other is not. A plain VIEW is just a query that is named like a table, and is run when it’s used. But a MATERIALIZED VIEW is stored, and needs to be refreshed using the REFRESH statement.

I think of this as a VIEW being a reference to a query, and a MATERIALIZED VIEW is a cached copy of a query’s results, refreshed on demand.
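
A sketch of both, assuming a hypothetical orders table:

---  re-run against orders every time it's queried
create view recent_orders as
    select * from orders
    where created_at > now() - interval '7 days';

---  results stored on disk; stale until refreshed
create materialized view order_totals as
    select customer_id, sum(amount) as total
    from orders
    group by customer_id;

refresh materialized view order_totals;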


Table inheritance could be useful, but also could be a foot-gun. My gut tells me this is what VIEW and MATERIALIZED VIEW are for, along with proper schema design.


The search path is a shorthand way of setting the schemas to use in a query, so we don’t have to write them out each time. E.g.: SET search_path TO public; I prefer to set it on all functions, just to be explicit about it.
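
Pinning it on a function can be done in the CREATE FUNCTION statement itself, or after the fact. Using the notify_event function from earlier:

alter function notify_event() set search_path = public;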


Quick and dirty way to check email validity inside a plpgsql function.

create or replace function "users.create"(email text)
    returns uuid
    language plpgsql
    security definer
    set search_path = public
as
$$
declare
    param_email text;
    output_id   uuid;
begin
    -- Using lowercase here because I use it elsewhere in my system.
    param_email := lower(email);
    output_id := gen_random_uuid();
    if param_email !~ '^[a-z0-9._%-]+@[a-z0-9.-]+[.][a-z]+$' then
        -- Returning 0 would fail against the uuid return type; null signals failure.
        return null;
    end if;
    -- Actually create the user here....
    return output_id;
end;
$$;

There are a lot of opinions on doing this over at StackOverflow.

This isn’t a replacement for email validation/confirmation, it just helps me normalize things.


Using COMMENT is nice, but depending on the object (class?) it’s on, it can be difficult to look comments up. Easy for tables, but hard for policies, for example. Which is a bummer, because that’s the thing I’d like to use them on the most. Policies for RLS are the most complex parts of a schema, and a place where getting it wrong, or being unclear, is very bad.

Here’s what I want to do:

create
    policy "objects.update"
    on objects for
    update
    using (id in (select object_id
                  from permissions
                  where individual_id = auth.uid()));
comment on policy "objects.update" on "objects"
    is 'Objects can be updated by individuals with access.';

While you can use commands like \d through psql, doing it without the command line gets harder.

--- List descriptions, but doesn't include _policy_ comments...
SELECT obj_description(oid)
FROM pg_class;

--- Gives you details on the policy logic, but doesn't include comments...
select *
from pg_policies;

--- Need to do something like this to get policy comments specifically...
select polname, obj_description(oid, 'pg_policy')
from pg_policy
where polname = 'objects.update';

You can get there in the end, but it’s just a little weird.


There are blank spaces in the pg error code ranges, which lets you build out your own custom errors.

create function __custom_raise_permissions_error() returns integer
    language plpgsql as
$$
begin
    raise 'permission denied' using errcode = 'D3GA2';
    return 1;
end;
$$;

If your database is absorbing some service logic (hey, a lot of people are using PostgREST these days) then it might be useful to add some errors that are specific to your schema.

The main drawbacks are that you’re now closer to writing code in the database, and some extensions implement their own error codes, so you’ve got to watch out for that too.
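
Callers (or other functions) can then catch the custom code by SQLSTATE. A quick check using the function above:

do
$$
begin
    perform __custom_raise_permissions_error();
exception
    when sqlstate 'D3GA2' then
        raise notice 'caught the custom permission error';
end;
$$;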


Useful docs pages that I keep coming back to.


  • Last edited 01/25/2023 11:58AM
postgres | sql | notes | programming
2023-01-19