7 comments

  • codegeek 11 minutes ago
Stupid question, but does this shard the database as well, or do we shard manually and then set up the configuration accordingly?
    • levkk 7 minutes ago
      It shards it as well. We handle schema sync, moving table data (in parallel), setting up logical replication, and application traffic cutover. The zero-downtime resharding is currently WIP, working on the PR as we speak: https://github.com/pgdogdev/pgdog/pull/784.
      • codegeek 5 minutes ago
Incredible. I am really interested in trying pgdog for our B2B SaaS product. Will do some testing.
  • mijoharas 1 hour ago
Happy pgdog user here. I can recommend it from a user perspective as a connection pooler to anyone checking this out. (We're also running tests and are positive about sharding, but haven't run it in prod yet, so I can't 100% vouch for it on that front; that's where we're headed.)

@Lev, how is the 2PC coming along? I think it was pretty new when I last checked, and I haven't looked into it much since then. Is it feeling pretty solid now?

    • levkk 1 hour ago
It feels better now, but we still need to add crash protection: in case PgDog itself crashes, we need to restore in-progress 2PC transaction records from a durable medium. We will add this very soon.
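[Editor's sketch] The crash-protection idea described above can be illustrated with a minimal append-only journal. This is a hypothetical sketch, not PgDog's actual implementation (PgDog is written in Rust; the class and method names here are invented): persist each PREPARE record before sending it to the shards, so a restarted proxy can find dangling prepared transactions and resolve them with COMMIT PREPARED or ROLLBACK PREPARED.

```python
import json
import os


class TwoPCJournal:
    """Append-only, fsync'd journal of in-progress 2PC transactions
    (hypothetical sketch of the crash-protection idea)."""

    def __init__(self, path):
        self.path = path

    def record_prepare(self, gid, shards):
        # Write and fsync BEFORE sending PREPARE TRANSACTION to the
        # shards, so the record survives a crash of the proxy itself.
        self._append({"gid": gid, "shards": shards, "state": "prepared"})

    def record_commit(self, gid):
        self._append({"gid": gid, "state": "committed"})

    def _append(self, record):
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def in_progress(self):
        # On restart: any gid with a "prepared" record but no matching
        # "committed" record must be resolved on its shards.
        prepared, done = {}, set()
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            for line in f:
                rec = json.loads(line)
                if rec["state"] == "prepared":
                    prepared[rec["gid"]] = rec["shards"]
                else:
                    done.add(rec["gid"])
        return [(g, s) for g, s in prepared.items() if g not in done]
```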
  • jackfischer 29 minutes ago
Congrats guys! Curious how the read/write splitting stays reliable in practice given replication lag. Do you need to run the underlying cluster with synchronous replication?
    • levkk 24 minutes ago
Not really; replication lag is generally an accepted trade-off. Sync replication is rarely worth it, since you take a 30% performance hit on commits and add more single points of failure.

      We will add some replication lag-based routing soon. It will prioritize replicas with the lowest lag to maximize the chance of the query succeeding and remove replicas from the load balancer entirely if they have fallen far behind. Incidentally, removing query load helps them catch up, so this could be used as a "self-healing" mechanism.
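[Editor's sketch] The routing policy described above can be sketched in a few lines. This is an illustration of the idea, not PgDog's implementation; the function name and the 30-second threshold are invented for the example, and lag figures would in practice come from something like `pg_stat_replication` on the primary.

```python
def pick_replica(replicas, max_lag_secs=30.0):
    """Choose a read replica by replication lag.

    `replicas` is a list of (name, lag_seconds) pairs. Replicas lagging
    beyond `max_lag_secs` are removed from rotation entirely; shedding
    their read load is what lets them catch up ("self-healing"). Among
    the healthy ones, the lowest-lag replica wins, maximizing the chance
    the query sees the data it expects.
    """
    healthy = [(name, lag) for name, lag in replicas if lag <= max_lag_secs]
    if not healthy:
        return None  # all replicas too far behind: fall back to the primary
    return min(healthy, key=lambda r: r[1])[0]
```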

  • noleary 46 minutes ago
    > If you build apps with a lot of traffic, you know the first thing to break is the database.

    Just out of curiosity, what kinds of high-traffic apps have been most interested in using PgDog? I see you guys have Coinbase and Ramp logos on your homepage -- seems like fintech is a fit?

    • levkk 32 minutes ago
We have all kinds; it's not specific to any particular sector. That's kind of the beauty of building for Postgres: everyone uses it in some capacity!

My general advice: once you see more than 100 connections on your database, consider adding a connection pooler. If your primary's load exceeds 30% CPU utilization, consider adding read replicas. The same applies if you want some workload isolation between databases, e.g. pushing slow/expensive analytics queries to a replica. Vertically scaling primaries is also a fine choice; just keep the vertical limit in mind.

      Once you're a couple instance types away from the largest machine your cloud provider has, start thinking about sharding.
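[Editor's sketch] The rules of thumb above can be encoded as a simple checklist. The thresholds are exactly the ones levkk quotes in the comment (100 connections, 30% CPU); the function itself is illustrative, not part of any tool.

```python
def scaling_advice(connections, primary_cpu_util):
    """Turn the comment's rules of thumb into actionable advice.

    connections      -- current connection count on the database
    primary_cpu_util -- primary's CPU utilization as a fraction (0.0-1.0)
    """
    advice = []
    if connections > 100:
        advice.append("add a connection pooler")
    if primary_cpu_util > 0.30:
        advice.append("add read replicas")
    return advice
```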

      • mystifyingpoi 15 minutes ago
        > If your primary load exceeds 30% (CPU util), consider adding read replicas.

        I'm not an expert, but isn't this excessive? In theory you could triple the load and still have slack. I'd actually try to scale down, not up.

  • octoclaw 1 hour ago
    The cross-shard aggregate rewriting is really nice. Transparently injecting count() for average calculations sounds straightforward but there are so many edge cases once you add GROUP BY, HAVING, subqueries, etc.
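[Editor's sketch] The avg-to-sum/count decomposition mentioned above works like this: each shard returns sum(x) and count(x), and the router merges them, because averaging the per-shard averages directly would weight small shards too heavily. A minimal sketch of the merge step (illustrative only, not PgDog's code):

```python
def merge_avg(partials):
    """Combine per-shard (sum, count) partial aggregates into a global
    average. `partials` is a list of (sum, count) pairs, one per shard."""
    total = sum(s for s, _ in partials)
    n = sum(c for _, c in partials)
    return total / n if n else None


# Shard A holds rows [10, 20] -> (sum=30, count=2), per-shard avg 15.
# Shard B holds rows [30]     -> (sum=30, count=1), per-shard avg 30.
# Naive average-of-averages: (15 + 30) / 2 = 22.5 (wrong).
# Correct merged result:     (30 + 30) / 3 = 20.0.
```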

    Curious about latency overhead for the common case. On a direct-to-shard read where no rewriting happens, what's the added latency from going through PgDog vs connecting to Postgres directly? Sub-millisecond?

    • levkk 56 minutes ago
Sub-ms typically, yeah. We measured the average latency between nodes in the same AZ (e.g., an AWS availability zone) at less than one ms, so you need to account for one extra hop plus processing time in PgDog, which is typically fast.

That being said, if you don't currently use a connection pooler, you will notice some latency when adding one. It's usually table stakes for Postgres at scale, since you need one anyway, but it can be surprising. This especially affects "chatty" apps, the ones that send 10+ queries to service one API request, and it makes bugs like N+1 queries considerably worse.

      TLDR: not a free lunch, but generally acceptable at scale.

  • I_am_tiberius 24 minutes ago
    I really hope to use the sharding feature one day.
  • cpursley 1 hour ago
    Looks great - I'd love to include it in https://postgresisenough.dev (just put in a PR: https://github.com/agoodway/postgresisenough?tab=readme-ov-f...)
    • nebezb 39 minutes ago
While the lift to add it to your database is low, I don’t think you’re at a point where you can outsource the work.

      But all the better if they do!

    • verdverm 48 minutes ago
      Why don't you just do it yourself if you maintain a curated resource list?
      • cpursley 21 minutes ago
Wanted to give them a chance to write it up as they like.