PGQ excels at asynchronous task processing and offers deep observability. This makes it a key component of the Dataddo platform.
Hi. We are Tomas and Tomas, CTO and Backend Team Lead at Dataddo, a company that provides a fully managed, end-to-end data integration platform. Clients in time zones across the globe use our platform, so, as you can imagine, we need to process loads of integration jobs at several peak periods throughout the day.
Optimized queueing, therefore, is essential for preventing memory bloat in our data transfer and processing services, and for ensuring the stability of our platform.
But finding a tool that could meet all our queueing needs proved to be a challenge. So, we built our own: PGQ—a free, open-source message broker built on top of Postgres, which is specifically designed to handle long-running jobs.
What Challenges Prompted Us to Build PGQ?
Before building PGQ, we considered a range of queueing tools and eventually decided that RabbitMQ would be the most suitable. RabbitMQ is indeed excellent for short-running jobs, but it couldn’t efficiently meet all our needs because:
- Our jobs are often too long-running, so preventing heartbeat timeouts and reconnects was challenging (especially since we were using a RabbitMQ instance managed by a cloud provider).
- RabbitMQ's observability is limited (you can only see what is waiting to be processed, not what is being processed or what has already been processed).
- RabbitMQ runs on technology separate from Postgres (our main production database), so we had a lot of new things to learn and maintain.
PGQ solves all of these issues for us because:
- It’s designed to handle long-running jobs
- It’s reliable and easily observable (we can see the details of any jobs that have been processed, are being processed, or will be processed)
- It queues on top of Postgres, so we don’t need any additional know-how or engineering resources to use it
- It uses regular SQL statements, and its consumer and publisher implementations are basic, so even junior developers can use it
- Postgres just works and has stood the test of time
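To illustrate what "regular SQL statements" can look like for a Postgres-backed queue, here is a minimal sketch in Go. The table name `example_queue`, its columns, the five-minute lock, and the helpers `publish`, `claim`, `complete`, and `statements` are all assumptions made for illustration; they are not PGQ's actual schema or API. The claim step relies on Postgres's `FOR UPDATE SKIP LOCKED`, a common technique that lets several consumers pick different rows without blocking each other.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
	"strings"
)

// Hypothetical queue table (an assumption, not PGQ's real schema).
const createTableSQL = `
CREATE TABLE IF NOT EXISTS example_queue (
    id           BIGSERIAL PRIMARY KEY,
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    locked_until TIMESTAMPTZ,
    processed_at TIMESTAMPTZ,
    payload      JSONB NOT NULL
)`

// Publishing a job is a plain INSERT.
const publishSQL = `INSERT INTO example_queue (payload) VALUES ($1) RETURNING id`

// Claiming a job: pick the oldest unprocessed row that is not locked
// (or whose lock has expired) and lock it for five minutes.
// SKIP LOCKED makes concurrent consumers skip rows another one is claiming.
const claimSQL = `
UPDATE example_queue
SET locked_until = now() + interval '5 minutes'
WHERE id = (
    SELECT id FROM example_queue
    WHERE processed_at IS NULL
      AND (locked_until IS NULL OR locked_until < now())
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload`

// Completed jobs are marked, not deleted, so they stay visible afterwards.
const completeSQL = `UPDATE example_queue SET processed_at = now() WHERE id = $1`

// publish inserts one job and returns its id.
func publish(ctx context.Context, db *sql.DB, payload string) (int64, error) {
	var id int64
	err := db.QueryRowContext(ctx, publishSQL, payload).Scan(&id)
	return id, err
}

// claim returns ok=false (and no error) when no job is available.
func claim(ctx context.Context, db *sql.DB) (id int64, payload string, ok bool, err error) {
	err = db.QueryRowContext(ctx, claimSQL).Scan(&id, &payload)
	if errors.Is(err, sql.ErrNoRows) {
		return 0, "", false, nil
	}
	if err != nil {
		return 0, "", false, err
	}
	return id, payload, true, nil
}

// complete marks a claimed job as processed.
func complete(ctx context.Context, db *sql.DB, id int64) error {
	_, err := db.ExecContext(ctx, completeSQL, id)
	return err
}

// statements lists the SQL used above, in the order a job passes through it.
func statements() []string {
	return []string{createTableSQL, publishSQL, claimSQL, completeSQL}
}

func main() {
	// No database driver is imported here, so main only prints the statements.
	for _, stmt := range statements() {
		fmt.Println(strings.TrimSpace(stmt) + ";")
		fmt.Println()
	}
}
```

To run the helpers against a real database, you would open a `*sql.DB` with a Postgres driver of your choice; the sketch imports none so that it stays standard-library only.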
How Do We Use PGQ?
We use PGQ for four main things:
- Long-running jobs - Loading, writing, and processing data (200k+ jobs per day)
- Short-running jobs - Sending emails, saving logs, and updating entities (1M+ jobs per day)
- Asynchronous app communication - Between apps written in Go, PHP, and Node.js
- Monitoring our platform - Consumer rate, errors, and peaks (AWS RDS cluster 2x db.r6g.large)
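Because every job stays in a table, its state can be derived from a few timestamp columns. The sketch below, in Go, classifies jobs as waiting, processing, processed, or failed, and counts them per state. The `Job` fields, the state names, and the sample data are hypothetical and chosen for illustration, not taken from PGQ itself; in practice the same counts would typically come from a SQL query against the queue table.

```go
package main

import (
	"fmt"
	"time"
)

// State is a hypothetical lifecycle state of a queued job.
type State string

const (
	Waiting    State = "waiting"
	Processing State = "processing"
	Processed  State = "processed"
	Failed     State = "failed"
)

// Job mirrors the kind of columns a queue table might expose (assumed names).
type Job struct {
	ID          int
	StartedAt   *time.Time // nil until a consumer picks the job up
	ProcessedAt *time.Time // nil until the job finishes
	ErrorDetail string     // non-empty if the job finished with an error
}

// stateOf derives the state purely from the timestamps and the error field.
func stateOf(j Job) State {
	switch {
	case j.ProcessedAt != nil && j.ErrorDetail != "":
		return Failed
	case j.ProcessedAt != nil:
		return Processed
	case j.StartedAt != nil:
		return Processing
	default:
		return Waiting
	}
}

// summarize counts jobs per state.
func summarize(jobs []Job) map[State]int {
	counts := make(map[State]int)
	for _, j := range jobs {
		counts[stateOf(j)]++
	}
	return counts
}

// sampleJobs returns made-up data: two waiting, one processing,
// one processed, and one failed job.
func sampleJobs() []Job {
	now := time.Now()
	return []Job{
		{ID: 1},
		{ID: 2},
		{ID: 3, StartedAt: &now},
		{ID: 4, StartedAt: &now, ProcessedAt: &now},
		{ID: 5, StartedAt: &now, ProcessedAt: &now, ErrorDetail: "timeout"},
	}
}

func main() {
	counts := summarize(sampleJobs())
	for _, s := range []State{Waiting, Processing, Processed, Failed} {
		fmt.Printf("%-10s %d\n", s, counts[s])
	}
}
```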
When Should You Use PGQ Instead of Other Technologies?
PGQ could work well for you if:
- You need to use queues in your architecture and already use Postgres
- You need a reliable queueing system, but don’t need to optimize for speed
- You don’t want to incur the costs of incorporating a new technology into your infrastructure (installation, operation, maintenance, learning curve)
- You want to use—and possibly contribute to the development of—an open-source tool
One example of an organization for which PGQ might be suitable: an e-commerce company whose online shop gets thousands of orders per day, which already has Postgres in place, and which doesn’t want to deploy a big-guns solution like Kafka.
It’s also worth mentioning that PGQ is currently available only for Go. However, we already use PHP consumers and publishers internally, and will soon release an open-source PHP package to the public.
When Should You Not Use PGQ?
PGQ is probably not for you if:
- You have advanced or otherwise non-trivial requirements for message routing
- You process a very high volume of messages and need to optimize for throughput
- You only need to process small messages quickly (i.e., in milliseconds)
PGQ Is a Core Part of the Dataddo Platform
Every day, the Dataddo platform processes thousands of large and small extract/write jobs: ETL/ELT jobs, reverse ETL jobs, database replication jobs, and event-based integrations. Many of these jobs are very long-running (for example, due to underperforming customer databases, or huge data loads in the case of full database replication).
Other message brokers either can’t handle long-running jobs very well (e.g., RabbitMQ), or are way overcomplicated for our use case (e.g., Apache Kafka). PGQ handles long-running jobs perfectly, while still being very simple in design. This is why we want to share it with the developer community.
Think PGQ might be suitable for you? See PGQ details.