
PostgreSQL Sources FAQ for Streamkap

This FAQ focuses on using PostgreSQL as a source in Streamkap, including general self-hosted setups and cloud variants (AWS RDS/Aurora, Azure, Google Cloud SQL, Neon). Streamkap’s PostgreSQL connector provides real-time CDC with managed features like automatic scaling, UI setup, and ETL transformations.
A PostgreSQL source in Streamkap enables real-time Change Data Capture (CDC) from PostgreSQL databases, capturing row-level inserts, updates, and deletes with sub-second latency. It uses logical replication to stream changes to destinations, supporting snapshots for initial loads, schema evolution, and heartbeats for low-traffic DBs. Streamkap abstracts complexity, offering a serverless setup via UI or API.
Supported PostgreSQL versions:
  • PostgreSQL 9.4+ supports logical decoding; 10+ adds the pgoutput plugin required for basic CDC
  • PostgreSQL 13+ adds read-only incremental snapshots and partitioned-table capture via publish_via_partition_root
  • PostgreSQL 16+ allows logical replication slots on standby servers; 17+ adds failover-enabled slots
  • Compatible with PostgreSQL 15–17+ in cloud deployments
Streamkap supports:
  • Self-hosted (on-prem/VM)
  • AWS RDS PostgreSQL (including Aurora and Serverless)
  • Azure Database for PostgreSQL
  • Google Cloud SQL for PostgreSQL
  • Neon PostgreSQL
Streamkap also supports CrunchyBridge with a similar configuration. Key features include:
  • CDC: Log-based via pgoutput decoder; captures inserts/updates/deletes
  • Snapshots: Ad-hoc/initial backfills using read-only (PG 13+) or signal table methods; watermarking for minimal impact
  • Schema Evolution: Automatic handling of adds/drops/type changes
  • Heartbeats: Enabled by default to prevent WAL buildup in low-traffic DBs via a streamkap_heartbeat table
  • Data Types: Standard mappings (numerics, temporals, strings, binary as bytes/base64/hex, arrays/JSON/hstore); supports pgvector/VECTOR types in latest Streamkap
  • Ingestion Modes: Inserts (append) or upserts
  • Security: SSL, SSH/VPN, IP allowlisting
  • Monitoring: Latency, lag, WAL metrics in-app
  • Streamkap adds transaction metadata, ad-hoc snapshots with filters, and custom types like PostGIS/GEOMETRY
Streamkap reads PostgreSQL’s write-ahead log (WAL) via logical decoding, emitting changes as events. It uses PostgreSQL’s native pgoutput plugin, so no custom output plugin needs to be installed. Setting REPLICA IDENTITY FULL ensures complete before/after images for updates and deletes.
REPLICA IDENTITY controls how much row data is logged for changes:
  • DEFAULT: Logs only the primary key, so update/delete events carry partial records
  • FULL: Logs all columns, giving complete before-images (recommended for audits/soft deletes)
Set it via ALTER TABLE ... REPLICA IDENTITY FULL;. FULL is required for complete delete handling.
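For example, to enable full row images on a table and then verify the setting (a sketch; orders is a placeholder table name):

-- Enable full before/after images for change events (placeholder table)
ALTER TABLE public.orders REPLICA IDENTITY FULL;

-- Verify: 'f' = FULL, 'd' = DEFAULT, 'n' = NOTHING, 'i' = USING INDEX
SELECT relname, relreplident
FROM pg_class
WHERE relname = 'orders';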
  • Snapshots can be triggered ad hoc at the source or table level.
    Methods: read-only (PG 13+, no signal table required) or a streamkap_signal table. Both use watermarking so incremental loads have minimal impact.
  • Snapshot modes include initial, always, and when_needed; incremental snapshots run chunked and in parallel, and read-only snapshots (PG 13+) avoid writes to the source; ad-hoc snapshots are triggered via signaling. Streamkap simplifies triggering via the UI.
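The signal-table method follows the Debezium convention of inserting a signal row that the connector picks up. A sketch, assuming Streamkap uses the standard layout (exact schema and payload may differ; check the Streamkap setup docs):

-- Signal table (Debezium-style layout; columns are the conventional shape)
CREATE TABLE streamkap.streamkap_signal (
    id   VARCHAR(64) PRIMARY KEY,
    type VARCHAR(32),
    data VARCHAR(2048)
);

-- Trigger an ad-hoc incremental snapshot of one table (illustrative payload)
INSERT INTO streamkap.streamkap_signal (id, type, data)
VALUES ('snap-1', 'execute-snapshot',
        '{"data-collections": ["public.orders"]}');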
Heartbeats ensure the connector stays active and continues capturing changes, which is especially important for low-traffic or intermittent databases. Connectors use "offsets" (like bookmarks) to track their position in the database’s log or change stream. When no changes occur for long periods, these offsets may become outdated, and the connector can lose its place or stop capturing changes.

There are two layers of heartbeat protection:

Layer 1: Connector heartbeats (enabled by default)
The connector periodically emits heartbeat messages to an internal topic, even when no actual data changes are detected. This keeps offsets fresh and prevents staleness. No configuration is necessary for this layer; it is enabled automatically.

Layer 2: Source database heartbeats (requires configuration)
For read-write connections (when Read only is No during Streamkap setup, if applicable), you can configure the connector to update a dedicated heartbeat table in the source database at regular intervals. This simulates activity, ensuring change events are generated consistently, maintaining log progress and preventing staleness. It is especially useful for PostgreSQL, where a lack of regular changes can cause the write-ahead log (WAL) to grow rapidly during inactivity. This layer requires you to set up a heartbeat table in your source database.

Configure source database heartbeats

-- Create the streamkap schema
CREATE SCHEMA IF NOT EXISTS streamkap;

-- Switch to the streamkap schema
SET search_path TO streamkap;

-- Create the heartbeat table with id, text, and last_update fields
CREATE TABLE streamkap_heartbeat (
    id SERIAL PRIMARY KEY,
    text TEXT,
    last_update TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Grant permission to the Streamkap user
GRANT USAGE ON SCHEMA streamkap TO STREAMKAP_USER;
GRANT SELECT, INSERT, UPDATE ON TABLE streamkap_heartbeat TO STREAMKAP_USER;

-- Insert the first row into the heartbeat table
INSERT INTO streamkap_heartbeat (text) VALUES ('test_heartbeat');
Heartbeat tables and PostgreSQL publications

If the streamkap_pub publication created during PostgreSQL setup was scoped to specific tables, e.g. CREATE PUBLICATION streamkap_pub FOR TABLE table1, table2, table3, ...; instead of FOR ALL TABLES, you must add the heartbeat table to the publication: ALTER PUBLICATION streamkap_pub ADD TABLE streamkap.streamkap_heartbeat;
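Putting the publication steps together, a table-scoped publication including the heartbeat table might look like this (a sketch; the user table names are placeholders):

-- Publication limited to specific tables (placeholder table names)
CREATE PUBLICATION streamkap_pub FOR TABLE public.orders, public.customers;

-- Include the heartbeat table so heartbeat updates reach the connector
ALTER PUBLICATION streamkap_pub ADD TABLE streamkap.streamkap_heartbeat;

-- Confirm which tables the publication covers
SELECT * FROM pg_publication_tables WHERE pubname = 'streamkap_pub';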
  • Basics: Integers, floats, strings, dates/timestamps (micro/nano precision modes)
  • Advanced: Arrays, JSON, binary (bytes/hex), decimals (precise/double/string modes)
  • Custom: Domain types, network addresses, PostGIS (GEOMETRY/GEOGRAPHY), pgvector (VECTOR/HALFVEC/SPARSEVEC)
  • Unsupported: Non-UTF8 encodings; some spatial/custom without config
  • Use queries like:
    SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn))
    FROM pg_replication_slots
    WHERE slot_name = 'streamkap_slot';
    
  • Tools: Datadog (wal_bytes), New Relic/Grafana (custom queries)
  • Best Practices: Retain 3–5 days WAL; alert on growth; drop inactive slots
  • Streamkap: Monitor via JMX; WAL issues from inactive slots common
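Since inactive slots are the common cause of WAL buildup, a broader version of the query above can rank all slots by retained WAL and show whether each is active:

-- List replication slots with activity state and retained WAL, largest first
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;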
Known limitations:
  • Non-UTF8 database encodings are unsupported
  • Data loss risk on upgrades (recreate slots)
  • Read replicas limit snapshots
  • No DDL events
  • PK changes need manual handling
  • TOASTed values may be incomplete
  • Consistency issues pre-commit
  • Generated columns missing in pgoutput
  • WAL buildup without heartbeats
  • Schema changes during snapshots unsupported
Use REPLICA IDENTITY FULL so delete events carry complete records. Soft deletes are supported. In upsert mode, deletes propagate downstream as delete events.
Common issues and remedies:
  • WAL Buildup: Enable heartbeats; monitor and drop inactive slots; retain 3–5 days of WAL
  • Upgrade Data Loss: Stop writes, capture events, recreate slot, resnapshot
  • Connection Failures: Verify pg_hba.conf, firewalls, SSL
  • Missing Events: Ensure publication includes tables; check REPLICA IDENTITY
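If a slot is confirmed inactive and no connector still depends on it, it can be dropped to release the retained WAL (drop slots only when you are sure nothing is consuming them; the slot name below is the conventional one):

-- Drop an inactive replication slot to release retained WAL
SELECT pg_drop_replication_slot('streamkap_slot');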
No, CDC cannot capture views or most virtual database objects.

Why views cannot be captured:
CDC captures changes by reading the database transaction log (binlog, WAL, oplog, redo log, etc.). Views are query-time computations over base tables: when you query a view, the database engine executes the underlying SELECT against the base tables. Because views store no data of their own, they never generate transaction log entries for CDC to read.
What cannot be captured:
  • Views: Virtual tables with no physical storage or WAL entries
  • Materialized Views: Special case—can be captured if they have REPLICA IDENTITY configured, but refresh operations may not generate standard change events. Better to capture the source tables.
  • Temporary Tables: Session-scoped, not logged persistently in WAL
  • Unlogged Tables: Explicitly excluded from WAL by design
  • Foreign Tables: Reference external data sources, not local storage
  • System/Catalog Tables (information_schema, pg_catalog): Metadata representations, not user data
  • CTEs (Common Table Expressions): Query-time constructs with no persistent storage
Solution:
Configure CDC on the underlying base tables that power your views. The view logic can be recreated in your destination or transformation layer.
Example:
If you have a view sales_summary that queries tables orders and customers, capture the orders and customers tables instead, then recreate the view logic downstream.
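Continuing that example, the view logic can be recreated in the destination warehouse once orders and customers are streaming (a sketch; the column names are placeholders):

-- Recreate the view downstream over the replicated base tables
CREATE VIEW sales_summary AS
SELECT c.customer_id,
       c.name,
       COUNT(o.order_id)   AS order_count,
       SUM(o.total_amount) AS total_sales
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.name;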
Best practices:
  • Use a dedicated replication user
  • Limit publications to needed tables
  • Enable auto-vacuum; set WAL retention
  • Test snapshots in staging
  • For cloud: Monitor provider tools; use heartbeats for low traffic
  • Use PG 17+ for failover resilience
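A dedicated replication user might be provisioned like this (a sketch for self-hosted PostgreSQL; the user name and password are placeholders, and managed services differ, e.g. AWS RDS grants the rds_replication role instead of the REPLICATION attribute):

-- Self-hosted: dedicated user with replication privileges (placeholder credentials)
CREATE USER streamkap_user WITH REPLICATION PASSWORD 'change-me';
GRANT USAGE ON SCHEMA public TO streamkap_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO streamkap_user;

-- AWS RDS/Aurora equivalent of the REPLICATION attribute:
-- GRANT rds_replication TO streamkap_user;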