What is a MySQL source in Streamkap?
What MySQL deployments are supported?
- Self-hosted/Generic (on-prem/VM)
- AWS RDS MySQL (including Aurora)
- Google Cloud SQL for MySQL
- Azure MySQL Database
- Any other MySQL compatible platform
What are the key features of MySQL sources in Streamkap?
- CDC: Binlog-based for inserts/updates/deletes; captures schema changes
- Snapshots: Ad-hoc/initial using global/table locks; incremental or blocking
- Schema Evolution: Automatic DDL tracking; emits change events
- Data Types: Standard mappings (numerics, temporals, strings, binary configurable, JSON, ENUM/SET); temporal/decimal/binary handling modes
- Ingestion Modes: Inserts (append) or upserts
- Security: SSL, authentication
- Monitoring: Latency, lag, binlog metrics
- Heartbeats: For low-traffic databases to advance binlog position via a heartbeat table
- Streamkap adds transaction metadata, row filtering, and original SQL queries if enabled
How does CDC work for MySQL sources?
CDC reads the MySQL binary log (binlog), which requires binlog_format=ROW and binlog_row_image=FULL. GTID mode is used for position tracking in multi-primary setups and is recommended for reliability.
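You can verify these settings on the source database before enabling CDC; a minimal check (assuming standard MySQL privileges):

```sql
-- Verify the binlog configuration CDC depends on
SHOW VARIABLES LIKE 'binlog_format';     -- should return ROW
SHOW VARIABLES LIKE 'binlog_row_image';  -- should return FULL
SHOW VARIABLES LIKE 'gtid_mode';         -- ON is recommended
```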
What is binlog_row_image and why does it matter?
binlog_row_image controls how much data is logged for each change:
- MINIMAL: Logs the primary key only (delete events carry only the PK)
- FULL: Logs all columns (complete before/after records; recommended)
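If binlog_row_image is not already FULL, it can be changed at runtime on MySQL 8.0+; a sketch (SET PERSIST requires the SYSTEM_VARIABLES_ADMIN or SUPER privilege):

```sql
-- Persist across restarts (MySQL 8.0+); use SET GLOBAL for runtime-only changes
SET PERSIST binlog_row_image = 'FULL';
```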
How do snapshots work for MySQL sources?
- Methods: Global read lock or table-level locks; incremental (chunked by primary key, default 1024 rows) or blocking
- Modes: initial (default), always, initial_only, no_data, when_needed, configuration_based, custom
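Ad-hoc incremental snapshots are typically triggered through the signal table referenced elsewhere in this document (streamkap.streamkap_signal). A hedged sketch following the Debezium signal convention; the exact column names and payload format are assumptions:

```sql
-- Hypothetical ad-hoc incremental snapshot signal (Debezium-style columns assumed)
INSERT INTO streamkap.streamkap_signal (id, type, data)
VALUES (UUID(), 'execute-snapshot',
        '{"data-collections": ["mydb.orders"], "type": "incremental"}');
```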
What data types are supported?
- Basics: Booleans (BOOLEAN), integers (INT8/16/32/64), floats (FLOAT32/64), strings (STRING), dates/timestamps (adaptive/connect modes)
- Advanced: Binary (BYTES/base64/hex), decimals (precise/double/string modes), JSON (STRING/io.debezium.data.Json), ENUM/SET (STRING/io.debezium.data.Enum/Set)
- Character: CHAR/VARCHAR/TEXT (STRING)
- Unsupported: Non-UTF8 character sets; spatial types (GEOMETRY); MyISAM tables; events lost to binlog expiration
What are heartbeats and how do they work?
Layer 1: Connector heartbeats (enabled by default)
The Connector periodically emits heartbeat messages to an internal topic, even when no actual data changes are detected. This keeps offsets fresh and prevents staleness. No configuration is necessary for this layer; it is enabled automatically, and we recommend keeping it enabled for all deployments.

Layer 2: Source database heartbeats (recommended)
- Read-write connections (when Read only is No during Streamkap Setup): The Connector updates the heartbeat table directly.
- Read-only connections (when Read only is Yes during Streamkap Setup): A scheduled job on the primary database updates the heartbeat table, and these changes replicate to the read replica for the Connector to consume.
For read-only connections, this requires a scheduled job, created with your database's native scheduler (pg_cron for PostgreSQL, event_scheduler for MySQL), on your source database.
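For a MySQL read-only connection, the scheduled job might look like the following sketch. The heartbeat table name comes from this document, but the column name (last_heartbeat) and the schedule are assumptions:

```sql
-- Run on the primary; changes replicate to the read replica for the Connector
SET GLOBAL event_scheduler = ON;
CREATE EVENT IF NOT EXISTS streamkap_heartbeat_event
  ON SCHEDULE EVERY 1 MINUTE
  DO UPDATE streamkap.streamkap_heartbeat SET last_heartbeat = NOW();
```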
How to monitor binlog for MySQL sources?
Use SHOW BINARY LOGS to check binlog size and position; use tools like Datadog or New Relic to monitor lag.

Best practices: Retain binlogs for 3–5 days; alert on expiration and unexpected growth.
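A minimal monitoring and retention sketch (binlog_expire_logs_seconds is the MySQL 8.0+ retention variable; 345600 seconds is roughly 4 days, within the recommended 3–5 day window):

```sql
-- List binlog files and their sizes
SHOW BINARY LOGS;

-- Keep roughly 4 days of binlogs
SET PERSIST binlog_expire_logs_seconds = 345600;
```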
What are common limitations?
- MyISAM unsupported
- Binlog expiration can lose events
- PK changes require manual handling
- Before images are incomplete without binlog_row_image=FULL (analogous to PostgreSQL's TOAST limitation)
- High binlog growth without monitoring
- Schema changes during snapshots unsupported
- Snapshots from read replicas are restricted in some cloud setups
How to handle deletes?
- Captures deletes as events with before images if binlog_row_image=FULL
- Supports soft deletes
What security features are available?
What permissions does the Streamkap user need?
Core replication and snapshot privileges:

| Permission | Scope | Purpose |
|---|---|---|
| REPLICATION CLIENT | *.* | Read binlog metadata and positions |
| REPLICATION SLAVE | *.* | Read binlog events for CDC streaming |
| RELOAD | *.* | Flush operations required for consistent snapshots |
| SHOW DATABASES | *.* | Discover available databases and schemas |
| SELECT | {schema}.* | Read table data during initial and incremental snapshots |
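The global privileges above map to GRANT statements like the following sketch; the user name, host, password, and schema are placeholders:

```sql
CREATE USER 'streamkap'@'%' IDENTIFIED BY 'CHANGE_ME';
GRANT REPLICATION CLIENT, REPLICATION SLAVE, RELOAD, SHOW DATABASES
  ON *.* TO 'streamkap'@'%';
GRANT SELECT ON mydb.* TO 'streamkap'@'%';  -- replace mydb with your captured schema
```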
Snapshot signal table privileges:

| Permission | Scope | Purpose |
|---|---|---|
| SELECT | streamkap.streamkap_signal | Read signal table state |
| INSERT | streamkap.streamkap_signal | Trigger snapshot signals |
| UPDATE | streamkap.streamkap_signal | Update signal table state |
| DELETE | streamkap.streamkap_signal | Clean up processed signals |
Heartbeat table privileges:

| Permission | Scope | Purpose |
|---|---|---|
| SELECT | streamkap.streamkap_heartbeat | Read heartbeat state |
| INSERT | streamkap.streamkap_heartbeat | Write heartbeat records |
| UPDATE | streamkap.streamkap_heartbeat | Update heartbeat timestamps |
| DELETE | streamkap.streamkap_heartbeat | Clean up old heartbeat records |
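The signal and heartbeat table permissions translate to grants like the following (user and host are placeholders, as above):

```sql
GRANT SELECT, INSERT, UPDATE, DELETE
  ON streamkap.streamkap_signal TO 'streamkap'@'%';
GRANT SELECT, INSERT, UPDATE, DELETE
  ON streamkap.streamkap_heartbeat TO 'streamkap'@'%';
```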
Troubleshooting common issues
- Binlog Buildup: Enable heartbeats; monitor retention; drop unused logs
- Data Loss on Upgrade: Stop writes, capture all events, recreate positions post-upgrade, resnapshot
- Missing Events: Ensure binlog enabled and tables included; check REPLICATION privileges
Can CDC capture database Views, Materialized Views, and other virtual objects?
CDC captures changes by reading the database transaction log (binlog, WAL, oplog, redo log, etc.). Views are query-time computations over base tables: when you query a view, the database engine executes the underlying SELECT against the base tables. Since views don't store data, they don't generate transaction log entries, so there is nothing for CDC to read.

What cannot be captured:
- Views: Virtual tables with no physical storage or binlog entries
- Temporary Tables: Session-scoped, not logged persistently in binlog
- MEMORY Tables: In-memory storage engine, data lost on restart, not in binlog
- BLACKHOLE Tables: Discard all writes, no data to capture
- System Tables (information_schema, performance_schema, mysql): Metadata representations, not user data
- CTEs (Common Table Expressions): Query-time constructs with no persistent storage
Configure CDC on the underlying base tables that power your views. The view logic can be recreated in your destination or transformation layer.

MySQL-specific notes:
- MyISAM tables: Not supported for CDC as they don’t participate in transactions or generate complete binlog entries. Convert to InnoDB.
- Views created with ALGORITHM=TEMPTABLE: Still not capturable; capture the base tables instead
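Converting a MyISAM table to InnoDB, as recommended above, is a single statement; the table name is a placeholder, and since the conversion rewrites the table, test on a copy first:

```sql
ALTER TABLE mydb.legacy_table ENGINE = InnoDB;
```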
If you have a view customer_orders_summary that queries tables customers and orders, capture the customers and orders tables instead, then recreate the view logic downstream.
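Recreating that example view downstream might look like the following sketch; the join and column names are assumptions, since the original view definition isn't shown:

```sql
-- Hypothetical downstream recreation of customer_orders_summary
CREATE VIEW customer_orders_summary AS
SELECT c.id, c.name, COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name;
```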
How can I optimize schema history for large databases?
Cascade deletes not captured by CDC
MySQL does not write ON DELETE CASCADE events to the binlog. InnoDB handles foreign key cascades internally, so child table DELETE events from cascading foreign keys are NOT captured by CDC; only the parent table delete is propagated. This is a MySQL limitation, not a Streamkap limitation. If you rely on cascading deletes, consider using application-level deletes instead of database-level cascades.

See Delete Handling for more details.
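An application-level alternative to ON DELETE CASCADE ensures both deletes reach the binlog; the table and column names here are hypothetical:

```sql
-- Delete children first, then the parent; both statements are logged and captured
DELETE FROM orders WHERE customer_id = 42;
DELETE FROM customers WHERE id = 42;
```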
'Schema isn't known to this connector' error
- Contact Streamkap support for schema history recovery
- Consider enabling Capture Only Captured Tables DDL in the source’s Advanced settings to prevent recurrence
- See Schema History Optimization for more information
Best practices for MySQL sources
- Use dedicated replication user (not root)
- Enable GTID mode for better failover
- Limit databases/tables to reduce binlog size
- Set binlog retention (binlog_expire_logs_seconds) to 3–5 days and purge old logs regularly
- For deletes, use binlog_row_image=FULL
- Test snapshots in staging; monitor lag via tools
- For cloud: Use provider monitoring; enable heartbeats for low traffic
Planning a database version upgrade?