This FAQ focuses on using MySQL as a source in Streamkap, including general self-hosted setups and cloud variants (AWS RDS/Aurora, Google Cloud SQL, Azure Database for MySQL). Streamkap's MySQL connector provides real-time CDC with managed features such as automatic scaling, UI-based setup, and ETL transformations.
A MySQL source in Streamkap enables real-time Change Data Capture (CDC) from MySQL databases, capturing row-level inserts, updates, and deletes with sub-second latency. It uses the binary log (binlog) to stream changes to destinations, supporting snapshots for initial loads and automatic handling of schema evolution. Streamkap offers a serverless setup via UI or API.

What MySQL versions are supported as sources?

Streamkap supports MySQL 5.7+ for basic CDC, 8.0+ for advanced features like GTIDs and query log events, and 8.4+ for the latest enhancements. MySQL 5.6.5+ is compatible in limited modes.
Streamkap supports:
  • Self-hosted/Generic (on-prem/VM)
  • AWS RDS MySQL (including Aurora)
  • Google Cloud SQL for MySQL
  • Azure Database for MySQL
  • Any other MySQL compatible platform
Streamkap also supports standalone, primary/replica, high availability clusters, and multi-primary topologies.
  • CDC: Binlog-based for inserts/updates/deletes; captures schema changes
  • Snapshots: Ad-hoc/initial using global/table locks; incremental or blocking
  • Schema Evolution: Automatic DDL tracking; emits change events
  • Data Types: Standard mappings (numerics, temporals, strings, binary configurable, JSON, ENUM/SET); temporal/decimal/binary handling modes
  • Ingestion Modes: Inserts (append) or upserts
  • Security: SSL, authentication
  • Monitoring: Latency, lag, binlog metrics
  • Heartbeats: For low-traffic databases to advance binlog position via a heartbeat table
  • Streamkap adds transaction metadata, row filtering, and original SQL queries if enabled
Streamkap reads MySQL's binary log (binlog) to capture committed operations, emitting changes as events. This requires binlog_format=ROW and binlog_row_image=FULL.
Uses GTID mode for position tracking in multi-primary setups, recommended for reliability.
binlog_row_image controls logged data for changes:
  • MINIMAL: Logs only the primary key in before images, so delete events carry partial records
  • FULL: Logs all columns (complete records, recommended)
Set to FULL for full delete handling.
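The required settings can be checked (and, on MySQL 8.0+, persisted) with standard statements; a sketch assuming a self-hosted server where you hold the needed admin privileges (managed services like RDS or Cloud SQL set these via parameter groups or flags instead):

```sql
-- Check current binlog configuration
SHOW VARIABLES LIKE 'log_bin';            -- must be ON
SHOW VARIABLES LIKE 'binlog_format';      -- must be ROW
SHOW VARIABLES LIKE 'binlog_row_image';   -- FULL recommended

-- On MySQL 8.0+, persist the required values across restarts
SET PERSIST binlog_format = 'ROW';
SET PERSIST binlog_row_image = 'FULL';

-- Check GTID mode (recommended for reliable position tracking)
SHOW VARIABLES LIKE 'gtid_mode';
```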
Trigger ad-hoc at source/table level.
  • Methods: Global read lock or table-level; incremental (chunked by PK, default 1024 rows) or blocking
  • Modes: initial (default), always, initial_only, no_data, when_needed, configuration_based, custom
Streamkap simplifies triggering via UI.
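Conceptually, an incremental snapshot reads each table in primary-key-ordered chunks while change streaming continues in parallel. The connector performs this internally; a simplified sketch of the chunked reads, with an illustrative table name:

```sql
-- First chunk: lowest 1024 rows by primary key
SELECT * FROM orders ORDER BY id LIMIT 1024;

-- Subsequent chunks resume after the highest key seen so far
-- (:last_id is a placeholder for that key)
SELECT * FROM orders WHERE id > :last_id ORDER BY id LIMIT 1024;
```

Because each chunk is bounded by the primary key, the snapshot can pause and resume without locking the whole table.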
  • Basics: Booleans (BOOLEAN), integers (INT8/16/32/64), floats (FLOAT32/64), strings (STRING), dates/timestamps (adaptive/connect modes)
  • Advanced: Binary (BYTES/base64/hex), decimals (precise/double/string modes), JSON (STRING/io.debezium.data.Json), ENUM/SET (STRING/io.debezium.data.Enum/Set)
  • Character: CHAR/VARCHAR/TEXT (STRING)
  • Unsupported: Non-UTF8 character sets; spatial types (GEOMETRY); MyISAM tables; events already expired from the binlog
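As an illustration of these mappings, a table mixing the common types (the table and column names are hypothetical) would be captured roughly as annotated below:

```sql
CREATE TABLE demo_types (
    id BIGINT PRIMARY KEY,        -- INT64
    active BOOLEAN,               -- BOOLEAN (stored as TINYINT(1))
    price DECIMAL(10,2),          -- precise/double/string, per decimal handling mode
    created_at DATETIME(3),       -- depends on temporal mode (adaptive/connect)
    payload JSON,                 -- STRING with io.debezium.data.Json
    status ENUM('new','done'),    -- STRING with io.debezium.data.Enum
    raw VARBINARY(255)            -- BYTES/base64/hex, per binary handling mode
);
```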
Heartbeats ensure the Connector stays active and continues capturing changes, which is especially important for low-traffic or intermittent databases.

Connectors use "offsets", like bookmarks, to track their position in the database's log or change stream. When no changes occur for long periods, these offsets may become outdated, and the Connector might lose its place or stop capturing changes.

There are two layers of heartbeat protection:

Layer 1: Connector heartbeats (enabled by default)
The Connector periodically emits heartbeat messages to an internal topic, even when no actual data changes are detected. This keeps offsets fresh and prevents staleness. No configuration is necessary for this layer; it is automatically enabled.

Layer 2: Source database heartbeats (requires configuration)
For read-write connections (when Read only is No during Streamkap setup, if applicable), you can configure the Connector to update a dedicated heartbeat table in the source database at regular intervals. This simulates activity, ensuring change events are generated consistently, maintaining log progress and preventing staleness. It is especially useful for databases like PostgreSQL, where a lack of regular changes can cause the write-ahead log (WAL) to grow rapidly during inactivity. This layer requires you to set up a heartbeat table in your source database.

Configure source database heartbeats

-- Create the streamkap schema
CREATE SCHEMA IF NOT EXISTS streamkap;

-- Switch to the streamkap schema
USE streamkap;

-- Create the heartbeat table with id, text, and last_update fields
CREATE TABLE streamkap_heartbeat (
    id INT AUTO_INCREMENT PRIMARY KEY,
    text TEXT,
    last_update TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Grant permission to the Streamkap user
GRANT SELECT, INSERT, UPDATE ON streamkap.streamkap_heartbeat TO STREAMKAP_USER;

-- Insert the first row into the heartbeat table
INSERT INTO streamkap_heartbeat (text) VALUES ('test_heartbeat');
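Once configured, Streamkap touches this table at regular intervals; the exact statement is managed by the connector, but its effect is equivalent to something like:

```sql
-- Periodic heartbeat write; the resulting binlog event advances the offset
UPDATE streamkap.streamkap_heartbeat
SET text = 'heartbeat', last_update = CURRENT_TIMESTAMP
WHERE id = 1;
```

Each write produces a binlog event even when application tables are idle, which is what keeps the connector's position current.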
Use queries like SHOW BINARY LOGS for size and position; use tools like Datadog or New Relic for lag. Best practices: retain binlogs for 3–5 days; alert on expiration and rapid growth.
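A few useful monitoring queries, assuming MySQL 8.0+ (note that SHOW MASTER STATUS is replaced by SHOW BINARY LOG STATUS in 8.4):

```sql
-- List binlog files and their sizes
SHOW BINARY LOGS;

-- Current binlog file and write position (MySQL 8.4+: SHOW BINARY LOG STATUS)
SHOW MASTER STATUS;

-- Binlog retention setting, in seconds (MySQL 8.0+)
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';
```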
  • MyISAM unsupported
  • Binlog expiration can lose events
  • PK changes require manual handling
  • Before images incomplete without binlog_row_image=FULL (MySQL's analogue of PostgreSQL's TOAST limitation)
  • High binlog growth without monitoring
  • Schema changes during snapshots unsupported
  • Read replicas for snapshots in some cloud setups
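To find tables that CDC cannot capture because of their storage engine, you can query information_schema and convert any matches to InnoDB (the ALTER example uses an illustrative table name):

```sql
-- List non-InnoDB tables in user schemas
SELECT table_schema, table_name, engine
FROM information_schema.tables
WHERE engine <> 'InnoDB'
  AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');

-- Convert a table to InnoDB
ALTER TABLE mydb.legacy_table ENGINE = InnoDB;
```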
  • Captures deletes as events with before images if binlog_row_image=FULL
  • Supports soft deletes
Encrypted connections (SSL), IP allowlisting, role-based access (dedicated user recommended).
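A minimal sketch of a dedicated CDC user with SSL required; the user name, host pattern, password, and schema are placeholders, and Streamkap's setup guide gives the exact grants for each platform:

```sql
-- Dedicated replication user; never use root for CDC
CREATE USER 'streamkap_user'@'%' IDENTIFIED BY 'change_me' REQUIRE SSL;

-- Read access for snapshots, replication privileges for binlog streaming
GRANT SELECT ON mydb.* TO 'streamkap_user'@'%';
GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'streamkap_user'@'%';
FLUSH PRIVILEGES;
```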
  • Binlog Buildup: Enable heartbeats; monitor retention; drop unused logs
  • Data Loss on Upgrade: Stop writes, capture all events, recreate positions post-upgrade, resnapshot
  • Missing Events: Ensure binlog enabled and tables included; check REPLICATION privileges
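For the missing-events case, these checks confirm the binlog is enabled in the right format and the connector's user has replication privileges (user name illustrative):

```sql
-- Binlog must be enabled and in ROW format
SHOW VARIABLES WHERE Variable_name IN ('log_bin', 'binlog_format');

-- The CDC user needs REPLICATION SLAVE and REPLICATION CLIENT
SHOW GRANTS FOR 'streamkap_user'@'%';
```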
No, CDC cannot capture Views or most virtual database objects.

Why Views cannot be captured:
CDC captures changes by reading the database transaction log (binlog, WAL, oplog, redo log, etc.). Views are query-time computations over base tables: when you query a view, the database engine executes the underlying SELECT statement against the base tables. Since views store no data of their own, they never generate transaction log entries.
What cannot be captured:
  • Views: Virtual tables with no physical storage or binlog entries
  • Temporary Tables: Session-scoped, not logged persistently in binlog
  • MEMORY Tables: In-memory storage engine, data lost on restart, not in binlog
  • BLACKHOLE Tables: Discard all writes, no data to capture
  • System Tables (information_schema, performance_schema, mysql): Metadata representations, not user data
  • CTEs (Common Table Expressions): Query-time constructs with no persistent storage
Solution:
Configure CDC on the underlying base tables that power your views. The view logic can be recreated in your destination or transformation layer.
MySQL-specific notes:
  • MyISAM tables: Not supported for CDC as they don’t participate in transactions or generate complete binlog entries. Convert to InnoDB.
  • Views created with ALGORITHM=TEMPTABLE: Still not capturable; capture the base tables instead
Example:
If you have a view customer_orders_summary that queries tables customers and orders, capture the customers and orders tables instead, then recreate the view logic downstream.
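After capturing the base tables, the view logic can be recreated in the destination warehouse; syntax varies by destination, and the column names below are hypothetical:

```sql
-- Recreate the summary view on top of the replicated base tables
CREATE VIEW customer_orders_summary AS
SELECT c.id,
       c.name,
       COUNT(o.id)  AS order_count,
       SUM(o.total) AS total_spent
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name;
```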
  • Use dedicated replication user (not root)
  • Enable GTID mode for better failover
  • Limit databases/tables to reduce binlog size
  • Purge expired binary logs automatically (binlog_expire_logs_seconds); set binlog retention to 3–5 days
  • For deletes, use FULL binlog_row_image
  • Test snapshots in staging; monitor lag via tools
  • For cloud: Use provider monitoring; enable heartbeats for low traffic
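On AWS RDS and Aurora MySQL, binlog retention is set with a stored procedure rather than a server variable; a sketch applying the 3–5 day recommendation:

```sql
-- Retain binlogs for 4 days (96 hours) on RDS/Aurora MySQL
CALL mysql.rds_set_configuration('binlog retention hours', 96);

-- Verify the current setting
CALL mysql.rds_show_configuration;
```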