What is a MySQL source in Streamkap?
What MySQL deployments are supported?
- Self-hosted/Generic (on-prem/VM)
- AWS RDS MySQL (including Aurora)
- Google Cloud SQL for MySQL
- Azure MySQL Database
- Any other MySQL compatible platform
What are the key features of MySQL sources in Streamkap?
- CDC: Binlog-based for inserts/updates/deletes; captures schema changes
- Snapshots: Ad-hoc/initial using global/table locks; incremental or blocking
- Schema Evolution: Automatic DDL tracking; emits change events
- Data Types: Standard mappings (numerics, temporals, strings, binary configurable, JSON, ENUM/SET); temporal/decimal/binary handling modes
- Ingestion Modes: Inserts (append) or upserts
- Security: SSL, authentication
- Monitoring: Latency, lag, binlog metrics
- Heartbeats: For low-traffic databases to advance binlog position via a heartbeat table
- Streamkap adds transaction metadata, row filtering, and original SQL queries if enabled
How does CDC work for MySQL sources?
CDC reads the MySQL binary log (binlog), which requires binlog_format=ROW and binlog_row_image=FULL. GTID mode is used for position tracking in multi-primary setups and is recommended for reliability.
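You can verify these settings on the source database before enabling CDC; a minimal check (assuming standard MySQL privileges):

```sql
-- Verify the binlog configuration CDC depends on
SHOW VARIABLES LIKE 'binlog_format';     -- should return ROW
SHOW VARIABLES LIKE 'binlog_row_image';  -- should return FULL
SHOW VARIABLES LIKE 'gtid_mode';         -- ON is recommended
```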
What is binlog_row_image and why does it matter?
binlog_row_image controls how much data is logged for each change:
- MINIMAL: Logs the primary key only (delete events carry only the PK)
- FULL: Logs all columns (complete before/after records; recommended)
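If binlog_row_image is not already FULL, it can be changed at runtime on MySQL 8.0+; a sketch (SET PERSIST requires the SYSTEM_VARIABLES_ADMIN or SUPER privilege):

```sql
-- Persist across restarts (MySQL 8.0+); use SET GLOBAL for runtime-only changes
SET PERSIST binlog_row_image = 'FULL';
```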
How do snapshots work for MySQL sources?
- Methods: Global read lock or table-level locks; incremental (chunked by primary key, default 1024 rows) or blocking
- Modes: initial (default), always, initial_only, no_data, when_needed, configuration_based, custom
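Ad-hoc incremental snapshots are typically triggered through the signal table referenced elsewhere in this document (streamkap.streamkap_signal). A hedged sketch following the Debezium signal convention; the exact column names and payload format are assumptions:

```sql
-- Hypothetical ad-hoc incremental snapshot signal (Debezium-style columns assumed)
INSERT INTO streamkap.streamkap_signal (id, type, data)
VALUES (UUID(), 'execute-snapshot',
        '{"data-collections": ["mydb.orders"], "type": "incremental"}');
```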
What data types are supported?
- Basics: Booleans (BOOLEAN), integers (INT8/16/32/64), floats (FLOAT32/64), strings (STRING), dates/timestamps (adaptive/connect modes)
- Advanced: Binary (BYTES/base64/hex), decimals (precise/double/string modes), JSON (STRING/io.debezium.data.Json), ENUM/SET (STRING/io.debezium.data.Enum/Set)
- Character: CHAR/VARCHAR/TEXT (STRING)
- Unsupported: Non-UTF8 character sets; spatial types (GEOMETRY); MyISAM tables; events lost to binlog expiration
What are heartbeats and how do they work?
Layer 1: Connector heartbeats (enabled by default)
The Connector periodically emits heartbeat messages to an internal topic, even when no actual data changes are detected. This keeps offsets fresh and prevents staleness. No configuration is necessary for this layer; it is enabled automatically, and we recommend keeping it enabled for all deployments.

Layer 2: Source database heartbeats (recommended)
- Read-write connections (when Read only is No during Streamkap Setup): The Connector updates the heartbeat table directly.
- Read-only connections (when Read only is Yes during Streamkap Setup): A scheduled job on the primary database updates the heartbeat table, and these changes replicate to the read replica for the Connector to consume.
For read-only connections, this requires a scheduled job, created with your database's native scheduler (pg_cron for PostgreSQL, event_scheduler for MySQL), on your source database.
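For a MySQL read-only connection, the scheduled job might look like the following sketch. The heartbeat table name comes from this document, but the column name (last_heartbeat) and the schedule are assumptions:

```sql
-- Run on the primary; changes replicate to the read replica for the Connector
SET GLOBAL event_scheduler = ON;
CREATE EVENT IF NOT EXISTS streamkap_heartbeat_event
  ON SCHEDULE EVERY 1 MINUTE
  DO UPDATE streamkap.streamkap_heartbeat SET last_heartbeat = NOW();
```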
How to monitor binlog for MySQL sources?
Use SHOW BINARY LOGS to check binlog size and position; use tools like Datadog or New Relic to monitor lag.

Best practices: Retain binlogs for 3–5 days; alert on expiration and unexpected growth.
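A minimal monitoring and retention sketch (binlog_expire_logs_seconds is the MySQL 8.0+ retention variable; 345600 seconds is roughly 4 days, within the recommended 3–5 day window):

```sql
-- List binlog files and their sizes
SHOW BINARY LOGS;

-- Keep roughly 4 days of binlogs
SET PERSIST binlog_expire_logs_seconds = 345600;
```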
What are common limitations?
- MyISAM unsupported
- Binlog expiration can lose events
- PK changes require manual handling
- Before images are incomplete without binlog_row_image=FULL (analogous to PostgreSQL's TOAST limitation)
- High binlog growth without monitoring
- Schema changes during snapshots unsupported
- Snapshots from read replicas are restricted in some cloud setups
How to handle deletes?
- Captures deletes as events with before images if binlog_row_image=FULL
- Supports soft deletes
What security features are available?
What permissions does the Streamkap user need?
Core replication and snapshot privileges:

| Permission | Scope | Purpose |
|---|---|---|
| REPLICATION CLIENT | *.* | Read binlog metadata and positions |
| REPLICATION SLAVE | *.* | Read binlog events for CDC streaming |
| RELOAD | *.* | Flush operations required for consistent snapshots |
| SHOW DATABASES | *.* | Discover available databases and schemas |
| SELECT | {schema}.* | Read table data during initial and incremental snapshots |
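The global privileges above map to GRANT statements like the following sketch; the user name, host, password, and schema are placeholders:

```sql
CREATE USER 'streamkap'@'%' IDENTIFIED BY 'CHANGE_ME';
GRANT REPLICATION CLIENT, REPLICATION SLAVE, RELOAD, SHOW DATABASES
  ON *.* TO 'streamkap'@'%';
GRANT SELECT ON mydb.* TO 'streamkap'@'%';  -- replace mydb with your captured schema
```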
Snapshot signal table privileges:

| Permission | Scope | Purpose |
|---|---|---|
| SELECT | streamkap.streamkap_signal | Read signal table state |
| INSERT | streamkap.streamkap_signal | Trigger snapshot signals |
| UPDATE | streamkap.streamkap_signal | Update signal table state |
| DELETE | streamkap.streamkap_signal | Clean up processed signals |
Heartbeat table privileges:

| Permission | Scope | Purpose |
|---|---|---|
| SELECT | streamkap.streamkap_heartbeat | Read heartbeat state |
| INSERT | streamkap.streamkap_heartbeat | Write heartbeat records |
| UPDATE | streamkap.streamkap_heartbeat | Update heartbeat timestamps |
| DELETE | streamkap.streamkap_heartbeat | Clean up old heartbeat records |
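The signal and heartbeat table permissions translate to grants like the following (user and host are placeholders, as above):

```sql
GRANT SELECT, INSERT, UPDATE, DELETE
  ON streamkap.streamkap_signal TO 'streamkap'@'%';
GRANT SELECT, INSERT, UPDATE, DELETE
  ON streamkap.streamkap_heartbeat TO 'streamkap'@'%';
```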
Troubleshooting common issues
- Binlog Buildup: Enable heartbeats; monitor retention; drop unused logs
- Data Loss on Upgrade: Stop writes, capture all events, recreate positions post-upgrade, resnapshot
- Missing Events: Ensure binlog enabled and tables included; check REPLICATION privileges
Can CDC capture database Views, Materialized Views, and other virtual objects?
CDC captures changes by reading the database transaction log (binlog, WAL, oplog, redo log, etc.). Views are query-time computations over base tables: when you query a view, the database engine executes the underlying SELECT against the base tables. Since views don't store data, they don't generate transaction log entries, so there is nothing for CDC to read.

What cannot be captured:
- Views: Virtual tables with no physical storage or binlog entries
- Temporary Tables: Session-scoped, not logged persistently in binlog
- MEMORY Tables: In-memory storage engine, data lost on restart, not in binlog
- BLACKHOLE Tables: Discard all writes, no data to capture
- System Tables (information_schema, performance_schema, mysql): Metadata representations, not user data
- CTEs (Common Table Expressions): Query-time constructs with no persistent storage
Configure CDC on the underlying base tables that power your views. The view logic can be recreated in your destination or transformation layer.

MySQL-specific notes:
- MyISAM tables: Not supported for CDC as they don’t participate in transactions or generate complete binlog entries. Convert to InnoDB.
- Views created with ALGORITHM=TEMPTABLE: Still not capturable; capture the base tables instead
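Converting a MyISAM table to InnoDB, as recommended above, is a single statement; the table name is a placeholder, and since the conversion rewrites the table, test on a copy first:

```sql
ALTER TABLE mydb.legacy_table ENGINE = InnoDB;
```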
If you have a view customer_orders_summary that queries tables customers and orders, capture the customers and orders tables instead, then recreate the view logic downstream.
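Recreating that example view downstream might look like the following sketch; the join and column names are assumptions, since the original view definition isn't shown:

```sql
-- Hypothetical downstream recreation of customer_orders_summary
CREATE VIEW customer_orders_summary AS
SELECT c.id, c.name, COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name;
```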
How can I optimize schema history for large databases?
Cascade deletes not captured by CDC
MySQL does not write ON DELETE CASCADE events to the binlog. InnoDB handles foreign key cascades internally, so child table DELETE events from cascading foreign keys are NOT captured by CDC; only the parent table delete is propagated. This is a MySQL limitation, not a Streamkap limitation. If you rely on cascading deletes, consider using application-level deletes instead of database-level cascades.

See Delete Handling for more details.
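An application-level alternative to ON DELETE CASCADE ensures both deletes reach the binlog; the table and column names here are hypothetical:

```sql
-- Delete children first, then the parent; both statements are logged and captured
DELETE FROM orders WHERE customer_id = 42;
DELETE FROM customers WHERE id = 42;
```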
'Schema isn't known to this connector' error
- Contact Streamkap support for schema history recovery
- Consider enabling Capture Only Captured Tables DDL in the source’s Advanced settings to prevent recurrence
- See Schema History Optimization for more information
Best practices for MySQL sources
- Use dedicated replication user (not root)
- Enable GTID mode for better failover
- Limit databases/tables to reduce binlog size
- Set binlog retention (binlog_expire_logs_seconds) to 3–5 days and purge old logs regularly
- For deletes, use binlog_row_image=FULL
- Test snapshots in staging; monitor lag via tools
- For cloud: Use provider monitoring; enable heartbeats for low traffic
Planning a database version upgrade?