Streamkap Setup
To set up the connector, you will need to gather connection details and configure your Databricks cluster. Log in to your Databricks account and then follow the steps below.

Get connection details
Streamkap connects to Databricks via a JDBC URL. You can use either an All-Purpose Compute cluster or a SQL Warehouse as the compute resource.

Option A: All-Purpose Compute
- Open the Compute page from the sidebar and choose your cluster
- Click on Advanced Options
- Open the JDBC/ODBC tab
- Copy the JDBC Connection URL
Option B: SQL Warehouse
A SQL Warehouse can automatically scale across multiple Spark clusters to handle concurrent workloads, but is generally more expensive than an All-Purpose Cluster. To get the JDBC URL for a SQL Warehouse:
- Open the SQL Warehouses page from the sidebar
- Select your warehouse
- Open the Connection Details tab
- Copy the JDBC URL
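For reference, a copied SQL Warehouse JDBC URL looks roughly like the following; the hostname and warehouse ID shown here are placeholders, not real values:

```
jdbc:databricks://dbc-xxxxxxxx-xxxx.cloud.databricks.com:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/xxxxxxxxxxxxxxxx
```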
For both options, you can append ConnCatalog=<your catalog name> to the JDBC URL to select a catalog other than the default.

Generate an access token
To generate the access token Streamkap will use to authenticate with Databricks:
- Open the Settings page from the sidebar, then select User Settings
- Open the Personal Access Tokens tab
- Click + Generate New Token
- (Optional) Enter a comment and change the token lifetime
- Click Generate
- Copy the access token
Create a temporary directory
- Create a tmp directory on the Databricks File System (DBFS)
How it works
As data is streamed from the source into topics (think of them as partitioned tables), the Databricks Sink connector will:
- Check whether tables for the topics exist in Databricks and, if not, create them
- Automatically handle schema evolution when the source schema changes (e.g. new columns, data type changes)
- Stream change data into Parquet files and upload them to the tmp directory on the Databricks File System (DBFS)
- Load the data into the target table using the SQL bulk import command COPY INTO
- Clean up the Parquet files
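As a rough sketch of the final load step, the bulk import can be expressed with Databricks SQL's COPY INTO; the catalog, table, and path names here are illustrative, not the connector's actual internals:

```sql
-- Bulk-load the staged Parquet files into the target Delta table
COPY INTO my_catalog.my_schema.orders
FROM 'dbfs:/tmp/streamkap/orders/'
FILEFORMAT = PARQUET
COPY_OPTIONS ('mergeSchema' = 'true');
```

COPY INTO is idempotent by default: files that were already loaded are skipped, which makes the load-then-clean-up cycle safe to retry.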
Ingestion Modes
Streamkap supports two ingestion modes for writing data to Databricks Delta Lake: Upsert and Append.

Upsert
Upsert mode uses a MERGE INTO statement to insert new records and update existing ones based on the primary key columns from the source table.
- New records (no matching primary key in the target) are inserted
- Existing records (matching primary key) are updated with the latest values
- Deleted records (when hard delete is enabled) are physically removed from the target table
- Out-of-order protection: Streamkap tracks record timestamps and offsets to ensure older records never overwrite newer data
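The upsert behaviour above can be sketched as a Databricks SQL MERGE INTO statement; the table names, the id key, and the __deleted soft-delete flag are assumptions for illustration, not the connector's actual column names:

```sql
MERGE INTO orders AS t
USING staged_changes AS s
  ON t.id = s.id
-- Hard delete: physically remove rows the source deleted
WHEN MATCHED AND s.__deleted = 'true' THEN DELETE
-- Existing primary keys are updated with the latest values
WHEN MATCHED THEN UPDATE SET *
-- New primary keys are inserted
WHEN NOT MATCHED THEN INSERT *;
```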
Append
Append mode uses a simple INSERT INTO statement to add all incoming records as new rows.
- Every record is inserted regardless of whether a row with the same key already exists
- No deduplication or update logic is applied
- Deletes from the source are not reflected in the target
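In append mode the equivalent statement is a plain insert; table names here are again illustrative:

```sql
-- Every change event becomes a new row; no keys are checked
INSERT INTO orders
SELECT * FROM staged_changes;
```

Append mode effectively gives you a change history of the source table, at the cost of duplicates that downstream queries must deduplicate themselves.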