Work in Progress: This page is under development.

Delta Lake

The Delta Lake connector writes streaming data to Delta Lake tables. It supports S3, GCS, Azure, and the local filesystem as storage backends.

Quick Example

apiVersion: laminar.io/v1
kind: Table
spec:
  name: events_delta
  connector: delta
  config:
    type: sink
    path: s3://my-bucket/delta-tables/events/
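    storage_options:
      s3.region: us-east-1
      # Placeholder credentials; substitute your own and keep real keys out of version control.
      s3.access-key-id: AKIAIOSFODNN7EXAMPLE
      s3.secret-access-key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
    rolling_policy:
      file_size_bytes: 134217728  # 128 MiB
      interval_seconds: 300       # 5 minutes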
    partitioning:
      fields:
        - name: event_date
          transform: identity
      shuffle_by_partition:
        enabled: true
  schema:
    format:
      parquet: {}
    fields:
      - field_name: event_id
        field_type:
          type:
            primitive: Utf8
        nullable: false
      - field_name: event_date
        field_type:
          type:
            primitive: Date32
        nullable: false

Configuration

Required

| Property | Type | Description |
|----------|------|-------------|
| type | string | Must be sink |
| path | string | URI of the Delta Lake table (s3://, gs://, az://, or a local path) |
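As a minimal sketch, a sink that sets only the two required properties might look like the following. It reuses the layout of the Quick Example; the table name `events_local`, the local path, and the assumption that a bare filesystem path is accepted for local tables are illustrative rather than confirmed by this page.

```yaml
apiVersion: laminar.io/v1
kind: Table
spec:
  name: events_local            # hypothetical table name
  connector: delta
  config:
    type: sink                  # required: must be sink
    path: /data/delta/events/   # required: hypothetical local path
  schema:
    format:
      parquet: {}
    fields:
      - field_name: event_id
        field_type:
          type:
            primitive: Utf8
        nullable: false
```

With the optional sections omitted, file rolling and partitioning presumably fall back to the connector's defaults, which this page does not list.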

Optional

| Property | Type | Description |
|----------|------|-------------|
| storage_options | object | Cloud storage credentials (see Storage Options tab) |
| rolling_policy | object | When to create new Parquet files |
| partitioning | object | Data partitioning configuration |

JSON Schema Reference

Connection Table Schema
{
  "type": "object",
  "properties": {
    "type": {"const": "sink"},
    "path": {"type": "string"},
    "storage_options": {
      "type": "object",
      "additionalProperties": {"type": "string"}
    },
    "rolling_policy": {
      "type": "object",
      "properties": {
        "file_size_bytes": {"type": "integer"},
        "interval_seconds": {"type": "integer"},
        "inactivity_seconds": {"type": "integer"}
      }
    },
    "partitioning": {
      "type": "object",
      "properties": {
        "time_pattern": {"type": "string"},
        "fields": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": {"type": "string"},
              "transform": {"enum": ["identity", "hour", "year", "month"]}
            },
            "required": ["name"]
          }
        },
        "shuffle_by_partition": {
          "type": "object",
          "properties": {
            "enabled": {"type": "boolean"}
          }
        }
      }
    }
  },
  "required": ["type", "path"]
}
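The schema above also allows an `inactivity_seconds` rolling trigger and the `hour`, `year`, and `month` partition transforms, which the Quick Example does not use. The `config` fragment below is a sketch of how they might be combined; the column name `event_time`, the bucket path, and the reading of `inactivity_seconds` as an idle timeout are assumptions inferred from the property names, not documented behavior.

```yaml
config:
  type: sink
  path: s3://my-bucket/delta-tables/events_hourly/   # hypothetical path
  rolling_policy:
    file_size_bytes: 134217728    # 128 MiB
    inactivity_seconds: 60        # assumed: roll a file after 60 s without new data
  partitioning:
    fields:
      - name: event_time          # hypothetical timestamp column
        transform: hour           # one of identity, hour, year, month
    shuffle_by_partition:
      enabled: true
```

Only `name` is required on each partition field, so `transform` can be left out; the schema does not state which transform applies in that case.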