How to Run SeaTunnel in Separated Cluster Mode on K8s

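This article describes how to deploy Apache SeaTunnel on Kubernetes in Separated Cluster Mode. The steps are: prepare the environment, build the Docker image, configure the Headless Service and Hazelcast cluster, configure the SeaTunnel engine, create the Kubernetes deployment files, and submit jobs from a client. 2025-8-7 08:38:42 Author: hackernoon.com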

Apache SeaTunnel is a next-generation, high-performance, distributed data integration and synchronization tool that has been widely recognized and applied in the industry. SeaTunnel supports three deployment modes: Local Mode, Hybrid Cluster Mode, and Separated Cluster Mode.

This article aims to introduce the deployment of SeaTunnel in Separated Cluster Mode on Kubernetes, providing a comprehensive deployment process and configuration examples for those with relevant needs.

1. Preparation

Before starting deployment, the following environments and components must be ready:

Kubernetes cluster environment
kubectl command-line tool
Docker
Helm (optional)

If you are familiar with Helm, you can refer directly to the official Helm deployment tutorial. This article focuses on deployment with plain Kubernetes manifests and the kubectl tool.

2. Build SeaTunnel Docker Image

The official images of various versions are already provided and can be pulled directly. For details, please refer to the official documentation: Set Up With Docker.

docker pull apache/seatunnel:<version_tag>

Because we are deploying in cluster mode, the next step is to configure cluster network communication. SeaTunnel's cluster networking is implemented with Hazelcast, so that is what we configure next.

3. Configure Headless Service and Hazelcast Cluster

Headless Service Configuration

The Hazelcast cluster is a network formed by cluster members running Hazelcast, which automatically join together to form a cluster. This automatic joining is achieved through various discovery mechanisms used by cluster members to find each other.

Hazelcast supports the following discovery mechanisms:

Auto Discovery, supporting environments such as:
  AWS
  Azure
  GCP
  Kubernetes
TCP
Multicast
Eureka
Zookeeper

In this article's cluster deployment, we configure Hazelcast to use the Kubernetes auto discovery mechanism. The underlying principles are described in the official document: Kubernetes Auto Discovery.

Hazelcast's Kubernetes auto discovery mechanism (DNS Lookup mode) requires a Kubernetes Headless Service. A Headless Service resolves the service domain name to the list of IP addresses of all matching Pods, which lets Hazelcast cluster members discover each other.

First, we create a Kubernetes Headless Service:

# used for Hazelcast cluster join (member discovery)
apiVersion: v1
kind: Service
metadata:
  name: seatunnel-cluster
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app.kubernetes.io/instance: seatunnel-cluster-app
    app.kubernetes.io/version: 2.3.10
  ports:
  - port: 5801
    name: hazelcast

Key parts of the above configuration:

metadata.name: seatunnel-cluster: the Service name; Hazelcast clients and nodes discover cluster members through this name
spec.clusterIP: None: the critical setting; it declares this a Headless Service with no virtual IP
spec.selector: the Pod labels this Service selects
spec.ports: the port exposed for Hazelcast (5801)
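
To make the DNS Lookup behaviour concrete, here is a minimal Python sketch of what "resolving a Headless Service to Pod IPs" looks like from a Pod inside the cluster. It is an illustration only, not Hazelcast's actual discovery code. The helper names unique_ips and resolve_members are hypothetical, and the bigdata namespace in the usage comment is taken from the Hazelcast configuration shown later in this article.

```python
import socket


def unique_ips(addrinfo):
    """Collect distinct IP addresses from socket.getaddrinfo() results, keeping order."""
    seen = []
    for _family, _type, _proto, _canonname, sockaddr in addrinfo:
        ip = sockaddr[0]  # first element of the address tuple is the IP string
        if ip not in seen:
            seen.append(ip)
    return seen


def resolve_members(service_dns, port=5801):
    """Resolve a Headless Service name to the Pod IPs behind it.

    Only meaningful from inside the Kubernetes cluster, where cluster DNS
    answers for *.svc.cluster.local names. Raises socket.gaierror if the
    name cannot be resolved.
    """
    infos = socket.getaddrinfo(service_dns, port, proto=socket.IPPROTO_TCP)
    return unique_ips(infos)


# Example call (run from a Pod in the cluster; namespace "bigdata" is assumed):
#   resolve_members("seatunnel-cluster.bigdata.svc.cluster.local")
```

With a regular ClusterIP Service the same lookup would return a single virtual IP; with clusterIP: None it returns one address per matching Pod.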

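Meanwhile, so that other systems can reach the cluster through the REST API, we define a second Service that selects only the master Pods. Like the first, it is headless (clusterIP: None), so its DNS name resolves only inside the Kubernetes cluster:

# used to access SeaTunnel from other systems via the REST API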
apiVersion: v1
kind: Service
metadata:
  name: seatunnel-cluster-master
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app.kubernetes.io/instance: seatunnel-cluster-app
    app.kubernetes.io/version: 2.3.10
    app.kubernetes.io/name: seatunnel-cluster-master
    app.kubernetes.io/component: master
  ports:
  - port: 8080
    name: "master-port"
    targetPort: 8080
    protocol: TCP

With the Kubernetes Services defined, the next step is to configure hazelcast-master.yaml and hazelcast-worker.yaml to use Hazelcast's Kubernetes discovery mechanism.

Hazelcast Master and Worker YAML Configurations

In SeaTunnel’s separated cluster mode, all network-related configuration is contained in hazelcast-master.yaml and hazelcast-worker.yaml.

hazelcast-master.yaml example:

hazelcast:
  cluster-name: seatunnel-cluster
  network:
    rest-api:
      enabled: true
      endpoint-groups:
        CLUSTER_WRITE:
          enabled: true
        DATA:
          enabled: true
    join:
      kubernetes:
        enabled: true
        service-dns: seatunnel-cluster.bigdata.svc.cluster.local
        service-port: 5801
    port:
      auto-increment: false
      port: 5801
  properties:
    hazelcast.invocation.max.retry.count: 20
    hazelcast.tcp.join.port.try.count: 30
    hazelcast.logging.type: log4j2
    hazelcast.operation.generic.thread.count: 50
    hazelcast.heartbeat.failuredetector.type: phi-accrual
    hazelcast.heartbeat.interval.seconds: 30
    hazelcast.max.no.heartbeat.seconds: 300
    hazelcast.heartbeat.phiaccrual.failuredetector.threshold: 15
    hazelcast.heartbeat.phiaccrual.failuredetector.sample.size: 200
    hazelcast.heartbeat.phiaccrual.failuredetector.min.std.dev.millis: 200

Key configuration items:

cluster-name
Identifies which cluster a node belongs to: only nodes with the same cluster-name join the same Hazelcast cluster, and nodes with different cluster-name values reject each other's requests.

Network configuration
rest-api.enabled: the Hazelcast REST service is disabled by default in SeaTunnel 2.3.10, so it must be explicitly enabled here.
service-dns (required): fully qualified domain name of the Headless Service, generally ${SERVICE-NAME}.${NAMESPACE}.svc.cluster.local. The value used here implies that the seatunnel-cluster Service is created in the bigdata namespace.
service-port (optional): Hazelcast port; if specified and greater than 0, it overrides the default port (5701).

With this Kubernetes join mechanism, when a Hazelcast Pod starts it resolves service-dns to obtain the IP list of all member Pods (via the Headless Service), and the members then attempt TCP connections to each other on port 5801.
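
Because service-dns must match both the Service name and the namespace it is deployed in, a mismatch is an easy mistake to make. The following Python sketch builds and splits the name according to the ${SERVICE-NAME}.${NAMESPACE}.svc.cluster.local pattern. The function names are hypothetical, and cluster.local is assumed as the cluster domain, which may differ in your environment.

```python
def service_dns(service, namespace, cluster_domain="cluster.local"):
    """Build the fully qualified DNS name of a Kubernetes Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def split_service_dns(dns):
    """Split a Service DNS name into (service, namespace, cluster_domain).

    Raises ValueError if the name does not follow <svc>.<ns>.svc.<domain>.
    """
    parts = dns.split(".")
    if len(parts) < 4 or parts[2] != "svc":
        raise ValueError(f"not a Service DNS name: {dns!r}")
    return parts[0], parts[1], ".".join(parts[3:])


# Check that the value in hazelcast-master.yaml matches the Service manifest.
service, namespace, _domain = split_service_dns(
    "seatunnel-cluster.bigdata.svc.cluster.local"
)
assert service == "seatunnel-cluster"  # metadata.name of the Headless Service
assert namespace == "bigdata"          # namespace the Service must live in
```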

Similarly, the hazelcast-worker.yaml configuration is:

hazelcast:
  cluster-name: seatunnel-cluster
  network:
    rest-api:
      enabled: true
      endpoint-groups:
        CLUSTER_WRITE:
          enabled: true
        DATA:
          enabled: true
    join:
      kubernetes:
        enabled: true
        service-dns: seatunnel-cluster.bigdata.svc.cluster.local
        service-port: 5801
    port:
      auto-increment: false
      port: 5801
  properties:
    hazelcast.invocation.max.retry.count: 20
    hazelcast.tcp.join.port.try.count: 30
    hazelcast.logging.type: log4j2
    hazelcast.operation.generic.thread.count: 50
    hazelcast.heartbeat.failuredetector.type: phi-accrual
    hazelcast.heartbeat.interval.seconds: 30
    hazelcast.max.no.heartbeat.seconds: 300
    hazelcast.heartbeat.phiaccrual.failuredetector.threshold: 15
    hazelcast.heartbeat.phiaccrual.failuredetector.sample.size: 200
    hazelcast.heartbeat.phiaccrual.failuredetector.min.std.dev.millis: 200
  member-attributes:
    rule:
      type: string
      value: worker

This completes the Kubernetes-based member discovery configuration for the Hazelcast cluster. Next, we configure the SeaTunnel engine.

4. Configure SeaTunnel Engine

The configuration related to the SeaTunnel engine is all in the seatunnel.yaml file. Below is a sample seatunnel.yaml configuration for reference:

seatunnel:
  engine:
    history-job-expire-minutes: 1440
    backup-count: 1
    queue-type: blockingqueue
    print-execution-info-interval: 60
    print-job-metrics-info-interval: 60
    classloader-cache-mode: true
    http:
      enable-http: true
      port: 8080
      enable-dynamic-port: false
      port-range: 100
    slot-service:
      dynamic-slot: true
    checkpoint:
      interval: 300000
      timeout: 60000
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          namespace: /tmp/seatunnel/checkpoint_snapshot
          storage.type: hdfs
          fs.defaultFS: hdfs://xxx:8020 # Ensure directory has write permission
    telemetry:
      metric:
        enabled: true

This includes the following configuration information:

history-job-expire-minutes: the retention period of task history records is 24 hours (1440 minutes), after which they will be automatically cleaned up.
backup-count: 1: number of backup replicas for task state is 1.
queue-type: blockingqueue: use a blocking queue to manage tasks to avoid resource exhaustion.
print-execution-info-interval: 60: print task execution status every 60 seconds.
print-job-metrics-info-interval: 60: output task metrics (such as throughput, latency) every 60 seconds.
classloader-cache-mode: true: enable class loader caching to reduce repeated loading overhead and improve performance.
dynamic-slot: true: allow dynamic adjustment of task slot quantity based on load to optimize resource utilization.
checkpoint.interval: 300000: trigger checkpoint every 5 minutes.
checkpoint.timeout: 60000: checkpoint timeout set to 1 minute.
telemetry.metric.enabled: true: enable collection of runtime task metrics (e.g., latency, throughput) for monitoring.
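
Several of these values are given in milliseconds or minutes, which makes them easy to misread. The small Python sketch below converts the values from the sample seatunnel.yaml into more familiar units. The helper names are hypothetical and the arithmetic is plain unit conversion, not SeaTunnel code.

```python
def ms_to_minutes(ms):
    """Convert milliseconds to minutes."""
    return ms / 60_000


def triggers_per_hour(interval_ms):
    """Number of whole checkpoint intervals that fit into one hour."""
    return 3_600_000 // interval_ms


def minutes_to_hours(minutes):
    """Convert minutes to hours."""
    return minutes / 60


# Values from the sample seatunnel.yaml above.
print(ms_to_minutes(300000))      # checkpoint.interval in minutes
print(ms_to_minutes(60000))       # checkpoint.timeout in minutes
print(triggers_per_hour(300000))  # checkpoint triggers per hour
print(minutes_to_hours(1440))     # history-job-expire-minutes in hours
```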

5. Create Kubernetes YAML Files to Deploy the Application

After completing the above workflow, the final step is to create Kubernetes YAML files for Master and Worker nodes, defining deployment-related configurations.

To decouple configuration files from the application, the configuration files described above are merged into a single ConfigMap (named seatunnel-cluster-configs in the manifests below) and mounted under the container's configuration path, which keeps them managed in one place and easier to update.

Below are sample configurations for seatunnel-cluster-master.yaml and seatunnel-cluster-worker.yaml, covering ConfigMap mounting, container startup commands, and deployment resource definitions.

seatunnel-cluster-master.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: seatunnel-cluster-master
spec:
  replicas: 2  # modify replicas according to your scenario
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 50%
  selector:
    matchLabels:
      app.kubernetes.io/instance: seatunnel-cluster-app
      app.kubernetes.io/version: 2.3.10
      app.kubernetes.io/name: seatunnel-cluster-master
      app.kubernetes.io/component: master
  template:
    metadata:
      annotations:
        prometheus.io/path: /hazelcast/rest/instance/metrics
        prometheus.io/port: "5801"
        prometheus.io/scrape: "true"
        prometheus.io/role: "seatunnel-master"
      labels:
        app.kubernetes.io/instance: seatunnel-cluster-app
        app.kubernetes.io/version: 2.3.10
        app.kubernetes.io/name: seatunnel-cluster-master
        app.kubernetes.io/component: master
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: nodeAffinity-key
                operator: Exists
      containers:
        - name: seatunnel-master
          image: seatunnel:2.3.10
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 5801
              name: hazelcast
            - containerPort: 8080
              name: "master-port"
          command:
            - /opt/seatunnel/bin/seatunnel-cluster.sh
            - -r
            - master
          resources:
            requests:
              cpu: "1"
              memory: 4G
          volumeMounts:
            - mountPath: "/opt/seatunnel/config/hazelcast-master.yaml"
              name: seatunnel-configs
              subPath: hazelcast-master.yaml
            - mountPath: "/opt/seatunnel/config/hazelcast-worker.yaml"
              name: seatunnel-configs
              subPath: hazelcast-worker.yaml
            - mountPath: "/opt/seatunnel/config/seatunnel.yaml"
              name: seatunnel-configs
              subPath: seatunnel.yaml
            - mountPath: "/opt/seatunnel/config/hazelcast-client.yaml"
              name: seatunnel-configs
              subPath: hazelcast-client.yaml
            - mountPath: "/opt/seatunnel/config/log4j2_client.properties"
              name: seatunnel-configs
              subPath: log4j2_client.properties
            - mountPath: "/opt/seatunnel/config/log4j2.properties"
              name: seatunnel-configs
              subPath: log4j2.properties

      volumes:
        - name: seatunnel-configs
          configMap:
            name: seatunnel-cluster-configs

Deployment Strategy

Use multiple replicas (replicas: 2) for high availability of the master.
Use the RollingUpdate strategy so that Pods are replaced gradually rather than all at once:
  maxUnavailable: 25%: at most 25% of the desired Pods may be unavailable during an update, so at least 75% keep running.
  maxSurge: 50%: up to 50% more Pods than desired may be created temporarily to smooth the transition.
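
Kubernetes converts these percentages into absolute Pod counts, rounding maxUnavailable down and maxSurge up. With small replica counts the result can be surprising, so the Python sketch below works through the numbers for the master (2 replicas) and worker (3 replicas) Deployments in this article. The function name rolling_update_bounds is hypothetical.

```python
def rolling_update_bounds(replicas, max_unavailable_pct, max_surge_pct):
    """Compute Pod-count limits during a rolling update.

    maxUnavailable percentages are rounded down and maxSurge percentages
    are rounded up, as Kubernetes does. Integer arithmetic avoids
    floating-point rounding surprises.
    """
    unavailable = replicas * max_unavailable_pct // 100   # floor
    surge = -(-replicas * max_surge_pct // 100)           # ceiling
    return {
        "min_available": replicas - unavailable,
        "max_total": replicas + surge,
    }


print(rolling_update_bounds(2, 25, 50))  # master Deployment
print(rolling_update_bounds(3, 25, 50))  # worker Deployment
```

For both Deployments, 25% rounds down to zero, so no existing Pod is taken down before a replacement is ready; the update proceeds purely by surging.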

Label Selectors

Use the standard labels recommended by Kubernetes (app.kubernetes.io/...)
spec.selector.matchLabels: defines, by label, which Pods the Deployment manages
spec.template.metadata.labels: labels assigned to new Pods; they must match the selector above

Node Affinity

Configure affinity to specify which nodes the Pod should be scheduled on
Replace nodeAffinity-key with labels matching your Kubernetes environment nodes

Config File Mounting

Centralize core configuration files in a ConfigMap to decouple management from applications
Use subPath to mount individual files from ConfigMap
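
The Deployment refers to a ConfigMap named seatunnel-cluster-configs, whose keys must match the subPath values. One possible way to produce it is sketched below in Python: the script reads the six configuration files from a local directory and prints a ConfigMap manifest as JSON, which kubectl accepts just like YAML. The helper names and the local directory layout are assumptions for illustration.

```python
import json
from pathlib import Path

# File names taken from the subPath entries in the Deployment above.
CONFIG_FILES = [
    "hazelcast-master.yaml",
    "hazelcast-worker.yaml",
    "seatunnel.yaml",
    "hazelcast-client.yaml",
    "log4j2_client.properties",
    "log4j2.properties",
]


def load_files(config_dir, names=CONFIG_FILES):
    """Read each configuration file into a {file name: content} mapping."""
    return {n: Path(config_dir, n).read_text(encoding="utf-8") for n in names}


def build_configmap(name, files):
    """Build a ConfigMap manifest whose keys are the file names."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": dict(files),
    }


# Example usage (assumes the files live in a local ./config directory):
#   manifest = build_configmap("seatunnel-cluster-configs", load_files("config"))
#   print(json.dumps(manifest, indent=2))   # pipe the output to: kubectl apply -f -
```

The ConfigMap has to exist in the same namespace as the Deployments before the Pods can start.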

The seatunnel-cluster-worker.yaml configuration is:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: seatunnel-cluster-worker
spec:
  replicas: 3  # modify replicas according to your scenario
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 50%
  selector:
    matchLabels:
      app.kubernetes.io/instance: seatunnel-cluster-app
      app.kubernetes.io/version: 2.3.10
      app.kubernetes.io/name: seatunnel-cluster-worker
      app.kubernetes.io/component: worker
  template:
    metadata:
      annotations:
        prometheus.io/path: /hazelcast/rest/instance/metrics
        prometheus.io/port: "5801"
        prometheus.io/scrape: "true"
        prometheus.io/role: "seatunnel-worker"
      labels:
        app.kubernetes.io/instance: seatunnel-cluster-app
        app.kubernetes.io/version: 2.3.10
        app.kubernetes.io/name: seatunnel-cluster-worker
        app.kubernetes.io/component: worker
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: nodeAffinity-key
                operator: Exists
      containers:
        - name: seatunnel-worker
          image: seatunnel:2.3.10
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 5801
              name: hazelcast
          command:
            - /opt/seatunnel/bin/seatunnel-cluster.sh
            - -r
            - worker
          resources:
            requests:
              cpu: "1"
              memory: 10G
          volumeMounts:
            - mountPath: "/opt/seatunnel/config/hazelcast-master.yaml"
              name: seatunnel-configs
              subPath: hazelcast-master.yaml
            - mountPath: "/opt/seatunnel/config/hazelcast-worker.yaml"
              name: seatunnel-configs
              subPath: hazelcast-worker.yaml
            - mountPath: "/opt/seatunnel/config/seatunnel.yaml"
              name: seatunnel-configs
              subPath: seatunnel.yaml
            - mountPath: "/opt/seatunnel/config/hazelcast-client.yaml"
              name: seatunnel-configs
              subPath: hazelcast-client.yaml
            - mountPath: "/opt/seatunnel/config/log4j2_client.properties"
              name: seatunnel-configs
              subPath: log4j2_client.properties
            - mountPath: "/opt/seatunnel/config/log4j2.properties"
              name: seatunnel-configs
              subPath: log4j2.properties

      volumes:
        - name: seatunnel-configs
          configMap:
            name: seatunnel-cluster-configs

After defining the above master and worker YAML files, you can deploy them to the Kubernetes cluster by running:

kubectl apply -f seatunnel-cluster-master.yaml
kubectl apply -f seatunnel-cluster-worker.yaml

Under normal circumstances, you will see the SeaTunnel cluster running with 2 master nodes and 3 worker nodes:

$ kubectl get pods | grep seatunnel-cluster

seatunnel-cluster-master-6989898f66-6fjz8                        1/1     Running                0          156m
seatunnel-cluster-master-6989898f66-hbtdn                        1/1     Running                0          155m
seatunnel-cluster-worker-87fb469f7-5c96x                         1/1     Running                0          156m
seatunnel-cluster-worker-87fb469f7-7kt2h                         1/1     Running                0          155m
seatunnel-cluster-worker-87fb469f7-drm9r                         1/1     Running                0          156m

At this point, we have successfully deployed the SeaTunnel cluster in Kubernetes using the separated cluster mode. Now that the cluster is ready, how do clients submit jobs to it?

6. Client Submits Jobs to the Cluster

Submit Jobs Using the Command-Line Tool

All client configurations for SeaTunnel are located in the hazelcast-client.yaml file.

First, download the binary installation package to the client machine (it contains the bin and config directories), and make sure the SeaTunnel installation path matches the one on the server. The official documentation describes this as setting SEATUNNEL_HOME the same as the server; otherwise, errors such as "cannot find connector plugin path on the server" may occur because the server-side plugin path differs from the client-side path.

Enter the installation directory and modify the config/hazelcast-client.yaml file to point to the Headless Service address created earlier:

hazelcast-client:
  cluster-name: seatunnel-cluster
  properties:
    hazelcast.logging.type: log4j2
  connection-strategy:
    connection-retry:
      cluster-connect-timeout-millis: 3000
  network:
    cluster-members:
      - seatunnel-cluster.bigdata.svc.cluster.local:5801
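
Before submitting jobs, it can help to confirm that the client can actually open a TCP connection to the address listed under cluster-members. The Python sketch below is a simple pre-flight check under that assumption. The helper names are hypothetical, and the default port of 5801 is this article's Hazelcast port, not Hazelcast's own default.

```python
import socket


def parse_member(address, default_port=5801):
    """Split "host:port" into (host, port); fall back to default_port."""
    host, sep, port = address.rpartition(":")
    if not sep:
        return address, default_port
    return host, int(port)


def can_connect(address, timeout=3.0):
    """Return True if a TCP connection to the member address succeeds."""
    host, port = parse_member(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example call (run from the client machine or Pod):
#   can_connect("seatunnel-cluster.bigdata.svc.cluster.local:5801")
```

A False result usually points to DNS resolution or network policy rather than to SeaTunnel itself, since the Headless Service name resolves only inside the cluster.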

After the client configuration is done, you can submit jobs to the cluster. There are two main ways to configure JVM options for job submission:

Configure JVM options in the config/jvm_client_options file:
JVM options set here apply to every job submitted via seatunnel.sh, in both local and cluster mode, so all submitted jobs share the same JVM configuration.
Specify JVM options on the command line when submitting a job:
When submitting via seatunnel.sh, you can pass JVM parameters directly, e.g.,
sh bin/seatunnel.sh --config $SEATUNNEL_HOME/config/v2.batch.config.template -DJvmOption="-Xms2G -Xmx2G"
This lets you set JVM options individually for each job submission.

Next, here is a sample job configuration to demonstrate submitting a job to the cluster:

env {
  parallelism = 2
  job.mode = "STREAMING"
  checkpoint.interval = 2000
}

source {
  FakeSource {
    parallelism = 2
    plugin_output = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

sink {
  Console {
  }
}

Assuming the configuration above is saved as config/v2.streaming.example.template, submit the job from the client with:

sh bin/seatunnel.sh --config config/v2.streaming.example.template -m cluster -n st.example.template -DJvmOption="-Xms2G -Xmx2G"
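
When submissions are scripted, the JVM options must reach seatunnel.sh as a single argument, which is what the quotes in the command above achieve. The Python sketch below assembles the same command as an argument list. build_submit_command is a hypothetical helper, and it uses only the flags that appear in the command above.

```python
import shlex


def build_submit_command(config_path, job_name, jvm_options=None):
    """Build the argument list for submitting a job in cluster mode."""
    cmd = [
        "sh", "bin/seatunnel.sh",
        "--config", config_path,
        "-m", "cluster",
        "-n", job_name,
    ]
    if jvm_options:
        # Kept as one list element so the shell never splits it on spaces.
        cmd.append("-DJvmOption=" + jvm_options)
    return cmd


cmd = build_submit_command(
    "config/v2.streaming.example.template",
    "st.example.template",
    "-Xms2G -Xmx2G",
)
print(shlex.join(cmd))  # shell-safe rendering of the command
# To run it from the SeaTunnel installation directory:
#   import subprocess; subprocess.run(cmd, check=True)
```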

On the Master node, list running jobs with:

$ sh bin/seatunnel.sh -l

Job ID              Job Name             Job Status  Submit Time              Finished Time            
------------------  -------------------  ----------  -----------------------  -----------------------  
964354250769432580  st.example.template  RUNNING     2025-04-15 10:39:30.588

You can see the job named st.example.template is currently in the RUNNING state. In the Worker node logs, you should observe log entries like:

2025-04-15 10:34:41,998 INFO  [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=0  rowIndex=1:  SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : bdaUB, 110348049
2025-04-15 10:34:41,998 INFO  [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=1  rowIndex=1:  SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : mOifY, 1974539087
2025-04-15 10:34:41,999 INFO  [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=0  rowIndex=2:  SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : jKFrR, 1828047742
2025-04-15 10:34:41,999 INFO  [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=1  rowIndex=2:  SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : gDiqR, 1177544796
2025-04-15 10:34:41,999 INFO  [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=0  rowIndex=3:  SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : bCVxc, 909343602
...

This confirms the job has been successfully submitted to the SeaTunnel cluster and is running normally.

Submit Jobs Using the REST API

SeaTunnel also provides a REST API for querying job status and statistics and for submitting and stopping jobs. Earlier we created the seatunnel-cluster-master Headless Service, which exposes port 8080 on the master Pods, so clients that can resolve that Service name can submit jobs through the REST API.

You can submit a job by uploading the configuration file via curl:

curl 'http://seatunnel-cluster-master.bigdata.svc.cluster.local:8080/submit-job/upload' --form 'config_file=@"/opt/seatunnel/config/v2.streaming.example.template"' --form 'jobName=st.example.template'

{"jobId":"964553575034257409","jobName":"st.example.template"}

If submission succeeds, the API returns the job ID and job name as above.

To list running jobs, query:

curl 'http://seatunnel-cluster-master.bigdata.svc.cluster.local:8080/running-jobs'

[{"jobId":"964553575034257409","jobName":"st.example.template","jobStatus":"RUNNING","envOptions":{"job.mode":"STREAMING","checkpoint.interval":"2000","parallelism":"2"}, ...}]

The response shows the job status and additional metadata, confirming the REST API job submission method works correctly.
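
The same query can be made from a script. The following Python sketch calls the /running-jobs endpoint shown above with the standard library and reduces each entry to its ID, name, and status. The helper names are hypothetical, and only the three response fields visible in the sample output are assumed to exist.

```python
import json
import urllib.request

# Service address used in the curl examples above.
BASE_URL = "http://seatunnel-cluster-master.bigdata.svc.cluster.local:8080"


def summarize_jobs(payload):
    """Reduce the /running-jobs JSON text to (jobId, jobName, jobStatus) tuples."""
    jobs = json.loads(payload)
    return [(j.get("jobId"), j.get("jobName"), j.get("jobStatus")) for j in jobs]


def fetch_running_jobs(base_url=BASE_URL, timeout=10):
    """Query the running jobs over HTTP; must run where the Service name resolves."""
    with urllib.request.urlopen(f"{base_url}/running-jobs", timeout=timeout) as resp:
        return summarize_jobs(resp.read().decode("utf-8"))


# Parsing a response shaped like the sample above:
sample = '[{"jobId":"964553575034257409","jobName":"st.example.template","jobStatus":"RUNNING"}]'
print(summarize_jobs(sample))
```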

More details on the REST API can be found in the official documentation: RESTful API V2

7. Summary

This article focused on how to deploy SeaTunnel in Kubernetes using the recommended separated cluster mode. To summarize, the main deployment steps include:

Prepare the Kubernetes environment: Ensure a running Kubernetes cluster and necessary tools are installed.
Build SeaTunnel Docker images: Use the official image if no custom development is needed; otherwise, build locally and create your own image.
Configure Headless Service and Hazelcast cluster: Hazelcast's Kubernetes auto-discovery (DNS Lookup mode) requires a Kubernetes Headless Service, so create one and point Hazelcast's service-dns at it. The Headless Service resolves to the IPs of all matching Pods, which is how Hazelcast cluster members discover each other.
Configure SeaTunnel engine: Modify seatunnel.yaml to set engine parameters.
Create Kubernetes deployment YAML files: Define Master and Worker deployments with node selectors, startup commands, resources, and volume mounts, then deploy to Kubernetes.
Configure the SeaTunnel client: Install SeaTunnel on the client, ensure SEATUNNEL_HOME matches the server, and configure hazelcast-client.yaml to connect to the cluster.
Submit and run jobs: Submit jobs from the client to the SeaTunnel cluster for execution.

The configurations and cases presented here serve as references. There may be many other configuration options and details not covered. Feedback and discussions are welcome. Hope this is helpful for everyone!

Source: https://hackernoon.com/how-to-run-seatunnel-in-separated-cluster-mode-on-k8s?source=rss