Apache SeaTunnel is a next-generation, high-performance, distributed data integration and synchronization tool that is widely used in the industry. SeaTunnel supports three deployment modes: Local Mode, Hybrid Cluster Mode, and Separated Cluster Mode.
This article aims to introduce the deployment of SeaTunnel in Separated Cluster Mode on Kubernetes, providing a comprehensive deployment process and configuration examples for those with relevant needs.
Before starting deployment, the following environments and components must be ready:
If you are familiar with Helm, you can refer directly to the official Helm deployment tutorial.
This article covers deployment with Kubernetes and the kubectl tool only.
Official images are provided for each release and can be pulled directly. For details, please refer to the official documentation: Set Up With Docker.
docker pull apache/seatunnel:<version_tag>
Because we are deploying in cluster mode, the next step is to configure cluster network communication. SeaTunnel's cluster networking is implemented with Hazelcast, so that is what we configure next.
The Hazelcast cluster is a network formed by cluster members running Hazelcast, which automatically join together to form a cluster. This automatic joining is achieved through various discovery mechanisms used by cluster members to find each other.
Hazelcast supports the following discovery mechanisms:
In this article's cluster deployment, we configure Hazelcast to use its Kubernetes auto discovery mechanism. The underlying principles are described in the official document: Kubernetes Auto Discovery.
Hazelcast's Kubernetes auto discovery mechanism (DNS Lookup mode) requires a Kubernetes Headless Service. A Headless Service resolves the service domain name into the list of IP addresses of all matching Pods, which lets Hazelcast cluster members discover each other.
First, we create a Kubernetes Headless Service:
# use for hazelcast cluster join
apiVersion: v1
kind: Service
metadata:
  name: seatunnel-cluster
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app.kubernetes.io/instance: seatunnel-cluster-app
    app.kubernetes.io/version: 2.3.10
  ports:
    - port: 5801
      name: hazelcast
Key parts of the above configuration:
- metadata.name: seatunnel-cluster: the service name; Hazelcast clients and nodes discover cluster members through this name.
- spec.clusterIP: None: the critical setting that declares this a Headless Service with no virtual IP.
- spec.selector: the selector matching the labels of the Pods that this Service selects.
- spec.ports: the port exposed for Hazelcast.

Meanwhile, to access the cluster externally via the REST API, we define another Service for the master node Pods:
# use for access seatunnel from outside system via rest api
apiVersion: v1
kind: Service
metadata:
  name: seatunnel-cluster-master
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app.kubernetes.io/instance: seatunnel-cluster-app
    app.kubernetes.io/version: 2.3.10
    app.kubernetes.io/name: seatunnel-cluster-master
    app.kubernetes.io/component: master
  ports:
    - port: 8080
      name: "master-port"
      targetPort: 8080
      protocol: TCP
With these Kubernetes Services defined, the next step is to configure the hazelcast-master.yaml and hazelcast-worker.yaml files for Hazelcast's Kubernetes discovery mechanism.
In SeaTunnel’s separated cluster mode, all network-related configuration is contained in hazelcast-master.yaml and hazelcast-worker.yaml.
hazelcast-master.yaml example:
hazelcast:
  cluster-name: seatunnel-cluster
  network:
    rest-api:
      enabled: true
      endpoint-groups:
        CLUSTER_WRITE:
          enabled: true
        DATA:
          enabled: true
    join:
      kubernetes:
        enabled: true
        service-dns: seatunnel-cluster.bigdata.svc.cluster.local
        service-port: 5801
    port:
      auto-increment: false
      port: 5801
  properties:
    hazelcast.invocation.max.retry.count: 20
    hazelcast.tcp.join.port.try.count: 30
    hazelcast.logging.type: log4j2
    hazelcast.operation.generic.thread.count: 50
    hazelcast.heartbeat.failuredetector.type: phi-accrual
    hazelcast.heartbeat.interval.seconds: 30
    hazelcast.max.no.heartbeat.seconds: 300
    hazelcast.heartbeat.phiaccrual.failuredetector.threshold: 15
    hazelcast.heartbeat.phiaccrual.failuredetector.sample.size: 200
    hazelcast.heartbeat.phiaccrual.failuredetector.min.std.dev.millis: 200
Key configuration items:
- service-dns: the DNS name of the Headless Service, in the form ${SERVICE-NAME}.${NAMESPACE}.svc.cluster.local.

With this Kubernetes join mechanism, when a Hazelcast Pod starts it resolves service-dns to get the IP list of all member Pods (via the Headless Service), and the members then attempt TCP connections to one another over port 5801.
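As a quick illustration of the naming rule, the following sketch builds the service-dns value from a Service name and a namespace. The helper name service_dns is hypothetical, the cluster domain cluster.local is the Kubernetes default and is assumed here, and seatunnel-cluster and bigdata are the values used in this article's configuration.

```shell
#!/bin/sh
# Hypothetical helper: build the fully qualified DNS name of a Service,
# following ${SERVICE-NAME}.${NAMESPACE}.svc.cluster.local.
service_dns() {
  printf '%s.%s.svc.cluster.local\n' "$1" "$2"
}

# The Headless Service and namespace used throughout this article.
service_dns seatunnel-cluster bigdata

# From a Pod inside the cluster, the name can then be resolved with a tool
# such as nslookup; for a Headless Service it returns one address per Pod.
#   nslookup "$(service_dns seatunnel-cluster bigdata)"
```

If the Services are created in a different namespace, the service-dns value in both Hazelcast files has to change accordingly.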
Similarly, the hazelcast-worker.yaml configuration is:
hazelcast:
  cluster-name: seatunnel-cluster
  network:
    rest-api:
      enabled: true
      endpoint-groups:
        CLUSTER_WRITE:
          enabled: true
        DATA:
          enabled: true
    join:
      kubernetes:
        enabled: true
        service-dns: seatunnel-cluster.bigdata.svc.cluster.local
        service-port: 5801
    port:
      auto-increment: false
      port: 5801
  properties:
    hazelcast.invocation.max.retry.count: 20
    hazelcast.tcp.join.port.try.count: 30
    hazelcast.logging.type: log4j2
    hazelcast.operation.generic.thread.count: 50
    hazelcast.heartbeat.failuredetector.type: phi-accrual
    hazelcast.heartbeat.interval.seconds: 30
    hazelcast.max.no.heartbeat.seconds: 300
    hazelcast.heartbeat.phiaccrual.failuredetector.threshold: 15
    hazelcast.heartbeat.phiaccrual.failuredetector.sample.size: 200
    hazelcast.heartbeat.phiaccrual.failuredetector.min.std.dev.millis: 200
  member-attributes:
    rule:
      type: string
      value: worker
This completes the Kubernetes-based Hazelcast member discovery configuration. Next, we configure the SeaTunnel engine.
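Hazelcast members only join one another when they share the same cluster-name, and in this setup both roles also need to point at the same service-dns and service-port. A minimal sketch of a consistency check follows, assuming both files use simple "key: value" lines as in the samples above. The function names are hypothetical, and the sed-based lookup is a shortcut, not a real YAML parser.

```shell
#!/bin/sh
# Print the value of the first "key: value" line for key $1 in file $2.
yaml_value() {
  sed -n "s/^ *$1: *//p" "$2" | head -n 1
}

# Return 0 when the master file ($1) and worker file ($2) agree on the
# settings that Hazelcast members need in common to find each other.
hz_configs_match() {
  for key in cluster-name service-dns service-port; do
    m=$(yaml_value "$key" "$1")
    w=$(yaml_value "$key" "$2")
    if [ -z "$m" ] || [ "$m" != "$w" ]; then
      echo "mismatch on $key: master='$m' worker='$w'" >&2
      return 1
    fi
  done
  echo "join settings match"
}

# Example:
#   hz_configs_match hazelcast-master.yaml hazelcast-worker.yaml
```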
All SeaTunnel engine configuration lives in the seatunnel.yaml file. Below is a sample seatunnel.yaml for reference:
seatunnel:
  engine:
    history-job-expire-minutes: 1440
    backup-count: 1
    queue-type: blockingqueue
    print-execution-info-interval: 60
    print-job-metrics-info-interval: 60
    classloader-cache-mode: true
    http:
      enable-http: true
      port: 8080
      enable-dynamic-port: false
      port-range: 100
    slot-service:
      dynamic-slot: true
    checkpoint:
      interval: 300000
      timeout: 60000
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          namespace: /tmp/seatunnel/checkpoint_snapshot
          storage.type: hdfs
          fs.defaultFS: hdfs://xxx:8020 # Ensure directory has write permission
    telemetry:
      metric:
        enabled: true
This includes the following configuration information:
- history-job-expire-minutes: 1440: task history records are retained for 24 hours (1440 minutes), after which they are cleaned up automatically.
- backup-count: 1: the number of backup replicas for task state is 1.
- queue-type: blockingqueue: use a blocking queue to manage tasks to avoid resource exhaustion.
- print-execution-info-interval: 60: print task execution status every 60 seconds.
- print-job-metrics-info-interval: 60: output task metrics (such as throughput and latency) every 60 seconds.
- classloader-cache-mode: true: enable class loader caching to reduce repeated loading overhead and improve performance.
- dynamic-slot: true: allow the number of task slots to be adjusted dynamically based on load to optimize resource utilization.
- checkpoint.interval: 300000: trigger a checkpoint every 5 minutes.
- checkpoint.timeout: 60000: checkpoint timeout set to 1 minute.
- telemetry.metric.enabled: true: enable collection of runtime task metrics (e.g., latency, throughput) for monitoring.

With the configuration files complete, the final step is to create the Kubernetes YAML files for the Master and Worker nodes, which define the deployment-related configuration.
To decouple configuration files from the application, the above-mentioned configuration files are merged into one ConfigMap, mounted under the container's configuration path for unified management and easier updates.
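The Deployments in this article reference a ConfigMap named seatunnel-cluster-configs and mount six files from it. One possible way to build it is with kubectl create configmap. The sketch below only prints the command so that it can be reviewed first. The local ./config directory, the bigdata namespace (taken from the service-dns value used earlier), and the function name are assumptions.

```shell
#!/bin/sh
# Files mounted through subPath in the master and worker Deployments.
CONFIG_FILES="hazelcast-master.yaml hazelcast-worker.yaml seatunnel.yaml hazelcast-client.yaml log4j2_client.properties log4j2.properties"

# Print a kubectl command that bundles every file from directory $1 into
# the ConfigMap, using the file name as the ConfigMap key.
configmap_cmd() {
  dir=$1
  cmd="kubectl create configmap seatunnel-cluster-configs -n bigdata"
  for f in $CONFIG_FILES; do
    cmd="$cmd --from-file=$f=$dir/$f"
  done
  printf '%s\n' "$cmd"
}

configmap_cmd ./config
```

Running the printed command creates the ConfigMap. It needs to exist in the same namespace as the Deployments before they are applied, because the Pods cannot start until the referenced ConfigMap is available.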
Below are sample configurations for seatunnel-cluster-master.yaml and seatunnel-cluster-worker.yaml, covering ConfigMap mounting, container startup commands, and deployment resource definitions.
seatunnel-cluster-master.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: seatunnel-cluster-master
spec:
  replicas: 2 # modify replicas according to your scenario
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 50%
  selector:
    matchLabels:
      app.kubernetes.io/instance: seatunnel-cluster-app
      app.kubernetes.io/version: 2.3.10
      app.kubernetes.io/name: seatunnel-cluster-master
      app.kubernetes.io/component: master
  template:
    metadata:
      annotations:
        prometheus.io/path: /hazelcast/rest/instance/metrics
        prometheus.io/port: "5801"
        prometheus.io/scrape: "true"
        prometheus.io/role: "seatunnel-master"
      labels:
        app.kubernetes.io/instance: seatunnel-cluster-app
        app.kubernetes.io/version: 2.3.10
        app.kubernetes.io/name: seatunnel-cluster-master
        app.kubernetes.io/component: master
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: nodeAffinity-key
                    operator: Exists
      containers:
        - name: seatunnel-master
          image: seatunnel:2.3.10
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 5801
              name: hazelcast
            - containerPort: 8080
              name: "master-port"
          command:
            - /opt/seatunnel/bin/seatunnel-cluster.sh
            - -r
            - master
          resources:
            requests:
              cpu: "1"
              memory: 4G
          volumeMounts:
            - mountPath: "/opt/seatunnel/config/hazelcast-master.yaml"
              name: seatunnel-configs
              subPath: hazelcast-master.yaml
            - mountPath: "/opt/seatunnel/config/hazelcast-worker.yaml"
              name: seatunnel-configs
              subPath: hazelcast-worker.yaml
            - mountPath: "/opt/seatunnel/config/seatunnel.yaml"
              name: seatunnel-configs
              subPath: seatunnel.yaml
            - mountPath: "/opt/seatunnel/config/hazelcast-client.yaml"
              name: seatunnel-configs
              subPath: hazelcast-client.yaml
            - mountPath: "/opt/seatunnel/config/log4j2_client.properties"
              name: seatunnel-configs
              subPath: log4j2_client.properties
            - mountPath: "/opt/seatunnel/config/log4j2.properties"
              name: seatunnel-configs
              subPath: log4j2.properties
      volumes:
        - name: seatunnel-configs
          configMap:
            name: seatunnel-cluster-configs
Deployment Strategy
- Run multiple replicas (replicas=2) to ensure service high availability.
- maxUnavailable: 25%: ensure at least 75% of Pods are running during updates.
- maxSurge: 50%: temporarily allow 50% more Pods during the transition for a smooth upgrade.

Label Selectors
- spec.selector.matchLabels: defines the scope of Pods managed by the Deployment, based on labels.
- spec.template.labels: the labels assigned to new Pods, which must match the selector.

Node Affinity
- Use affinity to specify which nodes the Pods should be scheduled on.
- Replace nodeAffinity-key with a label that matches the nodes in your Kubernetes environment.

Config File Mounting
- Use subPath to mount individual files from the ConfigMap.

The seatunnel-cluster-worker.yaml configuration is:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: seatunnel-cluster-worker
spec:
  replicas: 3 # modify replicas according to your scenario
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 50%
  selector:
    matchLabels:
      app.kubernetes.io/instance: seatunnel-cluster-app
      app.kubernetes.io/version: 2.3.10
      app.kubernetes.io/name: seatunnel-cluster-worker
      app.kubernetes.io/component: worker
  template:
    metadata:
      annotations:
        prometheus.io/path: /hazelcast/rest/instance/metrics
        prometheus.io/port: "5801"
        prometheus.io/scrape: "true"
        prometheus.io/role: "seatunnel-worker"
      labels:
        app.kubernetes.io/instance: seatunnel-cluster-app
        app.kubernetes.io/version: 2.3.10
        app.kubernetes.io/name: seatunnel-cluster-worker
        app.kubernetes.io/component: worker
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: nodeAffinity-key
                    operator: Exists
      containers:
        - name: seatunnel-worker
          image: seatunnel:2.3.10
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 5801
              name: hazelcast
          command:
            - /opt/seatunnel/bin/seatunnel-cluster.sh
            - -r
            - worker
          resources:
            requests:
              cpu: "1"
              memory: 10G
          volumeMounts:
            - mountPath: "/opt/seatunnel/config/hazelcast-master.yaml"
              name: seatunnel-configs
              subPath: hazelcast-master.yaml
            - mountPath: "/opt/seatunnel/config/hazelcast-worker.yaml"
              name: seatunnel-configs
              subPath: hazelcast-worker.yaml
            - mountPath: "/opt/seatunnel/config/seatunnel.yaml"
              name: seatunnel-configs
              subPath: seatunnel.yaml
            - mountPath: "/opt/seatunnel/config/hazelcast-client.yaml"
              name: seatunnel-configs
              subPath: hazelcast-client.yaml
            - mountPath: "/opt/seatunnel/config/log4j2_client.properties"
              name: seatunnel-configs
              subPath: log4j2_client.properties
            - mountPath: "/opt/seatunnel/config/log4j2.properties"
              name: seatunnel-configs
              subPath: log4j2.properties
      volumes:
        - name: seatunnel-configs
          configMap:
            name: seatunnel-cluster-configs
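To see what the rolling update percentages in both Deployments mean in whole Pods, recall that Kubernetes rounds maxUnavailable down and maxSurge up when converting a percentage into a Pod count. A small sketch of that arithmetic for the replica counts used here (the function names are hypothetical):

```shell
#!/bin/sh
# maxUnavailable: percentage $2 of $1 replicas, rounded down.
max_unavailable() { echo $(( $1 * $2 / 100 )); }

# maxSurge: percentage $2 of $1 replicas, rounded up.
max_surge() { echo $(( ($1 * $2 + 99) / 100 )); }

for replicas in 2 3; do
  echo "replicas=$replicas maxUnavailable=$(max_unavailable "$replicas" 25) maxSurge=$(max_surge "$replicas" 50)"
done
# prints:
#   replicas=2 maxUnavailable=0 maxSurge=1
#   replicas=3 maxUnavailable=0 maxSurge=2
```

With 2 masters or 3 workers, maxUnavailable resolves to 0, so no existing Pod is taken down until a replacement is ready.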
After defining the above master and worker YAML files, you can deploy them to the Kubernetes cluster by running:
kubectl apply -f seatunnel-cluster-master.yaml
kubectl apply -f seatunnel-cluster-worker.yaml
Under normal circumstances, you will see the SeaTunnel cluster running with 2 master nodes and 3 worker nodes:
$ kubectl get pods | grep seatunnel-cluster
seatunnel-cluster-master-6989898f66-6fjz8 1/1 Running 0 156m
seatunnel-cluster-master-6989898f66-hbtdn 1/1 Running 0 155m
seatunnel-cluster-worker-87fb469f7-5c96x 1/1 Running 0 156m
seatunnel-cluster-worker-87fb469f7-7kt2h 1/1 Running 0 155m
seatunnel-cluster-worker-87fb469f7-drm9r 1/1 Running 0 156m
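The same check can be scripted. The sketch below counts Running Pods per role from kubectl get pods output read on standard input. The function name is hypothetical, and the column positions assume kubectl's default output, where the first column is the Pod name and the third is its status.

```shell
#!/bin/sh
# Count Pods of role $1 (master or worker) whose STATUS column is Running.
count_running() {
  awk -v prefix="seatunnel-cluster-$1-" \
    'index($1, prefix) == 1 && $3 == "Running" { n++ } END { print n + 0 }'
}

# Example (requires access to the cluster):
#   kubectl get pods | count_running master
#   kubectl get pods | count_running worker
```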
At this point, we have successfully deployed the SeaTunnel cluster in Kubernetes using the separated cluster mode. Now that the cluster is ready, how do clients submit jobs to it?
All client configurations for SeaTunnel are located in the hazelcast-client.yaml file.
First, download the binary installation package to the client machine (it contains the bin and config directories), and make sure the SeaTunnel installation path is the same as on the server. This is what the official documentation means by setting SEATUNNEL_HOME the same as the server. Otherwise, errors such as "cannot find connector plugin path on the server" may occur because the server-side plugin path differs from the client-side path.
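A small guard can catch a mismatched path before anything is submitted. This sketch assumes the server installation lives in /opt/seatunnel, as the container command paths in the Deployments suggest. The function name is hypothetical.

```shell
#!/bin/sh
# Server-side install path, inferred from /opt/seatunnel/bin/seatunnel-cluster.sh.
SERVER_SEATUNNEL_HOME=/opt/seatunnel

# Return 0 when the given client path equals the server path
# (a single trailing slash is ignored).
same_home() {
  [ "${1%/}" = "$SERVER_SEATUNNEL_HOME" ]
}

if same_home "${SEATUNNEL_HOME:-}"; then
  echo "SEATUNNEL_HOME matches the server"
else
  echo "SEATUNNEL_HOME='${SEATUNNEL_HOME:-}' differs from $SERVER_SEATUNNEL_HOME"
fi
```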
Enter the installation directory and modify the config/hazelcast-client.yaml file to point to the Headless Service address created earlier:
hazelcast-client:
  cluster-name: seatunnel-cluster
  properties:
    hazelcast.logging.type: log4j2
  connection-strategy:
    connection-retry:
      cluster-connect-timeout-millis: 3000
  network:
    cluster-members:
      - seatunnel-cluster.bigdata.svc.cluster.local:5801
After the client configuration is done, you can submit jobs to the cluster. There are two main ways to configure JVM options for job submission:
- Configure JVM options in the config/jvm_client_options file: options set in this file apply to every job submitted through seatunnel.sh, regardless of whether it runs in local or cluster mode. All submitted jobs share the same JVM configuration.
- Specify JVM options on the command line: when submitting a job with seatunnel.sh, you can pass JVM parameters directly, e.g., sh bin/seatunnel.sh --config $SEATUNNEL_HOME/config/v2.batch.config.template -DJvmOption="-Xms2G -Xmx2G".

Next, here is a sample job configuration to demonstrate submitting a job to the cluster:
env {
  parallelism = 2
  job.mode = "STREAMING"
  checkpoint.interval = 2000
}

source {
  FakeSource {
    parallelism = 2
    plugin_output = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

sink {
  Console {
  }
}
Use the following command on the client to submit the job:
sh bin/seatunnel.sh --config config/v2.streaming.example.template -m cluster -n st.example.template -DJvmOption="-Xms2G -Xmx2G"
On the Master node, list running jobs with:
$ sh bin/seatunnel.sh -l
Job ID Job Name Job Status Submit Time Finished Time
------------------ ------------------- ---------- ----------------------- -----------------------
964354250769432580 st.example.template RUNNING 2025-04-15 10:39:30.588
You can see the job named st.example.template is currently in the RUNNING state. In the Worker node logs, you should observe log entries like:
2025-04-15 10:34:41,998 INFO [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=0 rowIndex=1: SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : bdaUB, 110348049
2025-04-15 10:34:41,998 INFO [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=1 rowIndex=1: SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : mOifY, 1974539087
2025-04-15 10:34:41,999 INFO [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=0 rowIndex=2: SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : jKFrR, 1828047742
2025-04-15 10:34:41,999 INFO [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=1 rowIndex=2: SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : gDiqR, 1177544796
2025-04-15 10:34:41,999 INFO [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=0 rowIndex=3: SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : bCVxc, 909343602
...
This confirms the job has been successfully submitted to the SeaTunnel cluster and is running normally.
SeaTunnel also provides a REST API for querying job status, statistics, submitting, and stopping jobs. We configured a Headless Service for Master nodes with port 8080 exposed. This allows submitting jobs via REST API from clients.
You can submit a job by uploading the configuration file via curl:
curl 'http://seatunnel-cluster-master.bigdata.svc.cluster.local:8080/submit-job/upload' --form 'config_file=@"/opt/seatunnel/config/v2.streaming.example.template"' --form 'jobName=st.example.template'
{"jobId":"964553575034257409","jobName":"st.example.template"}
If submission succeeds, the API returns the job ID and job name as above.
To list running jobs, query:
curl 'http://seatunnel-cluster-master.bigdata.svc.cluster.local:8080/running-jobs'
[{"jobId":"964553575034257409","jobName":"st.example.template","jobStatus":"RUNNING","envOptions":{"job.mode":"STREAMING","checkpoint.interval":"2000","parallelism":"2"}, ...}]
The response shows the job status and additional metadata, confirming the REST API job submission method works correctly.
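For scripting against this endpoint, the fields of interest can be pulled out of the response. The following is a rough sketch using only grep and cut, assuming the compact JSON shape shown above. A JSON-aware tool would be more robust, and the function name is hypothetical.

```shell
#!/bin/sh
# Read a /running-jobs response on stdin and print one job ID per line.
running_job_ids() {
  grep -o '"jobId":"[0-9]*"' | cut -d '"' -f 4
}

# Example (requires network access to the master Service):
#   curl -s 'http://seatunnel-cluster-master.bigdata.svc.cluster.local:8080/running-jobs' | running_job_ids
```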
More details on the REST API can be found in the official documentation: RESTful API V2
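This article focused on how to deploy SeaTunnel on Kubernetes using the recommended separated cluster mode. To summarize, the main deployment steps include:
- Prepare the Kubernetes environment and the SeaTunnel image.
- Create the Headless Services and configure hazelcast-master.yaml and hazelcast-worker.yaml for Kubernetes-based member discovery.
- Configure seatunnel.yaml to set engine parameters.
- Merge the configuration files into a ConfigMap and deploy the Master and Worker nodes.
- On the client, make sure SEATUNNEL_HOME matches the server, and configure hazelcast-client.yaml to connect to the cluster.

The configurations and cases presented here serve as references. There may be many other configuration options and details not covered. Feedback and discussion are welcome. I hope this is helpful!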