Disclaimer: This blog content has been taken from my latest book:
Comparison: Spring Cloud Task vs Spring Batch
- Spring Cloud Task is complementary to Spring Batch.
- A Spring Batch job can be exposed as a Spring Cloud Task.
- Spring Cloud Task makes it easy to run any Java/Spring microservice application that does not need the robustness of the Spring Batch APIs.
- Spring Cloud Task integrates well with Spring Batch and Spring Cloud Data Flow (SCDF). SCDF provides batch orchestration and a UI dashboard to monitor Spring Cloud Tasks.
- In a nutshell, all Spring Batch services can be exposed/registered as Spring Cloud Tasks for better control, monitoring, and manageability.
Best practices for Spring Batch:
- Use an external file system (Volume Services) for persistence of large files with PCF/PAS due to the file system limitations. Refer to this link.
- Always use the SCDF abstraction layer and its UI dashboard to manage, orchestrate, and monitor Spring Batch applications.
- Always use Spring Cloud Task with Spring Batch for additional batch functionality.
- Always register and implement vanilla Spring Batch applications as Spring Cloud Tasks in SCDF.
- Use Spring Cloud Task when you need to run a finite workload via a simple Java microservice.
- For High Availability (HA) on containers (K8s), pick the horizontal scaling technique, from those listed below, that best suits your use case.
- For large production systems, use SCDF as an orchestration layer with Spring Cloud Task to manage a large number of batches over large data sets.
- Application data and the batch job repository should live in the same schema for transaction synchronization.
Spring Batch Auto-scaling (both vertically and horizontally)
- Vertical Scaling: This is straightforward. Hardware or pod size can be increased at any time, based on CPU and RAM usage, for better performance and reliability. As you give the process more RAM, you can typically increase the chunk size, which typically increases overall throughput, but this does not happen automatically.
- Horizontal Scaling: There are five popular techniques. Watch this YouTube video for details and refer to this GitHub code:
- Multi-threaded Steps – Each chunk (transaction) is executed on its own thread. State is not persisted, so this is only an option if you don’t need restartability.
- Parallel steps – Multiple independent steps run in parallel via threads.
- Single-JVM AsyncItemProcessor/AsyncItemWriter – ItemProcessor calls are executed within a Java Future. The AsyncItemWriter unwraps the result of the Future and passes it to a configured delegate to write.
- Partitioning – Data is partitioned and then assigned to n workers, which run either within the same JVM via threads or in external JVMs launched dynamically when using Spring Cloud Task’s partition extensions. A good option when restartability is needed.
- Remote Chunking – Most batch jobs are I/O bound, but sometimes you need more processing power than a single JVM can offer. Remote chunking sends the actual data to remote workers, so it is only useful when processing, not I/O, is the bottleneck. Durable middleware is required for this option.
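To make the partitioning idea concrete, here is a minimal plain-Java sketch of local (same-JVM) partitioning: an id range is split into contiguous partitions and each partition is handed to its own thread. It does not use the Spring Batch API. The class `RangePartitionSketch`, its methods, and the "sum the ids" worker are all hypothetical stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java model of partitioning. All names here are hypothetical and
// are NOT Spring Batch classes.
class RangePartitionSketch {

    // One partition: an inclusive range of record ids handled by one worker.
    static final class IdRange {
        final int from;
        final int to;

        IdRange(int from, int to) {
            this.from = from;
            this.to = to;
        }
    }

    // Split [min, max] into at most gridSize contiguous ranges.
    static List<IdRange> partition(int min, int max, int gridSize) {
        int targetSize = (max - min) / gridSize + 1;
        List<IdRange> ranges = new ArrayList<>();
        for (int start = min; start <= max; start += targetSize) {
            ranges.add(new IdRange(start, Math.min(start + targetSize - 1, max)));
        }
        return ranges;
    }

    // Stand-in for a worker step: it simply sums the ids in its range.
    static long processRange(IdRange range) {
        long sum = 0;
        for (int id = range.from; id <= range.to; id++) {
            sum += id;
        }
        return sum;
    }

    // Run every partition on its own thread and combine the results.
    static long runPartitioned(int min, int max, int gridSize) {
        ExecutorService pool = Executors.newFixedThreadPool(gridSize);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (IdRange range : partition(min, max, gridSize)) {
                Callable<Long> worker = () -> processRange(range);
                results.add(pool.submit(worker));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // blocks until that worker has finished
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException("a partition failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("partitions: " + partition(1, 10, 3).size());
        System.out.println("total: " + runPartitioned(1, 10, 3));
    }
}
```

In a real job the per-partition boundaries would be persisted with each worker's execution state, which is what makes a failed partition restartable on its own.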
Spring Batch Orchestration and Composition
SCDF doesn’t watch the jobs. It just shares the same DB as the batch job does, so you can view the results. Once a job is launched via SCDF, SCDF itself has no interaction with the job. You can compose and orchestrate jobs by drag and drop: set dependencies between jobs, choose which jobs run in parallel and which in sequence, and set the execution order when scheduling multiple jobs.
Achieve Active-Active operation for High Availability (HA) between two Data Centers/AZs
There are two standard ways:
- Place a shared Spring Batch job repository between the two active-active DCs/AZs. Parallel synchronization happens in the job repository database. Application data and the batch repository should be in the same schema for better synchronization, as noted above. The transaction isolation level is set by default so that only one of the active DCs can run the job; the other DC's attempt fails when it tries to run the same job with the same parameters.
- Spring Cloud Task has built-in functionality to restrict a task to a single running instance (the spring.cloud.task.single-instance-enabled property): https://docs.spring.io/spring-cloud-task/docs/2.2.3.RELEASE/reference/#features-single-instance-enabled
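The "first one wins, second one fails" behaviour of a shared job repository can be illustrated with a small in-memory model. This is only a sketch under simplifying assumptions: a `ConcurrentHashMap` stands in for the shared database, and `SharedJobRepositorySketch`, `tryStart`, and `owner` are hypothetical names, not Spring Batch or Spring Cloud Task API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified in-memory model of a job repository shared by two data centers.
// All names are hypothetical; a real repository relies on database
// transactions rather than an in-process map.
class SharedJobRepositorySketch {

    // Key: job name plus its parameters. Value: the data center that claimed it.
    private final Map<String, String> instances = new ConcurrentHashMap<>();

    private static String key(String jobName, String parameters) {
        return jobName + "?" + parameters;
    }

    // Atomically claim a job instance. Only the first caller for a given
    // job name and parameter set gets true; every later caller gets false.
    boolean tryStart(String jobName, String parameters, String dataCenter) {
        return instances.putIfAbsent(key(jobName, parameters), dataCenter) == null;
    }

    // Which data center owns this job instance, or null if nobody has started it.
    String owner(String jobName, String parameters) {
        return instances.get(key(jobName, parameters));
    }

    public static void main(String[] args) {
        SharedJobRepositorySketch repo = new SharedJobRepositorySketch();
        System.out.println(repo.tryStart("importJob", "date=2024-01-01", "dc-east"));
        System.out.println(repo.tryStart("importJob", "date=2024-01-01", "dc-west"));
        System.out.println(repo.tryStart("importJob", "date=2024-01-02", "dc-west"));
    }
}
```

The same job name with different parameters counts as a different instance, so both data centers can still share the workload across distinct parameter sets.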
Alerts and Monitoring of Spring Cloud Task and Spring Batch
- Spring Cloud Task includes Micrometer health check and metrics API out of the box.
- Plain Prometheus is not suitable for jobs, because it uses a pull mechanism and cannot tell when a job has finished or run into issues. If you want to use Prometheus for application metrics with Grafana visualization, use the Prometheus RSocket proxy: https://github.com/micrometer-metrics/prometheus-rsocket-proxy
- Running Spring Batch Applications in PCF – https://dzone.com/articles/getting-started-with-kafka-2
- Batch Processing with Spring Cloud Data Flow – https://www.baeldung.com/spring-cloud-data-flow-batch-processing
- An Intro to Spring Cloud Task – https://www.baeldung.com/spring-cloud-task