April 2020 – Modern Applications with Cloudification Zone

Monitor your apps and infrastructure with WaveFront (beyond APM): Use Cases and Solutions

In this blog, I give a quick introduction to WaveFront/Tanzu Observability (TO) and cover a few use cases and real challenges that it can solve.

What is WaveFront Tanzu Observability (TO)?

Monitor everything from full-stack applications to cloud infrastructure with metrics, traces, span logs, and analytics. It provides features beyond those of a typical APM tool.
https://tanzu.vmware.com/observability

WaveFront is an APM tool that also goes beyond APM: it monitors modern cloud-native microservice applications, infrastructure, VMs, and K8s clusters, and alerts in real time across multi-cloud and on-prem environments at any scale. With traditional tools and environments, it is challenging and time-consuming to correlate data and get the single-pane-of-glass visibility needed to resolve incidents quickly in a critical production environment. WaveFront is a unified solution with analytics (including AI) that ingests, visualizes, and analyzes metrics, traces, histograms, and span logs, so you can resolve incidents faster across cloud applications.
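
To make "ingesting metrics" concrete, here is a minimal Python sketch that builds one metric point in the Wavefront data format (metric name, value, optional timestamp, source, then point tags). The metric name, source, and tag values are made up for illustration, and the helper function is my own, not part of any Wavefront SDK. In a real setup the line would be sent to a Wavefront proxy rather than printed.

```python
import time


def format_metric_line(name, value, source, tags=None, timestamp=None):
    """Build one line in the Wavefront data format:
    <metricName> <metricValue> [<timestamp>] source=<source> [pointTags]
    """
    # Fall back to the current epoch time in seconds when no timestamp is given.
    ts = int(time.time()) if timestamp is None else int(timestamp)
    parts = [name, str(value), str(ts), f"source={source}"]
    # Point tags are key="value" pairs; sorted here only to get stable output.
    for key, val in sorted((tags or {}).items()):
        parts.append(f'{key}="{val}"')
    return " ".join(parts)


# Hypothetical metric for an orders API running in a Kubernetes pod.
line = format_metric_line(
    "orders.api.latency.ms",
    42.5,
    "orders-pod-1",
    tags={"env": "prod", "cluster": "k8s-east"},
    timestamp=1586000000,
)
print(line)
```

Point tags such as env and cluster are what later allow dashboards and alerts to slice the same metric by environment or cluster.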

Features:

It works with existing open-source monitoring solutions such as Prometheus, Grafana, and Graphite
It has integrations with almost all popular monitoring targets on VMs and containers: Spring Boot, Kubernetes, messaging platforms such as RabbitMQ, databases, etc.
It monitors container and VM stats
It captures traces, usage, and performance for all microservice APIs, with a topology view provided by its powerful service discovery features
It maintains versions of charts and dashboards
Currently, it stores and archives old monitoring data for analytics purposes

High Level Technical Architecture

WaveFront use cases:

Multicloud visibility (mostly data center, moving to public cloud)
Application monitoring (+ tooling for Dev and Ops visibility)
Service performance and reliability optimization (assess-verify)
Observability and diagnostics of multi-cloud and on-prem K8s clusters
Business service performance & KPIs
App metrics: from New Relic, Prometheus and Splunk
Multicloud metrics: from vSphere, AWS, Kubernetes
All data center metrics: from compute, network, storage
Reliability and high availability operations
App and infrastructure monitoring, analytics dashboards
Auto alerting mechanism for any production bug or high usage of infrastructure (CPU, RAM, Storage)
Instrument and monitor your Spring Boot application in Kubernetes
Other Tanzu products monitoring
System-wide monitoring and incident response – cut MTTR
Shared visibility across biz, app, cloud/infra, device metrics
IoT optimization with automated analytics on device metrics
Microservices monitoring and troubleshooting
Accelerated anomaly detection
Visibility across Kubernetes at all levels
Solving cardinality limitations of Graphite
Easy adoption across hundreds of developers
AWS infrastructure visibility (cost and performance)
Kubernetes monitoring
Visualizing serverless workloads
Solving Day 2 Operations for production issues and DevOps/DevSecOps
Finding hidden problems early and improving SLAs for service ticket resolution
Application and microservices API monitoring
Performance analytics
Monitoring CI/CD environments such as Jenkins with WaveFront
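
The auto-alerting use case in the list above (high CPU, RAM, or storage usage) boils down to evaluating a condition over a window of recent samples. The following is a generic Python sketch of that idea, not WaveFront's actual alert engine or query language; the class name, threshold, and window size are assumptions chosen for illustration.

```python
from collections import deque


class ThresholdAlert:
    """Fires when every sample in the last `window` readings exceeds `threshold`."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        # A bounded deque keeps only the most recent `window` samples.
        self.samples = deque(maxlen=window)

    def observe(self, value):
        """Record one reading and return True if the alert is currently firing."""
        self.samples.append(value)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(v > self.threshold for v in self.samples)


# Hypothetical CPU usage readings in percent, one per minute.
cpu_alert = ThresholdAlert(threshold=80.0, window=3)
readings = [70, 85, 90, 95, 60]
states = [cpu_alert.observe(r) for r in readings]
print(states)
```

Requiring the whole window to breach the threshold avoids paging someone for a single short spike.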

Live WaveFront Dashboard

References

Doc: https://docs.wavefront.com/
Integrations: https://www.wavefront.com/integrations/

Generic Demo Video -1

MicroServices Observability with WaveFront Demo Video -2

Tanzu Service Mesh (TSM) based on Istio : Use Cases & Solutions

In this blog, I give a quick introduction to TSM and cover a few use cases and real challenges that it can solve.

What is Tanzu Service Mesh (TSM)?

Radically simplify the process of connecting, protecting, and monitoring your microservices across any runtime and any cloud with VMware Tanzu Service Mesh. Provide a common policy and infrastructure for your modern distributed applications and unify operations for Application Owners, DevOps/SREs and SecOps without disrupting developer workflows.
https://www.vmware.com/in/products/tanzu-service-mesh.html

Tanzu Service Mesh is an operator-side (K8s) microservice orchestration tool that manages service discovery, traffic, mTLS-secured payloads, rate limiting, telemetry, observability of VMs and microservices, and circuit breaking across multiple clouds. Open-source service mesh technologies like Istio exist to help overcome some of the challenges of building microservices, such as service discovery, mutual TLS (mTLS), resiliency, and visibility. However, maintaining and managing a service mesh like Istio is challenging, especially at scale.
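
To illustrate the circuit-breaker behavior mentioned above, here is a minimal, generic Python sketch. It is not TSM or Istio code: in a service mesh this logic lives in the sidecar proxy and is driven by configuration, which is exactly what removes it from application code. The class, its parameters, and the failing function are hypothetical.

```python
import time


class CircuitBreaker:
    """Rejects calls for `reset_after` seconds once `max_failures` calls fail in a row."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: call rejected")
            # Half-open: let one trial call through; a single failure reopens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit fully
        return result


def flaky():
    """Hypothetical downstream call that always fails."""
    raise ConnectionError("downstream unavailable")


breaker = CircuitBreaker(max_failures=2, reset_after=30.0)
outcomes = []
for _ in range(3):
    try:
        breaker.call(flaky)
    except ConnectionError:
        outcomes.append("failed")
    except RuntimeError:
        outcomes.append("rejected")
print(outcomes)
```

The third call never reaches the failing service: the breaker rejects it locally, which protects the downstream service while it recovers.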

It provides unified management, global policies, and seamless connectivity across complex, multi-cluster mesh topologies managed by disparate teams. It provides app-level observability across services deployed to different clusters, complementing/integrating into modern observability tools you use or are considering.

TSM Global NameSpace Architecture

As of now, only this enterprise product offers a global namespace (GNS) for multiple K8s clusters across multiple clouds. Open-source Istio doesn’t provide this feature.

TSM Use Cases

Service discovery across multiple Kubernetes clusters, in different namespaces or clouds, using a GNS
Distributed Microservice Discovery on multi-cloud
Traffic Monitoring and API communication tracing
Logging and K8s Infra Monitoring with admin dashboard visualization
Rate Limiting with the help of Redis
Business Continuity (BC)
Developers are otherwise responsible for providing all service-related configuration through boilerplate code
Secure Payload
Netflix OSS APIs (Eureka service discovery, Zuul API gateway, Ribbon load balancing, caching, etc.) and Hystrix (circuit breaker) are legacy with no enterprise support, and they are tightly coupled with application source code
Open source Istio has no enterprise support as of now
Visibility for DevOps and DevSecOps
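
The rate-limiting use case in the list above relies on a shared counter store such as Redis. Below is a generic fixed-window rate limiter sketched in Python, with a plain dictionary standing in for Redis. This is an illustration of the technique, not TSM's implementation; the class name, limit, and client ID are assumptions.

```python
class FixedWindowRateLimiter:
    """Allows at most `limit` requests per client in each fixed time window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        # Stand-in for a shared store such as Redis. A real store would also
        # expire old window keys; this sketch keeps them forever.
        self.counters = {}

    def allow(self, client_id, now):
        """Count this request and report whether it is within the limit."""
        window = int(now // self.window_seconds)
        key = (client_id, window)
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key] <= self.limit


# Two requests per 60-second window; `now` is passed in to keep the demo deterministic.
limiter = FixedWindowRateLimiter(limit=2, window_seconds=60)
decisions = [limiter.allow("client-a", now=t) for t in (0, 10, 20, 61)]
print(decisions)
```

Keeping the counters in a shared store is what lets every proxy instance enforce the same limit for a given client.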

References

Doc – https://docs.pivotal.io/pks/1-7/nsxt-service-mesh.html
Public doc- https://tanzu.vmware.com/service-mesh

Demo for Microservices:

Tanzu Mission Control (TMC) for multi-cloud: Use Cases & Solutions

In this blog, I give a quick introduction to TMC and cover a few use cases and real challenges that it can solve.

What is Tanzu Mission Control (TMC)?

Operate and secure your Kubernetes infrastructure and modern apps across teams and multiple clouds (on-prem, private, public, and hybrid Kubernetes clusters).
https://tanzu.vmware.com/mission-control

VMware Tanzu Mission Control provides a single pane of glass to easily provision and manage Kubernetes clusters and operate modern, containerized applications across multiple clouds and clusters. It works as a management cluster, or Kubernetes control plane, that provisions and manages the worker/data nodes of multiple clusters: deploying and upgrading clusters, setting RBAC, security, and other policies and configurations, monitoring the health of clusters (VMs and K8s), and helping identify the root cause of underlying production issues.

TMC Use Cases

Multi-cloud management of on-prem, public, hybrid cloud
Centralized Control Plane for provisioning K8s cluster for public cloud and on-prem
Centrally operates and manages all your Kubernetes clusters and applications at scale
App and service management
Enables developers with self-service access to Kubernetes for running and deploying applications
Manage security and configuration easily and efficiently through a powerful policy engine, with features such as RBAC and inspection

References

Demo Video

Scale Spring Batch, comparison with Spring Cloud Task & best practices of Spring Batch!

Disclaimer: This blog content has been taken from my latest book:

“Cloud Native Microservices with Spring and Kubernetes”

Comparison of Spring Cloud Task vs Spring Batch

Spring Cloud Task is complementary to Spring Batch.
Spring Batch can be exposed as a Spring Cloud Task.
Spring Cloud Task makes it easy to run Java/Spring microservice applications that do not need the robustness of the Spring Batch APIs.
Spring Cloud Task has good integration with Spring Batch and Spring Cloud Data Flow (SCDF). SCDF provides batch orchestration features and a UI dashboard to monitor Spring Cloud Task.
In a nutshell, all Spring Batch services can be exposed/registered as Spring Cloud Tasks for better control, monitoring, and manageability.

Best practices for Spring Batch:

Use an external file system (Volume Services) for the persistence of large files with Cloud Foundry (PCF)/Tanzu Application Service (TAS) due to file system limitations. Refer to this link.
Always use SCDF abstraction layer with UI dashboard to manage, orchestrate, and monitor Spring Batch applications.
Always use Spring Cloud Task with Spring Batch for additional batch functionality.
Always register and implement vanilla Spring Batch applications as Spring Cloud Task in SCDF.
Use Spring Cloud Task when you need to run a finite workload via a simple Java micro-service.
For High Availability (HA), implement the horizontal scaling technique, from the top scaling techniques, that best suits your use case on containers (K8s).
For large PROD systems, use SCDF as an orchestration layer with Spring Cloud Task to manage a large number of batches for large data sets.
App data and batch repo should live in the same schema for transaction synchronization.

Spring Batch Auto-scaling (both vertically and horizontally)

Vertical Scaling: This is straightforward. Hardware or pod size can be increased at any time based on CPU and RAM usage for better performance and reliability. As you give the process more RAM, you can typically increase the chunk size, which typically increases overall throughput, but this does not happen automatically.
Horizontal Scaling: There are several popular techniques; watch this YouTube video for details and refer to this GitHub code –

Multi-threaded steps – Each transaction/chunk is executed by its own thread. State is not persisted, so this is only an option if you don’t need restartability.
Parallel steps – Multiple independent steps run in parallel via threads.
Async ItemProcessor/ItemWriter (single JVM) – ItemProcessor calls are executed within a Java Future. The AsyncItemWriter unwraps the result of the Future and passes it to a configured delegate to write.
Partitioning – Data is partitioned and then assigned to n workers that are executed either within the same JVM via threads or in external JVMs launched dynamically when using Spring Cloud Task’s partition extensions. A good option when restartability is needed.
Remote chunking – Useful when you need more processing power than a single JVM provides. It is I/O-intensive because it sends the actual data to remote workers, so it only pays off when processing, rather than reading, is the bottleneck. Durable middleware is required for this option.
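
As a language-neutral illustration of the partitioning technique above, combined with chunk-oriented processing, here is a small Python sketch. It is not Spring Batch code: the function names, the worker count, and the chunk size are assumptions, and threads stand in for the workers that Spring Batch would run in the same JVM or in external JVMs.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def partition(items, workers):
    """Split items into `workers` contiguous slices of near-equal size."""
    size, extra = divmod(len(items), workers)
    slices, start = [], 0
    for i in range(workers):
        # The first `extra` slices take one additional item each.
        end = start + size + (1 if i < extra else 0)
        slices.append(items[start:end])
        start = end
    return slices


def run_partition(items, chunk_size, process, write):
    """Process one partition chunk by chunk; each chunk is written as a unit."""
    written = 0
    for i in range(0, len(items), chunk_size):
        chunk = [process(item) for item in items[i:i + chunk_size]]
        write(chunk)
        written += len(chunk)
    return written


records = list(range(10))
output = []
lock = threading.Lock()


def write(chunk):
    """Hypothetical writer; the lock keeps concurrent writes orderly."""
    with lock:
        output.extend(chunk)


with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [
        pool.submit(run_partition, part, 2, lambda x: x * x, write)
        for part in partition(records, 3)
    ]
    counts = [f.result() for f in futures]

print(counts)
print(sorted(output))
```

A larger chunk size means fewer write calls per partition, which is why more memory per worker can translate into higher throughput.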

Spring Batch Orchestration and Composition

SCDF doesn’t watch the jobs. It just shares the same DB as the batch job so that you can view the results. Once a job is launched via SCDF, SCDF itself has no interaction with the job. You can compose and orchestrate jobs by drag and drop: set dependencies between jobs, choose which jobs run in parallel and which in sequence, and set the execution order when scheduling multiple jobs.
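
The composition idea above (some jobs in sequence, some in parallel) can be sketched in a few lines of Python. This is only a conceptual model, not SCDF or its DSL: the flow structure, the job names, and the "COMPLETED" status strings are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_flow(flow, jobs):
    """Run a composed flow where each stage is a list of job names.

    Stages run in sequence; jobs inside one stage run in parallel.
    Returns one {job name: result} dictionary per stage.
    """
    history = []
    with ThreadPoolExecutor() as pool:
        for stage in flow:
            # pool.map blocks until every job in this stage has finished,
            # so the next stage starts only after the current one completes.
            results = list(pool.map(lambda name: jobs[name](), stage))
            history.append(dict(zip(stage, results)))
    return history


# Hypothetical jobs that simply report a status.
jobs = {
    "import-customers": lambda: "COMPLETED",
    "import-orders": lambda: "COMPLETED",
    "build-report": lambda: "COMPLETED",
}

# The two imports run in parallel; the report runs after both finish.
flow = [["import-customers", "import-orders"], ["build-report"]]
history = run_flow(flow, jobs)
print(history)
```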

Achieve Active-Active operation for High Availability (HA) between two Data Centers/AZs

There are two standard ways:

Place a shared Spring Batch job repository between the two active-active DCs/AZs. Parallel sync happens in the job repository database. App data and the batch repo should be in the same schema for better synchronization, as noted above. With the default transaction isolation level, only one of the active DCs can run the job; the other fails when it tries to run the same job with the same parameters.
Spring Cloud Task has built-in functionality to restrict Spring Cloud Task instances: https://docs.spring.io/spring-cloud-task/docs/2.2.3.RELEASE/reference/#features-single-instance-enabled
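
The first approach above works because the shared repository treats a job name plus its parameters as a unique key. The Python sketch below models that idea with an in-memory SQLite table and a unique constraint. It is only loosely modelled on the Spring Batch job repository: the table, column names, job name, and data center names are hypothetical, not the real schema.

```python
import sqlite3

# In-memory database standing in for the job repository shared by both DCs.
repo = sqlite3.connect(":memory:")
repo.execute(
    "CREATE TABLE job_instance ("
    " job_name TEXT NOT NULL,"
    " job_key TEXT NOT NULL,"
    " launched_by TEXT NOT NULL,"
    " UNIQUE (job_name, job_key))"
)


def try_launch(job_name, params, data_center):
    """Register a job instance; return False if the same job and parameters exist."""
    # Sorting makes the key independent of parameter order.
    job_key = "&".join(f"{k}={v}" for k, v in sorted(params.items()))
    try:
        with repo:  # commits on success, rolls back on error
            repo.execute(
                "INSERT INTO job_instance VALUES (?, ?, ?)",
                (job_name, job_key, data_center),
            )
        return True
    except sqlite3.IntegrityError:
        return False


params = {"run.date": "2020-04-01"}
first = try_launch("settlement-job", params, "dc-east")
second = try_launch("settlement-job", params, "dc-west")
print(first, second)
```

Whichever data center registers the instance first wins; the other gets a constraint violation instead of running a duplicate job.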

Alerts and Monitoring of Spring Cloud Task and Spring Batch

Spring Cloud Task includes Micrometer health check and metrics API out of the box.
Plain Prometheus is not well suited to jobs, because it uses a pull mechanism and cannot tell when a job has finished or has run into issues. If you want to use Prometheus for application metrics with Grafana visualization, use the Prometheus RSocket proxy: https://github.com/micrometer-metrics/prometheus-rsocket-proxy
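
The pull-versus-push point above can be shown with a tiny timeline simulation in Python. The numbers are made up: a job that runs from second 3 to second 9, and a collector that scrapes every 15 seconds. Neither function uses a Prometheus or Micrometer API; both are hypothetical stand-ins for the two delivery models.

```python
def pull_samples(job_start, job_end, scrape_interval, horizon):
    """Scrape times at which a pull-based collector would find the job alive."""
    scrapes = range(0, horizon + 1, scrape_interval)
    return [t for t in scrapes if job_start <= t <= job_end]


def push_samples(job_start, job_end):
    """Events a job reports itself when it pushes at its own lifecycle points."""
    return [("started", job_start), ("finished", job_end)]


seen_by_pull = pull_samples(job_start=3, job_end=9, scrape_interval=15, horizon=60)
seen_by_push = push_samples(job_start=3, job_end=9)
print(seen_by_pull)
print(seen_by_push)
```

The short-lived job falls entirely between two scrapes, so the pull model never sees it, while the push model records both its start and its end.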

More References:

Running Spring Batch Applications in PCF- https://dzone.com/articles/getting-started-with-kafka-2
Batch Processing with Spring Cloud Data Flow – https://www.baeldung.com/spring-cloud-data-flow-batch-processing
An Intro to Spring Cloud Task- https://www.baeldung.com/spring-cloud-task
https://tanzu.vmware.com/developer/guides/spring-batch/
https://tanzu.vmware.com/developer/tv/spring-live/0007/
