Seven Models of Cloud Native Applications

Introduction

In today’s cloud-driven landscape, organizations are transitioning from legacy monolithic systems to agile, scalable, and secure cloud-native solutions. Some are even building brand-new cloud-native applications. However, the concept of cloud-native design remains subjective, lacking a universal blueprint. This blog aims to provide clarity and guidance for designing cloud-native applications and container deployments. It addresses the intricacies of end-to-end cloud development, encompassing architecture, development, testing, deployment, security, and observability.

Traditionally, separate development teams handle these aspects in isolation. This blog bridges these gaps and outlines seven practical models for standardizing cloud-native architecture, drawing from real-world experience in cloud-native application design and development.

Seven Models of Cloud Native Design

Cloud-native entails developing microservices/micro-frontend apps and deploying them within containers on private, public, or hybrid cloud platforms. These platforms autonomously manage, automate, orchestrate, and secure these applications and their data. Container orchestration engines handle most cross-cutting concerns. This blog outlines key approaches to creating and deploying modern cloud-native apps, emphasizing performance optimization and cost efficiency. These apps leverage cloud-managed SaaS and automatically deploy new source-code changes on cloud container platforms. We will now briefly explore these seven models and their business-value components.

1.   Modern Design & Development Model

Components and business value/ROI:
Beyond 12-factor principles: A set of principles widely adopted for cloud-native applications and dev teams, offering usability, agility, scalability, modularity, and security. It saves operational costs and improves developer productivity, and it abstracts cross-cutting concerns or non-functional requirements (NFRs). These are 12+3 factor principles for modern cloud-native apps, where 3 important principles are recent additions; we call them "beyond 12-factor" principles:
  • One codebase, one application: A single code repo should exist for a single responsibility; every microservice should have its own code repo.
  • API first: New cloud-native app development should start by designing the API first.
  • Dependency management: Explicitly declare and isolate dependencies. All dependencies should be declared without implicit reliance on system tools or libraries.
  • Design, build, release, and run: The delivery pipeline should strictly consist of build, release, and run.
  • Configuration, credentials, and code: Configuration that varies between deployments should be stored in the environment.
  • Logs: Applications should produce logs as event streams and leave aggregation to the execution environment.
  • Disposability: Fast startup and shutdown are advocated for a more robust and resilient system.
  • Backing services: All backing services are treated as attached resources, attached and detached by the execution environment.
  • Environment parity: All environments should be as similar as possible.
  • Administrative processes: Any needed admin tasks should be kept in source control and packaged with the application.
  • Port binding: Self-contained services should make themselves available to other services via specified ports.
  • Stateless processes: Applications should be deployed as one or more stateless processes, with persisted data stored on a backing service.
  • Concurrency: Concurrency is achieved by scaling individual processes.
  • Telemetry: Add observability and monitoring.
  • Authentication and authorization (A&A): Provide proper IAM support for user and application-to-application security.
Domain-Driven Design (DDD): A design pattern that helps identify separate business-use-case domains and their microservices. The best use case is migrating legacy monolithic apps to modern microservices and micro-frontends. Example: catalog, order, and payment services.
API-Driven: A method of API design that prioritizes business logic or services ahead of development, promoting service-to-service communication via API interfaces. Cloud-native apps utilize this approach, managing API endpoints with tools like GCP Apigee, Spring Cloud Gateway, and more.
Microservices Design: An application architecture in which a large application is built as a collection of small, independently deployable services, each aligned to a specific business domain (usually derived through DDD). It provides a framework/guidelines to develop, deploy, and manage cloud-native apps.
Micro-Frontends Design: A frontend application architecture where a big UI app is decomposed into smaller UI apps developed by separate dev teams. These micro UI apps can be deployed and managed independently, and they can also be divided based on business use cases.
WebAssembly (Wasm): A next-generation UI platform. It complements JavaScript rather than replacing it: Wasm is a binary instruction format that browsers can compile and execute faster than JavaScript, and its lighter payload delivers faster performance in web browsers.
Modern Databases: In the past, traditional SQL databases were the only option. Today there is a wide variety of modern databases, such as NoSQL stores, suited to different purposes. In addition, many public cloud providers and independent software vendors offer Database as a Service (DBaaS) through SaaS platforms: they expose databases to applications through APIs, manage them on their own cloud infrastructure, and offer them to clients through subscription-based services.
Event-Driven Design: The de facto standard for microservices communication. In this design, microservices connect with each other through a message broker on every event, publishing and consuming messages on topics. It's an asynchronous way of communicating that provides benefits like agility, high transaction throughput, high availability, lower cost, reliability, and decoupling. (A minimal producer sketch follows this table.)
Distributed Caching Design: A design in which frequently accessed data is kept in a high-speed in-memory cache outside the main database and replicated across a caching cluster, giving all service instances a single, consistent caching source. It improves read performance, reduces database round trips, and supports use cases such as session and token storage (covered in depth later in this blog).
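To make the event-driven model concrete, here is a minimal sketch of a microservice publishing an event through a message broker. It assumes Apache Kafka with a broker at localhost:9092; the topic name, key, and payload are illustrative assumptions, not part of any specific design above.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OrderEventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // broker address: an assumption
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Publish an "order created" event to a topic; consumer services
        // (e.g., payment) subscribe to the topic instead of being called directly.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> event =
                    new ProducerRecord<>("orders", "order-42", "{\"status\":\"CREATED\"}");
            producer.send(event); // asynchronous send; the broker decouples the services
        }
    }
}
```

A consumer service would subscribe to the same topic and react to the event in its own time, which is what gives event-driven systems their decoupling and resilience.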

2.   Modern Infrastructure/DevOps – CI/CD

Components and business value/ROI:
DevSecOps: An advanced DevOps concept covering development, security, and operations. It provides tools and practices to secure data, code, and containers during the CI/CD process. It covers scanning source code for vulnerabilities, early threat/malware detection and prevention, security design audit reviews, static code analysis, Docker container image security, payload security, database security, etc.
Immutable Infrastructure: This kind of infra is never modified once it's deployed on the cloud. New infra has to be deployed for any new change, and the older infra must be retired. It reduces operational complexity and debugging time and improves security. No patching or backward compatibility is needed.
Service Mesh: A dedicated infra layer that controls and manages cross-cutting concerns out of the box, like service discovery, API tracing, observability, microservices internal east-west communication, circuit breaking/failure recovery, load balancing, traffic management, mTLS payload security, A&A, etc. It helps extract cross-cutting configuration from business-logic source code and moves responsibility for common configuration from the business-code developer to the DevOps developer/team.
Declarative API (IaC): A very powerful, modern way of managing infra as code. It's a desired-state system managed automatically by the DevOps tooling: we tell the system, "Please make sure the state I provide is maintained," and it converges without manual intervention. This is the intelligent way Kubernetes and Terraform manage infrastructure. (A minimal client-side sketch follows this table.)
Platform as a Service (PaaS): A cloud computing model that provides a ready-made development platform on top of infrastructure, so developers can write and deploy code directly without understanding cloud-config complexities. It gives a developer-friendly environment to build and deploy code on the cloud without help from the DevOps team, improving developer productivity.
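To illustrate the declarative, desired-state idea, here is a minimal sketch using the Fabric8 Kubernetes Java client: the code declares what should exist (three replicas of a service) and leaves the "how" to Kubernetes controllers. The service name, image, and namespace are illustrative assumptions.

```java
import io.fabric8.kubernetes.api.model.apps.Deployment;
import io.fabric8.kubernetes.api.model.apps.DeploymentBuilder;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

public class DesiredStateExample {
    public static void main(String[] args) {
        // Declare the desired state: three replicas of a hypothetical "orders" service.
        Deployment desired = new DeploymentBuilder()
            .withNewMetadata().withName("orders").endMetadata()
            .withNewSpec()
                .withReplicas(3)
                .withNewSelector().addToMatchLabels("app", "orders").endSelector()
                .withNewTemplate()
                    .withNewMetadata().addToLabels("app", "orders").endMetadata()
                    .withNewSpec()
                        .addNewContainer()
                            .withName("orders")
                            .withImage("registry.example.com/orders:1.0.0") // image: an assumption
                        .endContainer()
                    .endSpec()
                .endTemplate()
            .endSpec()
            .build();

        // Hand the desired state to Kubernetes; its controllers converge toward it
        // (replacing crashed pods, rescheduling) without further manual steps.
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            client.apps().deployments().inNamespace("default").createOrReplace(desired);
        }
    }
}
```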

3.   Build & Deployment Model

Components and business value/ROI:
Serverless: A pure cloud-native development model that provides ready-to-use infra on demand for app deployment. It saves a lot of cost because instances spin up only in response to on-demand events. Cloud providers manage the underlying serverless infra and automatically scale it up/down based on traffic. It works on an event-driven model.
GitOps: An infra-operational framework where the Git source code repository is integrated with the CI/CD DevOps pipeline, triggering automatically whenever a commit lands in the Git repo. It provides many benefits, like security, compliance, less complexity in creating/updating Kubernetes config scripts, improved developer productivity, automation, reliability, and faster development. It provides self-managed declarative infrastructure.

4.   Cloud Observability

Components and business value/ROI:
Tracing: It tracks microservices API interactions, showing each interaction and its response time along with request and response data. It also helps identify slow and buggy APIs. Apps emit usage metrics, which APM and other observability tools read to visualize, generate reports, and send preventive notifications.
Performance Monitoring: Application monitoring measures application performance, availability, and user experience, and uses this data to identify and resolve application issues before they impact customers. APM and performance-testing tools do this. Based on these reports, infra or APIs can be scaled to meet SLAs.
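As a concrete example of the telemetry these tools consume, here is a minimal sketch using Micrometer (a common Java metrics facade) to time a request-handling path. The metric and tag names are illustrative assumptions; in a real service the registry would be bound to an APM backend such as Prometheus rather than kept in memory.

```java
import java.util.concurrent.TimeUnit;

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class CheckoutMetrics {
    public static void main(String[] args) {
        // SimpleMeterRegistry keeps metrics in memory; a production registry
        // would publish them to an observability backend.
        MeterRegistry registry = new SimpleMeterRegistry();

        Timer timer = Timer.builder("checkout.latency")
                .description("Time spent processing a checkout request")
                .tag("service", "orders") // tag values: assumptions
                .register(registry);

        timer.record(() -> processCheckout()); // record one request's duration

        System.out.printf("count=%d mean=%.2fms%n",
                timer.count(), timer.mean(TimeUnit.MILLISECONDS));
    }

    static void processCheckout() {
        try { Thread.sleep(25); } catch (InterruptedException ignored) { } // simulated work
    }
}
```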

5.   4C’s of Cloud Security

Components and business value/ROI:
Container Security: This practice ensures that the container where the app is deployed is also secured. It's a policy to secure against potential security vulnerabilities, usually enforced by container security tools.
Cluster Security: The practice of securing container orchestration cluster components and the apps running on that cluster.
Cloud Security: It comprises the security of data center and availability zone servers in cloud environments. If the cloud layer is vulnerable (or configured in a vulnerable way), there is no guarantee that the components built on top of this base are secure. Public cloud providers offer many security services, such as DDoS protection.
Container Image Security: Docker images are stored in container repositories, and these images must be scanned for security vulnerabilities. Many tools available with container repos continuously scan updated images.
Endpoint Security: A cybersecurity approach to defend end-consuming endpoints such as laptops, desktops, and mobile devices. An endpoint is any device that connects to the corporate network from outside its firewall. An endpoint security strategy is essential because every remote endpoint can be an entry point for an attack, and the number of endpoints is only increasing with the rapid pandemic-related shift to remote work.

6.   Cloud Platforms

Components and business value/ROI:
Private: It's managed either on-prem (inside the organization) or as private instances isolated within physically secure boundaries at a public cloud provider.
Public: These physical servers are shared among multiple tenants/organizations, provided mainly by third-party vendors as a SaaS solution.
HybridIt’s a combination of private and public clouds. Sometimes, organizations keep their databases and other secure information on-prem and host applications and other services on the public cloud. It’s the most sustainable model for cloud migration. Most of the organizations, around 65%, prefer hybrid models. Public cloud service providers provide hybrid cloud orchestration tools to manage multi-cloud. Sometimes, when many public clouds are combined, they are also called hybrid clouds.

7.   Automation

Components and business value/ROI:
BDD: The Behavioral Driven Design framework is based on a given-when-then model. It focuses mainly on the behavior of the product and user acceptance criteria, expressed in a simple, English-like language called Gherkin. (See the step-definition sketch after this table.)
Chaos EngineeringIt’s a method of testing distributed microservices/micro-front-end apps deployed on a cloud that deliberately introduces failure and faulty scenarios to verify its resilience in the face of random disruptions. These disruptions can cause applications to respond unpredictably and break under pressure. Chaos engineers detect those issues. It’s a must for any true cloud-native app.

Conclusion

Every application has different needs, and the definition of cloud-native differs across apps and organizations. These same seven models cannot fit every cloud-native application architecture; choices are often driven by business units, technology compliance, cost, and operational overhead.


Demystified Service Mesh Capabilities for Developers

Service meshes have been gaining a lot of popularity lately, more so amongst Spring and Java developers who wish to address cross-cutting concerns. But are you wondering what exactly a service mesh is? What are some of the popular options out there? And most importantly, what kinds of problems do they actually solve? Well, look no further! This blog is here to provide you with the answers you seek.

What is a Service Mesh?

A service mesh is a dedicated infrastructure layer that helps manage communication between the various microservices within a distributed application. It acts as a transparent and decentralized network of proxies that are deployed alongside the application services. These proxies, often referred to as sidecars, handle service-to-service communication, providing essential features such as service discovery, load balancing, traffic routing, authentication, and observability.

By abstracting away the complexity of network communication, a service mesh enables developers to focus on application logic rather than dealing with the intricacies of networking code. It provides a consistent and flexible way to handle cross-service communication and allows for the implementation of advanced traffic management strategies, security policies, and observability mechanisms.

They provide a standardized approach to managing microservices communication, making it easier to monitor, secure, and control traffic within complex distributed systems.

Components of a Service Mesh

Service mesh architecture typically involves the following components and their interactions:

Data Plane: The data plane refers to a network of sidecar proxies deployed along with each service instance, so that it can communicate with the other services in the system. It acts as an intermediary between the service and the rest of the network. Sidecar proxies handle inbound and outbound traffic, intercepting communication and providing additional features.

  1. Sidecar: It's based on the Envoy proxy. It's another container that runs in the same Kubernetes pod and takes care of all cross-cutting concerns. It's based on the sidecar container design pattern.
  2. Application Traffic: Microservices connect through other microservices using sidecar containers. Application traffic is basically communication between Envoy sidecar proxy containers.
  3. Namespace: It's an isolated space in a Kubernetes pod where both containers (the sidecar and the microservice app) run in parallel.

Control Plane: The control plane is the centralized management and configuration layer of the service mesh. It is responsible for controlling and coordinating the behavior of the sidecar proxies. It provides a control plane API that allows administrators to configure policies, rules, and settings for traffic management, security, and observability.

  1. API Endpoints: API endpoints are the entry points through which services within the mesh communicate with each other.
  2. Controllers: A controller is a component responsible for managing and controlling the behavior of the mesh. It is typically a software component that monitors the state and health of services, configures traffic routing and load balancing rules, enforces security policies, and handles other aspects of service-to-service communication within the mesh.
  3. Service Discovery: Service discovery is an essential component in service mesh architecture. It enables services to dynamically locate and connect with each other without hard-coded addresses.
  4. Certificate Authority: It provides and manages root and intermediate certificates and performs certificate signing operations. 

Application Microservices: These are the individual services or microservices that make up the application. They are responsible for handling specific functions or tasks.

Use Case: E-commerce Application

Consider an e-commerce application: a service mesh would help manage the complex network of microservices responsible for different functions, such as inventory management, order processing, payment processing, and shipping.

  • The sidecar proxies would handle load balancing, ensuring that traffic is distributed efficiently across multiple instances of each service.
  • Additionally, the service mesh would provide secure communication between services by enforcing encryption and authentication using TLS. This would help protect sensitive customer information during transmission and prevent unauthorized access to critical services.
  • Traffic management features would allow operators to control and monitor the flow of requests, enabling them to perform tasks like routing certain requests to a newer version of a service for testing purposes or limiting the rate of incoming requests to prevent overloading.
  • The observability and monitoring capabilities of the service mesh would provide operators with real-time insights into the application’s performance, enabling them to identify and resolve issues promptly.
  • They could analyze metrics, logs, and traces to optimize the application’s performance, troubleshoot problems, and ensure a smooth customer experience.

Overall, a service mesh simplifies the management and enhances the resilience, security, and observability of a distributed application, making it an essential component in modern microservices architectures.

What problems do Service Meshes solve?

Service mesh solves several problems in the context of modern application architectures. Here are some of the key problems that service mesh addresses:

  1. Service-to-service communication: In a microservices architecture, applications are composed of multiple independent services that need to communicate with each other. Service mesh provides a dedicated infrastructure layer to handle service-to-service communication, making it easier to manage and secure these interactions.
  2. Service discovery and load balancing: As the number of services increases, it becomes challenging to keep track of their locations and distribute traffic efficiently. Service mesh offers service discovery and load balancing capabilities, allowing services to discover and connect to each other dynamically while automatically distributing the traffic load across multiple instances.
  3. Traffic management and routing: Service mesh enables sophisticated traffic management and routing features, such as request routing based on service version, path, headers, or other attributes. It allows for traffic shifting, canary deployments, and A/B testing, empowering teams to implement complex deployment strategies with ease.
  4. Resilience and fault tolerance: Service mesh provides mechanisms for implementing resilience and fault tolerance patterns, such as retries, timeouts, circuit breaking, and load shedding. These features help services handle failures gracefully, isolate issues, and prevent cascading failures across the system.
  5. Observability and Debugging: Service mesh provides developers with powerful observability features such as distributed tracing, metrics collection, and logging. These capabilities help developers gain insights into the behavior and performance of their services, allowing them to debug issues, trace requests across service boundaries, and optimize the performance of their applications.
  6. Security and authentication: Service mesh strengthens the security of microservices architectures by providing features like transport-level encryption (TLS), mutual authentication, and authorization policies. It allows for fine-grained access control and identity management, enhancing the overall security posture of the system.
  7. Tight coupling of source code: Cloud configuration is often tightly coupled with business-logic source code, which makes the codebase heavy to manage and hard to debug for code issues. This can make adding new business features, inserting additional code, and resolving issues cumbersome. Adopting a service mesh architecture, however, allows cross-cutting concerns to be segregated from the business-logic source code. With this approach, the service mesh handles all application configuration independently, through the collaboration of DevOps platform/infrastructure teams.
  8. Testing overhead of cross-cutting configuration concerns: Testing new features, during integration, regression, and load testing for feature releases, necessitates additional testing effort. It is crucial to test the entire codebase, including the cross-cutting configuration code, even for minor changes in the business logic. By adopting a service mesh approach, the business logic code becomes more concise and streamlined, resulting in easier and faster testing. Furthermore, developers find it simpler to write fewer JUnit and integration test cases.
  9. Application performance issue: When business logic and cross-cutting configuration are combined, they need extra time to load, deploy, and run on app containers. It consumes extra CPU and RAM for even business-specific API calls, which can cause performance issues. In contrast, a service mesh utilizes a separate side-car container dedicated to running the cross-cutting concerns configuration code. This alleviates the load on the main application container, resulting in improved app performance. By running only the streamlined application business logic, the performance is enhanced.

What key features should you look for when selecting a Service Mesh?

  • Connect Kubernetes clusters: It provides connectivity between two or more Kubernetes clusters if it’s used with hybrid cloud technologies like Google Anthos, Azure Arc, AWS Outpost, VMware Tanzu Mission Control (TMC), etc. It could spread across on-premises, private, and public cloud providers.
  • Service discovery with the Ingress Controller and Ingress resources: It provides dynamic service discovery and routing to distributed microservice REST APIs across K8s clusters on multiple clouds with different dynamic IP addresses. It exposes the service by its service name through the Ingress Controller and Ingress resources, which can be used by any client or consumer. The ingress resource provides routing details to various services, and the ingress controller routes incoming requests to the API using the ingress resource.
  • Circuit breaker resiliency: A circuit breaker provides a retry mechanism when dependent services do not respond on the first attempt. A service mesh provides a powerful circuit-breaker feature for when a dependent service does not respond within a given ETA. Because of this, microservices are more resilient to downtime, since the service mesh can reroute requests away from failed services. (An application-level sketch of the same pattern follows this list.)
  • API Tracing between microservices: It provides API tracing (API-to-API interactions) for microservices, recording request and response interaction logs. This tracing helps improve API performance and SLAs, and it helps developers debug and diagnose bugs.
  • Observability: It provides a powerful mechanism to check application health and infra resources like CPU and memory usage. It also collects application performance metrics and visualizes them on a web dashboard. Performance metrics can suggest ways to optimize communication in the runtime environment, and it covers both infrastructure and application monitoring.
  • Data Payload Security: It provides data encryption in transit between microservice API communications by applying two-way strong mTLS security encryption technology.
  • API Rate Limiting: It provides a mechanism to restrict the number of backend API calls and prevent denial-of-service (DoS/DDoS) attacks, where thousands or even millions of requests hit backend APIs at random and crash the entire backend software system and infrastructure.
  • Load balancing: It provides load balancing by using its built-in ingress controller mechanism, exposing microservices on Kubernetes clusters as external services through the ingress controller load balancer. The ingress controller can map and route client requests to distributed microservices based on ingress resources.
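A service mesh applies circuit breaking at the infrastructure layer, but the pattern itself is easier to see in application code. Here is a minimal sketch using the Resilience4j library; the thresholds, service name, and simulated failure are illustrative assumptions.

```java
import java.time.Duration;
import java.util.function.Supplier;

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;

public class InventoryClient {
    public static void main(String[] args) {
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
                .failureRateThreshold(50)                        // open after 50% failures
                .waitDurationInOpenState(Duration.ofSeconds(30)) // stay open 30s, then half-open
                .slidingWindowSize(10)                           // judge over the last 10 calls
                .build();

        CircuitBreaker breaker = CircuitBreaker.of("inventory", config);

        // Wrap the remote call; once the breaker opens, calls fail fast
        // instead of piling up against an unresponsive dependency.
        Supplier<String> guarded =
                CircuitBreaker.decorateSupplier(breaker, InventoryClient::callInventoryApi);

        try {
            System.out.println(guarded.get());
        } catch (Exception e) {
            System.out.println("fallback: serve cached stock level"); // fallback path
        }
    }

    static String callInventoryApi() {
        throw new RuntimeException("inventory service timed out"); // simulated failure
    }
}
```

After enough failures trip the breaker, subsequent calls fail immediately and the fallback path runs, which is the same fail-fast behavior a mesh provides without code changes.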

Popular Service Meshes

Istio (OSS)

Istio is an open-source service mesh platform that provides a set of tools and capabilities for managing and securing microservices-based applications. It aims to address common challenges associated with service-to-service communication, observability, security, and traffic management in complex distributed systems. At its core, Istio deploys a sidecar proxy, called Envoy, alongside each microservice in the application. This sidecar proxy intercepts and manages all inbound and outbound traffic for the service, allowing Istio to control and monitor the communication between services.

Advantages:

  • Istio boasts one of the largest communities of any service mesh and is highly acclaimed and discussed on the internet. Its GitHub contributors outnumber those of Linkerd by a significant margin.
  • Furthermore, it offers support for both Kubernetes and VM modes.

Drawbacks:

  • Istio itself is open source, but it does not come free: it demands a considerable time investment in reading the documentation, setting it up, ensuring proper functionality, and ongoing maintenance.
  • The implementation and integration of Istio into production can range from several weeks to several months, depending on the complexity of the infrastructure.
  • Using Istio requires a significant amount of resource overhead. 
  • Unlike Linkerd, it lacks a built-in administrative dashboard. 
  • Additionally, Istio mandates the use of its own ingress gateway. 
  • The Istio control plane is supported exclusively within Kubernetes containers; there is no VM mode available for it.

Linkerd

Linkerd is an open-source service mesh platform designed to provide observability, reliability, and security to microservices architectures. It is developed by the Cloud Native Computing Foundation (CNCF) and focuses on simplicity, performance, and ease of use.

Advantages

  • Linkerd leverages the expertise of its creators, who are former Twitter engineers with experience in developing the internal tool, Finagle. They gained valuable insights from working on Linkerd v1, which contributes to the refinement of the service mesh. 
  • Being one of the pioneering service meshes, Linkerd enjoys an active and vibrant community, boasting more than 5,000 users on Slack, along with an engaged mailing list and Discord server. 
  • The availability of comprehensive documentation and tutorials further enhances its appeal.
  • Linkerd has reached a level of maturity with the release of version 2.9, which is evident from its adoption by prominent corporations such as Nordstrom, eBay, Strava, Expedia, and Subspace. 
  • Additionally, Linkerd offers paid enterprise-grade support through Buoyant, ensuring professional assistance is readily available.

Drawbacks

  • Using Linkerd service meshes to their full potential involves a significant learning curve. It is important to note that Linkerd is supported exclusively within Kubernetes containers and does not offer a VM-based or "universal" mode.
  • Linkerd's sidecar proxy differs from Envoy, giving Buoyant the flexibility to optimize it to their requirements. However, this customization comes at the expense of the inherent extensibility that Envoy offers.
  • Consequently, Linkerd lacks support for essential features such as circuit breaking, delay injection, and rate limiting. Additionally, there is no straightforward API exposed for easy control of the Linkerd control plane, although a gRPC API binding can be found.

In case you wish to read more about this service mesh comparison and what more these meshes have to offer, you can read all about it here.

That's not it; there are many more options in the market to choose from, such as Consul Connect, Kuma, and AWS App Mesh.

Conclusion

Service mesh technology is a boon for developers. It increases developer productivity by delegating cross-cutting concerns from application source code to in-house DevSecOps. A service mesh provides many more features that solve developer challenges. It's now a de facto standard for managing cross-cutting configuration code for cloud-native microservice apps on Kubernetes.

Introduction to Automation Testing Strategies For Microservices

Early end-to-end (E2E) testing of microservices helps you identify bugs early in your software development process. We explore the testing triangle, and the challenges and solutions of microservices testing.

Microservices are distributed applications deployed in different environments; they may be developed in different programming languages, use different databases, and have many internal and external communications. A microservice architecture depends on multiple interdependent applications for its end-to-end functionality. This complex architecture requires a systematic testing strategy to ensure end-to-end (E2E) coverage for any given use case. In this blog, we will discuss some of the most widely adopted automation testing strategies for microservices, and to do that we will use the testing triangle approach.

Testing Triangle

It's a modern way of testing microservices with a bottom-up approach, and it is part of the "shift-left" testing methodology. (The "shift-left" method pushes testing toward the early stages of software development: by testing early and often, you can reduce the number of bugs and increase code quality.) The goal of stacking multiple layers in the following test pyramid for microservices is to identify different types of issues at each testing level, so that in the end very few issues reach production. Each type of testing focuses on a different layer of the overall software system and verifies expected results. For a distributed microservices app, the tests can be organized into the following layers using a bottom-up approach:

[Diagram: the microservices testing pyramid, bottom to top: unit, component, contract, integration, and E2E testing]

The testing triangle is based on these five levels:

Unit testing (Level 1)

It’s the starting point and level 1 white box testing in the bottom-up approach. Furthermore, it tests a small unit of source code functionality of microservices and verifies the behavior of source code methods or functions inside a microservice by stubbing and mocking dependent modules and test data. Application developers write unit test cases for a small unit of code (independent functions/methods) using different test data and analyzing expected output independently without impacting other parts of the code. It’s a vital part of the โ€œshift-leftโ€ testing approach, where issues are identified in the starting phase at method level of microservices. This testing should be done thoroughly with code coverage of more than ~90%. It will reduce the chances of potential bugs in the later phases.

Component testing (Level 2)

It's the level 2 testing of the Testing Triangle, and it follows unit testing. This testing aims to test entire microservice functionalities and APIs independently, in isolation, for each individual microservice. By writing component tests for a highly granular microservices layer, API behavior is driven through tests from the client or consumer perspective. Component tests exercise the interaction between microservice APIs/services and the database, messaging queues, and external and third-party outbound services, all as one unit.

It tests a small part of the entire system. In component testing, dependent microservices and database responses are mocked or stubbed. In this testing approach, all microservices APIs are tested with multiple sets of test data.

Contract testing (Level 3)

It's the level 3 testing approach, which verifies the agreed contracts between different domain-driven microservices. Contracts are defined before microservices development begins, specifying in the API/interface design what the response should be for a given client request or query. If any changes happen, the contract has to be revisited and revised. For example, if new feature changes are deployed, they must be exposed under a separate versioned /v2 API, and we need to make sure the older /v1 version still supports client requests for backward compatibility (a minimal version-compatibility sketch follows the list below).

It tests a small part of the integration, like:

  • Between microservice to its connected databases.
  • API calls between two microservices.
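Dedicated tools like Pact or Spring Cloud Contract are commonly used here; as a minimal illustration, the sketch below uses JUnit 5 with REST Assured to assert that both API versions honor their contracts. The base URL, paths, and field names are illustrative assumptions.

```java
import static io.restassured.RestAssured.given;
import static org.hamcrest.Matchers.notNullValue;

import org.junit.jupiter.api.Test;

class OrderContractTest {
    // Base URL of the provider under test: an assumption for illustration.
    static final String BASE = "http://localhost:8080";

    @Test
    void v1ContractStillHonoredAfterV2Release() {
        // Existing consumers depend on /v1; it must keep working
        // for backward compatibility even after /v2 ships.
        given().baseUri(BASE)
            .when().get("/v1/orders/42")
            .then().statusCode(200)
            .body("orderId", notNullValue());
    }

    @Test
    void v2ExposesTheNewContract() {
        given().baseUri(BASE)
            .when().get("/v2/orders/42")
            .then().statusCode(200)
            .body("trackingNumber", notNullValue()); // field added in v2: an assumption
    }
}
```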

Integration testing (Level 4)

It's level 4 testing, which verifies end-to-end functionality. As the next level after contract testing, integration testing is used to test and verify an entire piece of functionality by testing all related microservices together.

According to Martin Fowler, an integration test exercises communication paths through the subsystem to check for any incorrect assumptions each module has about how to interact with its peers.

It tests a bigger part of the system, mostly the microservices exposing their services with API. For example:

  • Login functionality, which involves multiple microservices interactions.
  • It tests interactions for microservices API and event-driven hub components for a given functionality.

End-to-End (E2E) testing (Level 5)

It’s the final and the level 5 testing approach in the Testing Triangle and it is an end-to-end usability black box testing. It verifies that the entire system as a whole meets business functional goals from a user or a customer or client’s prospective. E2E testing is performed on the external front-end (user interface (UI)) or API client calls with the help of the REST clients. Itโ€™s performed on different distributed microservices and SPA (Single Page Apps)/MFE (Micro Front ends) applications. It covers testing of UI, backend microservices, databases, and their internal/external components.

Challenges of Microservice Testing

Many organizations have already adopted digital transformation using microservice architecture. IT organizations find it challenging to test microservices applications because of their distributed nature. We will discuss the important challenges and the solutions offered by industry experts:

  • Multiple agile microservices teams: Inter-communication between multiple agile microservices dev and test teams is time-consuming and difficult. Sometimes teams work in silos, not sharing enough technical/non-technical details, which causes communication gaps.

    Solutions: The testing triangle's integration and E2E testing can help address this challenge by testing dependent microservices that are developed by different dev teams.
  • Microservice integration testing-related challenges: Testing of all microservices does not happen in parallel. End-to-end integration testing of interdependent microservices is a nightmare in reality: these microservices might not all be ready for testing in a test environment. Every microservice has its own security mechanism and test data, and it's a daunting task to handle failover when microservices depend on each other.

    Solutions: The testing triangle's integration testing helps here by testing dependent microservices' APIs.
  • Business requirement and design change challenges: Frequent changes to business and technical requirements in the agile development methodology lead to increased complexity and testing effort. This increases development and testing costs.

    Solutions: The testing triangle provides an effective, systematic, step-by-step process that reduces complexity, operational cost, and testing effort through fully automated testing.
  • Test database challenges: Databases come in different types (SQL and NoSQL, like Redis, MongoDB, Cassandra, etc.) with different structures. These structured and unstructured data types can be combined to meet particular business needs. Every database has a different type of test data in distributed microservices development, and it's daunting to maintain different kinds of test data for different databases.

    Solutions: The testing triangle provides automated BDD (Behavioral Driven Design), where we can pass dynamic test data, and the TDM (Test Data Management) method, which solves test database challenges by managing different kinds/formats of test data.

Conclusion

The testing triangle provides great techniques to solve the challenges associated with microservices testing. We need to choose these systematic testing techniques with an eye on lower complexity, faster testing, time to market, testing cost, and risk mitigation before releasing to production. This testing strategy is required for microservices to avoid real production issues: it ensures that test cases cover end-to-end functional and non-functional E2E testing of the UI, backend, and databases across PROD and non-PROD staging environments for reliable product releases.

We have seen that microservices introduce many testing challenges, which can be solved with the step-by-step (bottom-up) approach provided by the testing triangle techniques.

It's a modern cloud-native testing strategy for testing microservices on the cloud: it finds and fixes the maximum number of bugs during the testing phases, from the bottom of the pyramid up to the highest level, which is E2E testing.

Tip: Many IT organizations have started following a "shift-left" testing culture, especially in situations where identifying and fixing bugs early is important.

Kubernetes alternatives to Spring Java framework

Spring Cloud and Kubernetes complement each other to build a cloud-native platform and run microservices on Kubernetes containers. Kubernetes provides many features that are similar to Spring Cloud and Spring Config Server features.

The Spring framework has been around for many years. Even today, many organizations prefer to go with Spring because it provides many advanced features through simple, ready-to-use libraries. It's a great deal when Spring developers take care only of business-logic source code while configuration code is managed by DevOps/DevSecOps operations teams or automated CI/CD tools.

Important Note about Netflix OSS: Starting with the Spring Cloud Greenwich release train, the Netflix OSS modules Hystrix, Ribbon, and Zuul entered maintenance mode and are now deprecated. This means no new features will be added to these modules; the Spring Cloud team will only fix bugs and security issues. Maintenance mode does not include the Eureka module. Spring provides regular releases and patches for its libraries; Netflix OSS, however, is largely inactive and is rarely adopted by organizations now.

Let's discuss a couple of challenges of cloud configuration code with Spring Cloud and Spring Config Server for microservices architecture:

  • Tight coupling of business logic and configuration source code: Spring configuration is tightly coupled with business-logic code, which makes the codebase heavy and production issues difficult to debug. It slows down releases of new business features due to the tight integration of business logic with cross-cutting configuration source code.
  • Extra coding and testing effort: New feature releases require extra testing effort, mainly during integration, regression, and load testing. The entire codebase, including cross-cutting configuration, must be tested even for minor changes in the business logic.
  • Slow build and deployment: It takes extra time to load, deploy, and run heavy code because of the strong bonding of configuration and business logic. It consumes extra CPU and RAM for all business-specific API calls.

Spring doesn't provide these important features:

  • Continuous Integration (CI): It doesn't address any CI-related concerns; it only handles building microservices.
  • Self-healing of infrastructure: It doesn't handle self-healing or restarting apps after crashes; it only provides health-check APIs and observability features via Actuator/Micrometer support with Prometheus.
  • Dependency on the Java framework: It only supports the Java programming language.

Kubernetes alternatives to Spring Cloud

Here are a few Kubernetes alternatives to the Spring libraries:

Spring Cloud vs. Kubernetes:

  • Service discovery: Spring Cloud offers Netflix Eureka, which is not recommended for modern cloud-native applications. K8s provides the cluster API, exposing microservices as services across all namespaces since "kube-dns" allows lookup; it also integrates with the ingress controller and K8s ingress resources to intelligently route incoming traffic to the designated service.
  • Load balancing: In Spring Cloud, Netflix Ribbon provides client-side load balancing on HTTP/TCP requests. K8s provides load-balancer services; it's the responsibility of the K8s service to load balance.
  • Configuration management: Spring Config Server externalizes configuration management through code configuration. K8s provides ConfigMaps and Secrets to externalize configuration natively on the infra side, maintained by the DevOps team (a minimal sketch follows this table).
  • API gateway: Spring Cloud Gateway and Zuul 2 provide all API gateway features, such as request routing, caching, authentication, authorization, API-level load balancing, rate limiting, and circuit breaking. K8s services and ingress resources fulfill partial API gateway features like routing and load balancing; K8s also supports service mesh implementations such as Istio, which provide most API-gateway-related features like service discovery and API tracing. It's not a replacement for an external API gateway.
  • Resilience and fault tolerance: The Resilience4j and Spring Retry projects provide resiliency and fault-tolerance mechanisms, including circuit breaker, timeout, and retry features. K8s provides comparable features with health checks, resource isolation, and service mesh support.
  • Scaling and self-healing: Spring Boot Admin supports managing and monitoring Spring Boot applications; each application is considered a client and registers with the admin server, and Spring Boot Actuator endpoints help monitor the environment. K8s restarts crashed containers, reschedules pods onto healthy nodes, and auto-scales workloads based on load (e.g., via the Horizontal Pod Autoscaler).
  • Batch jobs: Spring Batch, Spring Cloud Task, and Spring Cloud Data Flow (SCDF) can schedule and run on-demand batch jobs; Spring tasks can run short-lived jobs such as a Java process or a shell script. K8s provides scheduled CronJob features, executing batch jobs with limited scheduling, and it also works together with Spring Batch.
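To make the configuration-management row concrete, here is a minimal Spring sketch that reads a property from the environment, where Kubernetes would inject the value from a ConfigMap (via env/envFrom on the Deployment) instead of a Spring Config Server. The property name and endpoint are illustrative assumptions.

```java
import org.springframework.beans.factory.annotation.Value;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class GreetingController {

    // Resolved from the environment at startup; in Kubernetes the value comes
    // from a ConfigMap-backed environment variable, so no Config Server is needed.
    // The property name "greeting.message" is an assumption for illustration.
    @Value("${greeting.message:Hello from defaults}")
    private String message;

    @GetMapping("/greeting")
    public String greeting() {
        return message;
    }
}
```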

Conclusion

Spring provides tons of features and has been a proven Java-based framework for many years! Kubernetes provides complementary features that are comparable with Spring's and can replace them to extract configuration code from the business logic. Cloud-native microservices architecture (MSA) and the 12/15-factor principles recommend keeping cross-cutting configuration code outside the business-logic code; configuration should be stored and managed separately. In MSA, the same configuration can be shared across many microservices, which is why configuration should be stored externally and be available to all microservices applications. These configurations should also be managed by DevOps teams.

This helps developers focus only on business-logic programming. It will definitely make releases faster with lower development costs, and building and deploying microservices apps will be faster too. Kubernetes provides better alternatives to these legacy Spring library features, many of which are deprecated or in maintenance mode. Kubernetes also provides service mesh support.

These Kubernetes alternatives are really helpful for microservices applications and complementary to the Spring Java framework for microservices development!

Cloud Distributed Caching for Microservices

Distributed caching is a very important aspect of cloud-based applications, be it in on-prem, public, or hybrid cloud environments. It facilitates incremental scaling, allowing the cache to grow with the data. In this blog we will explore distributed caching on the cloud and why it is useful for environments with high data volume and load. This blog will cover:

  • Challenges with Traditional Caching 
  • What is Distributed Caching
  • Benefits of distributed caching on the cloud
  • Recommended Distributed Caching Database Tools
  • Ways to Deploy Distributed Caching on the cloud

Traditional Distributed Caching Challenges

Traditional distributed caching servers are usually deployed with limited storage and CPU speed on a few dedicated servers or virtual machines (VMs). Often these caching infrastructures reside in on-prem data centers (DCs) or on cloud VMs that are not resilient, not highly available, and not fault-tolerant. This kind of traditional caching comes with numerous challenges:

  • Traditional caching is in-process caching at the instance/server level: data is stored locally at the application level (e.g., in Ehcache). It doesn't provide accurate data consistency across instances.
  • An in-process cache creates performance issues because it occupies extra memory and adds garbage-collection overhead.
  • It's not reliable, because it uses the same heap memory as the application. If the application crashes due to memory or other issues, the cached data is wiped out with it.
  • It's hard to scale cache storage and CPU speed on a few fixed servers, because these servers are often not auto-scalable.
  • High operational cost to manage infrastructure and underutilized hardware resources, since these servers are managed manually on traditional DevOps infrastructure.
  • Traditional distributed caching is not containerized (not deployed on Kubernetes/Docker containers), so it is not easily scalable, resilient, or self-managed. There is also a higher chance of these few servers crashing when client load exceeds what was planned for.

What is Distributed Caching

Caching is a technique of storing data outside the main storage, in high-speed memory, to improve performance. In a microservices environment, all apps are deployed with multiple instances across various servers/containers on the hybrid cloud. A single caching source is needed in a multi-cluster Kubernetes environment on the cloud to persist data centrally and replicate it across its own caching cluster. It serves as a single point of storage for cached data in a distributed environment.
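Here is a minimal cache-aside sketch in Java using the Jedis client for Redis: reads try the distributed cache first and fall back to the database on a miss, repopulating the cache with a TTL. The host name, key format, and loader are illustrative assumptions.

```java
import redis.clients.jedis.Jedis;

public class ProductCache {
    public static void main(String[] args) {
        // Shared caching cluster endpoint: an assumption for illustration.
        try (Jedis jedis = new Jedis("redis.cache.internal", 6379)) {
            String key = "product:42";

            // Cache-aside read: try the distributed cache first...
            String cached = jedis.get(key);
            if (cached == null) {
                // ...on a miss, fall back to the database of record,
                String fresh = loadFromDatabase(42); // hypothetical loader
                // then populate the cache with a TTL so entries expire.
                jedis.setex(key, 300, fresh); // keep for 5 minutes
                cached = fresh;
            }
            System.out.println(cached);
        }
    }

    static String loadFromDatabase(int id) {
        return "{\"id\":" + id + ",\"name\":\"mechanical keyboard\"}"; // stand-in for a DB query
    }
}
```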

Benefits of Distributed Caching on the Cloud

These are a few benefits of distributed caching:

  • Periodically caching frequently used read REST API responses ensures faster API read performance.
  • Reduced database network calls by accessing cached data directly from distributed caching databases.
  • Resilience and fault tolerance by maintaining multiple copies of data at various caching databases in a cluster. 
  • High availability by auto-scaling the cache databases, based on load or client requests.
  • Storage of session secret tokens like JSON Web Tokens (ID/JWT) for authentication and authorization in microservices app containers.
  • Provide faster read and write access in-memory if it’s used as a dedicated database solution for high-load mission-critical applications.
  • Avoid unnecessary roundtrip data calls to persistent databases.
  • Auto-scalable cloud infrastructure deployment.
  • Containerization of Distributed caching libraries/solutions.
  • Provide consistent reads from any synchronized, connected caching data center (DC).
  • Minimal to no outage, high availability of caching data.
  • Faster data synchronization between caching data servers.

Recommended Distributed Caching Databases Tools

Following are popular industry-recognized caching servers:  

  • Redis
  • Memcached
  • GemFire
  • Hazelcast

Redis: It's one of the most popular distributed caching services. It supports different data structures and is an open-source, in-memory data store used by millions of developers as a database, cache, streaming engine, and message broker. It also has an enterprise version. It can be deployed in containers on private, public, and hybrid clouds, and it provides consistent, fast data synchronization between different data centers (DCs).

HazelCast: Hazelcast is a distributed computation and storage platform for consistent low-latency querying, aggregation, and stateful computation against event streams and traditional data sources. It allows you to quickly build resource-efficient, real-time applications. You can deploy it at any scale from small edge devices to a large cluster of cloud instances. A cluster of Hazelcast nodes share both the data storage and computational load which can dynamically scale up and down. When you add new nodes to the cluster, the data is automatically rebalanced across the cluster. The computational tasks (jobs) that are currently in a running state, snapshot their state and scale with a processing guarantee.

Memcached:  It is an open-source, high-performance, distributed memory object caching system. It is generic in nature but intended for use in speeding up dynamic web applications by alleviating database load. Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from the results of database calls, API calls, or page rendering. Memcached is simple yet powerful. Its simple design promotes easy, quick deployment and development. It solves many data caching problems and the API is available in various commonly used languages.

GemFire: It provides distributed in-memory data grid cache, powered by Apache Geode open source. It scales data services on demand to support high performance. Itโ€™s a key-value store that performs read and write operations at fast speeds. It offers highly available parallel message queues, continuous availability, and an event-driven architecture to scale dynamically, with no downtime. 


It provides multi-site replication. As data-size requirements grow to support high-performance, real-time apps, it can scale linearly with ease. Applications get low-latency responses to data-access requests and always receive fresh data. It maintains transaction integrity across distributed nodes and supports high-concurrency, low-latency data operations. It also provides node failover and multi-geo (cross data center or multi data center) replication to ensure applications are resilient, whether on-premises or in the cloud.

Ways to Deploy Distributed Caching on Hybrid cloud

These are recommended ways to deploy and set up distributed caching, be it on a public or hybrid cloud:

  • Open source distributed caching on traditional VM instances.
  • Open source distributed caching on Kubernetes containers. I would recommend deploying on Kubernetes containers for high availability, resiliency, scalability, and faster performance.
  • Enterprise COTS distributed caching deployment on VM and Container. I would recommend the enterprise version because it will provide additional features and support.
  • The public cloud offers managed distributed-caching services, both open source and enterprise, such as Redis, Hazelcast, and Memcached.
  • Caching servers can be deployed across multiple locations: on-prem and public cloud together, multiple public clouds, or a single public cloud across different availability zones.

Conclusion

Distributed caching is now a de-facto requirement for distributed microservices apps in a distributed deployment environment on a hybrid cloud. It addresses concerns in important use cases like maintaining user sessions when a cookie is disabled at the web browser side, improving API query read performance, avoiding operational cost and database hit for the same type of requests, managing secret tokens for authentication and authorization, etc.

A distributed cache syncs data on the hybrid cloud automatically, without manual operations, and always serves the latest data. I would recommend the industry-standard distributed caching solutions Redis, Hazelcast, and Memcached. We need to choose the best distributed caching technology on the cloud based on our use cases.

Understanding Technical Debt for Software Teams

Overview of Technical Debt

"Technical debt is a metaphor commonly used by software professionals in reference to short-term compromises made during the design, development, testing, and deployment processes."

In order to stay competitive, many organizations opt for software development methodologies like Agile to accelerate the overall development process. Cramped release schedules often force teams to skip standard practices, resulting in the accumulation of technical debt. Some technical debt is given lower priority during rapid release cycles and is addressed after the production release.

Organizations often push large, complex changes to speed up the release process. Short-term compromises are acceptable to a certain extent; however, long-term debt can damage an organization's IT infrastructure and reputation. Sometimes it comes with a heavy penalty of re-engineering and post-release fixes. These damages could be in the form of high costs for:

  • Remediating pending technical debt
  • Customer dissatisfaction due to scalability and performance issues 
  • Increased hiring and training 
  • Increased modernization time 

The cost of refactoring, re-engineering, rebasing, and re-platforming can be much higher than the original cost of initial development. These compromises should be thoroughly analyzed and approved by IT stakeholders and CXOs, weighing future trade-offs, risk appetite (risk capacity), and cost. Organizations also need to evaluate the pros and cons of taking on technical debt.

Taking on technical debt can be both tricky and risky for organizations, so they must factor in the associated risks and operational costs. One consequence of technical debt is the implied cost of reworking applications and their architecture; this cost is typically incurred when organizations choose easy development paths and limited solutions to shorten time to production.

If technical debt is not addressed over time, the accrued interest makes it more challenging to implement changes, leading to both business and technical problems.

A Scandinavian study reveals that developers waste, on average, 23% of their time due to technical debt.

As if that wasn't alarming enough, Stripe published data showing that software developers spend, on average, 42% of their workweek dealing with technical debt and bad code.

Major Drivers of Technical Debt 

  • Faster solution design process
  • Faster development of source code
  • Quick releases
  • Cut-throat business competition to release new and unique features early in the market

Impact of accumulating Technical Debt 

  • It results in daily operational costs to accommodate remediation.
  • A longer development cycle leads to slower application releases.
  • It incurs long-term financial loss as technical debt accumulates.
  • It may result in compliance issues and lack of proper standards.
  • Code quality and design gets compromised.
  • More time is spent on debugging rather than development.
  • Failures that can put an organization's reputation at risk.
  • It can be a cause of security breaches and hefty fines.
  • It can lead to loss of agility and lower productivity due to outages.

Types of Technical Debt

  • Design/Architecture Debt: 

It represents design work with a backlog, which may include a lack of design-thinking processes, UI bugs, and other neglected design flaws. Most organizations do not follow standard design practices like The Open Group Architecture Framework (TOGAF) due to the agile way of designing. Tools and techniques like the Architecture Development Method (ADM) and TOGAF implementation governance provide the required format and standard for solution design.

  • Code Debt: 

It is the most common type of debt, introduced due to speedy agile delivery, complexity, or lack of subject knowledge. In some cases, new features are added in the latest version of a library or product that the dev team may not be aware of; the team may then build the same feature again, resulting in unnecessary cost and time investment. Sometimes the development team doesn't follow standard coding best practices, uses quick workarounds, or skips refactoring because of time-bound release cycles.

  • NFR/Infrastructure Debt

It is introduced while designing and implementing non-functional requirements (NFRs), for example:

  • Inaccurate scalability configuration may crash applications under high load.
  • Improper availability planning leads to outages when a data center goes down.
  • Inaccurate caching and logging lead to slower application performance.
  • Repetitive error/exception-handling code may create refactoring and performance issues.
  • Excessive auditing and tracing may cause performance issues and occupy unnecessary database storage.
  • Ignoring security may lead to serious data breaches and financial loss.
  • Improper observability and monitoring may not raise timely alerts for major issues in the application and infrastructure.

  • Testing Debt

The pressure of quick agile releases may force organizations to skip most manual and automated testing scenarios. Frequent unit testing and detailed end-to-end integration testing can catch major issues before production. Sometimes this detailed testing is skipped during the development phase, which leads to major production bugs.

  • Process Debt

It is introduced when a few less important business and technical process steps are skipped. Agile development involves many processes, such as sprint planning, Kanban, Scrum, and retrospective meetings, along with project management frameworks like the Capability Maturity Model (CMM) and Project Management Institute (PMI) practices. Sometimes these processes are not followed religiously due to time pressure, which may have a severe impact later.

  • Defect Debt

It is introduced when minor technical bugs, such as cosmetic frontend UI bugs, are skipped during the testing phase. These low-severity bugs are deferred to following releases, where they may later surface as production bugs. Production bugs hurt an organization's reputation and profit margin.

  • Documentation Debt

It is introduced when some of the less important technical content is left out of the documentation. Improper documentation always makes it harder for customers and developers to understand and operate the system after release. The engineering team may not properly document release and feature/fix details due to quick release schedules; as a result, users find it difficult to test and use new features.

  • Known or Deliberate Debt

Known or deliberate debt is taken on purposely to accelerate releases. This acceleration is achieved through workarounds or alternative methods and technologies that use simpler algorithms. For example, the dev team sometimes does not evaluate better algorithms that would avoid cyclomatic complexity in the source code, which reduces the performance of the code.

  • Unknown Outdated/Accidental Debt

It is introduced unknowingly by developers, designers, and other stakeholders, sometimes through regression from related code changes, independent applications, or shared libraries. For example, if all applications use the same error-handling library and a regression is introduced in that library, it may impact all dependent applications.

  • Bit Rot Technical Debt

According to Wired, it involves "a component or system slowly devolving into unnecessary complexity through lots of incremental changes, often exacerbated when worked upon by several people who might not fully understand the original design." In practice, many old and new engineers work on the same module without knowing the background of the code. New engineers may rewrite or redesign code without understanding the initial design, which can introduce regressions. This happens over time, and it should be avoided.

Causes of Technical Debt

  • Business competition

Competitive business markets may force organizations to roll out frequent feature releases to surpass their competitors and keep customers interested.

  • Time constraints due to agile releases

With tighter deadlines, the development team doesn't have enough time to follow all coding and design standards, such as language-specific coding standards, TOGAF enterprise design, suitable design patterns, reviews, testing/validation, and other development best practices.

  • Saving short-term costs

Some organizations want to develop and release features faster to save development costs on coding and design effort. They may prefer a small development team for faster releases with minimal short-term cost, and they may also hire junior or unskilled developers for a higher profit margin.

  • Lack of knowledge and training

The development team may change frequently due to exits, internal movement, and new hiring. Faster release cycles may leave resources undertrained because of a lack of functional or technical training and little to no knowledge transfer about the product and design.

  • Improper project planning

Tighter release schedules may result in improper project planning, which plays a major role in introducing technical debt, for example, skipping important meetings with business stakeholders or project-planning ceremonies such as agile retrospectives, scrum, and sprint-planning meetings.

  • Complex technical design and technical solution

Development teams prefer a simple technical design and solution over a complex one because they don't want to spend extra time and effort understanding complex algorithms and technical solutions. Complex solutions take more time to understand and implement, and they also need more proof-of-concept (POC) evaluation and effort.

  • Poor development practices

Many development teams take shortcuts by following poor development practices. Due to aggressive release timelines and lack of knowledge, dev teams don't follow standard coding and design practices.

  • Insufficient testing

It is a major contributor to technical debt. Regular unit and integration testing, even for a small code change, is very important. Testing and validation are the primary mechanisms for identifying technical and functional bugs and shortfalls in software applications. Insufficient testing leads directly to technical debt.

  • Delayed refactoring

Tight deadlines may force development teams to give refactoring low priority in the early stages. Under pressure to release software early, they defer and delay code refactoring, or they backtrack, review, and refactor later on a delayed, low-priority path.

  • Constant change

'Change is the only constant.' Software applications evolve and adopt new designs and technologies over time. It's hard to cope with these constant changes in parallel; it takes time to revisit the source code and design and then implement the latest design and technologies.

  • Outdated technology 

Many traditional organizations use outdated technologies. They decide late to upgrade, or they defer modernization, and so they miss out on many modern features; that gap is itself technical debt. It can be mitigated only by shifting to modern technologies.

  • No involvement and mentoring by senior developers and architects

It's very common to have little or no involvement of senior developers and architects during design and development. It's very important to have senior mentors who can review and guide the development team to avoid technical debt; those senior developers and architects may have a better understanding of, and more experience working on, the same project or software applications.

Identifying and Analyzing Technical Debt

  • User feedback

User feedback and reviews are very important in identifying and mitigating technical debt. Organizations should listen and act on user feedback to drive improvements and handle bugs; this feedback and these bugs are themselves indicators of technical debt.

  • Analyze bad code smell

Use manual and automated code review to catch bad code smells, such as memory leaks in JVM-based Java applications. There are many code analyzers and tools, like SonarQube, PMD, FindBugs, and Checkstyle, that can help. They can be integrated into the automated build and deployment (CI/CD) pipeline for every release.

  • Monitoring and observability tools

Application Performance Monitoring (APM) tools, such as VMware Wavefront/Tanzu Observability, Dynatrace, and Datadog, are the best tools for monitoring software applications continuously. They have specialized algorithms to check the performance of applications and the underlying infrastructure. They also analyze application logs and generate reports on failure causes; these reports are a great source for identifying technical debt.

  • Manual and automated code review 

Continuous manual and automated code review processes definitely help to identify technical debt, using static and automated code analyzers.

  • Operational profit and loss analysis

This is done by business leaders and senior CxOs by analyzing operational cost (OpEx) and loss-analysis reports. These reports give a fair idea of where to improve and which important technical debt to address quickly. Addressing this technical debt is very important for any organization because it impacts business revenue.

  • Performance metrics

Application Performance Monitoring (APM) and load-testing tools also generate performance reports for a software application under high load. This is the best way to identify and mitigate technical debt in non-functional requirement (NFR) configurations, such as application and infrastructure performance, read-caching availability, and scalability.

  • Understand long-term or short-term requirement

Organizations identify technical debt by understanding long-term and short-term technical requirements. Accordingly, they prioritize, plan and remediate. These requirements are prioritized based on business criticality and urgency. 

  • Review with latest industry-standard best practices

Some technical debt can be identified by comparing against the latest industry-standard practices for software design and development, such as Agile, TDD, BDD, Scrum, Kanban, cloud-native, microservices, micro frontends, and TOGAF, as well as the latest technology trends such as cloud.

  • Code refactoring tools and techniques

There are modern tools available that can analyze legacy monolithic apps and suggest, or partially perform, refactoring toward a modern cloud-native microservices design. They also provide tooling to migrate on-prem virtual machines (VMs) to cloud VMs with an easy lift-and-shift rebase.

  • Security analysis

Some security-related technical debt is identified during the security-analysis phase. Security analysis tools such as Checkmarx and SonarQube generate security reports for applications, and there are many infrastructure security tools such as VMware Carbon Black endpoint security, RSA, Aqua Security, and Clair.

Best Practices to Avoid Technical Debt

To reduce technical debt, it's essential to analyze and measure it. You can calculate technical debt using remediation and development costs as parameters. These are a few techniques to avoid technical debt:

  • Listen to user feedback and remediate application technical debt.
  • Religiously follow consistent code review practices. Have multiple rounds of manual code and design reviews by senior peers and architects.
  • Run automated tests after every build and release.
  • Monitor and analyze reports from observability and monitoring tools.
  • Analyze and evaluate the performance and business impact of any new code and design change before implementing it.
  • Follow standard coding best practices.
  • Follow manual and automated static code review for every release.
  • Use incident management and issue trackers to report and track bugs.
  • Always review and validate the solution architecture before implementation.
  • Perform static and dynamic code analysis using code analyzer tools like SonarQube, PMD, FindBugs, etc.
  • Follow an agile, iterative development approach and hold regular retrospective meetings. Also, measure technical debt (TDR) in each iteration.
  • Use project management tools like Jira, Trello, etc.
  • Refactor legacy code. Always revisit code and modularize common code components.
  • Strictly follow the test-driven development (TDD) and behavior-driven development (BDD) approach for every module of code.
  • Continuously build, integrate, test, and validate on every release.
  • Last but not least, technical debt should be documented, measured, and prioritized.

Estimating Technical Debt Cost

It's very important to measure the cost of technical debt, as this helps stakeholders and senior management analyze and prioritize remediation. It should be a measurable number that supports business decisions, and it also helps track the status of technical debt remediation. There are many measurable code variables and constraints that can be used to calculate technical debt.

There are various tools available, like SonarQube, to check code quality, code complexity, lines of code, etc.

We can calculate technical debt as a ratio of the cost to fix a software system [Remediation Cost] to the cost of developing it [Development Cost]. This ratio is called the Technical Debt Ratio [TDR]:

Technical Debt Ratio (TDR) = (Remediation Cost / Development Cost) x 100%

A good TDR is <= 5%. A high TDR indicates poor code quality and higher remediation costs.

Optionally, remediation cost (RC) and development cost (DC) can also be expressed in hours, which helps calculate remediation time as total effort in hours.
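
As a worked illustration with made-up numbers: if remediation of known issues is estimated at 80 hours and the original development effort was 2,000 hours, then TDR = (80 / 2,000) x 100% = 4%, which is within the <= 5% threshold. If remediation were instead estimated at 300 hours on the same codebase, TDR would be 15%, signaling debt that needs prioritized remediation.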

Key Takeaways

These are some key points about technical debt cost:

  • The average organization wastes 25% to 45% of its development cost.
  • Hiring and training new engineers involves additional costs and increased coordination costs.
  • Operational overhead comes from spending 15 to 20% of effort on unplanned work.
  • It impacts an organization's revenue through additional and unplanned work.
  • Time is wasted analyzing improvements to source code and design.
  • Productivity rates drop to around 60 to 70%.
  • There is a cost of project management and modern tooling.

Conclusion

Technical debt can impact factors like overall operational cost, velocity, and product quality, and it can easily end up hurting a team's productivity and morale. Hence, avoiding technical debt, or addressing it at the right intervals during the development process, is the best way forward. We hope this blog gives you a better understanding of technical debt and the best practices for remediating it.

Spring API Gateway Implementation with sample apps

Spring Cloud Gateway Overview

The Spring Cloud Gateway (SCG) is an API gateway proxy. It's an open-source project based on Java. It has tons of features, can be embedded in application code, and can also be deployed as a separate service and scaled easily on Kubernetes containers.

SCG handles client request traffic by routing it to the desired microservices using gateway handler mappings, and it can aggregate responses from the back-end REST API endpoints of different microservices.

SCG runs on Netty, a non-blocking web server that provides asynchronous request processing for faster, non-blocking handling of client requests.

It's based on the following three major pillars:

  • Route traffic to microservices: It routes client requests to designated REST API endpoint destinations. Imagine a use case where only the /catalogue API is called; the API gateway forwards the client request and payload to the catalogue microservice. We will see this in the source code example in the next section.
  • Predicates: They add conditions on incoming client requests, such as checking the request URI or parameters, or assigning a weight.
  • Filters: They implement Spring framework web filters. Developers can modify requests and responses based on client preferences or for security reasons, and they can add custom filtering logic for incoming requests and outgoing responses without source code changes (see the Java sketch below).
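
The step-by-step example below drives SCG through YAML, but the same three pillars can also be expressed in SCG's Java routing DSL. This is a minimal sketch that borrows the route ID, path, and port from the YAML example later in this section; the added request header is purely illustrative.

import org.springframework.cloud.gateway.route.RouteLocator;
import org.springframework.cloud.gateway.route.builder.RouteLocatorBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class GatewayRoutes {

    // One route showing all three pillars: a Path predicate, a header-mutating
    // filter, and the destination URI the request is routed to.
    @Bean
    public RouteLocator customRoutes(RouteLocatorBuilder builder) {
        return builder.routes()
                .route("catalogues_route", r -> r
                        .path("/catalogue")                                   // predicate
                        .filters(f -> f.addRequestHeader("X-Gateway", "scg")) // filter
                        .uri("http://localhost:8010"))                        // route target
                .build();
    }
}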

Spring Cloud Gateway Implementation Step by Step

In this section, we will implement Spring Cloud Gateway's routing features. This coding exercise covers the major API gateway features: routing traffic, filters, and predicates.

We will use dynamic routing configuration by making changes in the application properties files, so changes apply dynamically without restarting the microservice web apps. The gateway will be deployed as a separate microservice on a Kubernetes container, and it can run on multiple containers to provide high availability (HA).

Prerequisite

These are basic installation requirements to build:

  • Java 8+
  • Spring Boot v2.4.1+
  • Spring Cloud Gateway
  • Source code reference: https://github.com/rajivmca2004/spring-gateway-demo

Let's create a simple Spring Boot Java microservice using the Spring Cloud Gateway:

1. Create the spring-gateway-demo project using the Spring Initializr web portal: https://start.spring.io/

2. Now, we will define Spring Cloud Gateway routes, predicates, and filters in the application.yaml file.

3. We will configure API gateway routing for two separate microservices, customer-management-service and catalogue-service. We will test and verify these in the upcoming steps.

spring:
  application:
    name: catalogue-service
  jpa:
    hibernate:
      ddl-auto: update
  cache:
    type: redis
  redis:
    host: localhost
    port: 6379

server:
  port: 8010
springdoc:

4. We will use the filters attributes of the Spring Cloud Gateway in the rate-limiting implementation section later:


spring:
  application:
    name: spring-gateway-demo
  redis:
    host: localhost
    port: 6379
  cloud:
    gateway:
      routes:
      - id: catalogues_route
        uri: http://localhost:8010
        predicates:
        - Path=/catalogue
        - Weight=group1, 6
      - id: customers_route
        uri: http://localhost:8011
        predicates:
        - Path=/customers

5. If the client goes to http://localhost:8080/customers, the request routes to the customer-management-service microservice REST API at http://localhost:8011/customers, which runs as a separate web service on a different container and port number.

6. If a client goes to http://localhost:8080/catalogue, the request routes to the catalogue-service microservice REST API at http://localhost:8010/catalogue, which runs as a separate web service on a different container and port number.

Distributed Caching with Redis

When you need to improve the performance of web applications/microservices, every millisecond counts. An API gateway provides a powerful distributed caching feature where API responses can be cached and made available to all distributed microservices, which may span multiple servers on separate Kubernetes containers. In caching, objects/data are stored in high-speed RAM for faster access. Memory caching is effective because all microservice apps access the same set of cached data. The objective of a distributed cache is to store data and program instructions that are used repeatedly by clients.

Distributed caching is an important caching strategy for decreasing a distributed microservices app's latency and improving its concurrency and scalability. A cache eviction strategy should also be configured to regularly replace stale entries with fresh data. According to research published by http://www.marketingdive.com:

  • New research by Google has found that 53% of mobile website visitors will leave if a webpage doesn't load within 3 seconds.
  • The average load time for sites is 19 seconds on a 3G connection and 14 seconds on a 4G connection.

API Caching with Redis distributed caching

Redis is an open-source in-memory data structure project implementing a distributed caching, in-memory key-value database. Redis supports different kinds of abstract data structures, such as strings, lists, maps, sets, sorted sets, HyperLogLogs, bitmaps, streams, and spatial indexes.

Redis is a high-performance, in-memory data structure server (not just a key-value store). In large-scale distributed systems with a high number of API calls per second, Redis is a perfect distributed caching solution for this kind of distributed enterprise microservice architecture. It's faster than a typical database call because Redis serves data from its in-memory cache.
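
As a minimal sketch of what this looks like in a Spring Boot microservice: the CatalogueService, Product, and ProductRepository names below are hypothetical, and the sketch assumes @EnableCaching on the application class, spring.cache.type=redis in the configuration, and the Spring Boot Redis starter on the classpath.

import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

// Hypothetical domain type and repository, stubbed so the sketch is self-contained.
class Product { String sku; String name; }
interface ProductRepository { Product findBySku(String sku); }

@Service
public class CatalogueService {

    private final ProductRepository repository;

    public CatalogueService(ProductRepository repository) {
        this.repository = repository;
    }

    // The first call for a given SKU hits the database; the result is stored in
    // Redis and served from the cache on repeated calls by any app instance.
    @Cacheable(value = "catalogue", key = "#sku")
    public Product getProduct(String sku) {
        return repository.findBySku(sku);
    }
}

Because the cache lives in Redis rather than inside one JVM, every instance of the microservice sees the same cached entries, which is exactly the distributed-caching property described above.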

Apps are responsible for fetching data from the database and pushing it to the Redis cluster through a master node, which updates/writes all new cache entries into the cluster. The Redis master then replicates data to the Redis slave nodes. A Redis server runs in two modes:

  • Master Mode (Redis Master)
  • Slave Mode (Redis Slave/Redis Replica)

We can configure which mode Redis writes to and reads from. It is recommended to serve writes through the Redis leader and reads through the Redis followers.
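
A hedged sketch of this leader-for-writes, followers-for-reads split using Spring Data Redis with the Lettuce client is shown below; the host names are placeholders for one leader and one replica.

import io.lettuce.core.ReadFrom;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisStaticMasterReplicaConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;

@Configuration
public class RedisReadWriteSplitConfig {

    @Bean
    public LettuceConnectionFactory redisConnectionFactory() {
        // Prefer replicas for reads; writes always go to the leader.
        LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
                .readFrom(ReadFrom.REPLICA_PREFERRED)
                .build();

        // Static leader/replica topology; host names are placeholders.
        RedisStaticMasterReplicaConfiguration topology =
                new RedisStaticMasterReplicaConfiguration("redis-leader", 6379);
        topology.addNode("redis-replica", 6379);

        return new LettuceConnectionFactory(topology, clientConfig);
    }
}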

Redis cluster architecture for high availability (HA)

Every leader should have at least one follower, and it can certainly have more. Multiple followers per leader are preferable to a single one, so that if one follower fails, you still have a backup follower for redundancy after failover.

Clients write to the leader node and read from follower nodes. Clients can connect directly to leader nodes for reads if followers are unavailable or down. Every leader node replicates its cached data to its followers, which can be one or many; all of this is configurable.

All leaders and followers check the health status of every node by using the gossip protocol.

Notification System Design

Objective:

Design an enterprise-level system architecture to support email, SMS, chat, and other public social app integrations using APIs:

  • Email
  • SMS/OTP
  • Push notifications (Mobile and Web browser)
  • Chat: WhatsApp/Telegram

It’s a generic feature of all kind of web and mobile applications, which is required for all modern distributed applications regardless of using any programming languages and technologies. You can customize based on your business use cases.

I have tried to simplify this design concept to fulfil common use-case requirements with high availability, high performance, and analytical services. Notifications are a very important medium of communication with customers/users through their desktop/mobile devices. I would recommend implementing this using a microservice architecture and deploying it on Kubernetes containers to make it a fully cloud-native, modern system. Let's get started!

Functional Requirement:

  • Send notifications
  • Prioritize notifications
  • Send notifications based on customer’s saved preferences
  • Single/simple and bulk notification messages
  • Analytics use cases for various notifications
  • Reporting of notification messages

Non-functional requirements (NFR):

  • High performance
  • Highly available (HA)
  • Low latency
  • Extendable/pluggable design to add more clients, adapters, and vendors.
  • Support for Android/iOS mobile and desktop/laptop web browsers.
  • API integration with all notification modules and external integrations with clients and service providers/vendors.
  • Scalable for higher load, both on-prem (VMware Tanzu) and on public cloud services like AWS, GCP, or Azure.

System Design Architecture:


These are the solution design considerations and components:

1. Notification clients:

These clients request single and bulk messages using API calls and send notification messages to the simple and bulk notification services:

  • Bulk Notification clients: These clients send bulk notification(s).
  • Simple Notification clients: These clients send single notification(s).

2. Notification Services:

These services are the entry points; they expose REST APIs to clients and interact with them. They are responsible for building notification messages by consuming the Template Service, and the messages are also validated using the Validation Service.

  • Simple Notification Service: This service exposes APIs to integrate clients with backend services. It's the main service handling simple notification requests.
  • Bulk Notification Service: This service exposes APIs to integrate clients with backend services. It's the main service handling bulk notification requests.

These services also manage notification messages: they persist sent messages to databases and maintain an activity log, and the same message can be resent using their APIs. They provide APIs to add, update, delete, and view old and new messages, as well as a web dashboard with options to filter messages by different criteria like date range, priority, module, user, and user groups.
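
As a minimal sketch of what the entry API of the Simple Notification Service could look like in Spring Boot; the endpoint path, payload fields, and class names are hypothetical illustrations of this design, not a prescribed contract.

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical request payload for a single notification.
class NotificationRequest {
    public String channel;    // EMAIL, SMS, PUSH, CHAT
    public String priority;   // HIGH, MEDIUM, LOW
    public String recipient;
    public String templateId;
}

@RestController
@RequestMapping("/api/v1/notifications")
public class SimpleNotificationController {

    // Entry point for one notification: in the full design this would render
    // the message via the Template Service, validate it via the Validation
    // Service, and publish it to the matching priority topic on the event hub.
    @PostMapping
    public ResponseEntity<String> send(@RequestBody NotificationRequest request) {
        // Placeholder for template rendering, validation, and publishing.
        return ResponseEntity.status(HttpStatus.ACCEPTED)
                .body("Notification accepted for channel " + request.channel);
    }
}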

3. Template Service:

This service manages all ready-to-use templates for OTP, SMS, email, chat, and other push notification messages. It also provides REST APIs to create, update, delete, and manage templates, plus a UI dashboard page to check and manage message templates from a web console.

4. User Selection Service:

This service provides the ability to choose target users and application modules. There may be use cases for sending bulk messages to a specific group of users or to different application modules. The source could be AD/IAM/eDirectory, a user database, or user groups, based on the customer's preferences. Internally, it consumes the User Profile Service APIs and checks customers' notification preferences.

5. User Profile Service:

This service provides various features, including managing user profiles and their preferences. It also lets users unsubscribe from notifications and set their notification frequency, among other things. The Notification Service depends on this service.

6. Common Notification Service

  • Scheduling Service:

This service provides APIs to schedule notifications, either immediately or at a given time. The frequency could be any of the following:

  • Second
  • Minute
  • Hourly
  • Daily
  • Weekly
  • Monthly
  • Yearly
  • Custom frequency etc.

There could also be other services that auto-trigger messages based on scheduled times.

  • Validation Service:

This service is solely responsible for validating notification messages against business rules and the expected format. Bulk messages should be approved by an authorized system admin only.

  • Prioritization Service:

It prioritizes notifications into high, medium, and low priorities. OTP notification messages have the highest priority, with a time-bound expiry, so they are always sent first. The Common Outbound Handler consumes, processes, and sends messages based on these priorities by reading from three different queues: high, medium, and low. Bulk messages can be sent with low priority during off hours, while transactional application notifications, such as email, can be sent with medium priority. The business decides priority based on the criticality of the notifications.

7. Event Priority Queues (Event Hub):

This provides the event hub service, which consumes messages from the notification services into high, medium, and low topics. It sends processed and validated messages to the Notification Handler Service, which internally uses the Notification Preferences Service to check users' personal preferences.

It has these three topics, used to consume and send messages based on business priority (a minimal consumer sketch follows the list):

  • High
  • Medium
  • Low
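
Below is a hedged sketch of how the Common Outbound Handler could drain these topics in priority order using the plain Kafka consumer API; the topic names and broker address are assumptions made for illustration.

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PriorityNotificationConsumer {

    private static final List<String> TOPICS_BY_PRIORITY = List.of(
            "notifications-high", "notifications-medium", "notifications-low");

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "outbound-handler");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(TOPICS_BY_PRIORITY);
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                // Dispatch everything from the high topic first, then medium, then low.
                for (String topic : TOPICS_BY_PRIORITY) {
                    records.partitions().stream()
                            .filter(tp -> tp.topic().equals(topic))
                            .forEach(tp -> records.records(tp)
                                    .forEach(PriorityNotificationConsumer::dispatch));
                }
            }
        }
    }

    private static void dispatch(ConsumerRecord<String, String> record) {
        // Placeholder: hand the message to the matching notification adapter.
        System.out.printf("[%s] %s%n", record.topic(), record.value());
    }
}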

8. Common Outbound Handler:

This service consumes notification messages from the Event Hub by polling the event priority queues based on their priority: the "High" queue gets the highest precedence, and so on. Finally, it sends notification messages to the message-specific adapter through the Event Hub.

This service also fetches the target users/applications from the User Selection Service.

9. Notification DB

It persists all notification messages with their delivery time, status, etc. It is a cluster of databases with a leader that performs all write operations, while reads go to read replicas/followers. It should be a NoSQL database.

10. Outbound Event Hub:

It finally transmits messages to the various supported adapters. These adapters are based on different devices (desktop/mobile) and notification types (SMS/OTP/email/chat/push notifications).

11. Notification Adapters:

These adapters transform incoming messages from the event hub (Kafka) and send them to external vendors in their supported format. These are a few adapters; we can add more based on use-case requirements:

  • OTP Adapter Service
  • SMS Adapter Service
  • Email Adapter Service
  • In-App Notification Adapter Service
  • WhatsApp Chat Notification Adapter Service
  • Telegram Notification Adapter Service

12. Notification Vendors:

These are the external SaaS (cloud/on-prem) vendors, which provide the actual notification transmission using their own infrastructure and technologies. They may be paid enterprise services like AWS SNS, Mailchimp, etc.

  • SMS Vendor Integration Service
  • Email Vendor Integration Service
  • App Push Notification Vendor Integration Service
  • WhatsApp Vendor Integration Service
  • Telegram Vendor Integration Service

13. Notification Analytical Service

This service performs analytics, identifying notification usage and trends, and provides reporting on top of that. It pulls all final notification messages from the analytical database (Cassandra) and the notification databases for analytics and reporting purposes.

These are a few use cases:

  • Total number of notifications per day/per second.
  • Which notification channel is used the most.
  • The average size and frequency of messages.
  • Filtering messages based on their priorities, and many more.


14. Notification Tracker

This service continuously reads the event hub queues and tracks all sent notifications. It captures notification metadata like transmission time, delivery status, communication channel, message type, etc.

15. Cassandra Database Cluster

This database cluster persists all notifications for analytics and reporting purposes. It's based on a write-more, read-less model.

It provides good performance and low latency for a high volume of notifications, because it internally manages a high number of write operations, syncs up with the other database nodes, and keeps duplicate data/messages for high availability and reliability. Messages remain available even if a node crashes.


16. Inbound Notification Service

This service exposes API endpoints to external clients and applications for sending inbound messages.

17. Inbound Event Hub

This topic is used to queue and process all incoming notification messages from inbound notification clients.

18. Inbound Handler

This consumes all incoming notification messages from the INBOUND topic.

19. Inbound Notification Clients

These inbound notification messages come from internal and external sources/applications.

Please share feedback and let me know if you have any suggestions to make this design better!

API Introduction and Best Practices

Disclaimer: This content has been taken from my book, "Cloud Native Microservices using Spring and Kubernetes".

An Application Programming Interface (API) allows two apps/resources to talk to each other and is mostly referenced in Service-Oriented Architecture (SOA).

APIs are gaining more popularity as microservices development booms for modern cloud-native applications and app modernization. We can't imagine microservices without APIs: a microservice architecture has so many distributed services that they can't easily integrate without the help of APIs. So microservices and APIs complement each other!

It's an architectural design specification, a set of protocols that provides an interface for different microservices/monolithic apps and databases to integrate and talk with each other. An API describes how external services can communicate with an app, not how the app works internally!

It creates an integration contract between different apps and external clients with a standard set of rules and specifications, and it is followed as a development practice for external clients/apps.

APIs follow a contract-first design pattern of development, where all development happens around the API specifications and protocols. This way, developers use the same standard practices across different agile microservices development teams.

API best practices

Now, we will discuss a few API best practices in detail:

  • Follow the OpenAPI standard: Modern app APIs should follow the OpenAPI Specification to make them compatible and portable across all kinds of apps.
  • API web dashboard support: The API should be developer-friendly, with an API management dashboard that helps create, manage, and monitor APIs in large systems or microservices environments. There are many open-source and enterprise API solutions, like the OpenAPI-based SwaggerHub, Google Apigee, and so on. They provide a web-based dashboard to manage APIs dynamically, and definitions can be exported as source code and shared with development teams.
  • Web-based HTTP with REST: Most apps, databases, and messaging systems use REST over the HTTP protocol for communication over the internet. REST is widely accepted, supported by most clients and integration apps, flexible, and feature-rich. If you are building an API, you should know the basics of the HTTP web protocol, its methods, attributes, and status codes, and you should have a good understanding of the REST style of API interfaces, because REST is a resource-oriented architectural style.
  • Return valid structured JSON responses: Don't return plain-text messages. Responses should be well-structured JSON, XML, or similar. An example is as follows:

{
  "sku": 101,
  "pInfo": {
    "fullProductName": "LG 50B6000FHD 127 Fridge",
    "brand": "LG",
    "model": "50B6000FHD",
    "category": "Fridge"
  }
}

  • Maintain status codes: APIs must return HTTP status codes, because when a client sends a request to the server through a REST API, it expects a response indicating success or failure. There are standard pre-defined codes for this purpose:
    • 2xx: Success category, for example, 200 OK
    • 3xx: Redirection category, for example, 304 Not Modified
    • 4xx: Client error category, for example, 404 Not Found
    • 5xx: Server error category, for example, 500 Internal Server Error

  • API endpoint naming standard: Name collections using plural nouns. The reason: the same resource can return a single record or multiple records, and it's not recommended to have two separate resource URIs for those two cases. For example, /orders is a valid URI name for an API that serves both purposes.

Use nouns instead of verbs as the standard naming convention, because multiple operations can be performed on a single resource or object. For example, /orders is a noun and the correct approach, because an order can be created, updated, deleted, and fetched; it's not recommended to use /createOrder, /updateOrder, /deleteOrder, and so on.

  • Error handling, return error details with error code: A server resource should always return an appropriate HTTP status code, an internal error code, and a simple human-readable error message for better error and exception handling in client-side apps, for example:

{
  "status": "400",
  "errorCode": "2200",
  "errorDetail": "Connection refused"
}

  • Return appropriate HTTP response status code: Every REST endpoint should return a meaningful HTTP response code to handle server responses in a better way like:
    • 200: success.
    • 404: not found.
    • 201: resource created.
    • 304: not modified; the response is already in the client's cache.
    • 400: bad request; the client request was not processed because the server could not understand what the client was asking for.
    • 401: unauthorized; the client is not allowed to access the resource and should re-request with the required credentials.
    • 403: forbidden; the client is authenticated, but it is not allowed access to the page or resource for some reason.
    • 503: service unavailable; the server is down or unable to receive and process the request.
  • Avoid nesting of related resources: Sometimes resources are related to each other; for example, the /orders resource is related to a catalogue category and a user ID. We should not nest resources like GET /orders/mobile/111.

It's recommended to use the top-level resource and make other related resources query parameters, like this:

              GET /orders?ctg=mobile&userid=111

  • Handle trailing slashes: It's always advisable to use only one approach, either with a trailing slash (/orders/) or without it (/orders), to avoid any confusion.
  • Use sorting, filtering, querying, pagination: In many use cases, a simple resource name won't work.
    • Sorting: You need to request server API resources to sort data in ascending or descending order: GET /orders?sort=asc
    • Filtering:  Filter on some business conditions like return product catalog responses based on price range: GET /orders?minprice=100&maxprice=500
    • Querying: Use cases where you want to query products based on their category like searching electronics products based on mobile category, for example: GET /orders?ctg=mobile&userid=111
    • Pagination: To improve performance and reduce latency on API calls over the internet, the client requests a subset of records in a single request, for example 10 records at a time for a given page. It's called pagination: GET /orders?page=1&page_size=10
  • Versioning: Versioning is a very important API concept that helps consumers migrate to newer versions without any outage: some clients can access newer versions while others still use older ones. There are various ways of API versioning:
  • Using URI path: It's a standard technique to maintain different versions of the same API, so older versions remain supported when the server-side API resource is upgraded; clients take some time to migrate to the latest version. A new version of the same API resource is exposed on a new path, for example, /order/v1, /order/v2. The internal version of the API uses the MAJOR.MINOR.PATCH format, such as 1.2.3.
    • Major version: It contains major code changes in business logic or other components. A new major version is added to the new API, and the version number is used to route to the correct host.
    • Minor and patch versions: These are used internally for backward-compatible updates. They are usually communicated in changelogs to inform clients about new functionality or a bug fix. The minor version represents minor changes, while the patch contains break-fixes or security patches, and so on.
  • Using query parameters: In this method, the version number is added to the query parameters as a key and value. It's simple to use but not recommended, because it's difficult to route requests to the right API version. For example: /orders?version=1.
  • Using custom headers: In this method, the version number is added to an HTTP request header. It avoids the clutter of URI versions; however, we need to create and manage new headers. For example: Accepts-version: 1.0.
  • Using content negotiation: The version is also added to a header; this allows versioning a single resource representation instead of the entire API, which gives more granular control over versioning. In this method, there is no need to create routing rules in the source API code for different versions. This approach is not very popular, because it's difficult to test and verify changes in browsers. For example: Accept: application/json; version=1
  • Caching: API caching is a much-needed feature to improve API read (GET) performance. It caches responses from the API and makes them available for other similar client requests. It's recommended to use distributed caching techniques in a distributed microservices environment, so that the same cached response is available to multiple instances of the same microservice app.
  • Rate limiting and throttling: Rate limiting is a technique of counting client requests with a counter and limiting them based on the subscription or maximum allowed limit to control traffic on the server. It is also useful for security, preventing attackers from hammering the system continuously and bringing it down by consuming all the memory and compute resources.

API throttling controls the way an API is consumed by external apps or clients. It also indicates a temporary state and is used to control the data that external clients can access through a REST API. When a throttle is triggered, we can disconnect client requests, client apps, device IDs, or users, or just reduce the response rate. You can define a throttle at the application, API, or user level.

There are multiple ways to implement rate limiting. Spring Cloud Gateway provides a rate-limiting filter backed by distributed caching such as Redis or a similar caching tool, as sketched below.
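
Below is a hedged sketch of that built-in filter applied to the catalogue route from earlier; the replenish rate (steady requests per second) and burst capacity values are illustrative, and in practice a KeyResolver bean is also needed to decide what each rate-limit bucket counts against (for example, the user or the client IP).

import org.springframework.cloud.gateway.filter.ratelimit.RedisRateLimiter;
import org.springframework.cloud.gateway.route.RouteLocator;
import org.springframework.cloud.gateway.route.builder.RouteLocatorBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RateLimitedRoutes {

    // Allow 10 requests/second steady state with bursts of up to 20 per key.
    @Bean
    public RedisRateLimiter redisRateLimiter() {
        return new RedisRateLimiter(10, 20);
    }

    @Bean
    public RouteLocator rateLimitedRoute(RouteLocatorBuilder builder,
                                         RedisRateLimiter limiter) {
        return builder.routes()
                .route("catalogues_route", r -> r
                        .path("/catalogue")
                        .filters(f -> f.requestRateLimiter(c -> c.setRateLimiter(limiter)))
                        .uri("http://localhost:8010"))
                .build();
    }
}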

  • API gateway support: It's recommended to expose APIs to external apps using API gateway tools. An API gateway takes care of routing and orchestrating requests to the designated server-side API, plus filtering, rate limiting, throttling, circuit breaking, authentication, authorization, and so on, out of the box. It keeps API configuration outside the business logic source code, which makes the actual business logic code lighter and easier to debug and maintain.