Notification System Design

Objective:

Design an enterprise-level system architecture that supports email, SMS, chat, and other public social app integrations through APIs:

  • Email
  • SMS/OTP
  • Push notifications (Mobile and Web browser)
  • Chat – WhatsApp/Telegram

Notifications are a generic feature of almost every kind of web and mobile application, and modern distributed applications need them regardless of programming language or technology. You can customize the design for your business use cases.

I have tried to simplify this design so that it covers common use cases and provides high availability, high performance, and analytics services. Notifications are a very important medium of communication with customers/users through their desktop and mobile devices. I would recommend implementing the design as microservices deployed in containers on Kubernetes to make it a fully cloud-native, modern system. Let’s get started!

Functional Requirement:

  • Send notifications
  • Prioritize notifications
  • Send notifications based on customer’s saved preferences
  • Single/simple and bulk notification messages
  • Analytics use cases for various notifications
  • Reporting of notification messages

Non-functional requirements (NFR):

  • High performance
  • Highly available (HA)
  • Low latency
  • Extendable/Pluggable design to add more clients, adapters and vendors.
  • Support Android/iOS mobile and desktop/laptop web browsers.
  • API integration with all notification modules, and external integrations with clients and service providers/vendors.
  • Scalable for higher load on-prem (VMware Tanzu) and on public cloud services like AWS, GCP, or Azure.

System Design Architecture:


These are the solution design considerations and components:

1. Notification clients:

These clients request single and bulk messages through API calls, sending them to the simple and bulk notification services:

  • Bulk Notification clients: These clients send bulk notification(s).
  • Simple Notification clients: These clients send single notification(s).

2. Notification Services:

These are the entry services, which will expose REST APIs to clients and interact with them. They are responsible for building notification messages by consuming the Template Service. These messages will also be validated using the Validation Service.

  • Simple Notification Service: This service will expose APIs to integrate clients with the backend services. It’s the main service for handling simple notification requests.
  • Bulk Notification Service: This service will expose APIs to integrate clients with the backend services. It’s the main service for handling bulk notification requests.

These services will also manage notification messages. They will persist sent messages to the database and maintain an activity log. The same message can be resent using the APIs of these services. They will provide APIs to add, update, delete, and view old and new messages. They will also provide a web dashboard with filter options to filter messages on criteria such as date range, priority, module user, user groups, etc.

3. Template Service:

This service manages all ready-to-use templates for OTP, SMS, email, chat, and other push notification messages. It also provides REST APIs to create, update, delete, and manage templates, and a UI dashboard page to check and manage message templates from the web console.
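To make the template idea concrete, here is a minimal sketch of how a stored template might be filled in. It assumes templates use `$name` placeholders; the `TEMPLATES` store, the template IDs, and `render_notification` are hypothetical names for illustration, not part of the design above.

```python
from string import Template

# Hypothetical in-memory template store; a real Template Service would
# load these from its own database behind a REST API.
TEMPLATES = {
    "otp_sms": Template("Your one-time password is $otp. It expires in $minutes minutes."),
    "order_email": Template("Hello $name, your order $order_id has shipped."),
}


def render_notification(template_id: str, params: dict) -> str:
    """Fill a stored template with the caller's parameters.

    Raises KeyError if the template is unknown or a placeholder is missing,
    so that a half-filled message is never sent.
    """
    template = TEMPLATES[template_id]
    return template.substitute(params)
```

Failing loudly on a missing placeholder is a design choice here; the alternative, `safe_substitute`, would leave the raw `$otp` text in the outgoing message.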

4. User Selection Service:

This service will provide APIs to choose target users and application modules. There could be use cases that send bulk messages to a specific group of users or to different application modules. The source of users could also be AD/IAM/eDirectory, a user database, or user groups, based on the customer’s preferences. Internally, it will consume the User Profile Service APIs to check customers’ notification preferences.

5. User Profile Service:

This service will provide various features, including managing user profiles and their preferences. It will also let users unsubscribe from notifications and set how often they receive them. The Notification Service will depend on this service.

6. Common Notification Service

  • Scheduling Service:

This service will provide APIs to schedule notifications, either immediately or at any given time. The frequency could be any of the following:

  • Second
  • Minute
  • Hourly
  • Daily
  • Weekly
  • Monthly
  • Yearly
  • Custom frequency etc.

There could also be other services that trigger messages automatically at the scheduled times.
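As a rough sketch of how the frequencies listed above could translate into a next run time, the snippet below uses only the standard library. The frequency names mirror the list; `next_run` and `_add_months` are hypothetical helpers, and clamping to the last day of a shorter month is my assumption about the desired behaviour.

```python
import calendar
from datetime import datetime, timedelta

# Frequencies with a fixed length can be expressed as a timedelta.
FIXED_INTERVALS = {
    "second": timedelta(seconds=1),
    "minute": timedelta(minutes=1),
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def _add_months(moment: datetime, months: int) -> datetime:
    """Move forward by whole calendar months, clamping the day if needed."""
    month_index = moment.month - 1 + months
    year = moment.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return moment.replace(year=year, month=month, day=min(moment.day, last_day))


def next_run(last_run: datetime, frequency: str) -> datetime:
    """Return when a recurring notification should fire next."""
    if frequency in FIXED_INTERVALS:
        return last_run + FIXED_INTERVALS[frequency]
    if frequency == "monthly":
        return _add_months(last_run, 1)
    if frequency == "yearly":
        return _add_months(last_run, 12)
    raise ValueError(f"unsupported frequency: {frequency}")
```

Monthly and yearly schedules need calendar arithmetic because months differ in length; a custom frequency would need its own rule and is left out of this sketch.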

  • Validation Service:

This service is solely responsible for validating notification messages against business rules and the expected format. Bulk messages should be approved by an authorized system admin only.
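A minimal sketch of such validation might look like the following. The rule set, the `NotificationRequest` fields, and the channel names are assumptions for illustration; in particular, treating any request with more than one recipient as "bulk" is my simplification.

```python
from dataclasses import dataclass

# Hypothetical set of channels this deployment accepts.
SUPPORTED_CHANNELS = {"email", "sms", "otp", "push", "chat"}


@dataclass
class NotificationRequest:
    channel: str
    recipients: list
    body: str
    approved_by_admin: bool = False


def validate(request: NotificationRequest) -> list:
    """Return a list of rule violations; an empty list means the request is valid."""
    errors = []
    if request.channel not in SUPPORTED_CHANNELS:
        errors.append("unsupported channel")
    if not request.recipients:
        errors.append("no recipients")
    if not request.body.strip():
        errors.append("empty body")
    # Assumption: more than one recipient counts as a bulk message.
    if len(request.recipients) > 1 and not request.approved_by_admin:
        errors.append("bulk message requires admin approval")
    return errors
```

Returning every violation at once, rather than stopping at the first, lets the client fix a rejected request in one round trip.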

  • Prioritization Service:

This service will prioritize notifications as high, medium, or low. OTP notification messages expire after a short, fixed time, so they will always be sent with high priority. The Common Outbound Handler will consume and send messages in the same priority order by reading from three different queues: high, medium, and low. Bulk messages, as another use case, can be sent with low priority during off hours. Application notifications during transactions, such as email, could be sent with medium priority. The business will decide priorities based on how critical the notifications are.
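The priority rules described above could be captured in a small lookup, sketched below. The `kind` values and the fallback to medium for unknown kinds are assumptions; as noted, the business decides the real mapping.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass
class Notification:
    kind: str        # hypothetical values: "otp", "transactional", "bulk"
    recipient: str
    body: str


# Mapping taken from the examples in the text: OTP is high,
# transactional messages are medium, bulk messages are low.
PRIORITY_RULES = {
    "otp": Priority.HIGH,
    "transactional": Priority.MEDIUM,
    "bulk": Priority.LOW,
}


def assign_priority(notification: Notification) -> Priority:
    """Pick the topic a notification should be published to."""
    # Assumption: unknown kinds default to medium priority.
    return PRIORITY_RULES.get(notification.kind, Priority.MEDIUM)
```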

7. Event Priority Queues (Event Hub):

This event hub service will receive messages from the notification services on high, medium, and low topics. It sends processed and validated messages to the Notification Handler Service, which internally uses the Notification Preferences Service to check users’ personal preferences.

It will have these three topics, which will be used to consume/send messages based on business priority:

  • High
  • Medium
  • Low

8. Common Outbound Handler:

This service will consume notification messages from the Event Hub by polling the event priority queues in priority order. The “High” queue takes precedence, then “Medium”, and so on. Finally, it will send notification messages to the message-specific adapter through the Event Hub.
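The polling order can be sketched with three in-memory queues standing in for the event hub topics. `PriorityQueues`, `publish`, and `poll` are hypothetical names; a real handler would poll Kafka topics (or a similar broker) rather than Python deques.

```python
from collections import deque

# Queues are always checked in this order.
PRIORITY_ORDER = ("high", "medium", "low")


class PriorityQueues:
    """In-memory stand-in for the high, medium, and low topics."""

    def __init__(self):
        self._queues = {name: deque() for name in PRIORITY_ORDER}

    def publish(self, priority: str, message: str) -> None:
        self._queues[priority].append(message)

    def poll(self):
        """Return the next message, always draining higher priorities first."""
        for name in PRIORITY_ORDER:
            if self._queues[name]:
                return self._queues[name].popleft()
        return None  # all queues are empty
```

Strict ordering like this can starve the low queue under sustained high-priority load, so a production handler may want a weighted scheme instead.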

This service will also fetch target user/applications from User Selection Service.

9. Notification DB

It will persist all notification messages with their delivery time, status, etc. It will be a cluster of databases with a leader that handles all write operations, while reads go to read replicas/followers. It should be a NoSQL database.

10. Outbound Event Hub:

It finally transmits messages to the various supported adapters. These adapters are based on different devices (desktop/mobile) and notification types (SMS/OTP/email/chat/push notifications).

11. Notification Adapters:

These adapters will transform incoming messages from the event hub (Kafka) into the format each external vendor supports and send them on. These are a few adapters; we can add more based on use case requirements:

  • OTP Adapter Service
  • SMS Adapter Service
  • Email Adapter Service
  • In-App Notification Adapter Service
  • WhatsApp Chat Notification Adapter Service
  • Telegram Notification Adapter Service
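One way to keep the adapters pluggable is a common interface plus a registry keyed by channel, sketched below. The class names and the payload field names are hypothetical and do not follow any real vendor's API; truncating SMS text to 160 characters is a simplification.

```python
from abc import ABC, abstractmethod


class NotificationAdapter(ABC):
    """Common interface every channel adapter implements."""

    channel: str

    @abstractmethod
    def to_vendor_payload(self, recipient: str, body: str) -> dict:
        """Convert an internal message into the vendor's expected format."""


class SmsAdapter(NotificationAdapter):
    channel = "sms"

    def to_vendor_payload(self, recipient: str, body: str) -> dict:
        # Simplification: cut the text to a single 160-character segment.
        return {"to": recipient, "text": body[:160]}


class EmailAdapter(NotificationAdapter):
    channel = "email"

    def to_vendor_payload(self, recipient: str, body: str) -> dict:
        # Assumption: the first line of the body doubles as the subject.
        subject = body.split("\n", 1)[0]
        return {"to": [recipient], "subject": subject, "html": body}


# Adding a new channel means adding one class and one registry entry.
ADAPTERS = {adapter.channel: adapter for adapter in (SmsAdapter(), EmailAdapter())}


def transform(channel: str, recipient: str, body: str) -> dict:
    adapter = ADAPTERS.get(channel)
    if adapter is None:
        raise ValueError(f"no adapter registered for channel: {channel}")
    return adapter.to_vendor_payload(recipient, body)
```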

12. Notification Vendors:

These are external SaaS (cloud or on-prem) vendors, which perform the actual notification transmission using their own infrastructure and technologies. They may be paid enterprise services like AWS SNS, MailChimp, etc.

  • SMS Vendor Integration Service
  • Email Vendor Integration Service
  • App Push Notification Vendor Integration Service
  • WhatsApp Vendor Integration Service
  • Telegram Vendor Integration Service

13. Notification Analytical Service

This service will perform all analytics, identify notification usage and trends, and build reports on top of them. It will pull all final notification messages from the analytical database (Cassandra) and the notification databases for analytics and reporting purposes.

These are a few use cases:

  • Total number of notifications per day/per second.
  • Which notification system is used the most.
  • The average size and frequency of messages.
  • Filtering messages by priority, and many more…
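A few of these use cases can be sketched as a simple aggregation over notification records. The record fields (`channel`, `priority`, `body`) and the `summarize` function are assumptions for illustration; a real analytical service would run equivalent queries against the database.

```python
from collections import Counter
from statistics import mean


def summarize(records: list) -> dict:
    """Aggregate a batch of notification records into a small report."""
    by_channel = Counter(record["channel"] for record in records)
    by_priority = Counter(record["priority"] for record in records)
    sizes = [len(record["body"].encode("utf-8")) for record in records]
    return {
        "total": len(records),
        "most_used_channel": by_channel.most_common(1)[0][0] if records else None,
        "average_size_bytes": mean(sizes) if sizes else 0,
        "by_priority": dict(by_priority),
    }
```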


14. Notification Tracker

This service will continuously read the Event Hub queues and track all sent notifications. It captures notification metadata such as transmission time, delivery status, communication channel, message type, etc.

15. Cassandra Database Cluster

This database cluster will persist all notifications for analytics and reporting purposes. It’s designed for write-heavy, read-light workloads.

This provides good performance and low latency for a high volume of notifications, because Cassandra handles a large number of write operations internally, syncs them to the other database nodes, and keeps replicated copies of the data/messages for high availability and reliability. Messages will still be available if a node crashes.

Please share feedback and let me know if you have any suggestion to make this design better!

16. Inbound Notification Service

This service will expose an API endpoint that external clients and applications use to send inbound messages.

17. Inbound Event Hub

This topic will be used to queue and process all incoming notification messages from inbound notification clients.

18. Inbound Handler

This service will consume all incoming notification messages from the inbound topic.

19. Inbound Notification Clients

These inbound notification messages will come from internal and external sources/applications.

API Introduction and Best practices!

Disclaimer: It has been taken from my book – “Cloud Native Microservices using Spring and Kubernetes”.

An Application Programming Interface (API) allows two apps/resources to talk to each other, and the term is mostly associated with Service-Oriented Architecture (SOA).

APIs are gaining popularity as microservices development booms for modern cloud-native applications and app modernization. We can’t imagine microservices without APIs, because a microservice architecture has so many distributed services that they can’t be easily integrated without the help of APIs. So, microservices and APIs complement each other!

An API is an architectural design specification, a set of protocols that provides an interface for integrating different microservices, monolithic apps, and databases with each other. An API describes how external services can communicate with an app, not how the app works internally!

It creates an integration contract between different apps/external clients with a standard set of rules and specifications. It’s followed as a development practice for external clients/apps.

API development follows a contract-first design pattern, where all development happens around API specifications and protocols. Developers use the same standard practices across different agile microservices development teams.

API best practices

Now, we will discuss a few API best practices in detail:

  • Follow OpenAPI standard: Modern app APIs should follow the OpenAPI specification to make them compatible and portable across all kinds of apps.
  • API web dashboard support: APIs should be developer-friendly, with an API management dashboard that helps create, manage, and monitor APIs in large systems or microservices environments. There are many open source and enterprise API solutions, like the OpenAPI-based SwaggerHub, Google Apigee, and so on. They provide a web-based dashboard to manage APIs dynamically, and API definitions can be exported as source code and shared with development teams.
  • Web-based HTTP with REST: Most apps, databases, and messaging systems communicate over the internet using REST over HTTP. REST is widely accepted, supported by most clients and integration apps, flexible, and feature-rich. If you are building an API, you should know the basics of the HTTP protocol and its methods, attributes, and status codes. You should also have a good understanding of the REST style of API interfaces, because REST is a resource-oriented architectural style.
  • Return valid structured JSON response: Don’t return a plain text message. The response should be well-structured JSON, XML, or a similar format. An example is as follows:

    {
      "sku": 101,
      "pInfo": {
        "fullProductName": "LG 50B6000FHD 127 Fridge",
        "brand": "LG",
        "model": "50B6000FHD",
        "category": "Fridge"
      }
    }

  • Maintain status codes: An API must return HTTP status codes, because when a client sends a request to the server through a REST API, it expects the response to say whether the request succeeded or failed. There are standard, pre-defined status code categories for this purpose:

    Status codes   Descriptions
    2xx            Success category, for example, 200 – OK
    3xx            Redirection category, for example, 304 – Not Modified
    4xx            Client error category, for example, 404 – Not Found
    5xx            Server error category, for example, 500 – Internal Server Error

  • API endpoint naming standard: Name collections using plural nouns. The reason is that the same resource can return a single record or multiple records, and it’s not recommended to have two separate resource URIs for these two cases. For example, /orders is a valid URI name for an API that serves both purposes.

Use nouns instead of verbs. This is the standard naming convention, because multiple operations can be performed on a single resource or object. For example, /orders is a noun and the correct choice, because an order can be created, updated, deleted, and fetched. It’s not recommended to use /createOrder, /updateOrder, /deleteOrder, and so on.

  • Error handling, return error details with error code: The server resource should always return an appropriate HTTP error code, an internal error code, and a simple human-readable error message, so that client-side apps can handle errors and exceptions properly, for example:

    {
      "status": "400",
      "errorCode": "2200",
      "errorDetail": "Connection refused"
    }
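A small sketch of how a server might build such a response from an exception is shown below. The exception classes, the internal error codes, and `to_error_response` are all hypothetical; only the three field names come from the example above.

```python
import json


class ValidationError(Exception):
    """Raised when the client sent a malformed request."""


class NotFoundError(Exception):
    """Raised when the requested resource does not exist."""


# Hypothetical mapping: exception type -> (HTTP status, internal error code).
ERROR_MAP = {
    ValidationError: (400, "2200"),
    NotFoundError: (404, "2404"),
}


def to_error_response(exc: Exception):
    """Translate an exception into an HTTP status and a structured JSON body."""
    mapped = ERROR_MAP.get(type(exc))
    if mapped is None:
        # Unknown failures become a generic 500 so internals are not leaked.
        http_status, error_code, detail = 500, "9000", "Internal server error"
    else:
        http_status, error_code = mapped
        detail = str(exc)
    body = {"status": str(http_status), "errorCode": error_code, "errorDetail": detail}
    return http_status, json.dumps(body)
```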

  • Return appropriate HTTP response status code: Every REST endpoint should return a meaningful HTTP response code so that clients can handle server responses properly, for example:
    • 200 for success.
    • 201 resource is created.
    • 304 not modified. The client already has the response in its cache.
    • 400 bad request. The client request was not processed, as the server could not understand what the client was asking for.
    • 401 unauthorized. The client is not allowed to access the resource and should re-request with the required credentials.
    • 403 forbidden. The client is authenticated, but is not allowed access to the page or resource for some reason.
    • 404 for not found.
    • 503 service unavailable. The server is down or unavailable to receive and process the request.
  • Avoid nesting of related resources:  Sometimes, resources are related to each other like /orders resource object is related to catalogue category and user ID. We should not nest resources like: GET /orders/mobile/111 

It’s recommended to use top-level resource and make other related resources as a query parameter like this:

              GET /orders?ctg=mobile&userid=111

  • Handle trailing slashes: It’s always advisable to use only one approach, either with a trailing slash like /orders/ or without it like /orders, to avoid any confusion.
  • Use sorting, filtering, querying, pagination: In many use cases, a simple resource name won’t work.
    • Sorting: Ask the server API to sort the data in ascending or descending order: GET /orders?sort=asc
    • Filtering: Filter on business conditions, such as returning only results within a price range: GET /orders?minprice=100&maxprice=500
    • Querying: Query resources by attributes such as category, for example, searching electronics products in the mobile category: GET /orders?ctg=mobile&userid=111
    • Pagination: To improve performance and reduce latency on API calls over the internet, the client requests a subset of records per request, like 10 records at a time for a given page. This is called pagination: GET /orders?page=1&page_size=10
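The query parameters above can be combined in one handler. Below is a minimal sketch that parses them with the standard library and applies them to an in-memory list. The parameter names come from the examples; `list_orders`, the order fields, the defaults, and sorting by price are my assumptions.

```python
from urllib.parse import parse_qs, urlsplit


def list_orders(url: str, orders: list) -> list:
    """Apply filtering, sorting, and pagination from the query string."""
    query = parse_qs(urlsplit(url).query)

    def first(name: str, default: str) -> str:
        # parse_qs returns a list per key; take the first value or the default.
        return query.get(name, [default])[0]

    # Filtering: keep orders inside the requested price range.
    min_price = float(first("minprice", "0"))
    max_price = float(first("maxprice", "inf"))
    result = [order for order in orders if min_price <= order["price"] <= max_price]

    # Sorting: assumed to be by price, since "sort" only names a direction.
    result.sort(key=lambda order: order["price"], reverse=first("sort", "asc") == "desc")

    # Pagination: pages are 1-based.
    page = int(first("page", "1"))
    page_size = int(first("page_size", "10"))
    start = (page - 1) * page_size
    return result[start:start + page_size]
```

Filtering before paginating matters: slicing first would return pages that shrink unpredictably once the filter is applied.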
  • Versioning: Versioning is a very important API concept, because it lets consumers migrate to newer versions without any outage. Some clients can move to the newer version while others keep using the older one. There are various ways to version an API:
    • Using URI path: This is a standard technique for keeping several versions of the same API available, so that older versions remain supported after the server-side resource is upgraded and clients have time to migrate. The version is part of the path, for example, /orders/v1 and /orders/v2. Internally, the API version uses the MAJOR.MINOR.PATCH format, such as 1.2.3.
      • Major version: It contains major code changes in business logic or other components. A new major version is published as a new API, and the version number is used to route requests to the correct host.
      • Minor and patch versions: These are used internally for backward-compatible updates. They are usually communicated in changelogs to inform clients about new functionality or a bug fix. The minor version represents minor changes, and the patch version contains break-fixes, security patches, and so on.
    • Using query parameters: In this method, the version number is added to the query parameters as a key and value. It’s simple to use, but not recommended, because it makes it difficult to route requests to APIs. For example: /orders?version=1.
    • Using custom headers: In this method, the version number is added to an HTTP request header. It avoids cluttering the URI with versions; however, we need to create and manage new headers. For example: Accept-version: 1.0.
    • Using content negotiation: The version is also sent in a header, but it versions a single resource representation instead of the entire API, which gives us more granular control over versioning. In this method, there is no need to create routing rules for the source code of different API versions. This approach is not very popular, because it’s difficult to test and verify changes in browsers. For example: Accept: application/json; version=1
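As an illustration of the URI-path technique, the sketch below routes a request to the handler for its major version. The `/orders/v<N>` path layout, the `HANDLERS` table, and the response shapes are assumptions for illustration only.

```python
import re

# Hypothetical handlers: version 2 wraps the payload in a "data" envelope,
# which is the kind of breaking change that justifies a new major version.
HANDLERS = {
    1: lambda: {"orders": [], "apiVersion": 1},
    2: lambda: {"data": {"orders": []}, "apiVersion": 2},
}

VERSIONED_PATH = re.compile(r"^/orders/v(\d+)$")


def route(path: str):
    """Return (status, body) for a versioned /orders request."""
    match = VERSIONED_PATH.match(path)
    if match is None:
        return 404, None
    handler = HANDLERS.get(int(match.group(1)))
    if handler is None:
        return 404, None  # unknown or retired version
    return 200, handler()
```

Keeping version 1 registered alongside version 2 is what lets older clients keep working while they migrate.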
  • Caching: API caching is a much-needed feature for improving API read (GET) performance. It caches the API’s response for a given set of data and serves it to other similar client requests. It’s recommended to use distributed caching in a distributed microservices environment, so that the same cached response is available to multiple instances of the same microservice app.
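The core idea can be sketched with a small time-to-live cache. This is an in-process stand-in only: `TtlCache` and `get_or_load` are hypothetical names, and a distributed setup would keep the entries in a shared store instead of a local dictionary.

```python
import time


class TtlCache:
    """Cache GET responses for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (expires_at, value)

    def get_or_load(self, key: str, loader):
        """Return the cached value, calling loader() only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader()
        self._entries[key] = (now + self._ttl, value)
        return value
```

A typical key would be the full request URI including its query string, so that /orders?page=1 and /orders?page=2 are cached separately.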
  • Rate limiting and throttling: Rate limiting counts each client’s requests and rejects those beyond a limit, based on the client’s subscription or a maximum allowed value, to control traffic on the server. It is also useful for security, because it stops attackers from hitting the API continuously and bringing the system down by consuming all memory and compute resources.

API throttling controls the way an API is consumed by external apps or clients. It is a temporary state that controls the data external clients can access through a REST API. When a throttle is triggered, we can disconnect a client app, a device ID, or a user, or just reduce the response rate. You can define a throttle at the application, API, or user level.

There are multiple ways to implement rate limiting. For example, Spring Cloud Gateway provides a rate limiter that uses a distributed cache such as Redis or a similar caching tool.
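The counting approach described above can be sketched as a fixed-window counter per client. This is one of several possible algorithms and not how any particular gateway implements it; `FixedWindowRateLimiter` and its parameters are hypothetical, and the counters live in local memory rather than in a distributed cache.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client in each time window."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self._limit = limit
        self._window = window_seconds
        self._clock = clock   # injectable so tests do not need to sleep
        self._windows = {}    # client_id -> (window_start, request_count)

    def allow(self, client_id: str) -> bool:
        now = self._clock()
        start, count = self._windows.get(client_id, (now, 0))
        if now - start >= self._window:
            # The previous window has ended; start counting afresh.
            start, count = now, 0
        if count >= self._limit:
            self._windows[client_id] = (start, count)
            return False  # the caller would typically answer with HTTP 429
        self._windows[client_id] = (start, count + 1)
        return True
```

A fixed window is simple but allows short bursts around window boundaries; token-bucket or sliding-window variants smooth that out at the cost of more bookkeeping.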

  • API gateway support: It’s recommended to expose APIs to external apps through an API gateway. An API gateway takes care of routing and orchestrating requests to the designated server-side API, as well as filtering, rate limiting, throttling, circuit breaking, authentication, authorization, and so on, out of the box. It keeps API configuration outside the business logic source code, which makes the business logic lighter and easier to debug and maintain.