Observability in Microservices

  • Monitor the efficiency of a service in terms of latency and throughput.
  • Monitor how efficiently the service uses its resources.
  • Alert the developer or ops team when a problem occurs in the system, e.g. a disk filling up or a service crash.
  • Troubleshoot a problem and identify its root cause.

The following patterns help make a service observable:

  • Health check API: Expose an endpoint that reports the health of the service.
  • Log aggregation: Log service activity and write the logs to a centralized location.
  • Distributed tracing: Assign each external request a unique request ID and record how the request flows through the system.
  • Exception tracking: Report exceptions to an exception tracking service, which deduplicates them, alerts the developers, and tracks the resolution of each exception.
  • Application metrics: The service maintains metrics and exposes them to a metrics server.
  • Audit logging: Log user actions.

Health Check API pattern

  • Implement a health check endpoint in the service.
  • Configure the deployment infrastructure to invoke the health check endpoint periodically.
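
The two parts above can be sketched with the Python standard library. This is a minimal sketch, not a production implementation: the `/health` path, the `run_checks` probes, and the `UP`/`DOWN` payload are assumptions chosen for illustration. The deployment infrastructure would poll the endpoint and treat any non-200 response as unhealthy.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_checks():
    # Hypothetical probes: replace with real checks (database ping, free disk space, ...).
    return {"database": True, "disk_space": True}


def health_response(checks):
    """Map the individual check results to an HTTP status code and a JSON body."""
    healthy = all(checks.values())
    body = json.dumps({"status": "UP" if healthy else "DOWN", "checks": checks})
    return (200 if healthy else 503), body


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":  # assumed endpoint path
            self.send_error(404)
            return
        status, body = health_response(run_checks())
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    # Blocks forever; the port is an arbitrary choice for this sketch.
    HTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

Calling `serve()` starts the endpoint. Returning 503 when any check fails lets the infrastructure decide what to do without parsing the response body.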

Log aggregation pattern

  • Elasticsearch: A text-search oriented NoSQL database that’s used as a logging server.
  • Logstash: A log pipeline that aggregates the service logs and writes them to Elasticsearch.
  • Kibana: A visualization tool for Elasticsearch.
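
On the service side, log aggregation is easier when each log entry is a single structured line that a pipeline such as Logstash can parse. The sketch below assumes JSON lines written to standard output; the field names and the `build_logger` helper are hypothetical, not part of any of the tools above.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def __init__(self, service_name):
        super().__init__()
        self.service_name = service_name

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "service": self.service_name,  # lets the log server filter by service
            "level": record.levelname,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Keep the stack trace inside the same JSON entry.
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def build_logger(service_name, stream=sys.stdout):
    """Hypothetical helper: a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(service_name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service_name))
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate output through the root logger
    return logger
```

A service would call `build_logger("order-service").info("order %s created", 42)` and leave shipping the output to the log pipeline.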

Distributed Tracing pattern

  • Trace: Represents an external request and consists of one or more spans.
  • Span: Represents a request to an internal service, with these properties: the operation, its attributes, a start timestamp, and an end timestamp.
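
The relationship between a trace and its spans can be sketched as a small data model. This is an illustrative sketch only: the class and field names are assumptions, and a real service would normally use a tracing library instead of hand-written classes.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    trace_id: str                     # ID of the external request this span belongs to
    operation: str
    parent_id: Optional[str] = None   # None for the root span
    attributes: Dict[str, str] = field(default_factory=dict)
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    start: float = field(default_factory=time.time)
    end: Optional[float] = None

    def finish(self):
        self.end = time.time()


@dataclass
class Trace:
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: List[Span] = field(default_factory=list)

    def start_span(self, operation, parent=None, **attributes):
        parent_id = parent.span_id if parent else None
        span = Span(self.trace_id, operation, parent_id, dict(attributes))
        self.spans.append(span)
        return span
```

For example, `root = trace.start_span("POST /orders")` followed by `trace.start_span("reserve-credit", parent=root)` records a two-span trace in which the second span points back to the first.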

Metrics pattern

  • Infrastructure-level metrics: CPU, memory, and disk utilization.
  • Application-level metrics: Number of requests, latency, etc.

A metric sample has the following properties:

  • Name
  • Value
  • Timestamp
  • Dimension
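
A sample with a name, value, timestamp, and dimensions could be modeled as below. This is a minimal sketch under stated assumptions: `CounterRegistry` and its methods are hypothetical, and dimensions are taken to be name-value pairs such as the HTTP method or status code.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Sample:
    name: str
    value: float
    timestamp: float
    dimensions: Dict[str, str] = field(default_factory=dict)


class CounterRegistry:
    """Hypothetical in-process registry that keeps one counter per (name, dimensions)."""

    def __init__(self):
        self._counts = defaultdict(float)

    def increment(self, name, amount=1.0, **dimensions):
        # Sort the dimensions so that keyword order does not create separate counters.
        key = (name, tuple(sorted(dimensions.items())))
        self._counts[key] += amount

    def collect(self, now=None):
        """Return the current samples, e.g. for a metrics server to scrape."""
        ts = time.time() if now is None else now
        return [
            Sample(name, value, ts, dict(dims))
            for (name, dims), value in self._counts.items()
        ]
```

A request handler would call something like `registry.increment("http_requests", method="GET", status="200")`, and the metrics server would periodically read `registry.collect()`.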

Exception Tracking pattern

Tracking exceptions through log files alone has several drawbacks:

  • Log entries are intended to be single lines, but an exception stack trace spans multiple lines.
  • There is no mechanism to track the resolution of an exception that occurred in the past.
  • There is no mechanism to deduplicate errors, and the repeated entries are often confusing.
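
A tracking service can address these points by grouping exceptions under a fingerprint. The sketch below is one possible approach, not how any particular product works: the fingerprint (exception type plus the file and function names in the stack) and the `ExceptionTracker` API are assumptions made for illustration.

```python
import hashlib
import traceback
from dataclasses import dataclass


@dataclass
class ExceptionRecord:
    fingerprint: str
    summary: str
    stack_trace: str       # the full multi-line trace is kept as one record
    count: int = 1
    resolved: bool = False


class ExceptionTracker:
    def __init__(self, notify=print):
        self._records = {}
        self._notify = notify  # hypothetical alert hook (email, chat, pager, ...)

    def report(self, exc):
        frames = traceback.extract_tb(exc.__traceback__)
        location = "|".join(f"{f.filename}:{f.name}" for f in frames)
        raw = f"{type(exc).__name__}|{location}"
        fingerprint = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:12]

        record = self._records.get(fingerprint)
        if record is None:
            trace_text = "".join(
                traceback.format_exception(type(exc), exc, exc.__traceback__)
            )
            record = ExceptionRecord(fingerprint, f"{type(exc).__name__}: {exc}", trace_text)
            self._records[fingerprint] = record
            self._notify(record)  # alert only on the first occurrence
        else:
            record.count += 1     # duplicate: count it instead of alerting again
        return record

    def resolve(self, fingerprint):
        self._records[fingerprint].resolved = True
```

Services would call `tracker.report(exc)` inside an `except` block, and a developer would mark the record resolved once the fix is deployed.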

Audit logging pattern

Each audit log entry records:

  • The user who performed the operation.
  • The operation that was performed.
  • The business entity on which the operation was performed.

There are several ways to implement audit logging:

  • Add audit logging code to the business logic.
  • Use aspect-oriented programming.
  • Use event sourcing.
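
As a rough illustration of the aspect-oriented option, a Python decorator can keep the audit code out of the business logic. This is a sketch under assumptions: the in-memory `AUDIT_LOG` sink, the `audited` decorator, and the `cancel_order` function are hypothetical, and the decorator assumes the wrapped function takes the user and the entity ID as its first two arguments.

```python
import functools
import time

AUDIT_LOG = []  # hypothetical sink; a real system would write to durable storage


def audited(operation, entity_type):
    """Record who performed which operation on which business entity."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(user, entity_id, *args, **kwargs):
            result = func(user, entity_id, *args, **kwargs)
            # Appended only after the call returns, so failed operations are not recorded.
            AUDIT_LOG.append({
                "user": user,
                "operation": operation,
                "entity": f"{entity_type}:{entity_id}",
                "timestamp": time.time(),
            })
            return result

        return wrapper

    return decorator


@audited("cancel", "order")
def cancel_order(user, order_id):
    # The business logic stays free of audit code.
    return f"order {order_id} cancelled"
```

Calling `cancel_order("alice", 42)` returns the normal result and appends one entry to `AUDIT_LOG` naming the user, the operation, and the entity.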



Ajay Yadav

Believer of Distributed Systems