Brillio is a rapidly growing digital technology service provider that partners with Fortune 1000 companies to enhance their digital adoption strategies. The company is seeking a Senior Observability Engineer with Elastic Stack expertise to lead the development of observability capabilities for mission-critical applications, with a focus on dashboard creation, monitoring solutions, and proactive observability strategies.
Responsibilities:
- Design and implement end-to-end observability solutions using ESS (Elastic Stack)
- Build a centralized observability layer covering all MF applications
- Ensure block-level aggregation with drill-down to application-level metrics, APM traces, logs and events, and service dependencies
- Develop and scale a large backlog of ESS dashboards, including but not limited to: Cluster Health (OCP/K8s), API & APM Dashboards, Service Health & Dependency Monitoring, Pod Status / Restart / Scaling Metrics, HTTP Status Analytics (200/400/500 trends), Transaction Processing Metrics, Infra Metrics (CPU, Memory, Disk, Network), Synthetic Monitoring & Availability
- Build intuitive, drill-down dashboards from MF Block → Service → Application level
- Expand ESS-based capabilities: Application Performance Monitoring (APM), distributed tracing, Real User Monitoring (RUM), and synthetic monitoring
- Enable end-to-end traceability across microservices
- Design and implement smart alerting rules that move from reactive to proactive detection, reduce noise, and improve signal quality
- Define SLOs, SLIs, and error budgets
- Enhance anomaly detection and trend analysis
- Work closely with the EOT Observability Team, internal CDLs, and application teams
- Act as the ESS Observability SME
- Provide guidance, standards, and best practices
Requirements:
- Strong hands-on experience with ESS (Elastic Stack): Elasticsearch, Logstash, Kibana, Beats / Elastic Agent, Elastic APM
- Proven experience building enterprise-scale observability dashboards in ESS
- Deep understanding of microservices architecture
- Kubernetes / OpenShift (OCP)
- Experience with APM, distributed tracing, logging, metrics correlation
- Ability to design multi-layer observability (infra → platform → app)
- Experience with synthetic monitoring tools integrated with ESS
- Real User Monitoring (RUM)
- Service maps and dependency graphs
- Knowledge of CI/CD observability integration
- Alerting frameworks within Elastic
- Scripting: Python / Shell / Groovy (nice to have)