Role: Enterprise Observability & AIOps Architect (App + Infra)
Location: Remote (Dallas, Texas)
Duration: 1 Year (with possible extension)
Role Overview
We are looking for an experienced Enterprise Observability & AIOps Architect to design, modernize, and lead enterprise-scale observability ecosystems spanning applications, infrastructure, cloud platforms, databases, and operational workflows.
The ideal candidate will combine strategic architectural leadership with strong hands-on expertise in modern observability and AIOps platforms, driving operational excellence and AI-driven transformation across large enterprise environments.
Key Responsibilities
Enterprise Observability Architecture
Lead enterprise-wide observability assessments across applications, infrastructure, cloud, and databases
Define current-state and target-state architectures
Drive monitoring rationalization and tool consolidation strategies
Establish standards for telemetry, tagging, service identity, alerting, and dashboards
Define scalable operating models aligned with SRE, ITSM, and platform engineering
Application Observability
Architect solutions for APM, distributed tracing, logs and metrics, RUM, and synthetic monitoring
Define SLI/SLO-driven monitoring strategies
Improve service visibility, dependency mapping, and telemetry quality
Build observability for microservices, APIs, Kubernetes, Azure-native systems, and legacy systems
Infrastructure & Platform Observability
Design observability across cloud, middleware, databases, and batch systems
Analyze alert duplication, routing inefficiencies, and monitoring overlaps
Define event correlation, severity models, enrichment, and ownership frameworks
AIOps & Intelligent Operations
ITSM & Operational Integration
Integrate observability tools with ServiceNow, CMDB, and incident workflows
Define monitoring-to-incident processes and governance frameworks
Establish KPI-driven operational maturity models
Governance & Blueprinting
Develop enterprise standards, onboarding blueprints, and playbooks
Define reusable observability patterns and reference architectures
Establish Day-1 observability models for new services
Required Experience
15+ years in observability, SRE, platform engineering, AIOps, or production operations
Proven experience in enterprise observability transformation and monitoring rationalization
Strong background in hybrid cloud and distributed systems
Experience working with executives, enterprise architects, and platform teams
Deep understanding of incident management and reliability engineering