About this role
Role Overview
Incident ownership: Manage complex incidents from triage through resolution, coordinate cross-functional resources, conduct post-incident reviews, and communicate promptly with stakeholders.
Technical investigation: Analyze logs, traces, metrics, and data queries to diagnose issues across backend services, integrations, payment rails, and custody systems. Reproduce issues in test environments as needed.
Escalation and collaboration: Partner with Engineering, DevOps, and Product teams to prioritize fixes, develop mitigation plans, and track progress toward resolution.
Knowledge management: Develop and maintain technical runbooks, troubleshooting guides, and customer-facing incident updates. Coach L1/L2 agents on common issues and their resolutions.
Monitoring and alerting: Define effective alerts, adjust thresholds, and reduce noise to improve incident detection and response times.
Release support: Participate in release reviews, validate deployments in staging and production, and conduct targeted post-release checks to detect regressions early.
Continuous improvement: Lead initiatives to eliminate recurring incidents, automate investigations, and enhance observability and telemetry across the stack.
Customer communication: Deliver clear, timely, and professional updates to customers and stakeholders throughout incident lifecycles, ensuring expectations are managed and outcomes are documented.
Requirements
At least 3 years of experience in technical support, site reliability, or incident response roles within fintech, including hands-on work with common databases.
Comfortable reading logs, using debugging tools, and writing SQL queries.
Hands-on experience with monitoring and logging tools such as Prometheus, Grafana, ELK/EFK, and Sentry, as well as incident management platforms like PagerDuty and Opsgenie.
Problem-solving: Strong analytical skills, with the ability to perform root cause analysis and develop clear remediation plans under time constraints.
Excellent written skills in English, Ukrainian, and Russian, with the ability to write concise incident reports and present technical findings to non-technical stakeholders.
Team player with experience collaborating across engineering, product, and operations teams; able to influence priorities and advocate for customer impact.
Remains calm under pressure, accountable, detail-oriented, and committed to continuous learning.
Nice to have
Hands-on experience with container orchestration (Kubernetes), cloud platforms (AWS/GCP/Azure), and CI/CD pipelines.
Familiarity with blockchain node operations, wallet services, or custodial reconciliation processes.
Familiarity with automation scripts (Bash, Python) to accelerate common support tasks.
Previous work in regulated environments with incident reporting and audit traceability requirements.
Experience with payment gateways, blockchain node interactions, custody APIs, or third-party service integrations is a plus.
Tech Stack
AWS
Azure
Cloud
Google Cloud Platform
Grafana
Kubernetes
Node.js
Prometheus
Python
Ruby on Rails
SQL
Benefits
Work remotely as part of an agile, international team that is shaping the future of finance.
Competitive salary range based on individual experience and contribution.
Career progression opportunities to advance into senior SRE, product, or engineering roles.
Impact & ownership. Take a central role in ensuring platform reliability, where accuracy and timeliness are critical.
Collaborative culture. Join a transparent environment that values open dialogue, experimentation, and measurable impact.