Ad Hoc LLC is a technology company that empowers organizations to deliver scalable, impactful digital services. The Staff Software Engineer - Full Stack/SRE will lead and monitor project scope, schedule, and delivery requirements while collaborating with stakeholders to improve software engineering processes and practices.
Responsibilities:
- Plans and executes on roadmaps for new projects without explicit guidance and direction from technical supervisors
- Actively participates in conversations and planning sessions with partners and key stakeholders
- Periodically travels to work with and present to clients, partners, and stakeholders
- Elaborates on and evolves complex and ambiguous products to uncover constraints and new opportunities
- Reduces ambiguity in the systems they work with, including adding documentation, refactoring, and automated testing
- Effectively communicates about existing systems, design decisions, past performance, and the key history of the projects they’ve been part of, in support of bid-writing, tech demos, and other potentially client-facing communications
- Participates in technical depth interviews with new candidates
- Presents on technical topics effectively, articulating implementation complexity and other costs to inform business decisions
- Troubleshoot and Resolve Production Issues: Diagnose and fix performance bottlenecks, errors, and other issues within the va.gov application (primarily a Ruby on Rails monolith, including Sidekiq background jobs, but familiarity with similar frameworks is valuable)
- Observability & Monitoring: Utilize DataDog (and potentially Dynatrace) to monitor application performance, identify anomalies, and proactively address potential problems. Develop and maintain relevant dashboards and alerts
- Incident Response and On-Call Rotation ("The Watch"): Participate in our on-call rotation approximately once per month. Unlike traditional pager-driven on-call, "The Watch" involves reviewing the previous day's alerts and ensuring no silent failures occurred (such as background jobs exhausting their retries without an alternate submission path). During your on-call week, expect to work 2-4 hours on each weekend day to maintain system reliability
- Code Contributions: Write and review code to improve observability and fix bugs (Ruby on Rails), implement improvements, and maintain internal tools (JavaScript/SvelteKit and Python)
- Consulting & Collaboration: Work closely with other engineering teams to provide guidance on best practices for observability, reliability, and performance. Communicate technical issues clearly to both technical and non-technical audiences
- Process Improvement: Identify and implement improvements to our monitoring, alerting, and incident response processes. Contribute to documentation and runbooks
- Maintain Internal Tools: Contribute to the development and maintenance of a small SvelteKit application used for tracking team metrics and success
Requirements:
- Bachelor's degree and 9+ years of experience in software engineering or site reliability engineering. Relevant years of experience may be substituted for education
- 3+ years of experience with backend web application development in a production environment. Strong preference for Ruby on Rails experience, but candidates with demonstrable experience in other dynamic languages (e.g., Python/Django/Flask, Node.js/Express, PHP/Laravel) or compiled languages with web frameworks (e.g., Java/Spring, C#/.NET) will be considered
- Experience with Sidekiq or another background job processing framework. If not Sidekiq, experience must be with a comparable system in the candidate's chosen language/framework (e.g., Celery for Python)
- Proven experience with application performance monitoring (APM) tools, specifically DataDog and/or Dynatrace. Ability to interpret metrics and identify root causes of performance issues
- Demonstrated experience in incident response and troubleshooting complex production issues
- Experience with at least one modern JavaScript framework (React, Angular, Vue, Svelte, etc.)
- Excellent communication, collaboration, and consulting skills
- Ability to work effectively in a fast-paced, dynamic environment
- Experience working within an Agile environment
- Experience with vets-api
- Prior experience working within the VA/OCTO environment or any large government software deployment that integrates with multiple legacy services
- Experience with Python for scripting, API interactions, and ETL/data engineering tasks
- General understanding of DevOps concepts (containerization, virtualization, networking)
- Familiarity with GitHub Actions
- Experience with the U.S. Web Design System (USWDS)