
Cloud DevOps Engineer

United States of America

Contract

3 weeks ago

No H1B

Key skills

Splunk Architecture & Administration, Splunk HTTP Event Collector (HEC), Splunk DB Connect, AWS Infrastructure & Services, Terraform, CloudFormation, Ansible, Puppet, Chef, Python scripting, Bash scripting, CI/CD pipeline integration, Git, Monitoring with Splunk and AWS CloudWatch, Security and Compliance (PCI-DSS), Security and Compliance (HIPAA), Security and Compliance (SOC 2), Encryption (SSL/TLS), AWS IAM, AWS Secrets Manager, HashiCorp Vault, Cribl Stream, Cribl Edge, Incident Management (ITIL), ServiceNow, Jira Service Management, Remedy, Agile Methodologies (Scrum), Agile Methodologies (Kanban), Java Fullstack Engineering, Docker, Kubernetes, Red Hat OpenShift Container Platform, Cloud-native Architecture, Microservices architecture, Serverless architecture, RestAPI, API gateway (MuleSoft), Continuous Integration and Continuous Delivery (CI/CD), Automated testing, Automated code quality scanning, Automated Deployments, Linux system administration, JavaScript, Python, Java, SQL, Bash, React, AngularJS, AWS, Jenkins, GitHub Actions, GitLab CI, Serverless, Vault, OpenShift, EC2, S3, IAM, CloudWatch, API Gateway, SAML, Identity Management, Splunk, REST API, GitHub, GitLab, GitOps, Version Control, Jira, Agile, Scrum, Kanban, CI/CD, Leadership, Project Management, Communication, Problem Solving, Collaboration, OWASP, Cloud Security, WAF

About this role

Stefanini Group, a global provider of outsourcing and digital IT consulting services, is seeking a Cloud DevOps Engineer. In this role, you will lead development teams in integrating and executing cloud strategy, with a focus on building scalable, resilient cloud solutions that meet customer needs.

Responsibilities:

Engage with architects, developers, and technical support teams to successfully deploy cloud application services within a managed service environment
Work with business partners, architects, and other groups to identify the technical and functional needs of systems and prioritize them
Collaborate with performing teams to deliver new capabilities in business applications and/or remediate issues
Analyze, define and document requirements for data, workflow, and logical processes
Analyze and translate business, information and technical requirements to create patterns for solutions that integrate across applications, systems and platforms to achieve business objectives
Assess foundational services, integration services, cloud operations and management capabilities
Champion, communicate, and rationalize approaches with business leaders, organization management, and within teams to produce structured proof-of-fit/proof-of-concept outcomes
Establish the direction for application development approaches, including tools, processes, and frameworks
Support preparation of documentation, including application designs, assessments, security management, implementation plans, and post-implementation documentation
Lead by example, demonstrating high performance in the areas of customer satisfaction, collaboration, teamwork, and reliability
Participate in the establishment of local standards, patterns, and practices for cloud integration, while providing input and influence to broader organizational standards and initiatives
Spearhead the use of innovative technologies and best practices for new product and solution development initiatives
Participate in IT strategy development, including environmental analysis, opportunity identification, business cases, business innovation, and portfolio development
Plan and engineer the organization's systems infrastructure, including the design and implementation of hardware and software
Monitor system performance

Requirements:

Typically requires a bachelor's degree in area of specialty and 6-8 years of experience in the field or in a related area
Familiar with concepts, practices, and procedures within a particular field
Relies on extensive experience and judgment to plan and accomplish goals
Performs a variety of complicated tasks
Works under general supervision
Leads and directs the work of others
A wide degree of creativity and latitude is expected
Typically reports to a manager or head of a unit/department
Splunk Architecture & Administration
Design and maintain distributed Splunk deployments (search heads, indexers, forwarders, deployers)
Manage indexer clustering and search head clustering for high availability
Configure data inputs, parsing, and index management
Implement role-based access control (RBAC) and authentication integration
Performance tuning and capacity planning
Data Onboarding
Design and implement data onboarding strategies for diverse data sources
Create and maintain props.conf and transforms.conf for data parsing and routing
Develop source type definitions and field extractions
Configure input specifications and monitor data quality post-onboarding
Establish data retention policies and index lifecycle management
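For illustration, the arithmetic behind retention and capacity planning can be sketched as below. Retention is expressed in seconds through the `frozenTimePeriodInSecs` setting in indexes.conf; the storage estimate assumes an indexer cluster that keeps `replication_factor` copies of compressed rawdata and `search_factor` copies of the searchable index files. The 15% rawdata and 35% index-file ratios are assumed rule-of-thumb defaults, not measured values, and should be replaced with ratios observed for the actual data.

```python
from typing import Final

SECONDS_PER_DAY: Final = 86_400


def frozen_time_period_in_secs(retention_days: int) -> int:
    """Convert a retention period in days to seconds for frozenTimePeriodInSecs."""
    return retention_days * SECONDS_PER_DAY


def estimate_cluster_storage_gb(
    daily_ingest_gb: float,
    retention_days: int,
    replication_factor: int,
    search_factor: int,
    rawdata_ratio: float = 0.15,  # assumed rule of thumb: compressed rawdata vs. raw ingest
    index_ratio: float = 0.35,    # assumed rule of thumb: index files vs. raw ingest
) -> float:
    """Rough total disk across an indexer cluster for the given retention.

    Rawdata is stored replication_factor times; searchable index files
    are stored search_factor times.
    """
    per_day_gb = daily_ingest_gb * (
        rawdata_ratio * replication_factor + index_ratio * search_factor
    )
    return per_day_gb * retention_days
```

For example, 90 days of retention maps to 7,776,000 seconds, and 100 GB/day kept for 90 days with a replication factor of 3 and a search factor of 2 comes to roughly 10,350 GB across the cluster under these assumed ratios. Size-based limits such as `maxTotalDataSizeMB` can freeze data earlier than the time-based setting.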
Splunk HTTP Event Collector (HEC)
Configure and manage HEC endpoints for REST API-based data ingestion
Implement HEC tokens with appropriate permissions and index routing
Troubleshoot HEC connectivity, authentication, and data formatting issues
Scale HEC deployments for high-volume event ingestion
Integrate cloud-native applications and serverless functions with HEC
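As a minimal sketch of REST-based ingestion, an application or serverless function sends JSON events to the HEC event endpoint (`/services/collector/event`, default port 8088) with an `Authorization: Splunk <token>` header. The helper name `build_hec_request`, the host, and the token below are hypothetical; the function only builds the request so it can be inspected without a live Splunk instance.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_hec_request(
    base_url: str,
    token: str,
    event: Dict[str, Any],
    index: Optional[str] = None,
    sourcetype: Optional[str] = None,
) -> urllib.request.Request:
    """Build (without sending) a POST to the HEC JSON event endpoint."""
    payload: Dict[str, Any] = {"event": event}
    # Optional metadata; the HEC token must be allowed to write to the chosen index.
    if index is not None:
        payload["index"] = index
    if sourcetype is not None:
        payload["sourcetype"] = sourcetype
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending it is then a call such as `urllib.request.urlopen(req, timeout=10)`; in practice the client should verify the HEC TLS certificate and read the token from a secrets store rather than hard-coding it.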
Splunk DB Connect
Install, configure, and maintain the DB Connect app across search heads
Create database connections and manage JDBC drivers for various database types
Design and schedule database inputs (rising column, batch, and tail inputs)
Optimize SQL queries for performance and minimize database load
Configure database identity management and credential security
Troubleshoot connection issues, query timeouts, and data ingestion gaps
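The rising column input mode works roughly like a checkpointed poll. The query filters on a saved checkpoint value (written with a `?` placeholder), orders by the rising column, and the highest value returned becomes the next checkpoint. The sketch below reproduces that idea with Python's built-in `sqlite3`; the `app_events` table, its columns, and the `poll_rising_column` helper are hypothetical and are not DB Connect's own code.

```python
import sqlite3
from typing import Any, List, Tuple

# Hypothetical table and column names; "?" is bound to the saved checkpoint.
RISING_QUERY = "SELECT id, message FROM app_events WHERE id > ? ORDER BY id ASC"


def poll_rising_column(
    conn: sqlite3.Connection, checkpoint: int
) -> Tuple[List[Tuple[Any, ...]], int]:
    """Return rows newer than `checkpoint` and the advanced checkpoint value."""
    rows = conn.execute(RISING_QUERY, (checkpoint,)).fetchall()
    # Rows are ordered by the rising column, so the last row holds the new maximum.
    new_checkpoint = rows[-1][0] if rows else checkpoint
    return rows, new_checkpoint
```

Ordering by the rising column and indexing it on the database side keeps each poll cheap. A column that is not strictly increasing, such as a non-unique timestamp, can cause rows to be skipped or duplicated, which is a common source of the ingestion gaps mentioned above.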
AWS Infrastructure & Services
Deploy and manage appropriately sized EC2 instances for Splunk components
Configure VPCs, security groups, NACLs, and networking for secure Splunk communication
Implement EBS storage optimization and snapshot strategies for Splunk data
Leverage S3 for SmartStore architecture and backup solutions
Use AWS Systems Manager, CloudWatch, and Auto Scaling for monitoring and automation
Infrastructure as Code (IaC) & Automation
Terraform or CloudFormation for provisioning Splunk infrastructure
Ansible, Puppet, or Chef for Splunk configuration management
Python/Bash scripting for custom automation tasks
CI/CD pipeline integration (Jenkins, GitLab CI, GitHub Actions)
Version control with Git for infrastructure and configuration code
Monitoring, Logging & Troubleshooting
Create Splunk monitoring dashboards and alerts for platform health
Implement log forwarding strategies using universal/heavy forwarders
Troubleshoot data ingestion issues, search performance, and cluster health
Integrate AWS CloudWatch metrics with Splunk for unified monitoring
Analyze Splunk internal logs (_internal, _introspection, _audit indexes)
Security & Compliance
Implement encryption in transit (SSL/TLS) and at rest for Splunk data
Configure AWS IAM roles and policies following least-privilege principles
Ensure compliance with standards (PCI-DSS, HIPAA, SOC 2) for log data
Implement backup and disaster recovery procedures
Secure API access and credential management (AWS Secrets Manager, HashiCorp Vault)
Cribl Stream & Cribl Edge: Data Pipeline Optimization
Deploy and manage Cribl Stream architecture (Leader nodes, Worker nodes, Worker groups)
Configure data sources and destinations for multi-platform routing (Splunk, S3, other SIEMs)
Design and implement pipelines for data transformation, enrichment, and reduction
Create routes and filters to optimize data flow and reduce ingestion costs
Implement data sampling, aggregation, and redaction for compliance and cost savings
Configure event breakers, parsers, and field extractions within Cribl
Manage Cribl packs for pre-built data optimization solutions
Integrate Cribl Stream with Splunk HEC and S3 for hybrid storage strategies
Monitor pipeline performance and troubleshoot data flow issues
Implement GitOps workflows for Cribl configuration management
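To make the reduction, sampling, and redaction duties concrete, the sketch below models a pipeline as an ordered list of stages, each of which returns a modified event or `None` to drop it. This is plain Python, not Cribl's API; the stages are only loosely analogous to Cribl's Drop, Sampling, and Mask functions, and the stage names, the DEBUG rule, and the 16-digit card pattern are hypothetical examples.

```python
import re
from typing import Callable, Dict, Iterable, List, Optional

Event = Dict[str, object]
# A stage returns the (possibly modified) event, or None to drop it.
Stage = Callable[[Event], Optional[Event]]


def drop_debug(event: Event) -> Optional[Event]:
    """Reduction: drop DEBUG-level events before they are forwarded."""
    return None if event.get("level") == "DEBUG" else event


def make_sampler(rate: int) -> Stage:
    """Sampling: keep the first of every `rate` events that reach this stage."""
    seen = 0

    def sample(event: Event) -> Optional[Event]:
        nonlocal seen
        seen += 1
        return event if (seen - 1) % rate == 0 else None

    return sample


# Hypothetical masking rule: 16 consecutive digits treated as a card number.
CARD_RE = re.compile(r"\b\d{12}(\d{4})\b")


def mask_card_numbers(event: Event) -> Optional[Event]:
    """Redaction: mask all but the last four digits; returns a copy, not a mutation."""
    message = event.get("message")
    if isinstance(message, str):
        return {**event, "message": CARD_RE.sub(r"************\1", message)}
    return event


def run_pipeline(events: Iterable[Event], stages: List[Stage]) -> List[Event]:
    """Apply stages in order; an event dropped by any stage skips the rest."""
    output: List[Event] = []
    for event in events:
        current: Optional[Event] = event
        for stage in stages:
            current = stage(current)
            if current is None:
                break
        if current is not None:
            output.append(current)
    return output
```

Stage order matters. Dropping low-value events before sampling means the sampler's ratio applies only to events worth keeping, and masking last keeps the redaction logic in one place before data leaves the pipeline for Splunk, S3, or another SIEM.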
Cribl Edge Competencies
Deploy and manage Cribl Edge fleets for distributed edge data collection
Configure Edge nodes as lightweight agents replacing traditional forwarders
Implement centralized management of Edge fleets through Cribl Cloud or Stream Leader
Collect data from edge sources (logs, metrics, Windows events, syslog)
Perform edge-side data processing to reduce bandwidth and central processing load
Configure auto-discovery and dynamic data source management
Manage Edge node updates, configuration versioning, and fleet-wide deployments
Monitor Edge node health and connectivity across distributed environments
Implement edge-to-cloud data routing strategies for hybrid architectures
Incident Management & Service Request Support
Triage and respond to platform incidents following ITIL or similar frameworks
Diagnose and resolve P1/P2 incidents affecting Splunk availability or data ingestion
Perform root cause analysis (RCA) and create post-incident reports
Coordinate with cross-functional teams during major incidents
Implement corrective and preventive actions to reduce incident recurrence
Maintain on-call rotation and provide 24/7 platform support
Service Request Management
Process user access requests (account creation, role assignments, permission changes)
Handle data onboarding requests for new applications and data sources
Fulfill infrastructure change requests (index creation, retention policy updates, capacity expansion)
Coordinate app installations and updates on search heads
Provision and configure new forwarders, HEC tokens, or DB Connect inputs
Create custom dashboards and reports based on user requirements
Ticket Management & Communication
Utilize ticketing systems (ServiceNow, Jira Service Management, Remedy)
Document troubleshooting steps and resolution procedures
Maintain SLA compliance for incident response and service request fulfillment
Communicate effectively with stakeholders on status updates and timelines
Create and maintain knowledge base articles for common issues
Escalate complex issues to vendors (Splunk Support, AWS Support) when necessary
Proactive Support
Conduct health checks and performance reviews
Identify trending issues and implement preventive measures
Provide user training and guidance on Splunk best practices
Participate in change advisory board (CAB) meetings for platform changes
Agile Methodology & Project Collaboration
Participate in sprint planning, daily stand-ups, sprint reviews, and retrospectives
Commit to sprint goals and deliver incremental value within 2-week sprint cycles
Collaborate with Scrum Master to remove impediments and optimize team velocity
Contribute to backlog refinement and story estimation sessions (story points, planning poker)
Demonstrate completed work during sprint reviews and incorporate feedback
Identify process improvements during retrospectives for continuous team enhancement
Work within a continuous-flow model with work-in-progress (WIP) limits
Manage work items through defined workflow stages (To Do, In Progress, Review, Done)
Prioritize tasks dynamically based on business value and urgency
Monitor cycle time and lead time metrics for process optimization
Participate in Kanban board reviews and workflow refinement
Balance operational support work with project-based initiatives
Write clear, concise user stories with acceptance criteria following the 'As a [user], I want [goal], so that [benefit]' format
Break down epics into manageable user stories and technical tasks
Define technical requirements, dependencies, and effort estimates
Update story status, track progress, and document blockers in real time
Create technical-debt and bug stories with appropriate prioritization
Maintain story traceability through completion and closure
Participate in backlog grooming sessions to clarify requirements and priorities
Provide technical feasibility input and effort estimates for proposed features
Communicate constraints, risks, and technical dependencies proactively
Negotiate scope and timelines based on technical complexity and resource availability
Seek clarification on ambiguous requirements before implementation
Provide regular status updates on work progress and potential delivery impacts
Offer alternative technical solutions to meet business objectives
Present completed work demonstrations and gather stakeholder feedback
Utilize project management tools (Jira)
Maintain transparency through accurate story updates and burndown tracking
Participate in capacity planning and release planning activities
Contribute to definition of done (DoD) and team working agreements
Practice iterative development with continuous integration and delivery
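The cycle time and lead time metrics mentioned above can be computed from the timestamps at which each work item enters a workflow stage. A minimal sketch, assuming lead time runs from creation (To Do) to Done and cycle time from In Progress to Done; teams define these boundaries differently, and the `created`/`started`/`done` field names are hypothetical rather than any tool's export format.

```python
from datetime import datetime
from statistics import mean
from typing import Dict, List

# Hypothetical work item: timestamps recorded when the item entered each stage.
WorkItem = Dict[str, datetime]

SECONDS_PER_DAY = 86_400


def lead_time_days(item: WorkItem) -> float:
    """Lead time: from request creation (To Do) to completion (Done)."""
    return (item["done"] - item["created"]).total_seconds() / SECONDS_PER_DAY


def cycle_time_days(item: WorkItem) -> float:
    """Cycle time: from when work started (In Progress) to completion (Done)."""
    return (item["done"] - item["started"]).total_seconds() / SECONDS_PER_DAY


def flow_summary(items: List[WorkItem]) -> Dict[str, float]:
    """Average lead and cycle time across completed items."""
    return {
        "avg_lead_time_days": mean(lead_time_days(i) for i in items),
        "avg_cycle_time_days": mean(cycle_time_days(i) for i in items),
    }
```

A widening gap between average lead time and cycle time usually points to items waiting in the backlog rather than slow execution, which is the kind of signal Kanban board reviews and WIP-limit tuning act on.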
Master's Degree
AWS Certification (Certified Developer or Solutions Architect)
6+ years software development experience
2+ years of experience in Agile
2+ years of experience in deploying Red Hat OpenShift Container Platform solutions
2+ years of experience in deploying cloud-based solutions
2+ years of experience in deploying hybrid cloud-based solutions
Master's degree from an accredited college or university with specialization in an Information Technology field, or an equivalent combination of related education and work experience
Minimum eight years of broad technical experience in one or more phases of information technology and management information systems
Experience must include managing highly complex IT efforts and/or operations
Eight years of information technology experience directly related to software design and development, including 2+ years focused on cloud architecture, design, and implementation
Excellent oral and written communication, presentation, leadership, interpersonal, collaborative/relationship-building, organizational & planning, and analytical & problem solving skills
Demonstrated experience as part of a high-velocity DevOps team
Practical command of IaC, plus thorough knowledge of and hands-on experience implementing container management concepts and practices using OCP (OpenShift Container Platform), Kubernetes, and Docker
Deep, hands-on experience applying Agile principles
Highly proficient in Java full-stack engineering, development, and cloud technology platforms, including:
Cloud-native Architecture: Well-Architected Framework, Twelve-Factor App, cloud reference architectures, cloud service models, microservices architecture, serverless architecture, and decoupled UIs built with JavaScript frameworks (AngularJS, React), single-page applications, and modern web applications
Cloud Strategy: Business case development, Application assessment and migration planning, Cloud operating model design
Cloud Security: Shared Responsibility Model, cloud security architecture, IAM policies/roles, WAF, OWASP Web/API, SAML, vulnerabilities (CSRF, XSS, SQLi) and compensating controls (e.g., CSP), etc.
Cloud-native Services: Optimizing applications for scalable cloud designs; designing, architecting, and integrating applications using modern cloud patterns
Containerization (Docker) and container orchestration (Docker Swarm, Kubernetes), Infrastructure as Code
Cloud-native monitoring and logging: ELK, Splunk, CloudWatch, among others
Application Integration Services: REST APIs, API gateways (e.g., MuleSoft)
Well-versed in Agile software development practices, with the ability to contribute to sprint ceremonies such as refinement, planning, review, and retrospectives
Well-versed in Continuous Integration and Continuous Delivery (CI/CD) practices, automated testing, automated code quality scanning, and automated deployments (e.g., XL Deploy, XL Release, Subversion, Jenkins, Jira, Remedy)
Demonstrated experience in mentorship and serving as technical subject matter expert (SME) for a development team
Experience in the Finance industry or experience developing solutions involving financial products is a plus