Job summary:
Title:
Site Reliability Engineer - Hybrid
Location:
Atlanta, GA, USA
Length and terms:
Long-term; W2 or C2C
Position created on 03/13/2025 06:45 pm
Job description:
Webcam interview; long-term project; hybrid (local candidates only).
Job Overview:
We are seeking a Site Reliability Engineer (SRE) to play a key role in ensuring the reliability, scalability, and efficiency of our cloud-based infrastructure and applications. This position will focus on AWS cloud operations, CI/CD automation, infrastructure monitoring, and performance optimization for critical applications. The ideal candidate will have strong expertise in DevOps, cloud automation, and site reliability engineering principles to drive operational excellence.
Job Responsibilities:
- Manage and optimize data streaming and API components in on-premises OpenShift and AWS.
- Identify and implement performance optimization techniques for APIs and backend services.
- Automate testing, delivery, and deployment processes for high-availability production systems.
- Develop CI/CD pipelines to deploy application artifacts, including APIs and data processing jobs.
- Build integrations among applications running on AWS and on-premises, and with third-party tools such as ServiceNow, VersionOne, and Sumo Logic.
- Define and implement SLIs/SLOs to enhance application reliability and scalability.
- Monitor system health, troubleshoot complex performance issues, and document root cause analysis.
- Enhance cloud infrastructure by experimenting with emerging technologies and building prototypes.
- Design and develop monitoring and alerting mechanisms for early issue detection and resolution.
- Ensure data integrity and security by implementing AWS security services such as IAM, HSM, encryption, and key management.
- Analyze AWS cost structures and develop cost optimization strategies.
- Work with enterprise security teams to implement solutions addressing vulnerabilities and compliance needs.
- Plan and implement backup strategies, disaster recovery, and infrastructure elasticity to support dynamic workloads.
- Continuously improve system performance, security, and reliability through collaboration with architecture, infrastructure, and application teams.
Candidates need to have:
- Hands-on experience as a Site Reliability Engineer, DevOps Engineer, or related role.
- Strong expertise in AWS cloud infrastructure (or other major cloud providers).
- Deep understanding of CI/CD pipelines and automation tools such as GitLab, GitHub, Jenkins, Maven, Gradle, and Nexus.
- Hands-on experience with Infrastructure as Code (IaC) and platform tools such as Terraform, Ansible, AWS CloudFormation, and OpenShift, plus Shell/Python scripting.
- Proficiency in Linux OS, virtualization platforms, networking, load balancers, firewalls, and API tools.
- Strong understanding of monitoring, alerting, and logging tools for high-availability applications.
- Experience with data streaming technologies and deploying high-availability critical application components.
- Ability to troubleshoot and resolve complex system issues in both on-premises and AWS environments.
- Strong knowledge of software release management and DevOps best practices.
Preferred Qualifications:
- 4-6 years of overall experience in DevOps, Cloud Infrastructure, or related fields.
- Experience working with big data technologies and cloud-native applications.
Contact the recruiter working on this position:
The recruiter working on this position is Sandeep (Shaji Team) Maraganti.
His/her contact number is
His/her contact email is sandeep.maraganti@msysinc.com
Our recruiters will be happy to help you secure this contract.