Job summary:

Title:

Site Reliability Engineer - Hybrid

Location:

Atlanta, GA, USA

Length and terms:

Long term - W2 or C2C

Position created on 03/13/2025 06:45 pm

Job description:

*** Webcam interview *** Long-term project *** Hybrid - local candidates only ***

Job Overview:

We are seeking a Site Reliability Engineer (SRE) to play a key role in ensuring the reliability, scalability, and efficiency of our cloud-based infrastructure and applications. This position will focus on AWS cloud operations, CI/CD automation, infrastructure monitoring, and performance optimization for critical applications. The ideal candidate will have strong expertise in DevOps, cloud automation, and site reliability engineering principles to drive operational excellence.

Job Responsibilities:

Manage and optimize data streaming and API components in on-premises OpenShift and AWS environments.
Identify and implement performance optimization techniques for APIs and backend services.
Automate testing, delivery, and deployment processes for high-availability production systems.
Develop CI/CD pipelines to deploy application artifacts, including APIs and data processing jobs.
Build integrations between applications running on AWS or on-premises and third-party tools such as ServiceNow, VersionOne, and Sumo Logic.
Define and implement SLIs/SLOs to enhance application reliability and scalability.
Monitor system health, troubleshoot complex performance issues, and document root cause analysis.
Enhance cloud infrastructure by experimenting with emerging technologies and building prototypes.
Design and develop monitoring and alerting mechanisms for early issue detection and resolution.
Ensure data integrity and security by implementing AWS security services such as IAM, HSM, encryption, and key management.
Analyze AWS cost structures and develop cost optimization strategies.
Work with enterprise security teams to implement solutions addressing vulnerabilities and compliance needs.
Plan and implement backup strategies, disaster recovery, and infrastructure elasticity to support dynamic workloads.
Continuously improve system performance, security, and reliability through collaboration with architecture, infrastructure, and application teams.

Candidates need to have:

Hands-on experience as a Site Reliability Engineer, DevOps Engineer, or related role.
Strong expertise in AWS cloud infrastructure (or other major cloud providers).
Deep understanding of CI/CD pipelines and automation tools such as GitLab, GitHub, Jenkins, Maven, Gradle, and Nexus.
Hands-on experience with Infrastructure as Code (IaC) tools like Terraform, Ansible, and AWS CloudFormation, along with OpenShift and Shell/Python scripting.
Proficiency in Linux OS, virtualization platforms, networking, load balancers, firewalls, and API tools.
Strong understanding of monitoring, alerting, and logging tools for high-availability applications.
Experience with data streaming technologies and deploying high-availability critical application components.
Ability to troubleshoot and resolve complex system issues in both on-premises and AWS environments.
Strong knowledge of software release management and DevOps best practices.

Preferred Qualifications:

4-6 years of overall experience in DevOps, Cloud Infrastructure, or related fields.
Experience working with big data technologies and cloud-native applications.

Contact the recruiter working on this position:

The recruiter working on this position is Sandeep (Shaji Team) Maraganti
His/her contact email is sandeep.maraganti@msysinc.com

Our recruiters will be more than happy to help you get this contract.