Job summary:


Title:
Site Reliability Engineer - Hybrid

Location:
Atlanta, GA, USA

Length and terms:
Long term - W2 or C2C


Position created on 03/13/2025 06:45 pm

Job description:


**** Webcam interview *** Long-term project *** Hybrid: local candidates only ****

Job Overview:

We are seeking a Site Reliability Engineer (SRE) to play a key role in ensuring the reliability, scalability, and efficiency of our cloud-based infrastructure and applications. This position will focus on AWS cloud operations, CI/CD automation, infrastructure monitoring, and performance optimization for critical applications. The ideal candidate will have strong expertise in DevOps, cloud automation, and site reliability engineering principles to drive operational excellence.

Job Responsibilities:

  • Manage and optimize data streaming and API components in on-premises OpenShift and AWS.
  • Identify and implement performance optimization techniques for APIs and backend services.
  • Automate testing, delivery, and deployment processes for high-availability production systems.
  • Develop CI/CD pipelines to deploy application artifacts, including APIs and data processing jobs.
  • Build integrations between applications running on AWS, on-premises systems, and third-party tools such as ServiceNow, VersionOne, and Sumo Logic.
  • Define and implement SLIs/SLOs to enhance application reliability and scalability.
  • Monitor system health, troubleshoot complex performance issues, and document root cause analyses.
  • Enhance cloud infrastructure by experimenting with emerging technologies and building prototypes.
  • Design and develop monitoring and alerting mechanisms for early issue detection and resolution.
  • Ensure data integrity and security by implementing AWS security services such as IAM, HSM, encryption, and key management.
  • Analyze AWS cost structures and develop cost optimization strategies.
  • Work with enterprise security teams to implement solutions addressing vulnerabilities and compliance needs.
  • Plan and implement backup strategies, disaster recovery, and infrastructure elasticity to support dynamic workloads.
  • Continuously improve system performance, security, and reliability through collaboration with architecture, infrastructure, and application teams.

Candidates need to have:

  • Hands-on experience as a Site Reliability Engineer, DevOps Engineer, or related role.
  • Strong expertise in AWS cloud infrastructure (or other major cloud providers).
  • Deep understanding of CI/CD pipelines and automation tools such as GitLab, GitHub, Jenkins, Maven, Gradle, and Nexus.
  • Hands-on experience with Infrastructure as Code (IaC) tools such as Terraform, Ansible, and CloudFormation, along with OpenShift and Shell/Python scripting.
  • Proficiency in Linux OS, virtualization platforms, networking, load balancers, firewalls, and API tools.
  • Strong understanding of monitoring, alerting, and logging tools for high-availability applications.
  • Experience with data streaming technologies and deploying high-availability critical application components.
  • Ability to troubleshoot and resolve complex system issues in both on-premise and AWS environments.
  • Strong knowledge of software release management and DevOps best practices.

Preferred Qualifications:

  • 4-6 years of overall experience in DevOps, Cloud Infrastructure, or related fields.
  • Experience working with big data technologies and cloud-native applications.

Contact the recruiter working on this position:

The recruiter working on this position is Sandeep (Shaji Team) Maraganti
His/her contact number is
His/her contact email is sandeep.maraganti@msysinc.com

Our recruiters will be happy to help you secure this contract.