Job Description:
As a Site Reliability Engineer (SRE), you will play a critical role in ensuring the reliability, scalability, and performance of our production systems. You will collaborate closely with software engineering and operations teams to build and maintain tools for automation, monitoring, and operations. Your expertise will be crucial in designing resilient and scalable architectures, optimizing application performance, and resolving complex technical issues to deliver a seamless user experience.
Responsibilities:
- Design, build, and maintain tools and frameworks for deployment, monitoring, and operations.
- Implement best practices in infrastructure security, scalability, and reliability.
- Collaborate with cross-functional teams to define Service Level Indicators (SLIs) and to set and meet Service Level Objectives (SLOs).
- Perform system and application troubleshooting to resolve issues and ensure optimal performance.
- Design and implement automation strategies to streamline operations and reduce manual intervention.
- Participate in the on-call rotation and respond to incidents to minimize downtime and impact on users.
- Conduct post-mortem analyses of incidents and implement measures to prevent recurrence.
- Continuously evaluate and improve our systems and processes to enhance reliability and efficiency.
Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
- Proven experience in a Site Reliability Engineer or similar role, with a focus on designing and implementing scalable systems.
- Strong proficiency in programming, scripting, and automation (e.g., Java, ReactJS).
- Experience with cloud platforms such as AWS, Azure, or GCP, and container orchestration tools like Kubernetes.
- Deep understanding of networking, system administration, and both Windows and Linux/Unix-based environments.
- Excellent problem-solving skills and the ability to troubleshoot complex issues in distributed systems.
- Strong communication skills, including the ability to work effectively in a collaborative team environment and to communicate clearly with stakeholders.
Preferred Qualifications:
- Master's degree in Computer Science, Engineering, or a related technical field.
- Certification in cloud platforms or DevOps methodologies (e.g., AWS Certified DevOps Engineer, Google Professional Cloud DevOps Engineer).
- Experience with CI/CD pipelines and configuration management tools (e.g., Ansible).
- Knowledge of monitoring and logging tools such as Prometheus, Grafana, and the ELK stack.
- Experience with Agile/Scrum methodologies and practices.