Octal Philippines Inc. is looking for a skilled Site Reliability Engineer (SRE) to join our dynamic team. The SRE will be responsible for ensuring that our systems are reliable, scalable, and efficient. You will play a critical role in maintaining the uptime of our services, improving system performance, and automating processes to enhance productivity. The ideal candidate is a passionate technologist who thrives in a fast-paced environment and enjoys tackling complex challenges.
Responsibilities:
- Monitor and maintain the reliability and availability of production systems
- Implement automation to reduce operational toil and improve system reliability
- Identify and resolve performance issues and outages
- Collaborate with development teams to design scalable and robust systems
- Create and maintain SRE documentation and runbooks
- Participate in on-call rotation and incident response activities
- Continuously improve tooling and processes to enhance the efficiency of operations
Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related field
- At least 3 years of experience in a Site Reliability Engineering or related role
- Strong experience with cloud platforms such as AWS, Azure, or GCP
- Proficiency in scripting and programming languages (e.g., Python, Go, Bash)
- Experience with containerization and orchestration technologies like Docker and Kubernetes
- Strong understanding of monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack)
- Excellent troubleshooting and analytical skills
- Ability to work collaboratively in a team-oriented, fast-paced environment
Qualifications:
• Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
• Proven experience in a Site Reliability Engineer or similar role, with a focus on designing and implementing scalable systems.
• Strong proficiency in programming, scripting, and automation (e.g., Java, ReactJS).
• Experience with cloud platforms such as AWS, Azure, or GCP, and container orchestration tools like Kubernetes.
• Deep understanding of networking, system administration, Windows, and Linux/Unix-based environments.
• Excellent problem-solving skills and the ability to troubleshoot complex issues in distributed systems.
• Strong communication skills and the ability to work effectively in a collaborative team environment and to communicate clearly with stakeholders.
Benefits:
• Communication Allowance, Health & Life Insurance, and others