Responsibilities
Monitor the health of your services and work with developers to increase the velocity of changes, using built-in service monitoring support.
Select metrics for SLIs, set SLOs, and track error budgets to manage risk to the service.
Use dashboards that aggregate metrics and logs, including the golden signals, to reduce MTTR and quickly answer questions about service health.
Take ownership of platform-related incident management and resolution, ensuring timely communication and effective problem-solving.
Automate various provisioning and maintenance tasks using scripts and automation tools.
Qualifications
Junior level: 3 to 5 years of experience as a software engineer or systems administrator, and willingness to move into an SRE role.
Senior level: at least 5 years of experience as an SRE.
Experience coding in at least one language (Bash, Python, PowerShell, etc.)
Ability to use observability tools such as Datadog, Grafana, Elasticsearch, and Kibana
Ability to use cloud services (AWS, etc.)
Good command of English, both spoken and written
Nice to have:
Knowledge of best practices and IT operations for always-available, highly scalable services
Experience with CI/CD and automation tools (GitHub Actions, Jenkins, Ansible, Terraform, etc.)
Experience with containerization, container orchestration, and microservices (Docker, Kubernetes (K8s), Helm)
Knowledge of IT service management (ITSM): incident management, problem management, and change management
Full-time
Singapore