Site Reliability Engineer

Note: The application period for this posting has passed.

Job description

What we expect from you
Extensive knowledge of system administration in Linux environments, preferably on high-throughput, low-latency systems
Extensive knowledge of Docker and Kubernetes
Excellent understanding of distributed system design across process and site boundaries
Hands-on experience with service orchestration, management, deployment activities, configuration management and all necessary automation
Strong grasp of process isolation and containerization concepts, being able to apply them when necessary
Good understanding of the software development lifecycle, including versioning, building, testing, staging and deployment processes, with a strong continuous delivery mindset

What you will work on
Build tooling to ease the provisioning and scaling of infrastructure resources
Continuously improve and scale infrastructure components to handle growth
Improve overall system performance and investigate failures, taking an active part in discussions of future improvements
Ensure system availability, reachability, and maintainability by building the instrumentation, tooling, and alerting systems needed to escalate abnormalities
Influence monitoring and capacity planning together with the application development teams and in alignment with the business goals

It would be great if you also have
Experience developing Kubernetes operators
Experience managing infrastructure on Google Cloud Platform
Experience deploying and scaling Apache Cassandra, ScyllaDB, MySQL, PostgreSQL, Redis or Memcached