Note: The application period for this posting has passed.
Job Posting
Site Reliability Engineer
Solna, Sweden
Employees can work remotely
Full-time
Company Description
Snow Software is the global leader in technology intelligence solutions, ensuring the trillions spent on all forms of technology are optimized to drive maximum value. More than 4,000 organizations around the world rely on Snow's platform to provide complete visibility, optimize usage and spend, and minimize regulatory risk. Headquartered in Stockholm, Snow has more local offices and regional support centers than any other software asset and cloud management provider, delivering unparalleled results to our customers and partners.
Job Description
Snow is on a journey to transform its market-leading offering into cloud-native microservices delivered as SaaS.
This role will suit an experienced SRE who can lead our push into the SaaS space. The SRE will work closely with the technology, product and development teams to define the path we take for the next 3-5 years. This means a lot of freedom and autonomy, but also a lot of responsibility.
We are active members of the CNCF end-user community where we participate in the bi-weekly developer experience SIG meetings.
This is a unique opportunity to work with the leading cloud technologies and methodologies and to be a key player in the definition and implementation of Snow's SaaS offering.
WHAT WE DO
We provide our developers with a reliable platform as a product. Our aim is to abstract the complexities of Kubernetes away so that teams can easily create and deploy services into production by just specifying the configuration and resources that the application needs to run. We believe that GitOps is the best way to realise this vision, using tools such as ArgoCD, Terraform, Helm and Backstage. We are not afraid to evaluate new technologies if they can further improve the developer experience; the technologies we are currently assessing are Cue, Pulumi, and Crossplane.
We also provide our development teams with a monitoring stack so that they can effectively monitor metrics and logs from their applications in production. We believe in “You build it, you own it, you run it”. In the near future, our roadmap includes building a framework for creating SLIs, SLOs and error budgets to further improve this process.
OUR CHALLENGE FOR YOU
Lead and drive initiatives aimed at improving the reliability of our services by providing guidance, engineering solutions and improving our processes.
Drive reliability practices across our engineering organization.
Provide improvements and best practices targeting observability and predictability.
Experiment, learn new things and help grow those around you.
Work in short iterations in a lightweight Kanban environment shaped by the team.
To apply for this position, you will need at least 2-3 years of experience using the following technologies at a commercial software company: strong cloud experience (Azure, AWS, GCP), IaC (Terraform), GitHub, observability platforms (Sumo Logic), Helm, Backstage, ArgoCD, CI/CD, and Kubernetes.
Qualifications
Some years of experience managing production environments as SRE, DevOps Engineer or similar.
Experience working with SLOs, metrics, incident management in a cloud environment.
A good understanding of running Kubernetes in production.
Comfortable writing infrastructure as code for one of the major cloud providers.
Passion for reliability engineering practices and automation.
Curiosity to learn, explore and collaborate with those around you.
Bonus Points
Experience writing code in one or more of the following languages: Go, C# or Python.
You have worked on projects migrating monolithic applications to a microservices-based architecture.
You have worked with event-driven architectures (we use NATS for our event bus).