Site Reliability Engineer

Job description

Site Reliability Engineers are responsible for ensuring that services are available, that the underlying infrastructure functions properly, and that internal tools, processes, and systems work as expected. Another essential responsibility is monitoring critical applications and related services to ensure availability during critical business hours.

Main responsibilities, activities and duties

Thoroughly analyse the assigned systems and the business they support to understand their design and functionality in relation to stakeholder needs.

Together with the team, identify, establish, and uphold appropriate Service Level Objectives (SLOs) for each system, and introduce error budgets along with policies defining the consequences of exceeding them.

Ensure observability and monitoring of relevant systems. Provide guidelines and educate the product teams on observability practices and standards for metrics, logs, and traces.

Proactively identify system weaknesses using monitoring data, trend analysis, and Chaos Engineering, and fix them before they cause production issues.

Build automated solutions and tools that help debug and resolve problems in production and prevent them from recurring.

Lead blameless post-mortems for incidents together with the product teams and vendors involved. Take part in on-call rotations to ensure SLOs are met, with the goal of eliminating the need for support outside office hours.

Education and certification
Academic degree in systems development, or equivalent knowledge and skills acquired through work experience and continuing professional education.

Knowledge and experience

Senior-level expertise in:
- Software development.
- Database management systems and SQL.
- Containerization with OpenShift/Kubernetes.
- Integration through Kafka and message queues.
- Java.
- Cloud (Azure/AWS).

- Proven experience in developing production-grade, performant, scalable and durable applications.
- Experience at all levels of the technology stack, i.e. infrastructure, database, API, components, and front end.
- Hands-on experience managing complex, high-volume applications/components in business-critical production environments.
- Experience with performance tuning techniques, stability patterns, and scalability approaches.
- Experience with fact-based, data-driven problem solving and communication.


Other qualifications

- Excellent analytical skills.
- Good leadership qualities.
- Excellent at planning and working in a highly structured way.

Proven ability to independently capture and share information through formal written documentation in English and the local language.
Fluent in English and Swedish, both spoken and written.
Workplace: HQ Solna
Process-oriented mindset and a strong ability to follow methods that secure cross-functional efficiency and collaboration.
Previous work experience in Site Reliability Engineering is an advantage.

Summary

  • Employer: weITglobal
  • 1 position
  • 6 months or longer
  • Full-time
  • Fixed monthly, weekly, or hourly salary
  • Published: 26 August 2022
  • Apply by: 2 September 2022