Note: The application period for this posting has passed.
Job description
The way the world consumes products is changing, from ownership to usership, and DigitalRoute is positioned at the centre of the transition. When enterprises pivot to usage-based business models, they often make an unfortunate discovery: their systems weren't built to handle the massive data volumes and complexity that these models generate. This causes them to leak revenue and respond too slowly to customer demand. DigitalRoute solves this by creating a real-time usage data layer for enterprises. Our products transform raw usage data into clear information for billing, in real time and at high scale.
As a Senior Site Reliability Engineer, you'll be responsible for designing, implementing, and maintaining the infrastructure and tools necessary to support our mission-critical systems. You'll collaborate closely with cross-functional teams to optimize our platform for reliability, availability, and performance. If you're passionate about leveraging automation, monitoring, and best practices to build robust and resilient systems, this is the opportunity for you to make a significant impact in a fast-paced environment.
What you'll do
Design, implement, and maintain highly available and scalable infrastructure to support our applications and services.
Develop automation scripts and tools to streamline deployment, monitoring, and incident response processes.
Implement best practices for system reliability, including fault tolerance, disaster recovery, and performance optimization.
Collaborate with development teams to ensure that new features and services are designed with reliability and scalability in mind.
Monitor system performance and proactively identify and address potential issues before they impact production.
Participate in on-call rotations and respond to incidents in a timely and effective manner.
Continuously evaluate and improve our systems and processes to enhance reliability, scalability, and efficiency.
Mentor team members and promote a culture of reliability and excellence within the organization.
What you'll bring
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Proven experience as a Site Reliability Engineer or similar role in a production environment.
Strong proficiency in cloud computing platforms such as AWS (preferred), Azure, or GCP.
Experience with containerization technologies such as Docker and Kubernetes.
Proficiency in scripting and automation using languages such as Python, Bash, or PowerShell.
Solid understanding of networking, security, and system administration principles.
Experience with monitoring and logging tools such as Datadog, Prometheus, Grafana, ELK Stack, or similar.
Strong problem-solving skills and the ability to troubleshoot complex issues in distributed systems.
Excellent communication and collaboration skills, with the ability to work effectively in a cross-functional team environment.
Experience with Agile methodologies and DevOps practices is a plus.
If you're passionate about building reliable, scalable, and high-performance systems and thrive in a fast-paced, collaborative environment, we'd love to hear from you.
We apply continuous selection, and the position may be filled before the last application date.
DigitalRoute wants to be part of an inclusive and diverse environment and we are actively looking for qualified candidates irrespective of gender, sexual orientation, ethnicity, disability, or age. You will be part of a global and diverse company where our differences are our strengths.
Apply now! We look forward to you joining us!
Contact persons at this company
Mikael Bäckström