Senior Site Reliability Engineer

THIS IS US

Matrix42 is a leading provider of digital workspace management solutions, empowering businesses to streamline IT operations, enhance user experiences, and drive digital transformation. As we expand our offerings, we are building a dedicated AI Innovations Stream to incorporate advanced AI features into our products.

POSITION OVERVIEW

We are seeking an experienced and proactive Site Reliability Engineer (SRE) to join our AI Innovations Stream. In this role, you will be responsible for ensuring the reliability, scalability, and performance of our AI-driven features and services across Matrix42’s product portfolio. You will work closely with development teams, system architects, and operations to build and maintain robust systems that support our AI innovations, focusing on automation, monitoring, and continuous improvement to deliver a seamless user experience.

YOUR MISSION

System Reliability & Performance: Ensure the reliability, availability, and performance of AI-powered features and services across our products. Proactively monitor and address system issues to prevent downtime and improve performance.
Infrastructure Automation: Develop and maintain automation tools and scripts to manage infrastructure, deployments, and operations, using technologies such as Terraform, Ansible, or similar.
Monitoring & Incident Management: Implement and maintain comprehensive monitoring and alerting systems. Lead incident response efforts, including root cause analysis and post-mortem documentation.
Collaboration: Work closely with development and operations teams to design, build, and maintain scalable and resilient systems that support AI features and integrations.
Continuous Improvement: Identify and implement improvements to existing systems, processes, and practices to enhance reliability, scalability, and performance.
Security & Compliance: Ensure that all systems and processes comply with security best practices and regulatory requirements, particularly in the context of AI and cloud-hosted services.
CI/CD Pipeline Management: Maintain and optimize continuous integration and continuous deployment (CI/CD) pipelines to ensure smooth and efficient deployment of AI features.
Capacity Planning & Optimization: Conduct capacity planning and optimize resource utilization to ensure that our systems can scale effectively as demand grows.

MUST HAVE

- Education: Bachelor’s degree in Computer Science, Software Engineering, or a related field.
- Experience: 3-5 years of experience in site reliability engineering, DevOps, or a similar role, with a strong focus on cloud-hosted environments, preferably on Microsoft Azure.
- Automation Skills: Extensive experience with infrastructure as code (IaC) tools such as Terraform, Ansible, or equivalent, and a strong understanding of automation principles.
- Cloud Expertise: Deep knowledge of Microsoft Azure, including experience with cloud services, networking, storage, and security.
- Monitoring & Incident Management: Proven experience in setting up and managing monitoring, logging, and alerting tools, as well as leading incident response efforts.
- Collaboration & Communication: Strong collaboration and communication skills, with the ability to work effectively in cross-functional teams and influence stakeholders.
- Problem-Solving Skills: Strong analytical and problem-solving abilities, with a proactive approach to identifying and addressing potential issues before they impact the user experience.

NICE TO HAVE

- AI Systems Experience: Familiarity with the unique challenges of deploying and maintaining AI systems, including model deployment, monitoring, and scalability.
- Agile Methodology: Experience working in an Agile/Scrum environment, with a focus on continuous improvement and iterative development.
- Security Best Practices: Knowledge of security best practices in cloud environments, particularly in relation to AI and sensitive data.

FOR YOU

We could tell you all about the flexible working hours, the 25 days of vacation, or the fact that remote work is part of everyday life. But in our eyes, those are not benefits; they are standard.

Here are some of the benefits we offer:

Learning & Development Opportunities
- Up to 6 additional days off for personal or professional development
- Access to our online platforms to expand your knowledge or improve your language skills

Smart Experience: Choose from numerous recreational activities, from horseback riding and archery to ballooning or paragliding.
Wellbeing: Fitness subscriptions, medical and dental subscriptions, and digital health and wellness solutions (mental health, nutrition, coaching, parenting).
And many more... ask us about it!

JOIN US

Send us your application, including your salary requirements and earliest possible starting date, directly through our online portal via the "APPLY NOW" button. If you have any questions, please do not hesitate to contact Diane Djongoue.

Please understand that, due to current EU data protection regulations, Matrix42 can only accept applications submitted online through the applicant portal of our applicant management system.
