Senior Site Reliability Engineer (SRE)

Fable

3 days ago

Remote

Canada

$130,000 - $150,000 USD yearly

Description

About Fable

Global enterprises work with Fable to make products more accessible for over one billion people who live with disabilities. Our customers include global leaders like Walmart, Slack, and Shopify. Fable was featured on the Forbes Accessibility 100 list in 2025, awarded Fast Company’s Most Innovative Companies in Design, and has received accolades from global entities like the World Summit Awards and the UN-endorsed Zero Project.

About the role

As a Senior Site Reliability Engineer at Fable, you will play a critical role in ensuring the reliability, scalability, and efficiency of our platform as we continue to grow.

Fable’s products support organizations in building more accessible digital experiences, and the reliability of our infrastructure is essential to delivering that impact. You will work across our platform and product systems to ensure they are stable, performant, and cost-efficient, while enabling teams to move quickly and safely.

As AI-powered capabilities increasingly become part of modern product experiences, you will also help ensure Fable’s infrastructure is ready to support AI workloads—balancing reliability, performance, and cost while enabling teams to safely experiment and scale new capabilities.

Reporting to the Director of Technical Operations, this role works closely with teams across Engineering and Product. It is ideal for someone who enjoys hands-on technical work while taking ownership of system health, tooling, and operational excellence, and who is excited to help shape Fable’s approach to infrastructure, reliability, and platform engineering over time.

Responsibilities

Reliability, Infrastructure & Platform

Design, build, and maintain reliable, scalable, and secure infrastructure for Fable’s product services

Improve system observability, monitoring, and alerting to ensure high availability and fast incident response

Contribute to and evolve SRE practices, including SLIs/SLOs, incident management, and postmortems

Support and improve CI/CD pipelines and deployment processes

Identify and reduce operational complexity across systems and tooling

Work across infrastructure and application layers to diagnose and resolve reliability and performance issues, including making targeted improvements to application code when needed

Support infrastructure and platform capabilities required for AI/ML-powered features, including scaling, performance, and reliability considerations

Cost Efficiency & Performance

Monitor and optimize infrastructure costs across cloud environments

Contribute to capacity planning and cost forecasting for infrastructure and services

Identify opportunities to improve performance and efficiency at the system level

Evaluate and optimize the cost and performance of compute-intensive workloads (e.g., AI/ML services), ensuring efficient resource usage and scalability

Vendor & Tooling Ownership

Work with third-party vendors and tools that support Fable’s infrastructure and operations

Help evaluate, select, and manage tools and services to support platform reliability and scalability

Support vendor-related troubleshooting and ongoing service improvements

Cross-functional Collaboration

Partner with Engineering teams to improve reliability, performance, and operational readiness of new features

Partner with application engineering teams to improve service architecture, performance, and observability, and help define best practices for building reliable, scalable systems

Act as a point of support and escalation for production issues

Collaborate across teams to manage dependencies and ensure smooth system operations

Team & Practice Development

Contribute to building strong SRE and operational practices across the organization

Share knowledge through documentation, pairing, and technical discussions

Help onboard and support more junior team members as the team grows

Contribute to improving ways of working within the team and across Engineering

Requirements

Key qualifications and assets

5–8+ years of experience in Site Reliability Engineering, DevOps, Infrastructure Engineering, or Platform Engineering

Strong experience with cloud infrastructure (AWS, GCP, or Azure)

Experience building internal platforms, tooling, or shared services that improve developer productivity and system reliability

Experience designing systems that bridge infrastructure and application layers

Ability to work across the stack: comfortable reading, debugging, and making changes to application code (e.g., backend services, APIs) when needed to improve reliability, performance, or observability

Experience with at least one backend programming language (e.g., Node.js, Python, Go, Java)

Strong experience with monitoring, observability, and alerting tools (e.g., Datadog, Prometheus, Grafana)

Solid understanding of CI/CD systems and modern deployment practices

Experience managing infrastructure as code (e.g., Terraform, CloudFormation)

Experience optimizing system performance and infrastructure costs

Familiarity with security and compliance considerations in cloud environments

Experience working with third-party vendors and infrastructure tools

Familiarity with infrastructure considerations for AI/ML workloads (e.g., high-compute services, data pipelines, or third-party AI platforms) is a strong asset

Curiosity about emerging technologies and their impact on infrastructure, reliability, and cost at scale

Strong problem-solving skills and ability to navigate complex systems

Excellent collaboration and communication skills

Nice to have

Experience contributing to platform engineering initiatives (e.g., internal developer platforms, self-serve infrastructure)

Experience improving developer experience (DX)

Experience with SLIs/SLOs and reliability engineering practices

Experience mentoring or supporting other engineers

Our values

To lead, listen first
You amplify voices that are less often heard and create space for those voices to grow. The quality of an idea doesn't correlate with the loudness of someone's voice.

The brain is a muscle
If you're going to do something, you will do it well. Practice often and rest when needed. Give your mind what it needs to thrive.

Unlearn to learn
What did we learn growing up, and what do we need to unlearn? It's essential to understanding our personal bias and position so that we can grow.

Benefits

What’s in it for you?
At Fable, you’ll join a collaborative and mission-driven environment where you’ll work with people who care deeply about building a more inclusive digital world. We offer benefits such as stock options, career growth opportunities, professional development support, health and dental coverage, and more.

Accessibility accommodations

Fable is an inclusive workplace. If you have any accessibility requirements or concerns regarding the hiring process or employment with us, please fill out this form or email us at jobs@makeitfable.com and include the subject line “Accessibility accommodation for Senior Product Manager job application.”

Pay range

$130,000 – $150,000

The salary band is designed to reflect the range of skills and experience needed for the position and is subject to change. The final salary is based on relevant skills, experience, and internal equity.

This posting reflects an existing vacancy. Artificial intelligence (AI) tools may be used to support part of the recruitment and selection process. However, all hiring decisions are made by our hiring managers.
