Manager, SRE
- Job ID: 60750
Collaboration & Communication Excellence: The SRE Manager has strong communication and influencing skills and collaborates effectively with senior leadership, engineers, and cross-functional teams. They articulate complex technical concepts clearly and lead geographically distributed teams while building strong partnerships across engineering, product, operations, security, and business functions.
Team Development & Cultural Stewardship: They promote an inclusive and psychologically safe environment, mentor team members, encourage innovation, and build a culture of continuous learning and blameless improvement.
Technical Acumen & Innovation Driver: They demonstrate deep technical curiosity, attention to detail, and adaptability, guiding teams through evolving technologies and challenges.
Accountability & Ownership: They take full responsibility for the reliability, performance, and health of critical applications while driving measurable outcomes and team accountability.
SRE Strategy & Best Practices: Expertise in SLOs/SLIs, error budgets, incident response, and reliability improvements.
Architecture & Modern Platform Engineering: Cloud-native, microservices, Kubernetes, hybrid cloud.
Automation, CI/CD & Observability: IaC, CI/CD, monitoring, AIOps.
Infrastructure & Security: Cloud security, networking, databases, disaster recovery.
Define SRE Strategy & Vision: Develop and drive the long-term SRE strategy and roadmap for the Marketing and Sales technology portfolio, aligning reliability goals with business objectives. Establish enterprise SRE standards, including SLOs, SLIs, and error budgets, and translate technical metrics into meaningful business health indicators.
Lead the "Paved Road" Initiative & Platform Engineering: Build and enhance shared SRE platforms, tools, and services that enable secure, reliable deployments. Promote automation, self-service capabilities, and an automation-first culture to reduce operational toil and improve efficiency.
Drive Observability, AIOps & Performance Strategy: Lead observability initiatives with robust monitoring, logging, and alerting while advancing AIOps capabilities using AI/ML for anomaly detection, predictive insights, and proactive risk mitigation.
Architectural Leadership & Collaboration for Reliability: Partner with engineering and architecture teams to design scalable, secure, and resilient systems that follow SRE best practices.
Oversee Incident Management & Resilience: Lead incident response, promote blameless post-mortems, improve MTTR, drive resilience testing, and oversee 24x7 first-responder operations.
Cross-Functional Engagement & Governance: Collaborate across teams to embed SRE practices, ensure compliance, and lead the SRE Community of Practice.
Reporting, Vendor & Budget Management: Deliver executive reporting on system health, manage vendor partnerships, and optimize SRE budgets and cloud spend.
- Bachelor's degree in Computer Science, Engineering, or a related technical field (Master's degree preferred).
- Progressive Leadership: 10+ years of progressive experience in Site Reliability Engineering, including at least 5 years of proven leadership experience managing and mentoring SRE teams.
- Cloud Expertise: Extensive experience designing, deploying, and operating mid- to large-scale public cloud environments. GCP expertise is required; additional experience with AWS or Azure is a significant advantage.
- Infrastructure as Code (IaC): Demonstrated hands-on expertise in implementing and driving IaC strategies, particularly with Terraform Enterprise.
- SRE Frameworks & Observability: Strong track record of defining and implementing comprehensive SRE frameworks, including Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets. Proven experience developing and implementing robust observability solutions (monitoring, logging, tracing) using tools such as Dynatrace, Grafana, Prometheus, and native cloud monitoring services.
- Modern Application Architectures: Experience with microservices architectures, Spring Boot, and both NoSQL and SQL datastores.
- Enterprise CMS (Plus): Familiarity with Adobe Experience Manager (AEM) or a similar enterprise content management system (CMS).
Built on one bold idea and the passion to define sustainable transportation for generations to come, Ford is a story about people with a vision that’s still being written.
Ford’s culture fuels the kind of momentum where ideas flow, progress is unstoppable, and our people keep redefining what it means to innovate.
At Ford, your work matters, your life matters and we’re here to back the whole you, from growth to well-being, so you show up ready to realize your full potential.