Posted by Talent Groups
Senior Site Reliability Engineer (Contract to Hire)
Location: McKinney, TX (Hybrid, 2–3 days onsite)
Must be authorized to work in the U.S.
Overview:
Our client is seeking a Senior Site Reliability Engineer to lead platform reliability and traffic enforcement in a Kubernetes-hosted SASE (Secure Access Service Edge) environment. This role ensures high availability, observability, and fair multi-tenant traffic handling across distributed systems.
Key Responsibilities:
Platform Reliability & Operations
- Own uptime (target: 99.99%) and stability of multi-region Kubernetes environments.
- Architect resilient, scalable infrastructure with proactive capacity planning and automated remediation.
- Lead incident response, root cause analysis, disaster recovery, and change management.
Observability & Monitoring
- Build a full-stack observability pipeline (Prometheus, OpenTelemetry, Grafana, etc.).
- Instrument the four golden signals (latency, traffic, errors, saturation), along with tracing and alerting, to drive real-time performance insights.
- Develop automation for issue detection and resolution.
- Integrate and optimize OpenStack-based infrastructure beneath Kubernetes.
- Enforce security compliance, resource efficiency, and FinOps best practices.
Traffic Enforcement & Networking
- Design a Kubernetes-native traffic control layer for per-tenant/session enforcement.
- Implement CRDs, custom controllers, and service mesh (e.g., Istio, Linkerd) for dynamic policy management.
- Operate SDN components and telemetry agents (e.g., Cilium Hubble, WireGuard) and integrate them with the observability stack.
- Contribute to infrastructure architecture and reliability strategy.
- Mentor team members and promote Kubernetes best practices.
- Partner cross-functionally across engineering, security, and product teams.
Required Skills:
- Kubernetes in production across multi-region architectures.
- Observability tools: Prometheus, OpenTelemetry, Grafana, Jaeger, Loki.
- Programming: Go (preferred), Python/Bash scripting.
- Familiarity with OpenStack (Nova, Neutron), Ceph storage, and CNI plugins (Cilium preferred).
Preferred Experience:
- Developer platform abstraction on Kubernetes.
- Edge Kubernetes and NFV/SDN background.
- Active participation in the Kubernetes community.
Seniority level
Mid-Senior level
Employment type
Job function
Information Technology
Industries
IT Services and IT Consulting
Benefits (inferred from the description for this job):
Medical insurance
Vision insurance
401(k)