Job Details

Sr. Site Reliability Engineer

Posted: 2025-07-14 | Company: Talent Groups | Location: McKinney, TX
Description:

Senior Site Reliability Engineer (Contract to Hire)

Location: McKinney, TX (Hybrid, 2–3 days onsite)

Must be authorized to work in the U.S.

Overview:

Our client is seeking a Senior Site Reliability Engineer to lead platform reliability and traffic enforcement in a Kubernetes-hosted SASE (Secure Access Service Edge) environment. This role ensures high availability, observability, and fair multi-tenant traffic handling across distributed systems.

Key Responsibilities:

Platform Reliability & Operations

  • Own uptime (target: 99.99%) and stability of multi-region Kubernetes environments.
  • Architect resilient, scalable infrastructure with proactive capacity planning and automated remediation.
  • Lead incident response, root cause analysis, disaster recovery, and change management.

Observability & Monitoring

  • Build a full-stack observability pipeline (Prometheus, OpenTelemetry, Grafana, etc.).
  • Implement golden signals, tracing, and alerting to drive real-time performance insights.
  • Develop automation for issue detection and resolution.
  • Integrate and optimize OpenStack-based infrastructure beneath Kubernetes.
  • Enforce security compliance, resource efficiency, and FinOps best practices.

Traffic Enforcement & Networking

  • Design a Kubernetes-native traffic control layer for per-tenant/session enforcement.
  • Implement CRDs, custom controllers, and service mesh (e.g., Istio, Linkerd) for dynamic policy management.
  • Operate SDN telemetry agents (Cilium Hubble, WireGuard) and integrate them with the observability stack.
  • Contribute to infrastructure architecture and reliability strategy.
  • Mentor team members and promote Kubernetes best practices.
  • Partner cross-functionally across engineering, security, and product teams.

Required Skills:

  • Kubernetes in production across multi-region architectures.
  • Observability tools: Prometheus, OpenTelemetry, Grafana, Jaeger, Loki.
  • Programming: Go (preferred), Python/Bash scripting.
  • Familiarity with OpenStack (Nova, Neutron, Ceph) and CNI (Cilium preferred).

Preferred Experience:

  • Developer platform abstraction on Kubernetes.
  • Edge Kubernetes and NFV/SDN background.
  • Active participation in the Kubernetes community.

Seniority level: Mid-Senior level

Employment type: Contract

Job function: Information Technology

Industries: IT Services and IT Consulting

Benefits (inferred from the job description):

  • Medical insurance
  • Vision insurance
  • 401(k)
