As a Senior AI Reliability Engineer, you will play a critical role in ensuring the operational excellence, scalability, and performance of AI-powered platforms and services at T-Mobile. This role requires strong SRE fundamentals, experience managing LLM-based services and APIs, and the ability to drive observability and reliability for GenAI systems across cloud environments. We pride ourselves on encouraging a culture of innovation, advocating for agile methodologies, and promoting transparency in all that we do. Join us in embodying the spirit of the 'Un-carrier' and make a tangible impact! Our team is dynamic, no two days are the same, and we are a diverse and inclusive group passionate about growth.

Job Responsibilities:
Implement observability tools, dashboards, and SLO frameworks for LLM-based services and inference pipelines.
Monitor and improve the health, latency, and throughput of AI infrastructure in multi-cloud (primarily Azure) and hybrid environments.
Manage incident detection, ...