Staff Machine Learning Engineer, GenAI Platform
Reddit is a community of communities. It’s built on shared interests, passion, and trust, and is home to the most open and authentic conversations on the internet.
What this role actually needs.
Staff Machine Learning Engineer, GenAI Platform at Reddit. Location: Remote (United States).
Day-to-day expectations
A clear list of the work this role is designed to cover.
- Drive GenAI Infrastructure Strategy: Propose, design, and lead the architecture of our next-generation LLM platform, significantly advancing our capabilities to support large-scale foundation models that serve millions of redditors.
- Design Resilient, Large-Scale Distributed Systems: Architect highly fault-tolerant training infrastructure capable of supporting multi-week, distributed workloads across massive GPU clusters. You will tackle challenges related to automated recovery, cluster-scale health monitoring, and advanced checkpointing to ensure optimal compute efficiency.
- Build Self-Serve LLM Workflows: Design and implement robust, production-grade pipelines for LLM fine-tuning (e.g., SFT, RLHF/DPO). You will abstract away the complexity of distributed training frameworks, integrating them into a seamless platform SDK that handles configuration, experiment tracking, and model lifecycle management.
- Develop Comprehensive Evaluation & Benchmarking Infrastructure: Treat model evaluation as a first-class platform capability. You will build scalable systems for automated regression detection, structured metrics tracking, and complex inference-heavy evaluation patterns to ensure the quality and safety of models before they hit production.
- Architect Advanced Data Ingestion Pipelines: Extend our distributed data platforms to natively and efficiently handle the massive, multimodal datasets (text, image, video) required for modern GenAI workloads, optimizing for throughput and dynamic batching.
- Provide Technical Leadership & Mentorship: Analyze complex bottlenecks in distributed systems to optimize for performance and cost-efficiency. Mentor senior engineers, champion a rigorous MLOps culture, and partner with cross-functional leadership to define technical roadmaps and de-risk major initiatives.
Why people would want this job
- Comprehensive Healthcare Benefits and Income Replacement Programs
- 401(k) with Employer Match
- Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
- Family Planning Support
- Gender-Affirming Care
- Mental Health & Coaching Benefits
Turn this listing into an application plan.
Next moves
- Tailor your resume around AI and LLM work instead of sending a generic application.
- Use the first two bullets of your application to connect your background directly to the Staff Machine Learning Engineer, GenAI Platform role. It is a remote role based in the United States and is most realistic for United States residents.
- Open the role quickly if it fits and bookmark three similar jobs before you leave the page.
Watchouts
- Compensation is hidden, so get range clarity in the first recruiter conversation.
- State that you are a United States resident as part of your positioning so the recruiter does not have to infer it.
- Lead with distributed collaboration, async delivery, and timezone discipline.