Senior MLOps / ML Infrastructure Engineer
About the Company
Our client is a Series B, venture-backed deep-tech company building a Physics AI platform that helps engineering teams bring products to market faster, reduce development risk, and explore better designs with greater confidence. The platform combines large-scale simulation data with modern machine learning to generate high-fidelity predictions of physical behavior in near real time.
Customers include leading organizations across aerospace, automotive, and advanced manufacturing, working on some of the most demanding real-world engineering problems.
The Role
This role focuses on building and operating the infrastructure that powers physics-based AI systems at scale. The position enables ML engineers and scientists to train, track, deploy, and monitor models reliably without managing low-level infrastructure. The work sits at the intersection of ML systems, cloud infrastructure, and large-scale simulation data, with a strong emphasis on performance, reliability, and developer productivity. It is a hands-on engineering role in a fast-moving, in-office environment, working closely with ML researchers, platform engineers, and product teams.
What You’ll Do
  • Design, build, and maintain robust MLOps infrastructure supporting the full ML lifecycle, from experimentation and training through to production deployment and monitoring
  • Implement automated training pipelines, experiment tracking, and model lifecycle management using tools such as Kubeflow, MLflow, and Argo Workflows
  • Develop scalable data pipelines capable of handling large volumes of unstructured data, particularly 3D geometric data and physics simulation outputs
  • Deploy machine learning models into production inference systems with strong standards for performance, reliability, and observability
  • Manage model registries and integrate them with CI/CD workflows to support consistent and reliable model releases
  • Implement monitoring systems that continuously track model health and performance in production
  • Collaborate closely with ML researchers, platform engineers, and product teams to evolve the infrastructure platform for physics-based AI applications
  • Write production-grade code and optimize cloud infrastructure built on Docker and Kubernetes, primarily on Google Cloud Platform, making thoughtful trade-offs among scalability, cost, and operational simplicity
What We’re Looking For
  • Bachelor’s degree or higher in Computer Science, Data Science, Applied Mathematics, or a closely related field
  • 5 years of industry experience building MLOps platforms or ML systems in production environments
  • Strong proficiency in Python, with working knowledge of Bash and SQL
  • Hands-on experience with cloud infrastructure such as GCP, AWS, or Azure
  • Experience with containerization and orchestration tools including Docker and Kubernetes
  • Familiarity with modern MLOps frameworks such as Kubeflow, MLflow, and Argo Workflows
  • Experience building and maintaining scalable data pipelines, ideally working with unstructured or high-dimensional data
  • Ability to independently deploy models and implement monitored inference systems in production
  • Comfort troubleshooting complex distributed systems and building reliable infrastructure that other teams depend on
Nice to Have
  • Interest in physics simulation, scientific computing, or HPC environments
  • Experience building production MLOps platforms in deep-tech or simulation-heavy environments
  • Familiarity with additional programming languages such as Go or C
Working Style and Culture
This role suits someone who enjoys startup environments, learns quickly, and communicates clearly across disciplines. The team works on-site five days a week and values close collaboration, fast feedback loops, and hands-on problem solving. There is a strong belief that great infrastructure should be largely invisible, enabling engineers and scientists to move faster without friction.