Jacob Graham


Jacob specialises in relationship-driven recruitment focused on MLOps, Infrastructure, and Engineering across the DACH region for DeepRec.

He began his recruitment career in CRM, placing the full spectrum of Salesforce specialists with partners, ISVs, and end users across multiple geographies. Jacob then transitioned into data recruitment, delivering niche SME hires and high-volume contract placements, primarily within investment banking.

Today, he combines market intelligence with targeted headhunting, systematically mapping the market to understand who’s building, hiring, and advancing ML infrastructure. 

Every introduction is intentional, informed, and backed by data.

JOBS FROM JACOB

Baden-Württemberg, Germany
LLM Performance Engineer
Remote with quarterly in-person engineering workshops | €110,000

The work
Most ML engineers never see what actually happens on the GPU. They train models, call an inference API, and trust the framework. If you have ever opened Nsight or Torch Profiler, followed a request through kernel launches and communication calls, and wondered why half the GPU time disappears into overhead, this work will feel very familiar.

The problem
Large language models behave very differently in production than they do in benchmarks. Token generation patterns change. Prefill and decode phases behave unpredictably. Communication overhead quietly kills throughput. Schedulers make decisions based on incomplete information. Most infrastructure platforms cannot see any of this, so they optimise the wrong things. Your work changes that.

What you will actually build
You will make the entire LLM execution path observable, from the moment a request hits the system to the moment CUDA kernels execute on the GPU. That means generating traces that capture:
- token-level model behaviour
- kernel launches and GPU utilisation
- runtime scheduling decisions
- memory movement and communication between GPUs

You will use those traces to answer questions like: Why is a GPU only 55% utilised? Where does latency appear between prefill and decode? Why does a supposedly optimised attention kernel stall under load? Then you turn those answers into improvements: better kernel behaviour, better runtime execution, better scheduling decisions across GPU fleets. The results show up in real numbers: higher GPU utilisation, lower latency and more throughput on production workloads.

Why this work is different
Most ML roles sit above the framework layer. This one sits underneath it. You will spend your time inside PyTorch execution paths, CUDA behaviour, inference runtimes and distributed communication. The interesting problems live in the gaps between those layers. The systems you work on also run at meaningful scale: clusters range from small internal deployments to environments with tens of thousands of GPUs. Performance improvements do not save milliseconds. They change how large fleets of hardware are used.

The environment
A small engineering team of around sixty people, with no layers of product managers translating problems for you. Engineers talk directly to each other and to the system. Work is fully remote, with occasional engineering sessions in Heidelberg focused on deep technical work rather than company rituals. Performance improvements are measured, validated and shipped to production systems used by paying customers.

You will likely enjoy this if
You like profiling GPU workloads. You have dug into CUDA kernels, PyTorch internals or distributed training behaviour to understand why something performs poorly. You prefer investigating real systems over building ML features or training models. You care more about how models run than about how they are trained.
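For a feel of the tracing described above, here is a minimal, illustrative sketch using PyTorch's torch.profiler. The toy transformer layer and settings are assumptions for demonstration only, not this team's actual stack:

```python
# Minimal, illustrative profiling sketch (assumes a CUDA-capable machine).
# The toy model stands in for a real inference workload.
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8).cuda().eval()
x = torch.randn(32, 64, 512, device="cuda")  # (seq, batch, d_model)

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    record_shapes=True,
) as prof:
    with torch.no_grad():
        for _ in range(10):
            model(x)

# Rank kernels by GPU time to see where utilisation actually goes.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))

# Export a Chrome trace to inspect kernel launch gaps and overhead visually.
prof.export_chrome_trace("trace.json")
```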
Stockholm, Sweden
Engineering Manager
73,000–93,000 SEK per month plus benefits | Hybrid – 3 days office / 2 days remote | Full-time

Most ML leadership jobs pull you away from the models. This one puts you in charge of them.

You will lead the generative audio systems that create music and sound effects for a global content platform used by millions of creators. The models already exist. The research direction is clear. What is needed now is someone who can own the entire system and push it into production at scale. You will guide how large diffusion models for music are trained, evaluated and deployed. Your decisions determine how these models evolve technically and how they run in real products where latency, stability and cost matter.

What you will build
You will help build systems that automatically adapt music to video, generate sound effects directly from visual input, and allow creators to produce soundtracks in seconds. A small team of five PhD-educated ML engineers and a contractor will rely on your technical direction while you shape how the technology moves from experimentation into production. You will work across the full machine learning lifecycle: training large generative models, defining evaluation strategies, making architectural decisions about inference, optimisation and deployment, and working closely with platform and MLOps engineers to ensure the systems run reliably in production.

Why this environment is different
The models are trained on a proprietary catalogue of licensed music and structured datasets created through a global network of artists who produce and remix tracks specifically for training. This produces a dataset most AI labs simply do not have. You will also work close to the research frontier, with collaborations involving groups connected to unicorn start-up labs and tier-one universities. The result is rare: frontier generative model work inside a stable, profitable company where the technology actually ships to users.

What you bring
- Deep experience training large machine learning models.
- Experience with generative models such as diffusion, audio models, vision models or large language models.
- Strong ML system design skills across training, evaluation and production deployment.
- Comfort guiding engineers and making architectural decisions that shape how ML systems evolve.
- Experience shipping ML systems where latency, reliability and cost matter.

Team and setup
You will lead a team of five PhD-educated engineers and one contractor working on generative audio systems. The team works closely with platform engineering, data infrastructure and MLOps to ensure models move from experimentation into production features.

Curious?
If you have trained large generative models before and want ownership of the entire system rather than a narrow piece of it, this will likely be interesting. Send a message or apply, and I can share more context.
Heidelberg, Baden-Württemberg, Germany
Senior Research Engineer
Generative AI | Germany, remote-first | €80,000–€100,000 | 2-year contract

This role sits inside a research-driven engineering team building real Generative AI systems that are meant to leave the lab and prove their value in the world. It is about building working GenAI agents, putting them in front of partners, stress-testing them, improving them and demonstrating that they solve meaningful problems. The domains range from public safety and social services to finance. The common thread is impact.

In the first six months, you would join an applied project where the goal is to prototype a GenAI agent and convince an external partner that it creates tangible value. You would work closely with a senior researcher, iterating quickly, shipping regular merge requests, refining features, spotting technical risks early and improving the system week by week. There is a strong emphasis on being able to explain what you built, both to technical peers and to non-technical stakeholders.

The environment is intentionally exploratory. New models, new agent frameworks, new tooling. If something promising appears, you are encouraged to test it. The team meets in person every Tuesday in Heidelberg, but beyond that there is flexibility. English is the working language.

You might be refining prompts and evaluation loops for LLM-based systems, experimenting with coding agents, shaping system architectures, or mapping out a lightweight roadmap for how a prototype could evolve into something commercial. You will be close to decision making, not buried in a narrow implementation silo.

Who we're looking for
- Practical experience with LLMs or GenAI since at least 2023, and comfort building in Python with proper version control.
- A Master's or PhD in Computer Science, AI or a related field fits well, though industry experience matters more than labels.
- Experience with coding agents such as Cursor or Codex is particularly interesting, as is familiarity with modern GenAI libraries and lightweight MLOps tooling.
- Just as important is adaptability: the technology moves fast, and so does the direction of applied projects.

The interview process is technical but practical. There is an initial technical conversation focused on engineering and GenAI fundamentals, followed by a motivational discussion, and then an in-person day that includes collaborative coding using AI coding agents. The coding session focuses more on how you think and structure a solution than on perfect syntax.

This role suits someone who enjoys building at the edge of what is currently possible with Generative AI, but who also cares whether the result genuinely improves something for real users. If this sounds interesting, please apply here and a member of the team will be in touch.
Spain
MLOps Engineer
Barcelona or San Sebastián, hybrid | Fixed-term contract until 30 June 2026 | €45,000–€55,000 salary
€3,000 sign-on bonus | €500 per month retention bonus | €2,000 relocation support | EU work authorisation required
Total bonus package available over the contract: up to €5,000, depending on start date.

You join one of Europe's most recognised deep-tech scale-ups. Backed by major global investors and strong EU support, they have built one of the most credible AI compression products in the market. This compression tool is already live with major enterprise clients. Now they need more engineers to help deploy, monitor and scale it properly.

Why apply?
You will work alongside highly technical quantum and AI engineers operating at a very high level. You will gain hands-on exposure to large-scale LLM deployment, distributed training and real-world cost optimisation. You will have a globally recognised deep-tech brand on your CV, working on AI efficiency at scale. That combination of compression, distributed systems and enterprise deployment opens doors across AI infrastructure, LLMOps and high-performance ML environments. You get flexible working hours: start early, start late, structure your day how you want. Hybrid setup in Barcelona or San Sebastián, with meaningful bonuses on top of base salary.

What you'll actually be doing
- Helping take compressed LLMs and get them deployed, monitored and running reliably for enterprise customers.
- Improving automation, reliability and cost efficiency across the ML lifecycle.
- Working closely with researchers and platform engineers to bridge research and production.

What you'll need
- Experience running LLMs in production.
- Comfort with the infrastructure around them: cloud, containers, CI/CD, Kubernetes, that sort of thing.
- An understanding of what it takes to keep ML systems stable, monitored and efficient once they're live.

If you've touched production LLM systems and the infra that supports them, this is likely relevant.