From the Customer's Side of the Table: What a Forward Deployed Engineer Actually Does

Dominik Römer

8 min read

FDE engagements at ellamind tend to look the same at the start. A customer is running a serious evaluation on a production AI system and wants the ellamind team in the room while they shape it. Take a customer-service chatbot at a large German statutory health insurer about to ship to a new department, where the team wants confidence in how it will behave before going live. Evaluations like this don’t fit a template. The test data lives across multiple systems. The quality criteria are written in regulatory language that needs translating into yes/no questions. The compliance reviewers need to weigh in without writing code. These are normal constraints in regulated AI work, and shaping the platform to fit them is what an FDE does.

The work over the next few weeks is roughly this: defining what needs to be evaluated, importing the team’s data into the platform, writing the first set of criteria with their domain experts, and running experiments side by side. By the end the team has running evaluations they trust, criteria their compliance reviewers signed off on, and a clear view of where the bot is failing and why, without anyone writing code.

My title is Forward Deployed Engineer. The job is to make elluminate fit each customer’s specific reality so they get real value from it.

This post covers what an FDE actually does at ellamind, and why elluminate has one.

What is a Forward Deployed Engineer?

The term was popularized by Palantir and has since become standard for high-touch enterprise software companies. The short version: it’s an engineer who lives close to a customer. We help integrate and tune the product, run hands-on workshops, scope use cases, and carry what we see back into the product organization.

It’s a hybrid role. Part engineer, part product manager, part sales engineer, part diplomat. Most of the time I’m not writing production code; I’m helping someone else’s domain expert set up evaluations for their AI system, reproducing a bug they don’t have the vocabulary to file, or sitting in a workshop where the customer is still figuring out what they want to evaluate.

When the role works well, two things compress that are normally far apart: the distance from a customer’s reality to our backlog, and the distance from our backlog to their hands.

Why elluminate needs this role

elluminate is the decision layer for reliable AI. We help teams move AI development from intuition to evidence: defining criteria, running experiments, comparing models, and putting quality gates in front of production.

That sounds clean in a slide deck. In practice, AI evaluation is intrinsically domain-specific. There is no off-the-shelf eval suite for a German private health insurer’s claims-processing agent, a public-sector RAG bot answering benefits questions, or an industrial reasoning agent that has to interpret regulated fee schedules. Every customer arrives with the same general problem (“we need to know whether our AI works”) and a completely different concrete one (“…on these 80 cases, against these criteria, judged by this model, with these compliance constraints”).

Figure: Two paths compared. The traditional path passes a customer's pain through customer success, product, engineering, build, and a demo back over a quarterly cycle, with handoffs that lose signal. The FDE path has one person scope, build, deploy, and onboard with no handoffs.

That gap between “the platform has the capability” and “this customer is now getting value from it” is precisely where adoption usually dies. 46% of AI proofs of concept never reach production (Gartner, 2024). 70% of organizations have moved fewer than 30% of their GenAI experiments past the experiment stage (Deloitte, 2024). Bridging that gap is not a documentation problem. It is a relationship problem dressed as a technical one.

That’s the gap I live in. The work breaks into three kinds. Most of my time goes into helping customers onboard onto the platform and implement it for their use case. Sitting next to them as they work generates the feature requirements I carry back to engineering. And once the platform is in active use, I run workshops to spotlight what’s already happening and find new opportunities for the next round.

Figure: Three modes mapped to the engagement lifecycle. Phase 1, Onboarding: introduce elluminate, import data, define criteria, run experiments, interpret findings. Phase 2, Feature loop: watch usage, spot signals, scope requests, build proofs of concept, ship improvements. Phase 3, Workshops: spotlight existing use cases, bring teams onto a shared picture, find new opportunities, scope expansion.

1. Onboarding: from “platform deployed” to “insights extracted”

The hardest part of any platform isn’t installing it. It’s making it part of someone’s daily workflow. Adoption rarely fails at the technical layer; it fails in the gap between “we have access to elluminate” and “our domain experts open it on Monday morning without being asked.”

That’s where a lot of my hours go. I’ll be on a screenshare with an AI engineer importing their first dataset, sometimes from a Langfuse export, sometimes from a CSV of production traces, sometimes from a Notion page of test cases their compliance team wrote in prose. We’ll define the first set of binary yes/no criteria together, hook up the right judge model for their domain, run their first experiment, and read the comparison view side by side.
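For readers who haven’t worked with binary criteria before, here is a minimal sketch of the idea in Python. It is not elluminate’s API: the names Criterion, pass_rates, and toy_judge are hypothetical, and the keyword-matching judge is a stand-in for a real judge model. It only shows how yes/no answers per criterion roll up into pass rates that can be compared across two versions of a bot.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    name: str
    question: str  # phrased so the only valid answers are yes or no


# A judge answers one yes/no question about one response.
# In practice this would be an LLM call; here it is just a callable.
Judge = Callable[[str, str], bool]


def pass_rates(
    responses: list[str], criteria: list[Criterion], judge: Judge
) -> dict[str, float]:
    """Fraction of responses for which each criterion was answered 'yes'."""
    if not responses:
        raise ValueError("need at least one response")
    return {
        c.name: sum(judge(c.question, r) for r in responses) / len(responses)
        for c in criteria
    }


# Hypothetical criteria and a toy keyword judge (assumption: illustration only).
CRITERIA = [
    Criterion("contact", "Does the answer name a contact channel?"),
    Criterion("form", "Does the answer mention the required form?"),
]
KEYWORDS = {
    "Does the answer name a contact channel?": "hotline",
    "Does the answer mention the required form?": "form",
}


def toy_judge(question: str, response: str) -> bool:
    """Answer 'yes' if the keyword tied to the question appears in the response."""
    return KEYWORDS[question] in response.lower()


version_one = ["Please call our hotline.", "Fill in the form and call our hotline."]
version_two = ["Fill in the form.", "I cannot help."]

rates_v1 = pass_rates(version_one, CRITERIA, toy_judge)
rates_v2 = pass_rates(version_two, CRITERIA, toy_judge)
print(rates_v1)  # {'contact': 1.0, 'form': 0.5}
print(rates_v2)  # {'contact': 0.0, 'form': 0.5}
```

Because every answer is a plain yes or no, a reviewer who never touches the code can still read the result: version two fails the “contact” criterion on every case.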

The moment that matters isn’t the import or the first run. It’s a non-technical reviewer (a compliance officer, a clinical reviewer, a customer-service team lead) opening elluminate for the first time and saying something like “now I see why version two is failing on these cases.” That’s the moment evaluation becomes a habit instead of a project. It’s also the moment a customer stops being a logo on our slide deck and starts being a partner.

The other half of onboarding is meeting customers where they already are. They come to elluminate with existing tooling: a Langfuse instance, an MLflow setup, a homegrown logging pipeline, a chatbot platform of their own choosing. Part of the FDE job is connecting elluminate to those systems instead of asking them to throw the systems away. We import data they already collected, configure connectors to platforms they already run, and make sure their first weeks on elluminate look like “the tools we had, plus evaluation on top,” not “rebuild your stack.” The faster they get value without rebuilding, the faster they trust the platform with the next use case.

2. Carrying feature requirements back to product

Most of the feature requirements that shape elluminate’s roadmap come from sitting next to a customer during onboarding. Some arrive as explicit requests. Most do not. They look like a moment of friction I watch happen in real time, or an offhand question that turns out to be the third version of the same question from a different customer.

Persona and multi-turn evaluation

The most consequential feature requirement I carried back this year started in a one-on-one conversation with a developer at a large German statutory health insurer. He was using elluminate in a way I didn’t expect, treating it as if it could simulate full conversations when the platform at the time only supported single-turn evaluation. Instead of correcting him, I asked enough questions to understand what he was really trying to do.

What he needed was to evaluate a customer-service chatbot the way it actually behaves in production: not as a series of isolated single-turn responses, but as conversations. A user clarifies, contradicts, asks a follow-up. A bot that aces single questions can still derail a conversation.

I took that back into the team and iterated with our dev team and with the customer’s domain experts to write the requirements for a persona and multi-turn evaluation feature. We scoped what a good “simulated persona” looks like (the irritated user, the confused user, the user who keeps switching topics, the malicious user) and defined it in a way that let their domain experts contribute personas without writing code. Then I built the proof of concept, demoed it back to the customer, brought it live on their elluminate instance, and ran the onboarding call.

The whole arc, from a single conversation about how a user was actually using the platform to a live feature on their instance, compressed into a handful of cycles because the same person ran it end to end. The customer didn’t have to translate their requirement to a product manager, who would translate it to an engineer, who would later demo it back to someone who’d never been in the original room. Persona-driven, multi-turn evaluation is now a first-class capability in elluminate, and the design started with one customer’s actual usage, not a roadmap brainstorm.
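To make the difference between single-turn and multi-turn evaluation concrete, here is a small sketch of a persona-driven conversation loop. It is an illustration under stated assumptions, not how elluminate implements the feature: Persona, simulate, scripted_user, and echo_bot are hypothetical names, and the scripted user stands in for what would normally be a model playing the persona.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Persona:
    name: str
    description: str  # plain-language brief a domain expert could write


Turn = tuple[str, str]  # (speaker, text); speaker is "user" or "bot"
UserSimulator = Callable[[Persona, list[Turn]], str]
Bot = Callable[[list[Turn]], str]


def simulate(
    persona: Persona, user_sim: UserSimulator, bot: Bot, max_turns: int = 3
) -> list[Turn]:
    """Alternate simulated user messages and bot replies, returning the transcript."""
    history: list[Turn] = []
    for _ in range(max_turns):
        history.append(("user", user_sim(persona, history)))
        history.append(("bot", bot(history)))
    return history


# Toy stand-ins (assumption: a real setup would call models here).
SCRIPT = [
    "Where is my reimbursement?",
    "No, I meant the dental claim.",
    "Actually, how do I change my address?",
]


def scripted_user(persona: Persona, history: list[Turn]) -> str:
    """Pick the next scripted line based on how many user turns came before."""
    turn_index = sum(1 for speaker, _ in history if speaker == "user")
    return SCRIPT[turn_index % len(SCRIPT)]


def echo_bot(history: list[Turn]) -> str:
    """Reply by repeating the latest user message."""
    return f"You asked: {history[-1][1]}"


topic_switcher = Persona("topic switcher", "Keeps changing the subject mid-conversation.")
transcript = simulate(topic_switcher, scripted_user, echo_bot, max_turns=3)
for speaker, text in transcript:
    print(f"{speaker}: {text}")
```

The whole transcript, not a single reply, is then what the yes/no criteria get applied to, which is how a clarification or contradiction in turn two can show up as a failure.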

Frontend improvements discovered during onboarding

The other category of feature requirement doesn’t arrive as a feature request at all. It arrives as a moment of friction I watch happen in real time, sitting on a screenshare during an onboarding session.

I was running an onboarding session with a developer at one of our customers. He was trying to set up his experiments and kept hitting limitations in the elluminate frontend that made the workflow more painful than it needed to be. He didn’t file any tickets. He just kept working around them. I wrote down notes during the session and used them as the input for the next round of frontend improvements, making the experiment setup flow more usable for the kind of work he was actually doing.

The most valuable feedback is the kind a customer doesn’t know to give you.

That principle keeps surfacing. Two enterprise health-insurance customers, independently, asked whether the judge model could write its reasoning in German rather than English, partly for review by non-technical reviewers, partly for documentation aligned with EU AI Act Article 17. After the second ask it became a tracked product item, not a Slack message decaying in a thread. An organization-admin permission gap that meant admins couldn’t see their settings page unless they were also on a project: reported on a Friday, fixed by Monday. A project-admin error message that was technically correct but useless to the user: closed within a day. A connector to a customer’s existing chatbot platform that needed a toggle so evaluation calls didn’t pollute their production monitoring stats: shipped in two days.

None of these are large features. All of them are the difference between a customer who renews and a customer who quietly stops logging in. And none of them would have been filed at all if I hadn’t been sitting next to the person who hit them. Customers don’t file tickets for the friction they’ve learned to live with. You only catch it by watching them work, which is exactly what onboarding is.

3. Workshops: spotlighting existing use cases, finding new ones

Once a customer is actively using elluminate, the work shifts. Workshops become the format where we step back and look at the whole engagement instead of one experiment at a time. I sit down with the customer’s AI team and their domain experts and try to do two things at once.

Make their existing use cases visible. A surprising number of evaluation projects inside an enterprise are invisible to the rest of that enterprise. One team has built a careful test set for their chatbot. Another team is wrestling with the same problem on a different bot and doesn’t know the first team exists. The workshop is sometimes the first time those two teams actually talk. We walk through what’s already on the platform, which experiments, which criteria, which results, and turn it into a shared picture of “here’s where we already use this, here’s what’s working, here’s where we got stuck.”

Find new opportunities together. Once the existing work is in the room, the next question, “where else does this hurt?”, opens up. At one large healthcare IT provider, the engagement had started around a knowledge-base assistant for caretakers. The workshop surfaced a separate question their leadership had been quietly worrying about: should the LLM-as-a-Service offering they resell to their own customers come with a built-in evaluation layer? That conversation became its own opportunity, scoped weeks after we’d walked in to talk about something else.

Workshops aren’t sales meetings. They’re the place where the customer realizes how much more they could be doing with the platform, and where I learn what we should be building next.

What scales this: the Forward Deployment Engine

Doing all of this for one customer is doable. Doing it for ten in parallel without losing signals is not. That’s why we built our own internal Forward Deployment Engine, an in-house framework that streamlines customer-facing work across Slack, Notion, Linear, and our codebase. Customer emails route to handlers that triage the signal, capture the interaction, classify feature request vs. bug, link it to the customer in our CRM, and create the Linear issue with the right labels and context. The role is the human in the loop; the engine is what makes the loop fast.
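The triage step can be pictured roughly like the sketch below. This is not our actual engine and it calls no real Slack, Notion, Linear, or CRM API: Signal, classify, to_issue, and the keyword list are hypothetical, and the keyword rule is a deliberately crude placeholder for the real classification. It only shows the shape of the flow from an incoming message to a labeled issue payload tied to a customer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    customer: str  # identifier used to link the signal to a customer record
    text: str      # the raw message as it arrived


# Assumption: a crude keyword rule standing in for real classification.
BUG_HINTS = ("error", "broken", "crash", "cannot", "can't")


def classify(signal: Signal) -> str:
    """Label a signal as 'bug' or 'feature-request' from simple keyword hints."""
    lowered = signal.text.lower()
    return "bug" if any(hint in lowered for hint in BUG_HINTS) else "feature-request"


def to_issue(signal: Signal) -> dict:
    """Build an issue payload with a short title, labels, and full context."""
    return {
        "title": signal.text[:60],
        "labels": [classify(signal), f"customer:{signal.customer}"],
        "description": signal.text,
    }


bug_issue = to_issue(Signal("acme", "Settings page shows an error for org admins"))
ask_issue = to_issue(Signal("acme", "Could the judge write its reasoning in German?"))
print(bug_issue["labels"])  # ['bug', 'customer:acme']
print(ask_issue["labels"])  # ['feature-request', 'customer:acme']
```

The customer label is the part that matters most in practice: it is what lets a second, independent ask for the same thing be recognized as a pattern instead of a one-off.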

What I’ve learned in the role

A few things I’d tell anyone else taking on this kind of work.

Integration into the customer’s use case is the job, not the feature list.

The most valuable thing I do for a customer is not running a demo or shipping a new capability. It’s making elluminate fit their actual workflow well enough that the team starts to rely on it. The relationship and the renewals follow from that, not from the feature list.

The unit of progress is not “feature shipped.” It’s “customer succeeded.” A feature that lands but doesn’t change how a customer works is a metric, not a result. Calibrate against the latter.

The questions a customer doesn’t know to ask are the most important ones. The visible asks are easy; they come over email. The invisible ones come from sitting next to them while they work. Schedule the screenshare.

Translate, don’t transcribe. A customer says “the judge model is too harsh on our chatbot.” That is not a feature request. The underlying ask might be “we want grading criteria that match how a human reviewer in our department would actually score this.” Rewrite the problem in the customer’s domain language before it ever becomes a ticket. As Gerriet has written, the domain expert and the engineer need to be at the same table when criteria get defined; the FDE is often the person who puts them there.

Most of the artifact is not code. The most leveraged thing I write in a week is usually not a pull request. It’s the rephrased problem statement, the workshop summary, or the one-pager that says “two customers asked for X this month, this is now a thing.”

Why this matters for elluminate

elluminate is the decision layer for reliable AI. That positioning is meaningless unless it is true for this customer, with this domain, under this compliance regime. The FDE role exists because the only way to make it true is to send someone into the customer’s reality and have them come back with what the product needs to do next.

When that loop closes fast, customers stop treating us like another vendor and start treating us like part of their team. That is the only kind of relationship that compounds.

It’s also the most honest version of “we listen to our customers” a software company can offer. Not a survey. Not a roadmap webinar. The same person at the demo, in the onboarding session, and in the pull request.


Want to see what an FDE-led engagement looks like? We help teams move from “we have elluminate” to “elluminate is part of how we ship AI.” If you’re rolling out evaluation at your company and want someone in the room with you while you do it, let’s talk. I might be the one running it.
