Experimentation · Shipped · dentsu Benelux

Shipping reliable A/B tests for enterprise clients across two years

Between 2022 and 2025, I was the engineer responsible for implementing A/B and multivariate test scripts for major dentsu clients — airlines, telcos, luxury brands. The hypotheses came from strategists. My job was to make them run perfectly in production.

Note

Due to the nature of client work and NDAs, no visual assets are shown for this project.

Role

Mid-Level Frontend Dev

Team

dentsu Benelux

Timeline

Sep 2022 to Jan 2025

Clients

Transavia
Odido
Audemars Piguet

The context

Experimentation at agency scale

dentsu runs conversion optimisation programmes for clients across retail, travel, luxury, and telecoms. The strategy and hypothesis work is owned by CRO specialists and account teams. The engineering side — implementing the tests without breaking production — is a discipline of its own.

I was the developer turning approved experiment briefs into tested, stable JavaScript that manipulated live pages for split audiences, tracked events accurately, and never caused a visible flash of original content before the variant rendered. For high-stakes clients, a flicker or a data-tracking mistake isn't just a bug; it's a client escalation.

Selected clients

Enterprise scale, different constraints

Transavia — Flight booking flow optimisation. Tests focused on reducing drop-off at key funnel steps. Working with a tightly audited booking engine meant every DOM manipulation had to be carefully scoped to avoid side effects on availability queries and payment flows.

Odido — The Netherlands' largest telecom after T-Mobile and Tele2 merged. Experiments ran across product pages and checkout flows. I won a dentsu internal marketing case competition for Odido — one of the few moments where I was invited into the strategy conversation, not just the implementation.

Audemars Piguet — A luxury brand where the experience standard is extremely high. Tests were low-friction and visually precise. The margin for "good enough" was very narrow.

What the work actually involved

Making someone else's hypothesis run flawlessly

Each experiment started with a written brief: the hypothesis, the target pages, the variant designs, the success metrics. My job was to read that brief and produce test code that was invisible to users who weren't in the experiment, reliable across browsers, and cleanly reversible when the test concluded.
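A minimal sketch of what "cleanly reversible" can look like, not client code: every change hands back its own undo step, and teardown replays those steps in reverse. The names `createVariant`, `apply`, and `revert` are hypothetical and not taken from any testing platform. A plain object stands in for a DOM element so the example runs outside a browser, and the button copy is made up for illustration.

```typescript
// Hypothetical helper, not part of any testing platform's API.
// Each change returns its own undo function; revert() replays them
// in reverse order so later changes are rolled back first.
type Undo = () => void;

function createVariant() {
  const undoStack: Undo[] = [];
  return {
    apply(change: () => Undo): void {
      undoStack.push(change());
    },
    revert(): void {
      for (let undo = undoStack.pop(); undo !== undefined; undo = undoStack.pop()) {
        undo();
      }
    },
  };
}

// Plain object standing in for a DOM element (illustrative values only).
const button = { text: "Book now", className: "cta" };

const variant = createVariant();

// Change 1: swap the label, remembering the original.
variant.apply(() => {
  const previous = button.text;
  button.text = "Choose your seat";
  return () => {
    button.text = previous;
  };
});

// Change 2: add a variant class, remembering the original.
variant.apply(() => {
  const previous = button.className;
  button.className = "cta cta--variant";
  return () => {
    button.className = previous;
  };
});
```

Calling `variant.revert()` afterwards restores both the original label and the original class. Keeping the undo logic next to each change means a test can be torn down without anyone having to remember what it touched.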

Common challenges: preventing flicker on test pages (the flash of original content, or FOOC, before the variant renders), handling SPAs where the DOM mutated asynchronously, aligning event tracking with the analytics team's data layer schema, and coordinating with QA on devices we didn't control.
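To make the tracking point concrete, here is a small sketch under stated assumptions. In an SPA the same variant code can run again on every re-render, so an exposure event should reach the data layer only once per experiment. The event shape, the field names, and the `trackExposure` function are illustrative and not any client's schema. In a browser the array passed in would typically be the page's data layer (for example `window.dataLayer` under Google Tag Manager), which is also an assumption about the setup.

```typescript
// Assumed event shape; the field names are illustrative, not a client's schema.
type ExposureEvent = {
  event: "experiment_exposure";
  experiment_id: string;
  variant_id: string;
};

type DataLayer = Array<Record<string, unknown>>;

// Pushes at most one exposure event per experiment.
// Returns true if an event was pushed, false if it was a duplicate.
function trackExposure(dataLayer: DataLayer, experimentId: string, variantId: string): boolean {
  const alreadyTracked = dataLayer.some(
    (entry) => entry["event"] === "experiment_exposure" && entry["experiment_id"] === experimentId,
  );
  if (alreadyTracked) {
    return false;
  }
  const payload: ExposureEvent = {
    event: "experiment_exposure",
    experiment_id: experimentId,
    variant_id: variantId,
  };
  dataLayer.push(payload);
  return true;
}

// Simulate an SPA re-rendering the same view twice.
const layer: DataLayer = [];
trackExposure(layer, "exp-checkout-cta", "B");
trackExposure(layer, "exp-checkout-cta", "B");
console.log(layer.length); // 1
```

Typing the payload means a misspelled field fails at compile time instead of surfacing later as a gap in the analytics data.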

I also handled client communication on some engagements — mainly around implementation questions, rollout timelines, and explaining technical constraints in non-technical terms.

Recognition

Odido win and a Northstar nomination

During the Odido engagement, dentsu ran an internal case competition. I entered with a marketing strategy and execution concept for Odido — and won. It was outside the typical scope of my role, and it showed me I could hold the bigger picture when given the space to.

During this period I was also nominated for the dentsu Northstar Awards, an internal recognition programme for exceptional work, for a separate social impact project.

What I take from this

Execution discipline and a sharper eye for quality

Two years of shipping experiments for clients who notice everything trained something specific in me: a real intolerance for sloppiness. When the code runs on a live booking funnel for 50% of visitors, "it works on my machine" is not a sufficient bar.

I'm not claiming to have designed these experiments. The hypotheses weren't mine. But turning someone else's idea into something that runs reliably, cleanly, and correctly in production — at enterprise scale — is a skill, and I built it here.

Case evidence

Structured testing, measurable outcomes

These artifacts represent practical experimentation delivery: hypothesis framing, controlled variant planning, and outcome interpretation for roadmap decisions.

Hypothesis board with assumptions and success metrics: hypotheses were paired with measurable criteria before implementation.
Variant matrix across messaging, layout, and audience axes: variant matrices reduced ambiguity and improved test consistency.
Outcome readout comparing baseline and variant trends over time: readouts emphasized trend confidence over vanity lifts.
  • Primary deliverable: stable test implementations across enterprise websites.
  • Technical emphasis: low-flicker DOM updates, tracking alignment, and rollout safety.
  • Business impact: clearer signals for prioritization and optimization planning.