
User testing: why most companies test too late – and what you can learn from a button that cost 300 million dollars

Dec 17, 2025 · 12 minute read

⚡ Quick Answer

Most companies test the user experience too late, often only after the product is fully developed, which leads to costly mistakes. By running small, early user tests with around five users, you can find the majority of problems before major resources have been invested. This yields insights that improve the design, save time, and significantly increase revenue.


It was a Wednesday afternoon in March when Sarah, a senior designer at a Scandinavian fintech company, sat in a meeting and witnessed something strange. On the screen in front of her, a 34-year-old user – a person who perfectly matched their target audience – struggled with something that should have taken five seconds. After three minutes, he gave up.

The problem? A button.

Not just any button. A button that the design team had discussed for weeks, refined in Figma, A/B tested in two variants, and finally implemented after careful consideration. A button that looked professional. Modern. Clean.

But to the user, it looked like nothing at all. Gray against gray. Like a placeholder. Like something you couldn’t click on.

Six out of eight test subjects gave up at exactly the same spot. The company had spent four months building an onboarding process that 75% of users never completed. And no one had known about it – until Sarah booked eight users for an afternoon.

The cost of changing the button’s color? A couple of hours of development time. The value of the change? A 34% increase in completed signups. That translates, for this company, to about three million kronor in annual recurring revenue.

This is the story of why most companies don’t have a design problem. They have a timing problem.

The paradox of perfect design

When Frank Gehry designed the Guggenheim Museum in Bilbao, he first built hundreds of physical models. Small, primitive, ugly models made of cardboard and tape. He tested how light entered. Pathways. Angles of view. Long before the first shovel hit the ground, thousands of small decisions had already been made and discarded.

We intuitively understand this when it comes to buildings. An architect who starts building without models is either a genius or an idiot, and the odds say they are the latter.

But when it comes to digital design, something strange happens. Smart people – people who would never buy a couch without sitting on it first – build entire products based on assumptions. They discuss in conference rooms. They check benchmarks. They follow best practices. And then they launch.

And when it doesn’t work? “Users don’t understand how to use it.”

It’s like Frank Gehry blaming visitors for getting lost in his museum.

The strange math behind five users

In the early 90s, a man named Jakob Nielsen did something that would change how we think about user testing. He asked a simple question: how many users do you need to test with to find usability problems?

The answer turned out to follow a curve that most don’t expect. The first user reveals about a third of all problems. The second user finds many of the same problems – but also some new ones. The third, fourth, and fifth continue to find new things, but at a decreasing rate.

By five users, you’ve found about 85% of the major problems.
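Nielsen and Tom Landauer modeled this curve with a simple formula: if each user independently reveals a fixed share L of all problems (their data put L at roughly 0.31), then n users together find about 1 − (1 − L)^n of them. A minimal sketch in Python – the function name is illustrative, and the exact value of L varies between products:

```python
def problems_found(n, L=0.31):
    """Expected share of usability problems uncovered by n test users,
    assuming each user independently reveals a fraction L of them
    (L ≈ 0.31 in Nielsen and Landauer's data)."""
    return 1 - (1 - L) ** n

# Print the curve for the first eight users.
for n in range(1, 9):
    print(f"{n} users: {problems_found(n):.0%}")
```

One user lands at about 31%, and by the fifth user the model reaches roughly 85% – the figure quoted above. The diminishing returns are built into the exponent: each additional user can only find problems the previous users missed.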

This is counterintuitive. We are used to bigger always being better. More data points. More statistical significance. More certainty.

But usability problems are not normally distributed. They are binary. Either the user can complete the task or they cannot. And when five people get stuck at exactly the same spot – you don’t need number six to know something is wrong.

The fascinating part isn’t the math. It’s what it reveals: most companies don’t test too little, they test too late.

Five users, one afternoon, one prototype. That’s all it takes to avoid months of building in the wrong direction.

But most wait until the product is finished.

A button is never just a button

Let’s return to Sarah and her fintech company. After the user test, they did something that many companies don’t do: they asked “why?”

Why did the button look like a placeholder?

The design team had followed the company’s design system. Gray was their secondary color. The button was supposed to be neutral, not draw too much attention.

But to the user, the design system doesn’t exist. They don’t see “secondary color according to brand guidelines from Q2 2023”. They see a gray box that doesn’t look clickable.

This is the gap between intention and perception. And it only shows when real people interact with your product.

Malcolm Gladwell once wrote about how expertise can sometimes blind us. A sommelier can taste hundreds of nuances in a wine that an average person doesn’t even notice. But it also means that the sommelier has lost touch with how a wine tastes to someone who hasn’t drunk thousands of bottles.

Designers are like sommeliers. They see the interface through the lens of hundreds of other interfaces. They know conventions. They understand metaphors. They know that a hamburger icon means menu.

But the user’s grandmother? She sees three horizontal lines.

User testing is not a quality control. It’s a reality check.

The story of the button that wasn’t a button

In 2009, usability expert Jared Spool described a change a major online retailer had made – a change so small that most never noticed it. But that change increased the retailer’s revenue by an estimated 300 million dollars in the first year.

They removed a button.

More specifically: they removed the requirement to create an account before purchase. Instead, they added a guest checkout. One single change. One piece of friction removed.

How did they discover it? User testing.

They saw people filling their cart, clicking to checkout, seeing the account creation form – and closing the tab. Again and again. Not because they didn’t want to buy the product. But because they didn’t want to create yet another account.

This is what user testing reveals: people don’t do what they say. They don’t even necessarily do what they think. They do what feels easiest in the moment.

The uncomfortable truth about experts

There’s a paradox at the heart of product development: The more you know about your product, the less you understand how a new user experiences it.

It’s called “the curse of knowledge”, and it’s why Steve Krug – the man who literally wrote the book on usability – insists on testing with real users instead of colleagues.

When Dropbox first launched, they had a problem. People didn’t understand what the product did. “Sync files between devices” – that made no sense to the average user in 2008. Technical people understood it immediately. But the public? They asked, “why would I want my files on the internet?”

Dropbox’s solution was genius: they stopped explaining. Instead, they showed a video. A short, simple, funny video that showed exactly what happened when you put something in your Dropbox folder.

No tech jargon. No explanation of backend architecture. Just: put something here, get it there.

How did they know it was the right approach? They tested it. They saw that when users watched the video, they understood. Without the video – confusion.

This is the difference between thinking like an expert and understanding a beginner. And the only way there is by watching real beginners as they try to use what you’ve built.

What really happens when you observe someone?

The strange thing about user testing isn’t the data you collect. It’s what happens in the room when you watch.

Sarah from the fintech company described it this way: “After ten minutes, you stop defending the design. You stop thinking ‘but they are doing it wrong’. You start seeing what they actually see.”

It’s a form of empathy that can’t be faked. When you sit next to someone clicking, searching, scrolling, going back, trying again, giving up – you don’t just see what’s wrong. You feel it.

Designers often talk about “user empathy” as an abstract principle. Personas. Journey maps. Empathy maps. All these tools are useful. But they are like reading about cycling. You understand the concept. But you don’t learn balance until you actually sit on the bike.

User testing is when you sit on the bike.

The art of asking dumb questions

There’s a technique that experienced moderators use that feels completely wrong the first time you try it. When a user asks, “should I click here?” – you don’t answer.

Instead, you ask: “what do you think?”

It feels rude. Like you’re refusing to help. But it’s the only way to understand what’s actually happening in the user’s head.

Because if you say “yes, click there” – you’ve just ruined the test. Now you only know the button works when someone is told to click it. But that says nothing about whether a real user, alone at their computer, without you as a guide, would understand to click there.

The best moderators have developed a set of phrases that feel like they are helping – but actually throw the responsibility back:

“What would you do if I weren’t here?”

“Tell me what you’re thinking.”

“Interesting – why do you think that?”

This requires a special kind of comfort with silence. Most people hate silence in conversation. We want to fill it. Explain. Help.

But silence is where insights live. When a user sits silently for 30 seconds, frustrated, clicking around – it’s not an awkward situation to interrupt. It’s exactly the data you need.

Why small companies test better than large ones

There’s a pattern that recurs again and again: startups with three designers test more than enterprises with 50.

Why?

Large organizations have too much to lose. If you’ve spent six months and ten million building something – the psychological cost of hearing “this doesn’t work” is enormous.

So instead of testing early, when you still only have a prototype and nothing is locked – you test late. When it’s too hard to change. When you secretly hope the test will just confirm that everything is fine.

This is confirmation bias in its most expensive form.

Small companies have a different kind of pressure. They have no time. They need to build quickly. And paradoxically, that becomes their advantage. Because when you don’t have time to build wrong, you must know you’re building right.

In the beginning, Airbnb tested the booking flow with paper prototypes. Not because they were hipster-cool. But because they couldn’t afford to develop something that didn’t work.

Large companies say “we can’t afford to test”. Small companies realize “we can’t afford NOT to test”.

The awkward truth about A/B testing

Somewhere around 2010, A/B testing became a religion in the tech world. Google tested 41 shades of blue. Amazon tested everything. Data was king.

And data IS king. But data without context is just numbers.

An A/B test can tell you that a blue button converts 2.3% better than a green button. But it can’t tell you why. And more critically – it can’t tell you what you don’t know you should be testing.

Here’s the difference:

A/B test: “Which of these two buttons works better?” User test: “Why does no one see the button at all?”

The first optimizes. The second reveals.

Netflix is a master of A/B testing. But do you know what else they do? User testing. Lots of user testing. Because before you can A/B test which thumbnail gets the most clicks, you need to understand how users actually browse the interface.

The best product teams use both. User testing to understand the problem. A/B testing to validate the solution at scale.

What happens when you test with the “wrong” people

There’s a myth about user testing that refuses to die: that you must find “perfect” participants. Exactly the right demographics. Exactly the right behavior patterns.

But the reality is more nuanced.

Steve Krug tells the story of a test he ran for a travel booking site. They had planned to test with “frequent travelers who book at least six flights per year”.

First test participant: a 68-year-old woman who had never booked a flight online before.

The team’s first reaction? “She’s the wrong person for the test.”

But after ten minutes, they realized something. Every time she got stuck – it was at the same places where the “right” users would later get stuck. She just expressed her confusion louder. Clearer.

She wasn’t the wrong participant. She was an amplified version of the problems everyone had.

This doesn’t mean that recruitment doesn’t matter. If you’re building an app for cardiologists, you must test with cardiologists. But it means that perfect recruitment is often an excuse not to test at all.

Better to test with 80% right participants today than wait three weeks for 100% perfect participants.

The final theory: why we don’t test

After thousands of conversations with designers and product teams, there’s a pattern that repeats. When people say “we don’t have time to test” – it’s almost never the whole truth.

The real reason is more uncomfortable: We are afraid of what we will hear.

It’s scary to watch someone struggle with something you’ve built. It’s exposing. Like asking someone to read your diary.

But this is precisely why it’s so valuable. Because either users will struggle while you watch – and then you can fix it. Or they will struggle when you’re not watching – and then you lose them forever.

Sarah and her team ran user tests every six weeks after that first session. Not because it was glamorous. But because they never wanted to experience it again – building for four months and realizing that 75% of users give up.

The cost of testing: one afternoon, a few gift cards. The cost of not testing: four months, a whole feature, millions in lost revenue.

Epilogue: an afternoon that changes everything

What fascinates about user testing is not the complexity. It’s the simplicity.

You don’t need an advanced lab. No eye-tracking equipment. No statistical significance.

You need five people, a few tasks, and the willingness to watch.

What Sarah learned that Wednesday in March wasn’t how to fix a button. It was something more fundamental: that you don’t know what you don’t know until you look.

Her team had discussed the button for weeks. They had iterated. Debated. Optimized. And yet they had missed the obvious.

But they hadn’t failed. They had just tested too late.

Now they test early. Often. With prototypes that are still messy. With ideas that are barely finished.

And every time they watch a user get stuck somewhere – they are no longer disappointed.

They are grateful.

Because every problem they find in a test is a problem they avoid in reality.

That’s the difference between building what you think works and knowing what actually does.

And that difference is, as it turns out, worth about a button.
