Chapter 6: Assumption Testing & Validation
Every product idea is a bundle of assumptions. Some of those assumptions are obvious -- "customers want this feature." Many are hidden -- "customers will understand how to find this feature," "our infrastructure can handle the load," "this business model generates positive unit economics." The discipline of assumption testing is about surfacing those hidden bets, prioritizing the riskiest ones, and designing the smallest possible experiments to test them before committing significant resources.
The Five Categories of Assumptions
Teresa Torres organizes assumptions into five categories [cdh]. This taxonomy is valuable because teams tend to have blind spots -- they obsessively test one category while ignoring others.
1. Desirability Assumptions
"Do customers want this?" This is the category most teams think of first. Does the target customer experience the pain we think they do? Is the pain acute enough to motivate change? Would they choose our solution over their current workaround?
2. Viability Assumptions
"Does this work for the business?" Can we price this profitably? Does it fit our business model? Will the customer acquisition cost stay below the lifetime value? Will this cannibalize existing revenue? Teams with a strong customer-centric mindset often forget to test viability until late in the process, leading to products that customers love but that the business cannot sustain cdh.
3. Feasibility Assumptions
"Can we build this?" Do we have the technical capability? Can we deliver the performance customers expect? Can we build it within the time and budget constraints we have? Feasibility assumptions are the domain of the engineer on the product trio, and they deserve the same rigor as desirability testing.
4. Usability Assumptions
"Can customers figure out how to use this?" Even if customers want the outcome and you can build the technology, will they be able to navigate the interface, understand the workflow, and achieve the result without excessive friction? Usability assumptions are often tested too late -- after the product is built -- rather than with early prototypes cdh.
5. Ethical Assumptions
"Should we build this?" Could this product cause harm to users, communities, or society? Does it create perverse incentives? Could it be misused in ways we have not anticipated? Torres notes that most product teams have a blind spot for ethical assumptions because their intentions are good and they forget that good intentions do not prevent bad outcomes cdh.
Using the Taxonomy
The five categories serve as a checklist. For every solution you are considering, generate at least one assumption per category. If you cannot think of any assumptions in a category, that is a warning sign -- you likely have unexamined blind spots, not an assumption-free zone.
Identifying Hidden Assumptions
The hardest part of assumption testing is not designing experiments -- it is seeing your own assumptions in the first place. The following techniques help surface what you have been taking for granted.
Story Mapping
For each solution, tell the story of how a customer would experience it from beginning to end [cdh]. Walk through every step: How do they discover it? What is the first thing they see? What do they do next? What happens if they get stuck? What happens after they achieve the outcome?
At each step, ask: "What must be true for this step to work?" The answers are your assumptions. Story mapping is especially effective for surfacing usability and feasibility assumptions that abstract thinking misses.
Pre-Mortems
Adapted from cognitive psychologist Gary Klein, a pre-mortem inverts the typical project review [cdh, sprint]. Instead of asking "What went wrong?" after the fact, you ask: "Imagine it is six months from now. This product launched and it was a complete failure. What went wrong?"
Each team member independently writes down reasons for the hypothetical failure. The combined list is a rich source of assumptions -- each reason for failure implies an assumption that must hold true for the product to succeed. Pre-mortems are particularly good at surfacing viability and ethical assumptions because they create psychological permission to voice doubts that team members might otherwise suppress [cdh].
Knapp uses a variation of this in the sprint context: on Monday, the team generates "sprint questions" by asking "What are the biggest risks?" and "What could make this whole thing blow up?" [sprint]. These questions frame the entire week's work and define what the Friday test must answer.
Walking the Lines of the OST
Torres recommends a technique specific to the Opportunity Solution Tree: for each line connecting a node to its parent, ask "Why do we believe this connection is true?" [cdh]. The line between an opportunity and a solution implies that the solution will address the opportunity. The line between an opportunity and the outcome implies that addressing this opportunity will drive the outcome. Each line is an assumption that can be questioned and tested.
Questioning Potential Harm
For ethical assumptions specifically, Torres suggests asking: "Who could be harmed by this product? How? What would we do about it?" [cdh]. Consider not just direct users but also non-users, communities, and society at large. Consider not just intended use but also misuse, edge cases, and second-order effects.
Assumption Mapping: Prioritizing What to Test
You will always have more assumptions than you have time to test. Assumption mapping, an exercise designed by David J. Bland, provides a structured way to prioritize [cdh].
The 2x2 Matrix
Draw a two-dimensional grid:
- X-axis: Evidence -- How much evidence do we already have that this assumption is true? Left = strong evidence, Right = weak or no evidence.
- Y-axis: Importance -- How critical is this assumption to the success of the product? Bottom = nice to know, Top = if this is wrong, the whole thing fails.
Place each assumption on the grid. The assumptions in the upper-right quadrant -- high importance, weak evidence -- are your "leap of faith" assumptions. These are the ones you must test first [cdh].
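As a quick illustration of how this prioritization can be made explicit, the sketch below scores assumptions on the two axes and filters for the leap-of-faith quadrant. The 1-5 scales, the field names, and the example assumptions are invented for illustration; they are not taken from Bland or Torres.

```python
# Minimal sketch of assumption mapping as data -- scales and examples are illustrative.
from dataclasses import dataclass

@dataclass
class Assumption:
    text: str
    importance: int  # 1 = nice to know, 5 = if this is wrong, the product fails
    evidence: int    # 1 = strong existing evidence, 5 = little or no evidence

assumptions = [
    Assumption("Freelance designers prefer our builder over a custom site", 5, 5),
    Assumption("Designers will pay $15/month for it", 5, 4),
    Assumption("Templates can render a portfolio in under 2 seconds", 3, 2),
    Assumption("Users find the export button without help", 2, 4),
]

# Leap-of-faith quadrant: high importance AND weak evidence -> test these first.
leap_of_faith = [a for a in assumptions if a.importance >= 4 and a.evidence >= 4]

for a in sorted(leap_of_faith, key=lambda a: (a.importance, a.evidence), reverse=True):
    print(f"TEST FIRST: {a.text} (importance={a.importance}, evidence={a.evidence})")
```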
Practical Application
Do not try to map every assumption. Start with the assumptions generated from story mapping and pre-mortems for your top 2-3 solution ideas. Place them on the map quickly -- this is a rapid prioritization exercise, not a precision measurement. The goal is to identify the 2-3 assumptions per idea that carry the most risk and then design tests for those specifically.
Torres emphasizes testing assumptions across multiple solution ideas in parallel rather than testing one idea exhaustively [cdh]. This protects against confirmation bias: if you are only testing one idea, you will unconsciously seek confirming evidence. If you are testing three ideas simultaneously, you are forced into genuine comparison.
Key Assumptions: Specific, Singular, Measurable
Bill Aulet provides complementary criteria for what makes a well-formed assumption (which he calls a "Key Assumption") [de]:
- Specific -- Not "customers will like this" but "freelance designers with 2-5 years of experience will prefer our portfolio builder over creating a custom website."
- Singular -- Each assumption tests one thing. If you bundle two assumptions into one test, a failure tells you nothing about which assumption was wrong.
- Important -- If this assumption is wrong, the venture fails or must significantly pivot.
- Measurable -- You can define a clear criterion for what would confirm or disconfirm it.
- Testable -- You can design an experiment to evaluate it within your resource constraints.
If an assumption does not meet these criteria, refine it until it does. Vague assumptions produce vague tests that produce ambiguous results.
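As a rough sketch, a Key Assumption that meets these criteria can be captured as a structured record rather than a loose sentence. The fields below loosely mirror Aulet's criteria; the record shape, the sample size, and the 7-of-10 threshold are assumptions made for this example, not notation from the book.

```python
# Illustrative record for a well-formed Key Assumption -- structure and numbers are assumed.
from dataclasses import dataclass

@dataclass
class KeyAssumption:
    statement: str       # specific and singular: one claim about one audience
    why_it_matters: str  # important: what happens to the venture if it is wrong
    metric: str          # measurable: the observable we will actually count
    pass_threshold: str  # testable: a criterion we can evaluate with our resources

# "Customers will like this" fails the criteria: it is neither specific nor measurable.
well_formed = KeyAssumption(
    statement=("Freelance designers with 2-5 years of experience will prefer "
               "our portfolio builder over creating a custom website."),
    why_it_matters="If false, our core acquisition story and pricing collapse.",
    metric=("Share of 10 interviewed designers who choose the builder in a "
            "side-by-side prototype comparison"),   # sample size is illustrative
    pass_threshold="At least 7 of 10 choose the builder",  # threshold is illustrative
)
print(well_formed.statement)
```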
Designing Assumption Tests
Torres provides a three-part structure for every assumption test [cdh]:
1. Simulate an Experience
Create a situation where the participant encounters something that mimics the real-world experience your product would create. This might be a prototype, a concierge service, a landing page, a mock-up in a conversation, or even a thought experiment framed as a scenario. The key is that the participant should behave as they would in reality, not evaluate an abstract concept.
The simulation does not need to be high fidelity. It needs to be high enough fidelity that the participant's response is a meaningful signal about what they would actually do.
2. Evaluate Behavior
Observe what the participant does, not just what they say. Do they click? Do they sign up? Do they complete the task? Do they express confusion? Do they abandon? Behavioral data is more reliable than attitudinal data because it is harder to fake and less subject to social desirability bias [cdh, mom-test].
3. Define Success Criteria Upfront
Before running the test, write down what result would confirm the assumption, what result would disconfirm it, and what result would be ambiguous. This is perhaps the single most important discipline in all of assumption testing.
Without pre-defined success criteria, teams unconsciously move the goalposts after seeing results. If 3 out of 5 participants succeed, the team declares victory. If only 1 out of 5 succeeds, the team finds reasons to discount the failures. Pre-set criteria eliminate this rationalization [cdh, de].
Example: "If 4 out of 5 participants can complete the checkout flow without asking for help, the usability assumption passes. If fewer than 3 can, it fails. If exactly 3 can, the result is ambiguous and we will refine the test."
The Minimum Viable Business Product (MVBP)
Bill Aulet draws a sharp distinction between the Lean Startup's Minimum Viable Product (MVP) and what he calls the Minimum Viable Business Product (MVBP) [de]. His concern with the MVP concept is that "minimum" and "viable" have been stretched to justify shipping things so crude that they damage the brand and teach nothing useful.
Three Conditions for an MVBP
An MVBP must satisfy all three:
- The customer gets value from it. It solves a real problem well enough that the customer is better off using it than not using it. A landing page that collects email addresses does not meet this bar -- the customer gets nothing.
- The customer (or someone) pays for it. Not "would pay" or "said they would pay" but actually exchanges money (or another meaningful commitment currency). Payment is the ultimate validation of value.
- It creates a feedback loop. The product must be instrumented and the relationship structured so that you learn from actual usage. If you ship and cannot measure what happens, you have not validated anything.
When MVBP Thinking Helps
The MVBP framework is a corrective for teams that have over-indexed on speed at the expense of learning quality. It forces you to ask: "Will this test actually tell us whether customers find this valuable enough to pay for?" If the answer is no, you need to increase fidelity until the answer is yes -- not build less, but build the right minimum [de].
MVP vs. MVBP: Complementary Perspectives
Aulet's critique of the MVP is not a rejection of lean methodology but a refinement of it [de]. The MVP asks: "What is the least we can build to learn something?" The MVBP asks: "What is the least we can build that delivers real value and generates real payment?"
Both questions are valid at different stages. Very early assumption tests (does this pain exist? do people search for solutions?) may warrant scrappy, sub-MVBP experiments. But before you declare product-market fit or commit significant engineering resources, you should reach the MVBP bar: real value, real payment, real feedback loop.
Concierge Execution
One of the most powerful validation techniques is the concierge approach: manually perform what will later be automated [de]. Instead of building a recommendation engine, have a human curator make personalized recommendations. Instead of building an automated matching algorithm, match people by hand.
Concierge execution tests desirability and viability assumptions without requiring significant feasibility investment. If customers love the manually delivered service and are willing to pay for it, you have strong evidence that the automated version is worth building. If they do not, you have saved months of engineering effort.
The concierge approach also generates deep customer understanding because the person delivering the service is directly experiencing customer reactions, objections, and workarounds -- data that no instrumented product can capture as richly.
Sprint Questions: Reframing Assumptions as Questions
Knapp's sprint methodology translates assumptions into questions that the Friday user test must answer [sprint]. On Monday, the team generates these by asking:
- "To meet our long-term goal, what needs to be true?"
- "Imagine we travel into the future and see that the project has failed. What caused it?"
These questions are written on the whiteboard and remain visible all week. They guide which part of the prototype needs the most attention on Thursday and what the interviewer probes hardest on Friday [sprint].
The sprint question format is simpler than Torres's assumption mapping -- it does not use a 2x2 matrix or categorize by type. But it achieves the same core function: forcing the team to articulate what they are betting on before they invest in building anything.
False Positives and False Negatives
Every test with a small sample produces imperfect results. Torres addresses this honestly [cdh]:
False Positives
A false positive occurs when your test suggests the assumption is true, but it is actually false. For example, 4 out of 5 prototype testers complete the task, but in reality the task is too confusing for most of your target market. The 5 participants happened to be unusually tech-savvy or unusually motivated.
False Negatives
A false negative occurs when your test suggests the assumption is false, but it is actually true. For example, none of your 5 testers click the call-to-action button, but the reason is a confusing label, not a lack of desire. With a different label, the assumption would pass.
Managing the Risk
Torres argues that the risk of false positives and false negatives with small samples is real but manageable, and far preferable to the alternative of running large-scale experiments for every assumption [cdh]. The mitigation strategies are:
- Triangulation -- Test the same assumption with multiple methods. If a prototype test and a one-question survey and an analysis of existing behavioral data all point in the same direction, your confidence should be high even with small samples from each method.
- The early-signal-to-large-experiment ladder -- Start with the cheapest, fastest test (often a conversation or a single-question survey). If the signal is encouraging, invest in a more rigorous test (a prototype simulation, a concierge execution). If that signal is also encouraging, invest further (a limited launch, an A/B test with larger samples). Each rung of the ladder provides more confidence while the early rungs limit wasted investment.
- Comparing across ideas -- When testing multiple solution ideas in parallel, relative performance matters more than absolute performance. If Idea A passes 4/5 and Idea B passes 2/5, you have useful signal even if neither number is statistically significant on its own.
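A rough back-of-the-envelope calculation shows why a single small-sample test can mislead and why triangulation helps. The 40% true pass rate, the 4-of-5 pass threshold, and the independence of the two methods are all assumptions made for illustration:

```python
# Back-of-the-envelope false positive risk for a 5-person test -- numbers are illustrative.
from math import comb

def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability of observing at least k successes in n trials, given true rate p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Suppose only 40% of the target market can really complete the task.
# How often would a 5-person test still show 4 or 5 successes and "pass"?
false_positive = prob_at_least(4, 5, 0.40)
print(f"Chance of a misleading pass with n=5: {false_positive:.1%}")   # ~8.7%

# If two genuinely independent methods both have to agree, a misleading pass
# becomes much rarer (treating the methods as independent is a simplification).
print(f"Chance both methods mislead at once: {false_positive**2:.1%}")  # ~0.8%
```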
Pre-Defining Success Criteria
This discipline is worth emphasizing on its own because it is the single most common point of failure in assumption testing across all the source books.
Torres requires teams to write success criteria before seeing any data [cdh]. Aulet requires that Key Assumptions be measurable before testing begins [de]. Knapp's sprint questions are written on Monday before the prototype is even sketched [sprint].
The reason is cognitive: once you see data, your brain immediately begins constructing narratives to explain it. If the data is good, you take credit. If the data is bad, you find excuses. Pre-set criteria short-circuit this rationalization process. They create a commitment to intellectual honesty by defining what you will accept as evidence before your ego has a stake in the outcome.
Practical implementation: before every test, write a single sentence in this format: "This assumption is confirmed if [specific observable result]. It is disconfirmed if [specific observable result]. The result is ambiguous if [specific observable result], and we will [specific next step]."
Commitment Currencies: Beyond Verbal Validation
Rob Fitzpatrick identifies three "currencies" of commitment that are more reliable than verbal approval [mom-test]:
1. Time Commitment
Will the prospect invest their time? Agreeing to a follow-up meeting, participating in a pilot program, or introducing you to a colleague are all time commitments. They are weak signals individually but meaningful in aggregate.
2. Reputation Commitment
Will the prospect stake their professional reputation? Introducing you to their boss, recommending your product to a peer, or agreeing to be a public case study are reputation commitments. These are stronger signals because people protect their reputations carefully.
3. Financial Commitment
Will the prospect pay money? A pre-order, a letter of intent, a signed contract, or even a modest deposit is the strongest signal of real demand. Fitzpatrick argues that financial commitment is the only signal that fully eliminates the gap between "would use" and "will use" [mom-test].
The hierarchy matters. Do not treat verbal enthusiasm as equivalent to financial commitment. "I would definitely buy that" is worth almost nothing. "Here is my credit card number" is worth almost everything. Structure your validation to seek the strongest commitment currency the stage allows [mom-test].
Concept Testing with the 9segs Map
Nishiguchi offers a quantitative approach to concept testing that leverages the 9segs customer segmentation [stck]. The idea: when evaluating a new product concept, test it separately with customers from different segments of the 9segs map and compare their responses.
A concept that resonates with Loyal + High NPI customers but not with Aware Non-Customers + High NPI tells you something different than a concept that resonates with both. The first might reinforce existing customer loyalty but fail to acquire new customers. The second might have broader market potential.
This segment-specific concept testing prevents the "average response" problem, where a concept that half the market loves and half the market hates looks mediocre on aggregate [stck]. By breaking the response down by segment, you can see whether the concept serves your strategic priority -- whether that is retention, acquisition, or winback.
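A toy illustration of the average-response problem -- the segment labels follow the 9segs framing, but every number below is invented for the example:

```python
# Toy example of why aggregate concept scores mislead -- all scores are invented.
concept_scores = {
    "Loyal, high NPI":              [9, 8, 9, 8],  # existing fans love the concept
    "Aware non-customer, high NPI": [3, 2, 4, 3],  # prospects are cold on it
}

all_scores = [s for seg in concept_scores.values() for s in seg]
print(f"Aggregate mean: {sum(all_scores) / len(all_scores):.1f}")  # ~5.8, looks mediocre

for segment, scores in concept_scores.items():
    print(f"{segment}: {sum(scores) / len(scores):.1f}")
# The segment view reveals a retention play, not an acquisition play.
```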
"Dogs Eating the Dog Food": Measuring Real Adoption
Aulet uses the vivid phrase "dogs eating the dog food" to describe the ultimate validation metric [de]. It is not enough for customers to say they like the product, or even to pay for it once. The real test is whether they adopt it into their actual workflow and continue using it.
Metrics that indicate "dogs eating the dog food":
- Activation rate -- What percentage of sign-ups complete the key first action?
- Retention -- What percentage return after Day 1? Day 7? Day 30?
- Usage depth -- Are they using the core feature or just poking around?
- Organic referral -- Are they telling others without being prompted?
These are lagging indicators -- you can only measure them after launching something real. But they are the indicators that ultimately determine whether you have product-market fit. All earlier assumption tests (prototypes, concierge tests, commitment currencies) are proxies for this final validation. Keep that in mind: the purpose of every earlier test is to build confidence that the dogs will, in fact, eat the dog food [de].
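As a sketch of how these indicators might be computed from a product's event log -- the event names, the toy data, and the 7-day window are assumptions for illustration, not a standard schema:

```python
# Sketch of activation rate and Day-7 retention from a toy event log -- schema is assumed.
from datetime import date

events = [
    {"user": "u1", "event": "signed_up",            "on": date(2024, 5, 1)},
    {"user": "u1", "event": "completed_key_action", "on": date(2024, 5, 1)},
    {"user": "u1", "event": "returned",             "on": date(2024, 5, 7)},
    {"user": "u2", "event": "signed_up",            "on": date(2024, 5, 1)},
    {"user": "u2", "event": "completed_key_action", "on": date(2024, 5, 2)},
    {"user": "u3", "event": "signed_up",            "on": date(2024, 5, 2)},
]

signups = {e["user"]: e["on"] for e in events if e["event"] == "signed_up"}
activated = {e["user"] for e in events if e["event"] == "completed_key_action"}
retained_d7 = {e["user"] for e in events
               if e["event"] == "returned"
               and 0 < (e["on"] - signups[e["user"]]).days <= 7}

print(f"Activation rate: {len(activated) / len(signups):.0%}")   # 67%
print(f"Day-7 retention: {len(retained_d7) / len(signups):.0%}")  # 33%
```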
Key Takeaways
- Categorize your assumptions. Use the five categories (desirability, viability, feasibility, usability, ethical) as a checklist to surface blind spots.
- Surface hidden assumptions actively. Story mapping, pre-mortems, and walking the OST lines are deliberate techniques for seeing what you have taken for granted.
- Prioritize with assumption mapping. Plot assumptions on importance vs. evidence. Start with the upper-right quadrant -- high importance, weak evidence.
- Design tests around behavior, not opinion. Simulate an experience, observe what people do, and define success criteria before you see any results.
- Seek commitment currencies, not compliments. Time, reputation, and money are the only reliable signals that demand is real.
- Triangulate. Use multiple methods to test the same assumption. Agreement across methods builds confidence; disagreement signals that more investigation is needed.
- Climb the ladder. Start with cheap, fast tests and invest more only when early signals are encouraging. Do not leap to large-scale experiments before validating with small-scale ones.
- Pre-define success criteria. Write down what confirmed, disconfirmed, and ambiguous results look like before running any test. This is the single most important discipline for honest validation.
What Qualz.ai does here
Qualz.ai lets you spin up a validation study -- interviews, surveys, AI participants -- in hours, so your riskiest assumptions never sit untested for a whole quarter.