// FOUNDATIONS

Multivariate test (MVT)

An experiment that tests multiple variables at once in all combinations — e.g., 2 headlines × 3 button colors × 2 layouts = 12 variants. Reveals interaction effects an A/B test cannot, but demands much more traffic.

// what it is

A multivariate test runs all combinations of multiple variables in parallel. With three variables tested at two levels each, the test has 2 × 2 × 2 = 8 variants. The math lets you measure not just the main effect of each variable, but also interactions — whether the green button works better with headline A and the orange button works better with headline B. A/B testing one variable at a time cannot find those interactions.
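The combinatorics above can be sketched in a few lines of Python — variable names and levels are illustrative, not a real test plan:

```python
from itertools import product

# Three variables at two levels each -> 2 x 2 x 2 = 8 variants.
headlines = ["headline_A", "headline_B"]
buttons = ["green", "orange"]
layouts = ["wide", "narrow"]

variants = list(product(headlines, buttons, layouts))
print(len(variants))  # 8
```

Each tuple is one cell of the test. An A/B test only ever compares two of these tuples, which is why it cannot observe how headline and button color interact.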

The catch is sample size. An A/B test that needs 1,000 visitors per variant for adequate power needs roughly 8,000 for an 8-cell MVT — and that is just for the main effects, never mind detecting interactions, which require dramatically more traffic. For most founders, multivariate tests are mathematically out of reach. The right call is usually to A/B test sequentially: ship the headline winner, then test button color on top of it, accepting that you might miss interactions in exchange for verdicts that actually arrive.
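The traffic arithmetic is just multiplication — a sketch, taking the 1,000-per-variant figure from the text as given:

```python
n_per_cell = 1_000           # per-variant requirement from the A/B power calculation
ab_total = n_per_cell * 2    # control + one variant
mvt_total = n_per_cell * 8   # 2 x 2 x 2 = 8 cells, main effects only
print(ab_total, mvt_total)   # 2000 8000
```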

// when this matters

When to use it

Run a multivariate test only when you have the traffic for it (typically 100k+ visitors per month on the surface) AND when interaction effects are a plausible source of value. Otherwise sequential A/B tests are faster, simpler, and more honest.

// deeper

What this looks like in practice

MVT shines on high-traffic surfaces where multiple page elements all have plausible independent effects — major retail homepages, large-publisher article layouts, mature SaaS pricing pages. The interaction effects are real on those surfaces and worth the sample-size cost. For everything below that traffic level, the calculation almost never works out: by the time you have powered an MVT, two or three sequential A/B tests would have shipped clearer wins.

Even on surfaces with adequate traffic, MVT has a verdict-readability problem. The output is a table of 12 (or 24, or 36) variant performances, several of which will be statistically tied. The team has to decide which combination to ship, and "the highest cell" is rarely the right answer — the highest cell often has the noisiest estimate. Treat MVT as a way to learn about main effects and one-step interactions, not as a way to mechanically pick the best cell.
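To make "main effects and one-step interactions" concrete, here is a sketch on a hypothetical 2 × 2 slice of an MVT — all conversion rates are invented for illustration:

```python
# Hypothetical conversion rates for a 2 x 2 (headline x button) slice.
rates = {
    ("A", "green"): 0.040,
    ("A", "orange"): 0.044,
    ("B", "green"): 0.050,
    ("B", "orange"): 0.042,
}

# Main effect of headline B: average over button colors.
headline_B_effect = (
    (rates[("B", "green")] + rates[("B", "orange")]) / 2
    - (rates[("A", "green")] + rates[("A", "orange")]) / 2
)

# Interaction: does the button-color effect differ by headline?
interaction = (
    (rates[("B", "orange")] - rates[("B", "green")])
    - (rates[("A", "orange")] - rates[("A", "green")])
)
print(round(headline_B_effect, 4), round(interaction, 4))
```

In this invented data, headline B helps on average, but orange hurts under headline B while helping under headline A — exactly the kind of structure that reading off "the highest cell" would miss.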

Fractional factorial designs are a middle ground worth knowing about. Instead of running all 12 cells, you run a carefully chosen subset (say 6) that lets you estimate main effects efficiently while sacrificing some ability to detect interactions. The Taguchi method is the best-known example. For founder-scale tests with three variables but only A/B-ish traffic budgets, a fractional design lets you learn most of what a full MVT would tell you at half the cost.
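The textbook illustration of the idea is the 2^(3−1) half fraction — not the 6-of-12 subset above, but the same principle: keep the cells where the third factor's level equals the product of the first two, and main effects stay estimable while each one becomes aliased with a two-factor interaction.

```python
from itertools import product

# Full 2^3 factorial: three factors coded at levels -1 and +1.
full = list(product([-1, 1], repeat=3))

# Half fraction defined by C = A * B: 4 cells instead of 8.
half = [(a, b, c) for a, b, c in full if c == a * b]
print(len(full), len(half))  # 8 4
```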

// example

A worked example

// EXAMPLE

A landing-page MVT tests 2 headlines × 2 hero images × 3 CTA variants = 12 cells. To detect a 20% relative lift on a 4% baseline at 80% power (two-sided α = 0.05), each cell needs roughly 10,300 visitors — about 124,000 total. The team has 5,000 visitors per week, so the test takes nearly six months. Verdict: redesign as three sequential A/B tests on the highest-leverage variable, rather than waiting half a year for an MVT that may still be underpowered for interactions.
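The per-cell figure comes from a standard two-proportion power calculation. A normal-approximation sketch — exact numbers vary a little between calculators and with one- vs. two-sided settings:

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(p_base, rel_lift, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-proportion z-test (normal approximation)."""
    p_new = p_base * (1 + rel_lift)
    p_bar = (p_base + p_new) / 2
    z = NormalDist().inv_cdf
    delta = p_new - p_base
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2
                * p_bar * (1 - p_bar) / delta ** 2)

per_cell = n_per_arm(0.04, 0.20)  # 20% relative lift on a 4% baseline
total = per_cell * 12             # 12-cell MVT
weeks = total / 5_000             # at 5,000 visitors per week
print(per_cell, total, round(weeks, 1))
```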

// pitfalls

Common mistakes

  • Running MVT without the traffic. An MVT split across 12 cells with 800 users per cell is 12 underpowered tests, not one well-powered MVT. The math gets worse, not better, when you spread thin.
  • Picking the highest cell as the winner. The cell with the highest observed lift often got there by noise. Read MVT results as main effects + interactions, not as a mechanical best-cell pick.
  • Confusing MVT with A/B/n testing. A/B/n is multiple variants of one variable (4 headlines vs. control). MVT is combinations of multiple variables. They have different sample-size profiles and different output structures; using the words interchangeably leads to wrong design choices.
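The arm-count difference in that last pitfall, sketched with illustrative variable sets:

```python
from itertools import product

# A/B/n: several variants of ONE variable -> arms grow additively.
abn_arms = len(["control", "headline_1", "headline_2", "headline_3"])

# MVT: combinations of SEVERAL variables -> arms grow multiplicatively.
mvt_arms = len(list(product(
    ["headline_A", "headline_B"],
    ["hero_1", "hero_2"],
    ["cta_1", "cta_2", "cta_3"],
)))
print(abn_arms, mvt_arms)  # 4 12
```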
// related

Related terms

Pick a hypothesis. Vocabulary done.

The fastest way to learn this vocabulary is to commit one experiment. The contract takes about five minutes to write.