Understand the Practical Use of RevenueIQ

Discover how to accurately measure and optimize revenue in your experiments thanks to our patented feature. For a deeper dive, download our whitepaper.

The most important KPI in e-commerce is revenue. In an optimization context, this means optimizing two axes:

Conversion: “Turning as many visitors as possible into customers.”
Average Order Value (AOV): “Generating as much value as possible per customer.”

However, CRO often remains focused on conversion alone. AOV tends to be neglected in analysis because of its statistical complexity: purchase amounts follow highly skewed distributions with no upper bound, which makes AOV very difficult to estimate correctly with classic tests (t-test, Mann-Whitney).

RevenueIQ offers a robust test that directly estimates the distribution of the effect on revenue (via a refined estimation of AOV), providing both a probability of gain (“chance to win”) and consistent confidence intervals. In benchmarks, RevenueIQ maintains a correct false positive rate, has power close to that of Mann-Whitney, and produces confidence intervals about four times narrower than those of the t-test. By combining the effects of AOV and conversion rate (CR), it delivers an impact on revenue per visitor (RPV) and then an actionable revenue projection.

To learn more, read our RevenueIQ white paper:

RevenueIQ white paper.pdf

Context & Problem

In CRO, we often optimize CR due to a lack of suitable tools for revenue. Yet, Revenue = Visitors × CR × AOV; ignoring AOV distorts the view.

AOV is misleading:

Unbounded (someone can buy many items).
Highly right-skewed (many small orders, a few very large ones).
A few “large and rare” values can dominate the average.
In random A/B splits, these large orders can be unevenly distributed → huge variance in observed AOV.
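
To make this volatility concrete, here is a minimal Python sketch. It assumes order values follow a log-normal distribution (the parameters 4.0 and 1.2 are arbitrary illustrations, not RevenueIQ settings) and repeatedly splits the same set of orders at random into two halves, as an A/A test would. Since both halves come from the same population, any AOV gap it reports is pure noise.

```python
import random
from statistics import fmean

rng = random.Random(42)

# Hypothetical order values: log-normal, parameters chosen only for illustration.
orders = [rng.lognormvariate(4.0, 1.2) for _ in range(2000)]

# Share of total revenue carried by the 1% largest orders (20 out of 2000).
top_share = sum(sorted(orders)[-20:]) / sum(orders)


def aa_split_gap(values, rng):
    """Randomly split the orders into two halves and return the AOV gap."""
    shuffled = values[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return fmean(shuffled[half:]) - fmean(shuffled[:half])


# Both halves come from the same population, so the true gap is 0.
gaps = [aa_split_gap(orders, rng) for _ in range(500)]

print(f"Average order value: {fmean(orders):.2f}")
print(f"Revenue share of the top 1% of orders: {top_share:.1%}")
print(f"Observed A/A AOV gaps range from {min(gaps):.2f} to {max(gaps):.2f}")
```

The spread of the printed gaps gives an idea of how far observed AOV can drift between two identical variants purely because of where the few largest orders land.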

Limitations of Classic Tests

t-test: Assumes normality (or relies on the Central Limit Theorem for the mean). On highly skewed e-commerce data, the CLT-based variance estimate is unreliable at realistic volumes. Result: very low power (it detects only ~15% of true winners in the benchmark) and very wide confidence intervals → slow and imprecise decisions.
Mann-Whitney (MW): Robust to non-normality (works on ranks), so much more powerful (~80% detection in the benchmark). But it only provides a p-value (thus only trend information), not an estimate of the effect size (no confidence interval) → impossible to quantify the business case.
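
The contrast between the two classic tests can be sketched in plain Python. This is an illustration under stated assumptions, not the benchmark itself: the order values are simulated (log-normal, arbitrary parameters), variant B is built by adding a hypothetical €5 to orders drawn from the same distribution, and both tests use large-sample normal approximations. The helper names `welch_ci` and `mann_whitney_p` are made up for this example.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def welch_ci(a, b, level=0.95):
    """Normal-approximation CI for mean(b) - mean(a), Welch-style standard error."""
    diff = fmean(b) - fmean(a)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se


def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney p-value (normal approximation, assumes no ties)."""
    labelled = sorted([(x, 0) for x in a] + [(x, 1) for x in b])
    rank_sum_b = sum(
        rank for rank, (_, group) in enumerate(labelled, start=1) if group == 1
    )
    n_a, n_b = len(a), len(b)
    u_b = rank_sum_b - n_b * (n_b + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_b - mean_u) / sd_u
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(7)
# Hypothetical data: same skewed distribution, with EUR 5 added to every B order.
orders_a = [rng.lognormvariate(4.0, 1.0) for _ in range(2000)]
orders_b = [rng.lognormvariate(4.0, 1.0) + 5.0 for _ in range(2000)]

ci_low, ci_high = welch_ci(orders_a, orders_b)
p_value = mann_whitney_p(orders_a, orders_b)

print(f"t-test style 95% CI on the AOV difference: [{ci_low:.2f}, {ci_high:.2f}]")
print(f"Mann-Whitney p-value: {p_value:.4f}")
```

The first output is an interval in currency but can be very wide on skewed data; the second is a p-value with no currency amount attached, which is the gap described above.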

RevenueIQ: Principle

It combines two innovative approaches:

A bootstrap technique, used to study the variability of a measure whose statistical behavior is unknown.
An average of basket differences, rather than a difference of average baskets. It compares the differences between the sorted orders of the two variants (A and B), weighted by density (approx. log-normal) to favor “comparable” pairs. This bypasses the problem of the very large differences between observed values in such data.

And it deduces:

The chance to win (probability that the effect is > 0), readable for decision-makers.
Narrow and reliable confidence intervals on the AOV effect as well as on revenue.
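
To illustrate how a bootstrap yields both a chance to win and a confidence interval, here is a minimal sketch. It is a plain percentile bootstrap on the difference of average baskets, not RevenueIQ's patented pairing and density weighting, which this page does not detail. The function name `bootstrap_aov_effect` and the simulated orders are assumptions made for the example.

```python
import random
from statistics import fmean


def bootstrap_aov_effect(orders_a, orders_b, n_boot=1000, seed=0):
    """Percentile bootstrap of mean(B) - mean(A).

    Returns (chance_to_win, (ci_low, ci_high)) with a 95% interval.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each variant with replacement, keeping its original size.
        resampled_a = rng.choices(orders_a, k=len(orders_a))
        resampled_b = rng.choices(orders_b, k=len(orders_b))
        diffs.append(fmean(resampled_b) - fmean(resampled_a))
    diffs.sort()
    # Share of resampled effects that are strictly positive.
    chance_to_win = sum(d > 0 for d in diffs) / n_boot
    ci_low = diffs[round(0.025 * n_boot)]
    ci_high = diffs[round(0.975 * n_boot) - 1]
    return chance_to_win, (ci_low, ci_high)


rng = random.Random(1)
# Hypothetical skewed order values for two variants.
orders_a = [rng.lognormvariate(4.00, 1.0) for _ in range(500)]
orders_b = [rng.lognormvariate(4.05, 1.0) for _ in range(500)]

ctw, (low, high) = bootstrap_aov_effect(orders_a, orders_b)
print(f"Chance to win: {ctw:.1%}")
print(f"95% CI on the AOV effect: [{low:.2f}, {high:.2f}]")
```

Both outputs come from the same set of resampled effects, which is why the probability and the interval stay consistent with each other.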

Benchmarks (AOV)

Alpha validity (on AA tests): good control of false positives. With the typical 95% threshold, the false positive risk is held at 5%.
Statistical power measurement: 1000 AB tests with a known effect of +€5:
- MW test: 796/1000 winners detected, ~80% power.
- t-test: 146/1000, ~15% power.
- RevenueIQ: 793/1000, ~80% power (≈ equivalent to MW).
Confidence interval (CI): RevenueIQ produces CIs €8 wide, which is reasonable and usable for a real effect of €5. With an average CI width of €34, the t-test is far too imprecise to be useful.
CI coverage: The validity of the confidence intervals was verified. A 95% CI does have a 95% chance of containing the true effect value (i.e., €0 for AA tests and €5 for AB tests).

From AOV KPI to Revenue

Beyond techniques and formulas, the key point is that RevenueIQ uses a Bayesian method for AOV analysis, which allows this metric to be merged with conversion. Our competitors use frequentist methods, at least for AOV, making any combination of results impossible.

Under the hood, RevenueIQ combines conversion and AOV results into a central metric: visitor value (RPV, revenue per visitor). With precise knowledge of RPV, revenue (€ or other currency) is then projected by multiplying by the targeted traffic (for a given period).
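
The chain from CR and AOV to RPV and then to projected revenue can be sketched with a simple Monte Carlo simulation. Everything below is an assumption made for illustration: the traffic and order counts are invented, conversion uses a Beta posterior with a uniform prior, and AOV uncertainty is approximated by a Gaussian around each variant's average. RevenueIQ's actual AOV model is not reproduced here.

```python
import random
from statistics import fmean


def rpv_uplift_draws(visitors_a, orders_a, aov_a,
                     visitors_b, orders_b, aov_b,
                     aov_se, n_draws=5000, seed=0):
    """Draw plausible values of RPV(B) - RPV(A), with RPV = CR x AOV."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Conversion rate: Beta posterior under a uniform prior (assumption).
        cr_a = rng.betavariate(1 + orders_a, 1 + visitors_a - orders_a)
        cr_b = rng.betavariate(1 + orders_b, 1 + visitors_b - orders_b)
        # AOV: Gaussian stand-in for the uncertainty on each average (assumption).
        draw_aov_a = rng.gauss(aov_a, aov_se)
        draw_aov_b = rng.gauss(aov_b, aov_se)
        draws.append(cr_b * draw_aov_b - cr_a * draw_aov_a)
    return draws


# Hypothetical experiment figures.
draws = rpv_uplift_draws(
    visitors_a=20_000, orders_a=600, aov_a=80.0,
    visitors_b=20_000, orders_b=650, aov_b=82.0,
    aov_se=2.0,
)

chance_to_win = sum(d > 0 for d in draws) / len(draws)
mean_rpv_uplift = fmean(draws)

# Projection: RPV uplift multiplied by the traffic targeted over the period.
targeted_traffic = 100_000
projected_revenue = mean_rpv_uplift * targeted_traffic

print(f"Chance to win on RPV: {chance_to_win:.1%}")
print(f"Average RPV uplift: {mean_rpv_uplift:.3f} per visitor")
print(f"Projected revenue uplift: {projected_revenue:,.0f}")
```

Because both metrics are expressed as distributions of plausible values, they can be multiplied draw by draw, which is what makes the combination into RPV possible.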

Real Case (excerpt)

Here is a textbook case for RevenueIQ:

Conversion gain has a 92% chance to win (CTW): encouraging, but not “significant” by the standard threshold.
AOV gain is at 80% CTW. Taken separately, this is likewise not enough to declare a winner.
The combination of these two metrics gives a CTW of 95.9% for revenue, enabling a simple and immediate decision, where a classic approach would have required additional data collection while waiting for one of the two KPIs (CR or AOV) to become significant.
For an advanced business decision, RevenueIQ provides an estimated average gain of +€50k, with a confidence interval of [-€6,514; +€107,027], which shows a minimal downside risk against a substantial potential gain.
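
Why can two individually “insufficient” signals combine into a stronger one? The sketch below gives the intuition under strong simplifying assumptions that are not RevenueIQ's actual computation: the CR and AOV effects are treated as independent Gaussians on a common (log) scale so that they add up, and by default both carry the same uncertainty. The function name `combined_chance_to_win` and the `sd_ratio` parameter are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def combined_chance_to_win(ctw_cr, ctw_aov, sd_ratio=1.0):
    """Chance to win of the sum of two independent Gaussian effects.

    sd_ratio is the AOV effect's standard deviation divided by the CR
    effect's standard deviation (1.0 means equal uncertainty).
    """
    std = NormalDist()
    # For a Gaussian effect, chance to win = Phi(mean / sd), so mean / sd = z.
    z_cr = std.inv_cdf(ctw_cr)
    z_aov = std.inv_cdf(ctw_aov)
    # The sum has mean sd * (z_cr + z_aov * sd_ratio)
    # and standard deviation sd * sqrt(1 + sd_ratio^2).
    z_combined = (z_cr + z_aov * sd_ratio) / sqrt(1 + sd_ratio ** 2)
    return std.cdf(z_combined)


combined = combined_chance_to_win(0.92, 0.80)
print(f"Combined chance to win: {combined:.1%}")
```

With the 92% and 80% inputs from the case above, this simplified model gives roughly 94%, higher than either input. It does not reproduce the 95.9% reported, since the real effect distributions are neither Gaussian nor equally wide, but it shows the mechanism: two effects pointing in the same direction reinforce each other.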

What This Changes for Experimentation

Without RevenueIQ: “inconclusive” results (or endless tests) ⇒ missed opportunities.
With RevenueIQ: faster, quantified decisions (probability, effect, CI), at the revenue level (RPV then projected revenue).

Practical Recommendations

Stop interpreting observed AOV without safeguards: it is highly volatile.
Avoid filtering or winsorizing “extreme values”: arbitrary thresholds ⇒ bias.
Measure CR & AOV jointly and reason in RPV to reflect business reality.
Use RevenueIQ to obtain chance to win + CI on AOV, RPV, and revenue projection.
Decide via projected revenue (average gain, lower CI bound) rather than isolated p-values.

Conclusion

RevenueIQ brings a robust and quantitative statistical test to monetary metrics (AOV, RPV, revenue), where:

t-test is weak and imprecise on e-commerce data,
Mann-Whitney is powerful but not quantitative.

RevenueIQ enables faster detection, quantification of business impact, and prioritization of deployments with explicit confidence levels.
