Title: Experiment Design
Locale: en
URL: https://sensorswave.com/en/docs/experiments/experiment-design/
Description: Learn to design scientifically rigorous A/B experiments

Scientific experiment design is the foundation for reliable conclusions. This article introduces how to design a rigorous A/B experiment starting from problem definition, covering Hypothesis formulation, Metric selection, sample size calculation, Duration determination, and other key steps.

## Five-Step Experiment Design

### Step 1: Define the Problem and Hypothesis

**Problem Identification**

Clearly define what problem you want to solve:

- Low conversion rate: Is the checkout flow too complex?
- Low click-through rate: Is the button design not prominent enough?
- Low retention rate: Is the new user onboarding unclear?
- Slow revenue growth: Does the pricing strategy need adjustment?

**Formulate a Hypothesis**

Based on the problem, propose a testable Hypothesis.

**Good Hypotheses are**:

- Specific: Clearly state what to change
- Measurable: Have clear Metrics
- Expectation-bearing: Specify the expected improvement

**Examples**:

| Problem | Hypothesis |
|------|------|
| Checkout conversion rate is only 20% | Simplifying the checkout flow from 5 steps to 3 steps can increase conversion rate to 25% |
| Low add-to-cart button click-through rate | Changing the button color from blue to red can increase click-through rate by 10% |
| Unsatisfactory recommendation click-through rate | Using a deep learning algorithm can increase recommendation click-through rate by 15% |
| Low VIP conversion rate | Reducing the annual fee from ¥299 to ¥249 can increase purchase conversion rate by 30% |

### Step 2: Select Experiment Metrics

Experiment Metrics fall into three categories:

#### Primary Metric

The core Metric of focus, used to determine experiment Success or Failure:

- **Conversion metrics**: Click-through rate, registration rate, payment conversion rate
- **Revenue metrics**: Average revenue per user, total revenue, ROI
- **Engagement metrics**: Active days, session duration, content consumption

**Selection principles**:

- Directly related to business goals
- Can be accurately measured
- Sensitive to changes in user behavior

**Examples**:

| Experiment Type | Primary Metric |
|---------|---------|
| Checkout flow optimization | Payment conversion rate |
| Button color test | Button click-through rate |
| Recommendation algorithm comparison | Recommendation click-through rate |
| Pricing strategy test | Purchase conversion rate, total revenue |

#### Secondary Metrics

Supporting Metrics that help fully understand experiment impact:

- **User experience metrics**: Page dwell time, bounce rate
- **Downstream metrics**: Add-to-cart rate, bookmark rate, share rate
- **Long-term metrics**: Next-day retention, 7-day retention

**Example**: Checkout flow optimization experiment:

- Primary Metric: Payment conversion rate
- Secondary Metrics: Average checkout duration, abandonment rate, order value

#### Guardrail Metrics

Ensure the experiment does not negatively affect critical Metrics:

- **Technical metrics**: Page load time, error rate, crash rate
- **User satisfaction**: NPS, complaint rate, uninstall rate
- **Revenue protection**: ARPU (Average Revenue Per User), total revenue should not decline

**Importance**:

- Prevent optimizing one Metric at the expense of others
- Example: While improving click-through rate, ensure conversion rate does not decline

**Examples**:

| Experiment Type | Guardrail Metrics |
|---------|---------|
| Recommendation algorithm comparison | Add-to-cart conversion rate, total revenue, page load time |
| UI redesign | Page load time, error rate, user complaint rate |
| Pricing strategy test | Total revenue, user satisfaction, churn rate |

### Step 3: Determine Experiment Variants

#### Two-Variant Experiment (Recommended)

**Control vs Test Group**:

- Control: Current solution (baseline)
- Test Group: New solution

**Advantages**:

- Simple to interpret results
- Lower sample size requirement
- Shorter Duration

**Applicable scenarios**:

- Validating a single optimization Hypothesis
- Comparing two clearly defined solutions

#### Multi-Variant Experiment

**Control vs Multiple Test Groups**:

- Control: Current solution
- Test Group A: Solution A
- Test Group B: Solution B
- Test Group C: Solution C (optional)

**Advantages**:

- Compare multiple solutions in one experiment
- Saves time and traffic

**Disadvantages**:

- Higher sample size requirement (longer Duration needed)
- Multiple comparison problem (significance level adjustment needed)

**Applicable scenarios**:

- Pricing strategy testing (comparing 3–4 price points)
- Design solution selection (comparing multiple designs)

**Considerations**:

- Keep the number of Variants manageable (recommended no more than 4)
- Each Variant needs a sufficient sample size

### Step 4: Calculate Sample Size

Sample size determines how many users need to participate for reliable conclusions.

#### Sample Size Formula

```
Required sample per group = (Z_α/2 + Z_β)² × 2 × p × (1-p) / (MDE)²
```

**Parameter definitions**:

| Parameter | Meaning | Common Value |
|------|------|--------|
| **α (Alpha)** | Significance level | 0.05 (5%) |
| **β (Beta)** | Complement of statistical power | 0.2 (Power = 80%) |
| **p** | Baseline conversion rate | Actual business data |
| **MDE** | Minimum Detectable Effect | Expected improvement |

**Simplified calculation**:

For conversion rate Metrics, use this approximation:

```
Sample per group ≈ 16 × p × (1-p) / (MDE)²
```

**Example**:

Scenario: Current click-through rate is 20%, and we want to detect a 10% relative lift (from 20% to 22%)

```
p = 0.20
MDE = 0.02 (10% relative lift = 2% absolute lift)
Sample per group ≈ 16 × 0.20 × 0.80 / (0.02)²
                 ≈ 16 × 0.16 / 0.0004
                 ≈ 6,400
```

**Conclusion**: At least 6,400 users per group, 12,800 total for two groups.

#### MDE (Minimum Detectable Effect) Selection

**Definition**: The smallest improvement the experiment can detect.
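As a sanity check on the sample-size formula and the ×16 rule of thumb above, here is a minimal Python sketch (not part of the SensorsWave product; the function name is illustrative) that computes the exact per-group sample size from α, power, baseline rate, and MDE:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(p: float, mde: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group sample size for a two-group test of proportions.

    p     -- baseline conversion rate (e.g. 0.20)
    mde   -- minimum detectable effect, absolute (e.g. 0.02)
    alpha -- two-sided significance level
    power -- desired statistical power (1 - beta)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # ≈ 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # ≈ 0.84 for power = 0.80
    n = (z_alpha + z_beta) ** 2 * 2 * p * (1 - p) / mde ** 2
    return ceil(n)

# Worked example from the text: 20% baseline, 2% absolute lift
print(sample_size_per_group(0.20, 0.02))  # -> 6280
```

The exact result (6,280 per group) is slightly below the 6,400 given by the simplified ×16 approximation, because (1.96 + 0.84)² × 2 ≈ 15.7 rather than 16.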
**Smaller MDE**:

- Requires a larger sample size
- Longer Duration
- Can detect more subtle differences

**Larger MDE**:

- Requires a smaller sample size
- Shorter Duration
- Can only detect obvious differences

**Recommended MDE**:

| Metric Type | Recommended MDE | Notes |
|---------|---------|------|
| Conversion rate (low baseline) | 10–20% relative lift | When baseline < 10% |
| Revenue metrics | 5–10% relative lift | Core business Metrics |
| Engagement metrics | 10–15% relative lift | Supporting Metrics |

**Practical tips**:

- New products or features: MDE can be larger (15–20%)
- Mature product optimization: MDE should be smaller (5–10%)
- Core Metric optimization: the MDE can be relaxed slightly, or the experiment Duration extended

#### Online Sample Size Calculators

Use online tools for quick sample size calculation:

- Evan Miller's calculator: https://www.evanmiller.org/ab-testing/sample-size.html
- Optimizely's calculator: https://www.optimizely.com/sample-size-calculator/

### Step 5: Determine Experiment Duration

#### Duration Based on Sample Size

```
Duration (days) = Total sample size / (Daily users × Experiment traffic ratio)
```

**Example**:

Assumptions:

- Total sample size: 12,800 (two-variant, 6,400 per group)
- Daily users: 5,000
- Experiment traffic ratio: 50% (Control 25% + Test Group 25%)

```
Duration = 12,800 / (5,000 × 0.5) = 5.12 days
```

**Recommendation**: Round up to a complete cycle, at least 7 days (covering a full weekday + weekend cycle).
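The duration calculation above can be sketched in Python. This is a minimal illustration (the function name is hypothetical, not a SensorsWave API) that applies the formula and enforces the 7-day minimum recommended in the text:

```python
from math import ceil

def experiment_duration_days(total_sample: int, daily_users: int,
                             traffic_ratio: float, min_days: int = 7) -> int:
    """Days needed to collect the required sample.

    total_sample  -- required users across all Variants
    daily_users   -- average daily active users
    traffic_ratio -- share of traffic allocated to the experiment (0-1)
    min_days      -- floor covering a full weekday + weekend cycle
    """
    raw_days = total_sample / (daily_users * traffic_ratio)
    return max(ceil(raw_days), min_days)

# Worked example from the text: 12,800 users, 5,000 daily users, 50% traffic
print(experiment_duration_days(12_800, 5_000, 0.5))  # 5.12 days raw -> 7
```

Note that the 7-day floor applies even when the raw sample-based estimate is shorter, so that the run always covers at least one full weekly cycle.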
#### Consider Cyclical Factors

**Weekday vs Weekend differences**:

- E-commerce: Weekend traffic and conversion rates may be higher
- B2B products: Weekday traffic and activity are higher

**Recommendations**:

- Run for at least one full week (7 days)
- When an experiment spans a holiday, extend the Duration or avoid the holiday period

**Special periods**:

- During major promotions (Singles' Day, Black Friday): Avoid running experiments, or analyze the period separately
- During marketing campaigns: Results may be affected; exclude the period or analyze it separately

#### Duration Reference Table

| Traffic Scale | Recommended Duration | Notes |
|---------|---------|------|
| High traffic (> 100K daily) | 1–2 weeks | Sufficient sample size, quick conclusions |
| Medium traffic (10K–100K daily) | 2–4 weeks | Needs enough time to collect samples |
| Low traffic (< 10K daily) | 4+ weeks | Longer runs needed to collect samples; consider a larger MDE |

**Success/Failure criteria (example)**:

- **Success**: Click-through rate lift > 10%, statistically Significant (p < 0.05), no negative impact on guardrail Metrics
- **Failure**: Click-through rate lift < 5%, or guardrail Metrics decline

---

## Related Documentation

- [Core Concepts](core-concepts.mdx): Understand how A/B experiments work
- [Create and Configure](create-and-configure.mdx): Create experiments in the console
- [Targeting and Allocation](targeting-and-allocation.mdx): Deep dive into the split mechanism
- [Metrics and Analysis](metrics-and-analysis.mdx): Analyze experiment results

---

**Last updated**: January 29, 2026