How to Level up Your A/B Testing Game: Maximise Impact and Learning

⛏️ Guest Miner: Sylvain Gauchet
💎 x 12

Karan Tibdewal (Growth Consultant at Phiture, a mobile growth consultancy) talks about how you can level up your A/B testing game to maximise impact and learning.

Source: How to Level up Your A/B Testing Game: Maximise Impact and Learning
Type: Presentation
Publication date: April 2, 2020
Added to the Vault on: April 5, 2020
These insights were shared through the free Growth Gems newsletter.

Gems are the key bite-size insights "mined" from a specific mobile marketing resource, like a webinar, a panel or a podcast.
They allow you to save time by grasping the most important information in a couple of minutes, and each one includes the timestamp from the source.

💎 #1

To get to the point where "more testing = more growth" holds true, you need to have a thorough process in place.

13:28
💎 #2

The 3 key goals of the testing framework are:
1. Communicate ideas from all teams and prioritize based on company objectives
2. Understand and define leading and lagging metrics
3. Centralize test results and communicate next steps

14:12
💎 #3

Use slides for your experiments: Excel files are good for internal use, but slides are better because they help get buy-in from different teams and get them involved.

17:13
💎 #4

When it comes to goal metrics, you do not want to only optimize for revenue: you want "leading metrics" that feed into revenue, because there are a lot of variables that go into optimizing for revenue.

22:48
💎 #5

To know which ideas to prioritize, measure the Impact score with Impact = Reach x Relevance x Frequency. 

24:15
💎 #6

Set up experiments for success by planning ahead of time and using a sample size calculator tool like the one from Optimizely.

28:27
💎 #7

You should not have a "win or lose" mentality. Once you have a significant positive impact, you want to scale up the test (apply it to 100%), then start iterating on it. Double down on high-impact tests and keep iterating.

31:20
💎 #8

After 2 or 3 variations of a test idea that do not show a significant positive impact, do not get too attached to your idea; stop iterating: it is not meant to be.

32:05
💎 #9

Do a reach audit where you assess how many tests you could run with your user base at your current engagement rate. Example: don't just look at how many people will receive that email and assume a 10% conversion rate. Instead, look at your past performance and how many users you can actually reach, and use a sample size calculator to figure out the time to reach statistical significance.

35:50
💎 #10

Onboarding and activation funnels are the main areas of improvement for almost all apps (even apps that are 5+ years old!).

36:20
💎 #11

How long should you run tests for? A week is often enough, but it of course depends on volume. If your user base is very small, the absolute difference between variants should usually be more than 200-300 to get proper test results. Do not run tests for more than 1-1.5 months.

49:55
💎 #12

How often do you revisit experiments? Once you have an experiment in place and you have maximized/iterated on it, it is good to have quarterly reviews. 

52:32


Why do we experiment

  • Favor data over opinions
  • You need to track the right things for it to work
  • Overall goal: optimize revenue and user engagement


There are different levels of testing maturity depending on the company. This talk focuses on Level 2.

Main complexities and challenges

  • Unorganized testing, no tracking, chaos → everybody has different opinions → the decision ends up being made by the HiPPO (Highest Paid Person's Opinion)
  • 3 crucial elements to successful A/B testing:
  1. Estimation & prioritization of impact (RR)
  2. Measurement and communication of test results - you get input from different teams but often you do not give feedback after tests
  3. Continual optimization of successful tests - do not stop after a successful test, you should be able to optimize further
  • [💎@13:28] To get to the point where "more testing = more growth" holds true, you need to have a thorough process in place.


Framework for A/B testing

[💎@14:12] The 3 key goals of the testing framework are:

  1. Communicate ideas from all teams and prioritize based on company objectives
  2. Understand and define core metrics: leading and lagging
  3. Centralize test results and communicate next steps - your test results should not depend on team members (in case of turnover for example)


Different organizations set up their team structures in different ways: from siloed teams where CRM doesn't know what Product is doing, to metric-driven organizations.


The framework allows for iterative learning, even if you do not go through it step by step every time.


While going through the framework, there are 3 key output documents:

  1. Campaign Backlog Document - document ideas that have been generated and work them out in detail
  2. Sprint Plan - to see if you are in line with your business objectives and testing frequency
  3. [💎@17:13] Experiment slides - Excel files are good for internal use, but slides are better because they help get buy-in from different teams and get them involved.


1. Ideation

Main ways to come up with hypotheses

  1. Quantitative data: what happens in your app like user engagement, feature adoption, onboarding funnel, activation flow. Example: seeing a major drop-off in onboarding funnel.
  2. Qualitative data: user surveys and customer care to understand the pain points that customers are facing and identify trends. It can also be based on positive feedback, where you can play to your strengths.
  3. Brainstorming: Natural Trigger Brainstorm (see below), Driver trees (start with an observation, then ask "why" in layers until you get to a hypothesis that can become an idea for an experiment)


Initial ideation session to create a backlog (SoundCloud illustrative example)


2. Specify + Prioritize

The make-or-break of testing is how you specify and prioritize ideas. You need to bring ideas down to reality in the Campaign Backlog Document.

[💎@22:48] When it comes to goal metrics, you do not want to only optimize for revenue: you want "leading metrics" that feed into revenue, because there are a lot of variables that go into optimizing for revenue.


Dependencies are a crucial part: you might have good ideas, but the team might not have the time or resources to work on them.


3. Prioritization with the RRF Framework


[💎@24:15] To know which ideas to prioritize, measure the Impact score with Impact = Reach x Relevance x Frequency.

Each factor is scored on a scale of 1 to 5.

The resources required to push a test live are a critical element.
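
As an illustration, here is a minimal sketch of how the RRF scoring could be used to rank a backlog of ideas. Only the formula (Impact = Reach x Relevance x Frequency) and the 1-to-5 scale come from the talk; the idea names and scores below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Idea:
    name: str
    reach: int      # 1-5: how many users the test can touch
    relevance: int  # 1-5: how relevant it is to the goal metric
    frequency: int  # 1-5: how often users hit the touchpoint

    @property
    def impact(self) -> int:
        # Impact = Reach x Relevance x Frequency (RRF framework)
        return self.reach * self.relevance * self.frequency

# Hypothetical backlog entries scored during a prioritization session
backlog = [
    Idea("Onboarding welcome push", reach=5, relevance=4, frequency=3),
    Idea("Paywall copy test", reach=3, relevance=5, frequency=2),
    Idea("Re-engagement email", reach=4, relevance=3, frequency=4),
]

# Rank ideas by Impact score, highest first
for idea in sorted(backlog, key=lambda i: i.impact, reverse=True):
    print(f"{idea.name}: impact={idea.impact}")
```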


4. Setup + Running tests


Measurement - Campaign and Global Controls

How are your campaigns set up? Do you have control groups in place? You do not want to just test variants; you also need a control group.

Besides the control group, you want to start with a 50/50 split test because it gives you more direction.


The master control group size depends on your company size (the bigger the user base, the smaller the % you can use) and can allow you to prove the worth of a team.
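
The talk does not prescribe an implementation, but a minimal sketch of such an assignment could use a deterministic hash on the user ID: a small master (global) control group is held out of all campaigns first, and the remaining users are split 50/50 between the campaign control and the variant. The 5% holdout and the experiment names are illustrative.

```python
import hashlib

def bucket(user_id: str, salt: str) -> float:
    """Map a user ID to a stable value in [0, 1) for a given salt."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) / 16**8

def assign(user_id: str, master_control_pct: float = 0.05) -> str:
    # Master (global) control: held out of all campaigns, used to prove
    # the overall value of the experimentation program.
    if bucket(user_id, "master-control") < master_control_pct:
        return "master_control"
    # Everyone else: 50/50 split between the campaign control and the variant.
    if bucket(user_id, "onboarding-push-experiment") < 0.5:
        return "campaign_control"
    return "variant"

print(assign("user-123"))
```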


Running valid experiments

Checklist to keep in mind


[💎@28:27] Set up experiments for success by planning ahead of time and using a sample size calculator tool like the one from Optimizely.


You might want to give up a bit on significance to gain speed. In that case you get a less precise test, but you can still get direction from the results.
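
For reference, here is a minimal sketch of the kind of fixed-horizon sample size calculation such a calculator performs for a conversion-rate test (a generic approximation, not Optimizely's specific methodology; the baseline rate, lift, alpha and power values are placeholders). It also shows how relaxing significance, as mentioned above, reduces the required sample size.

```python
from statistics import NormalDist

def sample_size_per_variant(baseline: float, relative_lift: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate sample size per variant for a two-proportion test.

    baseline:      control conversion rate (e.g. 0.10 for 10%)
    relative_lift: minimum detectable relative lift (e.g. 0.10 for +10%)
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2
    return int(n) + 1

# Placeholder numbers: 10% baseline conversion, detect a +10% relative lift
print(sample_size_per_variant(0.10, 0.10))              # strict: ~95% significance
print(sample_size_per_variant(0.10, 0.10, alpha=0.20))  # relaxed: fewer users, less precision
```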


5. Analyze results

Your "leading indicator metrics" (e.g. subscription purchase, app open, outbound call) feed into your core metrics (W1/M1 retention, increase of power users).


6. Decide - Scale, Iterate or Kill

[💎@31:20] You should not have a "win or lose" mentality. Once you have a significant positive impact, you want to scale it up (apply it to 100%), then start iterating on it. Double down on high-impact tests and keep iterating.


[💎@32:05] After 2 or 3 variations of an idea that do not show a significant positive impact, do not get too attached to your idea; stop iterating: it is not meant to be.


7. Consolidate learnings

This is the most manual step, but the most vital one when it comes to showing what you have worked on and getting buy-in from teams.


It helps when the consolidation is more visual.



Tips and low hanging fruits

  • [💎@35:50] Do a reach audit where you assess how many tests you could run with your user base at your current engagement rate. Example: don't just look at how many people will receive that email and assume a 10% conversion rate. Instead, look at your past performance and how many users you can actually reach, and use a sample size calculator to figure out the time to reach statistical significance (see the sketch after this list).
  • [💎@36:20] Onboarding and activation funnels are the main areas of improvement for almost all apps (even apps that are 5+ years old!).
  • Use directional significance and focus on speed if the value that you are getting could double your growth.
  • Double down on high impact tests.
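
A minimal sketch of such a reach audit, under hypothetical figures (past weekly sends, past open rate, and a required sample size taken from the earlier sample-size sketch):

```python
# Hypothetical reach audit: how long would a test on this email campaign take,
# based on past performance rather than assumed conversion rates?
weekly_sends = 50_000          # users actually reached per week (from past campaigns)
past_open_rate = 0.25          # measured open rate, not an assumed one
weekly_reached = weekly_sends * past_open_rate

n_variants = 2                 # control + one variant
required_per_variant = 14_749  # e.g. output of sample_size_per_variant(0.10, 0.10) above

weeks_needed = required_per_variant * n_variants / weekly_reached
print(f"Estimated test duration: {weeks_needed:.1f} weeks")
# If this exceeds the 1-1.5 month ceiling mentioned in the Q&A, rethink the test
# or relax significance and settle for a directional result.
```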


Examples


Q&A


  • How to exclude paid traffic? It depends on your tech stack. Phiture typically does not take it into account for the test but tracks it retroactively.
  • What if you are a Level 0 company (no tests at all)? Think about ideation, prioritize the activation and onboarding funnel, and use the master control group to show the value. A thorough process helps you get buy-in.
  • Figuring out Reach or Relevance is guesswork at the start, but as you run more tests and gather more data you will get better at it.
  • Leading and lagging metrics: for lagging metrics (e.g. retention) you usually measure the impact later (e.g. a month after the test), so it is best to focus on leading metrics that the test directly impacts so you can make decisions.
  • Who to involve? It works even with a siloed team (e.g. just the CRM team).
  • [💎@49:55] How long should you run tests for? A week is often enough, but it of course depends on volume. If your user base is very small, the absolute difference between variants should usually be more than 200-300 to get proper test results. Do not run tests for more than 1-1.5 months.
  • Firebase as an A/B testing tool? Quite a good tool, but he encourages testing other software as well.
  • [💎@52:32] How often do you revisit experiments? Once you have an experiment in place and you have maximized/iterated on it, it is good to have quarterly reviews.
  • When running tests in parallel, how do you ensure users are not affected by other tests? One tool that really helps is a mapping of user stages; they try to keep tabs on which experiments are running and their dependencies. It is not easy, so the master control group helps a lot (see the sketch after this list).
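
A minimal sketch of what such a stage-to-experiment mapping could look like; the lifecycle stages and experiment names are hypothetical.

```python
# Hypothetical mapping of user lifecycle stages to the experiments running on them
running_experiments = {
    "onboarding": ["welcome_push_copy", "tutorial_length"],
    "activation": ["paywall_layout"],
    "retention": ["win_back_email"],
}

def overlapping_stages(experiments):
    """Stages where more than one experiment runs at once, i.e. where results
    may be confounded and the master control group matters most."""
    return [stage for stage, tests in experiments.items() if len(tests) > 1]

print(overlapping_stages(running_experiments))  # ['onboarding']
```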

