Archive for the ‘Analytics & Testing’ Category

Marketing Optimization: How to design split tests and multi-factorial tests

January 23rd, 2012

I’ve got a research question. Now what do I do with it?

A few weeks ago, Daniel Burstein wrote a blog post about writing research questions. That post emphasized the importance of asking “which” rather than “what” questions because a “which” question is clearly testable.

You might ask, “Which page format results in the most lead submissions?” or “Which price point generates the most revenue?” Both questions are clearly stated and include two key pieces of information:

  • An independent variable you are going to test
  • The dependent variable you will use to measure your results

 

To know if something is better, first you must know if it is different

With the research question on paper, we can easily create a hypothesis. For the former question: “All page formats will result in the same number of lead submissions.” This type of hypothesis is so famous in research circles that it has a name: “The Null Hypothesis.”

In general terms, the null hypothesis states that varying the independent variable will result in no change to the dependent variable.

In other words, you’re testing to see if changing the page (the independent variable) will change the number of leads (the dependent variable). After all, if there is no change, one cannot be any better than the other.

Why not “The new layout will result in the most lead submissions,” you ask? Because there is no concrete reason to believe that there will be a change. Besides, if you already knew the effect of A on B, why would you need to test it?
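To make the null hypothesis concrete, here is a minimal sketch of one common way to check it: a two-proportion z-test. The post does not say which statistical test MECLABS analysts use, so the choice of test, the function name, the 0.05 threshold and the visit counts are all assumptions made for illustration.

```python
import math

def two_proportion_z_test(control_leads, control_visits, treatment_leads, treatment_visits):
    """Two-sided z-test of the null hypothesis that both pages convert at the same rate."""
    rate_c = control_leads / control_visits
    rate_t = treatment_leads / treatment_visits
    # Under the null hypothesis both pages share one conversion rate, so pool them.
    pooled = (control_leads + treatment_leads) / (control_visits + treatment_visits)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / control_visits + 1 / treatment_visits))
    z = (rate_t - rate_c) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up counts for illustration only.
z, p_value = two_proportion_z_test(70, 1000, 100, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Reject the null hypothesis: the pages differ.")
else:
    print("Cannot reject the null hypothesis: no detectable difference.")
```

Only when the null hypothesis is rejected does it make sense to ask which page is better.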

 

Control vs. Treatment(s)

In most cases, there will be an existing page that all new versions will be compared to. This page is termed the “Control,” and all new pages are dubbed “Treatments” to guide comparisons later.

The next step in testing your research question is to decide on the most appropriate test structure. This will depend on the number of variations you will be testing, and on the amount of traffic your site receives. At MECLABS, our research analysts do this visually using a small flowchart to represent the flow of traffic to the control and treatment pages.

Take your latest research question and write it down. Below it, write out the following until you have listed all the variations to be tested.

 


 

On the right-hand side of the page, write “All Traffic.” At this point, you need to determine whether your traffic should be split evenly among all the pages or whether you will send only a small portion of traffic to the treatment pages and keep most of the flow on the existing Control page.

At MECLABS, our analysts use the Test Protocol document to determine how many site visits are required to achieve valid results given a set of treatments and typical conversion rates on the existing page. This process is covered in our Online Testing Course.

 

Split tests

Draw lines between “All Traffic” and the pages to the left showing the split, and mark each line with the percentage of traffic to be sent along that path (see below). This design is called a split test. It is very important that traffic is randomly split between the treatments and control. On a high-traffic site, the percentage sent to the control can be higher than what is sent to the treatments, as long as each page will easily meet the required minimum sample size.
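The random split can be sketched in a few lines of Python. This is only an illustration of weighted random assignment, not the mechanism of any particular testing tool; the page names, the 50/25/25 split and the seed are assumed values.

```python
import random
from collections import Counter

def assign_page(rng, split):
    """Pick a page for one visitor at random, weighted by the planned traffic split."""
    pages = list(split)
    weights = [split[page] for page in pages]
    return rng.choices(pages, weights=weights, k=1)[0]

# Hypothetical split: the control keeps half the traffic, two treatments share the rest.
split = {"Control": 0.50, "Treatment 1": 0.25, "Treatment 2": 0.25}
assert abs(sum(split.values()) - 1.0) < 1e-9, "shares must add up to 100%"

rng = random.Random(2012)  # fixed seed so the sketch is repeatable
counts = Counter(assign_page(rng, split) for _ in range(10_000))
for page in split:
    print(f"{page}: {counts[page] / 10_000:.1%} of visitors")
```

Because each visitor is assigned independently, the observed shares land close to the planned percentages without any visitor being steered by time of day, source or order of arrival.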

 


 

Multi-factorial tests

The split test design works for tests of only one step, but sometimes we need to test more than one step in a process. In that case, we have two independent variables that we will manipulate separately. For example, if your research question is, “Which checkout process generates the most revenue?” you might want to test several variations of cart layout and payment page layout at the same time.

If you were to test [Cart and Payment Treatment 1] against [Cart and Payment Treatment 2], your results might tell you that [CT and PT 1] produced 15% more revenue than [CT and PT 2], but you would never learn that Cart Treatment 1 paired with Payment Treatment 2 would have yielded an even higher lift!

Essentially, you have two research questions: “Which cart design will generate the most revenue?” and “Which payment design will generate the most revenue?” This means you have two independent variables and one dependent variable.

 

To test multi-step processes, researchers use a research design called a factorial test. Each variation in each independent variable is tested together so that all combinations are tested. A typical factorial design is represented below.

 


 

Because the traffic is sent evenly to each pairing, the factorial research design accounts for the natural dependency between steps 1 and 2. If a viewer does not like Cart Treatment 1, they will not proceed to the Payment step, but since you have also tested other combinations of Cart and Payment, you can assume the effect is balanced out.
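As a rough sketch, the full set of pairings in a factorial design can be generated rather than listed by hand. The design names below are hypothetical labels; the three-by-two shape is just one possible layout.

```python
from itertools import product

# Hypothetical treatment names; any labels would do.
cart_designs = ["Cart Control", "Cart Treatment 1", "Cart Treatment 2"]
payment_designs = ["Payment Control", "Payment Treatment 1"]

# A full factorial design pairs every cart design with every payment design.
pairings = list(product(cart_designs, payment_designs))
share = 1 / len(pairings)  # traffic is sent evenly to each pairing

for cart, payment in pairings:
    print(f"{cart} + {payment}: {share:.1%} of traffic")
```

Three cart designs and two payment designs yield six pairings, so each one receives a sixth of the traffic.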

A factorial test requires a lot more traffic than a split test to achieve validity, but it also yields a lot more insight. From the results of a factorial test, you can infer not only the winning combination but also which treatment of each step was most successful. This distinction comes in handy if you then want to test further refinements of the process.
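Here is a minimal sketch of how both kinds of insight can be read from the same results table. The numbers are invented, the function name is hypothetical, and conversion rate (orders per visit) stands in for revenue to keep the arithmetic short.

```python
from collections import defaultdict

def step_conversion_rates(results, step_index):
    """Pool visits and orders across pairings for each design of one step."""
    visits = defaultdict(int)
    orders = defaultdict(int)
    for pairing, (n_visits, n_orders) in results.items():
        design = pairing[step_index]
        visits[design] += n_visits
        orders[design] += n_orders
    return {design: orders[design] / visits[design] for design in visits}

# Invented numbers: (cart design, payment design) -> (visits, orders).
results = {
    ("Cart A", "Payment A"): (1000, 50),
    ("Cart A", "Payment B"): (1000, 70),
    ("Cart B", "Payment A"): (1000, 60),
    ("Cart B", "Payment B"): (1000, 90),
}

best_pairing = max(results, key=lambda pair: results[pair][1] / results[pair][0])
cart_rates = step_conversion_rates(results, 0)
payment_rates = step_conversion_rates(results, 1)
print("Winning combination:", best_pairing)
print("Cart step:", cart_rates)
print("Payment step:", payment_rates)
```

Pooling across the other step only works because every cart design was paired with every payment design, which is the point of the factorial layout.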

 


 


There are some situations that cause problems with research design. It may not always make sense to pair all the possible combinations together, in which case a factorial design is not possible and a split test should be used instead.

Don’t make the mistake of forming all but one or two pairs of the factorial design. An asymmetrical design does not neutralize the dependency of the second step on the first. In other words, if every factor isn’t matched with every possible other factor, you could overlook a potentially big lift.
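A quick completeness check can catch an asymmetrical design before the test launches. The sketch below is one simple way to do it; the function name and the design labels are hypothetical.

```python
from itertools import product

def missing_pairings(cart_designs, payment_designs, planned):
    """Return the combinations a full factorial design needs but the plan lacks."""
    required = set(product(cart_designs, payment_designs))
    return sorted(required - set(planned))

# Hypothetical plan that forgot one pairing.
planned = [("Cart A", "Payment A"), ("Cart A", "Payment B"), ("Cart B", "Payment A")]
gaps = missing_pairings(["Cart A", "Cart B"], ["Payment A", "Payment B"], planned)
print(gaps)
```

An empty list means every factor is matched with every other factor; anything else names the pairings still to be built.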

 

Traffic volume is crucial for factorial tests

One common reason some marketers don’t run multi-factorial tests is a low-traffic page. For example, with only 3,000 hits a month, a 7% historical conversion rate, and six treatment pairs (2 payment designs x 3 cart designs), it could take as much as three years to validate the factorial design shown above!

When faced with an unreasonable completion time, you have a few choices to make. You can test fewer treatments, resulting in quicker accumulation of hits on each treatment, or you can test one step of the checkout process at a time.

You also have the option to test pairs of pages in a split test, losing the additional insights given by the factorial design. All of those options will reduce the time needed to validate the test.
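To see why fewer treatments shorten a test, here is a rough sketch of a duration estimate. It uses a textbook normal-approximation sample size formula for comparing two conversion rates, which is an assumption and not the MECLABS Test Protocol. The 95% confidence level, 80% power and 10% relative lift are also assumed values, so the output should be read as an order of magnitude only.

```python
import math
from statistics import NormalDist

def visits_per_page(base_rate, relative_lift, confidence=0.95, power=0.80):
    """Approximate visits needed per page to detect a lift between two conversion rates."""
    new_rate = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = base_rate * (1 - base_rate) + new_rate * (1 - new_rate)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (new_rate - base_rate) ** 2)

def months_to_complete(pages, monthly_visits, base_rate, relative_lift):
    """Months needed when traffic is split evenly across all pages in the test."""
    return pages * visits_per_page(base_rate, relative_lift) / monthly_visits

# 3,000 visits a month and a 7% conversion rate come from the example above;
# the 10% relative lift is an assumed value.
for pages in (6, 4, 2):
    months = months_to_complete(pages, 3000, 0.07, 0.10)
    print(f"{pages} pages: about {months:.0f} months")
```

The required visits per page stay the same, so the total duration scales directly with the number of pages sharing the traffic.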

 

Sequential tests

Some marketers try to learn which treatment works best through sequential tests. Essentially, one page is live, or one email is sent, and then the page is changed, or another email is sent. One treatment is left online for a set period, followed by the next treatment, and so forth. This usually happens because there was no test design to begin with, and marketers are comparing results after the fact.

This could also be because marketers do have a test design but are unable to split traffic. After all, if you can only direct traffic to a single page design at a time, you can only test pages sequentially. (However, with the wide availability of both free and paid optimization tools, this situation has become quite rare.)

Sequential tests are extremely prone to history effects, where an outside event or phenomenon affects the viewers’ behaviors on the site from one moment in time to another (see our Online Testing Course for more information on History Effects).

For example, an email sent out to the mailing list will increase traffic to whatever homepage treatment is currently online, distorting the actual effect of the design changes. This effect is usually noticeable as a sudden rise on an analytics traffic or conversion chart. Although it is not an optimal research design, this type of study can distinguish between a control and a treatment page. Results should only be interpreted if the possibility of history effect has been considered and found insignificant.

 

Related Resources:

Marketing Optimization: You can’t find the true answer without the right question

Artificial Optimization: Why at least 40% of marketers shouldn’t test

Marketing Optimization: How to determine the proper sample size

 

 


Marketing Metrics: Why all numbers aren’t created equal

January 16th, 2012

What do you get when you divide Jacksonville Beach, Fla., by Arden Hills, Minn.? I’m sure there’s a punch line in there somewhere. However, if you were tracking your customers’ ZIP codes in a database, you would have 32250/55112, or 0.585.

It makes no sense to you and me to divide one ZIP code by another, but a statistical software package is happy to do exactly that for us. Most software just isn’t smart enough to realize that each ZIP code is a discrete label with its own meaning. It sees them as numbers: values which can be sorted in order and used in any type of calculation.

That is why researchers and statistical software packages classify variables into four main types: Nominal, Ordinal, Interval and Ratio.

In this post, I’m going to describe each type of variable to help you understand how they should be used, let you know how this can help improve your data collection…and, while we’re at it, help you sound sharp the next time you’re chatting with your data analyst at the water cooler.

-

Nominal Variables: Used to describe categories

Variables are classified by the structure of what they represent. ZIP codes, for example, are a Nominal variable: a categorical name that simply allows us to differentiate between groups.

Gender and ethnic group are other common examples of this type. Only a limited number of statistical analyses are valid for this type of variable. We can count how many customers have each ZIP code, and compare the counts to see which is most common (statisticians call this most frequent value the Mode).

We cannot “average” their ZIP codes to determine a population center, or calculate correlations between ZIP code and a customer satisfaction index because there is no real meaning to a “higher” or “lower” numerical ZIP code.

If we wanted to know about geographic patterns in customer satisfaction, we would have to take the average satisfaction index for each ZIP code and compare those averages to one another. Browser type and operating system are two other common Nominal variables.
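A minimal sketch of the two valid operations described here, counting to find the Mode and averaging another variable within each ZIP code, might look like this. The customer records are made up, and storing ZIP codes as strings is one assumed way of keeping them out of calculations.

```python
from collections import Counter, defaultdict
from statistics import mean

# Made-up customer records: (ZIP code, satisfaction index).
# ZIP codes are stored as strings so they cannot slip into arithmetic.
customers = [
    ("32250", 8.0), ("32250", 9.0), ("55112", 6.0),
    ("32250", 7.0), ("55112", 8.0),
]

# Valid for a Nominal variable: count each category and find the Mode.
zip_counts = Counter(zip_code for zip_code, _ in customers)
most_common_zip, _ = zip_counts.most_common(1)[0]

# Geographic pattern: average the satisfaction index within each ZIP code.
scores_by_zip = defaultdict(list)
for zip_code, score in customers:
    scores_by_zip[zip_code].append(score)
average_by_zip = {zip_code: mean(scores) for zip_code, scores in scores_by_zip.items()}

print(most_common_zip)
print(average_by_zip)
```

The ZIP code itself is only ever used as a grouping label; the averaging happens on the satisfaction index.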

Word of Caution – This first one seems obvious, but keep in mind it is an easy oversight to have a number in a spreadsheet or database inadvertently become part of a calculation.

-

Ordinal Variables: Used to rank preference

The next level of complexity is represented by the Ordinal variable. Ordinal variables are sequential; they advance in a direction but the increments on the scale are unknown or uneven.

For example, the organizational chart of a company might show that the mailroom attendant is below the marketing analyst, and he in turn is below the vice president, who is below the president. There is a clear direction, but the relationship between ranks is not consistent.

In marketing research, consumers sometimes rank new products in order of preference. They do not necessarily like product 1 twice as much as product 2, or 3 twice as much as 4. So when analyzing the data from the test, a researcher can find the Mode or the middle-ranked item (the Median), but it is not valid to calculate the “average rating” given to an item. Because the distance between items on the scale is unknown, an average has no reliable meaning.

Order-preserving calculations, such as adding a constant or multiplying by a positive constant, can be applied to Ordinal data; however, any such calculation must be applied consistently to every item in the data set so that the order of its members is maintained.

Word of caution – One common survey scale is the Likert scale, which allows respondents to rate their agreement with statements on a 5- or 7-point scale from “Strongly Agree” to “Strongly Disagree.” Because there is no way to know the difference between “Strongly Agree” and “Agree” in the mind of each respondent, or to ensure that each respondent is consistent in their judgments, these results are Ordinal data.

Many research studies treat Ordinal data as Interval data (more on that next), making a basic and sometimes flawed assumption that the scale represents a consistent interval between one ranking and the next. While each individual will be relatively consistent in their ratings, there is no consistency between individuals. This creates a limitation on the generalization of the results of the calculations, but this type of analysis may still offer significant insights into your data. It is important to understand that the results from such an analysis are imprecise and should only be interpreted generally, rather than by comparisons of small differences.
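For a cautious analysis that treats survey ratings strictly as Ordinal data, the Mode and the Median are enough. The sketch below assumes a 5-point scale with a "Neutral" midpoint and uses made-up responses.

```python
from statistics import median_low, mode

# Scale labels listed from lowest to highest agreement.
SCALE = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]

# Made-up survey responses.
responses = ["Agree", "Strongly Agree", "Neutral", "Agree",
             "Disagree", "Agree", "Strongly Agree"]

# The Mode only needs categories, so it works directly on the labels.
most_frequent = mode(responses)

# The Median needs order, so convert labels to their position on the scale.
# median_low always returns an actual position, never a value between two.
positions = [SCALE.index(answer) for answer in responses]
middle_answer = SCALE[median_low(positions)]

print(most_frequent, middle_answer)
```

The positions are used only for sorting, so the result does not depend on how far apart the labels are in any respondent's mind.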

-

Interval and Ratio Data: Now we can get into the valuable number crunching

Both Interval and Ratio variables possess not only a sequence, but an even interval. Here’s where it gets tricky: the difference between the two types is zero. Yes, 0.

Interval variables may have a point which we designate “zero,” but negative numbers are theoretically possible.

A Ratio variable has a real zero point, a point which nothing can be below.

For example, an item’s price can be zero, or “free,” but price is not a Ratio value. Why? Because -$1.99, or a negative price, is conceptually possible. Take German government bonds. In a recent auction, the bonds yielded negative 0.0122%.

We try never to pay our customers to purchase our products, but theoretically, negative price has meaning. Therefore, price is an Interval variable.

Many true Ratio variables are found in marketing research. “Number of Page Visits” and “Time on Page” are common Ratio variables. The good news is that almost all statistical techniques used in marketing research can be applied to both Interval and Ratio data. Mean, Median, Mode, Correlation, Standard Deviation and ANOVA are all equally valid with both types of data.
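As a short illustration, the sketch below runs several of these statistics on two Ratio variables using Python's standard library. The session data is made up, and the correlation helper is a plain implementation of the Pearson formula written out for clarity.

```python
from statistics import mean, median, stdev

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return covariance / (spread_x * spread_y)

# Made-up session data: number of page visits and seconds on page per session.
page_visits = [1, 2, 3, 4, 5, 6]
time_on_page = [20, 45, 50, 80, 95, 130]

print("mean visits:", mean(page_visits))
print("median time:", median(time_on_page))
print("std dev of time:", round(stdev(time_on_page), 1))
print("correlation:", round(pearson_correlation(page_visits, time_on_page), 3))
```

None of these calls would be meaningful on the ZIP codes or rankings discussed earlier, even though the software would accept them.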

-

So what does this mean for you?

When you design your experiments, think about the type of variables you will be collecting data for. Interval and Ratio variables allow the most flexibility in statistical analysis, so whenever possible try to use them rather than Ordinal or Nominal data. A survey question could ask, “Which of the following tasks have you undertaken in the last 24 hours?” which produces a multiple-choice, Nominal answer.

It could also ask, “Please rank these tasks from most to least recently undertaken,” which produces Ordinal data and allows some additional analysis.

Finally, the survey could ask, “At what time and date did you last undertake these tasks?” producing concrete Interval data which will allow you to compare between respondents and run in-depth statistical functions.

In the design phase of your marketing tests, think about the statistical data you would like to produce, and what variable types are required to calculate the results you need in order to answer your research questions. When you enter your data into a statistical software package, be careful to designate the correct variable type in the software so that the program can prevent you from dividing Florida by Minnesota.
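Real statistical packages each have their own way of declaring variable types, so the following is only a toy sketch of the idea. The column names, the table of allowed statistics and the function name are all hypothetical.

```python
# Statistics that are valid at each measurement level, per the descriptions above.
ALLOWED_STATISTICS = {
    "nominal": {"count", "mode"},
    "ordinal": {"count", "mode", "median"},
    "interval": {"count", "mode", "median", "mean", "std_dev", "correlation"},
    "ratio": {"count", "mode", "median", "mean", "std_dev", "correlation"},
}

# Hypothetical data set: each column is tagged with its variable type.
COLUMN_TYPES = {
    "zip_code": "nominal",
    "preference_rank": "ordinal",
    "last_task_time": "interval",
    "time_on_page": "ratio",
}

def check_statistic(column, statistic):
    """Raise ValueError if the statistic is not valid for the column's variable type."""
    level = COLUMN_TYPES[column]
    if statistic not in ALLOWED_STATISTICS[level]:
        raise ValueError(f"{statistic} is not valid for {level} column '{column}'")
    return True

check_statistic("time_on_page", "mean")      # passes
try:
    check_statistic("zip_code", "mean")      # averaging ZIP codes is blocked
except ValueError as error:
    print(error)
```

Tagging each column once, up front, is what lets the check refuse a meaningless calculation later.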

-

Related Resources:

Marketing Optimization: You can’t find the true answer without the right question

Research Update: The state of email marketing testing and optimization

Marketing Optimization: What your peers learned this year about Adwords, the inbox, and telling the truth

Evidence-based Marketing: How your peers protect against bad marketing data

 


Free Marketing Tools: 11 worksheets, spreadsheets, and calculators to help make your next optimization project a success

October 31st, 2011

Optimizing your webpages and marketing campaigns is a daunting task for any marketer. We need all the help we can get. Free help is even better (as long as it’s actually, well, helpful). So to assist our audience of marketers with daily optimization tasks, our researchers have created 11 free marketing tools over the years, and I have compiled them in the list below.

You can think of this as our idea of Halloween candy.

Please try them out and let us know how they work for you in the comments. Read more…


B2B Lead Testing: “Cheap” data is actually expensive

October 26th, 2011

This week at the San Francisco leg of MarketingSherpa’s B2B Summit 2011, Brian Carroll, Executive Director of Applied Research, MECLABS, and Nicolette Dease, Program Manager, MECLABS Leads Group, provided tactical training on optimizing lead generation.

Part of this presentation was a case study on finding the most efficient list source based on a test that looked at several different lead sources.

The objective of the test was to determine if higher-cost/higher-quality data can drive down overall cost-per-lead, and the primary research question was, “Which campaign data source will drive the most efficient value?”

The test design looked at six list segments with 300 accounts and 80 hours of calling per segment. Let’s first look at how much each lead costs for discovery:

  • Multi-source 1, validated by phone based on role – record cost $24
  • Multi-source 2, validated by phone based on title – record cost $14.50
  • Multi-source 3, validated by phone – record cost $6
  • Multi-source 4, validated by email – record cost $3
  • User-generated, validated by business cards – record cost $1
  • Single-source, no validation – record cost $0.49

Read more…


Marketing Optimization: Measuring the potential force of your value proposition

September 26th, 2011

Ahhhhhh, Fall. There’s a chill in the air. Football is back. And, if you’re a marketer, it’s time once again for MarketingSherpa’s annual B2B Summit, currently happening in Boston, and soon to invade San Francisco.

We’re live onsite, and Dr. Flint McGlaughlin, Managing Director and CEO, MECLABS, opened the proceedings this morning by teaching this enthusiastic audience that most marketers can’t even define “value proposition,” much less tell you their own. Dr. McGlaughlin was out to change this, by helping the audience define the very questions necessary to establish a strong value prop, and differentiate them from competitors.

The fourth step of Dr. McGlaughlin’s six-step process involves a strategic checkpoint in which you measure the potential force of your value proposition(s).

As Dr. McGlaughlin has taught before, “clarity trumps persuasion.” This applies particularly well to value proposition, as we learned value propositions are discovered, not determined. They don’t come about from something that is predetermined; they grow out of what you are as a company or product. This sincerity needs to come through in a value proposition.

Dr. McGlaughlin showed how creating a value prop, measuring its force, and clearly communicating it to the world lets some marketers off the hook, in a sense. You don’t need to be an expert copywriter. You are simply communicating the unique value that your audience is looking for.

But how do you do it? The force of a value proposition can be measured by four essential elements of the offer: Read more…


Artificial Optimization: Why at least 40% of marketers shouldn’t test

September 23rd, 2011

If you’ve been reading this blog for a while, then chances are you’re testing. That’s good …

… at least some of the time.

Marketers who aren’t testing may actually be better off than the ones that are

If you’re not careful, you could be running tests that tell you one thing when, in fact, the situation is completely different. You could be making critical decisions based on bad data. And these are the worst decisions you could make, because you’ve got the data to confirm that you’re right, when you’re actually doing things incorrectly.

This is why we were so surprised when MarketingSherpa’s Landing Page Optimization Benchmark Report came out with the following chart in it:

- Read more…
