Online experiments (or A/B testing) are becoming very popular in several areas. Companies use A/B testing to incrementally improve their interfaces and customer experience (e.g., Google, Amazon, Facebook, Booking.com); academics from different fields run experiments to advance knowledge about human behaviour; online advertisers run several versions of an ad to see which one attracts more clicks.
Academic curricula increasingly include specific modules on experimental design. Experimental design is an essential and obvious component of courses in psychology and the behavioural sciences or behavioural economics. Other disciplines now also teach its fundamentals, for example the online specialization in Interaction Design from the University of California San Diego or the Human-Computer Interaction course series from Georgia Tech. Experimental design is becoming essential for students who want to land jobs in companies that have shifted their design and decision-making processes to rely on evidence and data rather than “gut feelings”. Kaufman et al. (2017) note that at Booking.com “New hires coming from more traditional product organizations often find themselves humbled and frustrated when their ideas are invalidated by experiments.” In my opinion it is fantastic that a culture of controlled experiments is being introduced in curricula, because future practitioners learn from the start to think about how to evaluate their hunches and ideas. It is also a culture of failure: numerous null results are inevitable when running experiments, but if we are able to fail fast, we can also learn fast.
“A/B testing is a really powerful tool; in our industry you have to embrace it or die. If I had any advice for CEOs, it’s this: Large-scale testing is not a technical thing; it’s a cultural thing that you need to fully embrace. You need to ask yourself two big questions: How willing are you to be confronted every day by how wrong you are? And how much autonomy are you willing to give to the people who work for you? And if the answer is that you don’t like to be proven wrong and don’t want employees to decide the future of your products, it’s not going to work. You will never reap the full benefits of experimentation.”
Interview with David Vismans, CPO at Booking.com. In S. Thomke (2020)
Central to learning what works, and where, is the idea of democratising experimentation within an organisation. Booking.com wants everyone in the company to be able to come up with ideas to test, rather than leaving product managers to dictate what gets tested. The company runs thousands of tests every year. To make use of all that evidence, it keeps a searchable repository of failures and successes (Kaufman et al., 2017).
Sky in the U.K. is also democratising experimentation. Its digital team wants employees to design and run experiments on their own and trains them to think scientifically. Sky has already used this approach to improve its customer service, reducing the number of calls and increasing customer satisfaction (Thomke, 2020).
Online experiments are not used only by for-profit companies; they are also widely used in academia. Researchers in disciplines such as psychology, economics and the behavioural sciences know them well. Many survey experiments are run online, relatively cheaply and quickly, on platforms that allow researchers to recruit participants from all over the world. This gives researchers the data they need to complete their publications. In the same fashion, private-sector researchers can run experiments for different reasons, including eliciting choices and understanding perceptions and behaviours.
I don’t intend to explain here the methods of experimental design needed to run an experiment. Instead, I list tools and platforms that can be explored to get started with experiments in business and academia.
Tools for conducting online experiments
To meet the demand for online studies, a growing number of web services are available. If you are just starting out, explore the different tools. Ideally, have your experimental design in mind so you can see which platform fits your needs. Sometimes you need to be creative and work around a platform’s functions to make it fit. The list below is by no means exhaustive, but it is enough to get started.
Optimizely – A well-known platform for running A/B tests on your website or app.
Google Optimize – A popular platform for A/B testing on websites and apps that integrates with other Google tools (note that Google discontinued Optimize in September 2023).
Gorilla – A no-code platform for designing proper behavioural web-based experiments. Designing is free and you pay per respondent. It also includes a library of example tasks such as the IAT, attitude tasks, attention tasks, and many more.
Expilab – Another great behavioural research platform for academia and business.
Inquisit Web/Lab – Offers both a web platform and downloadable software for running psychological experiments in the lab.
TESTABLE – Testable claims to be a one-stop solution for behavioural experiments, surveys and data collection.
Tatool – Open-source software for running online and offline computer-based experiments. It was designed around the requirements of psychological experimental studies.
oTree – an open-source platform for behavioural research. It lets you create controlled behavioral experiments in economics, market research, psychology, and related fields; multiplayer strategy games, like the prisoner’s dilemma, public goods game, and auctions; surveys and quizzes, especially those that require customized or dynamic functionality not available with conventional survey software.
z-Tree – z-Tree is a widely used software package for developing and carrying out economic experiments. The software is implemented as a client-server application with a server application for the experimenter, and a client application for the subjects.
Other tools let you build your own platform from scratch. Take a look at nodeGame, lab.js and jsPsych.
Some upcoming tools to keep an eye on are xprmntr (for creating experiments using R) and GuidedTrack.
Platforms to build survey experiments (or just questionnaire surveys)
Qualtrics – Probably the best-known survey platform, widely used in academic research. The platform supports only survey-style data collection, but it includes a randomisation component that lets you run experimental designs.
LimeSurvey – LimeSurvey is another survey platform. It began as open-source software, so you can install an instance of the LimeSurvey Community Edition on a server and have a complete survey platform for free. I strongly recommend a donation, though. Recently they have also launched a cloud service that offers the survey tool at an affordable price.
SurveyMonkey – SurveyMonkey is also one of the most popular survey platforms. It offers a paid, built-in A/B testing feature.
Remember that some survey tools also let you embed additional code, so you can run more complex, customised tasks with your participants (e.g. an IAT).
Another important consideration is the randomisation of participants to conditions. Randomisation is, by definition, random, so depending on the platform you may not obtain groups of exactly equal size. Plan to recruit more participants than your calculated sample size, or adjust the assignment settings to “force” balance across groups. Always run pilot tests. Monitor incoming data to check that the system is working correctly, but do not run any analysis before reaching your predetermined sample size.
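As a rough illustration of the balance problem, here is a minimal sketch assuming a hypothetical three-condition study. It contrasts simple randomisation, where each participant is assigned independently, with permuted-block randomisation, one common way to “force” balance. It also uses a quick Monte Carlo simulation to gauge how much to oversample. The condition labels and function names are illustrative only. Survey platforms implement their own randomisers, which may work differently.

```python
import random
from collections import Counter

# Hypothetical condition labels for a three-arm study.
CONDITIONS = ["control", "treatment_a", "treatment_b"]


def simple_randomisation(n_participants, conditions, rng):
    """Assign each participant independently and uniformly at random.
    Group sizes are only equal on average and can drift apart."""
    return [rng.choice(conditions) for _ in range(n_participants)]


def block_randomisation(n_participants, conditions, rng):
    """Permuted-block randomisation: every block contains each condition once,
    in shuffled order, so group sizes never differ by more than one."""
    assignments = []
    while len(assignments) < n_participants:
        block = list(conditions)
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n_participants]


def prob_all_groups_reach(n_total, n_per_group, conditions, rng, n_sims=1000):
    """Monte Carlo estimate of the probability that simple randomisation of
    n_total participants gives every condition at least n_per_group people."""
    hits = 0
    for _ in range(n_sims):
        counts = Counter(simple_randomisation(n_total, conditions, rng))
        if all(counts[c] >= n_per_group for c in conditions):
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the sketch is reproducible
    print("simple:", Counter(simple_randomisation(90, CONDITIONS, rng)))
    print("block: ", Counter(block_randomisation(90, CONDITIONS, rng)))
    # Recruiting exactly 3 x 100 rarely yields 100 per group; ~20% extra usually does.
    for n_total in (300, 360):
        p = prob_all_groups_reach(n_total, 100, CONDITIONS, rng)
        print(f"recruit {n_total}: P(all groups >= 100) ~ {p:.2f}")
```

Permuted blocks trade some unpredictability for balance. When the block size equals the number of conditions, the last assignment in each block can be predicted. This matters more when an experimenter sees the assignments than in anonymous online studies.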
Platforms for recruiting participants online for experiments and surveys
Ideally, one would also like to have some participants in one’s experiments. You are of course free to recruit participants on your own, but some services can accelerate the process. Although these platforms make recruitment easy, it is advisable to put strategies in place to mitigate the risks of online recruitment. These include using attention checks, checking IP addresses, cross-checking participant information, verifying email addresses, or even verifying phone numbers (by call or text message).
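Several of these checks are usually applied to the exported data once collection ends. Below is a minimal sketch that assumes a hypothetical export containing a participant ID, IP address, email address and the answer to one instructed-response attention item (e.g. “please select ‘strongly disagree’ for this item”). It flags failed attention checks and duplicate IP addresses or emails. The field names and the expected answer are assumptions, not any platform’s export format. Shared IP addresses can be legitimate, for example on a household or university network, so the sketch flags responses for manual review rather than deleting them.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Response:
    # Hypothetical fields: a real survey export will have its own column names.
    participant_id: str
    ip_address: str
    email: str
    attention_answer: str  # answer to an instructed-response item


# Hypothetical instructed answer for the attention-check item.
EXPECTED_ATTENTION_ANSWER = "strongly disagree"


def _norm(text):
    """Normalise free text so 'Cat@Example.org ' and 'cat@example.org' match."""
    return text.strip().lower()


def screen_responses(responses):
    """Flag responses that fail the attention check or share an IP address or
    email with another response. Returns (kept, flagged), where flagged maps
    participant_id -> list of reasons, for manual review."""
    ip_counts = Counter(r.ip_address for r in responses)
    email_counts = Counter(_norm(r.email) for r in responses)
    kept, flagged = [], {}
    for r in responses:
        reasons = []
        if _norm(r.attention_answer) != EXPECTED_ATTENTION_ANSWER:
            reasons.append("failed attention check")
        if ip_counts[r.ip_address] > 1:
            reasons.append("duplicate IP address")
        if email_counts[_norm(r.email)] > 1:
            reasons.append("duplicate email")
        if reasons:
            flagged[r.participant_id] = reasons
        else:
            kept.append(r)
    return kept, flagged


if __name__ == "__main__":
    sample = [
        Response("p1", "198.51.100.7", "ann@example.org", "Strongly disagree"),
        Response("p2", "198.51.100.8", "bob@example.org", "agree"),
        Response("p3", "203.0.113.5", "cat@example.org", "strongly disagree"),
        Response("p4", "203.0.113.5", "Cat@example.org ", "strongly disagree"),
    ]
    kept, flagged = screen_responses(sample)
    print([r.participant_id for r in kept])  # ['p1']
    print(flagged)
```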
Prolific – Great service for collecting high quality responses from people around the world.
Amazon Mechanical Turk – Originally a micro-tasking platform run by Amazon to facilitate the crowdsourcing of small tasks. It became popular among researchers as a source of study participants and is now used by academics, government agencies, and businesses. MTurk lets you decide how much to pay your respondents, which raises concerns about exploitation (see, e.g., Fort et al., 2011). Because the platform has been used so extensively for research, there are several academic articles and studies on its pros and cons.
Another platform, CloudResearch, builds on the MTurk worker pool and offers a tool that simplifies the process, since the MTurk interface is not the easiest to navigate.
Qualtrics Online Sample – Qualtrics claims to offer any participant demographic by drawing on multiple market research panels and social media. The service is more bespoke, and it can draw on other sources if researchers need a specific target group. Pricing is personalised (you request a quote), and the service requires a subscription to their survey software.
SurveyMonkey Audience – SurveyMonkey (like Qualtrics Sample) also offers a recruiting service, which is also tied to the use of their survey software. Participants are mainly in the U.S., and demographic targeting is available.
Facebook ads (or other social media) – Not designed for recruiting respondents, but you can run advertisements on Facebook that direct respondents to a survey or experiment webpage. Facebook is widely used and also offers a degree of demographic and geographical targeting.
Recruiting participants for free
Apart from inviting your parents, cousins, and friends to complete your experiments, you can recruit participants for free on some platforms. This can be ideal for students with a limited budget, but don’t expect large numbers of participants. One of these platforms is SurveyCircle, a community of research enthusiasts who help each other with their research. You sign up to complete other researchers’ studies and earn points that you can then spend as currency on your own research project.
ORSEE – A final mention goes to ORSEE, an open-source, web-based Online Recruitment System specifically designed for organizing economic experiments. It is written in PHP, can be hosted on any web server, and aims:
- to simplify the organization of economic laboratory experiments
- to standardize the procedures of experiment organization
- to depersonalize the experimenter-subject interaction
- to provide information and statistics about the subject pool
Readings
BOOKS
Thomke, S. H. (2020). Experimentation Works: The Surprising Power of Business Experiments. Harvard Business Press.
Kohavi, R., Tang, D., & Xu, Y. (2020). Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge University Press.
ARTICLES
Kohavi, R., & Thomke, S. H. (2017). The surprising power of online experiments. Harvard Business Review.
Thomke, S. H. (2020). Building a Culture of Experimentation. Harvard Business Review.
Bojinov, I., Saint-Jacques, G., & Tingley, M. (2020). Avoid the Pitfalls of A/B Testing. Harvard Business Review.
McGinn, D. (2020). “The Power of These Techniques Is Only Getting Stronger”. Harvard Business Review.
Kaufman, R. L., Pitchforth, J., & Vermeer, L. (2017). Democratizing online controlled experiments at Booking.com. arXiv preprint arXiv:1710.08217.
Bakshy, E., Eckles, D., & Bernstein, M. S. (2014, April). Designing and deploying online field experiments. In Proceedings of the 23rd international conference on World wide web (pp. 283-292).
Tosch, E., Bakshy, E., Berger, E. D., Jensen, D. D., & Moss, J. E. B. (2019). PlanAlyzer: assessing threats to the validity of online experiments. Proceedings of the ACM on Programming Languages, 3(OOPSLA), 1-30.
Teitcher, J. E. F., Bockting, W. O., Bauermeister, J. A., Hoefer, C. J., Miner, M. H., & Klitzman, R. L. (2015). Detecting, preventing, and responding to “fraudsters” in internet research: Ethics and tradeoffs. Journal of Law, Medicine & Ethics, 43(1), 116–133.
Fort, K., Adda, G., & Cohen, K. B. (2011). Amazon mechanical turk: Gold mine or coal mine?. Computational Linguistics, 37(2), 413-420.
Lovett, M., Bajaba, S., Lovett, M., & Simmering, M. J. (2018). Data Quality from Crowdsourced Surveys: A Mixed Method Inquiry into Perceptions of Amazon’s Mechanical Turk Masters. Applied Psychology, 67(2), 339-366.
Curran, P. G. (2016). Methods for the detection of carelessly invalid responses in survey data. Journal of Experimental Social Psychology, 66, 4-19.