A/B testing plays a crucial role in traditional science as well as data science. It's hard to imagine a well-designed scientific experiment that doesn't compare two groups in some way, and the technique features heavily in data analytics too. In this article, we'll explore this essential method of data analysis, focusing on its role in scientific work and data science.

In a nutshell, A/B testing uses data analysis to determine whether two samples are significantly different from each other with respect to a given variable. That variable is usually continuous, though it can be nominal too. The two samples often come from partitioning a dataset on another variable, which is binary (e.g., control group versus treatment group).

A/B testing is closely linked to Statistics, although in principle any heuristic could be used to evaluate the difference between the two samples. Still, since Statistics yields a measurable and easy-to-interpret result in the form of a probability (the p-value), particular statistical tests are usually the tool of choice.

A/B testing is used heavily in scientific work. The reason is simple: an analyst typically considers several hypotheses, and A/B testing is often the best way to test many of them. After all, the methodology is closely tied to forming a hypothesis and testing it against the data at hand. Naturally, A/B testing is also useful in data science and data analytics, particularly during the data exploration stage.

The statistical tests used for A/B testing are t-tests, chi-square tests, and, to a lesser extent, z-tests. The t-test handles cases where a continuous variable is involved (e.g., Sales), while the chi-square test is geared towards nominal variables. Z-tests are very much like t-tests, but they make stronger assumptions about the data (typically that the population variance is known or that the samples are large), so they apply in fewer situations.
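To make the two main tests a bit more concrete, here is a minimal sketch in Python that uses only the standard library. Everything in it is an assumption made for illustration: the Sales figures and the purchase counts are made up, and the function names (`t_pdf`, `two_sided_p`, `welch_t_test`, `chi_square_2x2`) are hypothetical. The t-test shown is Welch's variant, which does not assume equal variances, with the p-value obtained by numerically integrating the t density. The chi-square test is limited to a 2x2 table and applies no continuity correction. In practice you would more likely use a statistics library, such as SciPy's `scipy.stats.ttest_ind` and `scipy.stats.chi2_contingency`.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even).

    Accurate enough for a sketch; very small p-values lose precision.
    """
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # P(-|t| < T < |t|)
    return max(0.0, 1.0 - central_mass)


def welch_t_test(a, b):
    """Welch's t-test for two samples of a continuous variable."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t_stat = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t_stat, df, two_sided_p(t_stat, df)


def chi_square_2x2(table):
    """Pearson chi-square test on a 2x2 table of counts (1 degree of freedom)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper tail equals erfc(sqrt(chi2 / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical dataset: (binary group label, continuous Sales value)
records = [
    ("A", 102.0), ("A", 98.5), ("A", 110.2), ("A", 95.0), ("A", 104.3), ("A", 99.1),
    ("B", 108.4), ("B", 115.0), ("B", 103.7), ("B", 112.9), ("B", 109.6), ("B", 117.2),
]
# Partition the data on the binary variable to get the two samples
sales_a = [value for group, value in records if group == "A"]
sales_b = [value for group, value in records if group == "B"]

t_stat, df, p_t = welch_t_test(sales_a, sales_b)
print(f"t-test: t={t_stat:.3f}, df={df:.1f}, p-value={p_t:.4f}")

# Hypothetical counts: rows are groups A and B, columns are purchased yes/no
chi2, p_chi = chi_square_2x2([[30, 70], [45, 55]])
print(f"chi-square: chi2={chi2:.3f}, p-value={p_chi:.4f}")
```

Each function returns a p-value, which is the number you carry into the significance check.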
Each of these tests yields a p-value, which is compared to a predefined threshold (alpha), commonly 0.05, 0.01, or 0.001. The lower the p-value, the stronger the evidence against the Null Hypothesis, which states that any differences between the two samples are due to chance. A p-value lower than alpha means you can reject the Null Hypothesis at that significance level. It does not prove the Null Hypothesis false; it only says that a difference this large would be unlikely if chance alone were at work.

Note that A/B testing is a deep topic, and it's hard to do it justice in a blog article. It also takes a lot of practice to understand it thoroughly. So, if it sounds a bit abstract, that's normal, especially if you are new to Statistics. Cheers!
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy approach to technology, particularly related to A.I.
March 2021
