Robotic Process Automation, or RPA for short, is a methodology involving the automation of specific data-related processes through specialized scripts. RPA usually applies to low-level monotonous, and easy-to automate tasks, though lately, it has escalated into other tasks that are more high-level. RPA is quite popular for saving money, but it has its perils when it is in excess. Whatever the case, it dramatically impacts our field, so it's essential to know about it. This article is all about that.
Let's start by looking at the usefulness of RPA. RPA is useful in the extract, transform, and load (ETL) operations of all kinds. This kind of work involves moving the data from A to B, making changes to it, and making it available to other project stakeholders. Even in data science, there is a lot of ETL work involved, and often the data scientist is expected to undertake it. Naturally, RPA in ETL is very useful as it saves us time, time which we can spend on more challenging tasks, such as picking, training, and refining a model. Also, many data engineering tasks require a lot of time, so if they can be automated to some extent through RPA, it can make the whole project more efficient.
The danger of RPA when in excess becomes apparent when we try to automate the whole data science pipeline or even just individual parts of it. Take data engineering, for example. If we were to automate the whole thing, we'd be left with no say in what variables are worth looking into and what the data has to tell us. All initiatives from the data scientist would be gone, and the whole project would be mechanical and even meaningless. Some insights might still come about, but the project wouldn't be as powerful. The same goes for other parts of the data science workflow.
RPA is used in specialized frameworks for data modeling, the part of the pipeline that follows after data engineering. Systems like AutoML, for example, attempt to apply RPA in data modeling through various machine learning models. Although this may have some advantages over traditional approaches, in the long run, it may rob data scientists of their expertise and the personal touch of the models developed. After all, if everything becomes automated in data science, what's the point of having a human being at that role? Maybe certain occupations can be outsourced to machines, but it's not clear how outsourcing even the more high-level work to them will benefit the whole.
The good news is that even though RPA can undertake many aspects of data science work, it cannot replicate the data science mindset. This frame of mind is in charge of how we work the data and why. It's what makes our work worth paying for and what brings about real value from it. You can learn more about the data science mindset and other aspects of data science work from my book Data Science Mindset, Methodologies, and Misconceptions. Feel free to check it out when you have some time. Cheers!
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy approach to technology, particularly related to A.I.