Data analytics and data science are both about finding useful (preferably actionable) insights in the data and helping with the decisions involved. In data science, this usually involves predictive models, usually in the form of data products, while the analysis involved is more in-depth and entails more sophisticated methods. But what tools do data analysts and data scientists use?
First of all, we need to examine the essential tasks that these data professionals undertake. For starters, they usually gather data from various sources and organize it into a dataset. This task is usually followed by the task of cleaning it up to some extent. Data cleaning often involves handling missing values and ensuring that the data has some structure to it afterward. Another task involves exploring the dataset and figuring out which variables are the most useful. Of course, this depends on the objective, which is why data exploration often involves some creativity. The most important task, however, is building a model and doing something useful with the data. Finally, creating interesting plots is another task that is useful throughout the data analyst/scientist's workflow.
You can do all the tasks mentioned earlier using a variety of tools. More specifically, for all data acquisition tasks, a SQL piece of software is used (e.g., PostgreSQL). This sort of software makes use of the SQL language for accessing and querying structured databases so that the most relevant data is gathered. For data science work, NoSQL databases are also often used, along with their corresponding software. As for all the data analytics tasks, a program like MS Excel is used by all data analysts, while a programming language like Python or Julia by all the data scientists. Data analysts use Tableau or some similar application for data visualization tasks, while data scientists employ a graphics library or two in the programming language they use. For all other tasks (e.g., putting things together), both kinds of data professionals use a programming language like Python/Julia.
Naturally, all of the tools mentioned previously have evolved and continue to evolve. The fundamental functionality may remain about the same, but new features and changes to the existing features are commonplace. For example, new libraries are coming about in programming languages, expanding the usefulness of the languages while creating more auxiliary tools for various data-related tasks. No matter how these tools evolve, the one thing that doesn't change is the mindset behind all the data analytics/science work. This mindset is the driving force of all such work and the ability to use whatever tools you have at your disposal to make something useful for the data at hand. The mindset is closely related to a solid understanding of the methodologies involved in data analytics/science.
For more information on this subject, particularly the mindset part, check out my book “data science mindset, methodologies, and misconceptions.” In it, I cover this topic in sufficient depth, and even if it is geared towards data science professionals, it can be useful to data analysts and other data professionals also. Cheers!
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy approach to technology, particularly related to A.I.