It may seem that we are getting off-track here but this is highly relevant to any data scientist, particularly those on the data engineering path. Yet, as this is an overloaded term, let’s first clarify what we mean when we say data modeling as a field.
In a nutshell, data modeling is the field that deals with the design and implementation of databases, and any organization of data flows in an environment. It entails a combination of design elements such as UML diagrams, and some analytical aspects, such as code for creating and querying databases, based on certain specialized diagrams called database schemas (the image used above is one such schema, though in practice they tend to be more detailed). Data modeling professionals also deal with the cloud since many databases these days live there. Also, some data modeling experts work directly with the business and help the stakeholders of a project optimize the flow of information in the various departments of their organization, or build pipelines to better process the data at hand.
Data modelers come in different shapes and forms. From the more business-oriented ones to the more hands-on ones (e.g. DBAs), they cover a wide spectrum of roles. This is akin to the data scientists, who also are quite specialized these days. However, data modelers have been around longer so their roles are more established and more acknowledged in the business world. After all, databases have been around since the early days of computing, even if only recently have they evolved enough to be an important component in modern technologies such as big data governance and cloud computing. Also, note that most data modelers these days are involved in NoSQL, even if they are proficient in SQL-based languages. The reason is that most data today is semi-structured, something that NoSQL databases are designed for. Of course, structured data remains but usually, it's not as much nor as easy to produce.
Hopefully by now the link between the data modeling field and the data science one has started to become clear. After all, they are both data-oriented fields. The common link is databases since that's the core product of data modelers and the starting point of most data science projects. Without databases, we don't have much to work with so it's not uncommon to work with data modelers, particularly in the initial stages of a data science project. Also, data modelers have an interest in analytics so it's not uncommon for them to dabble with predictive models, e.g. in a proof-of-concept project. What's more, data modeling conferences can be a valuable educational resource for data scientists as it enables us to view parts of an organization that aren't always evident in a data science conference, where the focus is more technical in general.
Data modeling is particularly relevant if not essential to data engineers, those data scientists who specialize in the initial stages of a data science project. This involves a lot of ETL work as well as querying and augmenting databases. So, data engineers need to have a more concrete understanding of data modeling, even if it is on the more hands-on part of the field. After all, anyone can do some basic querying or table-creating, but to build an efficient and scalable database it takes much more than that.
Fortunately, nowadays it's easier than ever to learn more about data modeling. Also, you can do that without spending too much time, since the material on the field is well organized and in abundance. The fact that it's not a "sexy profession" like that of the data scientist, makes it less prone to hype and halfwits taking advantage of it through low-quality material. What's more, some publishers specialize in data modeling, such as Technics Publications. Finally, using the promo code DSML you can get a 20% discount on all the books and any webinars the publishing house offers.
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy approach to technology, particularly related to A.I.