It may seem that we are getting off-track here, but this is highly relevant to any data scientist, particularly those on the data engineering path. Still, as "data modeling" is an overloaded term, let's first clarify what we mean by it as a field.
In a nutshell, data modeling is the field that deals with the design and implementation of databases, and with how data flows are organized in an environment. It entails a combination of design elements, such as UML diagrams, and analytical aspects, such as code for creating and querying databases based on specialized diagrams called database schemas (the image above is one such schema, though in practice they tend to be more detailed). Data modeling professionals also deal with the cloud, since many databases these days live there. Some data modeling experts also work directly with the business, helping a project's stakeholders optimize the flow of information across the departments of their organization, or building pipelines to better process the data at hand.
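To make the idea of turning a schema diagram into working code concrete, here is a minimal sketch using Python's built-in sqlite3 module. The two-table customers/orders schema is hypothetical, chosen only for illustration; a real data modeler's schema would be far more detailed.

```python
import sqlite3

# Hypothetical two-table schema (customers, orders), a minimal
# illustration of turning a schema diagram into DDL and queries.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    )
""")
conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, 25.0), (11, 1, 40.0)])

# A query joining the two tables, as the schema's relationships intend.
row = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers c JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.name
""").fetchone()
print(row)  # ('Alice', 65.0)
```

The foreign-key constraint is the code-level counterpart of the relationship line you would see between two entities on a schema diagram.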
Data modelers come in different shapes and forms. From the more business-oriented ones to the more hands-on ones (e.g. DBAs), they cover a wide spectrum of roles. This is akin to data scientists, who are also quite specialized these days. However, data modelers have been around longer, so their roles are more established and better acknowledged in the business world. After all, databases have been around since the early days of computing, even if only recently have they evolved enough to become an important component in modern technologies such as big data governance and cloud computing. Also, note that most data modelers these days are involved in NoSQL, even if they are proficient in SQL-based languages. The reason is that most data today is semi-structured, something that NoSQL databases are designed for. Of course, structured data still exists, but there is usually less of it, and it is harder to produce.
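To see why semi-structured data suits NoSQL systems, consider this small sketch with made-up event records. Each record can carry different fields, which is awkward for a fixed relational table but natural for a document-oriented store.

```python
import json

# Hypothetical semi-structured records: each one can carry different
# fields, so they don't fit a single fixed relational schema cleanly.
events = [
    {"user": "alice", "action": "login"},
    {"user": "bob", "action": "purchase",
     "items": ["book", "pen"], "discount": 0.2},
]

# A relational table would need NULL-heavy columns (or extra tables)
# for 'items' and 'discount'; a document store keeps each record as-is.
serialized = [json.dumps(e) for e in events]
print(len(serialized))  # 2
```

A relational design could still model this, but only by normalizing the variable fields into separate tables, which is exactly the overhead document databases avoid.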
Hopefully, by now the link between the data modeling field and the data science one has started to become clear. After all, they are both data-oriented fields. The common link is databases, since they are the core product of data modelers and the starting point of most data science projects. Without databases, we don't have much to work with, so it's not uncommon to collaborate with data modelers, particularly in the initial stages of a data science project. Also, data modelers have an interest in analytics, so it's not uncommon for them to dabble in predictive models, e.g. in a proof-of-concept project. What's more, data modeling conferences can be a valuable educational resource for data scientists, as they let us see parts of an organization that aren't always evident at a data science conference, where the focus is generally more technical.
Data modeling is particularly relevant, if not essential, to data engineers, those data scientists who specialize in the initial stages of a data science project. This involves a lot of ETL work, as well as querying and augmenting databases. So, data engineers need a more concrete understanding of data modeling, even if it is focused on the more hands-on side of the field. After all, anyone can do some basic querying or table creation, but building an efficient and scalable database takes much more than that.
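As a rough illustration of the kind of ETL work mentioned above, here is a toy pipeline sketch: extract some raw records, transform them (cleaning and type-casting), and load them into a target table. The data and table names are assumptions for the example, not a production pipeline.

```python
import sqlite3

# Extract: raw, messy records (assumed toy data).
raw_records = [
    {"user": "alice", "spend": "19.99"},
    {"user": "bob", "spend": None},          # missing value to drop
    {"user": "carol", "spend": "5.50"},
]

# Transform: drop incomplete rows, cast types, normalize names.
clean = [(r["user"].title(), float(r["spend"]))
         for r in raw_records if r["spend"] is not None]

# Load: write the cleaned rows into a target table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spend (user TEXT, amount REAL)")
conn.executemany("INSERT INTO spend VALUES (?, ?)", clean)

total = conn.execute("SELECT SUM(amount) FROM spend").fetchone()[0]
```

Even this trivial example shows the three concerns a data engineer juggles: source format, data quality, and the target schema.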
Fortunately, nowadays it's easier than ever to learn more about data modeling. You can also do so without spending too much time, since the material on the field is abundant and well organized. The fact that it's not a "sexy profession" like that of the data scientist makes it less prone to hype and to halfwits taking advantage of it through low-quality material. What's more, some publishers specialize in data modeling, such as Technics Publications. Finally, using the promo code DSML, you can get a 20% discount on all the books and any webinars the publishing house offers.
Throughout our careers in data science and AI, we constantly encounter all sorts of obstacles that hinder our development. This is something inevitable, particularly when we undertake a role that's constantly evolving. However, the biggest obstacle is not something external, as one might think, but something closer to home. On the bright side, this means that it’s more within our control than anything subject to external circumstances. Let’s clarify.
The biggest obstacle is related to the limits of our aptitude, something primarily linked to our knowledge and know-how. After all, no one knows all there is to know on a subject as broad as data science (or AI). However, as we gather enough knowledge to do what we are asked to, we are overtaken by the idea that we know enough. Eventually, this can morph into a conviction and even expand, leading us to cultivate the illusion that we know everything there is to know in our field. Naturally, nothing could be further from the truth, since even a unicorn data scientist has gaps in her knowledge.
One great way to avoid this obstacle is to constantly challenge yourself in anything related to our field. I'm not talking about Kaggle competitions and other trivial things like that; after all, these are hardly realistic as data science challenges. I'm referring to challenges involving techniques and methods that you lack, as well as refining those you already have under your belt. This may seem simple, but it's not, especially since no one enjoys becoming aware of the things they don't know or don't know fully. Perhaps that's why developing ourselves isn't something easy or popular.
Another way to enhance ourselves is by reading technical books related to our field. Of course, not all such books are worth your while, but if you know where to look, finding the good ones isn't as challenging a task. What's more, it's good to remember that the value of such a book also depends on how you process the new information. For example, many such books include exercises and problems for the reader to solve. By taking advantage of such opportunities, you can learn the new material better and develop a deeper understanding of the topics presented.
One way to learn more is through Technics Publications books. Although many of the books from that publishing house are related to data modeling, there are a few data science-related ones, as well as a couple on AI. Of course, even the data modeling books can be useful to a data scientist, since we often need to deal with databases, particularly in the initial stages of a project. Also, if you buy a book from this publisher using the coupon code DSML, you can get a 20% discount. The same applies to any webinars you may register for. So, if the cost of this material is an obstacle for you, at least with this code you can alleviate it and get a bigger bang for your buck!
Normally I don't do book reviews on this blog, but for this one, I thought I'd make an exception. After all, it's not every day I encounter a book that tackles a topic like Logic head-on, without getting all abstract and theoretical. This book not only manages to remain practical but also gives a good overview of the topic of logic, something that every data professional can benefit from. Note that this book is on the subject of data modeling, which, although related to data science, is its own field, concerned with databases and the design of such systems.
First of all, the book provides an excellent introduction to Logic, without getting too mathy about the topic. When I was looking into Ph.D. topics, I briefly considered doing my research on this subject. However, I quickly dismissed it because it was too abstract and theoretical. This book addresses that point and presents the subject in a very practical way, making it relatable and interesting. It manages this by drawing a connection between Logic and databases, with plenty of examples. This enables the reader to maintain a practical viewpoint across the different topics covered in the book and to view logic as a useful tool.
What's more, the author does a pretty good review of other books on the subject, with a robust critique of their strengths and weaknesses. In a way, it feels like reading a bunch of books and getting the gist of their approaches, without having to go through their text. It is evident that the author knows the subject in great depth, something he exhibits through his approach to the subject, which is also quite distinct. For example, he provides a great analysis of topics that weren't covered properly elsewhere, such as integrity.
Also, the author provides lots of references for each topic at the end of each chapter, making the whole book feel a bit academic in that sense, but without the rigid style that characterizes such books. However, for someone who wishes to explore the various topics further, this list of relevant resources at the end of each chapter can be quite handy.
Moreover, the book is fairly easy to understand, even for non-experts in data modeling or logic. This is important, since it's not common to find a technical book that's accessible to non-experts in the topic. This book, however, seems to have a very broad audience, including people who know very little about the subject.
Finally, there are lots of definitions of key concepts and a scientific approach to the subject overall. This is also not very common, since not all technical books are written by scientists. Many people nowadays write a book based solely on their experience and empirical knowledge of a subject. This book, however, was written in a scientific manner, even if it doesn't have the typical academic style.
So, if you are interested in buying this book, you can do so directly from the publisher. Also, if you use the coupon code DSML, you can get a 20% discount, making this purchase a bargain. Note that this code applies to other books available at the Technics Publications site, including some of the webinars.
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy approach to technology, particularly related to A.I.