Everyone uses text in digital format, especially since the rise of social media. That’s why text has become one of the most commonly used resources in data science. In his latest book, Turning Text into Gold (Technics Publications), Bill Inmon, the father of data warehousing, takes a stab at this intriguing topic. Here, we’ll look at the book from a couple of different angles, rather than just giving an opinionated review like pretty much everyone else does on e-commerce sites.
Before we start, let me say for the record that I don’t know Bill Inmon personally, nor has anyone asked me to review any of his books. I just find the topics he deals with quite interesting and worth learning more about, even if they don’t directly relate to my field of expertise.
In his book, Turning Text into Gold, Bill Inmon examines various topics related to text modeling and NLP. Namely, he looks at taxonomies, ontologies, databases (briefly), text data types, text analytics, two different levels of text processing, and four different use cases of text analytics across the industry. His overall style is high-level, and the book is rich with diagrams that clarify the points he makes. Every chapter also includes a number of examples that clarify these points further. The prose itself is fairly simple, so anyone can read it, even in a busy coffee shop or while riding the bus. Make no mistake, however: the book is not targeted at novices. In fact, to make the most of this resource, you’ll need some basic understanding of text analytics; otherwise, it is bound to appear a bit abstract.
Data Modeling Perspective
From a data architect’s viewpoint, this book covers the topic very extensively. The author’s expertise in the field becomes abundantly clear from the get-go, as he explains the key concepts of text-related data structures in the kind of simple terms that only a true master of the field could manage. Without hiding behind jargon or convoluted prose, he presents the main ideas of each topic elegantly and in enough detail to make them comprehensible. It would have been great if he had added a few links or references for further investigation, however, as some topics are quite deep and may require more research from someone new to this field.
Data Science Perspective
From a data scientist’s perspective, this book is not very relevant, unless you are already an expert in NLP. The author doesn’t provide any guidance on how to implement the ideas he presents, nor does he point to any particular packages or tools for applying the frameworks he describes. So, if you are a data scientist who is new to NLP and text analytics in general, you may find this book too high-level to put into practice. Nevertheless, if you read it in conjunction with other, more hands-on books, you may find it very insightful. Also, if you are already adept in the techniques of NLP, you may find it very useful for understanding where everything fits in the bigger picture.
Just like the alchemists of our times, who aim to turn low-value data into gold, the reader can perform a similar transmutation on the text of this book. However, she may need to combine its contents with know-how from other sources for a smoother process. Nevertheless, this book is an excellent introductory resource on the field of text analytics, which has a lot to offer to data modeling and data science alike.
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy flair when it comes to technology, technique, and tests.