Everyone uses text in digital format, especially since the rise of social media. That’s why text has become one of the most commonly used resources in data science. In his latest book, Turning Text into Gold (Technics Publications), Bill Inmon, the father of data warehousing, takes a stab at this intriguing topic. Here, we’ll take a look at his book from a couple of different angles (rather than just giving an opinionated review like pretty much everyone else does on e-commerce sites).
Before we start, let me say for the record that I don’t know Bill Inmon personally, nor has anyone asked me to review any of his books. I just find the topics he deals with quite interesting and worth learning more about, even if they don’t directly relate to my field of expertise.
In Turning Text into Gold, Bill Inmon examines various topics related to text modeling and NLP. Namely, he looks at taxonomies, ontologies, databases (briefly), text data types, text analytics, two different levels of text processing, and four different use cases of text analytics across the industry. His overall style is high-level, and the book is rich with diagrams that clarify the points he makes. He also includes a number of examples in every chapter to clarify these points further. The structural complexity of the text is fairly basic, so anyone can read it, even in a busy coffee shop or while riding the bus. Make no mistake, however: the book is not targeted at novices. In fact, to make the most of this resource, you’ll need some basic understanding of text analytics; otherwise, it is bound to appear a bit abstract.
Data Modeling Perspective
From a data architect’s viewpoint, this book covers the topic very extensively. The author’s expertise in the field becomes abundantly clear from the get-go, as he explains the key concepts of text-related data structures in terms so simple that only a true master of the field could manage it. Without hiding behind jargon or complex text structures, he presents the main ideas of each topic elegantly and with enough detail to make them comprehensible. It would be great if he added a few links or references for further investigation, however, as some topics are quite deep and may require more research for someone new to this field.
Data Science Perspective
From a data scientist’s perspective, this book is not very relevant, unless you are already an expert in NLP. The author doesn’t provide any guidance on how to implement the ideas he presents, nor does he point to any particular packages or tools for applying the frameworks he describes. So, if you are a data scientist who is new to NLP and text analytics in general, you may find this book a bit too high-level. Nevertheless, if you read it in conjunction with other, more low-level books, you may find it very insightful. Also, if you are already adept in the techniques of NLP, you may find it very useful for understanding where everything fits in the bigger picture.
Just like the alchemists of our times, who aim to turn low-value data into gold, the reader can make a similar transmutation of the text of this book. However, she may need to combine its contents with know-how from other sources for a smoother process. Nevertheless, this book is an excellent introductory resource to the field of text analytics, which has a lot to offer to data modeling and data science alike.
After talking with my publisher, I got him to offer a 25% discount on my latest book to the people I connect with through this blog and through beBee. This discount applies to both the printed version and the PDF. So, if you found the price too steep, here is a promo code you can apply to get 25% off, for a limited time only: Zack25
A public domain photo refactored as a piece of digital art by a deep learning network, on my laptop
Painting is not my favorite art. Nevertheless, I do enjoy it more than most people (apart, perhaps, from people who actually practice the art), since it’s easy on the eyes and the meaning it tries to convey is far easier to grasp than that of any other art. Creating something in this art form is very time-consuming though, which is why I admire those who have the patience to make something beautiful out of their canvases and their paints. Also, it takes a special kind of intelligence to be able to create in this domain. Could it be that artificial intelligence can emulate that? The answer is yes!
Over the years, machines have been used in a variety of creative tasks, particularly music. This is obvious to those who have delved into this art, but I don’t want to go off on a tangent here. Doing something creative with A.I. in the painting domain is a whole different kind of challenge though, especially if you don’t know much about the art, like most A.I. people. Of course, everyone can do some rudimentary kind of drawing, but does that qualify as art? I doubt it, and I’m sure anyone who has delved into the fascinating history of art would agree. There is something else when it comes to making a painting, something that has eluded A.I. algorithms… up until now.
So, what is A.I.-based painting? Well, it is digital for starters. It’s not as if the A.I. picks up a palette and a brush and starts coloring a canvas (although I wouldn’t be surprised if there were robots out there equipped with such an A.I. doing just that). Most A.I. systems that can paint do so with a deep learning network that has been trained in a particular style of painting. As an input, such a system usually takes a digital image, which is the equivalent of the idea or subject that usually fuels such creative endeavors in human beings. What the A.I. does after that is create a new image that makes use of the primary features of the original image (the quintessence of the subject, if you will). These features, which correspond to a particular color palette, shapes, the locations of these shapes, and other relevant information, are then processed by the deep learning network the system employs. The output of that network is then mapped into a form akin to the original image or corresponding to a set of specifications regarding the resolution of the art piece. The final result, naturally, is an image of that resolution. Of course, the A.I. doesn’t have a clue about what it is doing, but given enough training data for its deep learning network, it can perform the task quite creatively.
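For those who prefer to see the flow in code, here is a toy sketch in Python of the steps described above: take an image, pull out coarse features (colors, shapes, and their locations), pass them through a "style" step, and render the result at a requested resolution. Keep in mind that this is only an illustration under heavy assumptions. The style step is a trivial nearest-color lookup standing in for a trained deep learning network, and all names here (STYLE_PALETTE, extract_features, stylize, render, paint) are hypothetical and not taken from any real painting system.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]   # (red, green, blue), each 0-255
Image = List[List[Color]]      # rows of pixels

# Hypothetical "style": a tiny palette standing in for what a trained
# deep learning network would have learned from paintings in one style.
STYLE_PALETTE: List[Color] = [(20, 30, 80), (230, 200, 60), (250, 250, 240)]


def extract_features(image: Image, cell: int) -> Image:
    """Average the colors of each cell x cell block of the image.

    The result is a coarse grid that keeps the dominant colors and
    roughly where they sit in the picture.
    """
    height, width = len(image), len(image[0])
    features: Image = []
    for top in range(0, height, cell):
        row: List[Color] = []
        for left in range(0, width, cell):
            block = [image[y][x]
                     for y in range(top, min(top + cell, height))
                     for x in range(left, min(left + cell, width))]
            n = len(block)
            row.append(tuple(sum(p[c] for p in block) // n for c in range(3)))
        features.append(row)
    return features


def stylize(features: Image, palette: List[Color]) -> Image:
    """Stand-in for the trained network: snap every feature color to the
    closest palette color (squared distance in RGB space)."""
    def nearest(color: Color) -> Color:
        return min(palette,
                   key=lambda p: sum((a - b) ** 2 for a, b in zip(color, p)))
    return [[nearest(color) for color in row] for row in features]


def render(features: Image, out_height: int, out_width: int) -> Image:
    """Map the stylized grid to the requested resolution using
    nearest-neighbor upsampling."""
    f_height, f_width = len(features), len(features[0])
    return [[features[y * f_height // out_height][x * f_width // out_width]
             for x in range(out_width)]
            for y in range(out_height)]


def paint(image: Image, cell: int, out_height: int, out_width: int,
          palette: List[Color] = STYLE_PALETTE) -> Image:
    """Run the whole toy pipeline: features -> style step -> output image."""
    return render(stylize(extract_features(image, cell), palette),
                  out_height, out_width)
```

Calling paint(image, cell=2, out_height=6, out_width=6) on a small 4x4 image returns a 6x6 grid of palette colors. A real system would replace stylize with a network trained on actual paintings, which is where all the heavy computation (and all the creativity) comes from.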
Although most such synthetic artistic products are very interesting, not all of them are particularly pleasing or even worth the wait of the whole creation process (which is non-trivial when undertaken by a single computer, even if the deep learning networks are already trained). So, if you are an artist committed to this particular art form, you shouldn’t worry about your work being outsourced to machines any time soon! Whatever the case, applications like this are far more meaningful than other, less thoughtful uses of computational resources for A.I. purposes. This, however, is probably the topic of a future post on the subject…
Julia was a topic of controversy in the previous year, a year that was critical for the language’s future, at least in the data science domain. At the beginning of that year, while working at a small-to-medium company as a data science contractor, I remember making the argument that Julia is ready for data science and that we should give it a shot. Both the people of that company and the people of a vendor company (a local data science start-up that was acquired by Apple later that year) were very skeptical about this. Claims that “Julia is not data science ready,” which floated all over the web, seemed to echo in our conversations as well.
Later that year I focused on my book on the language and its applications in data science, a book I had started writing the previous Fall. At that point no one else in the data science community seemed to care about Julia, and the big players in the corporate world that had a say about data science (e.g. Amazon, Microsoft, etc.) didn’t even seem to take notice of this promising technology. Still, I knew that the merits of this language would one day surface in people’s minds as well as on the web. So, I finished the book, got it published, and gave a couple of talks on the language. Even though it was the first book ever to have been written on this topic (focusing on the data science applications of Julia), it was soon followed by another one from another publisher, bearing the same title! Also, a few days before I gave my first talks on the subject, Julia entered the top 50 languages in the TIOBE index for that month (blog article from Julia Computing). Clearly, the claim that Julia was not data science ready had started to seem like an opinion of the less informed.
It was that Fall, about a year after I’d started working feverishly on my Julia book, that Amazon took a very bold step, which I consider to be the tipping point. Julia started to rise in the eyes of the corporate world as Amazon adopted the MXNet deep learning framework, which included Julia as one of the languages that it supported (MXNet article on my blog). The researchers involved in this project even published a scientific article about it, in collaboration with the University of Washington, a very prestigious academic institution that was one of the first to popularize data science education through its corresponding programs.
After that point, Julia was officially a fully cloud-supported technology. Microsoft soon joined the game by adopting it in the Azure framework (blog article by a Julia user in Denmark). Even Google decided to support Julia in its TensorFlow deep learning system, which up until then was Python-exclusive. It seems that the use of Julia in data science is not a fad after all!
Yet, there are still people claiming that Julia is not a data science language and that language X is the way to go because most people have been using X in the past few years. Perhaps they are right, at least subjectively. Some companies are so conservative that they will probably die before admitting that the technology they are using is not the best out there. However, instead of paying attention to them, you can do your own research on the topic and form your own view on the matter. That’s what I did and I never regretted it!
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy flair when it comes to technology, technique, and tests.