Recently I decided to do something a bit more experimental, which very few people have tried covering in a video. So, I tackled a more niche sub-topic of Natural Language Processing, related to custom-made features and their construction. Despite its seemingly simple nature, this skill is something that can differentiate you from a newcomer in NLP. This A.I. video assumes some knowledge of NLP but you don’t need to be a seasoned data scientist to follow. Also, I provide several examples, as well as an original taxonomy to help you organize all this information in your mind. So, check it out on Safari when you have a moment.
Note that a subscription to the Safari portal is required in order to view the video in its entirety. With the subscription you have access to a large number of books and videos, across various publishers and domains.
So, the NLP Fundamentals video I made recently is online as of today (you can find it on the Safari site). Note that since Natural Language Processing is a very broad subject, it is quite hard to do it justice in a single video. However, for someone needing a good introduction to it, this video should be fine. Enjoy!
With all the hype about A.I. lately, many people have jumped on the A.I. bandwagon without realizing that what they are producing is not always related to A.I. and that their false promises can only get them so far. That’s not to say that modern processes in data science that leverage alternative approaches to analyzing data without relying on a predefined data representation system are not A.I. Far from it. However, there is a lot of jazz about knowledge representation systems (KRS), such as those applied in Natural Language Processing (NLP), that are merely transformations of text data into a quantitative format. Calling that A.I. is like calling a sedan a 4-by-4 monster truck!
Knowledge representation is useful in many ways, as it is often a necessary component of Natural Language Understanding (NLU) and other NLP-related systems. For example, the NLTK package in Python can tag a given text with parts of speech (PoS), labeling each word with the most appropriate PoS tag. That’s useful, but it’s not exactly A.I. technology. Similar frameworks providing some kind of labeling of text data fall under the same umbrella. In fact, without someone processing their output and building some kind of model based on it, such labeling is utterly useless. It’s like the dough someone makes: without additional processing (e.g. baking), it’s probably not something you’d serve at a dinner party as-is (though many kids may be quite content eating it in this form).
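To make the "dough vs. baked bread" point a bit more concrete, here is a minimal sketch of what processing a tagger's output might look like. It assumes the labels arrive as (token, tag) pairs with Penn Treebank-style tags, which is the shape NLTK's `pos_tag` returns. The sample sentence is hand-tagged for illustration, and the function name `pos_ratio_features` is hypothetical. It is one possible way to turn labels into numbers a model could use, not a recommended feature set.

```python
from collections import Counter

# Hand-written sample (an assumption for illustration) in the (token, tag)
# shape that a PoS tagger such as NLTK's pos_tag returns.
tagged = [
    ("The", "DT"), ("quick", "JJ"), ("fox", "NN"),
    ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"),
    ("lazy", "JJ"), ("dog", "NN"),
]

def pos_ratio_features(tagged_tokens):
    """Turn PoS labels into numeric features: share of tokens per coarse class."""
    coarse = Counter()
    for _, tag in tagged_tokens:
        if tag.startswith("NN"):      # nouns: NN, NNS, NNP, NNPS
            coarse["noun"] += 1
        elif tag.startswith("VB"):    # verbs: VB, VBD, VBZ, ...
            coarse["verb"] += 1
        elif tag.startswith("JJ"):    # adjectives: JJ, JJR, JJS
            coarse["adjective"] += 1
        else:
            coarse["other"] += 1
    total = len(tagged_tokens) or 1   # avoid division by zero on empty input
    names = ("noun", "verb", "adjective", "other")
    return {name: coarse[name] / total for name in names}

features = pos_ratio_features(tagged)
print(features)
```

The labels on their own say nothing about the text as a whole. The ratios are only a first step, but they are at least something a downstream model can consume.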
People managing data-driven products, however, are not kids. They expect some kind of value from the processing of the text-based data streams (which sometimes come at a cost) and a positive ROI. It’s quite unlikely that serving them some half-baked data, produced by running a knowledge representation system on the given data, is going to make them content. Maybe they are fooled once into believing that this is A.I. at work, but it’s probably going to be a one-time thing. This is especially true if they have a data scientist on board who knows a thing or two about text analytics.
A.I. systems are automated processes that make an in-depth transformation of the data they are fed, yielding something of value at the end. They usually require a lot of sophisticated processes in the back-end, such as generating a large number of meta-features, gradually refining the original features into something that encapsulates the information in them, and then using the end result to make predictions of some kind. When it comes to text data, this could be some new text that mimics the style of the original text, or some better representation of the data using a compact feature set. All this is done through computationally heavy processes that often rely on GPUs. So, saying that a knowledge representation system that can run on an average computer, without any additional computing power, is an A.I. system is inaccurate and misleading. Best case scenario, its results will later be discovered to be interesting but practically useless. After all, A.I. systems are robust because they drill into the data in ways that no human can, and that humans usually cannot even fully comprehend.
So, if you hear someone claim that they have developed some new A.I. system that can handle raw text data, without the use of some non-parametric model, they are probably trying to sell you snake oil. This is to be expected in times when new technologies are available yet not fully understood, and charlatans trying to take advantage of that fact promote products convoluted enough to masquerade as this new tech, without actually offering any real value to the user. The answer to this situation is to better understand the field through methodical study (it doesn’t have to be too time-consuming), using reliable sources and consulting A.I. professionals and data scientists with an NLP focus. Once you are armed with this understanding, no KRS charlatans can take advantage of you, since you’ll be able to see through their lies.
Sentiment Analysis is a popular NLP topic that I've been involved in for a while now. I even wrote an article about it for a friend of mine, who is an editor at a marketing blog. Anyway, after I finally finished my latest book (Technics Publications, ETA: Fall 2017), I had some time to work on a video for Safari Books Online. This video is now online at Safari and will probably be followed by similar ones on NLP and NLU related topics. Any suggestions are welcome!
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy approach to technology, particularly related to A.I.