Why the Role of A.I. in the Job Market Is Very Much a Business Decision Technical Professionals Can Contribute to
Lately there is a lot of talk about AIs potentially taking people’s jobs in the future and how this is either catastrophic, or some kind of utopia (or, less often, some other stance in between). Although we as data science and A.I. professionals have little to do with the high-level decisions that have some influence on this future, perhaps we are not so detached from the reality of the situation. I’m not talking about the A.I. choir that is happy to recite its fantasies about an A.I.-based future that is akin to the sci-fi films that monetize this idea. I’m talking about grounded professionals who have some experience in the development of A.I. systems, be it for data science or other fields of application.
The problem with business decisions is that they are by their nature related to quite complex problems. As such, it is practically impossible to solve them in a clear-cut manner that doesn’t invite reactions, or at least some debate. That’s why those individuals who have the courage to make these decisions are paid so handsomely. It’s not the time they put in, but the responsibility they undertake, that makes their role of value. However, it is important to make these decisions as future-proof as possible, something that these individuals may not be able to do on their own. That’s why they have advisors and consultants, after all. Besides, even if some of the decision-makers are technical and can understand the A.I. matters, they may lack the granularity of comprehension that an A.I. professional has.
People who make business decisions often see A.I. as a valuable resource that can help their organization in many ways (particularly by cutting down on some costs, via automation or increased efficiency in time-consuming or expensive processes). However, they may not always see the implications of these moves and the shortcomings of this still immature technology. A.I. systems are not objective, nor immune to errors. After all, most of them are black boxes, so whatever processes they have in place for their outputs are usually beyond our reach, and oftentimes beyond our comprehension. Just like it is impossible to be sure what processes drive our decisions based on our brain patterns, it is perhaps equally challenging to pinpoint how exactly the decisions of an A.I. are forged. That’s something that is probably not properly communicated to the decision-makers on A.I. matters, along with the fact that AIs cannot undertake responsibility for these decisions, no matter how sophisticated these marvels of computing are.
Perhaps some more education and investigation into the nature of A.I. and its limitations is essential for everyone who has a say in this matter. It would be irresponsible to expect one set of people to navigate through this on their own and then blame them if their decisions are not good enough or able to withstand the test of time. This is a matter that concerns us all and as such we all need to think about it and find ways to contribute to the corresponding decisions. A.I. can be a great technology and integrate well in the job market, if we approach it responsibly and with views based on facts rather than wishful thinking.
A.I. is great, especially when applied to data science. Many people lately are quite concerned about the various dangers it may entail. This naturally polarizes people, splitting views of the topic into two main groups: the ones neglecting these concerns and those mirroring a fear that the end of the world is upon us. Probably the truth lies somewhere in-between, but given the lack of evidence, any speculation on the matter may be premature and likely to be inaccurate.
In this post I’d like to focus on another danger that many people don’t think much about, or don’t see as a danger at all: the sense of complacency that may arise from a super-automated world. Of course, complacency is a human condition and has little to do with A.I., but someone may consider A.I. to blame for this condition. After all, super-automation may be possible only through this new technology becoming widespread.
This danger, which can find its way to data science too if left unchecked, is a real one. However, it is neither singular nor catastrophic. After all, every large-scale technological innovation has brought about social changes that have triggered this condition to some extent. This does not mean that we should go back to the stone age, however. After all, technology is largely neutral and the people who make it available to the world have the best intentions in mind. So, it seems that blaming a new tech for this matter may be a bit irresponsible.
Yet, the advent of technology can be a good thing if dealt with in a mature manner. Just like you can own a car and still make time for physical exercise, you can have access to an A.I. and still be a creative and productive person. It’s all a matter of power, at the end of the day. If we give away our power, our ability to choose and to shape our lives, then we are left powerless victims of whoever has taken hold of that power. In the case of A.I., if we cherish automation so much that we outsource every task to it, then we are willingly creating our own peril. So, if we choose to maintain a presence in all processes where A.I. is involved, the latter is not going to be a threat, not a considerable one anyway.
There is no doubt that A.I. can be dangerous, much like every other technological advancement. However, it seems that the crux of the problem lies within us, rather than in the machines that incarnate this technology. If we give in to a sense of complacency and allow the AIs to have a gradually more active part in our society, then maybe this tech will create more problems than the ones it’ll solve. However, if we deal with this new technological advent maturely, we can still benefit from it without making ourselves obsolete or irrelevant in the process.
When people think about the benefits of A.I. and its impact on our world, they usually think of self-driving cars, advanced automations, deep learning systems, clever chatbots, etc. Those particularly infatuated with the idea of A.I. tend to go even further and fantasize about super-intelligent machines that will magically solve all our problems without any effort from us (pretty much like a deus ex machina figure in some ancient theater play). However, the more pragmatic A.I. thinkers focus more on particular applications of A.I. that can be implemented fairly easily, and that target specific issues that would be impractical to solve in conventional ways. One such case is that of detecting how contaminated beehives are by a particular parasite.
Why should we care about this matter? Don’t we have larger problems to deal with? Perhaps. After all, there are more evident problems out there that require unconventional ways of tackling them, problems that could benefit a lot from a narrow A.I. designed for them. However, the issue of infested beehives is not a minor one, as it represents a real danger for the whole species of these buzzing insects. It’s worth noting that bees are not useful for just the honey they produce; they are key in plant pollination, and as such they play an important role in our planet’s fragile ecosystem, which has been on the wane lately. So, it may be a big deal after all.
Developing an A.I. to tackle the beehive infestation problem is a project whose cost is disproportionately small compared to its impact, as it is fairly manageable with the existing technology, at least for a particular parasite called the Varroa mite. These organisms can cause serious issues to the bees, issues that are observable with the naked eye. However, assessing the infestation may not be so straightforward, making it difficult to take intelligent action against it (e.g. how can you tell which beehives are in imminent danger and prioritize accordingly?). That’s where Computer Vision comes in handy, an automated way for a computer system to evaluate what a camera attached to it observes. The images from the camera feed, when coupled with some deep learning network, can help measure the magnitude of the issue in a very small amount of time (check out a demo of an app by TopLab that does just that). Will this be enough? Possibly, if this process is coupled with an effort to eliminate the parasites once identified. However, knowing about the infestation issue in an objective and practical manner can definitely speed things up.
Perhaps A.I. is not as futuristic as it is often perceived, nor as high-level as it comes across. After all, just like any other applied science, it aims to solve real-world problems right here and now, in an efficient and effective manner. The question is, are we willing to apply it to more strategic problems, like the case of an impaired ecosystem, or are we going to use it only to make our urban lives more convenient? Hopefully that’s a question we can answer with just our natural intelligence...
This post is inspired by Joel Grus’s latest blog post on an interview of his and his showcasing of his TensorFlow know-how during it. Now, this interview was probably imaginary since I doubt anyone would be that foolish in an interview, but he is not afraid to make fun of himself to get a point across, something that is evident in most of his writings (including his book, Data Science from Scratch). I have no intention to promote him, but I find that his whole approach to data science is very fox-like, so it is only natural that he is mentioned in my blog!
In this dialectic post of his, Joel describes an interviewee’s efforts to come across as knowledgeable and technically adept, as he tries to solve the Fizz Buzz problem he is asked to whiteboard, using this Deep Learning package, along with Python. Of course, why someone would waste time asking him to solve such a simple problem is incomprehensible to me, but perhaps it’s for an entry level position or something. Still, if this were a data science position, the Fizz Buzz problem would be highly inappropriate as it has nothing to do with programming that’s relevant to data science. Joel goes on in his blog post to describe the great lengths he has to go to in order to get a basic neural network trained and deployed so that he can solve the problem. Yet even though he does nothing wrong (technically), his approach fails to yield the desired output and he fails the interview. That’s not to say that he or his tools are bad, but it clearly illustrates the point he’s trying to make: advanced techniques don’t make one a good data scientist!
This is an issue with many data scientists today who have gotten intoxicated with the latest and greatest A.I. tech that’s found its way into data science. The tech itself is great and the tools it has been implemented with are also great. However, just because you can use them, it doesn’t make you a good data scientist. So what gives? Well, even though Deep Learning is a great framework for tackling tough data science problems, it fails miserably in the simpler ones, which are also quite common. Perhaps it’s the lack of data points, the fact that it takes a while to configure properly, or some other reason that depends on the problem at hand. Whatever the case, as data scientists we ought to be pragmatic and hands-on. Just because we know an advanced Machine Learning technique, it doesn’t mean that we should use it to solve all of the problems we are asked to solve. Sometimes we just need to come up with some simple heuristic and work with that.
There is an old saying that illustrates the issue that Joel describes in that post: killing a mosquito with a cannon. Yes, you may actually succeed in killing the poor insect with your fancy artillery weapon, but is that really cost-effective? Nowadays many data scientists go with the Deep Learning option because someone convinced them that it’s the best option out there in general, without sitting down for a minute and thinking if it’s the best option for the particular problem they are facing. Data science is not as simple and straightforward an approach to problem-solving as some people make it out to be. So let’s get real for a minute and tackle problems like engineers, opting for a simple solution that works, before calling in the A.I. cavalry to help us. Being super adept may be appealing, but we first need to be adept at what we do by employing a down-to-earth approach that just works, before opting for improvements through more advanced models.
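For reference, here is what the "simple solution that works" could look like for Fizz Buzz itself. This is only a minimal sketch in plain Python, assuming the usual rules of the exercise (multiples of 3 become "Fizz", multiples of 5 become "Buzz", multiples of both become "FizzBuzz"); the function name fizz_buzz is my own choice and not something taken from Joel's post.

```python
def fizz_buzz(n: int) -> str:
    """Return the Fizz Buzz label for a positive integer n (assumed rules)."""
    if n % 15 == 0:          # divisible by both 3 and 5
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)            # every other number is printed as-is


if __name__ == "__main__":
    # Print the first fifteen labels on one line.
    print(" ".join(fizz_buzz(i) for i in range(1, 16)))
```

No training data, no network architecture, no hyperparameters. Four conditions cover every case, which is the whole point of reaching for the heuristic first.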
We hear a lot about deep learning (DL) lately, mainly through social media. All kinds of professionals, especially those involved in data science, never get tired of praising it, with claims ranging from “it’s greatly enhancing the way we perform predictive analytics” to “it’s the next best thing since sliced bread or baked bread for that matter!” What few people tell us is that most of these guys (they are mainly male) have vested interests in DL, so we may want to take these claims with a pinch of salt!
Don’t get me wrong though; I do value DL and other A.I. methods for machine learning (ML). However, we need to be able to distinguish between the marketing spiel and the facts. The former is for people poised to promote DL at all costs (for their own interests), while the latter is for engineers and other down-to-earth people who prefer to form their own opinions on the matter, rather than get all infatuated with this tech like some mindless technically inept fanboy.
Deep Learning involves the training and application of large ANNs to predictive analytics problems. It requires a lot of data and it promises to provide a more robust generalization based on that data, definitely better than the already obsolete statistical models, whose performance in most big data problems leaves a lot to be desired. Still, it is not clear whether DL can tackle all kinds of problems. For example, it is quite challenging to acquire the amount of data that is needed in order to solve fraud detection or other anomaly detection problems. When it comes to classifying images, however, the data available is more than adequate to train a DL network and let it do its magic. In addition, if we are interested in finding out why data point X is predicted to be of value Y (i.e. which features of X contribute the most to this prediction), we may find that DL isn’t that helpful because of the black box problem that it inherently has, just like all other ANN-based models. If, however, all we care about is getting this prediction and getting it fast, a DL network is sufficient, especially if we train it offline before we deploy it on the cloud (or on a physical computer cluster, if you are more old-fashioned).
Deep Learning can be of benefit to data science as it is a powerful tool. However, it’s not the tool that is going to make all other tools obsolete. As long as there are other parts in the pipeline beyond the data engineering and data modeling ones (e.g. data visualization, communicating the results, understanding the business questions, formulating hypotheses, among others), getting a DL system to replace data scientists is a viable option only in sci-fi movies. People who fantasize about the potential of DL in data science, imagining it to be the panacea that will enable companies to replace data scientists, probably don’t understand how data science works and/or how the business world works. For example, someone has to be held accountable for the predictions involved and that person will have to explain them, in comprehensible terms, to both her manager and the other stakeholders of the data science project. Clearly, no matter how sophisticated DL systems are, they are unable to undertake these tasks. As for hiring some technically brilliant idiot to operate these systems and be a make-believe data scientist, with the salary of an average IT professional, well that’s definitely an option, but not one that any sane person would be likely to recommend to an organization, given that she wants to keep that organization as a client. If such a decision is to be made, it is most likely going to come from some person who cares more about pleasing his supervisor by telling her what she wants to hear, than about saying something that is bound to stand the test of time.
All in all, DL is a great tool, but we need to be realistic about its benefits. Just like any other innovative technology, it has a lot of potential, but it’s not going to solve all our problems and it’s definitely not going to replace data scientists in the foreseeable future. It can make existing data scientists more productive though, especially if they are familiar with A.I. and have some experience with using ANNs in predictive analytics. If we keep all that in mind and manage our expectations accordingly, we are bound to benefit from this promising technology and use it in tandem with other ML methods, making data science not only more efficient but also richer and even more interesting than it already is.
It is often the case that we treat a new A.I. as a child that we need to teach and pay close attention to, in order for it to evolve into a mature and responsible entity. However, a fox-like approach to this matter would be to turn things around and see how we, as human beings, can learn from an A.I., particularly of a more advanced level.
Of course A.I. is still in a very rudimentary stage of its evolution so it doesn’t have that much to teach us that we can’t learn from another human being. However, that wise human who would be a great mentor is likely to be tied up by his everyday commitments, personal and professional, making him inaccessible. Also, finding him may take many years, assuming that it is even possible given our circumstances. So, learning from an A.I. may be the next best thing, plus we don’t have to deal with personality-related impediments that often plague human relationships, even the more professional ones.
An A.I., first and foremost, is unassuming. This is something that we can all develop more, no matter how objective we think we are. A.I. doesn’t have any prejudices so it deals with every situation anew, much like a child, making it more poised to find the optimum solution to the problem at hand. That’s something that is encouraged and often practiced in scientific ecosystems, like research centers and R&D departments, where the objective is so important that all assumptions are set aside, at least long enough for this approach to yield some measurable results.
A.I.s also tend to be very efficient, minimizing waste and unnecessary tasks. They don’t care about politics or massaging our egos. Their only focus is maximizing an objective function, given a series of constraints, and, whenever it is applicable, taking actions based on all this. If we were to act like that we’d definitely cut our time overheads significantly since we’d be concentrating more on results rather than pleasing some person who may have some influence over us professionally or personally.
A third lesson we could get from A.I. is organization. Although we most certainly have organization in our lives to some extent, we have a lot to learn from the cool-headed A.I. that employs an organizational approach to things. An A.I. tends to model its knowledge (and data) in coherent logical structures, immune to emotional or otherwise irrational influences. It deals with the facts rather than its interpretations of them. It builds functional structures rather than pretty pictures, to deal with the inherent disorder that its inputs entail. It makes graphs and optimizes them, rather than graphics that are easy on the eyes (although there is value in those too, in a data science setting). Clearly we don’t have to abandon our sentimental aspects in order to imitate this highly efficient approach to problem-solving, but we can try to be more detached when dealing with our work, rather than let sentimental attachments and eye candy exercise influence over our process.
Perhaps if we were to treat A.I. as a potential teacher of sorts, in the stuff it does well, it wouldn’t seem so threatening. Maybe feeling scared of it is merely a projection of ours, an objectification of our inherent fear of our own minds, which is still largely uncharted territory. A.I. doesn’t have an agenda and is not there to get us. If we treat it as an educational tool, it may prove an asset that will bring about a mutually beneficial synergy. It’s up to us.
Natural Language Processing, or NLP for short, is a very popular Data Science methodology that has gained a lot of traction over the past few years as more and more companies have realized that it’s easy to get access to text data and use it to derive valuable insights. Twitter, for example, offers a rich data stream which, when processed properly, can yield a lot of insights on a particular topic or brand, using just NLP as a paid resource. However, NLP wouldn’t have gone far if it weren’t for A.I., since the latter allows it to go beyond the rudimentary statistical models that NLP has in its toolkit. So, what’s the relationship between these two fields and how is it expected to evolve in the years to come?
Just to clarify, NLP is so much more than just running a Bayesian classification system, or a regression model, on text data that’s been encoded into binary features. NLP also involves topic discovery, text similarity, and even summarization of a document, among other things. All these tasks would be extremely difficult, if not impossible, if it weren’t for A.I. So, NLP is at least partly dependent on A.I., at least for applications that are really worth a data scientist’s time. Of course A.I. is not there to displace Stats, but rather to complement this more formal approach. Think of it as the fox that works side-by-side with the hedgehog against a common enemy, rather than two animals fighting each other for dominance.
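To make the "Bayesian classifier on binary features" baseline concrete, here is a rough sketch of it in plain Python: each text is encoded as word-present / word-absent flags and fed to a Bernoulli naive Bayes model with add-one smoothing. Everything in it (the function names, the four toy documents, the "pos" / "neg" labels) is made up for illustration and is not drawn from any real dataset or library.

```python
import math


def to_binary_features(text, vocabulary):
    """Encode a text as a 0/1 list: 1 if the vocabulary word appears in it."""
    words = set(text.lower().split())
    return [1 if w in words else 0 for w in vocabulary]


def train_bernoulli_nb(docs, labels, vocabulary):
    """Estimate class priors and smoothed P(word present | class)."""
    model = {}
    for c in sorted(set(labels)):
        rows = [to_binary_features(d, vocabulary)
                for d, y in zip(docs, labels) if y == c]
        prior = len(rows) / len(docs)
        # Add-one (Laplace) smoothing keeps every probability strictly in (0, 1).
        probs = [(sum(col) + 1) / (len(rows) + 2) for col in zip(*rows)]
        model[c] = (prior, probs)
    return model


def predict(model, text, vocabulary):
    """Return the class with the highest log-posterior for the given text."""
    x = to_binary_features(text, vocabulary)
    best_class, best_score = None, -math.inf
    for c, (prior, probs) in model.items():
        score = math.log(prior)
        for present, p in zip(x, probs):
            score += math.log(p if present else 1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical toy data, for illustration only.
docs = ["great product love it", "love this brand",
        "terrible service hate it", "hate this awful product"]
labels = ["pos", "pos", "neg", "neg"]
vocabulary = sorted({w for d in docs for w in d.lower().split()})
model = train_bernoulli_nb(docs, labels, vocabulary)

if __name__ == "__main__":
    print(predict(model, "love it", vocabulary))
    print(predict(model, "hate this service", vocabulary))
```

A model like this only knows which words are present. It has no notion of word order, topic, or similarity between documents, which is why the tasks listed above need more than this kind of baseline.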
What about the other side of the relationship? Does A.I. need NLP in any way? Well, the short answer is “it depends” since A.I. is very application-specific. So, for any A.I. system that involves communicating with a human as a main part of its agenda it is important for it to be able to use natural language as much as possible. So, NLP is not only useful but also necessary. That’s something we observe a lot nowadays with chatbots, for example, A.I. systems geared at emulating human communication through a web API, in order to convey useful information or facilitate a certain action. Also, personal assistants like Cortana greatly depend on NLP to connect with their users. However, A.I. systems like the ones in many strictly operational scenarios, such as autonomous vehicles, don’t really need NLP since they don’t communicate with the users, at least not as a primary function. This is bound to change in the future though, as it would be easier to market a vehicle that you can talk with, particularly in case of an unexpected situation, such as an engine problem.
Naturally, the relationship of NLP and A.I. is as essential as it is conditional. Still, as general A.I. is getting closer and closer, we should expect NLP to be an inherent part of A.I. since such a system should be able to excel in pretty much every task that a human can undertake (as well as some tasks beyond our abilities, such as handling big data). So, instead of seeing NLP and A.I. as static entities (hedgehog-like approach), we ought to view them as co-evolving ones (fox-like approach) that at the present moment have a co-dependent relationship. Still, as A.I. becomes more and more advanced, it is not far-fetched to expect NLP to be just another module of an A.I. system, much like the linguistic center is part of the human brain, which is the primary center of intelligence.
What does all this mean for us, data scientists? Clearly, there is no point ignoring either one of these fields, even if our specialty lies in some other part of data science. So, at the very least we ought to be informed about what’s happening in NLP and how A.I. influences data science. We don’t need to write our own algorithms in these fields, but at some point we should be able to tackle an NLP problem, preferably using some A.I. method, or develop an A.I. system that makes use of NLP in the back-end. It may not be easy but as the relationship between NLP and A.I. becomes stronger, it’s bound to become something of a requirement in the near future.
I have talked in another post about the new kind of data science that is becoming more and more popular nowadays. Namely, there is a kind of data science that leverages A.I. via a framework known as Deep Learning. This is what I refer to as fringe data science, since it is without a doubt the state-of-the-art of the field. However, even though it’s so advanced that I may not be able to describe it in a blog post, it is not without its issues. Namely, up until now it’s been limited by the languages involved in its implementation. So, if you want to use Theano, for example, you need to do so in Python. And although Python is a lovely and very versatile tool, it may not be your forte. So, what do you do? Well, now it seems that there is a system for ML and DL that doesn’t care about which language you use. This system, backed by Amazon, is called MXNet (pronounced: mix-net).
MXNet is not yet another system for deep learning or machine learning in general. It is a paradigm-shift kind of tech. What’s more, it embraces a number of different programming languages, such as C++, Python, R, Go, and Julia. In other words, you don’t need to be a developer to work it. Even us high-level coders who use programming to tackle data science tasks can make use of it. This is huge. With this system you can have a team of diverse professionals who can collaborate on projects via this platform. You don’t need to make your company a Python shop, or an R shop, for example. Also, if you have some data scientists in your company who are more fox-like and like to experiment with new programming technologies, such as Julia and Go (not to be confused with the popular strategy game), there is a place for them too!
So, what do you think? Is this new tech worth all the hype that Amazon scientists bring about with their articles? Is it a hype that some tech journalists have created to make money off their articles? Or is it an actually useful tech? Feel free to let me know in the comments below.
A.I. is great. There is no doubt about that. It’s been around long enough to be a respectable field of science and survive many years of skepticism, becoming more hands-on in the process. Nowadays, it is experiencing a renaissance as it has become the favorite tool of many data scientists. Some people (not data scientists necessarily) even go so far as to claim that it will replace data science, as it is bound to automate the whole pipeline. Yet, whether it manages to replace the actual people involved in the data science process is still debatable.
Contrary to what the blind advocates of A.I. think, data scientists are not some mindless automatons who apply a formula until they hit an insight. In my experience, even the most mediocre data scientist out there has some intelligence and the know-how to apply it with some effectiveness. The aforementioned A.I. advocates probably never experienced that, as they tend to base their ideas on stuff they have read on some blog or some news article. Still, even though A.I. has displaced some of the traditional models that data scientists employ, there is more to the work a data scientist does than just crunching numbers. This is something that these A.I. fanboys fail to comprehend. This is probably because this part of the data scientist’s work is not that appealing to the masses, so it rarely gets mentioned in those articles the A.I. fans are reading.
A data scientist’s role involves a lot of communication. That’s something that is yet to be accomplished by machines, even those running good A.I. systems on the back-end. Because communication is not just figuring out what the words you hear or read mean, it’s also about understanding intent and those subtle cues that are often in the words that are not there. I’d like to see an A.I. system handle that, especially when the communicator it has to understand is stressed out and fails to articulate properly what he expects, or if he is in the dark about what’s possible with the available data. A.I. is excellent for NLP, but there is more to communication than this niche aspect of language-related data streams.
Moreover, a data scientist has to communicate the findings she comes up with or the roadblocks she encounters. Sometimes it takes several meetings to accomplish that and she needs to liaise with several other people in the company, many of whom are not data scientists and/or have a very limited view of the data at hand. Also, she needs to do that in a way that is succinct and comprehensible. Will an A.I. system be able to cope with that, within a reasonable timeframe? I doubt it.
So, without neglecting the value that A.I. adds and will continue to add to data science, it is important to manage our expectations of it. A.I. systems like the one in the movie “Her” may never become mainstream in the data science world, even if they do come about eventually. Say that company X invents such a system, do you honestly think that every company out there will be able to afford a license for it? If so in the beginning, for how long do you think it will remain affordable? These business-related aspects of technology may not be as exciting but they are as important as the technical ones. After all, someone has to pay the bills and that someone is not going to spend a lot of money on a system that may or may not be cost-effective.
A more realistic view of how things will be in the A.I.-imbued data science world is as follows. Most likely, A.I. will dominate in the data science pipeline, in those steps that can be automated. This will yield great efficiency, making the data scientist’s job somewhat different. So, instead of her focusing on building the models and fine-tuning them, she will concentrate on the more high-level aspects of the role. The A.I. is not going to replace her, but there is bound to be a synergy between the two players, with the human providing guidance and insight, while the machine takes care of all the low-level work. The future doesn’t have to be bleak like some Hollywood movies like to portray it (since that makes for a more interesting story). It can be something worth looking forward to, especially when it comes to data science.
They are here. They mingle with us. They are luring more and more eyeballs in their direction. Don’t worry, I’m not talking about any of the malign A.I. creatures that Hollywood films tend to portray. I’m referring to the DS videos I’m making and publishing to Safari Books Online, via Technics Publications. The latest one, “Data Science and A.I. - What’s the Difference?” is now available on O’Reilly’s digital media platform. Check it out when you have a moment.
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy flair when it comes to technology, technique, and tests.