The Role of Imagination in Data Science
Contrary to what many people will have you believe, imagination is not just for researchers and PMs, when it comes to data science. Every single aspect of a scientific project has some imagination in it, even the most mundane and straight-forward parts. As data science involves a lot of creativity, at least for the time being, it is not far-fetched to presume that imagination has an important role to play in the field.
By the term imagination I mean the conscious use of the mind for projecting new forms or perceiving forms that could be, but are not manifested. It is very different from the unconscious use of the mind, which is what psychologists refer to as fantasy, a fairly futile endeavor that frequents the undisciplined and immature minds. As data scientists we often need to see what is not there and create it if it’s useful, or find some way to deal with it if it’s a potential issue.
Of course there are those hard-core Deep Learning (DL) people out there who believe that with a good enough DL network you don’t really need to worry about all this matter. They advocate the idea that A.I. can take care of all this through the systematic and/or stochastic handling of all the possibilities in a feature set, yielding the optimum collection of features that it will then use for the task at hand. Although there is no doubt that a good enough ANN can do all that, it still doesn’t solve all the potential issues, nor does it make the role of a human being unnecessary. Just like a good motorbike can alleviate a lot of the hard work required for getting from A to B, it still doesn’t eliminate the need for someone who steers the vehicle and keeps it safely on the road.
Imagination is our navigator in many projects and although it often lends itself to feature engineering and other data engineering tasks, it is also useful for something else that no A.I. has managed to achieve yet: the development of hypotheses and a plan of action based on the data at hand. Data science is not all about getting a model working and coming up with some good score in a performance metric. This is just one aspect of it, the one that Kaggle focuses on, in order to make its competitions more appealing. However, a large part of the data science work involves exploring the data and figuring out what kind of insights it can yield. A robust A.I. system can be an invaluable aid in all that, but we cannot outsource this task to it, no matter how many GPUs we use or how slick the training algorithms the system employs. Just like an organization cannot function properly if its members are all complete imbeciles (the components an automated DS system comprises), a DS project needs some higher intelligence too (the equivalent of a competent manager in the aforementioned organization).
We need to set goals in our project and foresee potential problems and opportunities, before we come to that part of the pipeline, otherwise we risk having to go back and forth, wasting valuable resources. So, even though focused and meticulous work is essential, being able to step back and see the bigger picture is equally important. That’s why oftentimes a data science endeavor is handled by a team of professionals, with the data science lead undertaking that role. So, if you want to make things happen in data science, something that an A.I. is unable to undertake, you need to use imagination. The latter, along with the systematic aspects of the role, can lead you to the desired outcome of your data science project, be it insights or a data product. Imagine that!
Portrait of the Faux Data Scientist
The other day I was talking with an acquaintance of mine who is the CEO of a local startup in London and I was astonished to discover that the faux data science trend that plagues the West Coast of the US is in London too. The British capital is not only the home of the top A.I. startup, DeepMind, which was acquired by Google lately, but it’s also the place where a great many data scientists have come about. Also, it prides itself on its pragmatism and on how grounded it is, especially when it comes to science. Still, somehow many of the data science practitioners in this great city are what I call faux data scientists, professionals who use the term “data scientist” on their business card, even though they have no real relation to the field.
Contrary to a real data scientist, the profile of whom I describe in my first book, a faux one is both confusing and confused. A (true) data scientist focuses on predictive analytics, usually through the use of ML systems and lately systems powered by A.I., even though he also makes use of Statistics in various ways. A faux data scientist, on the other hand, relies mainly on Stats and some of the most rudimentary ML models (though he may use ANNs too, without bothering to configure them properly or even read up on the corresponding scientific literature). While a data scientist relies on science to obtain insights and makes use of various methods for communicating her results, a faux one creates pretty plots that may or may not convey any real findings, though they may impress his audience. A faux data scientist usually outperforms the real data scientist in another thing: BS talk. The real data scientist tends to be more humble and veers away from extravagant claims about what the data can yield. This is particularly true if he comes from an academic background. However, the faux data scientist has no inhibitions when it comes to making excessive promises and delivering insights that would qualify for a Nobel prize, if they held water. In other words, the faux data scientist is full of hot air, but manages to hide all the BS of his methodology behind fancy talk brimming with buzzwords and anything else he could come up with in order to convince (or please, rather) his audience.
Unfortunately the damage that a faux data scientist does goes beyond his personal work. Given enough time, the managers of the corresponding projects will see through all the BS this “data scientist” does. That’s the time when the faux data scientist will probably leave or go about to start his own company. However, the loss of confidence in the profession is bound to linger. And even though it took years of hard work and equally hard research in the field to build this confidence, it’s not as strong as it needs to be in order to sustain this kind of damage. Of course the faux data scientist doesn’t care because he’s in it for the money, the reputation, or whatever other personal gain his ambitions dictate. He hasn’t done any research on the science behind the techniques and is adept only at applying other people’s work, through the myriad of Python and R packages that are out there. But it’s not all bad. As he is bound to talk his way into all sorts of situations, once the field no longer serves his purposes, he is bound to jump ship to some other field (whatever is trendy at that time) and never look back.
However, the faux data scientist is not to blame entirely for all this. She is just taking advantage of the situation, particularly the fact that the hiring managers look for 1) x years of experience in the field, experience they are unable to assess accurately, and 2) someone with “excellent communication skills”, especially when it comes to showcasing projects brimming with eye candy and buzzwords the management will recognize. So, unless we start seeing through the BS of the faux data scientists and treat them the way they deserve, this situation is not going to go away any time soon…
Static, Dynamic, and Versatile Intelligence and How They Apply to Data Science
Many people have developed different taxonomies for intelligence over the years, focusing on its function and its objective. Most of us have heard of the logical/mathematical intelligence, for example, which is the one responsible for handling all the operators that our math teacher seems to care about, or the operators related to solving a Sudoku puzzle. The linguistic intelligence is of a different type, as it focuses on words and the semantics of language, an intelligence that all literary writers have traditionally been adept at. Then there is also EQ and SQ, intelligence types that focus on the emotional and the spiritual aspects of our lives respectively. However, so far, there hasn’t been a taxonomy that deals with the form of intelligence, its rudimentary essence, at least not to the best of my knowledge.
According to this paradigm of intelligence, there are three main types: static, dynamic, and versatile intelligence. This may be apparent, considering the three-fold structure of the human brain. What may not be as apparent though is how these are directly related to data science, since if done properly, data science is an intelligent application of intelligence.
Static intelligence is the intelligence of the hedgehog. It involves mental structures that are established, reliable, and fool-proof. Most of science is actually done using this kind of intelligence since it aims to discover and apply knowledge that is accountable and void of surprises. This intelligence is the one that you develop reading books and watching educational videos (wink-wink!), while it is highly valued in most educational institutions. Intelligence of this type doesn’t change much over one’s life, since it is very passive, in a way.
The exact opposite of static intelligence is dynamic intelligence (duh!), the intelligence of the fox. It entails mental structures that are in constant flux and are experimental, risky, and unpredictable. Most of scientific innovation and all processes that employ creativity are based on this kind of intelligence. Dynamic intelligence aims to bring about new ideas and concepts for further evaluation and seek new processes for handling existing information streams. Without this kind of intelligence any kind of feature engineering would be impossible, while coming up with a robust and unconventional model would be a stretch, even if you have many years of experience in data science. Intelligence of this type is quite volatile, evolving, and very active, in a way.
Something between and beyond these intelligence types is versatile intelligence, aka the dolphin-like approach to things. This is basically the intelligence of all successful inventors, both the known and the unknown ones. Also, I believe that the first pioneers of data science employed this kind of intelligence when they laid the foundations of the field. Versatile intelligence is concrete, yet playful, accountable, yet experimental, formal, yet also chaotic. Most science leaders, be they researchers or practitioners, rely on this kind of intelligence in their everyday lives. Also, entrepreneurs seem to employ this sort of thinking, particularly if they are committed to their work. All major innovations in data science (and science in general) are based in one way or another on versatile intelligence.
Although there is a certain charm in correlations and finding links among different classes of people with different classes of intelligences, I will refrain from indulging in this intriguing yet futile endeavor. The reason is that we all have all three types of intelligence, even if we tend to express one over the others, because of our life requirements and goals. Intelligence is just like water, fluid and able to take whatever form we ask it to, so applying a static approach to this taxonomy, although interesting, is not realistic. So, when we need to do some ETL work with our data, we need to apply static intelligence mainly. In other parts of the pipeline (e.g. data modeling), dynamic intelligence may lend itself better. And when it comes to conveying our findings via insight delivery or the creation of a data product, employing the versatile intelligence would make more sense.
Note that all this is my own approach to the topic and may not have any scientific literature to back it up whatsoever. However, in practice it has worked well so far. What about you? What does your intelligence tell you about the nature of intelligence?
The Paradox in Data Science
Even though data science is considered to be a “sexy” profession as it has a lot of demand, there seems to be a shortage of “good” data scientists, those few individuals that deliver what the field promises, even if those promises are quite unrealistic at times. One would expect that the more data scientists there are out there, the more adept ones will be available. However, this is not what’s observed! For that we can either blame the media (maybe there is a fake news article out there!) or we can delve deeper into the problem and find out why this phenomenon takes place.
The main issue of this whole “paradox” of sorts is that we assume some kind of normal-like distribution of data science competence, for some reason. However, this kind of distribution is rarely encountered in cases like this, where you have a more Zipfian phenomenon, in which a few cases make up the majority of the area under the distribution. Just like a small number of websites account for the majority of the traffic on the web, a small number of data scientists account for all the glory and all the contribution to the field. Before you start thinking that data science is an elitist society of sorts, let’s consider how other scientific fields are. You have a few talented and/or hard-working individuals who make the headlines, and a lot of other, more average ones, who you’ll never hear about, unless you frequent the conferences they go to. The difference is that most scientists have some academic position so they make ends meet somehow, even if in the majority of cases they are not paid nearly enough for the work they do, or the research they contribute to their field. However, the data scientists who fail to stand out end up being unemployed, getting absorbed in some odd jobs vaguely related to data science, or they end up spending all their time doing Kaggle competitions. That’s not because they are inferior in any way to their academic counterparts, but simply because the standards in the industry are (much) higher, so if you don’t bring enough value to improve the bottom line, you are unlikely to linger in an organization.
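To make the Zipfian point a bit more tangible, here is a minimal sketch in Python. It assumes an idealized Zipf law in which the contribution of the person ranked r is proportional to 1/r; the function names, the population of 1,000 and the exponent of 1 are my own illustrative choices, not measurements of any real workforce.

```python
def zipf_shares(n, s=1.0):
    """Share of the total held by each rank 1..n under an idealized
    Zipf law, where the weight of rank r is proportional to 1 / r**s."""
    weights = [1.0 / (rank ** s) for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def top_share(shares, fraction):
    """Combined share of the top `fraction` of ranks (shares are sorted
    from the highest rank to the lowest)."""
    k = max(1, round(len(shares) * fraction))
    return sum(shares[:k])


# Hypothetical population of 1,000 practitioners (illustrative only).
shares = zipf_shares(1000)
print(f"Top 10% account for {top_share(shares, 0.10):.0%} of the total")
print(f"Bottom 50% account for {1 - top_share(shares, 0.50):.0%} of the total")
```

Under these assumptions the top tenth ends up with well over half of the total, which is the kind of lopsidedness a bell curve would never suggest.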
However, being average in a field that is rapidly evolving is not only acceptable but quite natural. Unless you are super motivated, you’ll have to make a choice about what you will focus on. No one can be a great programmer, an excellent analyst, and an adept in big data tech, while at the same time having a silver tongue that will charm everyone in a meeting room. These imaginary data scientists, who are often referred to as unicorns, are not around, though it is possible that they will eventually come about, once there is enough infrastructure in place to allow for someone to evolve into such a multi-faceted professional. Until then, the closest you can get are some super talented data scientists who would probably be successful no matter what tech role they would undertake. Since people of this kind are exceptions, it is natural that they wouldn’t come about as often as we would want them to. Also, after a certain level of competence, such a professional would not look for a data science position at all. A person like that, who is tech savvy and also adept in the ways of the business world, would probably evolve into an entrepreneur and start his own company. After all, data scientists tend to be smart, so they wouldn’t settle for a salary, no matter how high, if they could hit the jackpot of a successful startup.
After considering the situation from a couple of different angles, it doesn’t seem so paradoxical after all. Data scientists will be many, but the few exceptional ones that many companies have come to want to recruit will always be in short supply. Now, we could start blaming this or the other factor that brings about this phenomenon, or we could start having more realistic expectations about the data scientists out there. It may not be easy since they are an expensive resource, but in the long run, it’s probably the only sustainable option.
What Can We Learn from an A.I.?
It is often the case that we treat a new A.I. as a child that we need to teach and pay close attention to, in order for it to evolve into a mature and responsible entity. However, a fox-like approach to this matter would be to turn things around and see how we, as human beings, can learn from an A.I., particularly of a more advanced level.
Of course A.I. is still in a very rudimentary stage of its evolution so it doesn’t have that much to teach us that we can’t learn from another human being. However, that wise human who would be a great mentor is bound to be tied up by his everyday commitments, personal and professional, making him inaccessible. Also, finding him may take many years, assuming that it is even possible given our circumstances. So, learning from an A.I. may be the next best thing, plus we don’t have to deal with personality-related impediments that often plague human relationships, even the more professional ones.
An A.I., first and foremost, is unassuming. This is something that we can all develop more, no matter how objective we think we are. A.I. doesn’t have any prejudices so it deals with every situation anew, much like a child, making it more poised to find the optimum solution to the problem at hand. That’s something that is encouraged and often practiced in scientific ecosystems, like research centers and R&D departments, where the objective is so important that all assumptions are set aside, at least long enough for this approach to yield some measurable results.
A.I.s also tend to be very efficient, minimizing waste and unnecessary tasks. They don’t care about politics or massaging our egos. Their only focus is maximizing an objective function, given a series of constraints and, whenever it is applicable, taking actions based on all this. If we were to act like that we’d definitely cut our time overheads significantly since we’d be concentrating more on results rather than pleasing some person who may have some influence over us professionally or personally.
A third lesson we could get from A.I. is organization. Although we most certainly have organization in our lives to some extent, we have a lot to learn from the cool-headed A.I. that employs an organizational approach to things. An A.I. tends to model its knowledge (and data) in coherent logical structures, immune to emotional or otherwise irrational influences. It deals with the facts rather than its interpretations of them. It builds functional structures rather than pretty pictures, to deal with the inherent disorder that its inputs entail. It makes graphs and optimizes them, rather than graphics that are easy on the eyes (although there is value in those too, in a data science setting). Clearly we don’t have to abandon our sentimental aspects in order to imitate this highly efficient approach to problem-solving, but we can try to be more detached when dealing with our work, rather than let sentimental attachments and eye candy exercise influence over our process.
Perhaps if we were to treat A.I. as a potential teacher of sorts, in the stuff it does well, it wouldn’t seem so threatening. Maybe feeling scared of it is merely a projection of ours, an objectification of our inherent fear of our own minds, which is still largely uncharted territory. A.I. doesn’t have an agenda and is not there to get us. If we treat it as an educational tool, it may prove an asset that will bring about a mutually beneficial synergy. It’s up to us.
People like to work together, even if they don’t always admit it. Just like bees, we enjoy collaboration, especially if this entails some bonding too. This is sometimes depicted with the term Honeybee Effect, which has applications in every endeavor that can accommodate teamwork. However, we tend to ignore that collaboration is sometimes not only preferable but also essential, particularly in more challenging projects, such as digging up insights in large and diverse datasets. So, let’s see how all this idea can yield some honey.
The honeybee effect is all about people working together on a project and doing so in an intelligent manner. This usually brings about a result that is objectively better than the best that any single member of the group would be able to deliver on their own. You don’t have to be in a modern and foxy framework like Agile in order to have the honeybee effect though. In fact, you can observe it anywhere where there is some intelligence involved in people’s collaboration. Interestingly, if you have people of average competence working in a honeybee fashion, you would expect them to outperform a group of very competent people working in a show-off fashion. This is why most successful organizations prefer team players rather than solitary geniuses, to man their work posts.
When it comes to data science, it is easy to see that there are various tasks corresponding to different parts of the data science pipeline. Naturally, these tasks can be undertaken by different people. If these people work together in a way that embodies the honeybee effect, it is quite likely that the team is going to be as good as a super competent data scientist doing everything by herself. The upside of this is that you wouldn’t have to pay some diva data scientist a very high salary and have the fear that he may take off when Google opens up a new data scientist position in one of its campuses. The downside of all this is that getting a group of people to work harmoniously is a very challenging task, even if the people themselves are willing to do so. There is always the need to organize and lead these people, something that most managers of data science teams find quite challenging, even if they are experienced in management. This is probably why there is such a high demand in chief data scientists, team leaders who are adept in the craft themselves. It’s possible of course to be led by a business person who is not trained in data science but such a leadership would be lacking in mentorship and technical aid. The latter can be fixed by involving a consultant in the whole process. As for the former shortcoming, well, the jury is still out on that one…
So, what does it take to cultivate the honeybee effect in a team and individually? Well, communication is the most obvious prerequisite. By communication we mean being able to express yourself adequately and, most importantly, understand what others want to say, without having to spend five hours in a meeting with them. Another factor of the honeybee effect is being aware of your strengths and limitations (or of everyone’s strengths and limitations, if you are the leader of the team). This will allow you to offer and accept help from your teammates. Finally, in order to get the honeybee effect going, you need to be able to own whatever you undertake and handle it professionally. This doesn’t mean that you won’t ask for help if you can’t deal with a particular issue in the data or the model you work with. However, you need to be independent in whatever task you undertake and rely primarily on yourself to get it done.
There are other aspects of the honeybee effect that you need to develop, of course, but these are the most important ones, in my experience. What about you? What factors of the honeybee effect do you observe and how would you incorporate them in your skillset?
Natural Language Processing, or NLP for short, is a very popular Data Science methodology that has gained a lot of traction over the past few years as more and more companies have realized that it’s easy to get access to text data and use it to derive valuable insights. Twitter, for example, offers a rich data stream which, when processed properly, can yield a lot of insights on a particular topic or brand, using just NLP as a paid resource. However, NLP wouldn’t have gone far if it weren’t for A.I., since the latter allows it to go beyond the rudimentary statistical models that NLP has in its toolkit. So, what’s the relationship between these two fields and how is it expected to evolve in the years to come?
Just to clarify, NLP is so much more than just running a Bayesian classification system, or a regression model on text data that’s been encoded into binary features. NLP also involves topic discovery, text similarity, and even summarization of a document, among other things. All these tasks would be extremely difficult, if not impossible, if it weren’t for A.I. So, NLP is at least partly dependent on A.I., at least for applications that are really worth a data scientist’s time. Of course A.I. is not there to displace Stats, but rather to complement this more formal approach. Think of it as the fox that works side-by-side with the hedgehog against a common enemy, rather than two animals fighting each other for dominance.
What about the other side of the relationship? Does A.I. need NLP in any way? Well, the short answer is “it depends” since A.I. is very application-specific. So, for any A.I. system that involves communicating with a human as a main part of its agenda it is important for it to be able to use natural language as much as possible. So, NLP is not only useful but also necessary. That’s something we observe a lot nowadays with chatbots, for example, A.I. systems geared at emulating human communication through a web API, in order to convey useful information or facilitate a certain action. Also, personal assistants like Cortana greatly depend on NLP to connect with their users. However, A.I. systems in many strictly operational scenarios, such as autonomous vehicles, don’t really need NLP since they don’t communicate with the users, at least not as a primary function. This is bound to change in the future though, as it would be easier to market a vehicle that you can talk with, particularly in case of an unexpected situation, such as an engine problem.
Naturally, the relationship of NLP and A.I. is as much essential as it is conditional. Still, as general A.I. is getting closer and closer, we should expect NLP to be an inherent part of A.I. since such a system should be able to excel in pretty much every task that a human can undertake (as well as some tasks beyond our abilities, such as handling big data). So, instead of seeing NLP and A.I. as static entities (hedgehog-like approach), we ought to view them as co-evolving ones (fox-like approach) that at the present moment have a co-dependent relationship. Still, as A.I. becomes more and more advanced, it is not far-fetched to expect NLP to be just another module of an A.I. system, much like the linguistic center is part of the human brain, which is the primary center of intelligence.
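For readers who haven’t seen the “rudimentary” end of the NLP toolkit, here is a minimal sketch of a Bayesian classifier on binary word features, written in plain Python. The function names and the four-document toy corpus are made up for illustration, and the smoothing scheme (add-one, Bernoulli-style) is just one common choice; it is the kind of baseline the paragraph above has in mind, not a recommendation for real work.

```python
import math
from collections import Counter


def binary_features(text):
    """Encode a document as the set of distinct words it contains,
    i.e. one present/absent flag per word."""
    return set(text.lower().split())


def train(docs, labels):
    """Fit a Bernoulli naive Bayes model with add-one smoothing."""
    vocab = set()
    for doc in docs:
        vocab |= binary_features(doc)
    docs_per_class = Counter(labels)
    # For each class, count how many documents contain each word.
    word_docs = {c: Counter() for c in docs_per_class}
    for doc, label in zip(docs, labels):
        word_docs[label].update(binary_features(doc))
    model = {"vocab": vocab, "log_prior": {}, "p_word": {}}
    for c, n_docs in docs_per_class.items():
        model["log_prior"][c] = math.log(n_docs / len(docs))
        model["p_word"][c] = {
            w: (word_docs[c][w] + 1) / (n_docs + 2) for w in vocab
        }
    return model


def predict(model, text):
    """Return the class with the highest log-posterior score."""
    present = binary_features(text)
    scores = {}
    for c, log_prior in model["log_prior"].items():
        score = log_prior
        for w in model["vocab"]:
            p = model["p_word"][c][w]
            # Words absent from the document count as evidence too.
            score += math.log(p if w in present else 1.0 - p)
        scores[c] = score
    return max(scores, key=scores.get)


# Toy corpus, invented for this example.
docs = [
    "great product love it",
    "love this brand",
    "terrible service hate it",
    "hate this awful product",
]
labels = ["pos", "pos", "neg", "neg"]
model = train(docs, labels)
print(predict(model, "love the product"))
```

Words the model has never seen (like “the” here) are simply ignored, which hints at why such models struggle with tasks like summarization or topic discovery.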
What does all this mean for us, data scientists? Clearly, there is no point ignoring either one of these fields, even if our specialty lies in some other part of data science. So, at the very least we ought to be informed about what’s happening in NLP and how A.I. influences data science. We don’t need to write our own algorithms on these fields, but at one point we should be able to tackle an NLP problem, preferably using some A.I. method, or develop an A.I. system that makes use of NLP in the back-end. It may not be easy but as the relationship between NLP and A.I. becomes stronger, it’s bound to become something of a requirement in the near future.
I have talked about the value of a mentor in data science in a previous post. The thing is that even the best mentor in the world is bound to be ineffective if she is working with someone who is not embodying the protege role to a decent degree. But what does it mean to be a protege and how is that relevant in the path of development as a data science professional?
Let’s start by what a protege is not, since that’s more straightforward and it is often a misconception in people’s minds, regarding this topic. So, a protege is not someone who passively receives knowledge and know-how from a mentor, nor is it someone who obeys blindly the instructions of his guide. A protege doesn’t have to be a helper either to the person who is mentoring him, although it is not unheard of. Also, a protege is not bound to a given mentor, since he may be learning different things in his life or career, requiring a number of mentors.
A protege is more of a person willing to learn, mainly through his own efforts, yet open to guidelines by people more experienced and more knowledgeable than himself. A protege teaches himself and makes use of his mentor’s suggestions through an intelligent assimilation of them and through a constantly refined comprehension of the stuff he is working on. The mentor is more of a leader figure, who inspires, rather than demands, leading by example. The protege is humble enough to listen to her before judging the validity of what he hears and makes an effort to understand before choosing to go with it, or discard it. We can think of a protege like a bee, bound to a goal, but with the freedom to go about it in the most efficient way he comes up with. Also, if he decides to be an assistant of sorts to the mentor (usually in a company setting, where there is a more formal work relationship between the two), it is out of free will, rather than obligation.
Finally, it is important to note that the mentor is not a know-it-all so if she is true to herself and values mentoring, she is also a protege. Also, the protege himself may also be a mentor to someone else, perhaps some intern in his team. And since no mentor is adept in everything, it is quite common for someone to have several mentors throughout one’s life. In data science, for example, you may have a mentor to guide you through the whole pipeline of insight-derivation and data product development. However, you may find that you want to delve deeper into programming and choose to have another mentor in that aspect of the craft. Also, you may be into other activities, like creative writing and find that you need a different mentor there. So, it’s good to keep an open mind about the whole mentor-protege relationship.
What is your experience in being a protege? What would you expect from a mentor to make the most of your time with them? Where do you see the most value in being a protege?
I have talked in another post about the new kind of data science that is becoming more and more popular nowadays. Namely, there is a kind of data science that leverages A.I. via a framework known as Deep Learning. This is what I refer to as fringe data science, since it is without a doubt the state-of-the-art of the field. However, even though it’s so advanced that I may not be able to describe it in a blog post, it is not without its issues. Namely, up until now it’s been limited by the languages involved in its implementation. So, if you want to use Theano, for example, you need to do so in Python. And although Python is a lovely and very versatile tool, it may not be your forte. So, what do you do? Well, now it seems that there is a system for ML and DL that doesn’t care about which language you use. This system, backed by Amazon, is called MXNet (pronounced: mix-net).
MXNet is not yet another system for deep learning or machine learning in general. It is a paradigm-shift kind of tech. What’s more, it embraces a number of different programming languages, such as C++, Python, R, Go, and Julia. In other words, you don’t need to be a developer to work it. Even us high-level coders who use programming to tackle data science tasks can make use of it. This is huge. With this system you can have a team of diverse professionals who can collaborate on projects via this platform. You don’t need to make your company a Python shop, or an R shop, for example. Also, if you have some data scientists in your company who are more fox-like and like to experiment with new programming technologies, such as Julia and Go (not to be confused with the popular strategy game), there is a place for them too!
So, what do you think? Is this new tech worth all the hype that Amazon scientists generate with their articles? Is it hype that some tech journalists have created to make money off their articles? Or is it an actually useful tech? Feel free to let me know in the comments below.
A.I. is great. There is no doubt about that. It’s been around long enough to be a respectable field of science and survive many years of skepticism, becoming more hands-on in the process. Nowadays, it is experiencing a renaissance, as it has become the favorite tool of many data scientists. Some people (not data scientists necessarily) even go so far as to claim that it will replace data science, as it is bound to automate the whole pipeline. Yet, whether it manages to replace the actual people involved in the data science process is still debatable.
Contrary to what the blind advocates of A.I. think, data scientists are not some mindless automatons who apply a formula until they hit an insight. In my experience, even the most mediocre data scientist out there has some intelligence and the know-how to apply it with some effectiveness. The aforementioned A.I. advocates probably never experienced that, as they tend to base their ideas on stuff they have read on some blog or some news article. Still, even though A.I. has displaced some of the traditional models that data scientists employ, there is more to the work a data scientist does than just crunching numbers. This is something that these A.I. fanboys fail to comprehend. This is probably because this part of the data scientist’s work is not that appealing to the masses, so it rarely gets mentioned in those articles the A.I. fans are reading.
A data scientist’s role involves a lot of communication. That’s something that is yet to be accomplished by machines, even those running good A.I. systems on the back-end. Communication is not just figuring out what the words you hear or read mean; it’s also about understanding intent and those subtle cues that often lie in the words that are not there. I’d like to see an A.I. system handle that, especially when the communicator it has to understand is stressed out and fails to articulate properly what he expects, or if he is in the dark about what’s possible with the available data. A.I. is excellent for NLP, but there is more to communication than this niche aspect of language-related data streams.
Moreover, a data scientist has to communicate the findings she comes up with or the roadblocks she encounters. Sometimes it takes several meetings to accomplish that and she needs to liaise with several other people in the company, many of whom are not data scientists and/or have a very limited view of the data at hand. Also, she needs to do that in a way that is succinct and comprehensible. Will an A.I. system be able to cope with that, within a reasonable timeframe? I doubt it.
So, without neglecting the value that A.I. adds and will continue to add to data science, it is important to manage our expectations of it. A.I. systems like the one in the movie “Her” may never become mainstream in the data science world, even if they do come about eventually. Say that company X invents such a system. Do you honestly think that every company out there will be able to afford a license for it? And even if it is affordable in the beginning, for how long do you think it will remain so? These business-related aspects of technology may not be as exciting, but they are as important as the technical ones. After all, someone has to pay the bills and that someone is not going to spend a lot of money on a system that may or may not be cost-effective.
A more realistic view of how things will be in the A.I.-imbued data science world is as follows. Most likely, A.I. will dominate those steps of the data science pipeline that can be automated. This will yield great efficiency, making the data scientist’s job somewhat different. So, instead of focusing on building the models and fine-tuning them, she will concentrate on the more high-level aspects of the role. The A.I. is not going to replace her, but there is bound to be a synergy between the two players, with the human providing guidance and insight, while the machine takes care of all the low-level work. The future doesn’t have to be bleak, as some Hollywood movies like to portray it (since that makes for a more interesting story). It can be something worth looking forward to, especially when it comes to data science.
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy approach to technology, particularly related to A.I.