So, when I was in the US recently, I was interviewed by some people from a podcast geared towards software engineering and data science topics (with some A.I. stuff too). This interview, which constitutes a whole episode of that podcast, covered various topics related to both data science as a field and some specific aspects of it that can help someone embrace it as a practitioner. The podcast episode is now online and freely available. Although it's by no means a thorough coverage of the field of data science, or even of the mindset related to it, it's a good introduction, engaging enough to make your commute somewhat more interesting than listening to the radio. Enjoy!
How the Use of A.I. in the Road-based Logistics and Transportation Can Be Smooth and Congruent with the Current Status Quo
People talk a lot these days about how self-driving cars will solve all of our logistics and transportation related problems when they finally hit the roads. The thing is that the problems they are trying to solve are not as simple, nor is their adoption going to be as easy as these idealistic people think. Although there is nothing wrong with dreaming of a better future, free of traffic and avoidable accidents, it’s also important to look at this matter from a more realistic point of view.
First of all, the self-driving car needs to be re-examined. The idea of a completely autonomous car is a long way from manifestation, even if there are A.I. systems out there that can navigate a car effectively over large distances. However, expecting these A.I. drivers to become the norm in the foreseeable future is quite unrealistic. The reason is simple economics. These systems are going to be very expensive, so they will naturally appeal only to a small part of the population. Also, as they gradually become more affordable, they will push down the price of conventional vehicles, making the latter more appealing. This is dynamic systems 101, something that apparently many of these visionaries of the self-driving car are not that familiar with, just like they don’t understand people that well. If Joe and Jane find that this new self-driving car costs 50% more than the car they’ve been dreaming of for the past 5 years, because that particular make of car has been around forever and that model has been heavily advertised ever since they can remember, they will probably go with the conventional car, even if the self-driving car is an objectively better choice in general.
However, if A.I. systems in cars were to adopt an auxiliary role, much like Elon Musk envisions for his Tesla vehicles, then they would have a chance. After all, not many people are willing to give up control of their cars just yet. This is evident when you talk with competent drivers who have been outside the US. These people take a strong interest in stick-shift cars, since these give them more control over the car, making them feel better about their role as drivers. Also, stick-shift cars are more economical, require less maintenance in terms of the transmission (e.g. no automatic transmission fluid), and are generally quite reliable (as much as their automatic counterparts). Unless of course you never learn how to use the clutch, which is another matter!
If self-driving cars are self-driving only at times when the driver chooses (e.g. in the case of a long road trip, or a mundane commute over I-90), then they can definitely add value. However, if they are entirely self-sufficient with no potential input from the human in the driver’s seat, then they are less likely to gain people’s trust, apart from those already biased in favor of their inherent value. Whatever the case, it is interesting to see how this new trend will evolve and what kind of data it will bring about for data science professionals to analyze!
People talk a lot these days about what it takes to be a good data scientist and how, if you do their boot camp or join their course, you will acquire it and make yourself stand out from the data scientist pool. Some of these people may be on to something, but they generally focus a lot on specific skills and general abilities. That’s fine if you have the time to study what they are saying and find for yourself what you need. However, if you just want a single idea that is at the root of all the stuff they talk about, that’s something few can share with you, because they probably don’t know.
There are data scientists who know, however, what it takes to be a good data scientist, and many of them have already embodied this in their careers. Yet, they are so busy applying this that they don’t go out of their way to let you know, unless of course they are into education, in which case they will probably mention it in their books or videos.
One feature that I’ve found succinctly summarizes what it takes to be a good data scientist, regardless of your domain or your specialization, is persistent engagement in the craft. Let’s break this down a bit, since it’s a fairly complex feature (a meta-feature if you will). It comprises two things working in tandem: persistence and engagement. The first has to do with a sense of rhythm and commitment. All decent data scientists are very focused on what they are doing, even if they are involved in other things (e.g. 90-95% of my work is around data science, though I’m also involved in Cyber Security and, to a smaller extent, in Neuroscience). Also, we tend to practice data science in one way or another very regularly. In other words, it is part of our daily routine. These are all manifestations of persistence.
As for engagement, that is more of an inner state, an aspect of the mindset of a good data scientist. It involves being fascinated by the craft, even if it may seem that it doesn’t have any secrets from you any more. The thing is that there are always new things to learn, especially over time as it evolves and new methods and techniques come about. Engagement is akin to what is known in Zen as the “beginner’s mind” which is a certain approach to things as if they are completely new to you. Coupled with the experience and expertise that a good data scientist has, this approach allows him to go more in depth regarding the field and find new ways to bring about value through data science. It also involves coming up with new models, new processes for data engineering, and in some cases, new data products.
Consistent engagement in data science doesn’t require particular talent or experience, however. Everyone can (and ought to) embrace it. So, instead of trying to memorize the inner workings of some obscure model, just because someone else says so, try cultivating this trait first. Afterwards, everything else will appear easier and more interesting, just like new know-how appears intriguing and within reach, to a novice that has a genuine thirst for learning. After all, there are many ways to achieve mastery of the craft, but they all go through consistent engagement.
If you are looking for a way to hide those ultra-secret blog articles before they hit the web, or those intimate poems of yours, then you may be interested in this Cyber Security methodology called Steganography.
This video I made, which was recently published on Safari, takes you through the basics of Steganography and provides you with enough know-how to appreciate it, as well as with some tools you can use to hide your important documents from the unsuspecting eavesdroppers out there. Check it out when you have the chance!
For those of you celebrating Thanksgiving, I just wanted to wish you all a happy Thanksgiving weekend! There are lots of things to be thankful for and studies have shown that gratitude is linked to happiness. So, even if it doesn't always seem like it, this is a holiday for happiness (not just getting some good deals on Black Friday!). What are you thankful for in your life (apart from being into the fields of data science and A.I. of course)?
Sometime in October, one of the Foxy Data Science readers contacted me with a question/suggestion about this topic. As I hadn’t really thought about it much, I decided to look into it and write a blog post about it. I’m not an expert in AEI, but I believe I know enough about A.I. in general and about the business world to venture an insightful view on the matter. At the very least, it can trigger some interesting contemplation in you.
Artificial Emotional Intelligence (AEI) is a kind of A.I. that emulates the EQ aspects of our mental process. In other words, it is about machines that know (to some fairly limited extent) how to exhibit qualities that fall on the intersection between intelligence and emotional maturity, aka EQ. By the way, I do not believe that EQ is more important than IQ, nor that it is any less important. Both are equally useful and neither can be a substitute for SQ (moral intelligence), which is a truly superior kind of intelligence. This, however, could be the topic of another blog post…
The possibility of computers, and machines in general, emulating empathy and other traits that are under the EQ umbrella seems a bit futuristic. However, there are already A.I. systems that do just that. Not only that, some of them are quite successful, particularly in psychology roles, even more so than their human counterparts (link to some interesting research by USC).
Could this be the end of EQ-based professions? Probably not, though these people may start considering offering something more than just listening and nodding, if they are to stand out from their AEI competition. Naturally, psychology is so much more than helping someone vent about their issues and showing them that there are more constructive ways of dealing with their problems, something that AEIs may be able to do equally well. That’s why this whole AEI business may be an incentive for these professionals to expand their profession and turn their sessions into something more, something AEIs may not be able to mimic (for the time being). Art therapists, for example, seem to do just that, combining the benefits of conventional psychology with those of an art form (usually music, painting, or dance).
AEIs may be nothing more than a novelty now, but they very poignantly point to the possibility of new forms of A.I. that the original pioneers of the field may not have thought of. Movies like “Her” may be science fiction, but for how long? These are interesting things to think about, since A.I., just like natural intelligence, can take many forms, not just the ones that we have been more inclined to investigate so far. Sure, Deep Learning may still be the most relevant A.I. for data science, but it doesn’t hurt to consider other ways that a machine can benefit the world through A.I. After all, there is much more to life than predicting a hand-written digit with high accuracy. Maybe in the years to come there will be AIs that can look at your handwriting and not only understand it, but also figure out if you are going through a difficult time in your life and require solace and comfort. We definitely live in interesting times!
“I have never let my schooling interfere with my education.” (quote believed to be originally by Mark Twain)
People talk about education a lot these days, particularly in a data science setting. However, we need to discern between actual education and training. Both are essential, but it is the former that holds the most value. The latter is easier and oftentimes faster, but it may not be a good investment of your time if it is not accompanied by the former.
Education is all about mindset development and the ability to feel inspired by knowledge, thereby developing a healthy yearning for it. It is what happens when you teach a child how to play a game, or do a specific task. Although it’s more of a state of mind than anything else, education also has a formal aspect to it which is related to courses, seminars, workshops and talks, geared towards enhancing one’s understanding and comprehension of the topic at hand.
Training, on the other hand, is more geared towards techniques, methods, and the technical details of the topic taught. This is useful, of course, since every data scientist needs to know all these things. That’s why there are so many data science books and videos out there! However, knowing how to build an SVM or a neural network doesn’t make someone a competent data scientist. In fact, in some cases it doesn’t even make him an employable one.
Perhaps there is a reason why most companies require X years of experience in their recruits. Some things in data science you can only learn through time, by practicing them and by developing an intuition for the data and how it is processed. Although the idea that a data scientist has to have X years of experience to be worthy is something that remains debatable (why X and not Y?), this trend shows that hiring managers can spot a difference between someone who knows data science from a book (or videos) and someone who knows the craft because she has worked the data and has developed a bunch of models, through lots of trials and the inevitable mistakes that ensue.
Education is therefore something that can be attained through experience, not just by reading and watching data science material on the Safari platform. The latter can be a great start, but you still need to get your hands dirty and also think about the whole thing, instead of just following recipes from a data science cookbook. It’s important to know techniques, no doubt, but unless you have developed an understanding that allows you to go beyond these techniques and explore alternative features and alternative models, you may never grow beyond the advanced beginner stage.
Even someone who has spent most of his life in data science can still learn about this field, as it's a) very diverse and wide-spread, and b) always evolving. Personally, I still find that I’m learning new things as I delve deeper into the field and as I converse with other data scientists and A.I. professionals of all levels. This too can be a form of education, not any less valuable than the education of creating a new data analytics method, or a new data product. The moment someone starts looking down on education and thinks that he knows “enough” is the moment he begins becoming obsolete.
We often tend to forget that, at the end of the day, data science is a business process and that data is a business resource. Whether this business is a for-profit or a non-profit is irrelevant. The essence of the whole thing is that data science is not a typical scientific field. In fact, some would argue that it’s not a “real science” at all since it is so attached to the business world. Although these people would probably view this as a defect of the craft, I tend to look at it in a very positive light. After all, what constitutes a real science is often a matter of debate.
Sometimes it’s easy to get carried away and focus on data science too much, losing sight of the applications of it. Although this is something somewhat common in an academic setting (particularly in universities that don’t have any ties to the industry), it may happen in companies too. When this happens, it’s usually best to walk away, since data science without any real-world application can be problematic.
Data science, and A.I. that’s geared towards data analytics, involve a lot of scientific methodologies, which are quite interesting on their own. This may tempt someone to get lost in that aspect of the craft and neglect the application part, particularly the one where these methodologies are employed for solving real-world problems. That’s not to say that doing data science research is bad. Quite the contrary. However, when the research is without any application, focusing too much on the math side of things, it is bound to be a waste of resources (unless you are doing this as part of a research project, e.g. for a research center or a university, in which case this is expected). The reason is that data science is by definition an applied field, much like engineering. Particularly when it is undertaken by a company (e.g. a startup), it needs to be able to deliver something concrete, and more importantly, something useful.
It’s hard to over-estimate the value of this aspect of data science that has to do with the end-user. After all, this person is often the one paying the bills! Also, focusing on the application part of the craft enables something else too: the more practical implementation of the technologies developed and the inception of new methods that are more hands-on and therefore useful. This is one of the reasons that data science has veered away from Statistics, a field which is by its nature more theoretical and more math-y than applied Science. That’s also the main reason why data science involves a lot of programming, oftentimes building things from scratch, even if it’s simple scripts. That’s quite different than using an all-in-one software package, like SAS or SPSS, where the user merely calls functions and does rudimentary data processing.
You can come up with ingenious methods in data science that would be able to fetch a journal publication or two. However, if these methods don’t add value to an organization, they are not that great, from a holistic standpoint. This is observed in other parts of Science too, e.g. Electromagnetism. Despite the various theoretical aspects of that field, its usefulness is also apparent. People who practice this part of Physics tend to be very practical and oftentimes come up with interesting inventions that add value to their user (e.g. in the case of electromagnets, or power transformers). Data science is not any different.
All the clever mathematics behind a method may be enchanting for the mind, but it’s when this method is put into practice and yields some (oftentimes actionable) insight that it really becomes meaningful. That’s something worth remembering, since it’s easy to lose sight of the questions we are trying to answer, and focus too much on the possibilities that we discover. Some may argue that it’s the journey that matters, but for a journey to be a journey there needs to be a destination. The latter is usually some person who doesn't care much about the science behind the insights, but more about their applicability and usefulness. Companies like MAXset LLC may be completely ignorant of that, but this doesn't make it a viable strategy. On the other hand, companies that have a chance of providing true value to the world make the business aspect of the craft their priority.
People like to talk about the V’s of big data, since it is a topic comprehensible to almost everyone, while it also provides insight regarding the benefits of using data science in an organization. Naturally, these benefits are linked to having access to various data streams, usually resulting in massive amounts of data, which are usually referred to as big data. Not everyone agrees as to which V’s are valid for characterizing this valuable resource (some say there are 4, others exclude Veracity, while others include a couple more too). However, there seems to be a consensus about the last V, namely Value. Nevertheless, whether there is value in big data or not is something that remains to be determined, since not all big data is created equal.
The issue with the V of value is that it’s not inherent in the data. If that were the case, someone could just buy this data (or license it) and then automatically improve his organization’s ROI. The value of big data is actually something that stems from data science’s transformation of this data into insights and/or data products. The same data that would otherwise be gathering dust on some computer cluster somewhere is turned into something people can use and oftentimes monetize, through data science. This is something that takes effort, however, and most importantly, requires a certain quality in the data to begin with.
It’s often useful to think of data as a gold mine. After all, just because a mine has the potential of yielding large amounts of the valuable metal, it doesn't mean that it will. Perhaps the mine is all dried up, or doesn’t have much gold to begin with. No amount of data science can remedy that. Data science can yield something of value only if there is something in the data that could be of value. Many times people forget that, just like the people who buy a gold mine and expect that they’ll be swimming in gold soon enough.
The V’s of big data, on the other hand, are something real and present in every data stream that qualifies as big data. In fact, they are more like characteristics of the data itself, rather than something dependent on data science. However, the V’s themselves may provide some insight as to how much of big data the data at hand is, but not much regarding its potential for an organization. For example, big data of high veracity that’s related to people’s views on a particular commercial product may be completely useless to an organization that is all about some service. The data itself is fine, but doesn't add value to the organization.
So, in order for big data to be of actual value, we need certain things to be in place. First of all, the data needs to be handled by a data science team (or a single data scientist, if he’s competent enough). Moreover, it needs to have some affinity to the organization’s domain. Finally, there needs to be something insightful in the data, which can be surfaced through a data science project, be it through a better understanding of a situation or through a data product that the organization can use.
In conclusion, the fact that some data stream can offer value doesn't necessarily mean that it will. After the data science team has done its part, the stakeholders of the project need to take action, utilizing the insights and/or the data product developed. People sometimes forget that and neglect leveraging the benefits of a data science project to the fullest extent, much like a gold miner may obtain the gold from a mine, but never get around to doing anything useful with it...
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy flair when it comes to technology, technique, and tests.