If you are looking for a way to hide those ultra-secret blog articles before they hit the web, or those intimate poems of yours, then you may be interested in this Cyber Security methodology called Steganography.
This video I made, which was recently published on Safari, takes you through the basics of Steganography and provides you with enough know-how to appreciate it, as well as some tools you can use to hide your important documents from the unsuspecting eavesdroppers out there. Check it out when you have the chance!
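As a rough illustration of the basic idea, and not necessarily what the video covers, here is a minimal sketch of least-significant-bit (LSB) embedding, one common steganographic technique. The function names `hide` and `reveal`, the 4-byte length header, and the use of a raw byte buffer as the cover (standing in for, say, image pixel data) are all assumptions made for this example.

```python
def hide(cover: bytes, secret: bytes) -> bytes:
    """Embed `secret` in the least significant bits of `cover` (illustrative only)."""
    # Assumed layout: a 4-byte big-endian length header, followed by the secret itself.
    payload = len(secret).to_bytes(4, "big") + secret
    # Flatten the payload into individual bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this secret")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        # Clear the lowest bit of the cover byte, then write one payload bit into it.
        stego[i] = (stego[i] & 0xFE) | bit
    return bytes(stego)


def reveal(stego: bytes) -> bytes:
    """Recover the bytes previously embedded by `hide`."""

    def read_bytes(start: int, count: int) -> bytes:
        # Rebuild `count` bytes from the lowest bits, starting at cover index `start`.
        out = bytearray()
        for b in range(count):
            value = 0
            for k in range(8):
                value = (value << 1) | (stego[start + b * 8 + k] & 1)
            out.append(value)
        return bytes(out)

    length = int.from_bytes(read_bytes(0, 4), "big")
    return read_bytes(32, length)


cover = bytes(range(256)) * 2          # stand-in for real pixel data
stego = hide(cover, b"intimate poem")
print(reveal(stego))                   # recovers the hidden bytes
```

Each cover byte changes by at most one, which is why the altered file is hard to tell apart from the original by eye. Note that this hides the message but does not encrypt it, so anyone who knows the scheme can read it.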
For those of you celebrating Thanksgiving, I just wanted to wish you all a happy Thanksgiving weekend! There are lots of things to be thankful for and studies have shown that gratitude is linked to happiness. So, even if it doesn't always seem like it, this is a holiday for happiness (not just getting some good deals on Black Friday!). What are you thankful for in your life (apart from being into the fields of data science and A.I. of course)?
Sometime in October, one of the Foxy Data Science readers contacted me with a question/suggestion about this topic. As I hadn’t really thought about it much, I decided to look into it and write a blog post about it. I’m not an expert in Artificial Emotional Intelligence (AEI), but I believe I know enough about A.I. in general and about the business world to venture an insightful view on the matter. At the very least, it can trigger some interesting contemplation in you.
Artificial Emotional Intelligence is a kind of A.I. that emulates the EQ aspects of our mental process. In other words, it is machines that know (to some fairly limited extent) how to exhibit qualities that fall on the intersection between intelligence and emotional maturity, aka EQ. By the way, I do not believe that EQ is more important than IQ, nor that it is any less important. Both are equally useful and neither can be a substitute for SQ (moral intelligence), which is a truly superior kind of intelligence. This, however, could be the topic of another blog post…
The possibility of computers, and machines in general, emulating empathy and other traits that fall under the EQ umbrella seems a bit futuristic. However, there are already A.I. systems that do just that. Not only that, some of them are quite successful, particularly in psychology roles, even more so than their human counterparts (link to some interesting research by USC).
Could this be the end of EQ-based professions? Probably not, though these people may start considering offering something more than just listening and nodding, if they are to stand out from their AEI competition. Naturally, psychology is so much more than helping someone vent about their issues and showing them that there are more constructive ways of dealing with their problems, something that AEIs may be able to do equally well. That’s why this whole AEI business may be an incentive for these professionals to expand their profession and turn their sessions into something more, something AEIs may not be able to mimic (for the time being). Art therapists, for example, seem to do just that, combining the benefits of conventional psychology with those of an art form (usually music, painting, or dance).
AEIs may be nothing more than a novelty now, but they very poignantly point to the possibility of new forms of A.I. that the original pioneers of the field may not have thought of. Movies like “Her” may be science fiction, but for how long? These are interesting things to think about, since A.I., just like natural intelligence, can take many forms, not just the ones that we have been more inclined to investigate so far. Surely Deep Learning may still be the most relevant A.I. for data science, but it doesn’t hurt to consider other ways that a machine can benefit the world through A.I. After all, there is much more to life than predicting a hand-written digit with high accuracy. Maybe in the years to come there will be AIs that can look at your handwriting and not only understand it, but also figure out if you are going through a difficult time in your life and require solace and comfort. We definitely live in interesting times!
“I have never let my schooling interfere with my education.” (quote believed to be originally by Mark Twain)
People talk about education a lot these days, particularly in a data science setting. However, we need to discern between actual education and training. Both are essential, but it is the former that holds the most value. The latter is easier and oftentimes faster, but it may not be a good investment of your time if it is not accompanied by the former.
Education is all about mindset development and the ability to feel inspired by knowledge, thereby developing a healthy yearning for it. It is what happens when you teach a child how to play a game, or do a specific task. Although it’s more of a state of mind than anything else, education also has a formal aspect to it, which is related to courses, seminars, workshops and talks geared towards enhancing one’s understanding and comprehension of the topic at hand.
Training, on the other hand, is more geared towards techniques, methods, and the technical details of the topic taught. This is useful, of course, since every data scientist needs to know all these things. That’s why there are so many data science books and videos out there! However, knowing how to build an SVM or a neural network doesn’t make someone a competent data scientist. In fact, in some cases it doesn’t even make him an employable one.
Perhaps there is a reason why most companies require X years of experience in their recruits. Some things in data science you can only learn through time, by practicing them and by developing an intuition for the data and how it is processed. Although the idea that a data scientist has to have X years of experience to be worthy is something that remains debatable (why X and not Y?), this trend shows that hiring managers can spot a difference between someone who knows data science from a book (or videos) and someone who knows the craft because she has worked the data and has developed a bunch of models, through lots of trials and the inevitable mistakes that ensue.
Education is therefore something that can be attained through experience, not just reading and watching data science material on the Safari platform. The latter can be a great start, but you still need to get your hands dirty and also think about the whole thing, instead of just following recipes from a data science cookbook. It’s important to know techniques, no doubt, but unless you have developed an understanding that allows you to go beyond these techniques and explore alternative features and alternative models, you may never grow beyond the advanced beginner stage.
Even someone who has spent most of his life in data science can still learn about this field, as it's a) very diverse and wide-spread, and b) always evolving. Personally, I still find that I’m learning new things as I delve deeper into the field and as I converse with other data scientists and A.I. professionals of all levels. This too can be a form of education, not any less valuable than the education of creating a new data analytics method, or a new data product. The moment someone starts looking down on education and thinks that he knows “enough” is the moment he begins becoming obsolete.
We often tend to forget that at the end of the day, data science is a business process and that data is a business resource. Whether this business is a for-profit or a non-profit is irrelevant. The essence of the whole thing is that data science is not a typical scientific field. In fact, some would argue that it’s not a “real science” at all since it is so attached to the business world. Although these people would probably view this as a defect of the craft, I tend to look at it in a very positive light. After all, what constitutes a real science is often a matter of debate.
Sometimes it’s easy to get carried away and focus on data science too much, losing sight of the applications of it. Although this is something somewhat common in an academic setting (particularly in universities that don’t have any ties to the industry), it may happen in companies too. When this happens, it’s usually best to walk away, since data science without any real-world application can be problematic.
Data science, and A.I. that’s geared towards data analytics, involve a lot of scientific methodologies, which are quite interesting on their own. This may tempt someone to get lost in that aspect of the craft and neglect the application part, particularly the one where these methodologies are employed for solving real-world problems. That’s not to say that doing data science research is bad. Quite the contrary. However, when the research is without any application, focusing too much on the math side of things, it is bound to be a waste of resources (unless you are doing this as part of a research project, e.g. for a research center or a university, in which case this is expected). The reason is that data science is by definition an applied field, much like engineering. Particularly when it is undertaken by a company (e.g. a startup), it needs to be able to deliver something concrete, and more importantly, something useful.
It’s hard to overestimate the value of this aspect of data science that has to do with the end-user. After all, this person is often the one paying the bills! Also, focusing on the application part of the craft enables something else too: the more practical implementation of the technologies developed and the inception of new methods that are more hands-on and therefore useful. This is one of the reasons that data science has veered away from Statistics, a field which is by its nature more theoretical and more math-y than applied science. That’s also the main reason why data science involves a lot of programming, oftentimes building things from scratch, even if it’s simple scripts. That’s quite different from using an all-in-one software package, like SAS or SPSS, where the user merely calls functions and does rudimentary data processing.
You can come up with ingenious methods in data science that would be able to fetch a journal publication or two. However, if these methods don’t add value to an organization, they are not that great, from a holistic standpoint. This is observed in other parts of Science too, e.g. Electromagnetism. Despite the various theoretical aspects of that field, its usefulness is also apparent. People who practice this part of Physics tend to be very practical and oftentimes come up with interesting inventions that add value to their users (e.g. in the case of electromagnets, or power transformers). Data science is not any different.
All the clever mathematics behind a method may be enchanting for the mind, but it’s when this method is put into practice and yields some (oftentimes actionable) insight that it really becomes meaningful. That’s something worth remembering, since it’s easy to lose sight of the questions we are trying to answer, and focus too much on the possibilities that we discover. Some may argue that it’s the journey that matters, but for a journey to be a journey there needs to be a destination. The latter is usually some person who doesn't care much about the science behind the insights, but more about their applicability and usefulness. Companies like MAXset LLC may be completely ignorant of that, but this doesn't make it a viable strategy. On the other hand, companies that have a chance of providing true value to the world make the business aspect of the craft their priority.
People like to talk about the V’s of big data, since it is a topic comprehensible to almost everyone, while it also provides insight regarding the benefits of using data science in an organization. Naturally, these benefits are linked to having access to various data streams, usually resulting in massive amounts of data, usually referred to as big data. Not everyone agrees as to what V’s are valid for characterizing this valuable resource (some say it’s 4, others exclude Veracity, while others include a couple more too). However, there seems to be a consensus about the last V, namely Value. Nevertheless, whether there is value in big data or not is something that remains to be determined, since not all big data is created equal.
The issue with the V of value is that it’s not inherent in the data. If that were the case, someone could just buy this data (or license it) and then automatically improve his organization’s ROI. The value of big data is actually something that stems from data science’s transformation of this data into insights and/or data products. The same data that would otherwise be gathering dust on some computer cluster somewhere is turned into something people can use and oftentimes monetize, through data science. This is something that takes effort, however, and most importantly, requires a certain quality in the data to begin with.
It’s often useful to think of data as a gold mine. After all, just because it has the potential of yielding large amounts of the valuable metal doesn't mean that it will. Perhaps the mine is all dried up, or doesn’t have much gold to begin with. No amount of data science can remedy that. Data science can yield something of value only if there is something in the data that could be of value. Many times people forget that, just like the people who buy a gold mine and expect that they’ll be swimming in gold soon enough.
The V’s of big data, on the other hand, are something real and present in every data stream that qualifies as big data. In fact, they are more like characteristics of the data itself, rather than something dependent on data science. However, the V’s themselves may provide some insight as to how well the data at hand qualifies as big data, but not much regarding its potential for an organization. For example, big data of high veracity that’s related to people’s views on a particular commercial product may be completely useless to an organization that is all about some service. The data itself is fine, but it doesn't add value to the organization.
So, in order for big data to be of actual value, we need certain things to be in place. First of all, the data needs to be handled by a data science team (or a single data scientist, if he’s competent enough). Moreover, it needs to have some affinity to the organization’s domain. Finally, there needs to be something insightful in the data, which can be surfaced through a data science project, be it through a better understanding of a situation or through a data product that the organization can use.
In conclusion, the fact that some data stream can offer value doesn't necessarily mean that it will. After the data science team has done its part, the stakeholders of the project need to take action, utilizing the insights and/or the data product developed. People sometimes forget that and neglect leveraging the benefits of a data science project to the fullest extent, much like a gold miner may obtain the gold from a mine, but never get around to doing anything useful with it...
It is easy to fall into the misconception of believing that in data science we are all solitary people doing our work and interacting only in the workplace and on social media. Perhaps we are part of some data science team, but still feel we are on our own when it comes to our relationship with the field. However, this is just one of many possibilities in how we relate to the data science world, and it is definitely not the best one.
Being part of a community in data science is not only possible but also necessary. Of course just networking with other data scientists may not be enough, but it is often a good starting point. This is particularly important towards the beginning of one’s career. After all, not even the best data science books can give someone solace in times of difficulty or doubt. That’s when having a good mentor comes in very handy. After all, even if that mentor is a bit aloof and preoccupied with his own stuff, he tends to have a genuine interest in your career and is motivated to help you out, at least to some extent. This can be another step towards becoming part of a community of data science professionals.
Make no mistake, however. Neither the mentor, nor anyone else is going to fight your battles for you. The other data scientists, be it professional acquaintances, mentors, or teammates, have their own battles to tackle. However, they may be able to offer you advice or help you gain insight into solutions that you couldn't think of by yourself, especially during the time you are immersed in the problems you are tackling.
Finding a physical community may not always be possible. Not all cities are as advanced as the ones where the field thrives, with cohorts bustling with data science events and activities. However, there are data scientists out there who are also in need of a community, so it’s only a matter of time before you find them. Perhaps you’ll “meet” them online, through some social network or a data science forum. Maybe you’ll encounter them at a data science conference, or a webinar. Bottom line, if you are open to finding a community of data scientists, the opportunities to do so will manifest, sooner or later.
Being part of a data science community is not only to help you in difficult times though. It’s also a great accelerator for developing yourself as a data scientist through being exposed to new trends, novel approaches to known problems, and most importantly, to unknown problems that you’d probably not encounter on your own, even if you work in a data-driven company. All that is bound to foster in you the knowledge and know-how you need to advance to the next level, whatever that level is for you. At the same time, it can help you maintain your enthusiasm for data science, and perhaps even make you more zestful about the field. After all, it is usually the people who are passionate about something that make the most progress in it and are also consistent in doing so. Data science is not any different in that respect.
Everyone talks about data science these days, as well as A.I., since the value these disciplines can add to an organization is being verified more and more. However, there are organizations out there that are not ready yet to make use of data science, even if they have ads for data scientists in various job forums. Before applying to places like that, you may want to answer this question for yourself: is this organization I’m interested in data science ready?
Just because an organization has seen value in a data science proof-of-concept (PoC) project doesn't make it ready to employ and utilize data science professionals. First of all, it has to have a solid leadership team, one that at the very least has a CTO who has worked with data scientists, though additional roles like those of a CIO and a CDO would also be useful. If the C-level team of an organization hasn't worked with data scientists and doesn't have a clear idea of what data science can and cannot do, then this is a red flag.
In addition, having access to a variety of data streams, even if these don’t qualify for “big data” status, is essential for an organization to be data science ready. If all its data is in Excel spreadsheets and SQL databases, perhaps it needs a data analyst, a business intelligence professional, or a statistician. If it does get a data scientist, it won’t be able to do much with her, since she will not have enough to work with to provide sufficient value, the kind that can translate to a positive ROI for her group. That data scientist is better off working somewhere else, where they make better use of her skills and her mindset.
Moreover, a data science ready organization has realistic expectations and a good plan about how to utilize its data resources. Just because it has access to good data doesn’t mean that it can get value from it, even if it employs a group of very talented data scientists. It also needs to know what it is going to do with it, what data products it can create, how it is going to leverage the insights the data science team provides, etc. All that is not necessarily going to take place in the next quarter, especially if the organization is new to data science. So, expecting some ground-breaking results within the next 3 months would be naive and financially irresponsible. An investment like this is bound to take some time before it yields dividends and if the organization is not aware of this, then it may not be ready just yet.
Beyond these signs, there are other, more specialized ones that are more domain-specific or data-specific. However, mentioning them here would make the article so long that you’d need to run some text analytics system on it to derive all the information from it! So, let’s just say that there are other things that can be good predictors as to whether an organization is worth your time as a data scientist, or in case you are a hiring manager of such an organization, whether you should start recruiting data scientists at this point. After all, data science is a long game, so there is no point rushing into it. It’s more beneficial if it is conducted in an environment that is conducive to it, and capable of fostering a congruent and efficient team, poised to add value to whatever data it utilizes.
People like to argue, especially about things they can reason with. However, just because you can justify that your view has merit, giving some practical examples or through logical reasoning, this doesn't make alternative views invalid. If there are several programming languages in data science, perhaps an oversimplification like “X is the best language for data science because Y” doesn't hold much water. Let’s examine why.
Although it is possible to rule out certain languages (e.g. Assembly or C) as optimal for data science, this doesn't mean that the problem has a clear-cut solution. Also, the assumption that a single programming language can cover all the use cases of a data science professional is quite an unjustifiable one. Some data scientists use two or three programming languages, sometimes in combination, getting the best of each, for optimal overall performance.
Also, data science is all about solving a business problem in a scientific manner. Just because, say, Dr. Smith prefers to use language X over Y, it doesn't mean that you have to follow her example. Maybe she used language X during her PhD and didn't have time to learn another language, or she attained mastery of that language, so she feels more comfortable doing her data science work with that. She may be a successful data scientist, but following her programming habits won’t necessarily make you a great data scientist.
Moreover, with new languages and new packages in the existing languages coming about all the time, asking which language is best is like asking which basketball team is the best performing one. Definitely not something particularly stable! Besides, it’s often the case that a particular project may require special handling, so what is a top performer now may not be the best option for that particular case.
In addition, the almost religious attitude towards programming languages that many people have (not just data scientists) is by itself problematic. If a potential employer sees you arguing about how your language of choice is the best and that you are not open to considering alternatives, he may not be so eager to hire you, since this kind of attitude creates disharmony and difficulty in collaboration among the members of a team. Besides, most companies nowadays rarely ask for a specific language in the candidate requirements. As long as you can do the task that’s required of you, they don’t really care much what your programming background is. Of course companies that have already invested in a particular language and have all their code in that language may not be so flexible, but that shouldn't be the principal factor in your decision about which language you learn.
Finally, when it comes to deep learning, many modern frameworks, like Apache’s MXNet, have APIs for a variety of programming languages. So if your A.I. guru friend tries to convince you that you should learn language X because that’s the best deep learning language, take that suggestion with a pinch of salt!
The important thing is that, whatever language you decide to learn for data science, you make sure that you learn it well. Familiarize yourself with its packages, use it to solve various problems, and learn the best strategies for debugging code written in that language. If you do that, you can still make good use of it for your data science projects, even if the majority of people prefer this or the other language instead.
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy flair when it comes to technology, technique, and tests.