After investigating this topic quite a bit, as I was looking into A.I. stuff, I decided to create a video on it. To make it more complete, I included other methods too, such as statistics-based and heuristics-based ones. Despite the large amount of content I put into this project (the script was over 4000 words), I managed to keep the video at a manageable length (a bit less than half an hour). Check it out on Safari when you have some time!
Ever since social media (SM) became a mainstream option for spending one’s time on the web, it has started to disrupt the way we view information and even knowledge to some extent. Even though there is no doubt that SM offer substantial benefits in advertising and branding, there is little they can offer when it comes to actually learning something. Here is why.
Some articles can be thought-provoking, but consuming information to satisfy your curiosity and actually assimilating it are two different things. This is particularly true when it comes to a technical field, like data science, where being informed about something is barely enough to have an opinion on the topic, let alone do something useful with it. Many people who roam the SM in search of mentors don’t realize that. They tend to forget that following someone in an attempt to learn from them is the equivalent of body-building by just hanging out in the lobby of a gym. Yet, they do it anyway because it’s easy and it doesn’t cost them anything (other than some time, assuming that they read the stuff their leaders post on the SM).
If you really want to learn something, especially something complex and multifaceted like data science, you need to get your hands dirty and you have to break a sweat. The various things someone posts on the SM aren’t going to help much. There is a reason why books and videos on the subject sell, even if there is abundant information on the web. Also, in my experience, if a platform doesn’t charge you for the “products” it offers to you, that’s because you are the product! SM are designed with that in mind. Of course, some of them may be worth the time you spend on them since they can be a source of a diverse array of views on a topic (hopefully from different perspectives), but that’s not the same as applicable knowledge. If you want to hone your data science skills you need something you can rely on, not something someone types on the SM while enjoying their morning coffee, to pass the time.
So, what can you do, instead of following someone on the SM? There are various strategies, each with its own sets of benefits. Ideally, you would do a combination of them to maximize your learning opportunities. The main ones of these strategies are:
What are your thoughts on the matter? How do you learn data science?
For the past few months I've been working on a tutorial on the data modeling part of the data science process. I recently finished it, and as of 2 weeks ago it is available online at the Safari portal. Although this tutorial is mainly for newcomers to the field, everyone can benefit from it, particularly people who are interested not just in the technical aspects but also in the concepts behind them and how it all relates to the other parts of the pipeline. Enjoy!
So, when I was in the US recently, I interviewed with some people from a Podcast geared towards SW engineering and data science topics (with some A.I. stuff too). This interview, which constitutes a whole episode on that podcast, covered various topics related to both data science as a field and some specific aspects of it that can help someone embrace it as a practitioner / professional in it. The podcast episode is now online and freely available. Although it's by no means a thorough coverage of the field of data science, or even the topic of the mindset related to it, it's a good introduction to it, engaging enough to keep your commute somewhat more interesting than listening to the radio. Enjoy!
“I have never let my schooling interfere with my education.” (quote believed to be originally by Mark Twain)
People talk about education a lot these days, particularly in a data science setting. However, we need to discern between actual education and training. Both are essential, but it is the former that holds the most value. The latter is easier and oftentimes faster, but it may not be a good investment of your time if it is not accompanied by the former.
Education is all about mindset development and the ability to feel inspired from knowledge, thereby developing a healthy yearning for it. It is what happens when you teach a child how to play a game, or do a specific task. Although it’s more of a state of mind than anything else, education also has a formal aspect to it which is related to courses, seminars, workshops and talks, geared towards enhancing one’s understanding and comprehension of the topic at hand.
Training, on the other hand, is more geared towards techniques, methods, and the technical details of the topic taught. This is useful, of course, since every data scientist needs to know all these things. That’s why there are so many data science books and videos out there! However, knowing how to build an SVM or a neural network doesn’t make someone a competent data scientist. In fact, in some cases it doesn’t even make him an employable one.
Perhaps there is a reason why most companies require X years of experience in their recruits. Some things in data science you can only learn through time, by practicing them and by developing an intuition for the data and how it is processed. Although the idea that a data scientist has to have X years of experience to be worthy is something that remains debatable (why X and not Y?), this trend shows that hiring managers can spot a difference between someone who knows data science from a book (or videos) and someone who knows the craft because she has worked the data and has developed a bunch of models, through lots of trials and the inevitable mistakes that ensue.
Education is therefore something that can be attained through experience, not just reading and watching data science material on the Safari platform. The latter can be a great start, but you still need to get your hands dirty and also think about the whole thing, instead of just following recipes, from a data science cookbook. It’s important to know techniques, no doubt, but unless you have developed an understanding that allows you to go beyond these techniques and explore alternative features and alternative models, you may never grow beyond the advanced beginner stage.
Even someone who has spent most of his life in data science can still learn about this field, as it's a) very diverse and wide-spread, and b) always evolving. Personally, I still find that I’m learning new things as I delve deeper into the field and as I converse with other data scientists and A.I. professionals, of all levels. This too can be a form of education, not any less valuable than the education of creating a new data analytics method, or a new data product. The moment someone starts looking down on education and thinks that he knows “enough” is the moment he begins becoming obsolete.
Just wanted to clarify something about the videos I post on Safari Books Online. Each one of these videos is not an audio-visual version of a book on the topic, but more of an overview of it.
I have specific requirements about the duration, so it is infeasible to go into much depth on any one of the topics, especially those topics that are more general. So, if you decide to watch a video of mine, please manage your expectations accordingly. None of these videos will make you an expert or provide you with the specialized knowledge that you'd find in a book. However, they can be a quick and effective way to get the basics down so that when you read a book on that topic, you'll have a sense of perspective and be able to focus on the details, since you'll have a firm grasp of the key concepts.
So, if you want to go into depth on any given topic, I'd recommend either reading a book or two, or taking a course on it. The videos have a more supportive role and they are more useful if they are seen as such.
Recently I decided to make another video on cyber security, a topic I'm quite fond of. This time, I tackled Cryptography, which is a truly intriguing field, independent of data science but similar to it in some ways. So, as of today this video is available on Safari (you need to have a subscription to the portal in order to view the whole of it). Now, it's just an introductory video, so don't expect it to make you an expert in this. However, after viewing it, you'll have a solid understanding of what Cryptography is, how it is useful, what methods it includes, and some practical tips on how you can make use of it in your everyday life. Enjoy!
Being part of a tech start-up is a more intimate kind of work, since you are more involved in the decisions of the company, while at the same time collaboration is more direct and sincere. Of course there are still politics, but they are significantly less impactful in your career as a tech entrepreneur. Because if you are part of the founding team of a start-up, you are an entrepreneur, period. So, why would someone leave such a company, esp. if it’s still in its growing phase? There are many reasons and they greatly depend on the company and the team dynamics of it. Here is my story, in a company called MAXset.
MAXset started as an NLP company with the mission to automate the structuring of text data, for any given corpus. Originally it was decided to use the state-of-the-art programming paradigm (functional programming) and a custom-built framework for knowledge representation. Basically, the goal was efficiency and innovation, so as to facilitate text analytics, particularly related to data science and business intelligence. Great idea, yet ideas that are good are a dime a dozen. Implementing this idea was a whole different ball game, one that required a lot of sacrifices and dirty compromises.
MAXset's Framework Implementation
Implementing a novel framework like that wasn’t easy. All the conventional text analytics systems were insufficient and embarrassingly suboptimal. Eventually we decided we had to build everything from scratch. This was great for me, since prototyping in a functional language was fairly easy and fast, while at the same time we were building a unique code base that could be featured as IP for the company, an asset of sorts. We even examined the possibility of filing a patent, at one point.
However, even though all the scripts I developed were fine, they were not used in practice since the framework was poorly defined and was changing constantly. It was like trying to optimize a fitness function that was different every time you looked at it. Also, at one point a decision was made to use a certain Python package, since the developer we had hired was not comfortable with using a functional language like Julia (even though that was a condition for hiring him). Of course, if you are hiring someone without giving them a salary, you have to make compromises like that, otherwise things will never take off.
Other Issues of MAXset
Technology and ideas aside, MAXset had other serious issues, that were highly incompatible with what investors would call a promising start-up. For example, there was no clear product definition, no clear market / audience, and no clear strategy for how this great idea would eventually make money. Investors may be very keen on spreadsheets and plots, but they are also intelligent enough to see beyond these and tend to have a pretty good BS detector. After all, there are so many other options for putting their hard-earned cash, especially in a tech city like Seattle. So, needless to say the idea never got the anticipated traction in the angel investment and VC community.
Also, the fact that there were no regular meeting locations (we usually met in the study rooms of libraries, or sometimes in coffee shops) didn't help the situation either. Apart from the obvious issue of lack of privacy, the logistics of the meetings were a constant problem. One of the team members had a good contact in a shared office space and he was certain he could get a really good deal for an office there. Yet, this never materialized for various reasons.
Regarding the team, we were originally 4 people, each one having a sizeable part of the company’s equity. There were also people having advisory roles, like a very talented cloud systems expert who I personally looked up to. Naturally you don’t expect everyone who is in the company in its first stages to linger, since not everyone is that patient, even if they are vested in the company’s success. Even one of Apple’s founders left within the first couple of years, leaving Steve Jobs and Steve Wozniak the only major stakeholders of the start-up they had all created. However, if most of the founders leave, that’s not a good sign. That’s what happened in MAXset. I was the last original founder other than the CEO who was still around when I sent my resignation letter. Perhaps I was less experienced than the other two gentlemen who had made the same choice months earlier. Or maybe I was too optimistic. Whatever the case, I eventually had to go, since it was no longer cost-effective for me to stay there.
Innovation Wasn't That Great
As for the innovation factor, MAXset prides itself on being an A.I. company, employing fringe data science methods for NLP applications. However, upon closer look, if you manage to see beyond the convoluted framework of its main product, it is merely a knowledge representation system. Also, prior to it busking for investors' money, everyone there was oblivious to the fact that there are several other companies out there that do the same thing, though with a different technology. Perhaps the technology in MAXset is unique, but this does not necessarily make the product innovative. Needless to say, most investors who flirted with the idea of investing in the company didn't take long to figure that out and keep their distance from MAXset.
Disrespect Towards People Outside the Company
It's one thing not liking someone because they are a competitor, or a former employee, and it's quite another dissing them. MAXset was notorious for the latter. Also, even people who would be considered potential collaborators, people who had a very positive attitude toward the company and wanted to help, were often treated with disrespect. For example, there was a marketing guy who had an appointment with the CEO one day at a local Starbucks. The CEO had double-booked himself that morning so he didn't show up for the meeting with that guy. He didn't even bother to reschedule or let him know, so that guy called the CEO asking him where he was. The CEO apologized of course, but at that moment I felt really embarrassed for his sake.
It is quite normal in start-ups to have to work without getting paid much. However, you would expect that the compensation would reflect the amount of work you've put in and how vested you are in the company. That wasn't the case with MAXset. During one of the main payments, the compensation was hugely disproportionate to the amount of work or time invested in the company. This wasn't just for me, as there was another person too who was paid much less than his work warranted. Also, another person got more than either one of us, even though he had been recruited recently. In general, the cash-flows in the company were managed so poorly that I wouldn't be surprised if there is an embezzlement fiasco in the news about this company (if it doesn't file for Chapter 11 in the meantime).
Start-ups are evolving creatures, so it is natural for them to change and adapt to circumstances, in order to survive and prosper. However, this kind of change tends to be gradual and in relation to some external factor that needs to be reckoned with. MAXset would change in a very whimsical fashion, shifting programming platforms, data analytics frameworks, and even product objectives like most people would change their clothes. This kind of work is not conducive to sustainable professional development, in my view, and highly incongruent with my values as a tech professional. Although it is good to be flexible, if the requirements of a system change bi-weekly, it is really hard to produce anything worthwhile. Also, the lack of any sort of solid plan about the company's strategy is not a good sign either.
Although I still feel like this whole gig was a waste of my time, time I could have spent creating more videos, or engaging in other data science projects, I find that even from this kind of experience it is possible to learn and hone one’s skills, while at the same time broadening one's perspective. There is a very nice Greek saying that goes “he who sits and hasn’t sat uncomfortably, doesn't sit comfortably.” Perhaps some people need to go through these harsh experiences in order to appreciate other companies. These companies may be less innovative and perhaps less exciting than a Seattle start-up, yet they are more viable and more useful to the world, since they have a definite objective and a clear plan on how to achieve it. So, I focus on that part of my experience and sincerely hope that if you pursue employment in a tech start-up, you never work in a place like MAXset.
A few weeks ago I created a video on DB frameworks, from a data science perspective. Somehow it didn't get into the production pipeline, but now it surfaced and is available on the Safari platform. You can view it here. Enjoy!
Recently I had a nice chat with a fellow data scientist who works at LinkedIn. After bouncing some ideas off him, I decided to make another video, based on a topic of mutual interest, partly for demonstrating to him how straight-forward the process is, once you have done the research on the topic. This video is now published on Safari here (subscription required). Enjoy!
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy flair when it comes to technology, technique, and tests.