“I have never let my schooling interfere with my education.” (quote believed to be originally by Mark Twain)
People talk about education a lot these days, particularly in a data science setting. However, we need to discern between actual education and training. Both are essential, but it is the former that holds the most value. The latter is easier and oftentimes faster, but it may not be a good investment of your time if it is not accompanied by the former.
Education is all about mindset development and the ability to feel inspired from knowledge, thereby developing a healthy yearning for it. It is what happens when you teach a child how to play a game, or do a specific task. Although it’s more of a state of mind than anything else, education also has a formal aspect to it which is related to courses, seminars, workshops and talks, geared towards enhancing one’s understanding and comprehension of the topic at hand.
Training, on the other hand, is more geared towards techniques, methods, and the technical details of the topic taught. This is useful, of course, since every data scientist needs to know all these things. That’s why there are so many data science books and videos out there! However, knowing how to build an SVM or a neural network doesn’t make someone a competent data scientist. In fact, in some cases it doesn’t even make him an employable one.
Perhaps there is a reason why most companies require X years of experience in their recruits. Some things in data science you can only learn through time, by practicing them and by developing an intuition for the data and how it is processed. Although the idea that a data scientist has to have X years of experience to be worthy is something that remains debatable (why X and not Y?), this trend shows that hiring managers can spot a difference between someone who knows data science from a book (or videos) and someone who knows the craft because she has worked the data and has developed a bunch of models, through lots of trials and the inevitable mistakes that ensue.
Education is therefore something that can be attained through experience, not just by reading and watching data science material on the Safari platform. The latter can be a great start, but you still need to get your hands dirty and also think about the whole thing, instead of just following recipes from a data science cookbook. It’s important to know techniques, no doubt, but unless you have developed an understanding that allows you to go beyond these techniques and explore alternative features and alternative models, you may never grow beyond the advanced beginner stage.
Even someone who has spent most of his life in data science can still learn about this field, as it's a) very diverse and widespread, and b) always evolving. Personally, I still find that I’m learning new things as I delve deeper into the field and as I converse with other data scientists and A.I. professionals, of all levels. This too can be a form of education, not any less valuable than the education of creating a new data analytics method, or a new data product. The moment someone starts looking down on education and thinks that he knows “enough” is the moment he begins becoming obsolete.
Just wanted to clarify something about the videos I post on Safari Books Online. Each one of these videos is not an audio-visual version of a book on the topic, but more of an overview of it.
I have specific requirements about the duration, so it is infeasible to go into much depth on any one of the topics, especially those topics that are more general. So, if you decide to watch a video of mine, please manage your expectations accordingly. None of these videos will make you an expert or provide you with the specialized knowledge that you'd find in a book. However, they can be a quick and effective way to get the basics down so that when you read a book on that topic, you'll have a sense of perspective and be able to focus on the details, since you'll have a firm grasp of the key concepts.
So, if you want to go into depth on any given topic, I'd recommend either reading a book or two, or taking a course on it. The videos have a more supportive role and they are more useful if they are seen as such.
Recently I decided to make another video on cyber security, a topic I'm quite fond of. This time, I tackled Cryptography, which is a truly intriguing field, independent of data science but similar to it in some ways. So, as of today this video is available on Safari (you need to have a subscription to the portal in order to view the whole of it). Now, it's just an introductory video, so don't expect it to make you an expert in this. However, after viewing it, you'll have a solid understanding of what Cryptography is, how it is useful, what methods it includes, and some practical tips on how you can make use of it in your everyday life. Enjoy!
Being part of a tech start-up is a more intimate kind of work, since you are more involved in the decisions of the company, while at the same time collaboration is more direct and sincere. Of course there are still politics, but they have significantly less impact on your career as a tech entrepreneur. Because if you are part of the founding team of a start-up, you are an entrepreneur, period. So, why would someone leave such a company, especially if it’s still in its growing phase? There are many reasons and they greatly depend on the company and its team dynamics. Here is my story, in a company called MAXset.
MAXset started as an NLP company with the mission to automate the structuring of text data, for any given corpus. Originally it was decided to use the state-of-the-art programming paradigm (functional programming) and a custom-built framework for knowledge representation. Basically, the goal was efficiency and innovation, so as to facilitate text analytics, particularly related to data science and business intelligence. Great idea, yet ideas that are good are a dime a dozen. Implementing this idea was a whole different ball game, one that required a lot of sacrifices and dirty compromises.
MAXset's Framework Implementation
Implementing a novel framework like that wasn’t easy. All the conventional text analytics systems were insufficient and embarrassingly suboptimal. Eventually we decided we had to build everything from scratch. This was great for me, since prototyping in a functional language was fairly easy and fast, while at the same time we were building a unique code base that could be featured as IP for the company, an asset of sorts. We even examined the possibility of filing a patent, at one point.
However, even though all the scripts I developed were fine, they were not used in practice since the framework was poorly defined and was changing constantly. It was like trying to optimize a fitness function that was different every time you looked at it. Also, at one point a decision was made to use a certain Python package, since the developer we had hired was not comfortable with using a functional language like Julia (even though that was a condition for hiring him). Of course, if you are hiring someone without giving them a salary, you have to make compromises like that, otherwise things will never take off.
Other Issues of MAXset
Technology and ideas aside, MAXset had other serious issues that were highly incompatible with what investors would call a promising start-up. For example, there was no clear product definition, no clear market / audience, and no clear strategy for how this great idea would eventually make money. Investors may be very keen on spreadsheets and plots, but they are also intelligent enough to see beyond these and tend to have a pretty good BS detector. After all, there are so many other places for them to put their hard-earned cash, especially in a tech city like Seattle. So, needless to say, the idea never got the anticipated traction in the angel investment and VC community.
Also, the fact that there was no regular meeting location (we usually met in the study rooms of libraries, or sometimes in coffee shops) didn't help the situation either. Apart from the obvious issue of lack of privacy, the logistics of the meetings were a constant problem. One of the team members had a good contact in a shared office space and he was certain he could get a really good deal for an office there. Yet, this never materialized for various reasons.
Regarding the team, we were originally 4 people, each one having a sizeable part of the company’s equity. There were also people having advisory roles, like a very talented cloud systems expert who I personally looked up to. Naturally you don’t expect everyone who is in the company in its first stages to linger, since not everyone is that patient, even if they are vested in the company’s success. Even one of Apple’s founders left within the first couple of years, leaving Steve Jobs and Steve Wozniak the only major stakeholders of the start-up they had all created. However, if most of the founders leave, that’s not a good sign. That’s what happened in MAXset. I was the last original founder other than the CEO who was still around when I sent my resignation letter. Perhaps I was less experienced than the other two gentlemen who had made the same choice months earlier. Or maybe I was too optimistic. Whatever the case, I eventually had to go, since it was no longer cost-effective for me to stay there.
Innovation Wasn't That Great
As for the innovation factor, MAXset prides itself on being an A.I. company, employing fringe data science methods for NLP applications. However, upon closer look, if you manage to see beyond the convoluted framework of its main product, it is merely a knowledge representation system. Also, prior to busking for investors' money, everyone there was oblivious to the fact that there are several other companies out there that do the same thing, though with a different technology. Perhaps the technology in MAXset is unique, but this does not necessarily make the product innovative. Needless to say, most investors who flirted with the idea of investing in the company didn't take long to figure that out and kept their distance from MAXset.
Disrespect Towards People Outside the Company
It's one thing not to like someone because they are a competitor, or a former employee, and it's quite another to diss them. MAXset was notorious for the latter. Also, even people who would be considered potential collaborators, people who had a very positive attitude toward the company and wanted to help, were often treated with disrespect. For example, there was a marketing guy who had an appointment with the CEO one day at a local Starbucks. The CEO had double-booked himself that morning so he didn't show up for the meeting with that guy. He didn't even bother to reschedule or let him know, so that guy called the CEO asking him where he was. The CEO apologized of course, but at that moment I felt really embarrassed for his sake.
It is quite normal in start-ups to have to work without getting paid much. However, you would expect that the compensation would reflect the amount of work you've put in and how vested you are in the company. That wasn't the case with MAXset. During one of the main payments, the compensation was hugely disproportionate to the amount of work or time invested in the company. This wasn't just for me, as there was another person too who was paid much less than his work warranted. Also, another person got more than either one of us, even though he had been recruited recently. In general, the cash-flows in the company were managed so poorly that I wouldn't be surprised if there is an embezzlement fiasco in the news about this company (if it doesn't file for Chapter 11 in the meantime).
Start-ups are evolving creatures, so it is natural to change and adapt to circumstances, in order to survive and prosper. However, this kind of change tends to be gradual and in relation to some external factor that needs to be reckoned with. MAXset would change in a very whimsical fashion, shifting programming platforms, data analytics frameworks, and even product objectives like most people would change their clothes. This kind of work is not conducive to sustainable professional development, in my view, and highly incongruent to my values as a tech professional. Although it is good to be flexible, if the requirements of a system change bi-weekly, it is really hard to produce anything worthwhile. Also, the lack of any sort of solid plan about the company's strategy is not a good sign either.
Although I still feel like this whole gig was a waste of my time, time I could have spent creating more videos, or engaging in other data science projects, I find that even from this kind of experience it is possible to learn and hone one’s skills, while at the same time broadening one's perspective. There is a very nice Greek saying that goes “he who sits and hasn’t sat uncomfortably, doesn't sit comfortably.” Perhaps some people need to go through these harsh experiences in order to appreciate other companies. These companies may be less innovative and perhaps less exciting than a Seattle start-up, yet they are more viable and more useful to the world, since they have a definite objective and a clear plan on how to achieve it. So, I focus on that part of my experience and sincerely hope that if you pursue employment in a tech start-up, you never work in a place like MAXset.
A few weeks ago I created a video on DB frameworks, from a data science perspective. Somehow it didn't get into the production pipeline, but it has now surfaced and is available on the Safari platform. You can view it here. Enjoy!
Recently I had a nice chat with a fellow data scientist who works at LinkedIn. After bouncing some ideas off him, I decided to make another video, based on a topic of mutual interest, partly for demonstrating to him how straight-forward the process is, once you have done the research on the topic. This video is now published on Safari here (subscription required). Enjoy!
With so many options for publishing videos online nowadays, someone may wonder “why would I want to go through hoops to get something published on Safari?” This is a valid question, and it’s equivalent to asking “why should I get published through a publishing house when I can self-publish on Amazon, or some other platform?” Although there is merit in self-publishing, there are two main issues with it: quality assurance (QA), and marketing.
Before I get into the details of all this, let me inform you that I've been down the self-publishing path and it wasn't as glamorous as people make it out to be. I published not just 1, but 3 e-books, created a website for them, and even hired people to help promote them. A few years later the only real benefit I've seen through all this was the experience I’d gained through the whole process. So, if this is your sole motivation, that’s fine. If you however want to make enough money to make the whole thing worthwhile, then there are better options out there.
Getting published on Safari (or any other professional video platform) ensures a certain quality standard. Of course not all videos there are great, but at least you won’t find many that are a total waste of time or riddled with inaccurate information, like you would on YouTube, for example. The reason is that for a video to get on the Safari site, it first goes through some QA process. If there is an issue about it, you will need to revise it. This doesn't happen often, if you know what you are doing, but it’s a good fail-safe.
Marketing is another matter where platforms like Safari excel. If something is on Safari, people will see it and may watch/read it. If you have a video on YouTube, few people will notice it and even fewer will watch the whole thing. Especially now with the new strict policies that YouTube has adopted, content creators have it hard. Unless you create a lot of content regularly, your exposure on YouTube is bound to be very limited. Of course, if you create a lot of content, the quality is bound to drop, but YouTube doesn't seem to care much about this. As long as they get lots of people watching the videos they host, and keep the ad money rolling, they are fine. And if your vid gets flagged because some oversensitive person finds it problematic for whatever reason, that’s your problem, not YouTube’s.
I’m not trying to say that YouTube is bad. Every video hosting platform has its use cases. However, for quality content that you expect to at least pay for the effort you've put into creating it, a more professional platform like Safari makes more sense. You can create a promo video and put it on YouTube, or Vimeo. But if you spend a week creating a data science or A.I. video, you are better off publishing it through proper channels, like Safari.
To give you an idea of the profits that a Safari video can yield, last year I published a book. I spent about 9 months writing it and editing it. It was considered successful and helped me get some traction in the field, while also promoting the programming language it was about. One of the videos I created and published for Safari yielded about the same revenue. It had taken me about a week to create it and edit it, and I also enjoyed it more, since it felt more like a creative endeavor than work. Since I don’t have a huge following, I doubt that the same video could yield the same revenue if it were published on YouTube or some other open platform.
If you find that you have content you wish to share with the world, in a professional manner, I’d recommend you consider Safari as an option. If you find that it entails too much work and you are unsure as to where you need to start, you can always go through a publisher, like Technics Publications, like I did. As Nelson Mandela eloquently said, “it always seems impossible until it's done.”
Recently someone on LI recommended that I bring more JOY to the world instead of merely complaining about it (I wasn’t complaining but apparently she thought I was!). I’m not an entertainer, nor a psychology expert, but perhaps you don’t need to be in these lines of work in order to bring joy to the people you interact with. I thought about it and decided that perhaps data science could be a source of joy to other people. However, for this to happen, it needs first and foremost to be joyful to you.
Deriving joy from a challenging and oftentimes frustrating procedure such as a data science project is not easy. In fact, many people can’t stand the largest part of the work such a project entails. However, with the right mindset, even the more tedious aspects of the work can be enjoyable (i.e. be conducive to joy). So, what is this mindset that turns boredom to beauty and drudgery to delight?
Although there is no magic formula for making things more enjoyable in data science, if you have the attitude of the data science amateur when you approach a problem, your chances of enjoying it are better. This doesn’t mean being sloppy and checking Stack Overflow or Quora every 5 minutes. The amateur’s attitude is, as the word amateur implies, an attitude based on love for what you are doing. The amateur doesn’t care if they get paid for their work. They may never even get paid, but they do it anyway because they find it fulfilling. It’s like a hobby for them.
However, a data scientist still needs to be professional about her work. There are deadlines, meetings with stakeholders, and of course debugging scripts that throw errors at the worst possible time! Handling these matters takes professionalism, but it doesn’t need to be a mechanical and draining process. If you see part of your work as a data scientist (even the debugging stage) as a learning experience and have what is known in Zen as the beginner’s mind, you are bound to find everything a bit more enjoyable. It’s the joy that comes from detachment and lack of rigid expectations from your work, something that every professional knows.
Remembering all this, especially on a Monday morning, is not as straight-forward as it may seem when you think of it. However, being joyful is a matter of perspective and at the end of the day a matter of habit. Aristotle famously said that “virtue is a matter of habit” and some could argue that joy is a kind of virtue. Maybe not something you would put on your resume or talk about in an interview, but definitely something worth keeping in mind in those long mornings when you may be tempted to question your career choices. After all, if you could be joyful about data science as a field once, you can be joyful about data science work too. And if you still feel that you need some help to get your enthusiasm flowing, invigorating a joyful mindset, you can always read my book Data Science – Mindset, Methodologies, and Misconceptions. :-)
When people nowadays talk about A.I., they usually refer to the deep learning methodology and other ANN frameworks. This is great, considering that ANNs were almost considered a dead-end once, due to the inability of technology to help them exhibit their potential. Yet, now computers are more powerful than ever and GPUs are commonplace as add-ons, enabling deep learning and other ANN-based systems to function at greater scales. However, there are some other A.I. methodologies that are equally valid and actually predate ANNs. These I refer to as the “hipsters of A.I.” since they were part of the A.I. field before A.I. was cool.
The A.I. hipster methodologies are A.I. frameworks that are not ANN-related. These are systems like Fuzzy Logic (FL), which came about years before ANNs reached a level of development that made them worth using in machine learning. FL systems were used heavily in data analytics, while they were even implemented in hardware. At one point, researchers even experimented with a hybrid system that is part FL and part ANN (this was called ANFIS and was in essence an Artificial Neural network that optimized the membership functions of a Fuzzy Inference System).
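To make the idea of a membership function a bit more concrete, here is a minimal Python sketch. It is only an illustration and not taken from any particular FL library: the triangular shape, the temperature sets, and all the names (triangular, memberships, fuzzy_and, fuzzy_or) are assumptions of mine. The breakpoints a, b and c are the kind of parameters that a hybrid system like ANFIS would tune automatically instead of having them set by hand.

```python
def triangular(x, a, b, c):
    """Triangular membership function (illustrative choice of shape).

    Returns 0 outside the open interval (a, c), rises linearly to 1 at
    x == b, then falls linearly back to 0 at c. Assumes a < b < c.
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def memberships(temp):
    """Degree to which a temperature (in Celsius) belongs to each fuzzy set.

    The three sets and their breakpoints are hypothetical, hand-picked values.
    """
    return {
        "cold": triangular(temp, -10.0, 0.0, 15.0),
        "warm": triangular(temp, 10.0, 20.0, 30.0),
        "hot": triangular(temp, 25.0, 35.0, 50.0),
    }


def fuzzy_and(p, q):
    """Common (min-based) fuzzy conjunction."""
    return min(p, q)


def fuzzy_or(p, q):
    """Common (max-based) fuzzy disjunction."""
    return max(p, q)


if __name__ == "__main__":
    degrees = memberships(12.0)
    print(degrees)
    # A value can be partly "cold" and partly "warm" at the same time
    print(fuzzy_or(degrees["cold"], degrees["warm"]))
```

Unlike a crisp rule such as "warm means above 15 degrees", a value here can belong to two sets at once, each to a different degree, which is what makes FL systems handy for modeling vague concepts.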
Another hipster methodology is the family of optimization methods. These are systems like Genetic Algorithms, Simulated Annealing, and Particle Swarm Optimization (as well as its many variants). Although the scope of these A.I. fields is limited to finding optima of particular functions (aka fitness functions), their usefulness covers a variety of fields. Even dimensionality reduction processes sometimes make use of GAs or some other optimization tool. Note that these systems are not the same as the analytical optimization methods known from Calculus, since they tackle very complex search spaces, with oftentimes dozens of variables, and use a stochastic process in the back-end.
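As a rough illustration of what "a stochastic process in the back-end" looks like, here is a bare-bones Simulated Annealing sketch in Python. It is a simplified version under my own assumptions (Gaussian perturbations, a geometric cooling schedule, minimization rather than maximization), and every name and default value in it is hypothetical rather than taken from an established package.

```python
import math
import random


def simulated_annealing(fitness, start, steps=5000, temp0=1.0,
                        cooling=0.999, step_size=0.5, seed=0):
    """Minimize `fitness` over a list of floats, starting from `start`.

    All defaults are illustrative guesses, not tuned recommendations.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    current = list(start)
    current_cost = fitness(current)
    best, best_cost = list(current), current_cost
    temp = temp0
    for _ in range(steps):
        # Propose a random neighbour of the current solution
        candidate = [x + rng.gauss(0.0, step_size) for x in current]
        candidate_cost = fitness(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        temp *= cooling  # geometric cooling schedule
    return best, best_cost


def sphere(v):
    """Toy fitness function with its minimum (zero) at the origin."""
    return sum(x * x for x in v)


if __name__ == "__main__":
    solution, cost = simulated_annealing(sphere, [3.0, -4.0])
    print(solution, cost)
```

Notice that no derivative is computed anywhere. The method only needs to be able to evaluate the fitness function, which is why this family of optimizers can cope with search spaces where the Calculus-based approach is impractical.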
If there is one take-away from these hipster A.I. systems it is that there is more than meets the eye when it comes to artificial intelligence. That’s not to say that deep learning systems are not worth your while, but it’s good to keep an open mind about other A.I. systems that may not be as popular today, but may have played (and still play) an important role in the evolution of the field.
Also, having a solid understanding of A.I. through its various methodologies, allows us to be able to think forward in a creative way. Instead of merely trying to extend the methodologies we know, we may come up with new ones, enriching A.I. in ways that we wouldn't be able to fathom if our understanding were limited to a single A.I. framework. Isn't that what A.I. is about, finding novel ways to solve problems, leveraging clever heuristics and imaginative architectures?
So, my latest video is now available online at the Safari portal. I didn't post this yesterday, as I had already published an article for the blog. As I have been writing more articles than I can get published on DSP, I had to resort to this blog again. Also, I am not currently working on a book, so I have more time for writing for other channels (e.g. this blog, beBee, etc.).
Anyway, if you have a subscription for Safari, check out my video. I’m certain it would be worth your time. As always, I’m open to feedback via the “contact” page of this blog.
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy flair when it comes to technology, technique, and tests.