When I started my life-long journey in the world of data analytics (which morphed into Data Science and modern AI-based predictive analytics systems), it was through academia. I even did a post-doc at one point, which paid the bills but was the worst-paying job I’ve ever had during my career. Yet, as long as there were things to learn and challenges to overcome, I was willing to see past that.
As I matured, I realized that the only thing that mattered in that strange world, if you were to have a career in it, was publications. As I enjoyed writing, I gave it a shot. However, the needlessly long wait for any feedback, the low quality of that feedback, and the overall time it took for something to get published eventually put me off. After that, I decided to pursue a career, any career, in the real world, as at least here there is more meritocracy and shorter waiting times, enabling much faster growth.
A few months ago, I was approached by a big-time academic publishing house for an article in their encyclopedia of big data. I was surprised to see that after so many years they had become more progressive about the whole publications-related business. As the topic was right up my alley, I decided to accept their offer. At the time I felt that this would be my way of giving back to the data science programming community. I only asked that the companies I work with get mentioned in the article so that they could at least justify my being distracted by this project. The academic publisher accepted and said that these companies would be mentioned as my affiliations. I even provided their location details afterwards, so that they would be represented fully.
Months later, I got some feedback, some really minor corrections, that I took care of promptly. Finally, last month the article was published. I was pleased, for a couple of minutes, till I realized that the affiliations were all screwed up. To this day I am not sure how this could happen. It would take a whole new level of incompetence to mess up such a simple task, more than I was used to seeing throughout my academic life. Of course, mistakes happen and since I’m not perfect either, I politely asked for corrections on this part of the article. I had to do this twice, since the first time they must have forgotten about it (apparently these corrections were not a priority for them). To this day, the article remains uncorrected, since clearly this 2-minute task is just too much for them to handle, or perhaps there isn’t much of a motivation.
If there was ever a slight chance of me working in an academic setting again, e.g. by writing articles like that one or academic papers, it is gone now, as this event proved what a colossal waste of time it is to work with this sort of bureaucracy. Perhaps for you it’s different because you have higher tolerance or lower self-esteem (or maybe both) and you can put up with these clowns. However, if you are at a crossroads in your career in our field, be sure to explore your options wisely before being tempted to settle for an academic publication gig. More often than not, it would not be worth your time, while all the other alternatives would be more rewarding.
Randomness, Uncertainty, Complex Systems, and Applications Video Now Online + Shout-out to a Viewer of This Blog
This past week I decided to do a vid on an experimental topic involving different fields, an interdisciplinary topic if you will. I understand the risks of such a video, since randomness is not a particularly easy subject, while complex systems are a bit niche as a field. However, I tried to bring a more intuitive approach to all this and introduce a new feature for such videos: mini-quizzes so that you can test your understanding while you watch the video. Anyway, feel free to check out this introductory video on this topic by visiting the corresponding Safari page. Warning: some of the stuff covered in this video veers away from conventional approaches to this topic. Also, the video is very light on the math aspect of the topic as otherwise it would be too long and it's already over 30 minutes in length...
Also, recently a viewer of this blog, S.M., contacted me with some suggestions on how to tackle certain typo-related issues he had found. Big thanks to S.M. for his contribution!
Bias-Variance Trade-Off for Data Science & Backing Up and Wiping Out Sensitive Data Videos Are Now Online
This past week I've had some time off work as my CEO was on vacation. As a result I did 2 videos, not just 1. Here they are:
The Bias-Variance Trade-Off: when you have a model that favors a certain class or a certain set of values, you have high bias, while when you have a model whose predictions are all over the place, you have high variance. Could you find a compromise between the two? And how does all this relate to the model's fitness? This video includes a few examples too, for classification and regression problems, to cement the concepts introduced.
Backing Up and Wiping Out Sensitive Data: you probably have heard of this topic and perhaps even apply it to some extent, since taking care of sensitive data is a good cyber-security habit to have, plus it's not new either. However, there is much more to it than that, like which storage media are best for back-up, how you can handle sensitive data on your computer without leaving a trace, and what software is out there that helps make that happen.
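As a rough illustration of the bias-variance trade-off mentioned above (this is not taken from the video), here is a minimal Python sketch. It assumes a made-up regression setup: noisy samples of a sine curve, fitted once with a straight line (a rigid model) and once with a degree-7 polynomial (a flexible model). The function name bias_variance and all of its parameters are hypothetical choices for this example only.

```python
import numpy as np


def bias_variance(degree, n_trials=200, n_train=20, noise=0.3, seed=0):
    """Estimate squared bias and variance of a polynomial fit (illustrative setup).

    Assumption: the "true" relationship is y = sin(2*pi*x) on [0, 1],
    observed with Gaussian noise. Each trial draws a fresh noisy training set.
    """
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_train)   # fixed training inputs
    x_test = np.linspace(0.05, 0.95, 50)       # points where we evaluate the fits
    true_y = np.sin(2 * np.pi * x_test)

    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        # New noisy sample of the same underlying curve
        y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, noise, n_train)
        coefs = np.polyfit(x_train, y_train, degree)
        preds[t] = np.polyval(coefs, x_test)

    mean_pred = preds.mean(axis=0)
    # Squared bias: how far the average prediction is from the true curve
    bias_sq = float(np.mean((mean_pred - true_y) ** 2))
    # Variance: how much predictions scatter from one training set to the next
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance


for degree in (1, 7):
    b, v = bias_variance(degree)
    print(f"degree={degree}: bias^2={b:.4f}, variance={v:.4f}")
```

With this setup the straight line should show the larger squared bias and the degree-7 polynomial the larger variance; the exact figures depend on the noise level and the number of training points, so treat them as indicative only.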
It seems like yesterday when I came up with this encryption system, which I even wrote about on this blog. I never expected to create a video on it, but what better way to share it with the world, or at least its core aspects? As there is no reason why I'd consider my implementation of this idea the best possible, I leave the viewer to experiment on his/her own on that matter, after I explain each aspect of the method and showcase a couple of examples of it. Anyway, check out the video on Safari when you get the chance and let me know here what you think of it. Enjoy!
Why Articles on Social Media about Programming for Data Science Seem to Be Straight Out of a Time Capsule
Data science related topics sell, no doubt about that. This is great if you are interested in the field and want to learn more about it, especially practical things that can offer you some orientation in the field. Since programming is a key component of data science, it makes sense to pay attention to material along these lines, particularly if you are new to this whole matter.
How the Situation Is Today
Fortunately there is an abundance of articles on this topic, especially on social media. However, not everyone who writes such articles is up-to-date on this subject, since many of these “expert” tech writers are not forward-thinking data scientists themselves. Best case scenario, they have spent a few minutes on the web, probably focusing on the results on the first page of a search engine for the bulk of their material. And shocking as it may be, this material may be geared more towards what’s more popular rather than what’s more accurate. Alternatively, they may have relied on what some data science guru once said on the topic, information that may no longer be particularly relevant. Apart from that, the writers who delve into the production of this sort of article (or infographic in some cases) have their own biases. Probably they took a programming course at university, so if a particular programming platform comes up in their “research” they may be more likely to highlight it. After all, this would make them look knowledgeable since they have hands-on experience with that platform, even if it’s not that useful to data science any more. What’s more, many people who write about these topics don’t want to take risks with newer things. It’s much safer to mention languages that everyone knows about and which have a large community around them than to mention newer ones that may be despised by the hardcore users of older coding platforms.
Hope for the Future
For better or for worse, an article on social media has a limited life span. After all, its purpose is mainly to get enough people to click on a particular link where a given site serves ads, so that the people owning the site can get some revenue from said ads. Therefore, if the article is forgotten in a week, its producers won’t lose any sleep over it. Books and subscription-based videos are not like that though. Neither are technical conferences. So, since the new trends rely more on these kinds of platforms to become well-known, they are not hindered that much by social media misinformation. After all, if a programming language is good, this is something that will eventually show, even if the fan-boys of the more traditional languages would sooner die than change their views on their favorite coding platforms.
What You Can Do
So, instead of getting swayed by this or that “expert” with X thousand followers (many of whom are probably either bots or bought followers), you can do your own research. Check out what books are out there on the various programming languages and whether they hint towards applicability in data science. Check out videos on Safari and other serious educational platforms. Look at what new language conferences are out there and how they cover data science related topics. And most importantly, try some of these languages yourself. This way you’ll have some more reliable data when making a decision on what language is most relevant and most future-proof in our field, rather than blindly believing whatever this or that “expert” on social media says.
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy flair when it comes to technology, technique, and tests.