Although I really enjoy Julia and other new programming languages, I also use Python, SQL, etc. for various projects. Beyond that, I tend to mentor students in these more conventional languages, as it's part of their curriculum. Learning these languages, however, often comes at a cost, and if someone has already shelled out thousands of dollars on a course, it's doubtful they would be willing to spend more to go deeper into these subjects.
Fortunately, AIgents has you covered. This Data Science and AI platform for practitioners and learners in these fields recently launched a learning branch on its website, featuring a selection of useful resources for learning these technologies (these live on various sites, which you can find on your own, but AIgents saves you time by doing this tedious task for you and curating them to some extent). The best part is that all of these resources are free, and the platform also has a community to facilitate further such initiatives. You can check it out here. Cheers!
Remember all those videos I used to make for Safari / O'Reilly's Learning platform? Well, most of them are now gone, but the best ones (according to the publisher) are still available for you, in a pay-per-view mode. Learn more about them through this 1-minute video I made about them. It's just six of them at the moment, but I may get more of them out there in the months to come. Cheers!
It's been about 7.5 months since I signed the contract for this book and now it's already on the bookshelves (so to speak). At least the book is available to buy at the publisher's website, in print format, PDF format, or a bundle of both. If you have been paying attention through this blog, you may be aware that you can get it with a 20% discount if you use a certain 4-letter code when buying it from the publisher (hint: the code is DSML).
I put a lot of work into this book because it's probably going to be my last (technical) book, at least for the foreseeable future. Its topic is one that I've been very passionate about for many years and continue to delve into even today. Even though the book is very hands-on, accompanied by code notebooks (or codebooks as I often call them), you can read it without getting into the code aspects of it. Also, the concepts covered in it are applicable in any programming language used in data science and A.I.
If you want to learn more about the book, feel free to attend the (free) event on Friday, September 9th at 10 am ET (register through this link). I hope to see you then!
Update: I've made a short video about this, which I encourage you to share with anyone who might be interested: https://share.vidyard.com/watch/d58yZn9Y1cnWq6rQAsrotq?
As much as I'd love to write a (probably long) post about this, I'd rather use my voice. So, if you are interested in learning more about this topic, check out the latest episode of my podcast, available on Buzzsprout and a few other places (e.g., Spotify). Cheers!
Mentoring is one of those subjects I can talk about till the cows come home (the other such subjects are the Julia programming language, Data Science, and Cybersecurity). What makes it different, however, is that it's something that appeals to all sorts of professionals, not just data science and cybersecurity ones. In this article, I'll attempt to illustrate that through a series of questions and answers, for easier navigation and hopefully better understanding.
So, first of all, what is mentoring? In a nutshell, it's the formal manifestation of the most natural relationship in our species, that of passing on knowledge. This knowledge transfer is usually done from parents to children (and vice versa when it comes to the latest apps and gadgets!), from the elders to the younger individuals, and among peers with different levels of growth in a particular field. It's the most natural thing in the world to share one's knowledge and experiences with other people, often just for the sake of it. In the business world, where time is valued differently, this usually takes the form of a professional relationship where money is involved and which has a certain structure (e.g., regular meetings, a preassigned means of communication, etc.).
Well, we all have blind spots and gaps in our knowledge, plus we need to learn from others (what I refer to as dynamic learning) since solitary learning strategies are sometimes inadequate. Also, mentoring is often a powerful supplement to one's established learning strategies, enabling that person to deal with practical issues and questions that often arise from the new material. It's no coincidence that anyone in academia pursuing a challenging project, such as a dissertation, is often required to have a mentor of sorts to supervise his/her work. In some cases, such as a multi-disciplinary research project, two mentors are assigned to the learner. That was my experience during my Ph.D. at the University of London.
Anyone intending to learn something or hone their skills is a candidate for a mentee/protege. As for mentors, anyone you can learn from systematically and helpfully qualifies for that role. Of course, there is also the matter of availability, since many people are quite busy these days, so that's a requirement too. Practically, you cannot be a mentee or a mentor if your schedule is jam-packed. You need to invest time for such a relationship to have a chance, just like anything worthwhile in our lives.
Mentoring usually makes use of a rhythm in the series of meetings involved. It doesn't have to be frequent, but having a rhythm is useful nevertheless. You can use the mentoring meetings to discuss:
1. new topics the learner is interested in and often tackling individually,
2. problems the learner is facing, such as those related to the new material as well as its applications,
3. specific applications of the new material to understand how it applies in practice,
4. new ideas that extend the learning material and may be the product of the learner’s creativity,
5. anything else that the learner deems necessary or useful, such as career-related matters.
As with any product or service out there, there is a price tag involved (be very careful when someone is offering mentoring or access to "mentors" for free, as this is likely to be a scam). In general, the more you value the mentoring process, the more you're willing to pay for it. Sometimes, you can even work out an exchange kind of deal, where you offer a product or service for the mentoring you receive. More often than not, however, there is money involved, while there is also an intermediary to handle the transactions and take care of the logistics of the process.
Well, there is no better time than now, or at least, as soon as you can. Waiting for the perfect mentor or for a time when you have enough time to focus on mentoring is futile. You can always adjust your mentoring rhythm to the circumstances of your life if needed. I've had to change the weekly meetings I have with my mentees a few times because they were either dealing with a personal situation or a work-related matter.
Anywhere with a good internet connection (even a mobile internet connection) or, ideally, within proximity of the mentor. I remember having been paired with a mentor during my time at Microsoft, and I've mentored people in person through the "Get Online" program in Greece, back in the day when the internet was a new thing and local business people were eager to utilize it for their businesses. However, most of the mentoring these days takes place over a VoIP system, such as Zoom, or even over the phone. Generally, a VoIP system is preferable since it allows you to share your screen with the mentor, enabling them to understand the problem better and facilitating a potential resolution.
All this sounds nice and dandy, but so what? The bottom line of all this is that through mentoring, you get to improve your skills (or develop new ones if you are a newcomer in a field), refine your mindset, and even upgrade your life status over time. Many people take on mentoring to shift careers or get a better job in their line of work, while others do it to become better at their current job. Every person is unique, and mentoring addresses that uniqueness, building on it.
Shameless self-promotion part
If you have been following my work or my blog, you're probably already aware of the fact that I've been involved in mentoring for several years. Lately, I've decided to take it to the next level and start mentoring people on other platforms as well as one-on-one (no intermediary platform). Although I usually deal with the main currencies of the world (e.g., USD, British pounds, and Euros), I'm also open to cryptocurrencies. You can learn more about my mentoring endeavors on the corresponding page of this blog. Cheers!
Lately, I've been preparing a podcast on the topic of (data) Analytics and Privacy. Having completed the first few episodes, I've decided to make it available at Buzzsprout. Alternatively, you can get the RSS feed to use with either a browser add-on or some specialized program that handles RSS links: https://feeds.buzzsprout.com/1930442.rss
The podcast deals with various topics related to privacy, usually from an analytics angle, or vice versa. However, it appeals to anyone who is interested in these subjects, not just specialized professionals. Clocking in at around 20 minutes each, the episodes of this podcast are ideal for your daily commute or any other activity that doesn't require your full attention.
Feel free to check out these links and, if you like the podcast, share these links with friends and colleagues. Cheers!
For about a month now, I’ve been working on a new technical book for Technics Publications. This is a project that I've been thinking about for a while, which is why it took me so long to start. Just like my previous book, this one will be hands-on, and I'll be using Julia for all the code notebooks involved. Also, I'll be tackling a niche topic that hasn't been done before in this breadth and depth, in non-academic books. Because of this book, I won't be writing on this blog as regularly as before.
If you are interested in technical books from Technics Publications, as well as any other material made available from this place, you can use the DSML coupon code to get a 20% discount. This discount applies to most of the books there and the PebbleU subscriptions. So, check them out when you have a moment!
Recently, a new educational video platform was launched on the web. Namely, Pebble U (short for Pebble University) made its debut as a way to provide high-quality knowledge and know-how on various data-related topics. The site is subscription-based and requires registration for watching the videos and any other material available on it (aka pebbles). On the bright side, it doesn't have any vexing ads! Additionally, you can request a short trial of it, for some of the available material, before you subscribe to it. Win-win!
Pebble U has a unique selection of features that are very useful when consuming technical content. You can, for example, make notes, highlight passages, and add bookmarks in the books you read. As for the videos, many of them are accompanied by quizzes to cement your understanding of the topic covered. The whole platform is also available as an app for both Android and iOS devices.
The topics of Pebble U cover data science (particularly machine learning and A.I., though there are some Stats-related videos too), Programming (particularly Python), and Business, among other categories. As the platform grows, it is expected to include additional topics and a larger number of content creators. All the videos are organized in meaningful groups called disciplines, making it easy to build on your knowledge. Of course, if you care for a particular discipline only, you can subscribe to the material in that area only, saving you some money.
In the screenshot above, you can see some of my own material that is available on PebbleU right now. Much of it is from my Safari days, but there are also some newer items, particularly on the topic of Cybersecurity.
By the way, if you find the subscription price a bit steep, remember that you can use the coupon code DSML I've mentioned in previous posts, to get a 20% discount. So, check it out when you have some time. This may be the beginning of something great!
I've talked about mentoring before and even mentioned it a few times in my books and videos. After all, it's an integral part of learning data science and A.I., among other fields. However, not all mentoring is created equal, and that's probably one of the most valuable lessons to learn in education. Unfortunately, to learn such a lesson you usually have to rely on your own experiences (since not many people want to talk about this matter).
Nowadays, everyone can sign up to particular sites and pretend to be a mentor. Sites like that often offer this for free since they know that charging for such a low-quality service would make them liable to lawsuits. However, the learner of data science and other fields often lacks the discernment to see such places for what they are in reality. Fortunately, however, there are much better alternatives.
Across the web, some sites provide proper mentoring, usually at a reasonable price, for all sorts of disciplines, including data science and A.I., among other fields. Many of these sites incorporate mentoring as part of their educational services, which include online classes too. However, that's not always the case. Someone can mentor you in your field of choice without having to follow a curriculum. This option is particularly appealing to professionals and people who have a relatively full schedule.
Proper mentoring involves various things, such as the following:
* career advice
* putting together a good resume or CV
* interview practice (particularly technical interviews)
* feedback on hands-on projects
Ideally, mentoring is a long-term process, though you can also opt for a handful of sessions to tackle specific problems you need help with. As long as you have an open mind, a willingness to learn, and value your mentor’s time, you are good to go. Naturally, bringing a specific task into the mentoring session can also be very useful, as it can help make it focused and productive.
By the way, if you wish to work with me as a mentee, I have some availability these months. What's more, I have set up a way to schedule these sessions efficiently using Calendly and have established collaboration with a freelance platform to handle payments and such (my Kwork link). So, if you are up for some proper mentoring, feel free to give me a buzz. Cheers!
I was never into Clustering. My Ph.D. was in Classification, and later on, I explored Regression on my own. I delved into unsupervised learning too, mostly dimensionality reduction, about which I've written extensively (even published papers on it). For some reason, Clustering seemed like a solved problem, and as one of my supervisors in my Ph.D. was a Clustering expert (he had even written books on this subject), I figured that there wasn't much for me to offer there. Then I started mentoring data science students and dug deeper into this topic. At one point, I reached out to some data scientists I'd befriended over the years, asking them whether a Clustering algorithm can be both deterministic and lightweight. The best responses I got were that DBSCAN is mostly deterministic (though not exactly deterministic if you look under the hood) and that K-means (along with its powerful variant, K-means++) was lightweight and scalable. So, I decided to look into this matter anew and see if I could clean up some of the dust it has accumulated with my BROOM.
Please note that when I started looking into this topic, I had no intention to show off my new framework nor to diminish anyone's work on this sub-field of data science. I have great respect for the people who have worked on Clustering algorithms, be it in research or their application-based work.
With all that out of the way, let's delve into it. First of all, deterministic Clustering is possible even if many data scientists would have you believe otherwise. One could argue that any data science algorithm can be done deterministically, though this wouldn't be an efficient approach. That's why stochastic algorithms are in use, particularly in challenging problems like Clustering. There is nothing wrong with that. It's just frustrating when you get a different result every time you run the algorithm and have to set a random seed to ensure that it doesn't change the next time you use the code notebook where it lives. So, deterministic is an option, just not a popular one.
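To make the random-seed point concrete, here is a minimal sketch in plain Python. It is not BROOM and not any particular library's implementation; `kmeans_1d` and the toy `data` list are hypothetical names made up for illustration. It runs a bare-bones K-means on one-dimensional data, where the only stochastic step is the choice of initial centroids.

```python
import random

def kmeans_1d(points, k, seed, iters=50):
    """Bare-bones Lloyd-style K-means on 1-D data (illustrative only)."""
    rng = random.Random(seed)          # local generator, so global state is untouched
    centroids = rng.sample(points, k)  # the stochastic step: random initial centroids
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:           # converged
            break
        centroids = new
    return sorted(centroids)

# Hypothetical toy data, made up for this example
data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.0, 9.1, 8.7, 3.0, 7.0]

run_a = kmeans_1d(data, k=3, seed=42)
run_b = kmeans_1d(data, k=3, seed=42)
print(run_a == run_b)   # same seed, same centroids

# Count how many different solutions 20 different seeds produce
distinct = {tuple(kmeans_1d(data, k=3, seed=s)) for s in range(20)}
print(len(distinct))
```

Fixing the seed makes a run repeatable, but it doesn't remove the dependence on the starting point: different seeds may still settle on different centroids. A deterministic algorithm avoids that dependence altogether.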
What about being lightweight? Well, if it's an algorithm that requires running a particular process again and again until it converges (like K-means), maybe it's lightweight, but probably not so much since it's time-consuming. Also, most algorithms worth their salt aren't as simple as K-means, which though super-efficient, leaves a lot to be desired. Let's not forget the assumptions it makes about the clusters and its reliance on distance, which tends to fail when several dimensions are present. So, in a multi-dimensional data space, K-means isn't a good option, and just like any other clustering algorithm, it struggles. DBSCAN struggles too, but for a different reason (density calculations aren't easy, and in multi-dimensional space, they are a real drag).
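The trouble with distances in many dimensions can be illustrated with a short experiment. The sketch below is an assumption-laden toy (uniformly random points in a unit hypercube, a hypothetical helper called `relative_contrast`), not a benchmark of any clustering algorithm. It measures how much farther the farthest point is than the nearest one, relative to the nearest distance, as the number of dimensions grows.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

# Compare a low-dimensional space with increasingly high-dimensional ones
contrasts = {dim: relative_contrast(dim) for dim in (2, 10, 100, 500)}
for dim, value in contrasts.items():
    print(f"dim={dim:>3}  relative contrast={value:.2f}")
```

As the dimensionality goes up, the contrast shrinks, meaning that the nearest and the farthest points end up at nearly the same distance. Any method that leans on plain distances, K-means included, has less and less signal to work with.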
So, where does that leave us? Well, this is quite a beast that we have to deal with (the combination of a deterministic process and it being lightweight), so we'll need a bigger boat! We'll need an enormous boat, one armed with the latest weapons we can muster. Since we don't have the computational power for that, we'll have to make do with what we have, something that none of the other brilliant Clustering experts had at their disposal: BROOM. This framework can handle data in ways previously thought impossible (or at least unfeasible). High dimensionality? Check. Advanced heuristics for similarity? Check. An algorithm that features higher complexity without being computationally complex? Check. But the key thing BROOM yields that many Clustering experts would kill for is the initial centroids. Granted that they are way more than we need, it's better than nothing and better than the guesswork K-means relies on due to its nature.
For the toy dataset visualized above, I applied the optimal clustering method I've developed based on BROOM. There were two distinct groups in the dataset across the approximately 600 data points located on a Euclidean plane. Interestingly, their centers were almost the same, so K-means wouldn't have a chance to solve this problem, no matter how many pluses you put after its name. The initial centroids provided by BROOM were in the ballpark of 75, which is way too high. After the first phase of the algorithm, they were reduced to 7 (!) though even that number was too high for that dataset.
After some refinement, which took place in the second phase of the algorithm, they were reduced to 2. The whole process took less than 0.4 seconds on my 5-year-old laptop. The outputs of that Clustering algorithm included the labels, the centroids, the indexes of the data points of each cluster, the number of data points in each cluster, and the number of clusters, all as separate variables. Naturally, every time the algorithm was run it yielded the same results since it's deterministic.
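To see why two groups with nearly the same center are a problem for K-means, here is a rough sketch in plain Python. The data is synthetic and made up for this illustration (a tight blob inside a ring, 300 points each); it is not the dataset from the case study, and the `kmeans` function is a generic textbook version, not the BROOM-based method.

```python
import math
import random

def kmeans(points, k, seed=0, iters=100):
    """Generic Lloyd-style K-means with randomly sampled initial centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid
        labels = [
            min(range(k), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        new = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new.append(centroids[i])  # keep the old centroid if the cluster is empty
        if new == centroids:              # converged
            break
        centroids = new
    return labels

rng = random.Random(1)
# Group 0: a tight blob around the origin. Group 1: a ring of radius 5 around the origin.
blob = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(300)]
ring = [
    (5 * math.cos(2 * math.pi * t / 300), 5 * math.sin(2 * math.pi * t / 300))
    for t in range(300)
]
points = blob + ring
truth = [0] * 300 + [1] * 300

labels = kmeans(points, k=2)
matches = sum(lab == t for lab, t in zip(labels, truth))
# Cluster ids are arbitrary, so take the better of the two possible label mappings
accuracy = max(matches, len(points) - matches) / len(points)
print(f"agreement with the true groups: {accuracy:.2f}")
```

With two centroids, K-means can only split the plane along a straight line, so it cuts across both groups instead of separating the inner one from the outer one. Its agreement with the true grouping stays well short of perfect, regardless of the seed used.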
Before we can generalize the conclusions that we can draw from this case study, we need to do further experimentation. Nevertheless, this is a step in the right direction and a very promising start. Hopefully, others will join me in this research and help bring Clustering into the limelight it deserves, as a powerful data exploration methodology. Cheers!
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy approach to technology, particularly related to A.I.