Two weeks have passed since I launched the podcast, and the number of downloads has exceeded my expectations (over 2100 so far). Regardless, I continue humbly with my efforts to raise awareness about privacy and how it's relevant to Analytics work. In the latest episode of the podcast, published just this morning, I interview Steve Hoberman, the data modeling professional and lecturer at Columbia University I've been working with since the beginning of my career in data science. Without getting too technical, we talk about various topics related to the relationship between Analytics professionals and the Business, as well as how privacy factors into all this. This is the only episode of this podcast that doesn't contain a sponsor ad, for obvious reasons. Check it out when you have a moment!
Lately, I've been preparing a podcast on the topic of (data) Analytics and Privacy. Having completed the first few episodes, I've decided to make it available at Buzzsprout. Alternatively, you can get the RSS feed to use with either a browser add-on or some specialized program that handles RSS links: https://feeds.buzzsprout.com/1930442.rss

The podcast deals with various topics related to privacy, usually from an analytics angle, or vice versa. However, it appeals to anyone who is interested in these subjects, not just specialized professionals. Clocking in at around 20 minutes each, the episodes of this podcast are ideal for your daily commute or any other activity that doesn't require your full attention. Feel free to check out these links and, if you like the podcast, share them with friends and colleagues. Cheers!

For about a month now, I’ve been working on a new technical book for Technics Publications. This is a project that I've been thinking about for a while, which is why it took me so long to start. Just like my previous book, this one will be hands-on, and I'll be using Julia for all the code notebooks involved. Also, I'll be tackling a niche topic that non-academic books haven't covered in this breadth and depth before. Because of this book, I won't be writing on this blog as regularly as before.
If you are interested in technical books from Technics Publications, as well as any other material made available there, you can use the DSML coupon code to get a 20% discount. This discount applies to most of the books there and the PebbleU subscriptions. So, check them out when you have a moment!

Recently, a new educational video platform was launched on the web. Namely, Pebble U (short for Pebble University) made its debut as a way to provide high-quality knowledge and know-how on various data-related topics. The site is subscription-based, and it requires registration to watch the videos and access any other material available on it (aka pebbles). On the bright side, it doesn't have any vexing ads! Additionally, you can request a short trial of it, for some of the available material, before you subscribe. Win-win!

Pebble U has a unique selection of features that are very useful when consuming technical content. You can, for example, make notes, highlight passages, and add bookmarks in the books you read. As for the videos, many of them are accompanied by quizzes to reinforce your understanding of the topic covered. The whole platform is also available as an app for both Android and iOS devices.

The topics of Pebble U cover data science (particularly machine learning and A.I., though there are some Stats-related videos too), Programming (particularly Python), and Business, among other categories. As the platform grows, it is expected to include additional topics and a larger number of content creators. All the videos are organized in meaningful groups called disciplines, making it easy to build on your knowledge. Of course, if you care for a particular discipline only, you can subscribe to the material of that area only, saving you some money.

In the screenshot above, you can see some of my own material that is available on PebbleU right now.
Much of it is from my Safari days, but there is also some newer material, particularly on the topic of Cybersecurity. By the way, if you find the subscription price a bit steep, remember that you can use the coupon code DSML I've mentioned in previous posts to get a 20% discount. So, check it out when you have some time. This may be the beginning of something great!

Every year, there is a data modeling conference that takes place around the world. Its name is Data Modeling Zone, or DMZ for short (not to be confused with the DMZ in Korea, which isn't that good a place for data professionals!). Just like last year and the year before that, I'll be participating in this year's conference as a speaker, talking about data science- and AI-related topics.
Namely, I'll talk about the common misconceptions about Machine Learning, something you may remember from my previous books. Still, this talk will cover the topic in more depth and help even newcomers to the field distinguish between the hype and the reality of machine learning. After my presentation, there will be some time for Q&A, so if you have any burning questions about this topic, you have a chance to have them answered. Just like last year, DMZ is going to be online this year, making it super easy for you to attend, regardless of where you are. Also, there are plenty of interesting talks on various data-related topics, as you can see from the conference’s program. I hope to see you there this November 18th!

The latter has been something I've been looking into for a while now. However, my skill set wasn't up to it until recently, when I started working with GUIs for shell scripting. So, if you have a Linux-based OS, you can now use a GUI for a couple of methods in the Thunderstorm system. Well, provided I release the code for it someday.
Alright, enough with the drama. This blog isn't FB or some other overly sensational platform. However, if you've been following my work since the old days, you may be aware of the fact that I've developed a nifty cipher called Thunderstorm. But that's been around for years, right? Well, yes, but now it's becoming even more intriguing. Let's see how and why this may be relevant to someone in a data-related discipline like ours. First of all, the code base of Thunderstorm has been refactored significantly since the last time I wrote about it. These days, it features ten script files, some of which are relevant in data science work, too (e.g., ectropy_lite.jl) or even simulation experiments (e.g., random.jl, the script, not the package!). One of the newest additions to this project is a simple key generation stream (keygen) based on a password. Although this is not true randomness, it's relatively robust in the sense that no repeating patterns have emerged in any of the experiments on the files it produced. Some of the key files were several MB in size. So, even though these keys are not as strong as something made using true randomness (a TRNG method), they are still random enough for cryptographic tasks. What's super interesting (at least to me and maybe some open-minded cryptographers) is a new method I put together that allows you to refresh a given key file. Naturally, the latter would be something employing true randomness, but the particular function would work for any file. This script, which I imaginatively named keys.jl, is one I've developed a GUI for too. Although I doubt I'll make Thunderstorm open-source in the foreseeable future (partly because most people are still not aware of its value-add in the quantum era we are in), I plan to keep working on it. Maybe even build more GUIs for the various methods it has. 
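To make the idea of a password-based key stream a bit more concrete, here is a minimal Python sketch. To be clear, this is not Thunderstorm's keygen (that code is written in Julia and isn't public); it is just one common way of doing something similar with the standard library, assuming PBKDF2 for stretching the password and SHAKE-256 for expanding the result into key material of any size. The function name, the salt, and the iteration count are all hypothetical choices made for the sake of the example.

```python
import hashlib


def keystream_from_password(password: str, salt: bytes, n_bytes: int,
                            iterations: int = 200_000) -> bytes:
    """Derive n_bytes of deterministic, pseudorandom key material from a password.

    Illustrative only: this is NOT the Thunderstorm keygen, just a
    standard-library approach (PBKDF2 followed by SHAKE-256 expansion).
    """
    # Stretch the password into a 32-byte seed; more iterations make guessing slower.
    seed = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Expand the seed to the requested length with an extendable-output function.
    return hashlib.shake_256(seed).digest(n_bytes)
```

Calling this function twice with the same password and salt yields the same bytes, which is what makes such a key reproducible. As with the keygen described above, the output is pseudorandom rather than truly random, so it can never be stronger than the password behind it.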
The benchmarking I did a couple of months back was very promising for all of its variants (yes, there are variants of the cipher method now), so that's nice. In any case, it's good to protect your data files in whatever way you can. What better way than a cipher for doing this, especially if PII is involved? The need for protecting sensitive data increases further if you need to share it across insecure channels, like most web-based platforms. Also, even if something is encrypted, lots of metadata from it can spill over since the encrypted file's size is generally the same as that of the original file. Well, that's not the case with the original version of Thunderstorm, which tinkers with that aspect of the data too. So, even metadata mining isn't all that useful if a data file is encrypted with the Thunderstorm cipher. I could write about this topic until the cows come home, so I’ll stop now. Stay tuned for more updates on this cryptographic system (aka cryptosystem) geared towards confidentiality. In the meantime, feel free to check out my Cybersecurity-related material on WintellectNow, for more background information on this subject. Cheers!

I have a hectic week, so I didn't have a chance to post an article this past Monday. I probably won't be posting anything till next week. In the meantime, you can check out some of my older articles that you haven't had a chance to read yet. Anyway, I'm working on some cool projects these days, a couple of which I'll be posting about in the weeks to come, so stay tuned. Thank you for your patience!

I've talked a lot about GPUs and their value in data science and AI work, but let's look at the numbers. In this article, which I learned about recently, various servers equipped with state-of-the-art graphics cards are tested on certain common AI-related tasks. If it piques your interest, you can check out the actual server leasing options Hostkey offers.
More information on that is available here or on the corresponding page of this site. Cheers!

Just like other pieces of hardware, graphics cards continuously evolve, becoming more efficient and more powerful. This constant evolution is partly due to the need for more computing power, whatever form it takes. Graphics cards are no exception, and lately, NVIDIA has come to dominate this technology scene. This month, a new set of graphics cards by this company (the GeForce RTX 30 series) is making its debut. Naturally, this is bound to have ripple effects in data science, among other fields. In this article, we'll look at just that and see how you can benefit from this new development.

Graphics cards have been used successfully in cloud computing as well as in other machines (e.g., regular PCs). The idea is to use their GPUs for crunching the numbers, instead of just standard CPUs, like those found on a typical computer. Although a GPU is part of a graphics card and designed to handle graphics data, it can be leveraged to handle complex mathematical calculations in various data science models. AI-related models in particular, such as deep learning networks, have a lot to gain from GPUs for various reasons. Not only are GPUs fast and efficient (i.e., not consuming much power), but they are also inexpensive, and setups using them can scale up very well. So, if you want to train that deep learning network quickly and without it costing the moon, GPUs are your best option.

The new GPU servers based on the new NVIDIA graphics cards can do this task even better. Featuring speeds up to twice as high as those of the previous generation cards (i.e., RTX 20), they are genuinely efficient. Additionally, they feature more memory (up to 24 GB GDDR6X, which is more than twice as much as the previous generation) and a different architecture altogether (Ampere vs. Turing previously). All this translates into a better experience for the user, particularly if that user has high graphics card demands.
As more and more people use graphics cards in their AI-related work, these cards' manufacturers try to address this requirement in their new products. Of course, not all of them succeed, but those that do are big successes. Perhaps that's why NVIDIA has become a name you'd hear not only among gamers but also among data scientists and AI professionals.

Hostkey is one of those companies that have figured out the edge such state-of-the-art graphics cards can offer in cloud computing. That's why it boasts such GPU-powered servers among the various services it offers. Geared mostly towards data scientists, Hostkey has various packages available, many of which involve GPU servers. What's more, it has lately started to offer servers with one of the latest NVIDIA cards on them (GeForce RTX 3080), which we expect to see released next week. Not only that, but it has a raffle for a free one-month subscription to this service, involving such a GPU server. Check out the company's website for more information. Cheers!

It may seem surprising that a page like this would exist on this blog. After all, this is a blog on data science and A.I. Well, regardless of our field, we all need to write from time to time, be it for a blog, a report, or even the documentation that accompanies our work. Since writing in a grammatically correct way, devoid of typos, doesn't come naturally to most of us, an online service like Grammarly can come in handy.
A fellow writer recommended this service to me about a year back. Although my texts were pretty decent, I found that I'd still make some mistakes from time to time, or build sentences that weren't easy to follow. So, I took up the suggestion and started using Grammarly for some of my articles. The result in terms of engagement was evident from the very beginning. As a result, I've been using Grammarly ever since, and it's now part of my pipeline when it comes to publishing articles on this blog. So, when I promote this service, it's out of my empirical understanding of its value and an appreciation of the tech behind it. For example, did you know that it uses deep learning and natural language processing (NLP) on the back-end? It also evaluates text based on different styles and objectives, giving you an overall score, all while pinpointing errors and points of improvement. For each one of these mistakes, it provides suggestions on how you can correct it and a rationale so that you learn from it. What more can you ask for?

I invite you to try it out in your browser using this affiliate link and, if you see merit in it, register for the paid version. Using this link, you can also contribute to this ad-free blog by helping cover some of its expenses. Cheers!
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy approach to technology, particularly related to A.I.