Every year, there is a data modeling conference that takes place around the world. Its name is Data Modeling Zone, or DMZ for short (not to be confused with the DMZ in Korea, which isn't that good a place for data professionals!). Just like last year and the year before that, this year, I'll be participating in the conference as a speaker, talking about data science- and AI-related topics.
Namely, I'll talk about the common misconceptions about machine learning, something you may remember from my previous books. This talk, however, will cover the topic in more depth and help even newcomers to the field distinguish between the hype and the reality of machine learning. After my presentation, there will be some time for Q&A, so if you have any burning questions about this topic, this is your chance to have them answered.
Just like last year, DMZ is going to be online this year, making it super easy for you to attend, regardless of where you are. Also, there are plenty of interesting talks on various data-related topics, as you can see from the conference’s program.
I hope to see you there this November 18th!
The latter has been something I've been looking into for a while now. However, my skill set didn't allow for it until recently, when I started working with GUIs for shell scripting. So, if you have a Linux-based OS, you can now use a GUI for a couple of methods in the Thunderstorm system. Well, provided I release the code for it someday.
Alright, enough with the drama. This blog isn't FB or some other overly sensational platform. However, if you've been following my work since the old days, you may be aware of the fact that I've developed a nifty cipher called Thunderstorm. But that's been around for years, right? Well, yes, but now it's becoming even more intriguing. Let's see how and why this may be relevant to someone in a data-related discipline like ours.
First of all, the code base of Thunderstorm has been refactored significantly since the last time I wrote about it. These days, it features ten script files, some of which are relevant to data science work, too (e.g., ectropy_lite.jl), or even to simulation experiments (e.g., random.jl, the script, not the package!). One of the newest additions to this project is a simple key stream generator (keygen) based on a password. Although its output is not truly random, it's relatively robust in the sense that no repeating patterns have emerged in any of the experiments on the files it produced, some of which were several MB in size. So, even though these keys are not as strong as something made using true randomness (a TRNG method), in my experiments they have been random enough for cryptographic tasks.
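To make the idea of a password-based key stream more concrete, here is a minimal Python sketch. It is not the Thunderstorm keygen (that code isn't public); it simply assumes a standard construction: stretch the password with PBKDF2 and then expand the result by hashing it together with a counter. The function names, the salt, and the iteration count are all hypothetical choices made for illustration, and the repetition check is only a crude sanity test, not a proof of cryptographic strength.

```python
import hashlib


def password_keystream(password: str, salt: bytes, n_bytes: int) -> bytes:
    """Deterministically expand a password into n_bytes of pseudorandom data.

    Illustrative construction only (an assumption, not Thunderstorm's method).
    """
    # Stretch the password into a 32-byte seed with PBKDF2-HMAC-SHA256.
    seed = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    out = bytearray()
    counter = 0
    # Hash seed || counter repeatedly until enough bytes have been produced.
    while len(out) < n_bytes:
        out.extend(hashlib.sha256(seed + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:n_bytes])


def has_repeated_block(data: bytes, block_size: int = 16) -> bool:
    """Crude check: does any aligned block of block_size bytes occur twice?"""
    seen = set()
    for i in range(0, len(data) - block_size + 1, block_size):
        block = data[i:i + block_size]
        if block in seen:
            return True
        seen.add(block)
    return False


# Generate a 1 MB key file's worth of bytes and look for repeated blocks.
key = password_keystream("correct horse battery staple", b"demo-salt", 1_000_000)
print(len(key), has_repeated_block(key))
```

The same password and salt always yield the same bytes, which is what makes such a key reproducible from a password alone, and also why it can never be stronger than the password itself.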
What's super interesting (at least to me and maybe some open-minded cryptographers) is a new method I put together that allows you to refresh a given key file. Naturally, the latter would be something employing true randomness, but the particular function would work for any file. This script, which I imaginatively named keys.jl, is one I've developed a GUI for too.
Although I doubt I'll make Thunderstorm open-source in the foreseeable future (partly because most people are still not aware of its value-add in the quantum era we are in), I plan to keep working on it, and maybe even build more GUIs for its various methods. The benchmarking I did a couple of months back was very promising for all of its variants (yes, there are variants of the cipher method now), so that's nice.
In any case, it's good to protect your data files in whatever way you can. What better way than a cipher for doing this, especially if PII is involved? The need for protecting sensitive data increases further if you need to share it across insecure channels, like most web-based platforms. Also, even if something is encrypted, some metadata can still leak, since the encrypted file's size is generally the same as that of the original file. Well, that's not the case with the original version of Thunderstorm, which tinkers with that aspect of the data too. So, even metadata mining isn't all that useful if a data file is encrypted with the Thunderstorm cipher.
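One generic way to keep a file's size from giving information away is to pad the data to a fixed bucket size before encrypting it. The Python sketch below illustrates that general idea only; it is an assumption on my part and not a description of how Thunderstorm alters file sizes. The function names, the 4096-byte bucket, and the 8-byte length header are hypothetical choices.

```python
import os
import struct


def pad_to_bucket(data: bytes, bucket: int = 4096) -> bytes:
    """Prefix the true length, then pad with random bytes to a multiple of bucket."""
    header = struct.pack(">Q", len(data))        # 8-byte big-endian length
    total = len(header) + len(data)
    padded_len = -(-total // bucket) * bucket    # round up to the next bucket
    return header + data + os.urandom(padded_len - total)


def unpad(blob: bytes) -> bytes:
    """Recover the original data using the length stored in the header."""
    (n,) = struct.unpack(">Q", blob[:8])
    return blob[8:8 + n]


# Two inputs of very different sizes end up with the same padded size.
small = pad_to_bucket(b"x" * 10)
larger = pad_to_bucket(b"x" * 3000)
print(len(small), len(larger))  # prints: 4096 4096
```

The padded blob is what would then be handed to the cipher, so an observer who sees only the encrypted file learns the bucket it falls into rather than the exact original size.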
I could write about this topic until the cows come home, so I'll stop now. Stay tuned for more updates on this cryptographic system (aka cryptosystem) geared towards confidentiality. In the meantime, feel free to check out my cybersecurity-related material on WintellectNow for more background information on this subject. Cheers!
I'm having a hectic week, so I didn't have a chance to post an article this past Monday, and I probably won't be posting anything till next week. You can take the time to check out some of my older articles that you haven't had a chance to read yet. Anyway, I'm working on some cool projects these days, a couple of which I'll be posting about in the weeks to come, so stay tuned. Thank you for your patience!
I've talked a lot about GPUs and their value in data science and AI work, but let's look at the numbers. An article I came across recently tests various servers equipped with state-of-the-art graphics cards on certain common AI-related tasks. If it piques your interest, you can check out the actual server leasing options Hostkey offers. More information on that is available here or on the corresponding page of this site. Cheers!
Just like other pieces of hardware, graphics cards continuously evolve, becoming more efficient and more powerful. This constant evolution is partly due to the need for more computing power, whatever form it takes. Lately, NVIDIA has come to dominate this technology scene. This month, a new set of graphics cards by this company (the GeForce RTX 30 series) is making its debut. Naturally, this is bound to have ripple effects in data science, among other fields. In this article, we'll look at just that and see how you can benefit from this new development.
Graphics cards have been used successfully in cloud computing as well as on other machines (e.g., regular PCs). The idea is to use their GPUs for crunching the numbers, instead of just standard CPUs, like those found in a typical computer. Although a GPU is part of a graphics card and designed to handle graphics data, it can be leveraged to handle complex mathematical calculations in various data science models. AI-related models in particular, such as deep learning networks, have a lot to gain from GPUs for various reasons. Not only are GPUs fast and efficient (i.e., not consuming much power), but they are also inexpensive, and setups using them can scale up very well. So, if you want to train that deep learning network quickly and without it costing the moon, GPUs are your best option.
The new GPU servers based on the new NVIDIA graphics card can do this task even better. Featuring speeds up to twice as high as those of the previous generation cards (i.e., RTX 20), they are genuinely efficient. Additionally, they feature more memory (up to 24 GB GDDR6X, which is more than twice as much as the previous generation) and a different architecture altogether (Ampere vs. Turing previously). All this translates into a better experience for the user, particularly if that user has high graphics card demands. As more and more people use graphics cards in their AI-related work, these cards' manufacturers try to address this requirement in their new products. Of course, not all of them succeed, but those that do are big successes. Perhaps that's why NVIDIA has become a name you'd hear not only among gamers but also among data scientists and AI professionals.
Hostkey is one of those companies that have figured out the edge such state-of-the-art graphics cards can offer in cloud computing. That's why it boasts such GPU-powered servers among the various services it offers. Geared mostly towards data scientists, Hostkey has various packages available, many of which involve GPU servers. What's more, lately, it has started to offer servers with one of the latest NVIDIA cards on them (GeForce RTX 3080), which we expect to see released next week. Not only that, but it has a raffle for a free one-month subscription to this service, involving such a GPU server. Check out the company's website for more information. Cheers!
It may seem surprising that a page like this would exist on this blog. After all, this is a blog on data science and A.I. Well, regardless of our field, we all need to write from time to time, be it for a blog, a report, or even the documentation that accompanies our work. Since writing in a grammatically correct, typo-free way doesn't come naturally to most of us, an online service like Grammarly can come in handy.
I was recommended this service about a year back by a fellow writer. Although my texts were pretty decent, I found that I'd still make some mistakes from time to time, or build sentences that weren't easy to follow. So, I took up the suggestion and started using Grammarly for some of my articles. The result in terms of engagement was evident from the very beginning. As a result, I've been using Grammarly ever since. At the same time, it's now part of my pipeline when it comes to publishing articles on this blog.
So, when I promote this service, it's out of my empirical understanding of its value and an appreciation of the tech behind it. For example, did you know that it uses deep learning and natural language processing (NLP) on the back-end? It also evaluates text based on different styles and objectives, giving you an overall score, all while pinpointing errors and points of improvement. For each of these mistakes, it provides a suggestion on how you can correct it and a rationale so that you learn from it. What more can you ask for?
I invite you to try it out in your browser using this affiliate link and, if you see merit in it, register for the paid version. By using this link, you can also contribute to this ad-free blog by helping cover some of its expenses. Cheers!
In a previous article we talked about the value of data modeling and how it is related to data science as a field. Now let’s look at some great ways to learn more about this field.
Specifically, Technics Publications offers a few classes/workshops on data modeling this Autumn:
What’s more, you can get a 20% discount on them, if you use the coupon code DSML. You can use the same code for most of the books available on that site. Check it out!
Being an author has many benefits, some of which I’ve mentioned in a previous article. After all, an author (particularly a technical author) is more than just a writer. The former has undergone the scrutiny of the editing process, usually undertaken by professionals, while a writer may or may not have done the same. Also, an author has seen a writing project to its completion and has gotten a publisher to put his or her stamp of approval on that manuscript, before making it available to a larger audience. This raises the stakes significantly and adds a great deal of gravity to the book at hand.
Being an author is its own reward (even though there are other tangible rewards to it too, such as the royalty checks every few months!). However, there is a benefit that is much less obvious, although it is particularly useful. Namely, an author can appreciate other authors more and learn from them. This is something that I came to learn with my first book, and this appreciation has reached new heights since then. This is especially the case when it comes to veteran authors who have developed more than one book.
All this leads to an urge to read more books and get more out of them. This is due to the value an author sees in these books. Instead of just a collection of words and ideas, an author views a book as a sophisticated structure comprising many layers. Even simple things like graphics take on a new meaning. Of course, much of this detailed view of a book is a bit technical, but the appreciation that this extra attention contributes to is something that lingers long after the book is read.
Nevertheless, you don't need to be an author to have the same appreciation towards other people's books. This is something that grows the more you practice it and can evolve into a sense of discernment, distinguishing books worth having on your bookshelf from those that you are better off leaving at the store! At the very least, this ability can help you save time and money, since it can help you focus on those books that have the most to offer you.
In my experience, Technics Publications has such books worth keeping close to you, particularly if you are interested in data-related topics. This includes data science but also other disciplines like data modeling, data governance, etc. There is even a book on Blockchain, a technology that goes beyond its cryptocurrency applications, which I found very educational when I was looking into it. Anyway, since good books come at a higher cost, you may want to take advantage of a special promo the publisher is doing, which gives you a 20% discount on all books, except the DMBOK ones. To get this discount, just use the DSML coupon code at the checkout (see image below).
Note that this coupon code also applies to virtual classes offered by Technics Publications (i.e., the virtual training courses in the ASK series). This, however, is a topic for another article. Cheers!
Hi everyone. Since I'm exploring a different avenue for data science education these days, I've put together another webinar that's just 3 weeks away (May 18th). If you are interested in AI, be it as a data science professional or a stakeholder in data science projects, this is something that can add value for you. Also, you'll have a chance to ask me questions directly and, if time allows, even have a short discussion on this topic.
Note that due to the success of previous webinars on the Technics Publications platform, the price of each webinar has risen. However, this upcoming webinar, which was originally designed as a talk for an international conference in Germany, is still at the very accessible price of $14.99. Feel free to check it out here and spread the word to friends or colleagues. You can also learn about the other webinars this platform offers through the corresponding web page. Cheers!
I haven't had a chance to prepare an article for my blog lately. Between helping out a friend of mine and preparing for my webinar this Thursday, I didn't have the headspace to write anything. Nevertheless, one of the articles I wrote for my friend's initiative, related to mentoring, is now available on Medium. Feel free to check it out!
As for the webinar, it's about the data science mindset, a topic I've talked about in all of my books, particularly the Data Science Mindset, Methodologies, and Misconceptions one. At the time of this writing, there are still some spots available for the webinar, so if you are interested, feel free to register for it here.
On another note, my latest book is almost ready for the review stage so I'll be working on that come Friday. Stay tuned for more details in the weeks to come...
That's all for now. I hope you have a great week. Stay healthy and positive!
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy approach to technology, particularly related to A.I.