January 28th, 2021
In a nutshell, Quantum Computing is the computing paradigm that uses quantum properties in computer systems, such as superposition and quantum tunneling. Experts consider quantum computing quite advanced and the state-of-the-art of computing today, even if the specialized hardware it uses makes it a bit of a niche case study. Despite its numerous merits, quantum computing is not a panacea, even though it is considered relevant in our field, particularly in A.I. This article will explore this relationship, where Q.C. is right now, and where you can access it.
Quantum computers' performance is usually measured in Qubits (quantum bits) instead of traditional bits. Each qubit is a quantum particle in superposition and corresponds to the more rudimentary piece of data a quantum computer can handle. Qubits are not easy to maintain, and when they work in tandem, it's quite probable for the superposition to collapse unexpectedly, resulting in errors in the computations involved. So, having a certain number of qubits (the larger, the better) in a computer is quite an accomplishment. Larger numbers enable quantum computer users to tackle more challenging problems, potentially adding more value to the project at hand.
Right now, quantum computing is at a stage where the number of qubits they can handle is in the two digits. For example, IBM's quantum machine boasts 65 qubits, although the company has plans for much larger numbers soon (they expect to have a quantum machine with 1000+ qubits by 2023). However, it's important to note that each company uses a somewhat different approach, meaning that the qubits in each computer they produce are not directly comparable to each other.
Now, what about the potential disruption in data science and A.I. work due to quantum computing? Well, since our field often involves lots of heavy computations, some of them around NP-hard problems in combinatorics (e.g., selected the optimal set of features from a feature set), it can surely gain from quantum computers. That's not to say, however, that every data science or A.I. project can experience a boost from the Q.C. world. Simpler models and standard ETL work are bound to remain the same while using a quantum machine for them would waste these pricy computational resources. So, it's more likely that a combination of traditional computing and quantum computing will be normal once quantum machines become more commonplace. Additionally, for optimization-related problems, particularly those involving many variables, quantum computing may have a lot to offer. Still, whether it's worth the price is something that needs to be determined on a case-by-case basis.
Let’s now look at the various quantum computing vendors out there. For starters, we have Amazon with its AWS Quantum Computing center at Caltech. Microsoft is also a significant player, with its Azure Quantum service, utilizing a specialized language (Q#). IBM is a key player, too, along with D-Wave Systems, the two being the first to develop this technology. Google Research is Alphabet's division for this tech and is now also a player in this area. What's more, there are hardware companies too in this game, such as Intel, Toshiba, and H.P. Naturally, all the companies that have developed their Q.C. product enough to make it available do so via a cloud, since it's much more practical this way. For those who like the cloud but don't have the budget or the project that lends itself to quantum machines, the Hostkey cloud provider relies on conventional computers, including some with GPUs onboard.
You can learn more this and other relevant topics to A.I. and data science, though my book A.I. for Data Science: Artificial Intelligence Frameworks and Functionality for Deep Learning, Optimization, and Beyond. In this book, my co-author and I cover various aspects of data science work related to A.I., as well as A.I.-specific topics, such as optimization. What’s more, the book has a hands-on approach to this subject, with lots of code in both Python and Julia. So, check it out when you can. Cheers!
Graphic cards deal with lots of challenging operations related to the number-crunching of image and video data. Since the computer's CPU, which traditionally manages this sort of task, has lots of stuff on its plate, it's usually the case that the graphics card has its own processor for handling all the data processing. This processor is referred to as GPUs (a CPU specializing in graphics data) and plays an essential role in our lives today, even when we don't care about the graphics on our computer. As we've seen in the corresponding book I've co-authored, it's crucial for many data science and AI-related tasks. In this article, we'll look at the latest information on this topic.
First thing's first: data science and A.I. needing GPUs is a modern trend, yet it's bound to stick around for the foreseeable future. The reason is simple: many modern data science models, especially those based on A.I. (such as large-scale Artificial Neural Networks, aka, Deep Networks), require lots of computing power to train. This additional computing requirement is particularly the case when there is lots of data involved. As CPUs come at a relatively higher cost, GPUs are the next best thing, so we use them instead. If you want to do all the model training and deployment on the cloud, you can opt for servers with extra GPUs for this particular task. These are referred to as GPU servers and are a decisive factor in data science and A.I. today.
What's the catch with GPUs, though? Well, first of all, most computers have a single graphics card, meaning limited GPU power on them. Even though they are cheaper than CPUs, they are still a high cost if you have large DNNs in your project. But the most critical impediment is that they require some low-level expertise to get them to work, even though it's simpler than building a computer cluster. That's why more often than not, it makes more sense to lease a GPU server on the cloud rather than build your own computer configuration utilizing GPUs. Besides, the GPU tech advances rapidly, so today's hot and trendy may be considered obsolete a couple of years down the road.
Beyond the stuff mentioned earlier, there are some useful considerations that are good to have in mind when dealing with GPUs in your data science work. First of all, GPUs are not a panacea. Sometimes, you can get by with conventional CPUs (e.g., standards cloud servers) for the more traditional machine learning models and the statistical ones. What's more, you need to make sure that your deep learning framework is configured correctly and leverages the GPUs as expected. Additionally, you can obtain extra performance from GPUs and CPUs if you overclock them, which is acceptable as a last resort if you need additional computing power.
For GPU servers that are state-of-the-art yet affordable, you can check out Hostkey. This company fills the GPU server niche while providing conventional server options for your data science projects. Its key advantage is that it optimizes the performance/cost metric, meaning you get a bigger bang for your buck in your data models. So, check it out when you have a moment. Cheers!
Handling PII in Data Projects
Personally Identifiable Information, or PII for short, is an essential aspect of data science work today. It involves sensitive data that can compromise the identity of at least some of the people involved in a dataset (e.g., someone's name, financial data, address, phone number, etc.). PII is particularly important today as it's protected by law in many countries, and any violation of this sort of data can fetch huge fines. What's more, PII is often essential in data science projects as it carries useful information that can bring about a sense of personalization to the data products developed.
Due to various factors, such as using multiple data streams, datasets used in data science today are full of PII. Note that PII can result from a combination of variables since there isn't an infinite amount of people. As a result, given enough information-rich variables, you can predict several PII variables with reasonable accuracy. This ability to predict PII makes the problem even more severe since PII can be a serious liability if it leaks. As there is plenty of it in modern datasets, the risk of this happening grows with the more data you gather for your data science project.
Of course, you could remove PII from your dataset, but it's not always a good option. After all, much of this PII is useful information that can help with the models built. So, even if you can eliminate certain variables, the bulk of PII will need to be retained for the models at hand to be useful and a value-add to your data science project. As for obscuring the PII variables (e.g., through a method like PCA), this is also a valid option. However, with it, any chance of transparency in your models goes out the window.
Fortunately, you can protect PII with various cybersecurity methods, without compromising your models' performance or transparency. Encryption, for example, is one of the most widely used techniques to keep data secure as it's turned into gibberish when not in use. In some cases, even in that gibberish state, you can perform some operations for additional security. However, in most cases, the protection is there for the time the data is in transit, which is when it's also the most vulnerable.
Since nowadays the use of the cloud for both storing and processing data is commonplace, the risk of exposing PII is more significant than ever. Fortunately, it's not too difficult to have security even in these situations, as long as the cloud provider offers this protection level. It just needs to have that in the platform it uses and in all the network connections involved.
Hostkey is a Dutch company providing cloud services, targeted towards data science professionals. Just like most modern cloud providers, it offers high-quality cybersecurity for all the data handled. At the same time, if you are super serious about this matter, you can also lease a dedicated server from it. Additionally, Hostkey offers GPU servers, which are a bigger bang for your buck when it comes to high-performance data models, such as deep learning ones. So, check out this cloud company and see how you can benefit from its services. Cheers!
Cloud Computing is the use of external computing resources for storage and computing tasks via the internet. As for the cloud itself, it's a collection of servers dedicated to this task and are made available, usually through some paid licensing, to anyone with an internet connection. Naturally, these servers are scalable, so you can always increase the storage and computing power you lease from a cloud provider. Although the cloud was initially used for storing data, e.g., for back-up purposes, it's used for various tasks, including data science work.
There are various kinds of machines used for cloud computing, depending on the tasks outsourced to them. For starters, there are the conventional (CPU) servers, used for storing and light computation. Most websites use this cloud computing option, and it's the cheapest alternative for cases where more specialized servers are utilized. However, for small-scale data science projects, especially those employing basic data models, these servers work well.
Additionally, there are the GPU servers that are more affordable for the computational resources they provide. Although GPUs are geared towards graphics-related work (e.g., the rendering of a video), they are well-suited for AI-based models. The latter make use of a lot of computations for the training phase of their function. As more and more data becomes available, this computational cost can only increase. So, having a scalable cloud computing solution that uses this type of server is the optimal strategy for deploying such a model.
Finally, there are also servers with large amounts of RAM, like the regular servers, but with plenty of extra RAM. Such servers are ideal for any use case where lots of data is involved, and it needs to be processed in large chunks. Many data science models fall into this application category since RAM is a valuable resource when large datasets are involved. Multimedia data, in particular, requires lots of memory to be processed at a reasonable speed, even for models that don't need any training.
Cloud computing has started to dominate the data science word lately. This phenomenon is partly due to the use of all-in-one frameworks, which take care of various tasks. These frameworks usually run on the cloud since they require many resources due to the models they build and train. As a result, unless there is a reason for data science work to be undertaken in-house, it is usually outsourced on the cloud. After all, most cloud computing providers ensure high-level encryption throughout the pipeline. The presence of cybersecurity mitigates the risk of data leaks or the compromise of personally identifiable information (PII) that often exists in datasets these days.
A great place that offers cloud computing options for data science is Hostkey. Apart from the conventional servers most hosting companies offer, this one provides GPU servers too. What's more, everything is at a very affordable price tag, making this an ideal solution for the medium- and the long-term. Check out the company's website for more information. Cheers!
Just like other pieces of hardware, graphic cards continuously evolve, becoming more efficient and more powerful. This constant evolution is partly due to the need for more computing power, whatever the form it can take. Graphic cards are no exception, and lately, NVIDIA has come to dominate this technology scene. This month, a new set of graphic cards by this company (the GeForce RTX 30 series) is making its debut. Naturally, this is bound to have ripple effects in data science, among other fields. In this article, we'll look at just that and see how you can benefit from this new development.
Graphic cards have been used in cloud computing successfully and other machines (e.g., regular PCs). The idea is to use their GPUs for crunching the numbers, instead of just standard CPUs, like those found on a typical computer. Although a GPU is part of a graphics card and designed to handle graphics data, it can be leveraged to handle complex mathematical calculations in various data science models. Particularly AI-related models, such as deep learning networks, have a lot to benefit from GPUs for various reasons. Not only are they fast and efficient (i.e., not consuming much power), but also they are inexpensive and can setups using them scale up very well. So, if you want to train that deep learning network quickly and without costing the moon, GPUs are your best option.
The new GPU servers based on the new NVIDIA graphics card can do this task even better. Featuring speeds up to twice as high as those of the previous generation cards (i.e., RTX 20), they are genuinely efficient. Additionally, they feature more memory (up to 24 GB GDDR6X, which is more than twice as much as the previous generation) and a different architecture altogether (Ampere vs. Turing previously). All this translates into a better experience for the user, particularly if that user has high graphics card demands. As more and more people use graphics cards in their AI-related work, these cards' manufacturers try to address this requirement in their new products. Of course, not all of them succeed, but those that do are big successes. Perhaps that's why NVIDIA has become a name you'd hear not only among gamers but also among data scientists and AI professionals.
Hostkey is one of those companies that have figured out the edge such state-of-the-art graphics cards can offer in cloud computing. That's why it boasts such GPU-powered servers among the various services it offers. Geared mostly towards data scientists, Hostkey has various packages available, many of which involve GPU servers. What's more, lately, it has started to offer servers with one of the latest NVIDIA cards on them (GeForce RTX 3080), which we expect to see released next week. Not only that, but it has a raffle for a free one-month subscription to this service, involving such a GPU server. Check out the company's website for more information. Cheers!
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy approach to technology, particularly related to A.I.