As you may have noticed, data analytics is a constantly evolving field, so it's not surprising to see data science change from year to year. What was hot and trendy in 2020 may soon be less prominent, and vice versa. That's not to say that you should expect drastic changes in 2021, but it's good to adjust your expectations, taking into account the latest trends. In this article, we'll explore these trends and see how you can benefit from these insights in your data science career.
It's no secret that deep learning is gaining even more popularity, particularly in time series analysis. As a result, RNNs (Recurrent Neural Networks) are bound to rise in demand as a skill, particularly if you are involved in forecasting. Additionally, healthcare seems to be aligning more closely with this tech, so more medicine-related organizations are likely to be looking for data scientists to join their ranks. What's more, IoT is expected to incorporate AI, making our work more relevant to infrastructure projects. Moreover, we should expect Reinforcement Learning (RL) to grow further as its use cases, such as chatbots, grow in popularity. Finally, more and more people seem to be becoming aware of the benefits of data science and AI, so it's easier than ever to make the business case for advanced analytics in a company. At the same time, the cloud provides a viable solution for the required hardware, something that's bound to stick in the coming years.
Based on the above, it’s reasonable to deduce that the data science specializations most likely to be relevant this coming year are AI expert, data engineer (particularly one geared towards machine learning, aka machine learning engineer), and data scientist with domain knowledge in healthcare or IoT. Of course, Natural Language Processing (NLP) experts are bound to remain in demand, particularly if they possess chatbot know-how.
Beyond all that, it’s important to remember that the one thing that’s bound to remain relevant in the years to come, regardless of these trends, is the data science mindset. This mindset involves various aspects, such as problem-solving skills, creativity applied in analytics work, meeting deadlines, and collaborating with other data professionals, to name a few. The data science mindset is our attitude towards the data science problems we have to solve. As such, it's something essential and perhaps more relevant than whatever skill is in vogue at any given time.
You can learn more about the data science mindset and other relevant topics in this field through one of my books, titled Data Science Mindset, Methodologies, and Misconceptions. There, I explore the various aspects of the field without getting too technical, all while highlighting the skills that make up the data science mindset. I include some soft skills as well as some hard ones that are still relevant today, even if some of the tools have evolved since the book came out. So, check it out when you have a moment. Cheers!
Data science work entails a large number of tasks spanning the data analytics spectrum, with an emphasis on predictive analytics. It involves a lot of investigation of the data at hand (Exploratory Data Analysis), the use of advanced math (e.g., Graph Analytics and Optimization), and some understanding of the domain to facilitate communication with the project's stakeholders. It also includes querying databases, combining data from various sources (sometimes in real time), and putting all the findings together in a narrative that's jargon-free and easy to follow. Often, it isn't feasible for a single individual to do all this, which is why data science teams are commonplace, particularly in larger organizations.
Data science consultancy involves performing these tasks (or at least some of them) on a project basis, without being part of the organization. In this case, the data scientist is a guest star of sorts, working with analysts, data architects, BI professionals, or whoever manages such projects in that organization. She needs to ask questions to understand the problem at hand, what is required of her, the bigger picture of the project, and the data involved. Such an engagement can take several months and usually starts with a proof-of-concept project, particularly if the organization is new to data science. Naturally, because of the overheads of consultancy, a consultant data scientist is paid more, and it's not uncommon for the organization to cover logistical costs and other expenses as well.
It would make more sense for the consultant data scientist to be at the organization permanently, cutting down the costs, right? Well, although that's true in theory, not every organization out there has the budget for an in-house data scientist. Often, the managers involved are not convinced of the value-add of data science, which is why they are more willing to work with consultants, even if that means paying more in the short term. Besides, a data science consultant is bound to be better value for money since they focus on quality and good customer service. Many data science consultants also have considerably more experience and broader domain knowledge, making them a valuable asset, particularly if you need something done swiftly. Also, note that certain organizations have a strict recruiting policy, so it's much easier for them to hire a data science consultant than to go through the whole process of bringing a full-time employee on board, especially if they aren't sure about their budget in the years to come.
Since this is a very broad topic and it’s hard to do it justice in a single article, I decided to focus on its highlights. If you wish to learn more about data science work in practice, along with other business-related matters relevant to the data scientist role, I invite you to check out the Data Scientist Bedside Manner book I co-authored earlier this year. In it, we cover this topic from various angles, along with some practical advice on how to bridge the technical and the non-technical world more smoothly and effectively. Cheers!
The holiday season is upon us, which, for many of us, translates to more free time. That’s why I decided to keep this article light and perhaps fun. After all, we all deserve a break after a year like this one! Regardless of your plans for the holidays, there are certain things you can do that are both enjoyable and educational.
For starters, if you are interested in programming (particularly recreational programming), you can check out the Exercism.io site. Exercism is an educational non-profit that aims to help people pick up a new programming language, including some of the less common choices like bash. The site comprises a series of exercises, some of which are part of a language track while others are self-paced. As you progress through a track, you unlock new exercises and explore new concepts in your language of choice. There is also optional mentoring, which can help you when you get stuck and/or show you better ways to solve the exercises through useful tips and hints.
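Exploratory Data Analysis often starts with a handful of descriptive statistics per variable. Here is a minimal Python sketch of that first pass, using only the standard library's statistics module; the summarize() helper and the order counts are hypothetical, made up for illustration.

```python
import statistics

def summarize(column):
    """Hypothetical helper: basic descriptive statistics for a list of numbers, skipping None."""
    values = [v for v in column if v is not None]
    return {
        "count": len(values),
        "missing": len(column) - len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }

# Hypothetical sample data: daily order counts with one missing entry.
orders = [12, 15, None, 9, 30, 14, 13]
summary = summarize(orders)
for name, value in summary.items():
    print(f"{name:>7}: {value}")
```

On this sample, the single high value (30) pulls the mean (15.5) above the median (13.5), the kind of quick signal that tells you where to dig further.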
Another thing you can do is watch some videos on data science, or even take a course on the subject of data science or A.I. I know it may seem like a lot, but there is a lot of good material out there if you know where to look, which can help you augment your skills and know-how. Also, the cost of all this is fairly low, compared to what it used to be, so this sort of material is more accessible than ever before.
If you are up for more hands-on activities, you can play around with some data and do a mini data science project. Pick a dataset you are interested in and see what insights you can dig out of it. The project doesn't need to be fully documented with explanatory text; even without it, it can be good practice for you. Bonus points if you use a new technique or method.
Furthermore, you can check out some data science articles to get up to speed on the latest trends or to view certain topics from a different perspective. This blog can be a good place to start. Of course, if you want to read something that covers the subject in more depth, you can check out my data science books. You can find all of them on the publisher’s website, along with other technical books on similar subjects (esp. data modeling). Also, if you apply the coupon code DSML at checkout, you can receive a 20% discount on any book you buy from that site.
So, there you have it. With these suggestions, you can make good use of your time without stressing. Besides, when you learn something this way, it tends to stick longer. Who knows, some of these activities may bear fruit that you can leverage in the new year. Happy holidays!
Graphics cards deal with lots of challenging operations related to the number-crunching of image and video data. Since the computer's CPU, which traditionally manages this sort of task, has lots of stuff on its plate, the graphics card usually has its own processor for handling all this data processing. This processor is referred to as a GPU (Graphics Processing Unit), essentially a processor specializing in graphics data, and it plays an essential role in our lives today, even when we don't care about the graphics on our computer. As we've seen in the corresponding book I've co-authored, it's crucial for many data science and AI-related tasks. In this article, we'll look at the latest information on this topic.
First things first: data science and A.I. needing GPUs is a modern trend, yet it's bound to stick around for the foreseeable future. The reason is simple: many modern data science models, especially those based on A.I. (such as large-scale Artificial Neural Networks, aka Deep Networks), require lots of computing power to train. This requirement is particularly pronounced when there is lots of data involved. As getting the same computing power from CPUs comes at a considerably higher cost, GPUs are the more cost-effective option, so we use them instead. If you want to do all the model training and deployment on the cloud, you can opt for servers with extra GPUs for this particular task. These are referred to as GPU servers and are a decisive factor in data science and A.I. today.
What's the catch with GPUs, though? Well, first of all, most computers have a single graphics card, which limits the GPU power available on them. Even though GPUs offer more computing power for the money than CPUs, they still add up to a high cost if you have large DNNs in your project. But the most critical impediment is that they require some low-level expertise to get them working, even though it's simpler than building a computer cluster. That's why, more often than not, it makes more sense to lease a GPU server on the cloud rather than build your own GPU-based computer configuration. Besides, GPU tech advances rapidly, so today's state-of-the-art may be considered obsolete a couple of years down the road.
Beyond the stuff mentioned earlier, there are some useful considerations to keep in mind when dealing with GPUs in your data science work. First of all, GPUs are not a panacea. Sometimes, you can get by with conventional CPUs (e.g., standard cloud servers) for the more traditional machine learning models and the statistical ones. What's more, you need to make sure that your deep learning framework is configured correctly and leverages the GPUs as expected. Additionally, you can obtain extra performance from GPUs and CPUs if you overclock them, which is acceptable as a last resort if you need additional computing power.
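One way to verify that your framework actually sees the GPU is to ask it directly. The sketch below assumes PyTorch as the deep learning framework (the article doesn't name one); torch.cuda.is_available() and torch.cuda.get_device_name() are part of PyTorch's API, while the describe_compute_device() wrapper is a hypothetical helper that also degrades gracefully when PyTorch isn't installed.

```python
def describe_compute_device():
    """Hypothetical helper: report whether PyTorch (if installed) can see a CUDA GPU."""
    try:
        import torch  # assumed framework; optional so the sketch runs anywhere
    except ImportError:
        return "cpu (PyTorch not installed)"
    if torch.cuda.is_available():
        # Name of the first visible CUDA device.
        return f"cuda: {torch.cuda.get_device_name(0)}"
    return "cpu (no CUDA device visible to PyTorch)"

device_info = describe_compute_device()
print(device_info)
```

If this reports CPU on a machine you know has a GPU, the usual suspects are a CPU-only build of the framework or a mismatch between the installed driver and the framework's CUDA version.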
For GPU servers that are state-of-the-art yet affordable, you can check out Hostkey. This company fills the GPU server niche while providing conventional server options for your data science projects. Its key advantage is that it optimizes the performance/cost metric, meaning you get a bigger bang for your buck in your data models. So, check it out when you have a moment. Cheers!
Data analytics is the field that deals with the analysis of data, usually for business-related objectives, though it applies to any kind of organization. A data analyst handles data with various tools, such as a spreadsheet program (usually MS Excel), a data visualization program (e.g. Tableau), a database program (e.g. PostgreSQL), and a programming language (usually Python), and then presents her findings using a presentation program (e.g. MS PowerPoint). This aids business decisions and provides useful insights into the state of an organization. It's akin to the Business Intelligence role, though a bit more hands-on and programming-related.
What about Statistics, though? Well, it is a potent data analytics tool, but whether it's something a data analyst actually needs is quite debatable. Apart from some descriptive stats that you are bound to use in one way or another, the bulk of Statistics is way too specialized and irrelevant to a data analyst. It doesn't hurt to know it, but it would be highly biased to promote this sort of knowledge (in most cases, it doesn't even qualify as know-how) when there are much more efficient and effective tools out there. For example, being able to handle the data coming from various sources and organize it, be it through an ETL tool or some data platform, is far more of a value-add than trying to do something that's often beyond the scope of your role as an analyst (e.g., an in-depth analysis of the data at hand through advanced data engineering or predictive modeling).
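To make the ETL point more concrete, here is a minimal extract-transform-load sketch in Python using only the standard library: a CSV export (inlined here as a hypothetical sample) is cleaned up and loaded into an in-memory SQLite database, which stands in for a real database such as PostgreSQL. The table and column names are assumptions for illustration.

```python
import csv
import io
import sqlite3

# Extract: a hypothetical CSV export from one source system, inlined for self-containment.
RAW_CSV = """region,date,sales
North,2021-01-04, 1200
south,2021-01-04,950
North,2021-01-05,
South,2021-01-05,1010
"""

def extract(text):
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Normalize region names, parse sales figures, and drop rows with no sales value."""
    clean = []
    for row in rows:
        sales = row["sales"].strip()
        if not sales:
            continue  # skip incomplete records
        clean.append((row["region"].strip().title(), row["date"], float(sales)))
    return clean

def load(records, conn):
    conn.execute("CREATE TABLE sales (region TEXT, date TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

Swapping SQLite for PostgreSQL would change the connection code and the placeholder style, but the three-step structure stays the same.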
Having said that, Statistics is useful in data science, particularly if you are not well-versed in more advanced methodologies, such as machine learning. That’s why most data science courses start with this part of the toolbox along with the corresponding programming libraries. Also, most time-series analysis models are Stats-based and data scientists are often required to work with them, at least as a baseline before proceeding to build more complex models. Moreover, if you want to test a hypothesis (something quite common in data science work), you need to make use of statistical tests.
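As an example of a statistical test, here is a minimal sketch of a two-sided permutation test for a difference in means, written with the Python standard library only. The A/B data is hypothetical, and the permutation test is just one option; a library such as SciPy offers classic alternatives like scipy.stats.ttest_ind.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(a, b, n_permutations=10_000, seed=42):
    """Two-sided permutation test for a difference in means between samples a and b."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # The +1 terms count the observed labeling itself, keeping the p-value above zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical A/B data: session lengths (minutes) under two page designs.
design_a = [5.1, 4.8, 6.0, 5.5, 4.9, 5.3, 5.0, 5.7]
design_b = [6.2, 6.8, 5.9, 7.1, 6.5, 6.0, 6.9, 6.4]
p_value = permutation_test(design_a, design_b)
print(f"p-value: {p_value:.4f}")
```

A small p-value (well below a threshold chosen in advance, such as 0.05) indicates that a difference this large would rarely arise if the group labels didn't matter.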
In data analytics, however, where the objective is somewhat different, Stats seems to be a somewhat unnecessary tool. Perhaps that's why most data analysts focus on other more practical and guaranteed ways to add value, such as dashboards, intuitive spreadsheets, and useful scripts, rather than building statistical models that few people care about. Besides, if someone needs something more in-depth and scientifically sound, they can always hire a couple of data scientists to work alongside the analysts.
Regardless of your role, you can learn more about data science and the mindset behind it in my book Data Science Mindset, Methodologies, and Misconceptions. Although it was published about 4 years ago, it remains relevant and can shed a lot of light on Statistics' role in this field, as well as other methodologies and tools used by data scientists. Check it out when you have some time. Cheers!
As you may already know, Julia is a multi-paradigm programming language (with strong support for functional programming) geared towards scientific computing. It is particularly useful in data science nowadays, as there are many specialized libraries for this. At the same time, it's a fast and easy-to-work-with language, enabling you to quickly create useful scripts for data science tasks. Additionally, its syntax is similar to Python's, and there are bridge packages between the two languages, making it possible to jump from one to the other and leverage code from both languages in your data science projects.
As for machine learning, this is the part of data science that deals with the creation, refinement, and deployment of specialized data models, based on the data-driven approach to data analytics. It involves methods like K-means (for clustering), Support Vector Machines (for predictive analytics), and various heuristics (for specific tasks such as feature evaluation) to facilitate all kinds of data science work. Much like Statistics, it is versatile, though unlike Stats, it doesn't necessarily rely on probabilistic reasoning and distributions for analyzing the data. That's not to say that you need to pick between the two frameworks, however. A good data scientist uses both in her work.
Julia and machine learning are a match made in heaven. Not only does Julia offer direct support for machine learning tasks (e.g., through its various packages), but it also makes it easy for a data scientist (having just basic training in the language) to write high-performance scripts for processing the data at hand. You can even use Julia just for your data engineering tasks if you are already invested in another programming language for your data models. So, it's not an "either-or" kind of choice but more of an add-on situation. Julia can be the add-on, though once you get familiar with it, you may want to translate your whole codebase into this language for the extra performance it offers.
This prediction isn't some optimistic speculation, by the way. Julia has been evolving at a growing rate over the past few years, even as other programming languages have emerged. Furthermore, it has the backing of a prestigious university (MIT), while there is a worldwide community of users and Julia-specific events, such as JuliaCon, happening regularly. So, if this trend continues, Julia is bound to remain relevant for years to come, expanding in functionality and application areas. Naturally, if machine learning continues on its current trajectory, it will also stick around for the foreseeable future.
If you want to learn more about Julia and machine learning, especially from a practical perspective, please check out my book Julia for Machine Learning, published in the Spring of 2020. There, you can learn more about the language, explore how it's useful in machine learning, learn more about what machine learning entails and how it ties into the data science pipeline, and experiment with various lesser-known heuristics (some of them entirely original and accompanied by the corresponding Julia code). So, check this book out when you have the chance. Cheers!
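To illustrate what a method like K-means actually does, here is a minimal from-scratch sketch of Lloyd's algorithm (the standard K-means procedure) in Python. For simplicity, it uses the first k points as the initial centroids, which is an assumption of this sketch; library implementations such as scikit-learn's KMeans use a smarter initialization (k-means++ by default). The kmeans() function and the sample points are hypothetical.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm for K-means; the first k points serve as initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid (Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # keep the old centroid of an empty cluster
        if new_centroids == centroids:
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (8.5, 7.5), (0.5, 1.5), (9.0, 8.5)]
centroids, clusters = kmeans(points, k=2)
print("centroids:", centroids)
print("cluster sizes:", [len(c) for c in clusters])
```

The two alternating steps, assign and update, are the whole algorithm. Because the outcome depends on where the centroids start, initialization matters a lot in practice.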
A data scientist A.I. is an A.I. system that can undertake data science work end-to-end. Systems like AutoML are in this category and it seems that the trend isn’t going to go away any time soon. After all, a data scientist A.I. is better value for money and a solution that can scale very well. Amazon, for example, makes use of such systems to ensure that you view a personalized page based on your shopping and viewing history on the well-known e-commerce website.
But how feasible is all this for those who don't have Amazon’s immense resources? Well, A.I. systems like this are already available to some extent, and the only thing missing is data. There are even people in data science who, for lack of a better way to describe them, are short-sighted enough to implement them, effectively and irreversibly destroying our field. So, once a critical mass of users has enough data to get such a system working well, the question of feasibility will give way to questions like "how profitable is it?" or "will it be able to handle the work of other data professionals too?"
What about responsibility and liability matters, though? Well, A.I. systems may do a great many things, but taking responsibility is not one of them. When things go awry and there are legal issues, they cannot be held liable. As for the companies that developed them, well, they couldn’t care less. So, if you are using an A.I. system as a data scientist, you are effectively shouldering all the responsibility yourself, while giving up most of your ability to intervene. In other words, you need to trust the damn thing to do its job right, even though the data you give it may be biased in various ways, something no A.I. system has managed to handle yet.
So, what’s the bottom line in all this? Well, A.I. systems may undertake a lot of data science work successfully, but they cannot (at least at this time) be data scientists, no matter what the companies behind these systems promise. There is no doubt that A.I. is a useful tool in data science work, but you still need a human being, particularly one with some understanding of how an organization works, to be held responsible for the data science projects. Even if A.I. is leveraged in these projects, at least there is someone to answer for the results, particularly if there are privacy violations or biases involved.
If you want to learn more about data science and AI’s role in it, feel free to check out the AI for Data Science book I co-authored a couple of years ago. This book, published by Technics Publications, covers a series of AI methodologies relevant to data science, such as deep learning, as well as others that are more generic, such as optimization. The book is supplemented by Jupyter notebooks in Python and Julia, along with lots of examples. Check it out when you have a moment!
Nowadays, many people think that data science is as simple as most of its evangelists claim, so they end up making some avoidable mistakes in their work. Every data-related profession needs focus and attention, even more so if it involves complex problems as data science does. In this article, we'll explore some of the most common mistakes data scientists make in their day-to-day work and some suggestions as to how you can remedy them. If this article is popular, I may write another one on this topic, exploring additional mistakes data scientists make.
First of all, many data scientists carry the illusion that the world is like Kaggle competitions, where the data is relatively clean and tidy and the only thing that matters is a performance metric, such as accuracy or mean squared error. Although practicing through such competitions has merit, real-world data science projects involve much more than building data models. So, paying attention to data engineering, mainly data cleaning and data exploration, is vital for data science work. Additionally, communicating your findings through a report or presentation, as well as through comments and any other text accompanying your code, is crucial.
Another mistake many data scientists make is using models without understanding them well enough. This is especially common with machine learning models, particularly AI-based ones (i.e., various artificial neural networks). Of course, the libraries at your disposal will do most of the heavy lifting, but you still need to know what they are doing, how the various hyper-parameters come into play, and what the outputs mean. A solid understanding of how the models work under the hood can also be helpful, as it makes troubleshooting more straightforward and efficient. So, for best results, learn the theory behind the models, maybe even code one from scratch yourself, and use them properly.
Moreover, many data scientists don't understand the business side of things, as they focus too much on the coding and math aspects of the field. As a result, they don't always solve the problems they need to solve, since they misinterpret the data project's requirements. This kind of miscommunication or misstep is a particularly severe mistake if the project has a tight deadline, since there may be little room for revisions. So, instead of just looking at the technical aspects of data science, you can learn to "speak the business language" better and set more accurate goals for your data science project and its tasks. This refined communication is a valuable transferable skill, by the way.
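Real-world data rarely looks like a competition dataset. The following Python sketch shows a few typical cleaning steps on hypothetical records: dropping exact duplicates, normalizing inconsistent text, turning unparseable values into missing ones, and imputing them with the median. The field names and the CITY_ALIASES mapping are made up for illustration, and median imputation is just one of several reasonable choices.

```python
import statistics

# Hypothetical raw records as they might arrive from a real-world source.
raw = [
    {"id": 1, "age": "34", "city": "Boston"},
    {"id": 2, "age": "", "city": "boston "},
    {"id": 2, "age": "", "city": "boston "},  # exact duplicate
    {"id": 3, "age": "29", "city": "NYC"},
    {"id": 4, "age": "n/a", "city": "New York"},
]

CITY_ALIASES = {"nyc": "New York"}  # assumed mapping, for illustration only

def clean(records):
    # 1. Drop exact duplicates while preserving the original order.
    seen, unique = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(r))
    # 2. Normalize city names and parse ages; unparseable ages become None.
    for r in unique:
        city = r["city"].strip()
        r["city"] = CITY_ALIASES.get(city.lower(), city.title())
        r["age"] = int(r["age"]) if r["age"].strip().isdigit() else None
    # 3. Impute missing ages with the median of the known ones.
    fill = statistics.median([r["age"] for r in unique if r["age"] is not None])
    for r in unique:
        if r["age"] is None:
            r["age"] = fill
    return unique

cleaned = clean(raw)
for row in cleaned:
    print(row)
```

Each step encodes a judgment call (what counts as a duplicate, which spellings are the same city, how to fill gaps) that a clean competition dataset would have made for you.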
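In the spirit of coding a model from scratch, here is a minimal Python sketch of linear regression trained by batch gradient descent on the mean squared error. The fit_line() helper and the toy data (generated from y = 2x + 1) are hypothetical; the point is to show where a hyper-parameter like the learning rate enters the training loop.

```python
def fit_line(xs, ys, learning_rate=0.01, epochs=5000):
    """Fit y ≈ w*x + b by batch gradient descent on the mean squared error (MSE)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # The learning rate scales every update step.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    residuals = [w * x + b - y for x, y in zip(xs, ys)]
    mse = sum(r * r for r in residuals) / n
    return w, b, mse

# Hypothetical data generated exactly from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b, mse = fit_line(xs, ys, learning_rate=0.05)
print(f"w={w:.3f}, b={b:.3f}, mse={mse:.6f}")
```

On this data, the learning rate has to stay below roughly 0.15 for the updates to converge; at 0.2, for example, each step overshoots and the parameters blow up instead of settling. Seeing this first-hand is the kind of understanding that makes troubleshooting library models easier.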
Finally, going through the motions as if it's a mechanical process, void of curiosity and creativity, can be a deadly mistake. It's not that it will kill you, but it will make the whole experience in the data science field lifeless and uninteresting. It's hard to build a career in it under these circumstances. The mistake is more like a symptom of a deeper problem, namely shallow learning of the field, particularly the mindset. If you view things in data science mechanically, you probably didn't understand it well enough to appreciate it and, to some extent, feel inspired by it. Remedying this mistake involves going into depth in its various methodologies and cultivating a genuine interest in it, starting with a real curiosity about it.
Although not a panacea, learning more about data science and the right mindset behind it can help alleviate most of the mistakes commonly made by data scientists. This knowledge and know-how can also help you lay strong foundations for your data science work and enable you to develop your skill-set effectively and efficiently. So, check out my book "Data Science Mindset, Methodologies, and Misconceptions" and spread the word about it if you are so inclined. Cheers!
I've talked a lot about GPUs and their value in data science and AI work, but let's look at the numbers. In an article I came across recently, various servers equipped with state-of-the-art graphics cards are tested on certain common AI-related tasks. If it piques your interest, you can check out the actual server leasing options Hostkey offers. More information on that is available here or on the corresponding page of this site. Cheers!
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy approach to technology, particularly related to A.I.