So, about 18 months ago I created a video on Safari about how A.I. could benefit data science. Even though at that time I was still figuring out how educational videos work, the video was immensely popular and even today it still attracts lots of views. Considering that all of my recent videos are (much) better than that one, at least technically, this is quite intriguing.
Anyway, fast forward to September last year. As I was walking the streets of suburban Seattle, thinking about what to do next (my Data Science Mindset, Methodologies, and Misconceptions book had just been released), I decided to write another book, this time about A.I., since the subject continued to fascinate me and was becoming increasingly popular among data scientists. So, I pitched the idea of a new book to Steve Hoberman and, after sorting out the details, we got a contract going. However, for various reasons we decided to start the book in January.
The whole project was quite a turbulent one, with my co-author dropping out around March, leaving me in a very difficult situation. Yet, I decided that the book was worth completing. Fortunately, another data scientist / A.I. expert, Yunus E. Bulut, whom I got acquainted with through Thinkful, decided to join me in this endeavor. Long story short, after a few discussions about the project he had a contract of his own as a co-author.
Three months later, the first draft was complete. Of course, the book went through a lot of revisions since then, partly because the technology was changing and partly because the book covered a lot of topics, which were difficult to coordinate and merge into a coherent whole. Also, at one point Julia reached adulthood as a programming language (v. 1.0), so we had to update the code for the chapters that had programs in Julia.
So, after a feverish summer, plagued by heat waves and other obstacles, we finished the edits (at least the most important ones, since a book is never really finished!) and the book went to press. Now, it is finally available for you to buy at whatever vendor you prefer. Check out the publisher's site for more details. Cheers!
With the plethora of material out there for data science education, it is easy to get overwhelmed and even confused about what to study and how much time, money, and effort to put into it. Enter evaluation of data science material, a concise strategy for tackling this issue. In this 24-minute video, I talk about the various aspects of data science material, criteria for evaluating it, the matter of resources required to delve into this material, and some useful things to keep in mind in your data science education efforts. Whether you are a newcomer to the field or a more seasoned data scientist, you have something to learn about data science (I know I do!) and this video can hopefully aid you in that. You can find it on Safari.
Note that in order to view this video in its entirety, you'll need a subscription to the Safari platform. Also, it's important to remember that this video can offer you a framework for evaluating data science material; you'll still need to find that material and put in the effort to study it, in order to make the most of it. The video can only help you organize your efforts more efficiently. Enjoy!
Recently I read about some “research project” that Google’s A.I. branch conducted on the behavior of AIs as they tackle a certain simple scenario (a game of sorts). Various AIs were tested, including some more advanced ones, and the conclusion these researchers jumped to was that advanced AIs tend to be aggressive.
Let’s assume for a moment that this was a scientifically valid research experiment and that the people involved followed science protocols closely. I know this is a big assumption but bear with me for a while. Can we accurately deduce the aggressiveness of an AI using this kind of setting? Or is there some inherent bias in the research question asked to start with?
It’s important to note that the problem the AIs were tested on involved picking apples from an orchard and that the objective was to pick as many apples as possible. Naturally, there was a finite number of apples to start with, though in the beginning the orchard appeared abundant. Also, two AIs were tested at a time, each equipped with a laser capable of stopping the other player for a while, so that more apples could be picked.
So, after the AIs were deployed they went about their apple-picking endeavors. They took all the cash they could gather and politely lined up at an Apple store, all while contemplating what products to buy. Sorry, wrong experiment! In Google’s experiment the apples were actual fruits, not related to the tech giant that brought us the iPhone! Anyway, the AIs were given the option to collaborate or adopt an adversarial strategy (i.e. be trigger-happy with their laser pistols). Naturally, they chose the latter, particularly when the number of apples was waning. The more advanced AIs adopted this course of action even sooner, probably because they could “see” further ahead.
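To make the setup a bit more concrete, here is a minimal sketch of such a game in Julia. This is my own toy reconstruction, not DeepMind's actual code; all the names and parameters (stun duration, scarcity threshold, etc.) are made up for illustration:

```julia
# Toy version of the apple-gathering game (all parameters hypothetical).
mutable struct Agent
    apples::Int
    frozen::Int   # turns remaining while stunned by the other agent's laser
end

# One turn for an agent: either pick an apple or zap the opponent.
function turn!(me::Agent, other::Agent, apples_left::Int, zap::Bool)
    if me.frozen > 0
        me.frozen -= 1       # a stunned agent loses its turn
        return apples_left
    end
    if zap
        other.frozen = 3     # stun the opponent for a few turns
    elseif apples_left > 0
        me.apples += 1       # otherwise, pick an apple if any are left
        apples_left -= 1
    end
    return apples_left
end

# A trigger-happy policy: cooperate while apples abound, zap when they wane.
# A more "advanced" agent would simply use a larger threshold, defecting sooner.
greedy_policy(apples_left, threshold) = apples_left < threshold

function play(; apples = 100, turns = 200, threshold = 20)
    a, b = Agent(0, 0), Agent(0, 0)
    for _ in 1:turns
        apples = turn!(a, b, apples, greedy_policy(apples, threshold))
        apples = turn!(b, a, apples, greedy_policy(apples, threshold))
    end
    (a.apples, b.apples)
end

println(play())   # final apple counts for the two agents
```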
So, based on this experiment, one can conclude that an AI is bound to be more aggressive in order to accomplish its objective, much like an animal would be (e.g. a dog that feels its territory is being threatened by some other dog that decided to pee there for some reason). In other words, intelligence can advance all it wants, but at the end of the day, its bearer is bound to act like an animal, since it only cares about winning its game (i.e. optimizing its objective function). This sounds reasonable, right?
Well, no. This is a particular case where an AI is given only two options and a very rigid objective, while its perception is limited to the two-dimensional data of the game and a score. So, one could argue that the whole scenario is oversimplified and unrealistic. Plus, what would the AI do with all these apples? Does it account for the fact that some of them may go bad, or that if it decides to sell them in some form (e.g. an apple pie), there is the law of diminishing returns in the ROI of this whole endeavor? What about AI politics? What would other AIs think if it exhibits such aggressive behavior? Would anyone ever want to collaborate with it on another project? Naturally, the AIs involved in Google’s experiment don’t think about these things (as a human probably would), since they have a one-track mind, caring only about the number of apples they collect. In such a scenario, no matter how advanced the AI is, it’s bound to seek actions that optimize the corresponding objective function, attacking anything that comes in its way, much like a short-sighted beast.
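To see how much the conclusion hinges on the objective itself, consider this hedged sketch: add a single term penalizing aggression to the objective function and the "optimal" behavior flips, with no change whatsoever in the agent's intelligence (the numbers below are arbitrary, for illustration only):

```julia
# Two hypothetical objectives evaluated on the same pair of strategies.
narrow(apples, zaps)  = apples             # cares only about apples picked
broader(apples, zaps) = apples - 5 * zaps  # aggression carries a cost

aggressive  = (40, 10)   # more apples, lots of laser shots (made-up numbers)
cooperative = (30, 0)    # fewer apples, no laser shots

println(narrow(aggressive...), " vs ", narrow(cooperative...))    # 40 vs 30: aggression "wins"
println(broader(aggressive...), " vs ", broader(cooperative...))  # -10 vs 30: cooperation wins
```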
Perhaps instead of taking the word of some “expert” as gospel, it would be more fruitful to ponder this matter oneself. Also, if so inclined, one can build one’s own AI experiments and explore other alternatives in the AIs’ pursuit of apples (or some other measurable objective). After all, things are not so simple when it comes to AI, so it makes sense to examine this matter with sufficient depth of thought, unless of course we just opt for some sensational result to drive home a point, which may or may not bear any scientific validity.
So, recently I decided to make a couple of videos on niche topics, namely the Business Aspect of A.I. in Data Science and Extreme Learning Machines (ELMs). These vids are now available on Safari (here and here). Enjoy!
Note that in order to view these vids in their entirety you'll need a subscription to the platform. The latter enables you to view other materials, including a large variety of technical books as well as all my other videos. Cheers!
Last week I finished my part of the final corrections stage of the new technical book I’d been working on for the past few months. My co-author, Yunus, has done the same, so the book should go to press later this month! Hopefully, you should be able to purchase it soon, either from the publisher’s site or from some other vendor (e.g. Amazon). Just wanted to share that with you all. Once the book is out there, I’ll be sure to make an announcement about it here on this blog. Cheers!
As a famous Chinese sage once said, "a car is more than the sum of its parts." It's intriguing how this applies not just to ancient vehicles in the Orient, but also to a special kind of data science models called ensembles. So, if you want to learn more about this fascinating topic and how it is useful in a data science setting, check out my latest video on the Safari platform.
Note that you will need a subscription to the Safari system in order to view this vid in its entirety. However, with such a subscription you'd be able to access a lot of other material on a variety of technical topics, including all my other videos. Cheers!
A famous scientist from the Quantum Physics school of thought once said that “asking the right question is more than halfway towards finding the answer.” Although it’s been years since I read this quote (which I may be paraphrasing, by the way), it still echoes a deep truth and helps guide my (non-academic) research in the data science and A.I. fields. So, a few weeks ago I put forward the question: what would a statistical framework built around possibilities be like?
At first glance, such a question may seem nonsensical, since from an early age we’ve all been taught the core aspects of Stats and how it’s all about probabilities. There is no doubt that the probabilistic approach to modeling uncertainty has borne a lot of fruit as the field grew, but all developments of Statistical methods were bound by the limitations of the assumptions made, mirrored by the various distributions used. In other words, if you want results with conventional Stats, you’ve got to use this or that distribution and keep in mind that if your data doesn’t follow the distribution assumed, the results may not be reliable. What if the field of Stats were free of such restrictions, using a membership function instead of a distribution to describe the data at hand?
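To make the contrast more concrete, here is a minimal sketch in Julia, with a made-up triangular membership function standing in for the usual Gaussian (all the numbers are arbitrary):

```julia
using Distributions   # for the conventional, probabilistic side

# Conventional Stats: assume the data follows a Normal distribution.
d = Normal(5.0, 2.0)
density_at_6 = pdf(d, 6.0)   # likelihood of a point under the assumed distribution

# Possibilistic alternative: a membership function describing the data directly.
# Here, a triangular one peaking at 5 with support [1, 9] (an arbitrary choice).
membership(x; a = 1.0, m = 5.0, b = 9.0) =
    x <= a || x >= b ? 0.0 :
    x <= m ? (x - a) / (m - a) : (b - x) / (b - m)

possibility_at_6 = membership(6.0)   # degree to which 6 "fits" the data, in [0, 1]
```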
I’m not going to describe at length where this rabbit hole leads, but suffice it to say that the preliminary results of a framework based on this alternative approach exceeded my expectations. Also, I have yet to find a Stats process that could not be replicated with the possibilistic approach. What’s more, since the possibilistic approach to data analytics is one of the oldest forms of A.I., it is sensible to say that such a statistical framework would be in essence AI-based, though not related to deep learning, since that’s a completely different approach to A.I. with its own set of benefits. Nevertheless, I found that having a statistical framework with an A.I. concept at its core can provide an interesting way to bridge the gap between Stats-based data analytics and the modern, A.I.-based kind.
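As a taste of how a familiar Stats process might be replicated, here is a rough sketch of a possibilistic analogue to a confidence interval, using the alpha-cut of the membership function from the previous snippet (a toy illustration, not a rigorous method):

```julia
# Alpha-cut: all values whose membership is at least alpha; a rough
# possibilistic counterpart to a confidence interval (toy illustration).
function alpha_cut(membership; alpha = 0.5, grid = 0.0:0.01:10.0)
    vals = [x for x in grid if membership(x) >= alpha]
    isempty(vals) ? nothing : (minimum(vals), maximum(vals))
end

println(alpha_cut(membership))               # (3.0, 7.0) for the default alpha of 0.5
println(alpha_cut(membership, alpha = 0.9))  # a tighter interval, roughly (4.6, 5.4)
```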
What’s even more interesting is that this can be a two-way street, with A.I. also being able to benefit from such a nexus between the two fields. After all, one of the biggest pain points of modern A.I. is the lack of transparency, something that’s a freebie when it comes to Stats modeling. So, an A.I. system that has elements of Stats at its core may indeed be a transparent one. However, this idea is still highly experimental, so it would be best not to discuss it further here.
Whatever the case, I have no doubt that the possibilistic approach to data has a lot of merit and hasn’t been explored enough. So, it is possible that it has a role to play in more modern data analytics systems. The question is, are you willing to accept this possibility?
Being quite international, I travel often, and as I got a bit restless lately, I decided to travel more. So, these months I’m on the road, so to speak, as I work remotely. Since most of my work lately revolves around my new book (co-authored with Yunus E. Bulut) for Technics Publications, I can work from anywhere, and do so fairly easily. So, for this month or so I’m in Lisbon, Portugal.
Working remotely isn’t easy but if you are adaptable and flexible, it’s quite feasible. Besides, the companies I work with are quite trusting and flexible, so working for them remotely is not only feasible but preferable. Although it’s much easier in places like the US or the UK, where internet connections are reliable and fairly fast, it is possible to work in other places too, as long as I feel comfortable enough with the language and the everyday routine. Basically, the main thing one needs is a temporary office and a good internet connection, as well as places to hang out and make the most of one’s free time. Fortunately Lisbon offers that.
At first I looked at co-working spaces, but I decided against them afterwards. The one I liked the most (at least on paper) was quite challenging to get to (you have to take the elevator from the nearby building, walk down a long corridor, climb some stairs, and then hope you’ll be let into the office space itself). The fact that the people there didn't make much of an effort to help with any of that (they somehow assumed you’d intuitively find your way in, as if you were a detective in training!) discouraged me from using that space. Also, the fact that they didn't reply to my email made me think that they weren't really that professional. I did find another co-working space where people were more professional, but it was quite far from where I’m staying and I didn't want to take a cab every day to get there. So, I ended up working from a nice coffee shop in a trendy spot of the city instead.
Even though co-working spaces were not a viable option for me in Lisbon, I have still found the city very enjoyable so far. It’s much cooler than Bologna (temperature-wise), people are very friendly, and, well, there is access to the ocean. What more could someone ask of a city they’re staying in for a month? Now, I don’t know what the place is like in the winter, but I’d rather keep it this way. The houses here are not so great with insulation, and since most of the people visiting Lisbon seem to do so in the summertime, I’d expect it to be less bustling with activity then. Nevertheless, since it’s quite far south, it’s bound to be warmer and sunnier than other parts of the continent.
The internet connections here are surprisingly good. At least they are good enough for a video conference, and that’s good enough for me. If you want to upload or download really large files, it may take a while, but here the pace of life is slower, so it doesn't seem like much of a problem if you need to wait a few more minutes to sync some files with the cloud.
Lately I’ve come across various digital nomads who live and work in Lisbon. Some of them were more on the expat side of the spectrum, but all of them were very interesting and fun to talk to. It's also interesting that they were in a variety of professions, so the idea that you have to be a developer in order to have this lifestyle doesn't hold water.
With remote work becoming more and more acceptable in various data science related organizations, staying at cool destinations is an increasingly appealing option. If you find yourself in that boat, Lisbon is definitely a place to consider, especially if you are big on cities with character and beautiful natural scenery, particularly during the summertime.
Recently I attended JuliaCon 2018, a conference about the Julia language. There, people talked about the various cool things the language has to offer and how it benefits the world (not just the scientific world but the other parts of the world too). Yet, as often happens at open-minded conferences like this one, some unusual ideas and insights float around during the more relaxed parts of the event. One such thing was the Nim language (formerly known as Nimrod), a very promising alternative to Julia that one Julia user spoke very highly of.
As I’m by no means married to this technology, I always explore alternatives to it, since my commitment is to science, not the tools for it. So, even though Julia was at an all-time high in terms of popularity that week, I found myself investigating the merits of Nim, partly out of curiosity and partly because it seemed like a more powerful language than the tools that dominate the data science scene these days.
I’m still investigating this language, but so far I’ve found out various things about it that I believe are worth sharing. First of all, Nim is like C but friendlier, so it’s basically a high-level language (much like Julia) that exhibits low-level language performance. This high performance stems from the fact that Nim code compiles to C, something quite rare for a high-level language.
Since I didn’t know about Nim before then, I thought that it was a Julia clone or something, but then I discovered that it is actually older than Julia (by about 4 years, to be exact). So, how come few people have heard of it? Well, unlike Julia, Nim doesn’t have a large user community, nor is it backed by a company. Therefore, progress in its code base is somewhat slower. Also, unlike Julia, it’s still in version 0.x (with x being 18 at the time of this writing). In other words, it’s not considered production ready.
Who cares though? If Nim is as powerful as it appears to be, it could still be useful in data science and A.I., right? Well, theoretically yes, but I don’t see it happening soon. The reason is three-fold. First of all, there are not many libraries in that language, and as data scientists love libraries, it’s hard for the language to become anyone’s favorite. Also, there isn’t a REPL yet, so for a Nim script to run you need to compile it first. Finally, Nim doesn’t integrate with popular coding environments such as Jupyter and Atom, and as data scientists love their IDEs, it’s quite difficult for Nim to win over many professionals in our field without such integration.
Beyond these reasons, there are several more that make Nim an interesting but not particularly viable option for a data science / A.I. practitioner. Nevertheless, the language holds a lot of promise for various other applications and the fact that it’s been around for so long (esp. considering that it exists without a company to support its development) is quite commendable. What’s more, there is at least one book out there on the language, so there must be a market for it, albeit a quite niche one.
So, should you try Nim? Sure. After all, the latest release of it seems quite stable. Should you use it for data science or A.I. though? Well, unless you are really fond of developing data science / A.I. libraries from scratch, you may want to wait a bit.
The previous week was intense, as I was working on part of a proposal for a new project, attending a conference, and figuring out some things about my publication-related endeavors. With all that going on, it was natural that I didn’t post anything on the blog, even though I wanted to. However, as my focus is always on quality, I didn’t want to just publish a rushed post or a simple announcement. That’s why I waited until now to get a new post out.
The Event of the Decade
On 8/8/18 the new release of Julia came out. This wasn’t just any release though, but the big one: 1.0. It is really hard to overstate the importance of this release, even if the most conservative Julia users still feel that it will take a few months before the full force of v. 1.0 reaches the world. After all, just because Julia is now production ready, it doesn’t mean that everyone using it can benefit from this in the same way, since the packages people depend on may take some time before they are fully compatible with the new release. Nevertheless, those who rely primarily on their own code can experience the benefits of Julia right now. Whatever the case, the fact is that Julia has now entered a new era, having proven itself to be robust and even faster than ever before.
To give you an example of that, at the conference there was a talk about how Julia is applied in Robotics, via a specialized package a Robotics researcher developed recently. Even though this guy had worked with C++ before on the same project, he eventually shifted to Julia for the vast majority of the code, since it was good enough (i.e. sufficiently fast and reliable) to perform challenging optimization-related tasks in real-time. To be exact, the operations were 36% faster than real-time, enabling a robot operation frequency of 1000 Hz, at least in the simulations he was conducting. At the time of this writing, no other language has accomplished that without significant dependencies on C libraries.
Ramifications of Version 1.0 for Data Science and A.I.
But how does all this affect us, as data science and A.I. professionals? Well, Julia isn’t evolving merely in the Base package or the fairly niche application of Robotics. In fact, there are now full-fledged packages that cover a variety of data science related applications, including deep learning models. At the conference there was a talk about the Knet package, for example, which is a deep learning framework built entirely in Julia. Personally, I don’t know of any other deep learning tool that has been built entirely in a data science language (I don’t consider C++ to be such a language, by the way, since data scientists tend to use high-level languages mainly). What’s more, this deep learning tool has performance comparable to other, more established frameworks, while in one of the benchmarks it outperformed all of them.
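I haven't benchmarked Knet myself, but to give a flavor of how lean it feels, here's a minimal linear-model sketch along the lines of Knet's own tutorials (toy data throughout; treat the exact API as something to verify against the package's current docs):

```julia
using Knet, Statistics

# Made-up toy data: 10 features, 100 samples.
x = randn(10, 100)
y = randn(1, 100)

predict(w, x) = w[1] * x .+ w[2]                 # a simple linear model
loss(w, x, y) = mean(abs2, y .- predict(w, x))   # mean squared error
lossgrad = grad(loss)                            # gradient function, courtesy of Knet/AutoGrad

w = Any[0.1 * randn(1, 10), zeros(1, 1)]         # weights and bias
for epoch in 1:100
    g = lossgrad(w, x, y)
    for i in eachindex(w)
        w[i] -= 0.1 * g[i]                       # plain gradient descent step
    end
end
println(loss(w, x, y))   # should be noticeably lower than at the start
```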
But data science is not just deep learning. There is a significant part of it that has to do with more conventional methods, mainly deriving from Statistics. What about Julia’s role in all that? Well, Julia has a number of fairly mature packages in Stats, including Bayesian Stats. What’s more, there is a new book being written right now on Stats with Julia, by a couple of academics who teach Stats at a university in Australia. So, it’s safe to say that Julia is pretty evolved in this aspect of data science too.
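As a small, hedged illustration of that side of things (made-up data; using the Distributions and HypothesisTests packages, which are among the mature ones I have in mind):

```julia
using Distributions, HypothesisTests, Statistics

data = 2.0 .* randn(200) .+ 5.0    # made-up sample: roughly Normal(5, 2)

d = fit(Normal, data)              # maximum-likelihood fit of a Normal
println(mean(d), " ", std(d))      # estimated parameters, close to 5 and 2

t = OneSampleTTest(data, 5.0)      # test H0: the true mean equals 5
println(pvalue(t))                 # typically large here, as H0 is true
```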
More specialized parts of data science, such as Graph Analytics, also have corresponding packages in Julia, while the LightGraphs package I talked about in my Julia for Data Science book is still out there, now better than ever. Data engineering packages also exist, and there are several packages for optimization too, something data science can benefit from greatly for the more challenging problems it tackles.
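And a quick taste of LightGraphs itself, covering a few basics on a small cycle graph (API as of the versions I've been using):

```julia
using LightGraphs

g = Graph(5)            # an undirected graph with 5 nodes
add_edge!(g, 1, 2)      # wire the nodes into a 5-cycle
add_edge!(g, 2, 3)
add_edge!(g, 3, 4)
add_edge!(g, 4, 5)
add_edge!(g, 5, 1)

println(ne(g), " edges")           # number of edges (5 here)
println(connected_components(g))   # a single component in this case
println(pagerank(g))               # centrality score for each node
```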
From all this, I believe it’s fair to say that the age-old argument that “Julia is not ready for DS / A.I. because x, y, z” is now as ridiculous as the belief that the number of available libraries is what makes a language suitable for data science. Sure, packages can help, but mostly through their quality, not their quantity, while how fast a language runs is an important factor when crunching through the truckload of variables in a modern data model. That’s not to say that Python, Scala, and other data science languages are not useful anymore, but ignoring the value of Julia in the data science / A.I. arena is silly and, to some extent, unprofessional.
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy approach to technology, particularly related to A.I.