(image taken from thearmitageeffect.wordpress.com)
We covered the Thunderstorm Coding System (TCS) in a previous post. Now, let’s look at it from a different angle. After all, that’s the foxy thing to do, especially if you are serious about establishing its efficacy. So, what would happen if we tried to use data science (DS) to perform cryptanalysis on a file that has been coded with this system?
The short answer is “nothing”. Even though data science can unravel the most obscure mysteries lurking in the data, cutting through the noise so that whatever signal is hidden in it becomes apparent, it doesn’t stand a chance against an encryption system like TCS. The reason is simple: TCS, like other modern encryption systems, employs a combination of scrambling and shuffling when it encrypts the data of the file. Also, the way this is done is quite chaotic (though reversible). If there is a recoverable signal in the encrypted file that could point to the plaintext, it would take an insanely large number of features (enough to warrant a supercomputer the size of one of Google’s data centers) just to model the problem. Whether all these features would be enough to yield a useful data model, however, is another story. Reducing the dimensionality of that feature set to something manageable, and dense enough in information to feed into a machine learning system, would require yet another enormous amount of computing power (another Google data center OR a quantum computer the likes of which we have never seen).
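The internals of TCS aren’t described in this post, so the following is not TCS. It is a hypothetical toy cipher, assuming “scrambling” means substituting byte values and “shuffling” means permuting their positions, that shows how both steps can look chaotic yet remain fully reversible with the key. The function names (`toy_encrypt`, `toy_decrypt`) are made up for illustration, and Python’s `random` module is not cryptographically secure, so treat this as a sketch only.

```python
import random

# Hypothetical toy cipher for illustration only; this is NOT the TCS algorithm,
# and Python's `random` module is not cryptographically secure.

def _keystream(key: int, n: int) -> list[int]:
    """Derive n pseudo-random byte values from an integer key (the 'scrambling' stream)."""
    rng = random.Random(key)
    return [rng.randrange(256) for _ in range(n)]

def _permutation(key: int, n: int) -> list[int]:
    """Derive a key-dependent ordering of positions 0..n-1 (the 'shuffling' step)."""
    rng = random.Random(key + 1)  # use a different seed than the keystream
    order = list(range(n))
    rng.shuffle(order)
    return order

def toy_encrypt(data: bytes, key: int) -> bytes:
    # Scramble: substitute every byte by XOR-ing it with the keystream.
    scrambled = [b ^ k for b, k in zip(data, _keystream(key, len(data)))]
    # Shuffle: output position i takes the scrambled byte at order[i].
    order = _permutation(key, len(data))
    return bytes(scrambled[j] for j in order)

def toy_decrypt(cipher: bytes, key: int) -> bytes:
    # Undo the shuffle: put each byte back at the position it came from.
    order = _permutation(key, len(cipher))
    scrambled = [0] * len(cipher)
    for i, j in enumerate(order):
        scrambled[j] = cipher[i]
    # Undo the scramble: XOR with the same keystream (XOR is its own inverse).
    return bytes(b ^ k for b, k in zip(scrambled, _keystream(key, len(cipher))))
```

For instance, `toy_decrypt(toy_encrypt(b"attack at dawn", 42), 42)` gives back `b"attack at dawn"`, while the intermediate ciphertext is just a jumble of bytes. Without the key, nothing about the order or values of those bytes hints at the plaintext.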
“What about AI?” I can hear you asking. Well, AI is very powerful in data science, but even the most sophisticated AI systems (e.g. deep learning networks) are not a panacea. To be effective, they need a lot of data, far more than the cyphertext of the file we are given to decrypt. Also, should we attempt to apply a deep network to the aforementioned feature set and let it derive the best meta-features for solving the problem, we would need even more computing power (probably more than is available on a conventional cloud) just to keep the system from grinding to a halt.
Of course, all of that assumes there is some kind of repetition in the key used for the encryption. If the key is large enough (e.g. truly random and as long as the file itself), not even all the computing power in the world could decrypt the file! That’s not to say that nothing would come out of such an endeavor. In fact, I would recommend you give it a shot, even if it is practically hopeless. Experiments like these can help you realise the limitations of the data science discipline and gain a better sense of perspective on what is and isn’t possible (leading to a sense of humility, a rare quality among professionals in our craft!).
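To make the point about key repetition concrete, here is a sketch that uses a classic repeating-key XOR cipher as a stand-in (again, not TCS). The helper names (`xor_with_key`, `coincidence_rate`, `best_shift`) are hypothetical. The idea, in the spirit of Friedman’s index of coincidence, is that a key repeating with period L leaks its period: at shifts that are multiples of L, identical key bytes line up, so the ciphertext matches itself exactly where the plaintext does.

```python
def xor_with_key(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR: key byte i % len(key) is applied to data byte i."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def coincidence_rate(cipher: bytes, shift: int) -> float:
    """Fraction of positions where the ciphertext equals itself `shift` bytes later."""
    pairs = len(cipher) - shift
    if shift < 1 or pairs < 1:
        raise ValueError("shift must be between 1 and len(cipher) - 1")
    hits = sum(cipher[i] == cipher[i + shift] for i in range(pairs))
    return hits / pairs

def best_shift(cipher: bytes, max_shift: int = 20) -> int:
    """Return the shift (1..max_shift) with the highest coincidence rate.

    If the key repeats with period L, then at shifts that are multiples of L
    the same key byte is applied to both positions, so cipher[i] == cipher[i + L]
    exactly when plaintext[i] == plaintext[i + L]. Natural text repeats
    characters far more often than random bytes do, so those shifts stand out.
    """
    return max(range(1, max_shift + 1), key=lambda s: coincidence_rate(cipher, s))
```

On a long enough English text encrypted with a short repeating key, the winning shift will typically be a multiple of the key length. With a truly random key as long as the message (a one-time pad), every shift should hover around the chance rate of 1/256, and there is simply nothing for the analysis to latch onto.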
We all like to think that everything is possible, given enough memory and enough computing power. However, unless you are part of a Hollywood movie, this just doesn’t hold true. Some problems are simply too hard to solve (look into the various cyphers that have remained uncracked over the years if you don’t believe me). We just need to find a way to be comfortable with that and focus on things that make business sense, such as deriving actionable insights from large amounts of data.
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy flair when it comes to technology, technique, and tests.