Encrypting information means making it unintelligible to everyone except the recipient. This is why encryption algorithms underpin the digital certificates often used to authenticate websites and secure online communications. Let's see how.
05 Jun 2020, Marcello Gorlani, Security engineer
One of the possible uses of digital certificates, perhaps the most widespread, is the authentication of websites to guarantee the security of online communications. Before analyzing further applications of certificates, and much more, we need to establish some minimal foundations about the encryption mechanisms involved, limiting ourselves to a functional description of what is indispensable for the topics covered to be clear.

Index of topics
• Digital certificates and communications security: encryption algorithms
• Digital certificates and communications security: hashing algorithms
• Crack passwords (or hashes)
• Conclusions

Digital certificates and communications security: encryption algorithms
Encrypting information means making it unintelligible (hiding it) to everyone except the recipient of the communication. The need for cryptography usually arises when the medium over which the communication takes place cannot be adequately protected: this is the case of the Internet, where information traveling between our computer and the remote one crosses dozens of network segments controlled by entities unknown to us. We therefore need a way to protect the data so that, even if our communication is captured, it is still not "readable".
The most immediate encryption algorithms we can think of are those with symmetric (or shared) keys, outlined below.
We have our message to be transmitted, in clear text, and a symmetric-key encryption algorithm: by feeding the message and the key into the algorithm we obtain the encrypted message, ready to be transferred. The recipient, in turn, receives the encrypted message and feeds it into the same algorithm together with the same key used to prepare it. The algorithm returns the original message, and anyone in the middle who has captured the encrypted message will not be able to do much with it. This class of algorithms is very broad and includes hundreds of representatives. Just to name a few of those commonly encountered: AES, DES and 3DES, CAMELLIA, RC4, BLOWFISH, SERPENT.
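The symmetric scheme just described can be sketched in a few lines of Python. This is a toy illustration only: it is not AES or any of the algorithms named above, and the functions `keystream`, `toy_encrypt` and `toy_decrypt` are hypothetical names invented here. The cipher simply XORs the message with bytes derived from the shared key, so it must not be used to protect real data.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Stretch the shared key into `length` bytes (toy construction, not secure)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def toy_encrypt(key: bytes, message: bytes) -> bytes:
    """XOR the message with the keystream derived from the key."""
    stream = keystream(key, len(message))
    return bytes(m ^ k for m, k in zip(message, stream))

# XOR is its own inverse, so decryption is the same operation with the same key.
toy_decrypt = toy_encrypt

shared_key = b"ABC123"                      # both parties must already know this
ciphertext = toy_encrypt(shared_key, b"meet me at noon")
recovered = toy_decrypt(shared_key, ciphertext)
print(ciphertext.hex())
print(recovered)
```

The point to notice is that the same `shared_key` appears on both sides: whoever encrypts and whoever decrypts must hold the identical secret.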
Each has its own applications and strengths or weaknesses, but they all share the same principle: the sender and recipient must exchange the key in order for the information to be encrypted / decrypted. This is a serious problem because the exchange has to take place on a secure channel, certainly not the same one on which the encrypted data passes which is insecure by definition.
A common application of this process, although perhaps not immediately evident, is the password-protected zip file. It is easy for me to compress several files to send to a colleague, using a compression program and adding a password to the archive: what happens is that the files are compressed but also encrypted (for example with AES, using a key derived from the password entered). I can then attach the archive to an email and send it so that no one else will be able to read it, but the recipient must know the password (encryption key) I used for the zip. Clearly I cannot write it in the email that carries the file [1]; I will have to use a different channel, such as the telephone or a personal meeting, to communicate it. It is an awkward situation to manage: it requires a second, secure communication channel and a one-to-one relationship with every possible recipient.
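To make "a key derived from the password" concrete, here is a minimal sketch using PBKDF2 from Python's standard library. The salt size, iteration count and the function name `derive_key` are assumptions chosen for illustration; real archive tools use their own parameters and formats.

```python
import hashlib
import os

def derive_key(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Derive a 32-byte key (the size of an AES-256 key) from a password."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )

salt = os.urandom(16)                 # random value stored alongside the archive
key = derive_key("my zip password", salt)
print(key.hex())
```

The recipient who knows the password and reads the salt from the file can repeat the same derivation and obtain the same key, which is why the password still has to travel over a separate channel.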
The (practical) solution arrived in the 1970s with the conception of so-called "public-key" cryptographic systems. Although the mathematics behind them requires good preparation, if we focus on the functional aspect they are quite simple to digest. Basically, the user generates a pair of keys on their own device with a special program: one will be called public and the other private. The logic is simple: everything that is encrypted with the public key can only be decrypted with the corresponding private key, and vice versa.
This explains the names we have given to the keys: the one I have called public I can make visible to everyone, so that anyone can encrypt a message using my public key. Only I, who hold the private key (which, as the name implies, I have not disclosed but kept protected), can decrypt the message. So if everyone publishes their public key, to communicate securely it is enough for me to retrieve the recipient's and use it to encrypt the communication. There is no longer any need to exchange a secret key in advance: I will use the public one, which is distributed to everyone.
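As a rough illustration of the key pair idea, the sketch below uses textbook RSA with deliberately tiny primes. The numbers and the helper `apply_key` are assumptions for demonstration only; real keys are hundreds of digits long and real systems add padding and other safeguards omitted here.

```python
# Toy key pair built from tiny primes (illustration only, trivially breakable).
p, q = 61, 53
n = p * q                      # 3233, shared by both keys
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent: modular inverse (Python 3.8+)

public_key = (e, n)            # can be shown to everyone
private_key = (d, n)           # must stay with the owner

def apply_key(key, number):
    """Raise `number` to the key's exponent, modulo the key's modulus."""
    exponent, modulus = key
    return pow(number, exponent, modulus)

message = 65                                    # a message encoded as a number < n
ciphertext = apply_key(public_key, message)     # anyone can do this
decrypted = apply_key(private_key, ciphertext)  # only the key owner can do this
print(ciphertext, decrypted)
```

Anyone holding `public_key` can produce `ciphertext`, but turning it back into `message` requires `private_key`.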
It is a radical change that allows a level of interaction between the parties that was previously unthinkable and is, as seen in a previous article, the basis of communications with websites when we use https: from the website certificate I extract the public key and use it to encrypt the communication. The site has its own private key and can decrypt the traffic [2]. In this case it is clear where to get the public key (it is included in the certificate that is presented to me), but where do I find the key of other parties with whom I would like to exchange messages, for example that of another person? There are no technical limits to how the key is distributed, since in the end it is nothing more than a sequence of characters: it can be published on a website (here is mine, if you want to see how it is done), it can be attached to our outgoing mail, or it can be deposited in a corporate or public directory so that my mail program retrieves it automatically when, for example, I want to encrypt an email. I can also simply ask the recipient for it, and they will send it to me as they see fit, perhaps as a WhatsApp attachment or in an email.
Let's pay attention to a couple of things. If I encrypt the data with AES and a key, let's say ABC123, I can obviously decrypt it with the same key, which is exactly what the recipient will have to do. So I, who encrypted, will be able to get the clear-text message back from the encrypted data; this seems obvious. With a public key algorithm, if I use the recipient's public key to encrypt a message, I will not be able to obtain my original message from the encrypted data, because the private key (the only one that can decrypt) is in the possession of the recipient alone. So, a little less obviously than before, once I have encrypted the data even I won't be able to get it back (unless I kept a copy of the original, of course).
The other consideration, which at the moment may seem bizarre, is that if I encrypted a message with my private key, anyone who had my public key, therefore potentially everyone, could decrypt the message. Now, although it may seem useless to encrypt a message that anyone can then decrypt and read, we will see how this thing will prove essential in the uses of these cryptographic systems.
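The reverse direction can be tried with a toy textbook-RSA key pair as well. The tiny primes and the variable names are assumptions for illustration only, and real systems do not apply the private key to a raw message like this.

```python
# Toy key pair from tiny primes (illustration only).
p, q = 61, 53
n = p * q
e = 17                                  # public exponent, known to everyone
d = pow(e, -1, (p - 1) * (q - 1))       # private exponent, known only to me

message = 1234                          # must be a number smaller than n
sealed = pow(message, d, n)             # produced with my private key
opened = pow(sealed, e, n)              # anyone with my public key can undo it
print(sealed, opened)
```

Nothing is hidden here, since everyone can compute `opened`; what the operation shows is that `sealed` could only have been produced by whoever holds the private key.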
Digital certificates and communications security: hashing algorithms

Let's now move on to a different class of algorithms, those of hashing.
Having already introduced encryption algorithms, let's start immediately with what sets hashing apart. Encryption algorithms are invertible functions: they allow you to encrypt a message and decrypt it, getting back the original message, assuming you know the key. Hashing algorithms, on the other hand, are non-invertible, or one-way, functions. In addition, hashing algorithms take no key: the only input is the original message, which produces an output called a digest or simply a hash. There are other interesting properties too, such as the fact that the incoming message (for example a text) can have any length while the digest always has a fixed length, which has some interesting implications. Furthermore, even a slight change in the incoming message will produce a completely different output. The logical scheme of these algorithms is therefore simply: message in, hashing algorithm, fixed-length digest out.
As with encryption, there are many hashing algorithms. Among the most famous are MD5, SHA1 and SHA256, CRC32 and Adler32. Each of these produces a digest of its own fixed length; if we take "ciao" (Italian for "hello") as the input message, we get:
• MD5: 128 bits, 16 bytes (6e6bc4e49dd477ebc98ef4046c067b5f)
• SHA1: 160 bits, 20 bytes (1e4e888ac66f8dd41e00c5a7ac36a32a9950d271)
• SHA256: 256 bits, 32 bytes (b133a0c0e9bee3be20163d2ad31d6248db292aa6dcb1ee087a2aa50e0fc75ae2)
• CRC32: 32 bits, 4 bytes (ee3a5171)
• Adler32: 32 bits, 4 bytes (03fc019d)
If we use "Ciao", with a capital C, we can verify with MD5, for example, that the resulting digest is completely different from the one indicated above:
• MD5 of "Ciao", capital C: 16272a5dd83c63010e9f67977940e871
Still using MD5, we can calculate the hash of the entire Divine Comedy:
• f8e80614f503a5c9496b8a95e1d5c273
Always 128 bits, even though the input is much longer than the simple "ciao".
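These properties are easy to check with Python's standard `hashlib` and `zlib` modules. The helper `digests` is a hypothetical name, and the long input is just a repeated line standing in for a large text. As a side note, the Adler32 value quoted above, 03fc019d, is what this code returns for the Italian word "ciao".

```python
import hashlib
import zlib

def digests(message: bytes) -> dict:
    """Return the hex digest of `message` under several algorithms."""
    return {
        "MD5": hashlib.md5(message).hexdigest(),
        "SHA1": hashlib.sha1(message).hexdigest(),
        "SHA256": hashlib.sha256(message).hexdigest(),
        "CRC32": format(zlib.crc32(message), "08x"),
        "Adler32": format(zlib.adler32(message), "08x"),
    }

short = digests(b"ciao")
capital = digests(b"Ciao")                      # one letter changed
long_text = digests(b"Nel mezzo del cammin di nostra vita " * 1000)

for name, value in short.items():
    # Each hex character encodes 4 bits of the digest.
    print(f"{name}: {len(value) * 4} bits -> {value}")

print("MD5 changes with one capital letter:", short["MD5"] != capital["MD5"])
print("MD5 length for long input:", len(long_text["MD5"]) * 4, "bits")
```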
This implies that, since an infinite number of possible inputs must map onto a finite set of hashes, there must be many different messages that the function transforms into the same hash.
Let me try to make this clearer with an example. Let's invent a hashing algorithm called VSA (Very Stupid Algorithm) which produces a single output digit between 0 and 9 (the integer remainder of division by 10, to be precise) and which works by adding up all the input letters, assuming a = 1, b = 2, c = 3, ..., and discarding the tens. Applying it we get:

Message        Calculation                                        Hash / digest
abc            1 + 2 + 3 = 6                                      6
aa             1 + 1 = 2                                          2
cba            3 + 2 + 1 = 6                                      6
ea             5 + 1 = 6                                          6
adda           1 + 4 + 4 + 1 = 10                                 0
ccbbccbbaa     3 + 3 + 2 + 2 + 3 + 3 + 2 + 2 + 1 + 1 = 22         2
It is evident that several inputs produce the same output, since the output is limited to the 10 digits between 0 and 9. The algorithms mentioned above are not as stupid as this one, but they still have a limit on the size of the output and therefore sooner or later collisions may occur [3], i.e. different incoming messages may be found that produce the same digest.
When we have a hash in front of us, in practice we cannot go back to the original message, for two reasons: the algorithm itself does not allow it, because it is designed to be one-way (unlike the encryption ones), and furthermore I could never know whether the message found is really the original one.
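The VSA rule described above fits in a couple of lines of Python. The function name `vsa` is of course hypothetical, and the sketch assumes inputs made of the letters a to z only.

```python
def vsa(message: str) -> int:
    """Very Stupid Algorithm: sum the letters (a=1 ... z=26), keep the last digit."""
    total = sum(ord(ch) - ord("a") + 1 for ch in message.lower() if "a" <= ch <= "z")
    return total % 10

for text in ["abc", "aa", "cba", "adda", "ccbbccbbaa"]:
    print(text, "->", vsa(text))
```

Running it shows "abc" and "cba" landing on the same digit, which is exactly a collision.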
Suppose, looking at the table above, that we want to know what the original message behind hash "6" was: we would have to calculate the VSA hashes of all possible inputs, which we said are infinite. If we then found by trial and error that "abc" turns into "6", would we have won? No, because other inputs also lead to the same hash, so we could not be sure that the original message was really "abc", or even that it was three letters long.
Therefore collisions and the mathematical structure of hashing algorithms guarantee that they are "one-way", or non-invertible, functions. The only thing we can do, and it is computationally very expensive, is to calculate the hash of all possible inputs until we find at least one that produces that hash.
If the applications of an encryption algorithm are obvious enough, those of hashing algorithms, which in a nutshell mince up the whole input and produce a handful of bytes as output, are a little less so. In fact they are very widely used, as we will see in the article on digital signatures, and one fairly clear use is in building an access system, for example for a website.
In the old days, one might have designed an access system with a table like the following to register users:
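The trial-and-error search can be sketched as follows, limited to short lowercase inputs so that it terminates. The helper names `vsa` and `preimages` and the length limit are assumptions made for this illustration.

```python
from itertools import product
from string import ascii_lowercase

def vsa(message: str) -> int:
    """Toy hash: sum of letter values (a=1 ... z=26), last digit only."""
    return sum(ord(ch) - ord("a") + 1 for ch in message) % 10

def preimages(target: int, max_length: int):
    """Yield every lowercase string up to `max_length` letters hashing to `target`."""
    for length in range(1, max_length + 1):
        for letters in product(ascii_lowercase, repeat=length):
            candidate = "".join(letters)
            if vsa(candidate) == target:
                yield candidate

candidates = list(preimages(6, max_length=3))
print(len(candidates), "inputs of up to 3 letters hash to 6")
print(candidates[:10])
```

Even with this tiny search space many inputs share the digest 6, so finding one of them says nothing about which was the original.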
username     password
Priscilla    Elvis35
Alice        Ilovebob
Bob          EveEveEve

When the user Priscilla logs in to the system, she enters the password "Elvis35"; the system looks up her row in the table and checks whether she has supplied the correct password. Everything is fine until someone gets hold of that table, which should be protected, but we know that incidents happen, and not infrequently.
At this point, whoever stole the table knows all the passwords directly and can impersonate the users (even on platforms other than the compromised one if the same password is always used!). However, if we replace the password field with the MD5 hash of the same, we get a table like this:
username     MD5_of_password
Priscilla    6ae72529d5069bf3ba2af2bf0796bb35
Alice        498a6c0496f619b56a5dc7ab2b0127d4
Bob          057f5e631d57978caa3cc8e63719f93a

The login procedure takes the password entered by Priscilla, calculates its MD5 hash and then compares it with the one in the table. If the password entered is correct, its hash will be the one we read above. So we are storing not the passwords themselves but an easily calculated derivative of them. Whoever steals this table would not have immediately usable information, but would first have to "crack the hashes", that is, feed all possible character combinations into MD5 until they find one that produces the specific hash, for example 6ae72529d5069bf3ba2af2bf0796bb35 for Priscilla's. As we will see shortly, this is a far more complex and expensive job.
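The login check just described might look like the sketch below. The `users` dictionary and the `login` function are hypothetical, and MD5 is used only to mirror the example in the text, not as a recommendation.

```python
import hashlib
import hmac

def md5_hex(password: str) -> str:
    """Hex MD5 digest of a password (used here only to mirror the article)."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()

# Hypothetical user table: only the hashes are stored, never the passwords.
users = {
    "Priscilla": md5_hex("Elvis35"),
    "Alice": md5_hex("Ilovebob"),
    "Bob": md5_hex("EveEveEve"),
}

def login(username: str, password: str) -> bool:
    """Hash the submitted password and compare it with the stored hash."""
    stored = users.get(username)
    if stored is None:
        return False
    return hmac.compare_digest(stored, md5_hex(password))

print(login("Priscilla", "Elvis35"))   # correct password
print(login("Priscilla", "elvis35"))   # wrong capitalisation
```

The system never needs to know the password itself: recomputing the hash at every login is enough to verify it.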
On this specific aspect there would be other considerations that go beyond this article. Unfortunately, however, it is important to emphasize that the tables of the first type, with clear passwords, are still in use in many systems, especially websites, and we have proof of this every day when data deriving from data breaches are published. It is unacceptable that passwords are still recorded in clear text in 2020, especially having practical systems to avoid this, but the world is full of computer beasts.
Crack passwords (or hashes)
With what we have explained, we can give meaning to this often overused title. Let's start with the encryption: cracking it means finding the secret key used when we encrypted the message, the same one that the recipient should have. There are several ways to do this starting with the most ignorant, called brute force attack. Why ignorant? Because there is no reasoning other than to pass all the encryption keys one by one until the right one is found. This method ensures that sooner or later you get to the bottom and the key will be recovered. However, the calculation times may not be compatible with our problem or with our own existence (thousands of years).
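A brute-force search is simple enough to sketch. Here it is applied to an MD5 hash rather than an encryption key, since the idea of trying every candidate in turn is the same. The function name, the three-letter target and the lowercase-only alphabet are assumptions chosen so the example finishes instantly.

```python
import hashlib
from itertools import product
from string import ascii_lowercase

def brute_force_md5(target_hex: str, max_length: int, alphabet: str = ascii_lowercase):
    """Try every string over `alphabet` up to `max_length` characters."""
    attempts = 0
    for length in range(1, max_length + 1):
        for letters in product(alphabet, repeat=length):
            attempts += 1
            candidate = "".join(letters)
            if hashlib.md5(candidate.encode("utf-8")).hexdigest() == target_hex:
                return candidate, attempts
    return None, attempts

target = hashlib.md5(b"eve").hexdigest()
found, attempts = brute_force_md5(target, max_length=3)
print(found, "found after", attempts, "attempts")
```

Each extra character multiplies the number of candidates by the size of the alphabet, which is why the method quickly becomes impractical for long inputs.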
A second attack uses a dictionary of terms to be tried as passwords and is therefore called a dictionary attack. It assumes that our user has chosen a word, name or similar as a password: banana, Elvis, brescia, juventus are all candidates. There is a whole world devoted to the creation of dictionaries and the selection of the most appropriate ones for each activity. If the password we are looking for is in a dictionary, we could reduce the search time to minutes instead of millennia - a great result.

A variant of the previous one is the hybrid attack: you take a dictionary and alter the words by changing some characters or by appending or prepending others. So from Elvis we generate Elvis + 2 digits and try Elvis01, Elvis02 and so on. This multiplies, in this case by 100, the number of attempts to make for each word in the dictionary, but they are still a fraction of those required by a brute-force attack. Substitutions transform "password" into "p@$$w0rd", replacing letters with look-alike characters that users adopt to remember their key more easily.

Then there are cryptographic attacks. Each algorithm has its own mathematical structure, and researchers analyze it every day to find out whether it has any vulnerabilities. It may turn out that there is a way to avoid trying every possible key, saving time. Or an algorithm may contain backdoors, logical flaws deliberately inserted by its creator that allow whoever knows them to obtain the key easily or even immediately.

It is therefore essential that, when you adopt an encryption algorithm, you choose one whose logic is open and can be analyzed by everyone, so that it can reasonably be said to contain no backdoors or logical flaws. This is what happened to the DES algorithm, which over the years has been subjected to many analyses that have led to the conclusion that it is no longer secure, because it contains flaws that greatly reduce the number of keys to be tested. Furthermore, the very length of its key (56 bits) means that, with current computing power, a brute-force attack can exhaust all attempts in a relatively short time.

For this reason algorithms, both encryption and hashing, age and must be replaced regularly (every few decades) with others suited to the period. The same happened to MD5 and SHA1, which after decades of service have been retired because cryptanalysis has exposed vulnerabilities that reduce the number of attempts needed to find a collision.

What must be absolutely clear is that you should never rely on "black box" algorithms, that is, algorithms that are not documented and open to analysis, as they could hide nasty surprises. Security through obscurity, that is, considering a system safe because nobody knows how it works, is a method that history has relegated to the category of bad ideas.

For hashing algorithms, what has been said about cracking remains broadly valid, although in practice we are not looking for a key, which does not exist, but for an input message that produces that particular hash. Technically, some clarifications would be needed here, as well as on the derivation of keys for symmetric algorithms, but that would take us out of scope.
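The dictionary and hybrid attacks described above can be sketched together. The four-word list reuses the examples from the text, while the function names and the "word plus two digits" rule are assumptions for illustration; real tools use far larger word lists and rule sets.

```python
import hashlib

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def hybrid_candidates(words):
    """Yield each dictionary word, then the word followed by 00..99."""
    for word in words:
        yield word
        for number in range(100):
            yield f"{word}{number:02d}"

def dictionary_attack(target_hex: str, words):
    """Return the matching candidate and the number of attempts made."""
    attempts = 0
    for candidate in hybrid_candidates(words):
        attempts += 1
        if md5_hex(candidate) == target_hex:
            return candidate, attempts
    return None, attempts

wordlist = ["banana", "Elvis", "brescia", "juventus"]
found, attempts = dictionary_attack(md5_hex("Elvis35"), wordlist)
print(found, "found after", attempts, "attempts")
```

With four words and the two-digit rule there are at most 404 candidates, compared with the billions a blind brute-force search over seven mixed characters would need.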
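The remark about the 56-bit DES key can be put into numbers with a little arithmetic. The attack rate below is a purely hypothetical figure, not a measurement, and the 128-bit case is included only as a comparison with a modern key size.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def worst_case_years(key_bits: int, keys_per_second: float) -> float:
    """Time to try every key of `key_bits` bits at the given rate, in years."""
    return (2 ** key_bits) / keys_per_second / SECONDS_PER_YEAR

ASSUMED_RATE = 1e12   # hypothetical: one trillion keys tested per second

des_years = worst_case_years(56, ASSUMED_RATE)
long_key_years = worst_case_years(128, ASSUMED_RATE)

print(f"56-bit key:  {des_years * 365 * 24:.1f} hours")
print(f"128-bit key: {long_key_years:.2e} years")
```

Under this assumed rate the whole 56-bit key space is exhausted in less than a day, while every additional bit of key length doubles the work.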
Conclusions

In light of what we have seen, we can draw some conclusions, which can never be repeated often enough, about how we choose our passwords.

First, use a sufficiently long and complex password, where by complexity we mean a good mix of letters, numbers and symbols. This makes the number of attempts required by a brute-force attack practically unattainable. Complexity also means not using passwords that can be found in a dictionary, even with variations such as a "!" at the end or two numbers at the beginning.

Finally, we should not reuse the same password on multiple systems or sites: if one of them were, unfortunately for us, among those that keep passwords in clear text, its compromise could lead to the compromise of the other systems on which we are registered. Take a look at the Have I Been Pwned site to find out how many times your credentials have been stolen and whether they were stored as hashes or in the clear.

Encryption and hashing algorithms have characteristics and properties that make them the backbone of the protection of many activities that we perform, on the network and elsewhere. A proper grasp of the terms and concepts presented in this article forms the basis for the next one, which will focus on their practical applications.

NOTE

1. In some cases, emails actually do arrive with attachments compressed with a password, and therefore encrypted, with the password written in the text. These are usually viruses that adopt this technique to spread. Antivirus programs are not able to see the contents of the encrypted zip and, depending on their configuration, could let it pass. The user then reads the key in the text of the email, which normally mentions some security policy to justify this procedure, and extracts the virus from the zip.

2. It is a simplification. In reality, for reasons of efficiency, this mechanism is used only to exchange a so-called session key, which is then used with a traditional algorithm such as AES. Some public key algorithms (such as that of Diffie and Hellman) are not really "encryption" algorithms but rather "key exchange" algorithms.

3. The fact that two different inputs can potentially generate the same hash is not reassuring. For example, it is possible that two different documents whose signature I want to verify (we will discuss this in a dedicated article) are verified as "the same document". Although this can actually happen, in practice we must consider that we usually apply hashes in a specific context, where documents have a defined purpose. Finding a collision with "hello" could mean finding an input made up of dozens and dozens of pages of random characters: now, if what I expected was "a sentence" or "an electronic invoice", then I will be able to examine the content of the document and understand, by inspection or through internal technical mechanisms, whether the document itself is valid. For example, the electronic invoice is an XML file with a well-defined structure, and a document composed of a myriad of random characters cannot be validated as one. In short, even if I can find two documents with the same hash, one of the two generally will not make sense in the context of use. To understand what kind of collisions have been found so far, just see the examples on the MD5 Wikipedia page or this interesting article. The ultimate solution to this problem? Computing a hash with two different algorithms, for example SHA256 and MD5: finding two documents that generate a collision on both algorithms is not considered feasible. Many sites that distribute programs for download, which if modified could contain viruses, publish several different hashes for each file so that users can be sure the file downloaded is exactly the original.