Tokenization

Hello all! After a while of absence, I decided to publish another post, this time explaining the principle of tokenization. I came across it while going through my Security+ notes, and at first glance the definition sounded easy, but I still couldn't really picture when or how this approach is used. To make it easier for me and especially for you, let's take a look at this principle and what it is all about.
The concept behind tokenization is that all of the data, or at least part of it, is replaced with a randomly generated token. That token is then stored together with the original value on a token server, separate from the production database. It is a form of data masking.
Data masking is when sensitive data is manipulated: you change parts of the data so it cannot be traced back to the original dataset. I remember watching action movies where the actor got ahold of sensitive information from some special agents. You could always see that pieces of information on the documents were either completely erased or covered with black paint. Data masking works the same way, but instead of erasing or covering parts, you alter the data in a way that still matches the characteristics of the original. Now, coming back to tokenization: it is a form of data masking in which not only is a masked version of the data created, but the original data is also stored in a secure location, called a token vault. As mentioned earlier, the tokens that have been created can't be traced back to the original dataset, but they still provide access to the original data through that vault.
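To make that more concrete, here is a minimal sketch of the idea in Python. The class name TokenVault and its tokenize/detokenize methods are made up for illustration; a real token vault is a hardened, separately hosted service, not an in-memory dictionary.

```python
import secrets

class TokenVault:
    """Minimal sketch of a token vault kept separate from production data."""

    def __init__(self):
        self._vault = {}  # token -> original value; only lives in the vault

    def tokenize(self, sensitive_value: str) -> str:
        token = secrets.token_hex(8)          # random, no relation to the input
        self._vault[token] = sensitive_value  # original stored only here
        return token                          # safe to keep in production systems

    def detokenize(self, token: str) -> str:
        return self._vault[token]             # only the vault can map back

vault = TokenVault()
token = vault.tokenize("4111 1111 1111 1111")
print(token)                    # e.g. 'a3f9c2...' -- meaningless on its own
print(vault.detokenize(token))  # original value, only via the vault
```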
Now that we have a better picture of the general concept of tokenization, let's break it down piece by piece.
What is a token?
When you exchange sensitive data for non-sensitive data, the substitute you use is called a "token". Tokens might be unrelated to the data they replace, but they can still keep certain characteristics of the original, such as its format or length. If you are familiar with cryptography, you might start comparing this to encrypted data. But there is a big difference: tokenized data is not only irreversible but undecipherable as well. There is no mathematical connection or formula between the token and the original. Therefore, tokens can't be returned to their original form without access to the vault that holds the original data.
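As a sketch of that "same format, no formula" idea, here is a hypothetical helper that replaces every digit of a card number with a random digit, keeping the length and spacing but leaving nothing an attacker could reverse:

```python
import random

def format_preserving_token(card_number: str) -> str:
    """Return a random token with the same shape as the input (sketch only)."""
    return "".join(
        str(random.randint(0, 9)) if ch.isdigit() else ch
        for ch in card_number
    )

print(format_preserving_token("4111 1111 1111 1111"))
# e.g. '7302 9184 5521 0476' -- same format and length, no way back
```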
You can probably see now that even if the dataset containing the tokens is breached, the attackers won't be able to get ahold of the original data. The tokens have no real value on their own; they are just placeholders. A good comparison is poker: the chips are placeholders as well. Even if you steal the chips, they aren't worth anything by themselves, since you have to hand them in at the casino in exchange for cash. They are placeholders for money. The original data is usually stored outside of the company's internal systems, which makes it even harder for attackers to reach.
What are the benefits of Tokenization?
The most important point is that tokenization safeguards sensitive data so the organization is not exposed to the risk of storing it in its internal systems. As we all know, cyberattacks on businesses are commonplace and more frequent than ever before. This is why it is so dangerous to keep all this information in your internal network: if there is a severe data breach, all this information on consumers and businesses will be exposed. If you are dealing with confidential and highly sensitive data within your company, you have to put security measures in place so that you can transfer this information without someone intercepting it. By using tokens, transferring the information becomes easier, since it is a less resource-intensive process than encryption. Payment processing is much easier and quicker, and the risk of exposing sensitive information in the event of a hack is reduced dramatically.
How does it work?
But in general, what is the sensitive information we are talking about here? This data ranges from credit card information to bank account data and payment details. You have probably even used this technology before: Apple has implemented the Wallet on iOS, so you can save your card on the phone and then pay with the data saved on your device instead of using the actual card. Android Pay works the same way.
There are three ways tokens can be created (a short sketch follows the list):
- The use of a mathematically reversible cryptographic function. This relies on a key, and the value is not stored in a token vault
- The use of a nonreversible function (hash)
- Randomly generating numbers or using an index function
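Here is a rough sketch of the last two approaches in Python; the reversible approach is only noted in a comment, since doing format-preserving encryption correctly requires a vetted library rather than a few lines of example code. The variable names are illustrative only.

```python
import hashlib
import secrets

card = "4111111111111111"

# (1) Reversible cryptographic function: in practice, format-preserving
#     encryption under a managed key; no token vault is needed because the
#     key alone can recover the original. Omitted here.

# (2) Non-reversible function: hash the value; there is no way back
hash_token = hashlib.sha256(card.encode()).hexdigest()

# (3) Randomly generated number / index: pick a random token and keep the
#     mapping in the token vault, the only place it can be reversed
random_token = secrets.token_hex(8)
token_vault = {random_token: card}

print(hash_token[:16], random_token)
```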
So, when you go to the supermarket and pay for your groceries with Android/Apple Pay, the data being provided is substituted with a randomly generated token. This token is generated by the merchant's payment gateway. The information that includes the token is then encrypted and sent to a payment processor. As mentioned before, the original data, in this case the payment information, is stored in a token vault in the merchant's payment gateway. This is the only place where a token can be mapped back to the information it represents. Finally, the tokenized information is encrypted again by the payment processor before being sent for final verification.
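Putting those steps together, here is a hypothetical end-to-end sketch in Python. It uses the third-party cryptography package for the encryption steps, and the function names gateway_tokenize and processor_forward are made up for illustration; real payment gateways and processors are far more involved than this.

```python
import secrets
from cryptography.fernet import Fernet  # pip install cryptography

token_vault = {}                               # lives in the merchant's gateway
gateway_key = Fernet(Fernet.generate_key())    # gateway -> processor transport
processor_key = Fernet(Fernet.generate_key())  # processor -> final verification

def gateway_tokenize(card_number: str) -> bytes:
    """Replace the card data with a random token, then encrypt it for transit."""
    token = secrets.token_hex(8)               # random substitute for the card
    token_vault[token] = card_number           # only place the mapping exists
    return gateway_key.encrypt(token.encode()) # encrypted before it leaves

def processor_forward(encrypted_payload: bytes) -> bytes:
    """Decrypt the incoming payload and re-encrypt it for final verification."""
    token = gateway_key.decrypt(encrypted_payload)
    return processor_key.encrypt(token)

payload = gateway_tokenize("4111 1111 1111 1111")
forwarded = processor_forward(payload)
```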