Organizations subject to PCI DSS compliance validation spend significant amounts of time, effort, and money to maintain and validate their compliance. So, the idea that a common graphics card can threaten compliance or lead to a compromise may at first seem ridiculous. This article will show you why it is not as ridiculous as it seems, and what you can do about it.
This article explores the use of hashing in the context of PCI using examples and results from our experiments to guide you. Each section is explained in detail with highlighted takeaways. And as always, we provide supporting references.
Takeaway: The growth of GPU power and crypto-currency mining technology has made simple hashes of PAN totally insecure, even with no known digits. The decade-old anti-correlation guidance provides no protection, and the use of simple hashes should be deprecated. Treating hashes of PAN as out-of-scope for PCI DSS is not effective and puts your organization at risk. If you are using credit card hashing in your business or applications, you should review this article in detail and conduct a risk analysis immediately.
- What is a hash?
- Hashing and PCI DSS
- Cracking and Correlating Hashes
- A Nightmare Scenario
- Safe and Unsafe Use Cases
- Migrating Away from Hashes of PAN
What is a Hash?
A hash is just a large number that stands in as a signature for other, often sensitive, data. Hashes are calculated by a complex “one-way” function that takes an input of any length (e.g. a credit card number, a password, a program file, or a document) and calculates a number called a signature. The mathematics is closely related to encryption.
Some things to know about hashes: the signature is always the same length, the same input always produces the exact same signature, tiny changes in input create very different signatures, the chance of two inputs having the same signature (a collision) is incredibly small, and lastly, you can’t undo a hash (that’s the one-way part) like you can encryption.
There is a large variety of hashing algorithms in common use today (the popular cracking suite hashcat supports well over 100 variations). Unlike encryption algorithms and protocols, hashes have not been kept up to date by industry. Quite a number of them, such as MD5 and SHA1, are no longer safe, yet they are still used in commercial products.
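The properties listed above can be checked in a few lines. This is a minimal sketch using SHA-256 from Python's standard library; the helper name sha256_hex and the sample inputs are chosen here for illustration and are not from the original experiments.

```python
import hashlib

def sha256_hex(data: str) -> str:
    """Return the SHA-256 signature of a string as 64 hex characters."""
    return hashlib.sha256(data.encode("ascii")).hexdigest()

short = sha256_hex("4111111111111111")          # a well-known test card number
long_ = sha256_hex("4111111111111111" * 1000)   # a much longer input
again = sha256_hex("4111111111111111")          # the same input, hashed again
nearby = sha256_hex("4111111111111112")         # only the last digit changed

# Both signatures have the same length regardless of input size.
print(len(short), len(long_))
# The same input always gives the same signature.
print(short == again)
# Count how many of the 64 hex characters differ after a one-digit change.
changed = sum(a != b for a, b in zip(short, nearby))
print(changed, "of 64 hex characters differ")
```

Note that nothing here recovers the input from the signature; the only route back is guessing inputs and comparing.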
Hashes can be used to detect changes (e.g. file or message integrity), to validate that a user knows a password (without having to store it or send it over a network), and to “render cardholder data unreadable anywhere it is stored” under PCI DSS.
Takeaway: While you can’t undo a hash, you can often achieve the same thing by guessing the plain-text inputs. This works when there are not too many inputs and the attacker’s computers are fast. Brute force guessing, or "cracking", is commonly used to recover passwords. We will show you that credit card numbers are small enough.
Hashing and PCI DSS
Beginning with DSS v1.0 in 2004, requirement 3.4 introduced the concept of rendering cardholder data “unreadable” using: one-way hashes, truncation, index tokens, or strong cryptography.
Many organizations seized upon this to simplify their compliance. The idea was to remove the data from the scrutiny of PCI DSS. People reasoned that because hashes were one-way, the process would make the data safe. The resulting data sets could then be de-scoped and exported, and would no longer need the strong protections of PCI DSS.
Most people can’t tell if a hashing implementation is secure or not. Security, especially hashing security, isn’t immediately obvious and can be tricky to get right. As a result, many applications of card hashing are flawed to this day.
Once companies began to adopt hashing as a strategy to de-scope data, the need for additional guidance started to emerge.
In 2010 PCI DSS v2.0 added a clarifying note to requirement 3.4:
It is a relatively trivial effort for a malicious individual to reconstruct original PAN data if they have access to both the truncated and hashed version of a PAN. Where hashed and truncated versions of the same PAN are present in an entity's environment, additional controls should be in place to ensure that the hashed and truncated versions cannot be correlated to reconstruct the original PAN.
And in 2015 PCI DSS v3.1 added an additional sub-requirement to ensure the note was not overlooked:
3.4.e If hashed and truncated versions of the same PAN are present in the environment, examine implemented controls to verify that the hashed and truncated versions cannot be correlated to reconstruct the original PAN.
In 2009, FAQ#1089 tried to address the intent of hashing. While the guidance is dated and could use an update, it includes the following (italics added):
PCI DSS Requirement 3.4 also states that the hash must be strong and one-way. This implies that the algorithm must use strong cryptography (e.g. collisions would not occur frequently) and the hash cannot be recovered or easily determined during an attack. It is also a recommended practice, but not specified requirement, that a salt be included.
Our demonstration (below) clearly shows that PAN can be easily recovered from simple hashes (using minimal or no correlation with truncated PAN), and that simple hashes no longer meet the intent of the PCI DSS.
The future of hashing in PCI DSS is unclear:
- In 2019, PCI introduced the Secure Software Standard v1.0 to replace the aging PA-DSS. This standard no longer allows hashing as a method of rendering cardholder data unreadable!
- PCI DSS 4.0 is currently in development and expected to be published in late 2021; however, the limited amount that has been made public does not include hashing.
- A general FAQ could be added or revised at any time.
Takeaway: The current guidance on hashing has remained unchanged for almost five years. PCI requirements and guidance continue to evolve along with threats and risks. New guidance is possible at any time, even if new requirements must wait for updates.
Cracking and "Correlating" Hashes
Hash cracking isn't breaking the cryptography and reversing the "one-way" hash. Cracking commonly uses powerful computer components called Graphics Processing Units (GPUs) to generate and hash long lists of "guesses", then correlate these with real hashed data. The goal is to recover the original input plaintext values by brute force. When this succeeds, the flaw is not the hash algorithm but the length and complexity of the message.
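The guess, hash, and compare loop can be sketched on a CPU in a few lines. This is an illustrative sketch only, not how hashcat works internally. It assumes an unsalted SHA-256 of the PAN and an attacker who knows the first six and last four digits; the function name crack_pan and the choice of test card are assumptions made here.

```python
import hashlib
from typing import Optional

def crack_pan(target_hex: str, bin_prefix: str, last_four: str,
              unknown: int = 6) -> Optional[str]:
    """Try every value of the unknown middle digits and compare hashes."""
    for n in range(10 ** unknown):
        # Zero-pad the guess so every candidate has the same length.
        candidate = f"{bin_prefix}{n:0{unknown}d}{last_four}"
        digest = hashlib.sha256(candidate.encode("ascii")).hexdigest()
        if digest == target_hex:
            return candidate
    return None  # the PAN was not in the searched range

# The "stolen" hash of a well-known test card number.
stolen = hashlib.sha256(b"4111111111111111").hexdigest()
print(crack_pan(stolen, "411111", "1111"))  # prints 4111111111111111
```

With only one million candidates, even interpreted Python finishes this search in about a second on ordinary hardware. The hash function was never broken; the input space was simply too small.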
Demonstration by Experiment
We conducted an experiment to demonstrate how easy it is to "crack" and "correlate" hashed PAN using the industry standard tool hashcat.
First, we took a pair of well known test card numbers and calculated their hashes.
We chose the cards so that the first would be found and the second would not, forcing a full search.
Next, we ran the following "hashcat" commands for the brute force tests on several GPUs:
hashcat64.exe --potfile-disable -m1400 -a3 -D2 -w3 HASHFILE 411111?d?d?d?d?d?d1111
hashcat64.exe --potfile-disable -m1400 -a3 -D2 -w3 HASHFILE ?d?d?d?d?d?d?d?d?d?d?d?d1111
The table below shows how fast a single modern GPU (a 2080Ti - see banner photo) can crack the hash and recover the card.
| Hash rate | Time |
| 5469.1 to 6557.2 MH/s | 17 to 21 d (estimated) |
The real power of hash cracking is not that you can recover just one PAN per search but that you can recover every PAN in the same time.
To demonstrate this, we ran a second test (#2) on 1000 hashes: the two above plus 998 more generated at random, all starting with "411111".
hashcat64.exe --potfile-disable -m1400 -a3 -D2 -w3 HASHFILE 411111?d?d?d?d?d?d?d?d?d?d
The 2080Ti didn't even get close to full speed and still recovered the 999 PANs in 43 seconds.
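The reason a single pass recovers many PANs at once is that each guess is hashed once and then looked up against the whole set of stolen hashes. The sketch below is a scaled-down illustration, not a reproduction of test #2: it uses a hypothetical 11-digit known prefix so that only 10^5 candidates remain, and the random values are plain digit strings rather than real card numbers.

```python
import hashlib
import random

def sha256_hex(pan: str) -> str:
    return hashlib.sha256(pan.encode("ascii")).hexdigest()

def crack_many(targets: set, prefix: str, unknown: int) -> dict:
    """One pass over the search space, checking each guess against all targets."""
    found = {}
    for n in range(10 ** unknown):
        candidate = f"{prefix}{n:0{unknown}d}"
        digest = sha256_hex(candidate)
        if digest in targets:  # set lookup: cost does not grow with len(targets)
            found[digest] = candidate
    return found

rng = random.Random(0)
prefix = "41111100000"  # hypothetical: 11 known digits, 5 unknown
pans = [f"{prefix}{n:05d}" for n in rng.sample(range(10 ** 5), 1000)]
targets = {sha256_hex(p) for p in pans}

recovered = crack_many(targets, prefix, unknown=5)
print(len(recovered), "of", len(targets), "recovered in a single pass")
```

Doubling the number of stolen hashes does not double the work; the search space is walked once either way.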
- We made no attempt to optimize or customize the run. Credit card numbers start with the digits 2 through 6 and pass Luhn checks, which could cut the time by up to a factor of 20.
- The hash rate above is only useful to show if the GPU got up to speed. It took test #3 for the GPU to get close to its full speed (benchmark average of 6557.2 MH/s).
- The time shown is elapsed and includes time for the GPU to spin up. Expected calculation time is shown in parentheses where the GPU was underutilized.
- The problem can be easily broken down into parts and distributed across multiple computers/GPUs.
- Far more powerful cracking rigs are easily available. Custom rigs like the "Brutalis" can be purchased, banks of cloud-based GPUs can be rented, even massively parallel AWS Lambda instances could be used, and botnets are available to criminals.
- Targeted attacks going after specific BIN ranges are possible in minutes even on single laptops.
- Moving to a newer hash like SHA3 doesn't help much. While SHA3 is slower than SHA2, it isn't significantly slower.
- High work factor hashes, such as PBKDF2 and AES CRYPT, are significantly slower but are still vulnerable to smaller searches like #1 & #2 above.
- While GPUs are now used for most cracking, Test #1 is well within the reach of even 20-year-old CPUs, and a modern CPU will still recover PAN before your finger finishes pressing the button.
- Much faster speeds are possible if someone felt it worth developing ASIC rigs (like Bitcoin miners).
Takeaway: Given all of this, it's feasible to brute force every possible 16-digit PAN in one day with little more than a good gaming PC and some extra work. Selective cracking is not only feasible but trivial.
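The arithmetic behind these estimates is easy to reproduce. The sketch below assumes the benchmark rate of 6557.2 MH/s quoted in the notes and reads the "factor of 20" as a factor of 2 from the leading digit (2 through 6 is half of the ten possible digits) times a factor of 10 from the Luhn check digit. That reading, and the function names, are assumptions made here.

```python
def luhn_valid(pan: str) -> bool:
    """Return True if the digit string passes the Luhn check."""
    total = 0
    for i, ch in enumerate(reversed(pan)):
        d = int(ch)
        if i % 2 == 1:   # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def search_days(unknown_digits: int, hashes_per_second: float,
                reduction: float = 1.0) -> float:
    """Days needed to hash every candidate at the given rate."""
    candidates = 10 ** unknown_digits / reduction
    return candidates / hashes_per_second / 86400

RATE = 6557.2e6  # hashes per second (benchmark average quoted above)

print(luhn_valid("4111111111111111"))   # True: a valid test card number
print(luhn_valid("4111111111111112"))   # False: the check digit is wrong
full = search_days(16, RATE)                  # no digits known
reduced = search_days(16, RATE, reduction=20) # prefix and Luhn filtering
print(f"{full:.1f} days unoptimized, {reduced:.1f} days with the reduction")
```

At this rate the unoptimized 16-digit search comes out between 17 and 18 days on one GPU, and under a day once the factor-of-20 reduction is applied, which appears consistent with the figures quoted above.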
A Nightmare Scenario
Given what you now know, imagine the following:
- Back in the day your organization implemented this kind of hash of PAN to support analytics. Marketing made lots of uncontrolled copies thinking the data was safe.
- When 3.4.e was introduced, in an abundance of caution, your teams went to great lengths to remove all of the truncated PAN from the analytics files.
- Criminals break into your network or your marketing department falls victim to ransomware. The criminals are blocked from your CDE but exfiltrate an analytics database with 50M credit card hashes.
- Your incident response team could easily overlook the risk if they think the data is "unreadable" and "irreversible".
- If the criminals discover what they have, your company could soon be in the news as the most recent member of the top 10 credit card breaches of all time club.
Safe and Unsafe Use Cases
Hashing of PAN is not the Silver Bullet that many people thought would let them escape PCI DSS scope and requirements. Most of the use cases for it are either unsafe or require as much effort (or more) to do securely as encryption does. Tokens and index-pads are often better suited in applications, especially where long term storage is a requirement.
The following fail to meet the intent of PCI DSS:
- Simple hashing of PAN - see above.
- Hashing of PAN with an extra static "secret" (this is often incorrectly referred to as salting). Adding a secret string securely is harder to do well than encryption. The secret must be protected even more so than cryptographic keys (i.e. DSS 3.5 and 3.6). Also, with no standard mechanism for storing or assessing the secret, such secrets are often hardcoded in programs. Even if the secret is kept safe, an attacker may be able to brute force it using known inputs. They might get lucky by trying lists of test cards, or they might be able to poison the database with known cards. And if the string is ever compromised, the hashes can't be reversed to change the secret.
- Additional permutations or manipulations of PAN before hashing. This just creates a variant of the hash which must be kept secret. If the source code gets out or the program executable is reverse engineered, then we're back to simple hashing.
- Any use of MD5 or SHA1, even with seemingly reasonable levels of salting.
These can still meet the intent of PCI DSS but can be more burdensome to implement correctly and may be more restricted than other options.
- Hashing with "salt" (i.e. data unique to each individual card). Adding unique information, even known information, to a hash is like Kryptonite to cracking rigs. Done properly, it means the cracker must make a new pass for every card. Our experiment looking for "411111" cards that found 999 in 43 seconds would now take several hours. There are still challenges, such as where to get the salt, getting enough "entropy" or extra differentiating information, and getting that extra information consistently. Even if you can make this work, changing the hash is now even harder than before unless you squirreled away the PAN in a vault, in which case you are back to encryption and either tokens or index-pads.
- HMACs are a special construction using a hash function in combination with a cryptographic key. Of course, the key needs to be securely stored (i.e. DSS 3.5 & 3.6 or a secure cryptographic device like a PTS POI or HSM). The HMAC is also difficult to change if the key is compromised.
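The two constructions above can be sketched with Python's standard library. This is a minimal illustration under stated assumptions: the salt is random per record, the key is generated inline only to keep the example self-contained, and the helper names salted_hash and keyed_hash are hypothetical.

```python
import hashlib
import hmac
import secrets

def salted_hash(pan: str, salt: bytes) -> str:
    """SHA-256 over a per-card salt followed by the PAN."""
    return hashlib.sha256(salt + pan.encode("ascii")).hexdigest()

def keyed_hash(pan: str, key: bytes) -> str:
    """HMAC-SHA256 of the PAN under a secret key."""
    return hmac.new(key, pan.encode("ascii"), hashlib.sha256).hexdigest()

pan = "4111111111111111"

# Two records for the same PAN with different salts give unrelated hashes,
# so a cracker must repeat the whole search for every record.
salt_a, salt_b = secrets.token_bytes(16), secrets.token_bytes(16)
print(salted_hash(pan, salt_a) == salted_hash(pan, salt_b))  # False

# Assumption: in practice the key lives in an HSM or key store, not in code.
key = secrets.token_bytes(32)
tag = keyed_hash(pan, key)
print(hmac.compare_digest(tag, keyed_hash(pan, key)))  # True
```

The random salt also shows the consistency challenge mentioned above: the same card no longer produces the same hash across records unless the salt can be derived the same way each time. The HMAC stays consistent, but only for as long as the key is kept secret and unchanged.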
These also work but are equivalent to other methods of protecting card data.
- Hashing with already truncated card data. We've encountered solutions that hash truncated PAN with additional non-sensitive data elements (e.g. EMV tags) as an alternative.
- Encrypting the hash is also possible.
Migrating Away from Hashes of PAN
Migrating away from hashes of PAN is a potentially large project. The first step is to conduct your risk analysis. This should consider how secure your current solution actually is, not just whether you think it is (or will remain) compliant. You may need assistance from your trusted advisor.
Regardless of your organization's use cases, the planning phases should be similar:
- Look at the business use case so you correctly understand the requirements as this will be critical to the success of any solution
- Map the data flows
- Discover where the data is
- Watch for changes in standards, guidance, and potential breaches
Many solutions are possible, and each will have unique challenges. This phase will be highly dependent on your business's use cases and requirements. For example, a marketing analytics database will be potentially quite different from a distributed hot-card list or an in/out token used to connect pre-authorizations and completions. Again, depending on the solution and your objectives, you may require assistance from your trusted advisors and QSAs. Some possible approaches:
- Extend PCI DSS Scope to cover the hash databases. This is likely only feasible if the data is already well contained.
- Use strong encryption on the existing hashes. If properly implemented, the encrypted hash could be considered out-of-scope.
- Replace the hashes with a token or index-pad which would require a service provider or on-premise vault subject to PCI DSS.
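To illustrate why a token differs from a hash, here is a toy in-memory vault. It is a hypothetical sketch, not a compliant design: the class name TokenVault is invented here, and a real vault would encrypt its storage, control access, and sit inside PCI DSS scope as noted above.

```python
import secrets

class TokenVault:
    """Toy in-memory vault mapping random tokens to PANs (illustration only)."""

    def __init__(self) -> None:
        self._pan_by_token = {}
        self._token_by_pan = {}

    def tokenize(self, pan: str) -> str:
        """Return the existing token for a PAN, or issue a new random one."""
        if pan in self._token_by_pan:
            return self._token_by_pan[pan]
        token = secrets.token_hex(16)  # random, not derived from the PAN
        self._pan_by_token[token] = pan
        self._token_by_pan[pan] = token
        return token

    def detokenize(self, token: str) -> str:
        """Look up the PAN for a token; only the vault can do this."""
        return self._pan_by_token[token]

vault = TokenVault()
token = vault.tokenize("4111111111111111")
print(token != "4111111111111111")                  # True
print(vault.tokenize("4111111111111111") == token)  # True: stable per PAN
print(vault.detokenize(token))                      # prints 4111111111111111
```

Because the token is random rather than computed from the PAN, there is nothing to brute force in an exported data set; an attacker would need the vault itself.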
We should caution that there may be open compliance and security questions:
- Will the solution allow the data to be taken out of scope?
- Will there be a requirement to know where the new data is?
- Will there be requirements to allow the data to be updated in the future?
- What clarifications or guidance will emerge?