Question: I want to be able to uniquely
identify a file in a Java program by its contents. I read the file and use java.util.zip.CRC32 to
compute the checksum value.
Is it possible for two files with different contents to return the same checksum?
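Answer: Yes. Java's CRC32 generates a 32-bit
checksum, so there are only 2^32 possible values. Once you have more than 2^32
files you're guaranteed to have a collision, and by the birthday paradox a
collision becomes likely with far fewer files (on the order of tens of thousands).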
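To make the numbers above concrete, here is a minimal sketch. The class name Crc32Demo and its helper methods are made up for illustration; only java.util.zip.CRC32 comes from the question. The collision estimate uses the standard birthday approximation and assumes checksums are spread uniformly over the 2^32 values, which is only roughly true for real files.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper class, for illustration only.
final class Crc32Demo {

    /** Returns the CRC32 checksum of the given bytes as an unsigned 32-bit value in a long. */
    static long crc32Of(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /**
     * Birthday approximation: probability that at least two of n inputs share
     * a checksum, ASSUMING checksums are uniformly distributed over 2^32 values.
     */
    static double collisionProbability(long n) {
        double space = Math.pow(2, 32);
        return 1.0 - Math.exp(-(double) n * (n - 1) / (2.0 * space));
    }

    public static void main(String[] args) {
        byte[] sample = "123456789".getBytes(StandardCharsets.US_ASCII);
        // "123456789" is the standard CRC-32 check input; its checksum is cbf43926.
        System.out.printf("CRC32 = %08x%n", crc32Of(sample));
        // Under the uniformity assumption, roughly 77,000 inputs already give about even odds.
        System.out.printf("P(collision) for 77,000 files = %.4f%n", collisionProbability(77_000));
    }
}
```

For a real file you would feed the CRC32 object from a stream in chunks instead of loading everything into one byte array.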
One way to reduce the risk of collisions is to use a message digest
(java.security.MessageDigest) based on MD5 or SHA.
Message digests are secure one-way hash functions that take arbitrary-sized data
and output a fixed-length hash value.
The size of a message digest is always the same for a given algorithm, independent
of the size or content of the message from which it was created: 128 bits for MD5,
160 bits for SHA-1 and 256 bits for SHA-256.
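A short sketch to check the digest sizes yourself, using java.security.MessageDigest. The class name DigestLengthDemo and the method digestBits are invented for this example; the algorithm names "MD5", "SHA-1" and "SHA-256" are standard names that every Java platform is required to support.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, for illustration only.
final class DigestLengthDemo {

    /** Returns the length in bits of the digest that the named algorithm produces for data. */
    static int digestBits(String algorithm, byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            return md.digest(data).length * 8;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Algorithm not available: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        byte[] small = "a".getBytes(StandardCharsets.UTF_8);
        byte[] large = new byte[1_000_000]; // one million zero bytes
        for (String alg : new String[] {"MD5", "SHA-1", "SHA-256"}) {
            // Both inputs give the same length: it depends only on the algorithm.
            System.out.println(alg + ": " + digestBits(alg, small) + " bits (small), "
                    + digestBits(alg, large) + " bits (large)");
        }
    }
}
```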
Of course, it is still possible for two different files to have the same message digest as well.
But the probability that two different files have both the same CRC32 checksum
and the same message digest is negligibly low, at least for files that were not
deliberately crafted to collide.
A "bulletproof" approach would be:
1. Compare the CRC32 checksums of the two files.
2. If they are the same, compare the message digests...
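The two steps above could be sketched roughly as follows. This is only one way to do it: the class FileFingerprint and its method names are made up, and SHA-256 is my choice for the digest (the answer only says "MD5 or SHA"). Both checks stream the file in chunks, so large files are not loaded into memory at once.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32;

// Hypothetical helper class, for illustration only.
final class FileFingerprint {

    /** Step 1: streams the file through CRC32. */
    static long crc32(Path file) {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                crc.update(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return crc.getValue();
    }

    /** Step 2: streams the file through SHA-256 (the algorithm choice is an assumption). */
    static byte[] sha256(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
            return md.digest();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /** Returns true only if both the CRC32 checksums and the SHA-256 digests match. */
    static boolean probablySameContent(Path a, Path b) {
        if (crc32(a) != crc32(b)) {
            return false; // different checksums: the contents definitely differ
        }
        return MessageDigest.isEqual(sha256(a), sha256(b));
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java FileFingerprint <file1> <file2>");
            return;
        }
        System.out.println(probablySameContent(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```

Note that the digest on its own already decides the answer; the CRC32 step mainly acts as a quick first filter. If you need absolute certainty, a byte-by-byte comparison of the two files is the only check that cannot collide.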