Algorithm
hash = file_size + sum_uint64_le(first_64KB) + sum_uint64_le(last_64KB)
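The hash is the file size plus a 64-bit checksum of the first and last 64 KB. The algorithm is often described as summing both chunks even when they overlap in a file smaller than 128 KB; with the 128 KB minimum stated below, however, the two chunks never overlap.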
All arithmetic is unsigned 64-bit with natural overflow (wrapping).
Data is read as little-endian uint64 values (8192 values per chunk).
File size requirement: minimum 131,072 bytes (128 KB). Files smaller than this cannot be hashed.
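A minimal Python sketch of the steps above, assuming the result is reported as 16 lowercase hex digits like the test vectors below. The names `oshash`, `CHUNK_SIZE` and `MASK64` are illustrative, not part of any published API.

```python
import os
import struct

CHUNK_SIZE = 64 * 1024              # bytes read from each end of the file
WORDS_PER_CHUNK = CHUNK_SIZE // 8   # 8192 little-endian uint64 values
MASK64 = (1 << 64) - 1              # keeps the result in unsigned 64-bit range


def oshash(path: str) -> str:
    """Return the OSHash of the file at `path` as 16 lowercase hex digits."""
    size = os.path.getsize(path)
    if size < 2 * CHUNK_SIZE:
        raise ValueError(f"file too small for OSHash: {size} bytes (< 131072)")

    with open(path, "rb") as f:
        head = f.read(CHUNK_SIZE)           # first 64 KB
        f.seek(-CHUNK_SIZE, os.SEEK_END)    # jump to 64 KB before EOF
        tail = f.read(CHUNK_SIZE)           # last 64 KB

    total = size
    for chunk in (head, tail):
        # "<8192Q": 8192 little-endian unsigned 64-bit integers
        total += sum(struct.unpack(f"<{WORDS_PER_CHUNK}Q", chunk))

    # Python ints never overflow, so reduce modulo 2**64 once at the end;
    # this gives the same result as wrapping after every addition.
    return f"{total & MASK64:016x}"
```

On the breakdance.avi test file, a correct implementation should return `8e245d9679d31e12`, the value listed in the table below.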
History
GenerateOSHash function).
Test Vectors
| File | Size (bytes) | Expected Hash | Download |
|---|---|---|---|
| breakdance.avi | 12,909,756 | 8e245d9679d31e12 | Download (12.3 MB) |
| dummy.rar | 4,295,033,890 | 61f7751fc2a72bfb | Download RAR (2.4 MB); unpack to get the 4 GB test file that the size and hash refer to |
Always verify your implementation against the breakdance.avi test file before deployment.
Many implementations floating around the internet have subtle bugs, especially with 64-bit overflow handling.
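The Python sketch below (helper names are hypothetical) illustrates three common ways the arithmetic goes wrong: forgetting to wrap at 64 bits in a language with unbounded integers, printing a signed 64-bit accumulator as a negative number, and storing the file size in 32 bits. The last one is what the 4 GB test file catches, since its size of 4,295,033,890 bytes is larger than 2^32 = 4,294,967,296.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def add_u64(a: int, b: int) -> int:
    """Unsigned 64-bit addition with wrap-around, as OSHash requires."""
    return (a + b) & MASK64


def u64_hex(value: int) -> str:
    """Render any int (including a negative signed-int64 result) as the
    16-digit hex of its unsigned 64-bit bit pattern."""
    return f"{value & MASK64:016x}"


# Pitfall 1: no wrapping. Python ints grow past 64 bits instead of wrapping.
words = [MASK64, 2]                 # 0xffffffffffffffff + 2
unwrapped = sum(words)              # 2**64 + 1: a 65-bit number, wrong
wrapped = 0
for w in words:
    wrapped = add_u64(wrapped, w)   # 1: correct wrap-around

# Pitfall 2: a signed int64 accumulator holds the right bits, but a signed
# formatter prints a negative number instead of the expected hex digits.
signed_result = -2                  # same bits as 0xfffffffffffffffe
print(u64_hex(signed_result))       # fffffffffffffffe

# Pitfall 3: a 32-bit file size silently drops the high bits.
DUMMY_SIZE = 4_295_033_890          # size of the 4 GB test file
print(DUMMY_SIZE & MASK32)          # 66594, not 4295033890
```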
Limitations & Security
Not Cryptographic
OSHash is not a cryptographic hash. It was designed for speed, not security. Do not use it for integrity verification, authentication, or any security-sensitive purpose. Use SHA-256 or BLAKE3 for those.
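For integrity checks, a streaming SHA-256 over the whole file is a minimal replacement, sketched here with Python's standard `hashlib` (BLAKE3 is not in the standard library and would need a third-party package). The function name `sha256_file` and the 1 MiB block size are arbitrary choices.

```python
import hashlib


def sha256_file(path: str, block_size: int = 1 << 20) -> str:
    """Hash every byte of the file, reading 1 MiB at a time to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps calling f.read until it returns b"" (EOF)
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

Unlike OSHash, this reads the entire file, so its cost grows with file size, but a change anywhere in the file is reflected in the digest.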
Trivial Collisions
Two files with the same size, same first 64 KB, and same last 64 KB will produce the same hash, regardless of what's in the middle. This means ~99.99% of the file content is not hashed for typical video files.
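To make the collision concrete, this Python sketch builds two in-memory buffers that share size, first 64 KB and last 64 KB but differ in the middle, and shows that they hash identically. `oshash_bytes` is a hypothetical in-memory variant of the algorithm described at the top of this page.

```python
import os
import struct

CHUNK = 64 * 1024
MASK64 = (1 << 64) - 1


def oshash_bytes(data: bytes) -> str:
    """OSHash of an in-memory buffer (size + wrapped uint64 sums)."""
    if len(data) < 2 * CHUNK:
        raise ValueError("OSHash needs at least 131072 bytes")
    total = len(data)
    for chunk in (data[:CHUNK], data[-CHUNK:]):
        total += sum(struct.unpack("<8192Q", chunk))
    return f"{total & MASK64:016x}"


head = os.urandom(CHUNK)
tail = os.urandom(CHUNK)
file_a = head + b"\x01" * 1_000_000 + tail   # middle: 1 MB of 0x01 bytes
file_b = head + bytes(1_000_000) + tail      # middle: 1 MB of zero bytes

print(file_a == file_b)                              # False
print(oshash_bytes(file_a) == oshash_bytes(file_b))  # True
```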
Hash Forgery
An attacker can craft a file with any desired hash by manipulating the first or last 64 KB. Since the hash is just addition, finding a preimage is trivial arithmetic, not a computational puzzle. You can also transplant the head/tail of one file onto another.
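Because the hash is a sum modulo 2^64, hitting any target value only requires adjusting a single 64-bit word. This Python sketch (with hypothetical names `oshash_bytes` and `forge`) patches the first 8 bytes of a buffer so that its OSHash equals a chosen value, here the breakdance.avi hash from the test vectors.

```python
import os
import struct

CHUNK = 64 * 1024
MASK64 = (1 << 64) - 1


def oshash_bytes(data: bytes) -> str:
    """OSHash of an in-memory buffer (size + wrapped uint64 sums)."""
    if len(data) < 2 * CHUNK:
        raise ValueError("OSHash needs at least 131072 bytes")
    total = len(data)
    for chunk in (data[:CHUNK], data[-CHUNK:]):
        total += sum(struct.unpack("<8192Q", chunk))
    return f"{total & MASK64:016x}"


def forge(data: bytes, target_hex: str) -> bytes:
    """Return a same-size copy of `data` whose OSHash equals `target_hex`.

    Only the first 8 bytes change. They belong to the head chunk alone
    (with at least 128 KB the chunks never overlap), so adding `delta`
    to that word shifts the total by exactly `delta` modulo 2**64.
    """
    current = int(oshash_bytes(data), 16)
    delta = (int(target_hex, 16) - current) & MASK64
    (first_word,) = struct.unpack_from("<Q", data, 0)
    patched = struct.pack("<Q", (first_word + delta) & MASK64)
    return patched + data[8:]


original = os.urandom(200_000)
forged = forge(original, "8e245d9679d31e12")   # target: breakdance.avi's hash
print(oshash_bytes(forged))                    # 8e245d9679d31e12
```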
Second Preimage Attack
Given a file and its hash, creating a different file with the same hash is trivial: keep the same size, copy the first and last 64 KB, and put anything in the middle. This makes the hash unsuitable for verifying file authenticity.
Appropriate Use Cases
OSHash is well-suited for: subtitle database lookups, media library deduplication (combined with file size), and quick file identification in trusted environments. Its O(1) read cost (always 128 KB regardless of file size) is its main advantage.
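One way to apply OSHash to deduplication: bucket files by (size, OSHash) as cheap candidates, then confirm each bucket with a full SHA-256, since a matching OSHash cannot prove two files are identical. This Python sketch is an assumption about how such a tool could be structured; `find_duplicates` and its helpers are hypothetical names.

```python
import hashlib
import os
import struct
from collections import defaultdict

CHUNK = 64 * 1024
MASK64 = (1 << 64) - 1


def oshash(path: str) -> str:
    """OSHash: file size plus wrapped uint64 sums of the first/last 64 KB."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(CHUNK)
        f.seek(-CHUNK, os.SEEK_END)
        tail = f.read(CHUNK)
    total = size + sum(struct.unpack("<8192Q", head)) + sum(struct.unpack("<8192Q", tail))
    return f"{total & MASK64:016x}"


def sha256_file(path: str) -> str:
    """Full-content SHA-256, read in 1 MiB blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return lists of paths whose full contents are byte-identical."""
    candidates = defaultdict(list)
    for p in paths:
        size = os.path.getsize(p)
        if size < 2 * CHUNK:
            continue  # OSHash is undefined for files under 128 KB
        candidates[(size, oshash(p))].append(p)

    groups = []
    for bucket in candidates.values():
        if len(bucket) < 2:
            continue  # unique (size, OSHash): cannot have a duplicate here
        by_digest = defaultdict(list)
        for p in bucket:  # only candidates pay for a full read
            by_digest[sha256_file(p)].append(p)
        groups.extend(g for g in by_digest.values() if len(g) > 1)
    return groups
```

The expensive full-file hash runs only on files that already collide on size and OSHash, so in a typical library most files are read just 128 KB each.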
Performance Profile
Only reads 128 KB total, regardless of file size. Hashing a 50 GB file takes the same time as hashing a 200 KB file. No CPU-intensive cryptographic operations — just integer addition. Typically completes in under 1 ms for local files.
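The constant-cost claim can be checked by counting bytes rather than timing. This Python sketch wraps an in-memory buffer as a stand-in for a real file (`CountingReader` and `oshash_stream` are hypothetical names) and shows that a 200 KB input and a 32 MB input both cause exactly 131,072 bytes to be read. It does not model seek latency on real storage.

```python
import io
import struct

CHUNK = 64 * 1024
MASK64 = (1 << 64) - 1


class CountingReader(io.BytesIO):
    """In-memory file that records how many bytes read() has returned."""

    def __init__(self, data: bytes):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        out = super().read(size)
        self.bytes_read += len(out)
        return out


def oshash_stream(f, size: int) -> str:
    """OSHash of an open binary file object whose length is `size`."""
    if size < 2 * CHUNK:
        raise ValueError("OSHash needs at least 131072 bytes")
    f.seek(0)
    head = f.read(CHUNK)          # first 64 KB
    f.seek(size - CHUNK)
    tail = f.read(CHUNK)          # last 64 KB
    total = size + sum(struct.unpack("<8192Q", head)) + sum(struct.unpack("<8192Q", tail))
    return f"{total & MASK64:016x}"


for size in (200 * 1024, 32 * 1024 * 1024):
    reader = CountingReader(bytes(size))
    oshash_stream(reader, size)
    print(size, reader.bytes_read)   # bytes_read is 131072 for both sizes
```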