OpenSubtitles Hash (OSHash)

Implementation reference, verified test suite & security analysis

Algorithm

hash = file_size + sum_uint64_le(first_64KB) + sum_uint64_le(last_64KB)

The hash is the file size plus a 64-bit checksum of the first and last 64 KB of the file (even if they overlap because the file is smaller than 128 KB). All arithmetic is unsigned 64-bit and wraps on overflow (modulo 2^64). Each 64 KB chunk is read as 8192 little-endian uint64 values.

File size requirement: minimum 131,072 bytes (128 KB). Files smaller than this cannot be hashed.
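
The formula above can be sketched in Python as follows. This is a minimal illustration rather than the canonical implementation: the names `oshash`, `sum_uint64_le`, `CHUNK_SIZE`, `MIN_SIZE`, and `MASK64` are chosen here for the example, the 128 KB minimum follows the requirement stated above, and the 16-digit lowercase hex output is assumed from the format of the published test vectors.

```python
import os
import struct

CHUNK_SIZE = 64 * 1024          # 65,536 bytes per chunk
MIN_SIZE = 2 * CHUNK_SIZE       # 131,072 bytes, the minimum stated above
MASK64 = 0xFFFFFFFFFFFFFFFF     # emulates unsigned 64-bit wraparound


def sum_uint64_le(chunk: bytes) -> int:
    """Sum a 64 KB chunk as 8192 little-endian uint64 values, modulo 2**64."""
    words = struct.unpack("<8192Q", chunk)
    return sum(words) & MASK64


def oshash(path: str) -> str:
    """Return the hash of the file at `path` as 16 lowercase hex digits."""
    size = os.path.getsize(path)
    if size < MIN_SIZE:
        raise ValueError(f"file too small to hash: {size} bytes")
    with open(path, "rb") as f:
        head = f.read(CHUNK_SIZE)
        f.seek(size - CHUNK_SIZE)
        tail = f.read(CHUNK_SIZE)
    total = (size + sum_uint64_le(head) + sum_uint64_le(tail)) & MASK64
    return f"{total:016x}"
```

Python integers never overflow, so the explicit mask is what reproduces the wrapping behaviour. In languages with fixed-width unsigned types the wraparound happens on its own, while languages without a native 64-bit unsigned integer need similar care.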

History

~2004
Origin in Media Player Classic. The hash algorithm was first implemented in Media Player Classic (MPC-HC), an open-source media player for Windows created by Gabest (Guliverkli project). The algorithm was designed to quickly identify video files for automatic subtitle matching, prioritizing speed over collision resistance. The name "Gibest hash" (sometimes "Gabest hash") comes from this origin. The original C++ implementation can be found in SubtitlesProvidersUtils.cpp (the GenerateOSHash function).

2006
Adopted by OpenSubtitles.org. OpenSubtitles adopted the hash as the primary file identification method for its XML-RPC API. It became the standard way for media players and subtitle tools to look up subtitles automatically. The hash, combined with file size, forms a lookup key in the OpenSubtitles database. The original hash source code wiki page collected community implementations.

2006–2020
Widespread adoption. Implementations appeared in dozens of languages. Media players like VLC, Kodi, Plex, Stremio, and many subtitle tools (e.g. Bazarr, Sublight) integrated the hash for automatic subtitle downloads. The algorithm's simplicity made it easy to port, though some early implementations contained bugs (especially around 64-bit overflow handling in PHP, JavaScript, and Perl).

2023+
OpenSubtitles REST API (v2). The newer REST API continues to support OSHash-based lookups alongside moviehash. The hash remains relevant for backwards compatibility and is still the fastest identification method for local files.

Test Vectors

File            Size (bytes)    Expected hash     Download
breakdance.avi  12,909,756      8e245d9679d31e12  12.3 MB
dummy.rar       4,295,033,890   61f7751fc2a72bfb  RAR, 2.4 MB (unpack to get the 4 GB test file)

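Always verify your implementation against the breakdance.avi test file before deployment. Many implementations circulating online have subtle bugs, especially in 64-bit overflow handling. The 4 GB test file additionally exercises file sizes above 2^32 bytes, which catches implementations that truncate the size to 32 bits.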
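
A small helper along these lines can automate the check. It is only a sketch: `check_vector` and `VECTORS` are names invented for this example, `hash_func` stands for whatever implementation is under test (assumed to take a path and return a hex string), and the file paths depend on where the test files were saved.

```python
import os
from typing import Callable, List

# Expected values copied from the test vector table above.
# Each entry: (label, size in bytes, expected hash).
VECTORS = [
    ("breakdance.avi", 12_909_756, "8e245d9679d31e12"),
    ("dummy.rar (unpacked)", 4_295_033_890, "61f7751fc2a72bfb"),
]


def check_vector(
    path: str,
    expected_size: int,
    expected_hash: str,
    hash_func: Callable[[str], str],
) -> List[str]:
    """Return a list of problems found; an empty list means the vector passed."""
    problems = []
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        # A size mismatch usually means a truncated or wrong download.
        problems.append(f"size is {actual_size}, expected {expected_size}")
    actual_hash = hash_func(path).lower()
    if actual_hash != expected_hash:
        problems.append(f"hash is {actual_hash}, expected {expected_hash}")
    return problems
```

Checking the size separately helps tell a bad download apart from a bug in the hash code itself.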

Limitations & Security

Not Cryptographic

OSHash is not a cryptographic hash. It was designed for speed, not security. Do not use it for integrity verification, authentication, or any security-sensitive purpose. Use SHA-256 or BLAKE3 for those.

Trivial Collisions

Two files with the same size, same first 64 KB, and same last 64 KB will produce the same hash, regardless of what's in the middle. For a 1 GB video, the 128 KB that are read make up about 0.01% of the file, so roughly 99.99% of the content never influences the hash.

Hash Forgery

An attacker can craft a file with any desired hash by manipulating the first or last 64 KB. Since the hash is just addition, finding a preimage is trivial arithmetic, not a computational puzzle. You can also transplant the head/tail of one file onto another.
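
To illustrate how little work this takes, the sketch below adjusts a single 8-byte word so that an in-memory file image hashes to an arbitrary target. The helpers `oshash_bytes` and `forge` are hypothetical names for this example, and the hash follows the formula from the Algorithm section, including the 128 KB minimum.

```python
import struct

CHUNK = 64 * 1024
MASK64 = 0xFFFFFFFFFFFFFFFF


def oshash_bytes(data: bytes) -> int:
    """Hash an in-memory file image: size + head sum + tail sum, modulo 2**64."""
    if len(data) < 2 * CHUNK:
        raise ValueError("need at least 128 KB")
    head = sum(struct.unpack("<8192Q", data[:CHUNK]))
    tail = sum(struct.unpack("<8192Q", data[-CHUNK:]))
    return (len(data) + head + tail) & MASK64


def forge(data: bytes, target: int) -> bytes:
    """Return a copy of `data` whose first 8 bytes are changed so it hashes to `target`."""
    current = oshash_bytes(data)
    first_word = struct.unpack_from("<Q", data, 0)[0]
    # Shift the first word by the difference between target and current hash.
    new_word = (first_word + target - current) & MASK64
    return struct.pack("<Q", new_word) + data[8:]
```

Overwriting the first 8 bytes would likely corrupt a real container header. The same subtraction works on any other 8-byte-aligned word inside the first or last 64 KB, so an attacker is free to pick a position the player ignores.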

Second Preimage Attack

Given a file and its hash, creating a different file with the same hash is trivial: keep the same size, copy the first and last 64 KB, and put anything in the middle. This makes the hash unsuitable for verifying file authenticity.
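
The recipe above takes only a few lines. In this sketch, `oshash_bytes` and `second_preimage` are names invented for the example, and files are represented as in-memory byte strings of at least 128 KB.

```python
import struct

CHUNK = 64 * 1024
MASK64 = 0xFFFFFFFFFFFFFFFF


def oshash_bytes(data: bytes) -> int:
    """Hash an in-memory file image: size + head sum + tail sum, modulo 2**64."""
    if len(data) < 2 * CHUNK:
        raise ValueError("need at least 128 KB")
    head = sum(struct.unpack("<8192Q", data[:CHUNK]))
    tail = sum(struct.unpack("<8192Q", data[-CHUNK:]))
    return (len(data) + head + tail) & MASK64


def second_preimage(data: bytes, filler: int = 0) -> bytes:
    """Keep the size, first 64 KB and last 64 KB; overwrite the middle with `filler`."""
    if len(data) < 2 * CHUNK:
        raise ValueError("need at least 128 KB")
    middle_len = len(data) - 2 * CHUNK
    return data[:CHUNK] + bytes([filler]) * middle_len + data[-CHUNK:]
```

The result has the same length and the same hash as the input, while every byte between the two chunks has been replaced.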

Appropriate Use Cases

OSHash is well-suited for: subtitle database lookups, media library deduplication (combined with file size), and quick file identification in trusted environments. Its O(1) read cost (always 128 KB regardless of file size) is its main advantage.

Performance Profile

Only 128 KB are read in total, regardless of file size, so hashing a 50 GB file takes about the same time as hashing a 200 KB file. There are no CPU-intensive cryptographic operations, just integer addition. Hashing typically completes in under 1 ms for local files.

Implementations