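Git, the version control system GitHub is built on, gives each commit an identifier computed by hashing the commit's contents. By default Git uses SHA-1, which produces a 160-bit ID written as 40 hexadecimal characters. Repositories created with `git init --object-format=sha256` use SHA-256 instead, which produces 64-character IDs. This identifier is central to how Git tracks changes in a repository. Here is how the hash of a commit is calculated: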
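- Commit content: The commit object is a short text record containing:
  - `tree`: the hash of the tree object, which is the snapshot of the project's files and directories.
  - `parent`: the hash of the parent commit. A root commit has none, and a merge commit has two or more.
  - `author`: the author's name, email, timestamp (Unix seconds), and time-zone offset.
  - `committer`: the committer's name, email, timestamp, and time-zone offset.
  - The commit message, separated from the fields above by a blank line.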
- Header: Before hashing, Git prepends a header to the commit object. The header is the word `commit`, a space, the size of the commit object in bytes as a decimal number (excluding the header itself), and a NUL byte.
- Hashing: The header and the commit object together are fed into the hash function. With SHA-1 the result is a 160-bit (20-byte) value, shown as 40 hex characters; with SHA-256 it is a 256-bit (32-byte) value, shown as 64 hex characters.
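The three steps above can be reproduced with Python's standard `hashlib` module. This is a minimal sketch for illustration, not Git's own code: the helper names `git_object_id` and `commit_body`, the author identity, and the timestamp are hypothetical, and it only handles the basic fields listed above (no signatures or other extra headers). The tree ID used is Git's well-known SHA-1 ID of the empty tree; in a real SHA-256 repository the `tree` and `parent` fields would themselves be 64-character SHA-256 IDs.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes, algo: str = "sha1") -> str:
    """Hash a Git object: '<type> <size-in-bytes>\\0' followed by the body."""
    header = f"{obj_type} {len(body)}\0".encode("ascii")
    return hashlib.new(algo, header + body).hexdigest()


def commit_body(tree, parents, author, committer, message):
    """Serialize basic commit fields in the order Git writes them (hypothetical helper)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # zero, one, or several parents
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    # A blank line separates the fields from the message.
    return ("\n".join(lines) + "\n\n" + message).encode("utf-8")


# Hypothetical root commit pointing at the empty tree.
body = commit_body(
    tree="4b825dc642cb6eb9a060e54bf8d69288fbee4904",  # SHA-1 ID of the empty tree
    parents=[],
    author="A U Thor <author@example.com> 1700000000 +0000",
    committer="A U Thor <author@example.com> 1700000000 +0000",
    message="Initial commit\n",
)
print(git_object_id("commit", body))            # 40 hex characters (SHA-1)
print(git_object_id("commit", body, "sha256"))  # 64 hex characters (SHA-256)
```

To check the approach against a real repository, `git cat-file commit HEAD | git hash-object -t commit --stdin` re-hashes the raw object Git stored for `HEAD` and should print the same ID as `git rev-parse HEAD`.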
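The resulting hash acts as an effectively unique fingerprint for that specific commit, covering all of its content and metadata. Because the commit records its tree hash and its parent hashes, the ID also depends on every file in the snapshot and on the entire chain of earlier commits. Even the smallest change, such as editing one character of the commit message, produces a completely different hash, which is what lets Git detect tampering and preserve the integrity of the repository's history.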
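The effect described above can be observed directly. This self-contained sketch hashes two commit objects that differ by a single character in the message and counts how many of the 160 SHA-1 output bits change. The identity, timestamp, messages, and the helper names `commit_id` and `differing_bits` are hypothetical. For a cryptographic hash you would expect roughly half the bits to flip, though the exact count depends on the input.

```python
import hashlib


def commit_id(body: bytes, algo: str = "sha1") -> str:
    """Git-style commit hash: 'commit <size>\\0' + body."""
    return hashlib.new(algo, b"commit %d\x00" % len(body) + body).hexdigest()


def differing_bits(a_hex: str, b_hex: str) -> int:
    """Count bit positions where two equal-length hex digests differ."""
    return bin(int(a_hex, 16) ^ int(b_hex, 16)).count("1")


template = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "author A U Thor <author@example.com> 1700000000 +0000\n"
    "committer A U Thor <author@example.com> 1700000000 +0000\n"
    "\n"
    "{msg}\n"
)

# The two commits differ only by one character in the message.
original = commit_id(template.format(msg="Fix typo").encode("utf-8"))
edited = commit_id(template.format(msg="Fix typos").encode("utf-8"))

print(original)
print(edited)
print(differing_bits(original, edited), "of 160 bits differ")
```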