GIT has become the de facto version control system, but it can get complicated quickly. A look under the hood can help with day-to-day use and with recovering files. This article explores the files in the object database, laying a foundation for more advanced use.
HOW DOES GIT STORE OBJECTS?
GIT stores information in the hidden .git folder in the root of the project. The folder is created when a repository is initialized using git init.
Commits, trees and blobs are the fundamental objects in GIT (a fourth type, the annotated tag, is outside the scope of this article). They are stored in the .git/objects folder:
- Commits are point-in-time references to a tree
- Trees represent folders
- Blobs represent files*
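*Strictly, a blob always holds the complete contents of a file. When objects are packed, GIT can store them as deltas (differences) against similar objects, but that’s a more advanced topic for another article.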
WHAT’S WITH ALL THE SHA1 HASHES?
GIT calculates a SHA1 hash for each object and stores the object in a file under the .git/objects folder, named after that hash. Because the name is derived from the contents, objects in the database are immutable: changing the content would change the hash. Modifying a file therefore always results in a new object with a new hash, rather than an update to the existing one.
GIT uses the hash values to tell which files have been modified. When changes are staged and committed, new and modified files are stored as new blobs. An unchanged file produces the same hash, so the new tree simply references the existing blob.
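The hashing scheme can be sketched in a few lines of Python. This is an illustration, not GIT’s own code: it assumes a repository using the default SHA1 object format and mirrors what git hash-object reports for a blob, namely the SHA1 of a short header ("blob", the size in bytes and a NUL byte) followed by the raw content. The ObjectStore class and snapshot function are hypothetical, in-memory stand-ins for .git/objects that ignore trees, commits and compression; they only show why an unchanged file reuses its existing blob.

```python
import hashlib


def blob_object_id(content: bytes) -> str:
    """Return the SHA1 object ID that GIT assigns to a blob with this content."""
    # GIT hashes a header ("blob <size>" plus a NUL byte) followed by the raw bytes.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Hypothetical in-memory stand-in for .git/objects (blobs only, no compression)."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def add_blob(self, content: bytes) -> str:
        oid = blob_object_id(content)
        # Content-addressed: an existing ID means identical content, so nothing is overwritten.
        self.objects.setdefault(oid, content)
        return oid


def snapshot(store: ObjectStore, files: dict[str, bytes]) -> dict[str, str]:
    """Store each file's content and return a {filename: blob ID} mapping."""
    return {name: store.add_blob(data) for name, data in files.items()}


print(blob_object_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4

store = ObjectStore()
first = snapshot(store, {"a.txt": b"hello\n", "b.txt": b"world\n"})
second = snapshot(store, {"a.txt": b"hello\n", "b.txt": b"world, again\n"})
print(len(store.objects))                 # 3: the unchanged a.txt reuses its blob
print(first["a.txt"] == second["a.txt"])  # True
```

Only the modified b.txt produced a new object; a.txt hashed to the same ID both times, which is exactly what lets a commit reference an existing blob instead of storing the file again.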
To avoid storing everything in one folder, GIT creates subfolders under .git/objects. Each subfolder is named after the first two characters of the SHA1 hash, and the objects are grouped into these subfolders. The filename within the subfolder is the remaining 38 characters of the hash.
For example:
❯ ls .git/objects
00
00/24a57c6cee77755693e0514f244b1cfa5e645d
00/5b63f2cf1d596fa3f88834b98272a9d1bf9fc3
00/f823e0b5420e1051c80e0b37922409125e9156
01
01/28a8cb2ac88861ec18599c0b05f9481bdd3600
01/8c65d03d8269df96a7da4c3de1a62cd1d1c0ab
02
02/188f346460de1876df7dac2669360396f84a58
In the above example, there are three subfolders under .git/objects, called “00”, “01” and “02”.
The full SHA1 hash of an object is formed by prefixing the filename with its folder name.
So the final file listed above has the full hash 02188f346460de1876df7dac2669360396f84a58.
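The mapping between a hash and its file path is simple string slicing. The Python sketch below assumes a full 40-character SHA1 hash and applies only to loose objects; objects that GIT has packed live in .git/objects/pack instead and have no file of their own. The function names are hypothetical.

```python
import re
from pathlib import PurePosixPath

HEX40 = re.compile(r"[0-9a-f]{40}")


def object_path(oid: str, git_dir: str = ".git") -> PurePosixPath:
    """Return the loose-object path for a full SHA1 hash."""
    if not HEX40.fullmatch(oid):
        raise ValueError(f"not a full SHA1 hash: {oid!r}")
    # The first two characters become the subfolder, the other 38 the filename.
    return PurePosixPath(git_dir, "objects", oid[:2], oid[2:])


def object_id_from_path(path: str) -> str:
    """Rebuild the full hash by prefixing the filename with its folder name."""
    p = PurePosixPath(path)
    return p.parent.name + p.name


print(object_path("02188f346460de1876df7dac2669360396f84a58"))
# .git/objects/02/188f346460de1876df7dac2669360396f84a58
print(object_id_from_path("02/188f346460de1876df7dac2669360396f84a58"))
# 02188f346460de1876df7dac2669360396f84a58
```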
CAN YOU LOOK INSIDE THE OBJECTS?
The objects inside the GIT database are compressed (with zlib), but they can be viewed with the command git cat-file.
Specify the object hash and either:
-t = show the object type
-p = print the contents
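Once decompressed, each loose object starts with the same kind of header GIT hashes: the object type, a space, the size in bytes and a NUL byte. The Python sketch below is a rough stand-in for git cat-file -t and, for text objects such as blobs and commits, -p (for trees, -p pretty-prints binary data). It assumes the object is stored loose rather than in a packfile; parse_loose_object and read_object are hypothetical names.

```python
import os
import zlib


def parse_loose_object(compressed: bytes) -> tuple[str, bytes]:
    """Decompress a loose object and split it into (type, content)."""
    raw = zlib.decompress(compressed)
    header, _, content = raw.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match the content length")
    return obj_type, content


def read_object(git_dir: str, oid: str) -> tuple[str, bytes]:
    """Read the loose object for a full hash, e.g. read_object(".git", oid)."""
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    with open(path, "rb") as f:
        return parse_loose_object(f.read())


# Self-contained demo: build the bytes GIT would write for a small blob.
demo = zlib.compress(b"blob 13\x00test content\n")
print(parse_loose_object(demo))  # ('blob', b'test content\n')
```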
NOTE: Most GIT commands accept an abbreviated hash, as long as it is at least four characters long and unique within the repository (GIT often abbreviates hashes in its own output too).
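Expanding an abbreviated hash amounts to a unique-prefix search. The Python sketch below is a simplified model, not GIT’s implementation: it searches a plain list of known hashes (here, the ones from the folder listing earlier) and ignores the extra disambiguation GIT can apply, such as preferring a commit when a command expects one. resolve_prefix is a hypothetical name.

```python
MIN_ABBREV = 4  # GIT rejects abbreviations shorter than four hex characters


def resolve_prefix(prefix: str, known_ids: list[str]) -> str:
    """Expand an abbreviated hash to the single full hash that starts with it."""
    prefix = prefix.lower()
    if len(prefix) < MIN_ABBREV:
        raise ValueError(f"{prefix!r} is too short")
    matches = [oid for oid in known_ids if oid.startswith(prefix)]
    if not matches:
        raise KeyError(f"no object starts with {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"{prefix!r} is ambiguous ({len(matches)} matches)")
    return matches[0]


ids = [
    "0024a57c6cee77755693e0514f244b1cfa5e645d",
    "005b63f2cf1d596fa3f88834b98272a9d1bf9fc3",
    "00f823e0b5420e1051c80e0b37922409125e9156",
    "0128a8cb2ac88861ec18599c0b05f9481bdd3600",
    "018c65d03d8269df96a7da4c3de1a62cd1d1c0ab",
    "02188f346460de1876df7dac2669360396f84a58",
]
print(resolve_prefix("0218", ids))  # 02188f346460de1876df7dac2669360396f84a58
```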
WHAT TYPE OF OBJECT IS THIS?
❯ git cat-file -t 0b4271c56 # display the object type
commit
The above object is a commit.
WHAT’S INSIDE A COMMIT OBJECT?
❯ git cat-file -p 0b4271c56
tree 30b4d42bbe1a39dcc314f7c280b1437a1925585e
parent cc0e10d238e78a57115572360a93deba2554d185
author GD <GD@LOCAL.HOME> 1621179226 +0100
committer GD <GD@LOCAL.HOME> 1621179226 +0100
Updated summaries. Added article
In the above output:
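- tree is the hash of the root tree (the top-level folder of the project).
- parent is the hash of the parent commit. The very first commit has no parent line, and a merge commit has one parent line per parent.
- author and committer record who wrote the change and who created the commit (usually the same person), each followed by a Unix timestamp and a time-zone offset.
- Finally, after the headers, comes the commit message: Updated summaries…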
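Because a commit’s content is plain text, it can be split into its parts with ordinary string handling. The Python sketch below assumes a UTF-8 commit (the default; a commit can declare another encoding in an encoding header) and skips headers it doesn’t recognise, such as the continuation lines of a signature. parse_commit and parse_signature are hypothetical names. The sample bytes reproduce the commit shown above; in the stored object a blank line separates the headers from the message.

```python
from datetime import datetime, timedelta, timezone


def parse_commit(content: bytes) -> dict:
    """Split a raw commit into its header fields and message (simplified)."""
    text = content.decode("utf-8")
    headers, _, message = text.partition("\n\n")  # a blank line ends the headers
    commit = {"parents": [], "message": message}
    for line in headers.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)  # none for a root commit, 2+ for a merge
        elif key in ("tree", "author", "committer"):
            commit[key] = value
    return commit


def parse_signature(value: str) -> tuple[str, datetime]:
    """Split 'Name <email> <Unix seconds> <+hhmm offset>' into identity and time."""
    identity, seconds, offset = value.rsplit(" ", 2)
    sign = -1 if offset.startswith("-") else 1
    tz = timezone(sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[3:5])))
    return identity, datetime.fromtimestamp(int(seconds), tz)


raw = (
    b"tree 30b4d42bbe1a39dcc314f7c280b1437a1925585e\n"
    b"parent cc0e10d238e78a57115572360a93deba2554d185\n"
    b"author GD <GD@LOCAL.HOME> 1621179226 +0100\n"
    b"committer GD <GD@LOCAL.HOME> 1621179226 +0100\n"
    b"\n"
    b"Updated summaries. Added article\n"
)
commit = parse_commit(raw)
print(commit["tree"])                        # 30b4d42bbe1a39dcc314f7c280b1437a1925585e
print(parse_signature(commit["author"])[1])  # 2021-05-16 16:33:46+01:00
```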
WHAT’S IN A TREE OBJECT?
We can view the contents of a tree object in the same way, e.g. using the hash of the tree from the commit above:
❯ git cat-file -p 30b4d42bbe
100644 blob d298be107f27247a24d24f8f78c55d42359007be .gitignore
100644 blob e3720ce5ced245ef02620afca619727c001e85bf 404.html
100644 blob 82b909c8a3de119782d6b66288734f82a4a57d1b about.md
040000 tree 272bc4b082fa15dd84b08712206d2edfe2b41e9a archetypes
040000 tree e305983083fc1872542004d046abdf3a683407e1 config
040000 tree 955f968be02f980640e570874f4c155da51882d4 content
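The first three items in the output are references to blobs (files) in the root of the tree, e.g. the .gitignore file. The rest are references to child trees (subfolders), which can be explored further using the cat-file command. The first column is the mode: 100644 is a regular file and 040000 is a folder. Note that the filenames are stored here, in the tree, rather than in the blobs.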
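git cat-file -p formats trees for readability, but the stored tree is binary: each entry is the mode in ASCII, a space, the name, a NUL byte, and then the 20-byte hash in raw binary rather than hex. The stored mode for a folder is 40000; the leading zero is added for display. The Python sketch below decodes that layout. parse_tree is a hypothetical name, and the sample bytes are rebuilt from two entries of the listing above.

```python
def parse_tree(content: bytes) -> list[tuple[str, str, str, str]]:
    """Decode a raw tree into (mode, type, hash, name) tuples like cat-file -p shows."""
    entries = []
    pos = 0
    while pos < len(content):
        space = content.index(b" ", pos)      # the mode never contains a space
        nul = content.index(b"\x00", space)   # names never contain a NUL byte
        mode = content[pos:space].decode("ascii")
        name = content[space + 1:nul].decode("utf-8")
        oid = content[nul + 1:nul + 21].hex()  # 20 raw bytes -> 40 hex characters
        pos = nul + 21
        # 40000 is a folder, 160000 a submodule commit; other modes are blobs.
        obj_type = {"40000": "tree", "160000": "commit"}.get(mode, "blob")
        entries.append((mode.zfill(6), obj_type, oid, name))
    return entries


sample = (
    b"100644 .gitignore\x00" + bytes.fromhex("d298be107f27247a24d24f8f78c55d42359007be")
    + b"40000 archetypes\x00" + bytes.fromhex("272bc4b082fa15dd84b08712206d2edfe2b41e9a")
)
for entry in parse_tree(sample):
    print(*entry)
# 100644 blob d298be107f27247a24d24f8f78c55d42359007be .gitignore
# 040000 tree 272bc4b082fa15dd84b08712206d2edfe2b41e9a archetypes
```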
WHAT’S IN A BLOB?
❯ git cat-file -t d298be107 # get the object type
blob
❯
❯ git cat-file -p d298be107 # get the object contents
public/
Finally, the blob contains the actual content rather than a reference.
In this case, it is the .gitignore file, which contains a single line that excludes the public/ folder from the repo.
WHAT ABOUT BRANCHES?
Branches are very simple in GIT. They are just references to a commit.
Branches aren’t objects in the database. They are plain-text files that aren’t compressed, so we can look at their contents directly (without needing git cat-file).
Local branches are stored in the .git/refs/heads folder (GIT may also consolidate them into a single .git/packed-refs file, for example after git gc):
❯ cat .git/refs/heads/main # show the contents of the main file
0b4271c561e6c7ad5dcf788afdc29bebbf11e171
As expected, the main branch file contains nothing but the SHA1 hash of a commit.
If we explore the branch using git cat-file, it gives us information about the commit the branch is pointing to:
❯ git cat-file -t main # show the object type
commit
❯
❯ git cat-file -p main # show the object contents
tree 30b4d42bbe1a39dcc314f7c280b1437a1925585e
parent cc0e10d238e78a57115572360a93deba2554d185
author gbdixg <user@domain.HOME> 1621179226 +0100
committer gbdixg <user@domain.HOME> 1621179226 +0100
Updated summaries. Added article
This is identical to the contents of the commit we looked at earlier, because that was the latest commit on the main branch.
WHAT’S THE HEAD?
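HEAD is a special pointer in GIT. It refers to the commit that is currently checked out, which is usually the latest commit on the current branch, but not always.
Normally the HEAD file does not contain a hash. Instead it names the current branch, and GIT follows that branch to find the commit. (If you check out a commit directly, HEAD becomes “detached” and holds the commit hash itself.)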
❯ cat .git/HEAD # show contents of the HEAD file
ref: refs/heads/main
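The chain from HEAD to a commit can be followed with plain file reads. The Python sketch below assumes a .git folder laid out as described above: it follows "ref: " lines until it reaches a hash, and falls back to the packed-refs file when a branch has no loose file. It is a simplification of what git rev-parse HEAD does, and read_ref and resolve_ref are hypothetical names.

```python
from pathlib import Path


def read_ref(git_dir: Path, name: str) -> str:
    """Return the raw contents of a ref, checking loose files then packed-refs."""
    loose = git_dir / name
    if loose.is_file():
        return loose.read_text().strip()
    packed = git_dir / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            if line.startswith(("#", "^")):  # header comment or peeled-tag line
                continue
            oid, _, ref_name = line.partition(" ")
            if ref_name == name:
                return oid
    raise KeyError(f"unknown ref: {name}")


def resolve_ref(git_dir: Path, name: str = "HEAD") -> str:
    """Follow symbolic refs (HEAD -> refs/heads/main -> hash) to a commit hash."""
    for _ in range(10):  # guard against a loop of symbolic refs
        value = read_ref(git_dir, name)
        if not value.startswith("ref: "):
            return value  # a branch file, or a detached HEAD, holds the hash itself
        name = value[len("ref: "):]
    raise RuntimeError("too many levels of symbolic refs")


# Usage inside a repository: print(resolve_ref(Path(".git")))
```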
SUMMARY
The git object database is all about references.
- HEAD is a reference to the current branch (or, when detached, directly to a commit)
- A branch is a reference to a commit
- A commit is a reference to a tree
- A tree is a reference to blobs and child trees
- A blob is the actual content
This article was originally posted on Write-Verbose.com