LEARNING TO CODE

Gitlet

If you studied computer science at UC Berkeley, you already know what this is. I didn't study computer science at UC Berkeley, but I thought this project would be a good exercise in teaching myself to code.

Gitlet is a version control system that mimics Git, recreated in Java. Much like Git, Gitlet uses a Merkle tree-like data structure to handle commits and branches. Gitlet also uses Java serialization to read, write, and persist files.
Skills
Software Design
Unit Testing
Data Structures
Timeline
1 month
(Sep 2022 - Oct 2022)
Code / Concepts
Java, Mac CLI, data structures, hashing, content addressing, file persistence
PROJECT BACKGROUND

What is Gitlet?

Gitlet is part of UC Berkeley's CS 61B: Data Structures, and is typically the first assigned project without skeleton code. This was an exercise that exposed me to software design principles including decomposition, abstraction, and code flexibility. Throughout development, I was able to implement concepts like file persistence, graph traversals, and cryptographic hashing so Gitlet could mimic commands of the real-world Git.
A high-level representation of Gitlet which later helped me conceptualize Gitlet's file structure and software design.
LEARNING & DEVELOPMENT

Software design and challenges

I started by building my own project skeleton, diagramming how my implementation of Gitlet would fit together. To understand how Git stores things like head commits, branches, and logs, I explored .git in my own repositories and came up with a modified folder structure:
All repository information, including the current head, staging area, blobs, branches, commits, and remote repositories, is saved in .gitlet. Commits and blobs are each assigned a SHA-1 hash that serves as a unique ID, then serialized and stored in an objects folder. To expedite file traversal, objects are packaged in folders named by the first two characters of their ID, similar to Git. Once I tested and deployed these capabilities, I implemented each Gitlet command one by one:
init
Creates a Gitlet version control system in the current directory, initializing the master branch and initial commit
add
Stages a file for addition
commit
Saves a snapshot of the staging area and tracked files so they can be restored at a later time
rm
Stages a file for removal and removes it from the working directory
log
Displays information about each commit backwards, starting from the current head commit
global-log
Displays information about all commits ever made
find
Prints the ids of all commits that have a given commit message
status
Displays the status of branches & staged, modified, and untracked files in the working directory
checkout
Copies and overwrites files in the head, given commit, or given branch to the working directory
branch
Creates a new branch with the given name, and points it at the current head commit
rm-branch
Deletes the branch with the given name
reset
Checks out the files tracked by the given commit, removes untracked files, and moves the current head to the given commit
merge
Merges files from the given branch into the current branch, similar to git merge
add-remote
Saves a remote repository under the given name
rm-remote
Removes information associated with the given remote name
fetch
Copies commits and blobs from the given remote branch into the local repository, saving fetched commits under a new branch
push
Appends commits from the current branch to the end of the given remote branch
pull
Fetches the remote branch and merges it into the current branch
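The object storage that these commands build on (serialize, hash with SHA-1, file under a folder named by the first two characters of the ID) can be sketched roughly as follows. This is a minimal illustration and not Gitlet's actual code: the class name ObjectStoreSketch and the methods serialize, sha1, pathFor, save, and load are all hypothetical, and the sketch assumes the ID is computed over the serialized bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of a content-addressed object store; not Gitlet's actual code. */
final class ObjectStoreSketch {

    /** Serializes an object to bytes with standard Java serialization. */
    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    /** Returns the SHA-1 digest of the bytes as 40 lowercase hex characters. */
    static String sha1(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }

    /** Maps an ID to objectsDir/<first two chars>/<remaining chars>. */
    static Path pathFor(Path objectsDir, String id) {
        return objectsDir.resolve(id.substring(0, 2)).resolve(id.substring(2));
    }

    /** Serializes obj, writes it under its own hash, and returns that ID. */
    static String save(Path objectsDir, Serializable obj) throws IOException {
        byte[] data = serialize(obj);
        String id = sha1(data);
        Path target = pathFor(objectsDir, id);
        Files.createDirectories(target.getParent());
        Files.write(target, data);
        return id;
    }

    /** Reads back the object stored under the given ID. */
    static Object load(Path objectsDir, String id)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                new ObjectInputStream(Files.newInputStream(pathFor(objectsDir, id)))) {
            return in.readObject();
        }
    }
}
```

Because the ID is derived from the content, saving identical content twice lands on the same path, so unchanged files do not need to be stored again from one commit to the next.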
CHALLENGES I FACED
Gitlet was the most challenging software project I had attempted at the time. Because I was working outside of a classroom, I also had no help from peers or TAs, which meant that rigorous unit testing was crucial to the project's success.

Project design: Organizing Gitlet's software design and finding the most efficient way to implement the commands was not straightforward for me. I found that methodically cycling between learning, planning, and coding helped me locate design flaws, plan ahead, and continuously iterate until my implementation aligned with my intended design.

Checkout, merge, and fetch: These commands were particularly difficult to implement because they alter the working directory differently depending on the commit history. For example, the merge command merges files from a given branch into the current branch, which requires identifying the split point of the two branches. To make these commands work, I treated commit histories as directed acyclic graphs so I could traverse them in reverse level order. My implementation involved using hash maps, linked lists, and unordered sets to retrieve and store information during traversals.
An example of latest-common-ancestor (LCA) dependency in the merge command. Gitlet finds the LCA of two commits by running DFS to find all ancestors of commit A and matching those commits against each ancestor of commit B. Every shared commit is considered a split point, and the LCA is identified as the split point of the least depth. Gitlet avoids calculating node depths when executing a merge by memoizing depth each time a new commit is saved.
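A rough sketch of that split-point search, under stated assumptions, might look like the following. The names SplitPointSketch, CommitNode, save, ancestors, and splitPoint are hypothetical and this is not Gitlet's actual code. One assumption matters in particular: depth here is counted upward from the initial commit (depth 0) and memoized at save time, so the latest common ancestor is the shared commit with the greatest depth. If depth is instead measured from the branch head, the comparison flips to the least depth.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of split-point search on a commit DAG; not Gitlet's actual code. */
final class SplitPointSketch {

    /** Minimal stand-in for a commit: parent IDs plus a depth memoized at save time. */
    static final class CommitNode {
        final List<String> parents;
        final int depth;

        CommitNode(List<String> parents, int depth) {
            this.parents = parents;
            this.depth = depth;
        }
    }

    private final Map<String, CommitNode> commits = new HashMap<>();

    /** Saves a commit; depth is 1 + the deepest parent (0 for the initial commit). */
    void save(String id, String... parentIds) {
        int depth = 0;
        for (String p : parentIds) {
            depth = Math.max(depth, commits.get(p).depth + 1);
        }
        commits.put(id, new CommitNode(Arrays.asList(parentIds), depth));
    }

    /** Iterative DFS: the start commit plus everything reachable through parent links. */
    Set<String> ancestors(String start) {
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String id = stack.pop();
            if (seen.add(id)) {
                for (String p : commits.get(id).parents) {
                    stack.push(p);
                }
            }
        }
        return seen;
    }

    /**
     * Returns the shared ancestor with the greatest memoized depth, or null if
     * the two commits share no history. Ties (possible after criss-cross
     * merges) are broken arbitrarily in this sketch.
     */
    String splitPoint(String a, String b) {
        Set<String> ofA = ancestors(a);
        String best = null;
        for (String id : ancestors(b)) {
            if (ofA.contains(id)
                    && (best == null || commits.get(id).depth > commits.get(best).depth)) {
                best = id;
            }
        }
        return best;
    }
}
```

Since depth is stored when each commit is saved, the merge itself only compares integers rather than recomputing path lengths through the graph.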
Testing and debugging: This project was also challenging because each command relies upon the proper functioning of previous commands (e.g., fetching a remote branch is impossible if the remote repository is improperly configured). I unit tested frequently to ensure that early bugs did not permeate future code, and found myself spending as much time writing tests as I did developing the project itself. My poorly written tests often led me astray; my well-written ones were vital for clarifying key invariants and managing project complexity.
PROJECT TAKEAWAYS

Key learnings from this project

This was my first introduction to software design. Instead of coding to meet a list of predefined requirements, I needed to preemptively think about designing in a way that is structured, well-organized, fast, and flexible in order to develop features that build on top of each other. I spent (and wasted) a lot of time working and reworking Gitlet's design, which ultimately helped me build a strong mental framework for setting up larger, more complex projects in the future.