DGit: GitHub’s distributed, sw-defined storage


GitHub has announced DGit, a new redundancy-based technology built on core Git version control techniques that gives repositories more reliable, highly available, and performance-oriented storage. While Git is distributed by design (any copy of a repository contains the entire history), it doesn’t support mirroring by itself. DGit stores the data of each repo on three different servers.

DGit uses Git protocols internally to keep the mirrors in sync. Replication is handled at the application level instead of at the storage layer, which provides more flexibility and more options for maintenance.
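To make "replication at the application level" a little more concrete, here is a minimal sketch in Python. It is not GitHub's implementation: the Replica class, the replicate_update function and the server names are hypothetical, and a real system would talk to the replicas over Git protocols rather than update an in-memory dictionary. The sketch only shows the general idea that the application itself applies the same update to every copy and knows which copies accepted it.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for one file server holding a copy of a repo."""
    name: str
    online: bool = True
    refs: dict = field(default_factory=dict)  # ref name -> commit id

    def update_ref(self, ref, commit_id):
        # Assumption: an unreachable server surfaces as a ConnectionError.
        if not self.online:
            raise ConnectionError(f"{self.name} is offline")
        self.refs[ref] = commit_id


def replicate_update(replicas, ref, commit_id):
    """Apply the same ref update to every replica.

    Returns the names of the replicas that accepted the update, so the
    caller can see which copies are now in sync.
    """
    accepted = []
    for replica in replicas:
        try:
            replica.update_ref(ref, commit_id)
            accepted.append(replica.name)
        except ConnectionError:
            continue  # this replica is skipped and would need to catch up later
    return accepted
```

Because the application sees every replica individually, it can decide what to do about a lagging copy, which a block-level mirror underneath the file system could not do.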

“If a file server needs to be taken offline, DGit automatically determines which repos are left with fewer than three replicas and creates new replicas of those repos on other file servers. This ‘healing’ process uses all remaining servers as both sources and destinations,” the announcement says. The file servers store all data on local SSDs, which gives lower latency and higher throughput. Git itself includes techniques that perform better on faster disks.
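The healing process described in the quote can be sketched roughly as follows. This is an illustration under stated assumptions, not DGit's actual algorithm: the heal function, the placement dictionary and the "pick the least-loaded server" rule are all hypothetical. What the sketch takes from the announcement is the target of three replicas and the fact that every remaining server can act as a source or a destination.

```python
REPLICA_TARGET = 3  # from the announcement: each repo lives on three servers


def heal(placement, failed_server, servers):
    """Plan new replicas after one server goes offline.

    placement: dict mapping repo name -> set of server names holding it.
    Returns (new_placement, plan), where plan is a list of
    (repo, source_server, destination_server) copy operations.
    """
    remaining = [s for s in servers if s != failed_server]
    # Drop the failed server from every repo without mutating the input.
    new_placement = {
        repo: {s for s in hosts if s != failed_server}
        for repo, hosts in placement.items()
    }
    # Assumption: "load" is simply the number of replicas a server holds.
    load = {s: 0 for s in remaining}
    for hosts in new_placement.values():
        for s in hosts:
            load[s] += 1

    plan = []
    for repo in sorted(new_placement):
        hosts = new_placement[repo]
        while len(hosts) < REPLICA_TARGET:
            candidates = [s for s in remaining if s not in hosts]
            if not candidates or not hosts:
                break  # nowhere to copy to, or no surviving copy to read from
            destination = min(candidates, key=lambda s: (load[s], s))
            source = min(hosts, key=lambda s: (load[s], s))
            hosts.add(destination)
            load[destination] += 1
            plan.append((repo, source, destination))
    return new_placement, plan
```

Since sources and destinations are chosen per repo, the copy work is spread over the whole cluster instead of landing on a single spare machine.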

Features and benefits

  • Topology of file servers becomes more flexible
  • When a server fails, DGit quickly makes new copies of the repos that it hosted and automatically distributes them throughout the cluster
  • Routing around failure gets much less disruptive – simply stop routing traffic to the affected server
  • No need for hot spares: every CPU and all memory is available for handling user traffic. Read and write operations can be served from different servers.
  • Load-balancing between servers: one big repository can’t affect the performance of other repos on the same server
  • Decoupling of replicas: replicas of a repository can live in different availability zones, or even in different data centers
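
Two of the points above, routing around failure and spreading traffic across replicas, can be illustrated with a small Python sketch. The Router class, its method names and the round-robin policy are assumptions made for this example; the announcement does not describe how GitHub's routing layer picks a replica.

```python
class Router:
    """Hypothetical read router: rotates over a repo's healthy replicas."""

    def __init__(self, placement):
        # placement: dict mapping repo name -> list of server names
        self.placement = placement
        self.offline = set()
        self._counters = {repo: 0 for repo in placement}

    def mark_offline(self, server):
        # Routing around a failure: just stop sending traffic to the server.
        self.offline.add(server)

    def mark_online(self, server):
        self.offline.discard(server)

    def route_read(self, repo):
        healthy = [s for s in self.placement[repo] if s not in self.offline]
        if not healthy:
            raise RuntimeError(f"no healthy replica for {repo}")
        # Assumption: simple round-robin; a real router might weigh load.
        index = self._counters[repo] % len(healthy)
        self._counters[repo] += 1
        return healthy[index]


router = Router({"rails": ["fs1", "fs2", "fs3"]})
first_three = [router.route_read("rails") for _ in range(3)]
router.mark_offline("fs2")
after_failure = [router.route_read("rails") for _ in range(4)]
```

In this sketch, taking a server out of rotation is a single set update, and the remaining replicas keep serving reads without any failover step.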

Rollout status

After migrating the DGit developers’ own repos and GitHub-owned private repos, followed by three months of stability testing, GitHub moved its GitHub-owned public repos. Next in line were large public repos such as Ruby, Rails, Bootstrap, and D3. At the time of writing, 58% of repositories and 96% of Gists, representing 67% of Git operations, have been completely migrated to DGit.

Source: Introducing DGit
