The world is powered by open-source software. It is a hidden cornerstone of modern civilization and the shared heritage of all humanity. The mission of the GitHub Archive Program is to preserve open-source software for future generations.
GitHub is partnering with the Long Now Foundation, the Internet Archive, the Software Heritage Foundation, Arctic World Archive (AWA), Microsoft Research, the Bodleian Library, and Stanford Libraries to ensure the long-term preservation of the world’s open-source software. We will protect this priceless knowledge by storing multiple copies, on an ongoing basis, across various data formats and locations, including a very-long-term archive designed to last at least 1,000 years.
Future historians will be able to learn about us from open-source projects and metadata. They might regard our age of open-source ubiquity, volunteer communities, and Moore’s Law as historically significant. We are already partnering with Stanford Libraries to help archive curated repositories, along with the cultural and other context in which they are set, as key elements of wide-ranging historical and social research and analysis.
How the cold storage will last 1,000 years
Svalbard is regulated by the international Svalbard Treaty as a demilitarized zone. Home to the world’s northernmost town, it is one of the most remote and geopolitically stable human habitations on Earth.
The AWA is a joint initiative between Norwegian state-owned mining company Store Norske Spitsbergen Kulkompani (SNSK) and very-long-term digital preservation provider Piql AS. AWA is devoted to archival storage in perpetuity. The film reels will be stored in a steel-walled container inside a sealed chamber within a decommissioned coal mine on the remote archipelago of Svalbard. The AWA already preserves historical and cultural data from Italy, Brazil, Norway, the Vatican, and many others.
Svalbard is affected by climate change, but warming is likely to reach only the outermost few meters of permafrost in the foreseeable future and is not expected to threaten the stability of the mine. The mine’s proximity to the famous Global Seed Vault, only a mile away, reinforces Svalbard’s status as a stable, very-long-term archive site for humanity’s collective knowledge.
What’s in the 02/02/2020 snapshot
The 02/02/2020 snapshot archived in the GitHub Arctic Code Vault will sweep up every active public GitHub repository, in addition to significant dormant repos. The snapshot will include every repo with any commits between the announcement at GitHub Universe on November 13th and 02/02/2020, every repo with at least one star and any commits in the year before the snapshot (02/03/2019 – 02/02/2020), and every repo with at least 250 stars. The snapshot will consist of the HEAD of the default branch of each repository, minus any binaries larger than 100 KB. Depending on available space, repos with more stars may retain their binaries. Each repository will be packaged as a single TAR file. For greater data density and integrity, most of the data will be stored compressed and QR-encoded. A human-readable index and guide will itemize the location of each repository and explain how to recover the data.
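The three inclusion rules above can be read as a single yes/no test per repository. The following Python sketch is only an illustration of that reading, not the selection code GitHub actually ran. The `Repo` class, its fields, and the function names are hypothetical. It also assumes that the announcement year is 2019, which the text does not state explicitly, and that both ends of each date range are inclusive.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

# Dates taken from the text; the year of the announcement is an assumption.
ANNOUNCEMENT = date(2019, 11, 13)
YEAR_START = date(2019, 2, 3)
SNAPSHOT = date(2020, 2, 2)
STAR_THRESHOLD = 250


@dataclass
class Repo:
    """Hypothetical minimal record of a public repository."""
    name: str
    stars: int
    commit_dates: List[date] = field(default_factory=list)


def has_commit_between(repo: Repo, start: date, end: date) -> bool:
    """True if any commit falls in [start, end] (inclusive bounds assumed)."""
    return any(start <= d <= end for d in repo.commit_dates)


def in_snapshot(repo: Repo) -> bool:
    """Apply the three inclusion rules; a repo qualifies if any one holds."""
    # Rule 1: any commit between the announcement and the snapshot date.
    if has_commit_between(repo, ANNOUNCEMENT, SNAPSHOT):
        return True
    # Rule 2: at least one star and any commit in the year before the snapshot.
    if repo.stars >= 1 and has_commit_between(repo, YEAR_START, SNAPSHOT):
        return True
    # Rule 3: at least 250 stars, regardless of recent activity.
    return repo.stars >= STAR_THRESHOLD
```

Under these assumptions, a repo with no stars whose only commit is from mid-2019 would be left out, while the same repo with a single star would be included.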