Libraries, GitHub, and Open Source Software

The Knight Foundation

Last week, the Knight Foundation held a competition to find and fund new proposals responding to the prompt: “How might we leverage libraries as a platform to build more knowledgeable communities?” The Knight News Challenge is now closed for entries, and its site has made the submissions available for the public’s perusal. Browsing through them, I noticed a common goal: furthering the use of open-source programming by LIS professionals. In her eloquent proposal, Simmons GSLIS alum Andromeda Yelton explained why this is important: “When librarians have programming skills, they can build better services for patrons, save time, and customize their software tools for their local mission.”

One of the most helpful tools librarians are using for programming projects is GitHub, a development platform widely used for open-source software. GitHub is a cloud-based service, built around the Git version control system, that coordinates the efforts of multiple programmers working together on the same software project. Much like Dropbox, it connects local files on your computer with copies of those same files stored in the cloud.

GitHub is capable of much more than storage, though. Its main draw is source code management. GitHub can track many aspects of a project’s code, both in the relationships between files and within the files themselves. It works with all major text-based file types, including HTML, LaTeX, XML, SQL, config files, and more.

It also provides version control, which lets the user revert parts of a project, or the entire project, to any previous version in the project’s history. This means that if a glitch is introduced in February but you don’t notice it until June, you can roll the project back to February’s version in order to fix it.

GitHub keeps track of which files have been edited by which users and when, and it merges these edits into one common project folder. This functionality also allows for file conflict resolution: if two programmers edit the same file at the same time, GitHub will point out which lines in the code were changed, and which lines were edited by both users. With these fail-safes in place, multiple users can work in tandem without worrying about inconsistencies.

The same machinery allows for “branching” the project, so that two versions can be worked on independently from the same root. This comes in handy when developers keep one branch of the project as a “stable” version while working on an “unstable” branch, which gets more frequent updates but is also more prone to unexpected glitches.
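To make the file conflict resolution described above more concrete, here is a minimal sketch in Python. It is not how Git or GitHub actually implement merging; it only illustrates the idea of comparing two users’ edits against a shared starting version and flagging the lines that both of them touched. The function names (`changed_lines`, `compare_edits`) and the sample catalog record are hypothetical, invented for this illustration.

```python
import difflib


def changed_lines(base, edited):
    """Return the 0-based indices of lines in `base` that `edited` replaced or deleted."""
    matcher = difflib.SequenceMatcher(a=base, b=edited, autojunk=False)
    changed = set()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "replace" and "delete" both touch existing base lines. Pure
        # insertions are ignored here to keep the sketch short (an assumption
        # of this example, not a property of real merge tools).
        if tag in ("replace", "delete"):
            changed.update(range(i1, i2))
    return changed


def compare_edits(base, ours, theirs):
    """Sort the changed base lines into: ours only, theirs only, and both."""
    ours_changed = changed_lines(base, ours)
    theirs_changed = changed_lines(base, theirs)
    return {
        "only_ours": sorted(ours_changed - theirs_changed),
        "only_theirs": sorted(theirs_changed - ours_changed),
        "both": sorted(ours_changed & theirs_changed),  # potential conflicts
    }


# Hypothetical example: two people edit the same small metadata record.
base = ["title: Hamlet", "author: Shakespear", "year: 1603", "language: en"]
alice = ["title: Hamlet", "author: Shakespeare, William", "year: 1603", "language: en"]
bob = ["title: Hamlet", "author: William Shakespeare", "year: 1603", "language: eng"]

report = compare_edits(base, alice, bob)
print(report)
```

In this example, both people rewrote the author line, so it is the one a human would need to resolve by hand, while the change to the language line belongs to one person only and could be accepted automatically.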

In his article “The Librarian’s Arsenal: Git & GitHub”, Topher Lawton points out a use case of GitHub that has proven very helpful to LIS professionals: “GitHub expands the branching abilities of Git into ‘forking,’ which allows users to clone code into their own repository . . . Forking code makes it possible for librarians to tailor other projects to the specifications we need. It’s a shared, open-source way of co-creating content that librarians should take advantage of.” He then mentions a salient example of this in the field: the LibraryBox.

The Library Box, in the wild.

The LibraryBox project was originally forked from another GitHub project, David Darts’ “PirateBox”. The PirateBox is a portable hardware device that allows digital files to be shared anonymously and locally, independent of the internet. It’s a small box that any device in range can connect to wirelessly in order to download files from it or upload files to it. During the PirateBox’s development, librarian Jason Griffey forked the project and worked from the PirateBox’s source code to create a new tool for libraries. The LibraryBox does the same things as the PirateBox, but it restricts control of the device to one user, so that it’s more useful for distribution than for back-and-forth filesharing. This way, an institution running a LibraryBox can use it to distribute digital items such as ebooks to patrons in areas that lack consistent internet access, without worrying about bad actors uploading copyrighted works or objectionable material.

Another GitHub-based LIS project, “Project GITenberg”, is currently part of the Knight News Challenge’s submissions roster. Its contributors are using GitHub to crowdsource metadata for Project Gutenberg’s 45,000 public domain ebooks. In this case, the work that needs to be done isn’t especially difficult, but there’s a whole lot of it and it’s hard to coordinate. GitHub’s ability to organize many contributors’ efforts under one project makes this challenge much less daunting. If their ambitions are met, it will be a great deal easier for libraries to offer these ebooks to their patrons.

As the ability to program becomes more and more essential, it’s little wonder that a good number of libraries are currently making use of GitHub for many kinds of projects. Code4Lib has collected a handy list of them here. If librarians can adopt open source programming on a large scale, then one can only imagine the breadth of innovations ahead of us.

(Post by Derek Murphy)
