
Version Control

Version control is an important part of software development: it manages changes to documents, websites, code, and other digital information. It can save you when a code change breaks something, because you can return to an earlier version. Hosting your code repositories on the web lets you share code, work on it with others, and keep a copy of your work in the event of a hardware failure.

The most common version control tools in data science are git, svn, cvs, and bzr.

Given the limited amount of time we have this week, we are only going to cover Git and one web-based hosting service (GitHub) in this camp.

Decision Making

  • Determine what your community is using, and learn that first.
  • The most popular tools for data science applications are git and svn. Both can be used from Integrated Development Environments (IDEs) like RStudio and JupyterLab.

Git ready

  1. Install Git
  2. Create a GitHub Account
    • Sign Up
    • If you are an academic and have a .edu email address, you can get Educational Account benefits
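
Once Git is installed, a quick check from a terminal can confirm that it is available. This is only a minimal sketch; the path and version string it prints will differ from system to system.

```shell
# Show where the git executable lives, then print its version.
command -v git
git --version
```

If either command reports that git cannot be found, the installation step probably needs to be repeated or the terminal restarted.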

Command line versus User Interfaces

Some users are more comfortable in a command line environment, and can use git without ever looking at a browser tab or file explorer.

Other users are more comfortable in a graphical user interface, like a web browser or standalone program.


Git cheat sheet

Here is a list of the most important commands in Git:

| Git Task | Command | Description |
|---|---|---|
| Set up your profile locally | git config --global user.name "Cy Unicorn" | Set your user name |
| | git config --global user.email [email address] | Set your email address |
| Create a Repository locally | git init | Initialize a folder as a git repository |
| Get an existing repository from a web service | git clone ssh://[username]/[repository-name].git | Create a local copy of a remote repository |

| Branching & Merging | Command | Description |
|---|---|---|
| | git branch | List branches (the asterisk denotes the current branch) |
| | git branch -a | List all branches (local and remote) |
| | git branch [branch name] | Create a new branch |
| | git branch -d [branch name] | Delete a branch |
| | git push origin --delete [branch name] | Delete a remote branch |
| | git checkout -b [branch name] | Create a new branch and switch to it |
| | git checkout -b [branch name] origin/[branch name] | Clone a remote branch and switch to it |
| | git checkout [branch name] | Switch to a branch |
| | git checkout - | Switch to the branch last checked out |
| | git checkout -- [file-name.txt] | Discard changes to a file |
| | git merge [branch name] | Merge a branch into the active branch |
| | git merge [source branch] [target branch] | Merge both named branches into the active branch (to merge into a specific target, check that branch out first) |
| | git stash | Stash changes in a dirty working directory |
| | git stash clear | Remove all stashed entries |

| Sharing & Updating Projects | Command | Description |
|---|---|---|
| | git push origin [branch name] | Push a branch to your remote repository |
| | git push -u origin [branch name] | Push changes to remote repository (and remember the branch) |
| | git push | Push changes to remote repository (remembered branch) |
| | git push origin --delete [branch name] | Delete a remote branch |
| | git pull | Update local repository to the newest commit |
| | git pull origin [branch name] | Pull changes from remote repository |
| | git remote add origin ssh://[username]/[repository-name].git | Add a remote repository |
| | git remote set-url origin ssh://[username]/[repository-name].git | Set the URL of the origin remote to an SSH address |

| Inspection & Comparison | Command | Description |
|---|---|---|
| | git log | View the commit history |
| | git log --summary | View the commit history (detailed) |
| | git diff [source branch] [target branch] | Preview changes before merging |
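
To see how a few of these commands fit together, here is a minimal sketch that can be run in a terminal. It works in a temporary directory, and the file name `notes.txt` and the email address are placeholders chosen for illustration. It also sets the identity for this one repository only, rather than with `--global`, so that your own settings are left alone.

```shell
set -eu

# Work in a throwaway directory so no existing project is touched.
repo="$(mktemp -d)"
cd "$repo"

# Initialize the folder as a Git repository.
git init --quiet

# Set the identity for this repository only (the --global form
# applies it to every repository on the machine).
git config user.name "Cy Unicorn"
git config user.email "cy.unicorn@example.com"   # placeholder address

# Create a file, stage it, and record the first commit.
echo "Analysis notes" > notes.txt
git add notes.txt
git commit --quiet -m "Add analysis notes"

# Make an unwanted edit, then discard it.
echo "scratch work" >> notes.txt
git checkout -- notes.txt

# Make another edit and set it aside with stash.
echo "work in progress" >> notes.txt
git stash

# The working directory is clean again, so this prints nothing.
git status --short

# Inspect the history.
git --no-pager log --oneline
```

The stashed edit is still stored and could be brought back with `git stash pop`; `git stash clear` would remove it for good.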


GitHub for Education

GitHub is (as of spring 2019) the largest and most popular platform for working with git.

GitHub could become the central hub for the software that supports your science lab. Reproducible research requires you to host your analysis code, copies of the software (with versions), the operating system, and the language kernels used to complete the analysis, in addition to the actual data.

GitHub allows you to support your open science lab by creating ‘repositories’ where you can host each of these components of your data science workflows. It also allows you to make copies (clones) or new branches of a master repository to test out new analyses or code changes, and to merge these back in.
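
The branch-and-merge cycle described here can be tried locally before involving GitHub at all. The following is a sketch under a few assumptions: the branch name `new-analysis`, the file `analysis.txt`, and the email address are made up for illustration, and the default branch is looked up at run time because it may be called master or main depending on the Git setup.

```shell
set -eu

# Throwaway repository with one commit on the default branch.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Cy Unicorn"
git config user.email "cy.unicorn@example.com"   # placeholder address

echo "baseline analysis" > analysis.txt
git add analysis.txt
git commit --quiet -m "Add baseline analysis"

# The default branch may be called master or main, so look it up.
main_branch="$(git symbolic-ref --short HEAD)"

# Create a branch for the experiment and commit a change on it.
git checkout --quiet -b new-analysis
echo "experimental step" >> analysis.txt
git commit --quiet -am "Try an experimental step"

# Preview what the merge would bring in.
git --no-pager diff "$main_branch" new-analysis

# Switch back, merge the experiment, and delete the finished branch.
git checkout --quiet "$main_branch"
git merge --quiet new-analysis
git branch -d new-analysis
git --no-pager log --oneline
```

If the experiment had not worked out, the branch could simply be left unmerged and the default branch would be unchanged.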

  • Basic Workflows
  • Repositories
  • Branches
  • Collaboration

Other, more powerful uses of GitHub include integration with other web services, like container registries (DockerHub), websites (ReadTheDocs, GitHub Pages), and continuous integration (CircleCI, Jenkins, Travis).


In this workshop, we’re working with GitHub, but there are other services, like GitLab or Bitbucket, which might fit your needs better.

Your first Repository

  1. Log into GitHub
  2. Create a new Repository and initiate it with a file
  3. Edit the file by clicking on the pencil icon.
  4. Create a new file by clicking on the “Create New File” icon.
  5. Type in LICENSE - note that a “Choose a License Template” icon has been activated.
  6. Click on the License Template icon and choose your preferred license.
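
The same kind of change can also be made from the command line and pushed to a hosted repository. The sketch below runs entirely offline: a local bare repository stands in for the repository on GitHub, which is an assumption made so the example needs no account or network. The file contents, directory names, and email address are placeholders.

```shell
set -eu

work="$(mktemp -d)"

# A bare repository stands in for the hosted repository (assumption:
# on GitHub this would be the repository created in the steps above).
git init --quiet --bare "$work/remote.git"

# Clone it, as you would clone the hosted repository.
git clone --quiet "$work/remote.git" "$work/local"
cd "$work/local"
git config user.name "Cy Unicorn"
git config user.email "cy.unicorn@example.com"   # placeholder address

# Add a README and a LICENSE file, then commit them.
echo "My first repository" > README.md
echo "Placeholder license text" > LICENSE
git add README.md LICENSE
git commit --quiet -m "Add README and LICENSE"

# Push the current branch and remember it for later plain 'git push' calls.
branch="$(git symbolic-ref --short HEAD)"
git push --quiet -u origin "$branch"

# A second clone, standing in for a collaborator, receives the pushed files.
git clone --quiet "$work/remote.git" "$work/colleague"
ls "$work/colleague"
```

With a real GitHub repository, the path to `remote.git` would be replaced by the clone address shown on the repository page.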

Issue Tracking

Development teams use tracking software, like Jira or GitHub Issues, to track their development progress.

ZenHub and Jira use a Kanban-style board for organizing issues.

We’re going to use ZenHub because it is free and works off of GitHub Issues.


ZenHub is agile project management software that builds on GitHub Issues.

You log into ZenHub using your GitHub user name.

Self-Paced Lessons

The Carpentries host numerous tutorials on using git. You can take time on your own to explore these lessons, find a workshop, or request one be taught at your local institution.

Using GitHub on your own or for your classes
