Version control is a core component of software development: it manages changes to documents, websites, code, and other digital information. If a code change breaks something, version control lets you return to a working version. Hosting your code repositories on the web lets you share code, work on it together, and keep your work safe in the event of a hardware failure.
Choosing version control software for data science:
- Determine what your community is using, and learn that first.
- The most popular tools for data science applications are git and svn. Both can be used from integrated development environments (IDEs) such as RStudio and JupyterLab.
Command line versus user interfaces
Some users are more comfortable in a command-line environment and can use git without ever opening a browser tab or file explorer.
Other users are more comfortable in a graphical user interface, such as a web browser or a standalone program.
USE WHATEVER YOU LIKE BEST AND FIND MOST PRODUCTIVE!
Git cheat sheet
Here is a list of the most commonly used Git commands:
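Text in square brackets is a placeholder for your own value.

Set up your profile locally

| Command | Description |
|---|---|
| `git config --global user.name "[name]"` | Set your user name |
| `git config --global user.email "[email address]"` | Set your email address |

Create a repository locally

| Command | Description |
|---|---|
| `git init` | Initialize a folder as a Git repository |

Get an existing repository from a web service

| Command | Description |
|---|---|
| `git clone [url]` | Create a local copy of a remote repository |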
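To try the setup commands without touching your real projects, here is a minimal sketch in Python (handy inside a Jupyter notebook) that calls `git` through `subprocess`. It assumes `git` is installed and on your `PATH`; the helpers `run_git` and `init_and_clone`, the folder names, and the example identity are hypothetical. It runs `git config` without `--global`, so the name and email apply only to the throwaway repository and leave your own configuration untouched.

```python
import subprocess
import tempfile
from pathlib import Path


def run_git(repo: Path, *args: str) -> str:
    """Run `git <args>` inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout


def init_and_clone(workdir: Path) -> Path:
    """Create a repository with one commit, clone it, and return the clone's path."""
    project = workdir / "my-project"
    project.mkdir()
    run_git(project, "init")  # Initialize the folder as a Git repository

    # Local identity (no --global), so only this demo repository is affected
    run_git(project, "config", "user.name", "Example User")
    run_git(project, "config", "user.email", "user@example.com")

    (project / "README.md").write_text("# My project\n")
    run_git(project, "add", "README.md")
    run_git(project, "commit", "-m", "Add README")

    # `git clone` accepts a local path as well as a URL
    copy = workdir / "my-project-copy"
    run_git(workdir, "clone", str(project), str(copy))
    return copy


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        copy = init_and_clone(Path(tmp))
        print(run_git(copy, "log", "--oneline"))
```

With a real project, you would pass the repository's GitHub URL to `git clone` instead of a local folder.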
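Branching & Merging

| Command | Description |
|---|---|
| `git branch` | List branches (the asterisk denotes the current branch) |
| `git branch -a` | List all branches (local and remote) |
| `git branch [branch name]` | Create a new branch |
| `git branch -d [branch name]` | Delete a branch |
| `git push origin --delete [branch name]` | Delete a remote branch |
| `git checkout -b [branch name]` | Create a new branch and switch to it |
| `git checkout -b [branch name] origin/[branch name]` | Create a local branch from a remote branch and switch to it |
| `git checkout [branch name]` | Switch to a branch |
| `git checkout -` | Switch to the branch last checked out |
| `git checkout -- [file-name.txt]` | Discard changes to a file |
| `git merge [branch name]` | Merge a branch into the active branch |
| `git checkout [target branch]`, then `git merge [source branch]` | Merge a source branch into a target branch |
| `git stash` | Stash changes in a dirty working directory |
| `git stash clear` | Remove all stashed entries |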
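A sketch of one branch-and-merge cycle, under the same assumptions (`git` on your `PATH`; `branch_and_merge`, `analysis.py`, and the branch name `new-analysis` are hypothetical). Because `git checkout -` returns to whichever branch was active before, the sketch works whether your default branch is called `master` or `main`.

```python
import subprocess
import tempfile
from pathlib import Path


def run_git(repo: Path, *args: str) -> str:
    """Run `git <args>` inside `repo` and return its standard output."""
    return subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    ).stdout


def branch_and_merge(repo: Path) -> str:
    """Try a change on a new branch, merge it back, and return the merged file."""
    run_git(repo, "init")
    run_git(repo, "config", "user.name", "Example User")
    run_git(repo, "config", "user.email", "user@example.com")
    script = repo / "analysis.py"
    script.write_text("print('version 1')\n")
    run_git(repo, "add", "analysis.py")
    run_git(repo, "commit", "-m", "First version")

    run_git(repo, "checkout", "-b", "new-analysis")  # Create a new branch and switch to it
    script.write_text("print('version 2')\n")
    run_git(repo, "commit", "-am", "Try a new analysis")

    run_git(repo, "checkout", "-")  # Back to the branch last checked out
    run_git(repo, "merge", "new-analysis")  # Merge into the active branch
    run_git(repo, "branch", "-d", "new-analysis")  # Delete the merged branch
    return script.read_text()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(branch_and_merge(Path(tmp)))
```

Here the merge is a fast-forward, because the original branch gained no new commits while `new-analysis` was being edited.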
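Sharing & Updating Projects

| Command | Description |
|---|---|
| `git push origin [branch name]` | Push a branch to your remote repository |
| `git push -u origin [branch name]` | Push changes to remote repository (and remember the branch) |
| `git push` | Push changes to remote repository (remembered branch) |
| `git push origin --delete [branch name]` | Delete a remote branch |
| `git pull` | Update local repository to the newest commit |
| `git pull origin [branch name]` | Pull changes from remote repository |
| `git remote add origin [url]` | Add a remote repository |
| `git remote set-url origin git@github.com:[username]/[repository-name].git` | Point the repository's `origin` remote at an SSH URL |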
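Pushing and pulling normally involve a web service such as GitHub. The sketch below assumes a local bare repository (`remote.git`) is an acceptable stand-in for that remote, so it runs offline; the collaborators Alice and Bob, the file `notes.txt`, and the helpers `run_git`, `set_identity`, and `push_then_pull` are hypothetical. It still assumes `git` is on your `PATH`.

```python
import subprocess
import tempfile
from pathlib import Path


def run_git(repo: Path, *args: str) -> str:
    """Run `git <args>` inside `repo` and return its standard output."""
    return subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    ).stdout


def set_identity(repo: Path, name: str) -> None:
    """Give a demo repository its own (local) commit author."""
    run_git(repo, "config", "user.name", name)
    run_git(repo, "config", "user.email", f"{name.lower()}@example.com")


def push_then_pull(workdir: Path) -> str:
    """Alice pushes, Bob clones and pushes a change, then Alice pulls it."""
    remote = workdir / "remote.git"  # A bare repository stands in for GitHub
    remote.mkdir()
    run_git(remote, "init", "--bare")

    alice = workdir / "alice"
    alice.mkdir()
    run_git(alice, "init")
    set_identity(alice, "Alice")
    (alice / "notes.txt").write_text("first line\n")
    run_git(alice, "add", "notes.txt")
    run_git(alice, "commit", "-m", "Start notes")
    branch = run_git(alice, "rev-parse", "--abbrev-ref", "HEAD").strip()
    run_git(alice, "remote", "add", "origin", str(remote))  # Add a remote repository
    run_git(alice, "push", "-u", "origin", branch)  # Push and remember the branch

    bob = workdir / "bob"
    run_git(workdir, "clone", str(remote), str(bob))
    set_identity(bob, "Bob")
    with open(bob / "notes.txt", "a") as notes:
        notes.write("second line\n")
    run_git(bob, "commit", "-am", "Add a second line")
    run_git(bob, "push")  # Push to the remembered upstream branch

    run_git(alice, "pull")  # Update Alice's copy to the newest commit
    return (alice / "notes.txt").read_text()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(push_then_pull(Path(tmp)))
```

Because Alice used `-u` on her first push, her later `git pull` (and Bob's clone, which sets this up automatically) needs no remote or branch name.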
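Inspection & Comparison

| Command | Description |
|---|---|
| `git log --summary` | View changes (detailed) |
| `git diff [target branch] [source branch]` | Preview changes before merging a source branch into a target branch |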
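A sketch of the inspection commands, again assuming `git` on your `PATH`; `inspect_branches`, `results.csv`, and the branch name `more-samples` are hypothetical. `git diff [target branch] [source branch]` compares the two branch tips, so its `+` lines are what the source branch would add.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Tuple


def run_git(repo: Path, *args: str) -> str:
    """Run `git <args>` inside `repo` and return its standard output."""
    return subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    ).stdout


def inspect_branches(repo: Path) -> Tuple[str, str]:
    """Return the detailed log and the diff that a merge would bring in."""
    run_git(repo, "init")
    run_git(repo, "config", "user.name", "Example User")
    run_git(repo, "config", "user.email", "user@example.com")
    data = repo / "results.csv"
    data.write_text("sample,value\nA,1\n")
    run_git(repo, "add", "results.csv")
    run_git(repo, "commit", "-m", "Add results")
    target = run_git(repo, "rev-parse", "--abbrev-ref", "HEAD").strip()

    run_git(repo, "checkout", "-b", "more-samples")  # The source branch
    data.write_text("sample,value\nA,1\nB,2\n")
    run_git(repo, "commit", "-am", "Add sample B")

    summary = run_git(repo, "log", "--summary")  # View changes (detailed)
    preview = run_git(repo, "diff", target, "more-samples")  # What a merge would add
    return summary, preview


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        summary, preview = inspect_branches(Path(tmp))
        print(summary)
        print(preview)
```

If the target branch has also gained commits since the branches diverged, `git diff [target branch]...[source branch]` (three dots) limits the output to changes made on the source branch.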
GitHub is (as of spring 2019) the largest and most popular platform for working with git.
GitHub can become the central software platform supporting your science lab. Reproducible research requires you to host not only the actual data but also your analysis code, copies of the software (with version numbers), the operating system, and the language kernels used to complete the analysis.
GitHub supports your open science lab through ‘repositories’, where you can host each of these components of your data science workflows. It also lets you make copies (clones) of a repository, or new branches from its main branch, to test new analyses or code changes, and then merge those changes back in.
More powerful uses of GitHub include integration with other web services, such as container registries (DockerHub), websites (ReadTheDocs, GitHub Pages https://pages.github.com/), and continuous integration (CircleCI, Jenkins, Travis https://travis-ci.org/).
Your first repository
- Log in to GitHub.
- Create a new repository and initialize it with a README.
- Edit the README.md by clicking on the pencil icon.
- Create a new file by clicking on the “Create New File” icon.
- Name the file LICENSE; note that a “Choose a License Template” icon has been activated.
- Click on the License Template icon and choose your preferred license.
ZenHub and Jira use a Kanban-style board for organizing issues.
We’re going to use ZenHub because it is free and works on top of GitHub Issues.
The Carpentries host numerous tutorials on using git. You can explore these lessons on your own, find a workshop, or request that one be taught at your local institution.
Using GitHub on your own or for your classes