Version Control
Version control is an important component of software development: it manages changes to documents, websites, code, or other digital information. Version control can save you when a code change breaks something, because you can return to an earlier working version. Web hosting of your code repositories lets you share and work on code together, and it keeps a copy of your work safe in the event of a hardware failure.
Version control systems used in data science include git, svn, cvs, and bzr.
Given the limited amount of time we have this week, we are only going to cover Git and one web-based hosting service (GitHub) in this camp.
Decision Making
- Determine what your community is using, and learn that first.
- The most popular version control systems for data science applications are git and svn. Both integrate with Integrated Development Environments (IDEs) like RStudio, and git also integrates with JupyterLab.
Git ready
Command line versus User Interfaces
Some users are more comfortable in a command-line environment and can use git without ever looking at a browser tab or file explorer.
Other users are more comfortable in a graphical user interface (GUI), like a web browser or a standalone program.
USE WHAT YOU LIKE BEST AND IS MOST PRODUCTIVE FOR YOU!
Git cheat sheet
Here is a list of the most important commands in Git:
| Git Task | Command | Description |
|---|---|---|
| Set up your profile locally | git config --global user.name "Cy Unicorn" | Set your user name |
| | git config --global user.email Cy1@cyverse.org | Set your email address |
| Create a repository locally | git init | Initialize a folder as a git repository |
| Get an existing repository from a web service | git clone ssh://git@github.com/[username]/[repository-name].git | Create a local copy of a remote repository |
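
You can try the setup rows above safely in a scratch folder. The sketch below is a minimal example that assumes bash and git are installed. The temporary directory and the folder name my-analysis are hypothetical. It uses repository-local settings (no --global) so that running it does not overwrite your real profile.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch space that is deleted when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical project folder.
repo="$workdir/my-analysis"
mkdir -p "$repo"
cd "$repo"

# Initialize the folder as a git repository (creates the hidden .git folder).
git init

# Set your profile for this repository only. Adding --global, as in the
# cheat sheet, would apply it to every repository on your machine instead.
git config user.name "Cy Unicorn"
git config user.email Cy1@cyverse.org

# Print the stored values to confirm them.
git config user.name
git config user.email
```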
| Branching & Merging | Command | Description |
|---|---|---|
| | git branch | List branches (the asterisk denotes the current branch) |
| | git branch -a | List all branches (local and remote) |
| | git branch [branch name] | Create a new branch |
| | git branch -d [branch name] | Delete a branch |
| | git push origin --delete [branch name] | Delete a remote branch |
| | git checkout -b [branch name] | Create a new branch and switch to it |
| | git checkout -b [branch name] origin/[branch name] | Create a local branch from a remote branch and switch to it |
| | git checkout [branch name] | Switch to a branch |
| | git checkout - | Switch to the branch last checked out |
| | git checkout -- [file-name.txt] | Discard unstaged changes to a file |
| | git merge [branch name] | Merge a branch into the active branch |
| | git checkout [target branch], then git merge [source branch] | Merge a source branch into a target branch |
| | git stash | Stash changes in a dirty working directory |
| | git stash clear | Remove all stashed entries |
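
A typical branch-and-merge cycle built from the commands above might look like the following sketch. It assumes bash and git, works in a throwaway temporary directory, and uses a hypothetical branch name, new-analysis. The default branch may be called master or main depending on your git version and settings, so the script returns to it with git checkout - rather than by name.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

git init --quiet
git config user.name "Cy Unicorn"
git config user.email Cy1@cyverse.org

# First commit on the default branch.
echo "raw data summary" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"

# Create a new branch, switch to it, and commit there.
git checkout -b new-analysis
echo "new analysis step" >> notes.txt
git commit --quiet -am "Try a new analysis step"

# Switch back to the branch last checked out (the default branch)
# and merge the new branch into it.
git checkout -
git merge new-analysis

# Discard an unstaged edit to a file.
echo "typo" >> notes.txt
git checkout -- notes.txt

# Park an unfinished edit in the stash, then remove all stashed entries.
echo "half-finished idea" >> notes.txt
git stash
git stash clear

# List branches; the asterisk marks the current branch.
git branch

# The branch is fully merged, so -d deletes it.
git branch -d new-analysis
```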
| Sharing & Updating Projects | Command | Description |
|---|---|---|
| | git push origin [branch name] | Push a branch to your remote repository |
| | git push -u origin [branch name] | Push changes to remote repository (and remember the branch) |
| | git push | Push changes to remote repository (remembered branch) |
| | git push origin --delete [branch name] | Delete a remote branch |
| | git pull | Update the current branch with the newest commits from its remembered remote branch |
| | git pull origin [branch name] | Pull changes from a specific branch of the remote repository |
| | git remote add origin ssh://git@github.com/[username]/[repository-name].git | Add a remote repository |
| | git remote set-url origin ssh://git@github.com/[username]/[repository-name].git | Change the URL of the origin remote (here, to SSH) |
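
The sketch below walks through pushing and pulling. To keep it runnable offline, a local bare repository (remote.git, hypothetical) stands in for GitHub. The two collaborators, Alice and Bob, are also hypothetical. With GitHub, you would pass the ssh://git@github.com/[username]/[repository-name].git URL to git remote add instead. The script assumes bash and git.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical stand-in for the hosted repository on GitHub.
git init --quiet --bare "$workdir/remote.git"

# Alice creates a repository, adds the remote, and pushes.
git init --quiet "$workdir/alice"
cd "$workdir/alice"
git config user.name "Alice"
git config user.email alice@example.org
echo "# My project" > README.md
git add README.md
git commit --quiet -m "Add README"
git remote add origin "$workdir/remote.git"
branch=$(git symbolic-ref --short HEAD)   # master or main, depending on your setup
git push --quiet -u origin "$branch"      # -u remembers the branch

# Bob makes a local copy of the remote repository.
git clone --quiet "$workdir/remote.git" "$workdir/bob"

# Alice changes the file; a plain git push uses the remembered branch.
echo "Some notes" >> README.md
git commit --quiet -am "Expand README"
git push --quiet

# Bob updates his copy with the newest commits.
cd "$workdir/bob"
git pull --quiet
git log --oneline
```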
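| Inspection & Comparison | Command | Description |
|---|---|---|
| | git log | View the commit history |
| | git log --summary | View the commit history, including created, deleted, and renamed files |
| | git diff [target branch]...[source branch] | Preview changes before merging (changes made on the source branch since it split from the target) |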
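
You can try the inspection commands on two small branches. This sketch assumes bash and git, and the branch and file names are hypothetical. Here, git log --summary lists the files each commit created. The three-dot form of git diff shows only the changes made on the feature branch since it split from the base branch, which is what a merge would bring in.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

git init --quiet
git config user.name "Cy Unicorn"
git config user.email Cy1@cyverse.org

printf 'step 1\n' > pipeline.txt
git add pipeline.txt
git commit --quiet -m "Add pipeline"
base=$(git symbolic-ref --short HEAD)   # master or main, depending on your setup

git checkout --quiet -b add-step
printf 'step 2\n' >> pipeline.txt
printf 'cutoff=0.05\n' > params.txt
git add pipeline.txt params.txt
git commit --quiet -m "Add step 2 and a parameters file"

# Commit history, with a summary of created, deleted, or renamed files.
git log --summary

# Changes on add-step since it branched from the base branch.
git diff "$base"...add-step
```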
GitHub
GitHub is (as of spring 2019) the largest and most popular platform for hosting and working with git repositories.
GitHub can become the central point of the software supporting your science lab. Reproducible research requires you to host your analysis code and copies of the software (with versions), operating system, and language kernels used to complete the analysis, in addition to the actual data.
GitHub allows you to support your open science lab by creating ‘repositories’ where you can host each of these components of your data science workflows. It also allows you to make copies (clones) or new branches of a master repository to test out new analyses or code changes, and to merge these back in.
- Basic Workflows
- Repositories
- Branches
- Collaboration
Other, more powerful uses of GitHub include integration with other web services, like container registries (DockerHub), websites (ReadTheDocs, GitHub Pages https://pages.github.com/), and continuous integration (CircleCI, Jenkins, Travis https://travis-ci.org/).
Note
In this workshop, we’re working with GitHub, but there are other services, like GitLab or Bitbucket, which might fit your needs better.
Your first Repository
- Log into GitHub.
- Create a new repository and initialize it with a README.md file.
- Edit the README.md by clicking on the pencil icon.
- Create a new file by clicking on the “Create New File” icon.
- Type in LICENSE; note that a “Choose a License Template” icon has been activated.
- Click on the License Template icon and choose your preferred license.
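
The same first-repository steps can also be done from the command line. This sketch assumes bash and git. Each change you save in the GitHub web editor is recorded as a commit, so each step becomes one git commit here. The repository name my-first-repo and the LICENSE placeholder text are hypothetical; in practice, paste the full text of your chosen license. The final push is commented out because it needs a GitHub account and an SSH key.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

git init --quiet
git config user.name "Cy Unicorn"
git config user.email Cy1@cyverse.org

# Initialize the repository with a README.md file (hypothetical name).
echo "# my-first-repo" > README.md
git add README.md
git commit --quiet -m "Initial commit"

# Edit the README.md (the pencil icon on GitHub).
echo "This repository holds my workshop notes." >> README.md
git commit --quiet -am "Update README.md"

# Create a LICENSE file. Hypothetical placeholder: use your license's full text.
echo "Placeholder: paste your chosen license text here" > LICENSE
git add LICENSE
git commit --quiet -m "Create LICENSE"

git log --oneline

# To publish on GitHub, you would then add the remote and push:
# git remote add origin ssh://git@github.com/[username]/[repository-name].git
# git push -u origin "$(git symbolic-ref --short HEAD)"
```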
Issue Tracking
Development teams use issue-tracking software, like Jira or GitHub Issues, to track their development progress.
ZenHub and Jira use a Kanban-style board for organizing issues.
We’re going to use ZenHub because it is free and works off of GitHub Issues.
Self-Paced Lessons
The Carpentries host numerous tutorials on using git. You can take time on your own to explore these lessons, find a workshop, or request that one be taught at your local institution.
Using GitHub on your own or for your classes