Command Line and the Unix Shell
- Download data-shell.zip and move the file to your Desktop.
- Unzip/extract the file. You should end up with a new folder called data-shell on your Desktop.
- Open a terminal and type cd, then press the Enter key.
That last step will make sure you start with your home folder as your working directory. In the lesson, you will find out how to access the data in this folder.
At a high level, computers do four things:
- run programs
- store data
- communicate with each other, and
- interact with us
- The graphical user interface (GUI) is the most widely used way to interact with personal computers.
- give instructions (to run a program, to copy a file, to create a new folder/directory) with a mouse
- intuitive and very easy to learn
- scales very poorly: a task repeated a thousand times needs a thousand rounds of clicks
- The shell is a command-line interface (CLI) that makes repetitive tasks automatic and fast.
- can take a single instruction and repeat it
Suppose we have to copy the third line of each of a thousand text files, stored in a thousand different folders/directories, and paste those lines into a single file, one per line.
- With the traditional GUI approach, this would take several hours.
- With the shell, it takes a couple of minutes at most.
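The thousand-file task can be sketched as a short shell loop. This is a minimal sketch under assumptions: the folder and file names (dir1, dir2, ..., notes.txt) are hypothetical, only three folders are created so it runs anywhere, and mktemp -d is assumed to be available to make a scratch directory. It borrows head, tail and the pipe (|), which are introduced at the end of this lesson.

```shell
# Work in a throwaway directory so nothing real is touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical layout: one folder per file, each holding notes.txt.
for i in 1 2 3; do
    mkdir "dir$i"
    printf 'line one\nline two\nthird line of file %s\nline four\n' "$i" > "dir$i/notes.txt"
done

# For every file, keep its first three lines, then the last of those
# (i.e. line 3), and collect them all, one per line, in third_lines.txt.
for f in dir*/notes.txt; do
    head -n 3 "$f" | tail -n 1
done > third_lines.txt

cat third_lines.txt
```

The same loop works unchanged for a thousand folders; only the pattern dir*/notes.txt has to match the real layout.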
- The heart of a command-line interface is a read-evaluate-print loop (REPL). When you type a command and press Return:
- the shell reads your command
- evaluates (or “executes”) it
- prints the output of your command
- loops back and waits for you to enter another command
- The shell is a program that runs other programs rather than doing calculations itself.
- programs can be as complicated as climate modeling software
- or as simple as a program that creates a new folder/directory
- simple programs used to perform stand-alone tasks are usually referred to as commands.
- the most popular Unix shell is Bash (the Bourne Again SHell).
- Bash is the default shell on most Linux systems (macOS switched its default to zsh in version 10.15, Catalina)
A typical shell window looks something like:
- first line shows only a prompt
- indicates the shell is waiting for input
- your shell may use different text for the prompt
- do not type the prompt, only the commands that follow it
- the second line
- command is ls, with an option -F and an argument /
- options change the behavior of a command
- each part is separated by spaces
- capitalization matters
- commands can have more than one option or argument
- commands don’t always require an option or argument
- lines 3-5 contain the output that the command produced
- this is a list of files and folders in the root directory (/)
Finally, the shell again prints the prompt and waits for you to type the next command.
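Because the output of ls -F / depends on your computer, the sketch below builds a small scratch directory (with mktemp -d, assumed to be available) and runs ls with no option, one option, two options, and an explicit argument. The names data, notes.txt and .hidden are made up for the example.

```shell
cd "$(mktemp -d)" || exit 1
mkdir data
touch notes.txt .hidden

ls          # no option, no argument: lists the current directory
ls -F       # option -F: directories get a trailing /
ls -a -F    # two options: -a also shows names that start with .
ls -aF .    # the same two options bundled, plus the argument . (this directory)
ls -F data  # argument data: list that directory instead (it is empty)
```

ls -aF and ls -a -F mean the same thing: many commands let you bundle single-letter options behind one dash.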
Open a shell window and try executing ls -F / for yourself (don’t forget that spaces and capitalization are important!).
$ ls -F /
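If the shell can’t find a program whose name matches the command you typed, it prints an error message such as:
$ ls-F
ls-F: command not found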
Usually this means that you have mistyped the command; in this case we omitted the space between ls and -F.
To recall a previous command, press the up arrow. Press it twice to show the command before that (and so on), then press Return to run it again.
Working with Files and Directories
Let’s go back to our data-shell directory on the Desktop and use ls -F to see what it contains:
$ pwd
/Users/nelle/Desktop/data-shell
$ ls -F
creatures/  data/  molecules/  north-pacific-gyre/  notes.txt  pizza.cfg  solar.pdf  writing/
Let’s create a new directory called thesis using the command mkdir thesis (which has no output):
$ mkdir thesis
As you might guess from its name, mkdir means “make directory”. Since thesis is a relative path (i.e., does not have a leading slash, like /what/ever/thesis), the new directory is created in the current working directory:
$ ls -F
creatures/  data/  molecules/  north-pacific-gyre/  notes.txt  pizza.cfg  solar.pdf  thesis/  writing/
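A minimal sketch of the difference between a relative and an absolute path, run in a scratch directory made with mktemp -d (assumed to be available). pwd prints the absolute path of the current directory, so "$(pwd)/notes" names the same place as the relative path notes would; the directory names are hypothetical.

```shell
cd "$(mktemp -d)" || exit 1
pwd                     # absolute path of the scratch directory (starts with /)

mkdir thesis            # relative path: created inside the current directory
mkdir "$(pwd)/notes"    # absolute path: spelled out from the root /
ls -F                   # lists both notes/ and thesis/
```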
Good Names for Files and Directories
Complicated names of files and directories can make your life painful when working on the command line. Here we provide a few useful tips for the names of your files.
Don’t use spaces.
Spaces can make a name more meaningful, but since spaces are used to separate arguments on the command line it is better to avoid them in names of files and directories. You can use - or _ instead (e.g. north-pacific-gyre/ rather than north pacific gyre/).
Don’t begin the name with - (dash).
Commands treat names starting with - as options.
Stick with letters, numbers, . (period or ‘full stop’), - (dash) and _ (underscore).
Many other characters have special meanings on the command line. We will learn about some of these during this lesson. Some of them can make a command behave unexpectedly and can even result in data loss.
If you need to refer to names of files or directories that contain spaces or other special characters, surround the name with straight quotes, e.g. "north pacific gyre/".
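A sketch of what the quotes change, in a scratch directory (mktemp -d assumed to be available). Without quotes the shell splits the name at each space and mkdir receives three separate arguments.

```shell
cd "$(mktemp -d)" || exit 1

mkdir north pacific gyre     # unquoted: three arguments, so three directories
mkdir "north pacific gyre"   # quoted: one argument, one directory with spaces
ls -F                        # lists gyre/, north/, north pacific gyre/ and pacific/

ls -F "north pacific gyre"   # quoting works for every command (this one is empty)
```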
Since we’ve just created the thesis directory, there’s nothing in it yet:
$ ls -F thesis
Create a text file
Let’s change our working directory to thesis using cd, then run a text editor called Nano to create a file called draft.txt:
$ cd thesis
$ nano draft.txt
When we say, “nano is a text editor,” we really do mean “text”: it can only work with plain character data, not tables, images, or any other human-friendly media. We use it in examples because it is one of the least complex text editors. On Unix systems (such as Linux and Mac OS X), many programmers use Emacs or Vim (both of which require more time to learn), or a graphical editor such as Gedit. On Windows, you may wish to use Notepad++. Windows also has a built-in editor called notepad that can be run from the command line in the same way as nano for the purposes of this lesson.
Let’s type in a few lines of text. Once we’re happy with our text, we can press Ctrl+O (press the Ctrl or Control key and, while holding it down, press the O key) to write our data to disk (we’ll be asked what file we want to save this to: press Return to accept the suggested default of draft.txt).
Once our file is saved, we can press Ctrl+X to quit the editor and return to the shell.
Control, Ctrl, or ^ Key
The Control key is also called the “Ctrl” key. There are various ways in which using the Control key may be described. For example, you may see an instruction to press the Control key and, while holding it down, press the X key, described as any of: Control-X, Control+X, Ctrl-X, Ctrl+X, or ^X.
In nano, along the bottom of the screen you’ll see ^G Get Help ^O WriteOut. This means that you can use Control-G to get help and Control-O to save your file.
nano doesn’t leave any output on the screen after it exits, but ls now shows that we have created a file called draft.txt:
$ ls
draft.txt
Creating Files a Different Way
We have seen how to create text files using the nano editor. Now, try the following command:
$ touch my_file.txt
What did the touch command do?
Use ls -l to inspect the files. How large is my_file.txt?
$ ls -l
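One way to check your answer, as a sketch in a scratch directory (mktemp -d assumed to be available): wc -c counts the bytes in a file, and touching a file that already exists leaves its contents alone (it only updates the file’s modification time). The name notes.txt is made up.

```shell
cd "$(mktemp -d)" || exit 1

touch my_file.txt         # the file did not exist, so touch creates it, empty
wc -c < my_file.txt       # number of bytes: 0
ls -l my_file.txt         # the size column also shows 0

echo "some text" > notes.txt   # > saves echo's output in notes.txt
touch notes.txt                # file exists: only its timestamp changes
cat notes.txt                  # still prints: some text
```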
You may have noticed that files are named “something dot something”, and in this part of the lesson, we always used the extension .txt. This is just a convention: we can call a file mythesis or almost anything else we want. However, most people use two-part names most of the time to help them (and their programs) tell different kinds of files apart. The second part of such a name is called the filename extension, and indicates what type of data the file holds.
Naming a PNG image of a whale as whale.mp3 doesn’t somehow magically turn it into a recording of whalesong, though it might cause the operating system to try to open it with a music player when someone double-clicks it.
Moving files and directories
Let’s return to the data-shell directory:
$ cd ~/Desktop/data-shell/
In our thesis directory we have a file draft.txt which isn’t a particularly informative name, so let’s change the file’s name using mv, which is short for “move”:
$ mv thesis/draft.txt thesis/quotes.txt
The first argument tells mv what we’re “moving”, while the second is where it’s to go. In this case, we’re moving thesis/draft.txt to thesis/quotes.txt, which has the same effect as renaming the file. Sure enough, ls shows us that thesis now contains one file called quotes.txt:
$ ls thesis
quotes.txt
One has to be careful when specifying the target file name, since mv will silently overwrite any existing file with the same name, which could lead to data loss. An additional option, mv -i (or mv --interactive), can be used to make mv ask you for confirmation before overwriting.
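The sketch below shows both the risk and the safeguard in a scratch directory (mktemp -d assumed to be available). To keep it non-interactive, echo n | feeds the answer "n" (no) to the question mv -i asks; typing n at the prompt does the same. The file names are made up.

```shell
cd "$(mktemp -d)" || exit 1
echo "first draft" > draft.txt
echo "important notes" > notes.txt

# -i: mv asks before replacing notes.txt; answering n keeps it.
echo n | mv -i draft.txt notes.txt
cat notes.txt      # still: important notes

# Without -i, mv replaces notes.txt silently.
mv draft.txt notes.txt
cat notes.txt      # now: first draft (the old contents are gone)
```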
mv also works on directories
Let’s move quotes.txt into the current working directory. We use mv once again, but this time we’ll just use the name of a directory as the second argument to tell mv that we want to keep the filename, but put the file somewhere new. (This is why the command is called “move”.) In this case, the directory name we use is the special directory name ., which always means the current working directory.
$ mv thesis/quotes.txt .
The effect is to move the file from the directory it was in to the current working directory. ls now shows us that thesis is empty:
$ ls thesis
Further, ls with a filename or directory name as an argument only lists that file or directory. We can use this to see that quotes.txt is still in our current directory:
$ ls quotes.txt
quotes.txt
Copying Files and Directories
The cp command works very much like mv, except it copies a file instead of moving it. We can check that it did the right thing using ls with two paths as arguments — like most Unix commands, ls can be given multiple paths at once:
$ cp quotes.txt thesis/quotations.txt
$ ls quotes.txt thesis/quotations.txt
quotes.txt  thesis/quotations.txt
We can also copy a directory and all its contents by using the recursive option -r, e.g. to back up a directory:
$ cp -r thesis thesis_backup
We can check the result by listing the contents of both the thesis and thesis_backup directories:
$ ls thesis thesis_backup
thesis:
quotations.txt

thesis_backup:
quotations.txt
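A sketch of why -r is needed and what a backup buys you, in a scratch directory (mktemp -d assumed to be available); the file contents are made up, and the exact wording of cp’s error message differs between systems.

```shell
cd "$(mktemp -d)" || exit 1
mkdir thesis
echo "first quote" > thesis/quotations.txt

# Without -r, cp refuses to copy a directory and prints an error.
cp thesis thesis_backup || echo "cp needs -r to copy a directory"

# With -r, the directory and everything inside it are copied.
cp -r thesis thesis_backup

# The copy is independent: changing the original leaves the backup alone.
echo "second quote" >> thesis/quotations.txt
cat thesis_backup/quotations.txt   # still only: first quote
```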
Removing files and directories
Returning to the data-shell directory, let’s tidy up this directory by removing the quotes.txt file we created. The Unix command we’ll use for this is rm (short for ‘remove’):
$ rm quotes.txt
We can confirm the file has gone using ls:
$ ls quotes.txt
ls: cannot access 'quotes.txt': No such file or directory
Using rm Safely
If we try to remove the thesis directory using rm thesis, we get an error message:
$ rm thesis
rm: cannot remove `thesis': Is a directory
This happens because rm by default only works on files, not directories.
rm can remove a directory and all its contents if we use the recursive option -r, and it will do so without any confirmation prompts:
$ rm -r thesis
Deleting Is Forever
The Unix shell doesn’t have a trash bin that we can recover deleted files from. Instead, when we delete files, they are unlinked from the file system so that their storage space on disk can be recycled. Given that there is no way to retrieve files deleted using the shell, rm -r should be used with great caution (you might consider adding the interactive option rm -r -i).
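A sketch of rm -i next to rm -r, in a scratch directory (mktemp -d assumed to be available). echo n | answers "no" to the question rm -i asks, so the example runs on its own; at a real prompt you would type y or n.

```shell
cd "$(mktemp -d)" || exit 1
mkdir thesis_backup
echo "first quote" > thesis_backup/quotations.txt

# -i: rm asks before deleting; answering n keeps the file.
echo n | rm -i thesis_backup/quotations.txt
ls thesis_backup           # quotations.txt is still there

# -r: removes the directory and everything in it, without asking.
rm -r thesis_backup
ls                         # nothing left
```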
Operations with multiple files and directories
Oftentimes one needs to copy or move several files at once. This can be done by providing a list of individual filenames, or specifying a naming pattern using wildcards.
Copy with Multiple Filenames
For this exercise, you can test the commands in the data-shell/data directory.
In the example below, what does cp do when given several filenames and a directory name?
$ mkdir backup
$ cp amino-acids.txt animals.txt backup/
If given more than one file name followed by a directory name (i.e. the destination directory must be the last argument), cp copies the files to the named directory.
Using wildcards for accessing multiple files at once
* is a wildcard, which matches zero or more characters. Let’s consider the data-shell/molecules directory: *.pdb matches ethane.pdb, propane.pdb, and every file that ends with ‘.pdb’. On the other hand, p*.pdb only matches pentane.pdb and propane.pdb, because the ‘p’ at the front only matches filenames that begin with the letter ‘p’.
? is also a wildcard, but it matches exactly one character. So ?ethane.pdb would match methane.pdb, whereas *ethane.pdb matches both ethane.pdb and methane.pdb.
Wildcards can be used in combination with each other, e.g. ???ane.pdb matches three characters followed by ane.pdb, giving cubane.pdb, ethane.pdb and octane.pdb.
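The shell expands a wildcard into the matching names before the command runs, so echo is a safe way to preview a pattern. This sketch assumes mktemp -d is available and creates empty files named after the molecules mentioned above.

```shell
cd "$(mktemp -d)" || exit 1
touch cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb

echo *.pdb          # all six names
echo p*.pdb         # pentane.pdb propane.pdb
echo ?ethane.pdb    # methane.pdb
echo *ethane.pdb    # ethane.pdb methane.pdb
echo ???ane.pdb     # cubane.pdb ethane.pdb octane.pdb
echo *.txt          # no match: Bash passes the pattern through, printing *.txt
```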
Other Useful Tools and Commands
head prints the first few (10 by default) lines of a file:
$ head data/sunspot.txt
(* Sunspot data collected by Robin McQuinn from *)
(* http://sidc.oma.be/html/sunspot.html *)
(* Month: 1749 01 *) 58
(* Month: 1749 02 *) 63
(* Month: 1749 03 *) 70
(* Month: 1749 04 *) 56
(* Month: 1749 05 *) 85
(* Month: 1749 06 *) 84
(* Month: 1749 07 *) 95
tail prints the last few (10 by default) lines of a file:
$ tail data/sunspot.txt
(* Month: 2004 05 *) 42
(* Month: 2004 06 *) 43
(* Month: 2004 07 *) 51
(* Month: 2004 08 *) 41
(* Month: 2004 09 *) 28
(* Month: 2004 10 *) 48
(* Month: 2004 11 *) 44
(* Month: 2004 12 *) 18
(* Month: 2005 01 *) 31
(* Month: 2005 02 *) 29
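Both commands take an -n option to choose how many lines to show, and piping head into tail (the pipe, |, is covered at the end of this section) picks out a single line. A sketch using a made-up numbers.txt in a scratch directory; seq (which prints a sequence of numbers) and mktemp -d are assumed to be available, as they are on Linux and macOS.

```shell
cd "$(mktemp -d)" || exit 1
seq 1 20 > numbers.txt              # the lines 1, 2, ..., 20

head -n 3 numbers.txt               # first three lines: 1, 2, 3
tail -n 2 numbers.txt               # last two lines: 19, 20
head -n 5 numbers.txt | tail -n 1   # line 5 only: 5
```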
history displays the last few hundred commands that have been executed:
$ history
1988  cd ..
1989  ls
1990  cd data-shell/
1991  ls
1992  mkdir thesis
1993  ls
1994  ls-F
1995  ls
1996  cd Desktop/data-shell/data/
1997  pwd
1998  cd ..
1999  pwd
2000  ls -F
2001  cd Desktop/data-shell/
2002  head data/sunspot.txt
2003  tail data/sunspot.txt
2004  history
grep finds and prints lines in files that match a pattern. To see what it picks out, we first use cat to print the whole of haiku.txt:
$ cd
$ cd Desktop/data-shell/writing
$ cat haiku.txt
The Tao that is seen
Is not the true Tao, until
You bring fresh toner.

With searching comes loss
and the presence of absence:
"My Thesis" not found.

Yesterday it worked
Today it is not working
Software is like that.
$ grep not haiku.txt
Is not the true Tao, until
"My Thesis" not found.
Today it is not working
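grep also takes options. A sketch on a three-line copy of the first haiku, written into a scratch directory (mktemp -d assumed to be available): -n prefixes each matching line with its line number, and -c prints only the number of matching lines. grep is case-sensitive by default, so Tao and tao are different patterns.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' "The Tao that is seen" "Is not the true Tao, until" "You bring fresh toner." > haiku.txt

grep not haiku.txt      # Is not the true Tao, until
grep -n Tao haiku.txt   # 1:The Tao that is seen
                        # 2:Is not the true Tao, until
grep -c Tao haiku.txt   # 2
```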
find lists every file and directory in the writing directory and its sub-directories:
$ find .
.
./thesis
./thesis/empty-draft.md
./tools
./tools/format
./tools/old
./tools/old/oldtool
./tools/stats
./haiku.txt
./data
./data/two.txt
./data/one.txt
./data/LittleWomen.txt
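To find only the files whose names end with .txt, give find a starting directory and a -name pattern. Quote the pattern so that find, not the shell, expands the wildcard; unquoted, the shell would replace *.txt with haiku.txt before find even runs, and only ./haiku.txt would be found.
$ find . -name '*.txt'
./haiku.txt
./data/two.txt
./data/one.txt
./data/LittleWomen.txt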
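Quoting the -name pattern matters. In this sketch (scratch directory via mktemp -d, assumed to be available; the files are made up), the unquoted *.txt is expanded by the shell to the single name haiku.txt before find starts, so find only looks for that exact name. With two or more .txt files in the current directory, the unquoted version would typically fail with an error instead.

```shell
cd "$(mktemp -d)" || exit 1
mkdir data
touch haiku.txt data/one.txt data/two.txt

find . -name '*.txt'    # quoted: ./haiku.txt plus both files in ./data
find . -name *.txt      # unquoted: runs as  find . -name haiku.txt
```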
echo prints strings (text).
This is especially useful when writing Bash scripts.
$ echo hello world
hello world
> sends a command’s output to a file rather than printing it in the shell (an existing file with that name is overwritten):
$ grep not haiku.txt > not_haiku.txt
$ ls
data  haiku.txt  not_haiku.txt  thesis  tools
>> appends a command’s output to the end of a file:
$ grep Tao haiku.txt >> not_haiku.txt
$ nano not_haiku.txt
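The difference between > and >> is easiest to see by counting lines. A sketch on a three-line stand-in for haiku.txt in a scratch directory (mktemp -d assumed to be available):

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' "The Tao that is seen" "Is not the true Tao, until" "You bring fresh toner." > haiku.txt

grep not haiku.txt > not_haiku.txt    # creates not_haiku.txt with 1 line
grep Tao haiku.txt >> not_haiku.txt   # appends 2 more lines: 3 in total
wc -l < not_haiku.txt                 # 3

grep not haiku.txt > not_haiku.txt    # > again: the file is overwritten
wc -l < not_haiku.txt                 # 1
```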
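| (the pipe) sends the output of the first command into the second command as its input (and the output of the second into the third, and so on). Here wc -l counts the lines in each file, sort -n sorts those counts numerically, and head -n 5 keeps the first five, i.e. the five shortest files: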
$ cd ../north-pacific-gyre/2012-07-03
$ wc -l *.txt | sort -n | head -n 5
 240 NENE02018B.txt
 300 NENE01729A.txt
 300 NENE01729B.txt
 300 NENE01736A.txt
 300 NENE01751A.txt
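A pipe does the same job as saving each step’s output in a file and feeding that file to the next command, without leaving intermediate files behind. A sketch with three small made-up files in a scratch directory (mktemp -d assumed to be available); the .out names for the intermediate files are arbitrary, chosen so that *.txt does not match them.

```shell
cd "$(mktemp -d)" || exit 1
printf 'a\n' > one.txt
printf 'a\nb\n' > two.txt
printf 'a\nb\nc\n' > three.txt

# Step by step, through intermediate files:
wc -l *.txt > lengths.out                  # line count of each file, plus a total
sort -n lengths.out > sorted-lengths.out   # smallest count first
head -n 1 sorted-lengths.out               # the shortest file: one.txt

# The same thing in one line with pipes, and no intermediate files:
wc -l *.txt | sort -n | head -n 1
```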
This was just a brief summary of how to use the command line. There is much, much more you can do. For more information, check out the Software Carpentry page.