# Local, single-user, linear workflow

## Directed Acyclic Graphs (DAG)

The fundamental unit of information in git is the _commit_. A commit is a snapshot of our work at a point in time. A version control system does not keep just one snapshot of our work, but many of them. Furthermore, instead of a single linear sequence of snapshots, we may have several lines, or *branches*, of snapshots holding different versions of our project. The commits are therefore organized in a directed acyclic graph (DAG), like the one shown in the following picture.

<!-- offline: 
![](images/commit_anatomy.png)

<img src="https://raw.github.com/fperez/reprosw/master/fig/commit_anatomy.png">
-->

![](images/commit_anatomy.png)

Credit: *Pro Git* book, by Scott Chacon, CC License.

We identify each node (commit) with a hash: a fingerprint of the content of the commit together with the hash of its parent. Including the parent's hash is important, because it lets us check the structural consistency of the DAG: a commit's hash depends on its entire ancestry, so any change to an earlier commit changes the hashes of all the commits that follow it.

Let's create a first hash:

```python
from hashlib import sha1

# Our first commit
data1 = b'This is the start of my paper.'
meta1 = b'date: 1/1/17'
hash1 = sha1(data1 + meta1).hexdigest()
print('Hash:', hash1)
```

```
Hash: 3b32905baabd5ff22b3832c892078f78f5e5bd3b
```


Even a small change to the text above would produce a completely different hash. Notice also how, in the next hash, we include the hash of the parent node:

```python
data2 = b'Some more text in my paper...'
meta2 = b'date: 1/2/1'
# Note we add the parent hash here!
hash2 = sha1(data2 + meta2 + hash1.encode()).hexdigest()
print('Hash:', hash2)
```

```
Hash: 1c12d2aad51d5fc33e5b83a03b8787dfadde92a4
```
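The same toy scheme extends naturally to a DAG: a commit hashes the hashes of *all* its parents, so a merge commit hashes two. The sketch below is illustrative only (git's real commit objects use a different, richer format), and `toy_commit_hash`, `build_history` and the branch messages and dates are hypothetical. With no parents the function reduces to the `hash1` computation above, and with one parent to the `hash2` computation. Changing a single character in the root commit changes the hash of every commit that descends from it, which is what makes a rewritten or corrupted history detectable.

```python
from hashlib import sha1


def toy_commit_hash(data: bytes, meta: bytes, parents: list[str]) -> str:
    """Toy commit hash: content + metadata + the hashes of all parent commits."""
    payload = data + meta + b''.join(p.encode() for p in parents)
    return sha1(payload).hexdigest()


def build_history(first_text: bytes) -> dict[str, str]:
    """Hypothetical DAG: root -> a, root -> b (two branches), then merge(a, b)."""
    root = toy_commit_hash(first_text, b'date: 1/1/17', [])
    a = toy_commit_hash(b'Work on branch A', b'date: 1/2/17', [root])
    b = toy_commit_hash(b'Work on branch B', b'date: 1/3/17', [root])
    merge = toy_commit_hash(b'Merge A and B', b'date: 1/4/17', [a, b])
    return {'root': root, 'a': a, 'b': b, 'merge': merge}


original = build_history(b'This is the start of my paper.')
tampered = build_history(b'This is the start of my paper!')  # one character changed

for name in original:
    print(f'{name:6} {original[name][:10]}  {tampered[name][:10]}')
```

Every row shows two different prefixes: no commit in the tampered history keeps its original hash, even the merge commit that is two steps away from the change.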


## Locals

Run `git` with no arguments to see a list of the most common commands. We'll now go through many of these via small practical exercises:

```ipython
!git
```

```
usage: git [-v | --version] [-h | --help] [-C <path>] [-c <name>=<value>]
           [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
           [-p | --paginate | -P | --no-pager] [--no-replace-objects] [--bare]
           [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
           [--super-prefix=<path>] [--config-env=<name>=<envvar>]
           <command> [<args>]

These are common Git commands used in various situations:

start a working area (see also: git help tutorial)
   clone     Clone a repository into a new directory
   init      Create an empty Git repository or reinitialize an existing one

work on the current change (see also: git help everyday)
   add       Add file contents to the index
   mv        Move or rename a file, a directory, or a symlink
   restore   Restore working tree files
   rm        Remove files from the working tree and from the index

examine the history and state (see also: git help revisions)
   bisect    Use binary search to find the commit that introduced a bug
...
```

```{tip}
You can run shell commands from a Jupyter notebook by prefixing them with `!`. You can also put the `%%bash` cell magic on the first line of a cell to run the whole cell in the shell.
```

```{warning}
Although you can run most of the following git operations from a notebook, we recommend executing these commands directly in a terminal.
```

First we create a new folder and initialize it as a git repository. `git init test` does both at once: it creates the directory `test` and sets it up as an empty repository. From your terminal, run:

```bash
git init test
```
Let's see what git just did. Move into this folder and list all of its contents, including hidden files (`-a`), in long format (`-l`):
```bash
cd test 
ls -la
```

```{note}
What makes `test` a repository? If you look at what is inside the folder, you will see a special hidden folder called `.git`. This is where git stores all the information it uses to version control your files. If you delete this folder, your repository becomes an ordinary folder with no version control capabilities.
```
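Because the `.git` directory is what turns a folder into a repository, finding "the repository I'm in" amounts to looking for `.git` in the current directory and then in each of its parents. The following is a simplified sketch of that lookup covering only the plain case: real git also honours settings such as the `GIT_DIR` and `GIT_CEILING_DIRECTORIES` environment variables, and `find_repository_root` is a hypothetical helper, not part of any library.

```python
from pathlib import Path
from typing import Optional


def find_repository_root(start: Path) -> Optional[Path]:
    """Return the closest directory, starting at `start` and moving up, that contains `.git`."""
    current = start.resolve()
    for candidate in (current, *current.parents):
        # `.git` is usually a directory, but it can be a file (e.g. in submodules)
        if (candidate / '.git').exists():
            return candidate
    return None  # reached the filesystem root without finding a repository


root = find_repository_root(Path.cwd())
print(root if root is not None else 'Not inside a git repository')
```

Run from inside `test` (or any subfolder of it), this should print the path to `test`; once `test/.git` is deleted, it would no longer find `test`.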

Now let's create our first file in the `test` directory. In practice you would do this with a text editor; here we create the file with a shell command instead:
```bash
echo "My first bit of text" > file1.txt
```

To put this new file under version control, we first need to **add** it:
```bash
git add file1.txt
```

We can now ask git about what happened with `status`:
```bash
git status
```

The next step is to **commit** our changes, which records them permanently in git's database. For now, we are *always* going to call `git commit` either with the `-a` option *or* with specific filenames (`git commit file1 file2...`). This delays the discussion of an aspect of git called the *index* (also often called the _staging area_), which we will cover later. Most everyday work in regular scientific practice doesn't require understanding the extra moving parts that the index involves, so on a first pass we'll bypass it. Later on we will discuss how to use it to achieve more fine-grained control of what and how git records our actions. Note that `-a` only picks up changes to files that git already tracks, which is why we had to `git add` the new file first.
```bash
git commit -a -m"This is our first commit"
```
In the commit above, we used the `-m` flag to specify a message on the command line. If we don't do that, git will open the editor we specified in our configuration above and require that we enter a message. By default, git refuses to record changes that don't have a message to go along with them (though you can obviously 'cheat' by using a meaningless string: git only tries to facilitate best practices, it's not your nanny).

````{tip}
You can use `git log` to see what has been committed so far:
```bash
git log
```
Sometimes it's handy to see a very summarized version of the log:
```bash
git log --oneline --topo-order --graph
```
Git supports *aliases:* new names given to command combinations. Let's make this handy shortlog an alias, so we only have to type `git slog` and see this compact log:
```bash
git config --global alias.slog "log --oneline --topo-order --graph"
```
and now we can use this new alias to print a short version of the commit history
```bash
git slog
```
````
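For the linear history built in this tutorial, every commit except the first has exactly one parent, so a one-line log boils down to starting at the current commit (`HEAD`) and following parent pointers back to the beginning. The sketch below models that walk in plain Python; the `Commit` class, `oneline_log` and the placeholder hashes are hypothetical, and the 7-character abbreviation mirrors git's usual default (git may use more characters when needed to keep abbreviations unique).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    sha: str               # full 40-character hash
    message: str
    parent: Optional[str]  # hash of the parent commit; None for the very first commit


def oneline_log(commits: dict[str, Commit], head: str) -> list[str]:
    """Walk from `head` back to the root commit, newest first, like a linear `git log --oneline`."""
    lines = []
    sha: Optional[str] = head
    while sha is not None:
        commit = commits[sha]
        lines.append(f'{commit.sha[:7]} {commit.message}')
        sha = commit.parent
    return lines


# Placeholder hashes, not real git output
first = Commit('a' * 40, 'This is our first commit', None)
second = Commit('b' * 40, 'I have made great progress on this critical matter.', first.sha)
history = {c.sha: c for c in (first, second)}

for line in oneline_log(history, head=second.sha):
    print(line)
```

The `--graph` and `--topo-order` options only make a visible difference once the history branches, which is when the DAG structure from the beginning of this section matters.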

Let's do a little more work. Again, in practice you'll be editing the files by hand; here we do it with shell commands for the sake of automation (and therefore the reproducibility of this tutorial!)
```bash
echo "And now some more text..." >> file1.txt
```

Now we ask git what has changed:
```bash
git diff
```

The format of `git diff` output is explained in detail in [this Stack Overflow post](https://stackoverflow.com/questions/2529441/how-to-read-the-output-from-git-diff). Here is a brief summary, piece by piece:

```
diff --git a/file1.txt b/file1.txt
```

This tells us which files changed overall, with 'a' representing the old path and 'b' the new one (in this case it's the same file, though if a file had been renamed it would be different).

```
index ce645c7..4baa979 100644
```
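These are the abbreviated hashes of the file's contents before and after the change, which git can use for other operations on the diff output (for example, when applying it as a patch). The trailing `100644` is the file mode: a regular, non-executable file.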
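The IDs on the `index` line are abbreviated *blob* IDs. In a default (SHA-1) repository, git names a file's contents by hashing a short header (the word `blob`, a space, the size in bytes and a NUL byte) followed by the contents themselves. The sketch below reproduces that computation with `hashlib`; `git_blob_hash` is a hypothetical helper name, while `git hash-object <file>` is the real command that prints the same ID for a file on disk.

```python
from hashlib import sha1


def git_blob_hash(content: bytes) -> str:
    """SHA-1 object ID of a blob: hash of b'blob <size>' + NUL byte + content."""
    header = b'blob ' + str(len(content)).encode('ascii') + b'\0'
    return sha1(header + content).hexdigest()


# `echo "My first bit of text" > file1.txt` writes the text followed by a newline
print(git_blob_hash(b'My first bit of text\n')[:7])
```

Compare the printed prefix with the left-hand ID on the `index` line of your own `git diff` output; they should agree as long as the file originally contained exactly that line.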

The lines that follow show the actual changes. The first two identify the old (`---`) and new (`+++`) versions being compared (in this case the same file, `file1.txt`):

```
--- a/file1.txt
+++ b/file1.txt
```

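The next line indicates where the changes happened. Its format is `@@ from-file-range to-file-range @@`, where there is one more `@` character than there are parents in the comparison (git can also show multi-way diffs, e.g. for merges), and each file range has the form `-/+<start line>,<# of lines>`, with `-` for the `from-file` and `+` for the `to-file`. When a range spans a single line, the `,<# of lines>` part is omitted, so below `-1` means "one line, starting at line 1" and `+1,2` means "two lines, starting at line 1":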

```
@@ -1 +1,2 @@
```
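To make the range notation concrete, here is a small parser for the two-way hunk headers described above; combined diffs, with more `@` characters, are deliberately out of scope. It relies on the rule that an omitted `,<# of lines>` means a count of one. `HunkRange` and `parse_hunk_header` are hypothetical names for this sketch.

```python
import re
from typing import NamedTuple


class HunkRange(NamedTuple):
    old_start: int
    old_count: int
    new_start: int
    new_count: int


# Two-way hunk header, e.g. "@@ -1 +1,2 @@"; optional text may follow the closing "@@"
HUNK_HEADER = re.compile(r'^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@')


def parse_hunk_header(line: str) -> HunkRange:
    match = HUNK_HEADER.match(line)
    if match is None:
        raise ValueError(f'not a two-way hunk header: {line!r}')
    old_start, old_count, new_start, new_count = match.groups()
    # A missing count means the range covers exactly one line
    return HunkRange(
        int(old_start), int(old_count) if old_count is not None else 1,
        int(new_start), int(new_count) if new_count is not None else 1,
    )


print(parse_hunk_header('@@ -1 +1,2 @@'))
# HunkRange(old_start=1, old_count=1, new_start=1, new_count=2)
```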

Lines prefixed with `-` correspond to deletions (none in this case), and lines prefixed with `+` to additions. Lines starting with a space are unchanged; a few of them are shown around each change for context:

```
 My first bit of text
+And now some more text...
```
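The `---`/`+++`/`@@` part of git's output follows the standard *unified diff* format, so Python's standard-library `difflib.unified_diff` can reproduce the hunk above. This is only a sketch: it omits the git-specific `diff --git` and `index` lines, and for more complex edits difflib's matching algorithm may group changes differently from git's.

```python
import difflib

# Contents of file1.txt before and after the `echo ... >>` command, as lists of lines
old = ['My first bit of text\n']
new = ['My first bit of text\n', 'And now some more text...\n']

diff_text = ''.join(
    difflib.unified_diff(old, new, fromfile='a/file1.txt', tofile='b/file1.txt')
)
print(diff_text, end='')
```

Both tools show 3 lines of context by default (the `n` parameter of `unified_diff`, the `-U`/`--unified` option of `git diff`).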

From now on, the virtuous cycle simply repeats: work, commit, work, commit...
```bash
git commit -a -m"I have made great progress on this critical matter."
```

While `git add` is used to add files to the list of files git tracks, we must also tell git when we want to rename a tracked file or stop tracking it. In familiar Unix fashion, the `git mv` and `git rm` commands do precisely this:
```bash
git mv file1.txt file-newname.txt
git status
```

Note that these changes must be committed too to become permanent! In git's world, until something has been committed, it isn't permanently recorded anywhere.

## Appendix

This is the sequence of all the shell commands used in this tutorial, in order:
```bash
git init test 
...
...
...
```