git tutorial for latex projects

Till Bargheer, Niklas Beisert, 2014 (2014/09/14)

Introduction

The philosophy behind git is independent, distributed development of a joint project. It is well suited for small distributed projects with only a few files, such as latex source. This tutorial describes the essential steps in contributing to such a project. Beyond this, many git tutorials covering almost every conceivable case can be found on the web. A recommended resource is http://git-scm.com.

Basic Definitions

Basic operations

See also sketch.

Strategy

The general strategy to introduce a change to the project is to fetch new commits from the remote repository to the local repository (fetch) and then apply the newly fetched commits to the working directory (merge or rebase). Now the work can be done in the working directory. The work is then committed to the local repository (commit) and pushed to the remote repository (push).
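The cycle described above can be sketched end to end at the command line. This is only a minimal sketch under stated assumptions: a throwaway bare repository in a temporary directory stands in for the remote server, and the file name paper.tex and the commit messages are made up for illustration.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Assumption: a local bare repository stands in for the remote server.
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

# Seed the stand-in remote with one commit so there is something to fetch.
git clone -q remote.git seed 2>/dev/null
cd seed
git symbolic-ref HEAD refs/heads/master
git config user.name "First Last"
git config user.email "em@i.l"
printf '%s\n' '\documentclass{article}' > paper.tex
git add paper.tex
git commit -q -m "Initial version"
git push -q origin master
cd ..

git clone -q remote.git work
cd work
git config user.name "First Last"
git config user.email "em@i.l"

git fetch -q origin            # fetch: remote -> local repository
git merge -q origin/master     # merge: local repository -> working directory
# the actual work happens in the working directory
printf '%s\n' '\begin{document}' 'Hello.' '\end{document}' >> paper.tex
git add paper.tex
git commit -q -m "Add document body"   # commit: working directory -> local repository
git push -q origin master              # push: local repository -> remote
git log --oneline --graph -5
```

In everyday use only the last block of commands matters; everything before it merely builds the playground.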

When several collaborators work on a common state, the history will inevitably diverge, creating a branched tree of commits. Git can handle such branched trees in the repositories. However, when pushing the local to the remote repository, their structures must be compatible: a branch can only be extended when the head of the remote branch lies in the history of the head of the local branch. Otherwise a push will fail, calling for updates on the local repository: branches can be combined (merge), reorganized (rebase), or removed. Such an update makes the local structure compatible with the remote one and enables the push.

Update considerations for latex

The storage of updates and in particular the automatic and manual conflict resolution is based on diff. Diff is a line-based tool, and therefore the structure of changes is most apparent if

This is the most natural outcome for a classical text editor without automatic line breaking. A latex editor / environment should be configured not to restructure text blocks to fit a given width. Ideally, the lines are broken manually at a full stop, punctuation (equality, relation) or somewhere in the middle of long sentences (equations).
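The benefit of manual line breaks can be seen in a small experiment. The following sketch assumes a scratch repository and three invented sentences, each on its own line; changing one word then yields a diff that touches exactly one line.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git config user.name "First Last"
git config user.email "em@i.l"

# One sentence per line, broken manually at the full stop.
printf '%s\n' \
  'The first sentence introduces the setup.' \
  'The second sentence states the result.' \
  'The third sentence concludes.' > paper.tex
git add paper.tex
git commit -q -m "Add three sentences"

# Change a single word in the second sentence only.
sed 's/result/main result/' paper.tex > paper.tmp
mv paper.tmp paper.tex

# Columns of --numstat: lines added, lines removed, file name.
git diff --numstat
```

Had the editor re-wrapped the whole paragraph, every line of the block would show up as changed. For prose, git diff --word-diff can additionally highlight the changed words within a line.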

Single commits should be atomic rather than collective:

It makes sense to commit frequently (after each atomic step) to the local repository. After some piece of work is completed, the commits can be pushed collectively to the remote repository.

Advantages over other tools

Compared to simple file-synchronization tools and services (e.g. dropbox), git has multiple advantages. The most important ones are:

Software

git is a set of command line tools. There are many useful gui extensions to simplify the workflow.

Tools

Installation

Setup

You should set up your contact details to identify yourself as the author of your commits:

git config [--global] user.name "First Last"
git config [--global] user.email "em@i.l"

You can do this globally or on a project basis.

Step by step example

We start with a simple example demonstrating a typical workflow step by step. This example can be performed at the command line in a Linux environment. Most of the below operations are also available in the graphical user interfaces.

Say you want to work on a project that already has a remote repository user@server.domain:repository. First, create a local copy of the repository by

git clone user@server.domain:repository

This will create a subdirectory named repository in the current working directory that contains all the files belonging to the project.

When you want to start working on your files, first do a

git pull

which will download all new commits from the remote to your local repository, and update your local files (evidently, this operation is unnecessary right after the initial download, but usually it is the first step when you start working). Say the project contains a latex file paper.tex that you want to edit. Enter the directory and do your editing. Now a

git status

will show some information about the repository, in particular it will say

Changes not staged for commit:
  ...
     modified:    paper.tex

This means that the working copy of the file has changed, but the changes have not yet been saved to the repository. After you finish your editing, do

git add paper.tex

The command marks the file paper.tex as being ready for submission to your local repository. Another git status now shows

Changes to be committed:
  ...
     modified:    paper.tex

You can finally save your changes with

git commit -m "Added proof of theorem X"

including a useful summary of what you did. The current status of your project has now been saved in a new commit. You could also skip the explicit git add step by using git commit -a -m "...", which will automatically commit all changed files that git already tracks (new files still need an explicit git add). You can see the recent history, including your new commit, with

git log --oneline --graph -5

This shows the previous five commits.

After you are done editing and have saved your work in (possibly multiple) commits, you can make your changes available to your collaborators by

git push

This will upload your new commits to the remote repository, and will update the remote files accordingly.

When you want to continue working on your files later on, again do a git pull to update your local files. They will now include all changes that have been made by your collaborators in the meantime. Repeat the edit-add-commit cycle, and push when you are done. You can also use git pull as often as you like to inspect the changes of the others even if you are not planning to edit the project yourself.

If someone else uploads changes to the remote repository while you edit, your git push will fail with an error. Just do a git pull to automatically combine all changes made by others with all changes made by you. A subsequent git push should now succeed. If an automatic merge is not possible because two authors have simultaneously changed the same part of the same file, git will run into a conflict. Don't panic. Conflicts are usually easy to resolve, see resolving conflicts below.

In case you forget to do git pull before you edit, don't worry. Unless two authors have simultaneously changed the same part of the same file, git pull will do the job of merging the changes. You can run git pull whenever your local repository is in a clean state, i.e. when there are no changes that have not been committed.
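The rejected push and its repair can be played through with two clones of one repository. The sketch below makes several assumptions: a local bare repository stands in for the server, the collaborators alice and bob are hypothetical, and the two edits are kept a few lines apart so that the automatic merge succeeds. Recent git versions ask for an explicit choice between merge and rebase when histories have diverged, hence the --no-rebase option.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

# Alice (hypothetical) creates the first version and pushes it.
git clone -q remote.git alice 2>/dev/null
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"
git config user.email "alice@example.org"
printf '%s\n' 'Section one.' 'Filler a.' 'Filler b.' 'Filler c.' 'Section two.' > paper.tex
git add paper.tex
git commit -q -m "Initial version"
git push -q -u origin master
cd ..

# Bob (hypothetical) clones, edits the first line and pushes first.
git clone -q remote.git bob
cd bob
git config user.name "Bob"
git config user.email "bob@example.org"
sed 's/Section one/Section one, revised/' paper.tex > t && mv t paper.tex
git commit -q -a -m "Revise section one"
git push -q
cd ../alice

# Alice edits the last line; her push is rejected because Bob was faster.
sed 's/Section two/Section two, extended/' paper.tex > t && mv t paper.tex
git commit -q -a -m "Extend section two"
if git push -q 2>/dev/null; then echo "push accepted"; else echo "push rejected"; fi

# Pulling merges Bob's commit with hers; afterwards the push goes through.
git pull -q --no-rebase --no-edit
git push -q
git log --oneline --graph -5
```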

Modes for filling the repository

There are many ways to fill the repository with concurrent updates. The amount of work and the final result are more or less the same for all methods. However, the way the tree of commits builds up depends on the order and type of actions taken. Depending on the situation, different types of trees will simplify the collaboration and the resolution of issues. Several models are discussed in the following. One should agree on one mode and stick to it as far as possible. It is not necessary to understand all the other modes.

Merge on master branch

This model can be viewed as the default mode of operation. Here the tree is branched and merged with only a single remote branch (commonly: master). Everyone applies one's new commits onto one's local branch head and uses merge if necessary to unite with the remote branch head.

     E--F (master)         merge       E--F--G (master)       push       E--F--G (master,
    /                       ===>      /     /                 ===>      /     /  origin/master)
A--B--C--D (origin/master)        A--B--C--D (origin/master)        A--B--C--D

Advantages / disadvantages

Introducing changes

Inspecting updates by other contributors

Normally, one would pull to the latest update and then inspect it in the working directory.

In the case of changes to the working directory (not in clean state) pull will not work. Then you may use git stash save (before pull) and git stash pop (after pull) to save the changes to the working directory across the update. Stash save and pop is similar to a merge of the working directory with the new branch head.

Rebase on master branch

This model is similar to the above one, but it produces a different history of commits which may improve the tracking of changes: Here there is only a single remote branch which has a completely linear history. Everyone applies one's new commits onto the local branch head and uses rebase if necessary to place them on top of the remote branch head.

     E--F (master)         rebase            E'-F' (master)  push             E'-F' (master,
    /                       ===>            /                ===>            /      origin/master)
A--B--C--D (origin/master)        A--B--C--D (origin/master)       A--B--C--D

Advantages / disadvantages

Introducing changes

Note: it makes sense to set the default pull behaviour to rebase by git config branch.master.rebase true.

Inspecting updates by other contributors

Normally, one would fetch and rebase to the latest update and then inspect it in the working directory.

In the case of changes to the working directory (not in clean state) rebase will not work. Then you may use git stash save (before rebase) and git stash pop (after rebase) to save the changes to the working directory across the update. Stash save and pop is similar to a rebase of the working directory onto the new branch head.

User branches

Here the tree is branched and merged with a single branch head for each contributor. Everyone applies one's new commits onto one's own branch head. Merge is used to import the changes of the other developers.

Advantages / disadvantages

Introducing changes

Inspecting updates by other contributors

There are two useful methods:

In both cases git fetch --all will show updates to the various remote branches.

Development branches

Here there is a central main branch of development (master), plus (temporary) side branches for every development. One creates a temporary branch to work on. This branch can also be pushed without creating conflicts. At a reasonable stage, the temporary branch can be merged back into the master branch.

     E--F--G         M--N--O--P--Q--R--S--T (master, origin/master)
    /       \       /     /          \
A--B--C---D--H--I--J--K--L            U--V--W--X (devel)

Advantages / disadvantages

Introducing changes

Inspecting updates by other contributors

There are two useful methods:

Download repository

To get started, go to the parent directory of the intended working directory, then:

git clone user@server.domain:repository [-b branch] [target-directory]

where:

Note: different formats for specifying the remote repository are in use depending on the type of remote service; here we mainly refer to the system gitolite which can be set up easily for non-commercial purposes.

Introducing changes

Commit

To save the changes in the working directory to the local repository:

git commit [-a] [-m "message"] [file(s)] [--amend]

The remote server will not be contacted during the above. These changes cannot be seen by others, yet. Therefore, they could still be edited or undone (with some effort, but without penalty).

Staging

The files of the working directory need to be staged before committing:

git add file(s)

Alternatively all modified (and previously staged) files can be committed using the -a option for commit.

Similarly, you should let git know about files you want to delete, rename or move in order for it to track them properly. The syntax of the commands is similar to the Linux counterparts with prepended git. To remove, rename or move files from the working directory and from the subsequent commit, use, respectively

git rm file(s)
git mv file newname
git mv file(s) newdirectory
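
The three commands can be tried in a scratch repository. This is a sketch with invented file names (notes.txt, draft.tex, Fig1.pdf and a directory figures); the files contain placeholder text only.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git config user.name "First Last"
git config user.email "em@i.l"
printf '%s\n' 'draft text' > draft.tex
printf '%s\n' 'old notes' > notes.txt
printf '%s\n' 'figure placeholder' > Fig1.pdf
git add .
git commit -q -m "Initial files"

git rm -q notes.txt          # remove from working directory and stage the removal
git mv draft.tex paper.tex   # rename
mkdir figures
git mv Fig1.pdf figures/     # move into an existing directory
git status --short           # the staged changes, before committing
git commit -q -m "Reorganize files"
git ls-files                 # files tracked after the commit
```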

Push

To export the changes in the local repository to the remote repository:

git push [origin branch]

This operation will fail if the head of the remote branch is not in the history of the corresponding local branch head. In that case, one has to resolve all issues on the local repository by pulling the remote changes into the local repository. As soon as the repositories are in compatible states the push operation will succeed.

The changes will be seen by others and therefore cannot be undone anymore.

The optional destination parameter origin branch overrides the standard remote target branch.

.gitignore

Normally, git considers all contents of the working directory as part of the repository. However some files are automatically generated, binary, big, backup and/or log files which are not meant to be tracked in the repository. git will notice extra files in the working directory which have not been incorporated into the repository, and warn about their presence.

One can mark certain files or classes of files to be ignored by git. A list of these files is stored in the file .gitignore (typically in the main directory), which itself is a file of the repository and must be added explicitly.

In latex projects one might exclude files such as *.pdf, *.log, certain subdirectories. At the same time one would like to include figure files such as Fig*.pdf. A sample .gitignore file might look as follows:

#exclude generated files
*.pdf
*.aux
*.log

#include figure files
!Fig*.pdf

#some excluded directories
/extra

For small projects with a handful of files, it might be more suitable to exclude all files by default, and include individual files explicitly, as in the following example:

#exclude all files
*

#include ignore file
!/.gitignore

#include source file(s)
#add any relevant file that is not generated
!/paper.tex

#include figure files
!/Fig*.pdf

Alternatively, exclude/include rules can be specified in the file .git/info/exclude, which has the same format as the .gitignore file. The difference is that .git/info/exclude is not shared with other repositories, it remains local. Hence it is useful for excluding files that only exist in your personal working directory (for example files that you use to generate figures). Note that rules in .gitignore take precedence over rules in .git/info/exclude.
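To check what a set of rules does, one can stage everything in a scratch repository and list what git picked up. The sketch below uses the first sample .gitignore from above; the project files (paper.tex and its generated companions, Fig1.pdf, extra/notes.txt) are hypothetical and empty.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo

# The first sample .gitignore from above.
printf '%s\n' \
  '#exclude generated files' '*.pdf' '*.aux' '*.log' '' \
  '#include figure files' '!Fig*.pdf' '' \
  '#some excluded directories' '/extra' > .gitignore

# Hypothetical project content: source, generated output, a figure, a side directory.
mkdir extra
touch paper.tex paper.pdf paper.aux paper.log Fig1.pdf extra/notes.txt

git add -A      # stage everything that is not ignored
git ls-files    # lists .gitignore, Fig1.pdf and paper.tex
```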

Updating repository

Pull

The pull operation is equivalent to fetch and merge or fetch and rebase.

git pull [--rebase] [origin branch]

After the initial fetch, pull works in the local repository only. Changes introduced by merge or rebase must (eventually) be pushed to the remote repository.

Without parameters, the standard remote branch is pulled (typically master). For a project with many branches (see above) one will typically pull a specific branch to merge their states.

Fetch

To download updates from the remote repository to the local repository. Invoked implicitly by git pull or explicitly by:

git fetch

By default, all remote branches are fetched.
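Since fetch leaves the local branch and the working directory untouched, it is a safe way to look at incoming work before integrating it. The following sketch assumes a local bare repository as the remote and a hypothetical second author whose clone lives in the directory other.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

# The other author (hypothetical) publishes a first draft.
git clone -q remote.git other 2>/dev/null
cd other
git symbolic-ref HEAD refs/heads/master
git config user.name "Other Author"
git config user.email "other@example.org"
printf '%s\n' 'First draft.' > paper.tex
git add paper.tex
git commit -q -m "First draft"
git push -q -u origin master
cd ..

git clone -q remote.git mine

# Meanwhile the other author pushes one more commit.
cd other
printf '%s\n' 'A new paragraph.' >> paper.tex
git commit -q -a -m "Add paragraph"
git push -q
cd ../mine

before=$(git rev-parse master)
git fetch -q
# origin/master has moved; master and the working directory have not.
git log --oneline master..origin/master   # commits that a merge or rebase would bring in
git diff --stat master origin/master      # files they touch
```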

Merge

Merge is the default behavior for pull. After an explicit git fetch, merge can be invoked as

git merge otherbranch

which replays all changes introduced by otherbranch on top of the current branch, creates a new commit that reflects the combined changes, and updates the working directory accordingly. The new commit will have both previous branch heads as parents. The head of otherbranch remains unchanged, while the head of the current branch gets updated to the new commit.
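The two-parent structure of a merge commit can be inspected directly. This is a small local sketch; the branch name otherbranch follows the text above, while the file names and messages are invented.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "First Last"
git config user.email "em@i.l"
printf '%s\n' 'Introduction.' > intro.tex
git add intro.tex
git commit -q -m "Add introduction"

# A side branch with one commit of its own.
git checkout -q -b otherbranch
printf '%s\n' 'Appendix.' > appendix.tex
git add appendix.tex
git commit -q -m "Add appendix"

# Meanwhile master moves on as well.
git checkout -q master
printf '%s\n' 'Conclusions.' > conclusions.tex
git add conclusions.tex
git commit -q -m "Add conclusions"

git merge -q --no-edit otherbranch
# %p prints the abbreviated parent hashes of the new merge commit.
git log -1 --pretty=format:'%h has parents: %p%n'
git log --oneline --graph
```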

Rebase

Rebase is similar to merge, but it changes the history of the current branch such that it will be based on the head of another branch. The current branch is effectively detached from the common ancestor of both branches, and gets appended to the head of the other branch. All the changes within the current branch are rewritten such that they will refer to the head of the other branch. This creates a linear history (with only single parent commits) which may be easier to trace back. After an explicit git fetch, it can be called as

git rebase otherbranch [mybranch]

This command rebases mybranch on top of otherbranch. When mybranch is omitted, it rebases the current branch on top of otherbranch.
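For comparison with a merge, the following local sketch rebases master on top of otherbranch and checks that the history stays linear. File names and commit messages are again invented for illustration.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "First Last"
git config user.email "em@i.l"
printf '%s\n' 'Introduction.' > intro.tex
git add intro.tex
git commit -q -m "Add introduction"

git checkout -q -b otherbranch
printf '%s\n' 'Appendix.' > appendix.tex
git add appendix.tex
git commit -q -m "Add appendix"

git checkout -q master
printf '%s\n' 'Conclusions.' > conclusions.tex
git add conclusions.tex
git commit -q -m "Add conclusions"
old=$(git rev-parse HEAD)

# Re-create the master commit on top of otherbranch.
git rebase -q otherbranch
echo "before: $old"
echo "after:  $(git rev-parse HEAD)"   # a new commit with a new hash
git log --oneline --graph              # a single line of commits, no merge
```

The changed hash is the reason why rebasing is only safe for commits that have not been shared yet.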

If one prefers rebase over merge, one can make rebase the default behaviour when pulling a given branch (called branch here) by

git config branch.branch.rebase true

Note that you should never rebase a commit that has already been pushed to a remote repository. Such a commit may already be in use by other contributors. By modifying it, you are bound to create a mess. Hence a rebase should only ever be applied locally.

Pull results

A pull operation will have one of the following results which leave the local repository and working directory in some state:

Resolving conflicts

When a pull operation has detected changes that require manual resolution, the working directory will be in conflict state. The conflicts should be resolved and afterwards the changes must be committed (merge) or the rebase continued (rebase).

Manual resolution

The conflicting files will contain sections marked by:

<<<<<<< first version tag
text in first version
||||||| common ancestor version tag
text in common ancestor version
=======
text in second version
>>>>>>> second version tag

These sections should be edited to represent the desired final state. The markup must be removed by hand.

The common ancestor may be useful in resolving the conflict. It is not displayed by default; it must be enabled by

git config [--global] merge.conflictstyle diff3

Diff

Git can also invoke a graphical diff tool to resolve the conflict more intuitively:

git mergetool [--tool=meld]

Continue

Once all conflicts have been resolved, the operation must be completed.

For a conflict during merge, the working directory reflects the intended merge commit. Hence, one should commit the working directory

git add file(s)
git commit ...

or instead of adding individual files, may use git commit -a.

For a conflict during rebase, the working directory reflects the commit to be modified to fit the new history. The rebase operation should be continued with

git add file(s)
git rebase --continue

Further commits to be rebased may follow.

Note that all changes due to a merge or rebase are local and have to be pushed (eventually).
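A complete conflict, from detection to the final commit, can be rehearsed locally. The sketch below is an illustration under assumptions: both branches change the same invented sentence, the diff3 conflict style is enabled for this scratch repository only, and the "manual" resolution is simulated by overwriting the file.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "First Last"
git config user.email "em@i.l"
git config merge.conflictstyle diff3
printf '%s\n' 'The result is x = 1.' > paper.tex
git add paper.tex
git commit -q -m "State result"

git checkout -q -b otherbranch
printf '%s\n' 'The result is x = 2.' > paper.tex
git commit -q -a -m "Set x = 2"

git checkout -q master
printf '%s\n' 'The result is x = 3.' > paper.tex
git commit -q -a -m "Set x = 3"

# The merge stops with a conflict; '|| true' keeps this script running.
git merge otherbranch >/dev/null 2>&1 || true
git ls-files -u                             # the conflicting file, one entry per version
cat paper.tex                               # the file with conflict markers
markers=$(grep -c '^<<<<<<<' paper.tex)

# Simulated manual resolution: write the desired final text without markers.
printf '%s\n' 'The result is x = 2 or x = 3.' > paper.tex
git add paper.tex
git commit -q -m "Merge otherbranch, keep both values"
git log --oneline --graph
```

For a rebase the steps are the same up to git add, after which git rebase --continue takes the place of git commit.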

Where am I?

If unclear about the present status of the working directory:

git status

This lists the conflicting files:

git ls-files -u

Give up

There is also the option to give up on the merge or rebase operation in progress and revert to the state before the pull. For a merge operation use

git merge --abort

alternatively

git reset --hard

For a rebase operation use

git rebase --abort

For example, this is a useful option if some half-finished manual resolution cannot be undone otherwise. However, all changes will be lost.

Help me!

If you find no way to resolve a conflict, you can upload your changes to the remote repository in a new branch, so that someone else can take care of integrating your changes. For this purpose, first reset your repository to a sane state with git merge --abort or git rebase --abort. Then create a new branch head named helpme (or anything else sensible) with

git branch helpme

Now push your new branch to the remote repository with

git push origin helpme:helpme

This creates a copy of your newly created branch on the remote repository.

Once someone else has merged your changes into the master branch, simply download the update via git pull, and delete your temporary branch with

git branch -d helpme

Note that git branch creates a new branch label, but does not switch to it. It is not necessary to switch to the new branch for uploading it to the remote repository.
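The helpme procedure can be tried against a stand-in remote. In this sketch a local bare repository plays the server, and the deletion at the end is done immediately, whereas in practice it would follow only after someone has merged the branch and you have pulled.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master
git clone -q remote.git work 2>/dev/null
cd work
git symbolic-ref HEAD refs/heads/master
git config user.name "First Last"
git config user.email "em@i.l"
printf '%s\n' 'Shared text.' > paper.tex
git add paper.tex
git commit -q -m "Initial version"
git push -q -u origin master

# Local work that we (hypothetically) cannot integrate ourselves.
printf '%s\n' 'My difficult change.' >> paper.tex
git commit -q -a -m "Work in progress"

git branch helpme                    # new label at the current commit, no switch
git push -q origin helpme:helpme     # upload it under the same name
git ls-remote --heads origin         # the remote now lists helpme next to master

git branch -d helpme                 # local cleanup; the remote copy stays
```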

Further useful information

Help

To figure out what a command does and which options it takes:

man git command

or, depending on your installation, man git-command will show an extensive description of the given git command (called command here). In addition, there is the book at http://git-scm.com, as well as answers to almost every possible question on Stack Overflow.

Color

A very helpful option is to add color to all git output with

git config --global color.ui auto

How to branch

To create a new branch:

git branch branchname [startpoint]

creates a new branch named branchname at commit/branch startpoint. If startpoint is omitted, the new branch will be created at the current branch head.

To switch to the new branch, do

git checkout branchname

To create and switch to the new branch in one go:

git checkout -b branchname [startpoint]

Tracking branches

Branches can be configured to track other branches. The branch that is tracked is called the upstream branch. It is the branch that gets merged or updated when git pull or git push is called without arguments. When you clone a remote repository, your master branch by default tracks the remote master branch (which is labeled by origin/master) in your local repository. To set origin/remotebranch as the upstream branch for your existing branch localbranch, do

git branch --set-upstream-to=origin/remotebranch localbranch

You can check your branch configuration, including the tracked branches, with

git branch -vv
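
The following sketch sets up tracking in a scratch environment and reads the setting back. It assumes a local bare repository as the remote; the names localbranch and remotebranch are the placeholders used in the text above.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master
git clone -q remote.git work 2>/dev/null
cd work
git symbolic-ref HEAD refs/heads/master
git config user.name "First Last"
git config user.email "em@i.l"
printf '%s\n' 'Text.' > paper.tex
git add paper.tex
git commit -q -m "Initial version"
git push -q -u origin master

# Create a local branch and publish it under a different name.
git branch localbranch
git push -q origin localbranch:remotebranch

# Make origin/remotebranch the branch that localbranch tracks.
git branch --set-upstream-to=origin/remotebranch localbranch

git branch -vv                                       # tracked branch shown in brackets
git rev-parse --abbrev-ref 'localbranch@{upstream}'  # prints origin/remotebranch
```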

Tags

Tags are a way to mark a certain status (milestone) of your work with a special label. For example, when you upload a paper to the arXiv, you might want to tag the corresponding commit with arxiv-v1 by

git tag arxiv-v1 [commit|branch]

If specified, commit or branch get tagged. By default, the current head gets tagged.
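A tag is useful later on for getting back at exactly the tagged version. The sketch below uses a scratch repository with invented file contents; it tags the first commit and retrieves the tagged text after a further revision.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git config user.name "First Last"
git config user.email "em@i.l"
printf '%s\n' 'Version submitted to the arXiv.' > paper.tex
git add paper.tex
git commit -q -m "Submitted version"
git tag arxiv-v1               # tag the current head

printf '%s\n' 'Revised after referee report.' > paper.tex
git commit -q -a -m "Revise"

git tag -l                     # list all tags
git show arxiv-v1:paper.tex    # print paper.tex as it was at the tag
```

Note that tags are not uploaded by a plain git push; git push origin arxiv-v1 uploads a single tag.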

Show history

You can show the history of your repository with

git log

The output of this command can be configured in many ways. For example

git log --graph --all --decorate --pretty=format:'%C(auto)%h %d %s %C(blue bold)<%an> %C(cyan)(%cd)' --date=relative -20

shows the latest 20 commits, nicely formatted.

Aliases

To avoid typing long commands over and over again, git lets you specify aliases. For example, to specify an alias sl (for (s)hort(l)og) for the above log command, do

git config --global alias.sl "log --graph --all --decorate --pretty=format:'%C(auto)%h %d %s %C(blue bold)<%an> %C(cyan)(%cd)' --date=relative -20"

Now a simple git sl shows you the nicely formatted recent history. Git aliases allow you to execute arbitrary commands, so you can get very creative. For example, after a

git config --global alias.al '!git config -l | grep alias | cut -c 7- | sed "s/=/\t/"'

you can see all your defined aliases with a simple git al.

Aliases are stored in git's global config file, which can be edited with

git config --global -e

Exercise: Create an alias ec for this command.

Stash

Sometimes you want to check out what someone else has just done while you are in the middle of editing yourself. Instead of saving your half-way edit into a dedicated commit, you can save it into the stash with

git stash

This saves your changes and resets your working directory to the previous commit. You can now pull/merge/rebase, or do further editing. Later on, you can reapply the changes that you had stashed away with

git stash pop

In case applying the stash fails due to conflicts, you need to resolve the conflicts. Afterwards you can delete the stash with

git stash drop

If the stash applies cleanly (no conflicts), git stash pop implicitly does git stash drop.
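The stash round trip can be followed in a scratch repository. This is a minimal sketch with invented file contents; the place where one would pull, merge or rebase is only marked by a comment.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git config user.name "First Last"
git config user.email "em@i.l"
printf '%s\n' 'Finished paragraph.' > paper.tex
git add paper.tex
git commit -q -m "Add paragraph"

# A half-way edit that is not ready for a commit.
printf '%s\n' 'Half-finished idea.' >> paper.tex

git stash > /dev/null
clean=$(git status --short)    # empty: the working directory is back at the last commit

# ... here one would pull, merge or rebase ...

git stash pop > /dev/null
git status --short             # paper.tex shows up as modified again
git stash list                 # empty: a clean pop has dropped the stash
```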

Hooks

You can automate steps in your workflow with so-called hooks. Hooks are arbitrary shell scripts located in .git/hooks/ that get executed every time a specific git command is called. Examples that include usage instructions should already be located in your .git/hooks/ directory.

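For example, you could install scripts .git/hooks/post-commit, .git/hooks/post-merge and .git/hooks/post-rewrite that automatically compile your latex document every time the repository gets updated by a commit, a pull/merge, or a rebase, respectively. (The hook post-update, in contrast, runs on the remote repository after a push.)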
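As a sketch of how such a hook is installed, the following example uses the post-commit hook, which git runs after every commit. To stay runnable without a latex installation, the hook here merely appends the commit message to a log file (.git/hook.log, an invented name); a real hook might instead call a build tool such as latexmk, which is an assumption and not something this tutorial prescribes.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git config user.name "First Last"
git config user.email "em@i.l"

# Install the hook: an executable script named after the event.
mkdir -p .git/hooks
cat > .git/hooks/post-commit <<'EOF'
#!/bin/sh
# Hypothetical placeholder for a compile step such as: latexmk -pdf paper.tex
git log -1 --pretty=format:'%s%n' >> .git/hook.log
EOF
chmod +x .git/hooks/post-commit

printf '%s\n' 'Some text.' > paper.tex
git add paper.tex
git commit -q -m "Add paper"

cat .git/hook.log    # the hook has recorded the commit
```

Hooks live inside .git and are therefore not shared through clone, pull or push; every collaborator installs them separately.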

Advanced topics

Git server

To set up your own git server to share repositories with others, try gitolite. Server requirements: Shell access, git, perl, openssh.