PerforceToGitOrSubversionReplication

From LucidDB Wiki

Overview

This page describes how to replicate the depot from perforce.eigenbase.org to either git or Subversion repositories.

Benefits:

  • Anyone can browse the code and history of the main repositories in a web browser, or check them out with git or Subversion.
  • The repositories are hosted on GitHub, which provides a fallback during Eigenbase server downtime.
  • Contributing could become much easier as git usage grows.

Drawbacks:

  • Branch tracking is currently unsupported.
  • Not all Subversion clients are supported.

Cron Tasks

At the time of writing, only the master branch of our partitioned repository (equivalent to //open/dev), the master branch of each individual submodule, and Mondrian's master branch are actively synced with the Perforce sources. This is due to a limitation in the git-p4 script that may or may not be trivial to resolve. See http://issues.eigenbase.org/browse/UTL-14 for status updates, or if you would like to contribute a fix.

Current State

The easiest way to browse the source of our projects is through our master git super-repository for LucidDB-related projects (a non-partitioned version with several branches is also available), or through our Mondrian mirror.

If you want to check out our code using git or svn, it is usually easiest to clone or check out just the individual project you are interested in:

# Clone just LucidDB
git clone http://github.com/eigenbase/luciddb.git
# Or checkout with SVN:
svn checkout http://svn.github.com/eigenbase/luciddb.git ./luciddb

# Clone just Mondrian
git clone http://github.com/eigenbase/mondrian.git
# Or checkout with SVN:
svn checkout http://svn.github.com/eigenbase/mondrian.git ./mondrian

However, the eigenbase super-repository exists to facilitate checking out all projects (except Mondrian):

# First clone the super-repo
git clone http://github.com/eigenbase/eigenbase_partitioned.git
# Enter the new working copy before running submodule commands
cd eigenbase_partitioned
# Initialize the submodules
git submodule init
# Fetch and update (this will download all of them!)
git submodule update
# The last two steps can be shortened to just:
# git submodule update --init
# with git >= 1.7

If you don't want the partitioned version, simply omit "_partitioned" from the clone URL above and skip the submodule steps.

If you just want a single project from the super-repository, pass its path to the git submodule update command and git will fetch only that repository. Otherwise you will be downloading quite a few megabytes of data!
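
As a minimal sketch of fetching a single project, the helper below initializes and updates only the submodule paths passed to it. The function name fetch_submodules and the example path luciddb are assumptions for illustration; check .gitmodules in the super-repo for the actual submodule paths.

```shell
# Initialize and fetch only the named submodules of the current super-repo.
# Run this from the top level of the eigenbase_partitioned working copy.
fetch_submodules() {
    for sub in "$@"; do
        # "--" separates options from the submodule path
        git submodule update --init -- "$sub" || return 1
    done
}

# Usage (the path "luciddb" is assumed, not confirmed by this page):
#   fetch_submodules luciddb
```

Submodules that are not named stay uninitialized, so their directories remain empty until you request them.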

SVN support here is again lacking. Subversion does not check out submodules automatically, so the super-repository contains a shell script that checks them out once the super-repository itself has been checked out:

# First check out the super-repo
svn checkout http://svn.github.com/eigenbase/eigenbase_partitioned.git ./eigenbase_partitioned
# If you look into one of the submodule directories, you'll see that it is empty. Run the shell script to check them out.
cd eigenbase_partitioned
./SVN_get_submodules.sh

SVN users: If you get an error similar to "svn: REPORT of '/eigenbase/.../!svn/vcc/default': 200 OK (https://svn.github.com)", cd into the repository and try "svn resolved *; svn up" to complete the checkout. This is a GitHub problem with no reliable solution at present.
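
Because the checkout may need more than one attempt, a small retry loop can save some typing. This is only a sketch: retry and RETRY_DELAY are hypothetical names, not part of the repository, and the loop does not fix the underlying GitHub problem.

```shell
# Re-run a command until it succeeds, up to a maximum number of attempts.
# Usage: retry <max_attempts> <command> [args...]
retry() {
    max="$1"
    shift
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            return 1                     # give up after max attempts
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-5}"        # pause between attempts (seconds)
    done
}

# Usage, from inside the partially checked-out working copy:
#   retry 5 sh -c 'svn resolved *; svn up'
```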

Incremental replication runs via crontab every hour, so one hour is the maximum lag you should see between a change being checked into Perforce and its appearance on GitHub.
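
For illustration, an hourly schedule of this kind could be expressed as a crontab entry like the one printed below. The script path is a placeholder (the actual replication job is not shown on this page), and hourly_cron_entry is a hypothetical helper.

```shell
# Print a crontab entry that runs the given command at minute 0 of every hour.
hourly_cron_entry() {
    printf '0 * * * * %s\n' "$*"
}

# Placeholder path; substitute the real replication script.
hourly_cron_entry /path/to/sync_perforce_to_git.sh
# prints: 0 * * * * /path/to/sync_perforce_to_git.sh
```

The printed line can be added to a user's crontab with crontab -e.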

Tools

If you would like to see how we perform the replication, or if you want to set up a replica yourself (hosted on GitHub or elsewhere), please see: HowToReplicatePerforceToGit.

Why Submodule(s)?

While git can theoretically handle large repositories just fine (the Linux kernel uses it, for example), it is nevertheless good practice to separate large binary blobs, which generally don't need to be diffed and have long update cycles, from actively developed source code. Git is a source control manager, not a binary store!

Secondly, submodules future-proof the Eigenbase project for a move from Perforce to git as the dominant development SCM. We're using GitHub to host not just because GitHub is reliable and has good features, but because it is an incredible resource for collaborative development. However, free accounts do have (at the time of writing) a soft upper limit of 300 MB of storage, and we want people to be able to fork the main parts of Eigenbase into their own accounts to take advantage of GitHub's features as well.

Unfortunately, the presence of submodules complicates branching. An alternative model is to maintain the mega repository in its complete state (not split into submodules as it is currently), mirroring e.g. //open/dev with all its branches as git branches, and then to split it into separate, disconnected submodule repositories from there. This way, if someone wants to fork part of the project, they fork a submodule and hopefully we can integrate any changes upward. Users who don't want their own GitHub repository can always clone and keep it local.
