March 19th, 2023

Suppose you have a code repository that you want to open source. You go ahead and add your license, contributor agreement, README, and any other files, but you also have a few changes that you know will remain internal. That is, you know that your open-sourced code will not be the authoritative copy, but a fork of your closed copy that you will have to track separately.

Now, git is supposed to help you with this, and you know you want a local branch as well as a second remote repository. But git is notoriously unintuitive at times, so here's the quick setup for this scenario.

We start out with a simple git repository with just a single remote and our main branch:

```
$ ls
Makefile README.md doc private src test
$ git branch
* main
$ git remote -v
origin  ssh://git@git.our-internal.example.com:/~jschauma/whatever.git (fetch)
origin  ssh://git@git.our-internal.example.com:/~jschauma/whatever.git (push)
$
```

Now let's create a new branch for open sourcing this code. This branch will become the authoritative open source upstream, so we want to remove the private data and add our open source files. But note: simply using `git rm` would leave the private file(s) in the git history (every earlier commit still contains them), and they would thus be leaked to the open source repository. Instead, we need to perform some more invasive git surgery using `git filter-branch`, which rewrites every commit on the branch as if the `private` directory had never existed. (Setting `FILTER_BRANCH_SQUELCH_WARNING` merely silences the warning git prints before running `filter-branch`.)

```
$ git checkout -b opensource
Switched to a new branch 'opensource'
$ export FILTER_BRANCH_SQUELCH_WARNING=1
$ git filter-branch --tree-filter 'rm -fr private' HEAD
Rewrite a98f9f39cba2be0268feee5082272405b433dc53 (1/1) (0 seconds passed, remaining 0 predicted)
Ref 'refs/heads/opensource' was rewritten
$ $EDITOR LICENSE CONTRIBUTING
$ git add LICENSE CONTRIBUTING
$ git commit -m 'ready for open source'
[opensource 611db4c] ready for open source
 2 files changed, 2 insertions(+)
 create mode 100644 CONTRIBUTING
 create mode 100644 LICENSE
$
```

With the branch cleaned up, the next step is to initialize the new remote repository.
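Before creating that remote, it doesn't hurt to double-check the surgery. The following is a minimal, self-contained sketch, assuming a POSIX shell and a git installation that still ships `git filter-branch`; the file names `private/credentials` and `src/main.c` are made up for illustration. It replays the steps above in a throwaway repository created with `mktemp -d`, then walks every commit reachable from `opensource` with `git rev-list` and `git ls-tree` to confirm that nothing under `private/` survived:

```shell
#!/bin/sh
# Sketch: replay the filter-branch step in a throwaway repository and
# verify that no commit on the rewritten branch still contains private/.
# File names and contents are hypothetical.
set -eu

demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

# A tiny internal history: one private file, one public file.
mkdir private src
echo "secret" > private/credentials
echo "int main(void) { return 0; }" > src/main.c
git add private src
git commit -q -m "initial import"

# The surgery from the post, on a separate branch.
git checkout -q -b opensource
export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch --tree-filter 'rm -fr private' HEAD >/dev/null 2>&1

# Walk every commit reachable from opensource and collect any path under
# private/. An empty result means the branch itself is clean.
leaks=$(git rev-list opensource | while read -r commit; do
    git ls-tree -r --name-only "$commit" | grep '^private/' || true
done)

if [ -z "$leaks" ]; then
    echo "opensource: no commit contains private/"
else
    printf 'opensource still contains:\n%s\n' "$leaks"
fi

# The private data is still in this repository, though: on main and in
# the backup ref that filter-branch keeps under refs/original/.
git ls-tree -r --name-only main | grep '^private/'
git for-each-ref --format='%(refname)' refs/original/
```

The last two commands are the important caveat: the untouched `main` branch and the `refs/original/` backup still hold the private files locally, which is why only the rewritten `opensource` branch should ever be pushed to the new remote.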
On GitHub, for example, you might create a new (empty) repository. This is the repository that we will push our open source branch into. We then add the new repository as a second remote and push our branch there as its main branch:

```
$ git branch
  main
* opensource
$ git remote add github git@github.com:jschauma/second-remote-example.git
$ git remote -v
github  git@github.com:jschauma/second-remote-example.git (fetch)
github  git@github.com:jschauma/second-remote-example.git (push)
origin  ssh://git@git.our-internal.example.com:/~jschauma/whatever.git (fetch)
origin  ssh://git@git.our-internal.example.com:/~jschauma/whatever.git (push)
$ git push --set-upstream github opensource:main   # "main" here is the remote "main"
Enumerating objects: 25, done.
Counting objects: 100% (25/25), done.
Compressing objects: 100% (20/20), done.
Writing objects: 100% (25/25), 9.32 KiB | 1.86 MiB/s, done.
Total 25 (delta 3), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (3/3), done.
To github.com:jschauma/second-remote-example.git
 * [new branch]      opensource -> main
branch 'opensource' set up to track 'github/main'.
$
```

And there you go, we're all set. Your main branch remains untouched, and you can sync changes from there into your opensource branch or pull in changes from others from GitHub. Be careful with the former, though: because `git filter-branch` rewrote the branch's history, a plain `git merge main` would either be refused (the rewritten histories may no longer share any commits) or bring `private` right back. Picking over individual commits with `git cherry-pick`, and only those that don't touch `private`, is the safer route.

On second thought, let's not go to Camelot...

Note that information in the git history and reflog leaks easily. In fact, the first draft of this blog post did just use `git rm` and thus leaked the private information. If you want to be on the safe side, then it's better to have two actually separate repositories:

```
$ ls
Makefile README.md doc private src test
$ git branch
* main
$ git remote -v
origin  ssh://git@git.our-internal.example.com:/~jschauma/whatever.git (fetch)
origin  ssh://git@git.our-internal.example.com:/~jschauma/whatever.git (push)
$ cd ..
$ git clone git@github.com:jschauma/second-remote-example.git
Cloning into 'second-remote-example'...
warning: You appear to have cloned an empty repository.
$ cd second-remote-example
$ cp -R ../whatever/* .
$ rm -fr private
$ $EDITOR LICENSE CONTRIBUTING
$ git add *
$ git commit -m 'open source version of whatever'
[main (root-commit) ce7145b] open source version of whatever
 13 files changed, 887 insertions(+)
[...]
$ git push
Enumerating objects: 19, done.
Counting objects: 100% (19/19), done.
Compressing objects: 100% (16/16), done.
Writing objects: 100% (19/19), 8.91 KiB | 2.23 MiB/s, done.
Total 19 (delta 1), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (1/1), done.
To github.com:jschauma/second-remote-example.git
 * [new branch]      main -> main
$
```

Keep in mind that the shell's `*` glob does not match names starting with a dot, so top-level files such as a `.gitignore` are neither copied nor added this way, while `cp -R` will happily copy any untracked build output lying around in the internal checkout.

The disadvantage is that now you have two separate repositories that you have to keep in sync, but on the upside, you can't (quite as) easily leak data from the internal repository to the outside world.

It just goes to show once more that when it comes to using git, it's never as easy as one might like...

This page mainly exists because I forget the order and syntax of the right git commands. :-)
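For the two-repository approach, one way to ease both the syncing chore and the copy pitfalls is to export only tracked files with `git archive` instead of using `cp -R`. The following is a rough sketch under a few assumptions: the internal branch is called `main`, the public checkout is an ordinary clone, and the helper name `sync_public` as well as the demo files are hypothetical. The helper clears everything except `.git` in the public checkout (so that files deleted internally disappear publicly too), unpacks the internal `main` branch there, removes `private`, and stages the result:

```shell
#!/bin/sh
# Sketch: refresh a separate public checkout from an internal repository
# using "git archive", so that only tracked files are exported.
# All paths, file names, and the sync_public helper are hypothetical.
set -eu

# sync_public INTERNAL_REPO PUBLIC_CHECKOUT
sync_public() {
    from=$1
    to=$2
    # Clear the public working tree (but not .git) so that files deleted
    # internally are deleted publicly as well.
    find "$to" -mindepth 1 -maxdepth 1 ! -name .git -exec rm -rf {} +
    # Export the tracked contents of the internal main branch, dotfiles
    # included; untracked and ignored files are never part of the archive.
    git -C "$from" archive --format=tar main | tar -xf - -C "$to"
    # Drop the internal-only directory before anything gets staged.
    rm -rf "$to/private"
    # Stage additions, modifications, and deletions for review.
    git -C "$to" add -A
}

# --- Demo in a throwaway directory ---------------------------------------
work=$(mktemp -d)
internal="$work/whatever"
public="$work/second-remote-example"

git init -q "$internal"
git -C "$internal" symbolic-ref HEAD refs/heads/main
git -C "$internal" config user.name "Example"
git -C "$internal" config user.email "example@example.com"
mkdir "$internal/private" "$internal/src"
echo "secret" > "$internal/private/credentials"
echo "int main(void) { return 0; }" > "$internal/src/main.c"
echo "*.o" > "$internal/.gitignore"
git -C "$internal" add .gitignore private src
git -C "$internal" commit -q -m "internal import"
touch "$internal/src/main.o"   # ignored build output, must not be published

# Stand-in for the (empty) clone of the public repository.
git init -q "$public"

sync_public "$internal" "$public"
git -C "$public" status --short
```

The helper deliberately stops at staging: reviewing `git status` (and `git diff --cached`) in the public checkout before committing and pushing is the last line of defense against publishing something that was meant to stay internal.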