The desire to format or lint code throughout history has arisen several times. It's more natural to do this in filter-branch since it somewhat forces people to run external commands, but we have an example contrib demo that shows how to run an external command on each file in history that I created even before any of these requests came in and yet I still periodically get requests about it. Make lint-history ever-so-slightly easier to apply to a subset of filenames, and include its usage as an extra cheat sheet comparison for filter-branch-vs-filter-repo commands. Signed-off-by: Elijah Newren <newren@gmail.com>
Cheat Sheet: Converting from filter-branch
This document is aimed at folks who are familiar with filter-branch and want to learn how to convert over to using filter-repo.
Table of Contents
- Half-hearted conversions
- Intention of "equivalent" commands
- Basic Differences
- Cheat Sheet: Conversion of Examples from the filter-branch manpage
- Cheat Sheet: Additional conversion examples
Half-hearted conversions
You can switch nearly any git filter-branch command to use filter-repo under the covers by just replacing the git filter-branch part of the command with filter-lamely. The git.git regression testsuite passes when I swap out the filter-branch script with filter-lamely, for example. (However, the filter-branch tests are not very comprehensive, so don't rely on that too much.)
Doing a half-hearted conversion has nearly all of the drawbacks of filter-branch and nearly none of the benefits of filter-repo, but it will make your command run a few times faster and makes for a very simple conversion.
You'll get a lot more performance, safety, and features by just switching to direct filter-repo commands.
Intention of "equivalent" commands
filter-branch and filter-repo have different defaults, as highlighted in the Basic Differences section below. As such, getting a command which behaves identically is not possible. Also, sometimes the filter-branch manpage lies, e.g. it says "suppose you want to...from all commits" and then uses a command line like "git filter-branch ... HEAD", which only operates on commits in the current branch rather than on all commits.
Rather than focusing on matching filter-branch output as exactly as possible, I treat the filter-branch examples as idiomatic ways to solve a certain type of problem with filter-branch, and express how one would idiomatically solve the same problem in filter-repo. Sometimes that means the results are not identical, but they are largely the same in each case.
Basic Differences
With git filter-branch, you have a git repository where every single commit (within the branches or revisions you specify) is checked out and then you run one or more shell commands to transform the working copy into your desired end state.
With git filter-repo, you are essentially given an editing tool to operate on the fast-export serialization of a repo. That means there is an input stream of all the contents of the repository, and rather than specifying filters in the form of commands to run, you usually employ a number of common pre-defined filters that provide various ways to slice, dice, or modify the repo based on its components (such as pathnames, file content, user names or emails, etc.). That makes common operations easier, even if it's not as versatile as shell callbacks. For cases where more complexity or special casing is needed, filter-repo provides python callbacks that can operate on the data structures populated from the fast-export stream to do just about anything you want.
filter-branch defaults to working on a subset of the repository, and requires you to specify a branch or branches, meaning you need to specify -- --all to modify all commits. filter-repo by contrast defaults to rewriting everything, and you need to specify --refs <rev-list-args> if you want to limit to just a certain set of branches or range of commits. (Though any <rev-list-args> that begin with a hyphen are not accepted by filter-repo as they look like the start of different options.)
filter-repo also takes care of additional concerns automatically, like rewriting commit messages that reference old commit IDs to instead reference the rewritten commit IDs, pruning commits which do not start empty but become empty due to the specified filters, and automatically shrinking and gc'ing the repo at the end of the filtering operation.
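The commit-ID rewriting mentioned above can be pictured with a small Python sketch. This is an illustration of the idea only, not filter-repo's actual implementation: the `rewrite_commit_ids` helper, the `old_to_new` mapping, and the hashes are all made up for the example, and the sketch assumes 40-character hex IDs that may appear abbreviated to at least 7 characters.

```python
import re

# Hypothetical mapping from old commit IDs to rewritten ones (made-up hashes).
old_to_new = {
    b"1111111111111111111111111111111111111111":
        b"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
}

# Any run of 7 to 40 lowercase hex digits is a candidate commit reference.
HEX_RUN = re.compile(rb"\b[0-9a-f]{7,40}\b")

def rewrite_commit_ids(message, mapping):
    """Replace references to old commit IDs with the corresponding new IDs."""
    def replace(match):
        ref = match.group(0)
        # Only rewrite when the (possibly abbreviated) ID is unambiguous.
        candidates = [old for old in mapping if old.startswith(ref)]
        if len(candidates) != 1:
            return ref
        # Keep the abbreviation length the author originally used.
        return mapping[candidates[0]][:len(ref)]
    return HEX_RUN.sub(replace, message)

print(rewrite_commit_ids(b"Revert the change from 1111111\n", old_to_new))
```

Hex runs that do not correspond to a known old commit are left untouched, which is why ordinary words and unrelated numbers in a message survive the substitution.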
Cheat Sheet: Conversion of Examples from the filter-branch manpage
Removing a file
The filter-branch manual provided three different examples of removing a single file, based on different levels of ease vs. carefulness and performance:
git filter-branch --tree-filter 'rm filename' HEAD
git filter-branch --tree-filter 'rm -f filename' HEAD
git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD
All of these just become
git filter-repo --invert-paths --path filename
Extracting a subdirectory
Extracting a subdirectory via
git filter-branch --subdirectory-filter foodir -- --all
is one of the easiest commands to convert; it just becomes
git filter-repo --subdirectory-filter foodir
Moving the whole tree into a subdirectory
Keeping all files but placing them in a new subdirectory via
git filter-branch --index-filter \
    'git ls-files -s | sed "s-\t\"*-&newsubdir/-" |
        GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
            git update-index --index-info &&
     mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD
(which happens to be GNU-specific and will fail with BSD userland in very subtle ways) becomes
git filter-repo --to-subdirectory-filter newsubdir
(which works fine regardless of GNU vs BSD userland differences.)
Re-grafting history
The filter-branch manual provided one example with three different commands that could be used to achieve it, though the first of them had limited applicability (only when the repo had a single initial commit). These three examples were:
git filter-branch --parent-filter 'sed "s/^\$/-p <graft-id>/"' HEAD

git filter-branch --parent-filter \
    'test $GIT_COMMIT = <commit-id> && echo "-p <graft-id>" || cat' HEAD

git replace --graft $commit-id $graft-id
git filter-branch $graft-id..HEAD
git-replace did not exist when the original two examples were written, but it is clear that the last example is far easier to understand. As such, filter-repo just uses the same mechanism:
git replace --graft $commit-id $graft-id
git filter-repo --force
NOTE: --force should usually be avoided unless you have taken care to make sure you have a backup of your repo (or are running on a fresh clone of it). It is needed in this case for two reasons: filter-repo errors out when no arguments are specified, and it usually first checks whether you are in a fresh clone before irrecoverably rewriting your repository (git-replace created a new graft and thus added something to your previously fresh clone).
Removing commits by a certain author
WARNING: This is a BAD example for BOTH filter-branch and filter-repo. It does not remove the changes the user made from the repo, it just removes the commit in question while smashing the changes from it into any subsequent commits as though the subsequent authors had been responsible for those changes as well. git rebase is likely to be a better fit for what you really want if you are looking at this example. (See also this explanation of the differences between rebase and filter-repo.)
This filter-branch example
git filter-branch --commit-filter '
    if [ "$GIT_AUTHOR_NAME" = "Darl McBribe" ];
    then
        skip_commit "$@";
    else
        git commit-tree "$@";
    fi' HEAD
becomes
git filter-repo --commit-callback '
  if commit.author_name == b"Darl McBribe":
    commit.skip()
  '
Rewriting commit messages -- removing text
Removing git-svn-id: lines from commit messages via
git filter-branch --msg-filter '
    sed -e "/^git-svn-id:/d"
    '
becomes
git filter-repo --message-callback '
  return re.sub(b"^git-svn-id:.*\n", b"", message, flags=re.MULTILINE)
  '
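To see what that callback body does to a message, here is a small standalone sketch that runs the same expression outside of filter-repo. The `strip_git_svn_id` wrapper name and the sample message (including its URL and UUID) are invented for illustration.

```python
import re

def strip_git_svn_id(message):
    # Same expression as the --message-callback body above.
    return re.sub(b"^git-svn-id:.*\n", b"", message, flags=re.MULTILINE)

# Made-up commit message in the shape git-svn produces.
sample = (
    b"Fix the frobnicator\n"
    b"\n"
    b"git-svn-id: https://svn.example.com/repo/trunk@1234 0000-fake-uuid\n"
)

print(strip_git_svn_id(sample))
```

As with the sed version, only the git-svn-id: line itself is deleted, so the blank line that preceded it stays in the message. Note also that the pattern requires a trailing newline, so a git-svn-id: line at the very end of a message with no final newline would not be removed.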
Rewriting commit messages -- adding text
Adding Acked-by lines to the last ten commits via
git filter-branch --msg-filter '
    cat &&
    echo "Acked-by: Bugs Bunny <bunny@bugzilla.org>"
    ' master~10..master
becomes
git filter-repo --message-callback '
  return message + b"Acked-by: Bugs Bunny <bunny@bugzilla.org>\n"
  ' --refs master~10..master
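The callback body is plain bytes concatenation, which the following standalone sketch makes explicit. The function names are invented here, and `add_acked_by_defensively` is a hypothetical variant (not something the filter-branch or filter-repo manuals suggest) for messages that might lack a final newline.

```python
TRAILER = b"Acked-by: Bugs Bunny <bunny@bugzilla.org>\n"

def add_acked_by(message):
    # Same expression as the --message-callback body above.
    return message + TRAILER

def add_acked_by_defensively(message):
    # Hypothetical variant: make sure the trailer starts on its own line
    # even if the existing message does not end with a newline.
    if message and not message.endswith(b"\n"):
        message += b"\n"
    return message + TRAILER

print(add_acked_by(b"Teach rabbits to file bugs\n"))
print(add_acked_by_defensively(b"Teach rabbits to file bugs"))
```

Both calls print the same result; the two functions only differ when the incoming message has no trailing newline.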
Changing author/committer(/tagger?) information
git filter-branch --env-filter '
    if test "$GIT_AUTHOR_EMAIL" = "root@localhost"
    then
        GIT_AUTHOR_EMAIL=john@example.com
    fi
    if test "$GIT_COMMITTER_EMAIL" = "root@localhost"
    then
        GIT_COMMITTER_EMAIL=john@example.com
    fi
    ' -- --all
becomes either
# Ensure '<john@example.com> <root@localhost>' is a line in .mailmap, then:
git filter-repo --use-mailmap
or
git filter-repo --email-callback '
  return email if email != b"root@localhost" else b"john@example.com"
  '
(As a bonus, both filter-repo alternatives will fix tagger emails too, unlike the filter-branch example.)
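The email callback is just a bytes-to-bytes function, so it can be tried out on its own. In this sketch `fix_email` wraps the exact expression from the callback above, while `fix_email_from_table` and its `REPLACEMENTS` dictionary are a hypothetical extension for the case where several addresses need mapping; the second address in the table is made up.

```python
def fix_email(email):
    # Same expression as the --email-callback body above.
    return email if email != b"root@localhost" else b"john@example.com"

# Hypothetical lookup table for rewriting more than one address.
REPLACEMENTS = {
    b"root@localhost": b"john@example.com",
    b"jdoe@oldhost.invalid": b"john@example.com",  # made-up address
}

def fix_email_from_table(email):
    # Unknown addresses pass through unchanged.
    return REPLACEMENTS.get(email, email)

print(fix_email(b"root@localhost"))
print(fix_email_from_table(b"someone@example.org"))
```

Note that the comparison values are bytes (the `b"..."` prefix); comparing against a plain `str` would never match.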
Restricting to a range
The partial examples
git filter-branch ... C..H
git filter-branch ... C..H ^D
git filter-branch ... D..H ^C
become
git filter-repo ... --refs C..H
git filter-repo ... --refs C..H ^D
git filter-repo ... --refs D..H ^C
Note that filter-branch accepts --not among the revision specifiers, but filter-repo's Python argument parsing treats that as a flag name, which breaks parsing. So, instead of e.g. --not C as we might use with filter-branch, we can specify ^C to filter-repo.
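If you have existing scripts that build revision lists with --not, the translation to the ^ form can be done mechanically. The helper below is a hypothetical sketch (it is not part of filter-repo), and it assumes only plain revision names, not ranges such as A..B, appear after --not. It follows git rev-list semantics, where --not flips the meaning of the ^ prefix for every following revision until the next --not.

```python
def translate_not(rev_args):
    """Rewrite rev-list arguments so that no literal --not remains."""
    negated = False
    result = []
    for arg in rev_args:
        if arg == "--not":
            # Each --not toggles negation for the revisions that follow.
            negated = not negated
        elif not negated:
            result.append(arg)
        elif arg.startswith("^"):
            # A negated ^rev becomes a positive rev.
            result.append(arg[1:])
        else:
            result.append("^" + arg)
    return result

print(translate_not(["H", "--not", "C", "D"]))  # ['H', '^C', '^D']
```

The resulting list can then be passed after --refs.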
Cheat Sheet: Additional conversion examples
Running a code formatter or linter on each file with some extension
Running some program on a subset of files is relatively natural in filter-branch:
git filter-branch --tree-filter '
    git ls-files -z "*.c" \
        | xargs -0 -n 1 clang-format -style=file -i
    '
filter-repo decided not to provide a way to run an external program to do filtering, because most filter-branch uses of this ability are riddled with safety problems and performance issues. However, in special cases like this it's fairly safe. One can write a script that uses filter-repo as a library to achieve this, while also gaining filter-repo's automatic handling of other concerns like rewriting commit IDs in commit messages or pruning commits that become empty. In fact, one of the contrib demos, lint-history, handles this exact type of situation already:
lint-history --relevant 'return filename.endswith(b".c")' \
clang-format -style=file -i
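The --relevant argument is the body of a Python function that receives the filename as bytes and returns whether the file should be linted. The sketch below exercises that predicate on its own; `is_relevant` wraps the exact body shown above, while `is_c_source_or_header` is a hypothetical variant showing how the same check could cover more than one extension.

```python
def is_relevant(filename):
    # Same body as the --relevant argument above.
    return filename.endswith(b".c")

def is_c_source_or_header(filename):
    # Hypothetical variant: bytes.endswith also accepts a tuple of suffixes.
    return filename.endswith((b".c", b".h"))

for name in [b"src/main.c", b"include/main.h", b"README.md"]:
    print(name, is_relevant(name), is_c_source_or_header(name))
```

Filenames are bytes rather than str, so the suffixes must carry the `b` prefix; passing a str suffix to `bytes.endswith` raises a TypeError.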