Pruning of commits which become empty can result in a variety of
topology changes: a merge may have lost all its ancestors corresponding
to one of (or more) of its parents, a merge may end up merging a commit
with itself, or a merge may end up merging a commit with its own
ancestor. Merging a commit with itself makes no sense, so we'd rather
prune down to one parent and hopefully prune the merge commit, but we do
need to worry about whether the are changes in the commit and whether
the original merge commit also merged something with itself. We have
similar cases for dealing with a merge of some commit with its own
ancestor: if the original topology did the same, or the merge commit has
additional file changes, then we cannot remove the commit. But,
otherwise, the commit can be pruned.
Add testcases covering the variety of changes that can occur to make
sure we get them right.
Signed-off-by: Elijah Newren <newren@gmail.com>
There are several cases to worry about with commit pruning; commits
that start empty and had no parent, commits that start empty and
had a parent which may or may not get pruned, commits which had
changes but became empty, commits which were merges but lost a line
of ancestry and have no changes of their own, etc. Add testcases
covering these cases, though most topology related ones will be
deferred to a later set of tests.
Signed-off-by: Elijah Newren <newren@gmail.com>
Many of the callback functions might only be a single line, and as such
instead of forcing the user to write a full blown program with an import
and everything, let them just specify the body of the callback function
as a command line parameter. Add several tests of this functionality as
well.
Signed-off-by: Elijah Newren <newren@gmail.com>
Python wants filenames with underscores instead of hyphens and with a
.py extension. We really want the main file named git-filter-repo, but
we can add a git_filter_repo.py symlink. Doing so dramatically
simplifies the steps needed to import it as a library in external python
scripts.
Signed-off-by: Elijah Newren <newren@gmail.com>
Most filtering operations are not interested in the time that commits
were authored or committed, or when tags were tagged. As such,
translating the string representation of the date into a datetime object
is wasted effort, and causes us to waste more time later as we have to
translate it back into a string.
Instead, provide string_to_date() and date_to_string() functions so that
callers can perform the translation if wanted, and let the normal case
be fast.
Provides a small but noticable speedup when just filtering based on
paths; about a 3.5% improvement in execution time for writing the new
history.
Signed-off-by: Elijah Newren <newren@gmail.com>
Make it easy for users to search and replace text throughout the
repository history. Instead of inventing some new syntax, reuse the
same syntax used by BFG repo filter's --replace-text option, namely,
a file with one expression per line of the form
[regex:|glob:|literal:]$MATCH_EXPR[==>$REPLACEMENT_EXPR]
Where "$MATCH_EXPR" is by default considered to be literal text, but
could be a regex or a glob if the appropriate prefix is used. Also,
$REPLACEMENT_EXPR defaults to '***REMOVED***' if not specified. If
you want a literal '==>' to be part of your $MATCH_EXPR, then you
must also manually specify a replacement expression instead of taking
the default. Some examples:
sup3rs3kr3t
(replaces 'sup3rs3kr3t' with '***REMOVED***')
HeWhoShallNotBeNamed==>Voldemort
(replaces 'HeWhoShallNotBeNamed' with 'Voldemort')
very==>
(replaces 'very' with the empty string)
regex:(\d{2})/(\d{2})/(\d{4})==>\2/\1/\3
(replaces '05/17/2012' with '17/05/2012', and vice-versa)
The format for regex is as from
re.sub(<pattern>, <repl>, <string>) from
https://docs.python.org/2/library/re.html
The <string> comes from file contents of the repo, and you specify
the <pattern> and <repl>.
glob:Copy*t==>Cartel
(replaces 'Copyright' or 'Copyleft' or 'Copy my st' with 'Cartel')
Signed-off-by: Elijah Newren <newren@gmail.com>