fast-import gained a new raw-permissive date format explicitly for
allowing people to import repositories as-is. Make use of the flag, and
stop rewriting the bogus timezone found in rails.git.
If users do not like these bogus times, they can of course write a
filter to fix them (or even make them bogus in a different way). For
example:
  git filter-repo ... --commit-callback '
    if commit.author_date.endswith(b"+051800"):
      commit.author_date = commit.author_date.replace(b"+051800", b"+0261")
    '
Signed-off-by: Elijah Newren <newren@gmail.com>
Some projects have a strict --no-ff merging policy. With the default
behavior of --prune-degenerate, we can prune merge commits in a way that
transforms the history into a fast-forward merge. Consider this
example:
* There are two independent commits or branches, named B & C, which
are both built on top of A so that history looks like this diagram:
      A
       \ \
        \ B
         \
          -C
* Someone runs the following sequence of commands:
* git checkout A
* git merge --no-ff B
* git merge --no-ff C
* This will result in a history that looks like:
      A---AB---AC
       \ \ /   /
        \ B   /
         \   /
          -C-
* Later, someone comes along and runs filter-repo, specifying to
remove the only path(s) that were modified by B. That would
naturally remove commit B and the no-longer-necessary merge
commit AB. For someone using a strict no-ff policy, the desired
history is
      A---AC
       \ /
        C
However, the default handling for --prune-degenerate would
notice that AC merely merges C into its own ancestor A, whereas
the original AC merged C into something separate (namely, AB).
So, it would say that AC has become degenerate and prune it,
leaving the simple history of
      A
       \
        C
For projects not using a strict no-ff policy, this simpler history
is probably better, but for folks that want a strict no-ff policy,
it is unfortunate.
Provide a --no-ff option to tweak the --prune-degenerate behavior so
that it ignores the first parent being an ancestor of another parent
(leaving the first parent unpruned even if it is or becomes degenerate
in this fashion).
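The parent-trimming rule described above can be sketched roughly as
follows. This is only an illustration, not filter-repo's actual code:
the function name trim_parents and the is_ancestor helper are
hypothetical, and the sketch ignores the check for whether the original
merge was already degenerate.

```python
def trim_parents(parents, is_ancestor, no_ff=False):
    """Drop parents that pruning has made redundant.

    parents: commit names, first parent first.
    is_ancestor(a, b): hypothetical helper, True if a is an ancestor of b.
    """
    # Merging a commit with itself is meaningless: keep first occurrence only.
    unique = list(dict.fromkeys(parents))
    kept = []
    for idx, parent in enumerate(unique):
        redundant = any(
            other != parent and is_ancestor(parent, other) for other in unique
        )
        # With no_ff, the first parent is kept even if it is redundant.
        if redundant and not (no_ff and idx == 0):
            continue
        kept.append(parent)
    return kept


# The example from this message: after B and AB are removed, AC merges
# C into its own ancestor A.
ancestry = {("A", "C")}
is_anc = lambda a, b: (a, b) in ancestry
default_parents = trim_parents(["A", "C"], is_anc)             # ["C"]
no_ff_parents = trim_parents(["A", "C"], is_anc, no_ff=True)   # ["A", "C"]
```

In the default case AC is left with a single parent, so it is no longer
a merge and becomes a candidate for pruning; with no_ff it remains a
two-parent merge.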
Signed-off-by: Elijah Newren <newren@gmail.com>
The mailmap format parsed by the "git shortlog" command allows for
matching mailmap entries with no email address. This is admittedly an
edge case, because most Git commits will have an email address
associated with them as well as a name, but technically the address
isn't required, and "git shortlog" accommodates that in its mailmap
format. This commit teaches git-filter-repo to do the same thing.
Signed-off-by: Karl Lenz <xorangekiller@gmail.com>
We have a good default for pruning of empty commits and degenerate merge
commits: only pruning such commits that didn't start out that way (i.e.
that couldn't intentionally have been empty or degenerate). However,
users may have reasons to want to aggressively prune such commits (maybe
they used BFG repo filter or filter-branch previously and have lots of
cruft commits that they want removed), and we may as well allow them to
specify that they don't want pruning too, just to be flexible.
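As a rough sketch of the three behaviors described above (the policy
names "always", "auto", and "never" and the function name are
assumptions made for illustration, and the real logic has more cases):

```python
def should_prune(policy, started_empty, is_empty_now):
    """Decide whether an empty commit should be dropped.

    policy: "always", "auto" (the default described above) or "never";
    these names are assumed here for illustration.
    """
    if not is_empty_now:
        return False            # commits with changes are always kept
    if policy == "always":
        return True             # aggressive: drop every empty commit
    if policy == "never":
        return False            # keep even commits that became empty
    # "auto": only prune commits that did not start out empty
    return not started_empty
```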
Signed-off-by: Elijah Newren <newren@gmail.com>
fast-import syntax declares how to specify the parents of a commit with
'from' and possibly 'merge' directives, but it oddly also allows parents
to be implicitly specified via branch name. The documentation is easy
to misread:
"Omitting the from command in the first commit of a new branch will
cause fast-import to create that commit with no ancestor."
Note that the "in the first commit of a new branch" is key here. It is
reinforced later in the document with:
"Omitting the from command on existing branches is usually desired, as
the current commit on that branch is automatically assumed to be the
first ancestor of the new commit."
Desirability of operating this way aside, this raises an interesting
question: what if you only have one branch in some repository, but that
branch has more than one root commit? How does one use the fast-import
format to import such a repository? The fast-import documentation
doesn't say, as far as I can tell, but using a 'reset' directive
without providing a 'from' reference for it is the way to go.
Modify filter-repo to understand implicit 'from' commits, and to
appropriately issue 'reset' directives when we need additional root
commits.
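To illustrate, here is a minimal sketch of a generator that writes a
fast-import stream with two root commits on one branch. The function
name, branch name, and committer identity are made up for the example;
only the 'reset', 'commit', 'mark', 'committer', 'data', and 'from'
directives come from the fast-import format.

```python
def commit_block(ref, mark, message, parent_mark=None, new_root=False):
    """Return the fast-import text for one commit with no file changes."""
    lines = []
    if new_root:
        # A bare 'reset' (no 'from') makes the next commit on ref a root,
        # instead of implicitly using the current branch tip as parent.
        lines.append(f"reset {ref}")
    lines.append(f"commit {ref}")
    lines.append(f"mark :{mark}")
    lines.append("committer A U Thor <author@example.com> 1234567890 +0000")
    lines.append(f"data {len(message.encode())}")
    lines.append(message)
    if parent_mark is not None:
        lines.append(f"from :{parent_mark}")
    return "\n".join(lines) + "\n"


ref = "refs/heads/main"
stream = (
    commit_block(ref, 1, "first root", new_root=True)
    + commit_block(ref, 2, "second root", new_root=True)
)
```

Without the second 'reset', the commit marked :2 would implicitly get
:1 as its parent, since both are on the same branch.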
Signed-off-by: Elijah Newren <newren@gmail.com>
There are a number of things not present in "normal" imports that we
nevertheless support and need to be tested:
* broken timezone adjustment (+051800->+0261; observed in the wild
in real repos, and adjustment prevents fast-import from dying)
* commits missing an author (observed in the wild in a real repo;
just sets author to committer)
* optional additional linefeeds in the input allowed by
git-fast-import but usually not written by git-fast-export
* progress and checkpoint objects
* progress, checkpoint, and 'everything' callbacks
Signed-off-by: Elijah Newren <newren@gmail.com>
Pruning of commits which become empty can result in a variety of
topology changes: a merge may have lost all its ancestors corresponding
to one (or more) of its parents, a merge may end up merging a commit
with itself, or a merge may end up merging a commit with its own
ancestor. Merging a commit with itself makes no sense, so we'd rather
prune down to one parent and hopefully prune the merge commit, but we do
need to worry about whether there are changes in the commit and whether
the original merge commit also merged something with itself. We have
similar cases for dealing with a merge of some commit with its own
ancestor: if the original topology did the same, or the merge commit has
additional file changes, then we cannot remove the commit. But,
otherwise, the commit can be pruned.
Add testcases covering the variety of changes that can occur to make
sure we get them right.
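The decision described above boils down to something like the following
sketch. The function and parameter names are hypothetical; this is not
the actual implementation.

```python
def can_prune_merge(now_degenerate, originally_degenerate, has_file_changes):
    """Decide whether a rewritten merge commit may be dropped.

    "Degenerate" here means the merge joins a commit with itself or
    with its own ancestor.
    """
    if not now_degenerate:
        return False    # still a real merge of separate histories
    if originally_degenerate:
        return False    # original had the same shape, possibly on purpose
    if has_file_changes:
        return False    # the merge commit carries changes of its own
    return True
```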
Signed-off-by: Elijah Newren <newren@gmail.com>
There are several cases to worry about with commit pruning: commits
that start empty and had no parent, commits that start empty and
had a parent which may or may not get pruned, commits which had
changes but became empty, commits which were merges but lost a line
of ancestry and have no changes of their own, etc. Add testcases
covering these cases, though most topology related ones will be
deferred to a later set of tests.
Signed-off-by: Elijah Newren <newren@gmail.com>
Make it easy for users to search and replace text throughout the
repository history. Instead of inventing some new syntax, reuse the
same syntax used by BFG repo filter's --replace-text option, namely,
a file with one expression per line of the form
[regex:|glob:|literal:]$MATCH_EXPR[==>$REPLACEMENT_EXPR]
Where "$MATCH_EXPR" is by default considered to be literal text, but
could be a regex or a glob if the appropriate prefix is used. Also,
$REPLACEMENT_EXPR defaults to '***REMOVED***' if not specified. If
you want a literal '==>' to be part of your $MATCH_EXPR, then you
must also manually specify a replacement expression instead of taking
the default. Some examples:
sup3rs3kr3t
(replaces 'sup3rs3kr3t' with '***REMOVED***')
HeWhoShallNotBeNamed==>Voldemort
(replaces 'HeWhoShallNotBeNamed' with 'Voldemort')
very==>
(replaces 'very' with the empty string)
regex:(\d{2})/(\d{2})/(\d{4})==>\2/\1/\3
(replaces '05/17/2012' with '17/05/2012', and vice-versa)
The regex syntax is that of Python's
re.sub(<pattern>, <repl>, <string>); see
https://docs.python.org/2/library/re.html
You specify the <pattern> and <repl>; the <string> comes from the
file contents of the repo.
glob:Copy*t==>Cartel
(replaces 'Copyright' or 'Copyleft' or 'Copy my st' with 'Cartel')
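For illustration, the format above could be parsed and applied along
these lines. This is a simplified sketch that works on str rather than
on the raw bytes of repository files; parse_rule and apply_rule are
made-up names, and the glob translation is an assumption rather than
the actual implementation.

```python
import re

DEFAULT_REPLACEMENT = "***REMOVED***"


def parse_rule(line):
    """Return (kind, match_expr, replacement) for one expression line."""
    # Split on the last '==>' so that an earlier '==>' can be part of
    # the match expression when a replacement is given explicitly.
    match_expr, sep, replacement = line.rpartition("==>")
    if not sep:  # no '==>' at all: the whole line is the match expression
        match_expr, replacement = line, DEFAULT_REPLACEMENT
    kind = "literal"
    for prefix in ("regex", "glob", "literal"):
        if match_expr.startswith(prefix + ":"):
            kind, match_expr = prefix, match_expr[len(prefix) + 1:]
            break
    return kind, match_expr, replacement


def apply_rule(rule, text):
    """Apply one parsed rule to a string."""
    kind, match_expr, replacement = rule
    if kind == "literal":
        return text.replace(match_expr, replacement)
    if kind == "glob":
        # Assumed translation: '*' matches any run of characters,
        # '?' matches a single character.
        pattern = re.escape(match_expr).replace(r"\*", ".*").replace(r"\?", ".")
        return re.sub(pattern, lambda m: replacement, text)
    return re.sub(match_expr, replacement, text)  # kind == "regex"


swapped = apply_rule(
    parse_rule(r"regex:(\d{2})/(\d{2})/(\d{4})==>\2/\1/\3"), "05/17/2012"
)  # "17/05/2012"
```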
Signed-off-by: Elijah Newren <newren@gmail.com>