filter-repo: fix handling of topological changes

Due to pruning of empty commits, merge commits can become degenerate
(same commit serving as both parents, or one parent is an ancestor of
one of the others).  While we usually want to allow such degenerate
merge commits to themselves be pruned (assuming they add no additional
file changes), we do not want to prune them if the merge commit in the
original repository had the same degenerate topology.  So, we need to
keep track of the ancestry graph of the original repository as well and
include it in the logic about whether to allow merge commits to be
pruned.

Signed-off-by: Elijah Newren <newren@gmail.com>
pull/13/head
Elijah Newren 5 years ago
parent 49732e8b5f
commit 697b9256f5

@ -202,22 +202,22 @@ provide at least one of the last three traits as well:
the pruning of parents and other ancestors can ultimately result the pruning of parents and other ancestors can ultimately result
in the loss of one or more parents. If a merge commit loses in the loss of one or more parents. If a merge commit loses
enough parents to become a non-merge commit and it has no file enough parents to become a non-merge commit and it has no file
changes, then it too can be pruned. Topology changes are also changes, then it too can be pruned. Merge commits can also have
possible if the entire non-first-parent history is pruned away; a topology that becomes degenerate: it could end up with the
rather than having that parent of the merge be rewritten to the merge_base serving as both parents (if all intervening commits
merge base, it may (depending on whether the merge also had file from the original repo were pruned), or it could end up with one
changes of its own) instead make sense to just prune that parent. parent which is an ancestor of its other parent. In such cases,
(We do not want to prune away a first parent being rewritten to if the merge has no file changes of its own, then the merge
the merge base since some projects prefer --no-ff merges, though commit can also be pruned. However, if the merge commit was
this could be made an option.) Finally, note that we originally already degenerate in the original history, then it was probably
talked not about pruning empty commits, but about pruning commits intentional and the merge commit will not be pruned. Finally,
which become empty. Some projects intentionally create empty note that we originally talked about pruning commits which become
commits for versioning or publishing reasons, and these should empty, NOT about pruning empty commits. Some projects
not be removed. Instead, only commits which become empty should intentionally create empty commits for versioning or publishing
be pruned. (As a special case, commits which started empty but reasons, and these should not be removed. Instead, only commits
originally had a parent and which become a root commit due to the which become empty should be pruned. (As a special case, commits
pruning of other commits will also be considered to have "become which started empty but whose parent was pruned away will also be
empty".) considered to have "become empty".)
1. [Speed] Filtering should be reasonably fast 1. [Speed] Filtering should be reasonably fast

@ -806,6 +806,8 @@ class FastExportFilter(object):
# speak). The depth of a commit is one more than the max depth of any # speak). The depth of a commit is one more than the max depth of any
# of its ancestors. # of its ancestors.
self._graph = AncestryGraph() self._graph = AncestryGraph()
# Another one, for ancestry of commits in the original repo
self._orig_graph = AncestryGraph()
# A set of commit hash pairs (oldhash, newhash) which used to be merge # A set of commit hash pairs (oldhash, newhash) which used to be merge
# commits but due to filtering were turned into non-merge commits. # commits but due to filtering were turned into non-merge commits.
@ -1174,15 +1176,17 @@ class FastExportFilter(object):
# 'None') # 'None')
# Remove all parents rewritten to None, and keep track of which parents # Remove all parents rewritten to None, and keep track of which parents
# were rewritten to an ancestor. # were rewritten to an ancestor.
tmp = zip(parents, [x in _SKIPPED_COMMITS for x in orig_parents]) tmp = zip(parents,
orig_parents,
[x in _SKIPPED_COMMITS for x in orig_parents])
tmp2 = [x for x in tmp if x[0] is not None] tmp2 = [x for x in tmp if x[0] is not None]
parents, is_rewritten = [list(x) for x in zip(*tmp2)] if tmp2 else ([], []) if not tmp2:
# All ancestors have been pruned; we have no parents. Note that the
# However, the way fast-export/fast-import split parents into from_commit # way fast-export/fast-import split parents into from_commit and
# and merge_commits means we'd rather a parentless commit be represented # merge_commits means we'd rather a parentless commit be represented
# as a list containing a single None entry. # as a list containing a single None entry.
if not parents: return [None], None
parents.append(None) parents, orig_parents, is_rewritten = [list(x) for x in zip(*tmp2)]
# We can't have redundant parents if we don't have at least 2 parents # We can't have redundant parents if we don't have at least 2 parents
if len(parents) < 2: if len(parents) < 2:
@ -1197,9 +1201,9 @@ class FastExportFilter(object):
# Deleting duplicate rewritten parents means keeping parents if either # Deleting duplicate rewritten parents means keeping parents if either
# they have not been seen or they are ones that have not been rewritten. # they have not been seen or they are ones that have not been rewritten.
parents_copy = parents parents_copy = parents
pairs = [[p, is_rewritten[i]] for i, p in enumerate(parents) uniq = [[p, orig_parents[i], is_rewritten[i]] for i, p in enumerate(parents)
if not (p in seen or seen_add(p)) or not is_rewritten[i]] if not (p in seen or seen_add(p)) or not is_rewritten[i]]
parents, is_rewritten = [list(x) for x in zip(*pairs)] parents, orig_parents, is_rewritten = [list(x) for x in zip(*uniq)]
if len(parents) < 2: if len(parents) < 2:
return parents_copy, parents[0] return parents_copy, parents[0]
@ -1213,10 +1217,21 @@ class FastExportFilter(object):
if not is_rewritten[cur]: if not is_rewritten[cur]:
continue continue
for other in xrange(num_parents): for other in xrange(num_parents):
if cur != other and self._graph.is_ancestor(parents[cur], if cur == other:
parents[other]): continue
to_remove.append(cur) if not self._graph.is_ancestor(parents[cur], parents[other]):
break # cur removed, so skip rest of others -- i.e. check cur+=1 continue
# parents[cur] is an ancestor of parents[other], so parents[cur]
# seems redundant. However, if it was intentionally redundant
# (e.g. a no-ff merge) in the original, then we want to keep it.
if self._orig_graph.is_ancestor(orig_parents[cur],
orig_parents[other]):
continue
# Okay so the cur-th parent is an ancestor of the other-th parent,
# and it wasn't that way in the original repository; mark the
# cur-th parent as removable.
to_remove.append(cur)
break # cur removed, so skip rest of others -- i.e. check cur+=1
for x in reversed(to_remove): for x in reversed(to_remove):
parents.pop(x) parents.pop(x)
if len(parents) < 2: if len(parents) < 2:
@ -1237,8 +1252,6 @@ class FastExportFilter(object):
if len(orig_parents) < 2: if len(orig_parents) < 2:
# Special logic for commits that started empty... # Special logic for commits that started empty...
if not had_file_changes: if not had_file_changes:
if orig_parents == [None]:
orig_parents = []
had_parents_pruned = (len(parents) < len(orig_parents) or had_parents_pruned = (len(parents) < len(orig_parents) or
(len(orig_parents) == 1 and (len(orig_parents) == 1 and
orig_parents[0] in _SKIPPED_COMMITS)) orig_parents[0] in _SKIPPED_COMMITS))
@ -1376,6 +1389,8 @@ class FastExportFilter(object):
parents, new_1st_parent = self.trim_extra_parents(orig_parents, parents) parents, new_1st_parent = self.trim_extra_parents(orig_parents, parents)
from_commit = parents[0] from_commit = parents[0]
merge_commits = parents[1:] merge_commits = parents[1:]
if orig_parents == [None]:
orig_parents = []
# Get the list of file changes # Get the list of file changes
file_changes = [] file_changes = []
@ -1407,6 +1422,7 @@ class FastExportFilter(object):
# Record ancestry graph # Record ancestry graph
self._graph.add_commit_and_parents(commit.id, commit.get_parents()) self._graph.add_commit_and_parents(commit.id, commit.get_parents())
self._orig_graph.add_commit_and_parents(id_, orig_parents)
# Record the original list of file changes relative to first parent # Record the original list of file changes relative to first parent
orig_file_changes = set(commit.file_changes) orig_file_changes = set(commit.file_changes)

Loading…
Cancel
Save