filter-repo: avoid merging a commit with one of its own ancestors

When empty commits are pruned, an entire line of history may become
empty and be pruned, leaving a merge commit that merges some commit
with one of its own ancestors.  In such a case, we should remove the
unnecessary parent(s); this will often leave the merge commit empty, in
which case we can prune it as well.
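
As a rough illustration of the parent pruning described above, here is a
minimal Python sketch.  It assumes a toy history stored as a plain dict
from commit name to parent list, and `is_ancestor` and `flatten_parents`
are hypothetical helpers, not filter-repo's API.  It drops duplicate
parents, then drops any parent that is reachable from another parent.

```python
from collections import OrderedDict

# Hypothetical toy history: each commit maps to its list of parents.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
}


def is_ancestor(graph, possible_ancestor, commit):
    """Naive depth-first search: is possible_ancestor reachable from commit?
    (A commit counts as its own ancestor.)"""
    stack, seen = [commit], set()
    while stack:
        cur = stack.pop()
        if cur == possible_ancestor:
            return True
        if cur in seen:
            continue
        seen.add(cur)
        stack.extend(graph[cur])
    return False


def flatten_parents(graph, parents):
    """Drop duplicate parents, then drop any parent that is an ancestor of
    another parent in the list."""
    parents = list(OrderedDict.fromkeys(parents))  # dedupe, keep order
    keep = []
    for i, cand in enumerate(parents):
        # any() stops at the first match, so each parent is judged once.
        redundant = any(i != j and is_ancestor(graph, cand, other)
                        for j, other in enumerate(parents))
        if not redundant:
            keep.append(cand)
    return keep


print(flatten_parents(PARENTS, ["A", "B"]))       # ['B']
print(flatten_parents(PARENTS, ["B", "B"]))       # ['B']
print(flatten_parents(PARENTS, ["B", "C"]))       # ['B', 'C']
print(flatten_parents(PARENTS, ["A", "B", "C"]))  # ['B', 'C']
```

In the last case `A` is an ancestor of both `B` and `C` but is dropped
only once.  When a single parent survives, the merge has become an
ordinary commit, which may then turn out to be empty.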

Currently, if the side that becomes empty is the first-parent side, we
do not detect that the merge commit becomes empty, because fast-export
lists the changes in a merge commit relative to its first parent only.
A subsequent commit will address this.
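
To make the first-parent limitation concrete, here is a simplified
Python model.  It assumes trees are plain dicts from path to contents
and that a commit's listed changes are just the paths that differ;
`changes_relative_to` is a hypothetical helper, not fast-export's actual
output format.  Suppose every commit on the first-parent side was
pruned, so the merge's first parent collapses to the merge base `A`,
which is an ancestor of the second parent `B` and is then dropped.

```python
def changes_relative_to(tree, parent_tree):
    """Paths whose contents differ between parent_tree and tree
    (a stand-in for the file changes listed for a commit)."""
    paths = set(tree) | set(parent_tree)
    return sorted(p for p in paths if tree.get(p) != parent_tree.get(p))


# Hypothetical trees: A is the merge base, B adds a file on the
# second-parent side, and the merge M ends up with exactly B's tree.
tree_A = {"README": "v1"}
tree_B = {"README": "v1", "feature.c": "int x;"}
tree_M = {"README": "v1", "feature.c": "int x;"}

# Changes listed relative to the first parent (A) look non-empty...
listed = changes_relative_to(tree_M, tree_A)
# ...but relative to the only surviving parent (B), M changes nothing.
actual = changes_relative_to(tree_M, tree_B)

print(listed)  # ['feature.c']
print(actual)  # []
```

So a check that only inspects the listed changes would keep `M`, even
though after flattening it is empty.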

Note that the callbacks could theoretically insert additional commits
or reparent our commit on top of something else, meaning that the
ancestry graph might need updating after the callback runs.  However, in
any extreme case where that mattered, the ancestry graph would also need
full updates for all the new commits created by the callback, and once
we expect the callback to handle those updates, it can handle updating
the graph for the current commit as well.  In practice it is hard to
come up with a case where this matters: for the most part we just want
to know whether our filtering causes commits to become empty, and the
ancestry of the source repo we are exporting from is sufficient for
that, without knowing about any new commits inserted or reparenting
done elsewhere.

Signed-off-by: Elijah Newren <newren@gmail.com>
pull/13/head
Elijah Newren 6 years ago
parent fa515c8d10
commit 70505e00f9

@@ -65,6 +65,65 @@ class FixedTimeZone(tzinfo):
  def dst(self, dt):
    return timedelta(0)

class AncestryGraph(object):
  """
  A class that maintains a directed acyclic graph of commits for the purpose of
  determining if one commit is the ancestor of another.
  """

  def __init__(self):
    self.cur_value = 0

    # A mapping from the external identifiers given to us to the simple integers
    # we use in self.graph
    self.value = {}

    # A mapping from each integer in self.value to a tuple of
    # (depth, list-of-parents); the parents are also integers from the
    # self.value dict.  The depth of a commit is one more than the max depth
    # of any of its parents.
    self.graph = {}

  def add_commit_and_parents(self, commit, parents):
    """
    Record in graph that commit has the given parents.  parents _MUST_ have
    been first recorded.  commit _MUST_ not have been recorded yet.
    """
    assert all(p in self.value for p in parents)
    assert commit not in self.value

    # Get values for commit and parents
    self.cur_value += 1
    self.value[commit] = self.cur_value
    graph_parents = [self.value[x] for x in parents]

    # Determine depth for commit, then insert the info into the graph
    depth = 1
    if parents:
      depth += max(self.graph[p][0] for p in graph_parents)
    self.graph[self.cur_value] = (depth, graph_parents)

  def is_ancestor(self, possible_ancestor, check):
    """
    Return whether possible_ancestor is an ancestor of check (a commit counts
    as its own ancestor)
    """
    a, b = self.value[possible_ancestor], self.value[check]
    a_depth = self.graph[a][0]
    ancestors = [b]
    visited = set()
    while ancestors:
      ancestor = ancestors.pop()
      if ancestor in visited:
        continue
      visited.add(ancestor)
      depth, more_ancestors = self.graph[ancestor]
      if ancestor == a:
        return True
      # Any descendant of a is deeper than a, so a commit no deeper than a
      # (other than a itself) cannot lead back to a; skip its parents.
      elif depth <= a_depth:
        continue
      ancestors.extend(more_ancestors)
    return False

class _IDs(object):
  """
  A class that maintains the 'name domain' of all the 'marks' (short int
@@ -579,6 +638,12 @@ class FastExportFilter(object):
    # to if the last (or even only) commit on that branch was pruned
    self._seen_refs = {}

    # An AncestryGraph of the commits seen so far, used to check whether one
    # commit is an ancestor of another.  Commits are identified by their id
    # (their 'mark' in fast-export or fast-import speak), and the depth of a
    # commit is one more than the max depth of any of its parents.
    self._graph = AncestryGraph()

    # A handle to the input source for the fast-export data
    self._input = None
@@ -801,7 +866,27 @@ class FastExportFilter(object):
      merge_ref = self._parse_optional_parent_ref('merge')
    was_merge = len(parents) > 1

    # Remove redundant parents (if both sides of history are empty commits,
    # the most recent ancestor on both sides may be the same commit).
    parents = collections.OrderedDict.fromkeys(parents).keys()

    # Flatten unnecessary merges.  (If one side of history is entirely
    # empty commits that were pruned, we may end up attempting to
    # merge a commit with its ancestor.  Remove parents that are an
    # ancestor of another parent.)
    num_original_parents = len(parents)
    if num_original_parents > 1:
      to_remove = []
      for cur in xrange(num_original_parents):
        for other in xrange(num_original_parents):
          if cur != other and self._graph.is_ancestor(parents[cur],
                                                      parents[other]):
            to_remove.append(cur)
            # Record each index at most once; popping a duplicate index
            # below would remove the wrong parent.
            break
      for x in reversed(to_remove):
        parents.pop(x)

    # Record our new parents after above pruning of parents representing
    # pruned empty histories
    from_commit = parents[0]
    merge_commits = parents[1:]
@@ -831,6 +916,9 @@ class FastExportFilter(object):
    commit.old_id = id_
    _IDS.record_rename(id_, commit.id)

    # Record ancestry graph
    self._graph.add_commit_and_parents(commit.id, commit.get_parents())

    # Call any user callback to allow them to modify the commit
    if self._commit_callback:
      self._commit_callback(commit)

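The depth bookkeeping in AncestryGraph is what lets is_ancestor stop
early: a descendant of a commit is always strictly deeper than it, so
the search never needs to look below the depth of the candidate
ancestor.  The following standalone Python 3 sketch uses a hypothetical
`DepthGraph` class that keys commits directly instead of mapping them to
integers, and counts how many commits each search visits with and
without that cut-off.

```python
class DepthGraph:
    """Hypothetical sketch of a depth-annotated commit graph."""

    def __init__(self):
        self.graph = {}  # commit -> (depth, list of parents)

    def add(self, commit, parents):
        # Parents must be recorded first, so commits arrive in topological
        # order (as in a fast-export stream).
        assert all(p in self.graph for p in parents)
        assert commit not in self.graph
        depth = 1 + max((self.graph[p][0] for p in parents), default=0)
        self.graph[commit] = (depth, list(parents))

    def is_ancestor(self, possible_ancestor, commit, prune=True):
        """Return (is possible_ancestor an ancestor of commit?, commits visited)."""
        a_depth = self.graph[possible_ancestor][0]
        stack, visited = [commit], set()
        while stack:
            cur = stack.pop()
            if cur in visited:
                continue
            visited.add(cur)
            depth, parents = self.graph[cur]
            if cur == possible_ancestor:
                return True, len(visited)
            # Only commits deeper than possible_ancestor can descend from it.
            if prune and depth <= a_depth:
                continue
            stack.extend(parents)
        return False, len(visited)


g = DepthGraph()
g.add(1, [])
for n in range(2, 1001):
    g.add(n, [n - 1])     # linear history: commit n has depth n
g.add("side", [998])      # a branch forked from commit 998 (depth 999)

print(g.is_ancestor("side", 1000))               # (False, 2)
print(g.is_ancestor("side", 1000, prune=False))  # (False, 1000)
print(g.is_ancestor(500, 1000))                  # (True, 501)
```

With the cut-off, the non-ancestor query stops after two commits instead
of walking the whole 1000-commit history; a genuine ancestor still costs
a walk down to its depth.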