git-filter-repo

mirror of https://github.com/newren/git-filter-repo.git synced 2024-11-19 03:25:33 +00:00

Author	SHA1	Message	Date
Elijah Newren	cefeef1c0a	filter-repo: use new --date-format=raw-permissive fast-import option fast-import gained a new raw-permissive date format explicitly for allowing people to import repositories as-is. Make use of the flag, and stop rewriting the bogus timezone found in rails.git. If users do not like these bogus times, they can of course write a filter to fix them (or even make them bogus in a different way). For example: git filter-repo ... --commit-callback ' if commit.author_date.endswith(b"+051800"): commit.author_date = commit.author_date.replace(b"+051800", b"+0261") ' Signed-off-by: Elijah Newren <newren@gmail.com>	2020-07-07 09:38:34 -06:00
Elijah Newren	5e04dff097	filter-repo: add new --no-ff option Some projects have a strict --no-ff merging policy. With the default behavior of --prune-degenerate, we can prune merge commits in a way that transforms the history into a fast-forward merge. Consider this example: * There are two independent commits or branches, named B & C, which are both built on top of A so that history looks like this diagram: A \ \ \ B \ -C * Someone runs the following sequence of commands: * git checkout A * git merge --no-ff B * git merge --no-ff C * This will result in a history that looks like: A---AB---AC \ \ / / \ B / \ / -C- * Later, someone comes along and runs filter-repo, specifying to remove the only path(s) that were modified by B. That would naturally remove commit B and the no-longer-necessary merge commit AB. For someone using a strict no-ff policy, the desired history is A---AC \ / C However, the default handling for --prune-degenerate would notice that AC merely merges C into its own ancestor A, whereas the original AC merged C into something separate (namely, AB). So, it would say that AC has become degenerate and prune it, leaving the simple history of A \ C For projects not using a strict no-ff policy, this simpler history is probably better, but for folks that want a strict no-ff policy, it is unfortunate. Provide a --no-ff option to tweak the --prune-degenerate behavior so that it ignores the first parent being an ancestor of another parent (leaving the first parent unpruned even if it is or becomes degenerate in this fashion). Signed-off-by: Elijah Newren <newren@gmail.com>	2020-01-01 10:49:56 -08:00
Karl Lenz	780c74b218	filter-repo: parse mailmap entries with no email address The mailmap format parsed by the "git shortlog" command allows for matching mailmap entries with no email address. This is admittedly an edge case, because most Git commits will have an email address associated with them as well as a name, but technically the address isn't required, and "git shortlog" accommodates that in its mailmap format. This commit teaches git-filter-repo to do the same thing. Signed-off-by: Karl Lenz <xorangekiller@gmail.com>	2019-12-27 09:25:25 -05:00
Elijah Newren	2c8f763426	filter-repo: allow users to adjust pruning of empty & degenerate commits We have a good default for pruning of empty commits and degenerate merge commits: only pruning such commits that didn't start out that way (i.e. that couldn't intentionally have been empty or degenerate). However, users may have reasons to want to aggressively prune such commits (maybe they used BFG repo filter or filter-branch previously and have lots of cruft commits that they want removed), and we may as well allow them to specify that they don't want pruning too, just to be flexible. Signed-off-by: Elijah Newren <newren@gmail.com>	2019-05-11 13:01:19 -07:00
Elijah Newren	6a6d21aff5	filter-repo: handle implicit parents fast-import syntax declares how to specify the parents of a commit with 'from' and possibly 'merge' directives, but it oddly also allows parents to be implicitly specified via branch name. The documentation is easy to misread: "Omitting the from command in the first commit of a new branch will cause fast-import to create that commit with no ancestor." Note that the "in the first commit of a new branch" is key here. It is reinforced later in the document with: "Omitting the from command on existing branches is usually desired, as the current commit on that branch is automatically assumed to be the first ancestor of the new commit." Desirability of operating this way aside, this raises an interesting question: what if you only have one branch in some repository, but that branch has more than one root commit? How does one use the fast-import format to import such a repository? The fast-import documentation doesn't state as far as I can tell, but using a 'reset' directive without providing a 'from' reference for it is the way to go. Modify filter-repo to understand implicit 'from' commits, and to appropriately issue 'reset' directives when we need additional root commits. Signed-off-by: Elijah Newren <newren@gmail.com>	2019-05-11 13:00:32 -07:00
Elijah Newren	e913ccbe8d	filter-repo: add coverage for some corner cases and unusual constructs There are a number of things not present in "normal" imports that we nevertheless support and need to be tested: * broken timezone adjustment (+051800->+0261; observed in the wild in real repos, and adjustment prevents fast-import from dying) * commits missing an author (observed in the wild in a real repo; just sets author to committer) * optional additional linefeeds in the input allowed by git-fast-import but usually not written by git-fast-export * progress and checkpoint objects * progress, checkpoint, and 'everything' callbacks Signed-off-by: Elijah Newren <newren@gmail.com>	2019-04-29 09:56:38 -07:00
Elijah Newren	5ba62ba4e8	filter-repo: add testcases dealing with topology changes Pruning of commits which become empty can result in a variety of topology changes: a merge may have lost all its ancestors corresponding to one (or more) of its parents, a merge may end up merging a commit with itself, or a merge may end up merging a commit with its own ancestor. Merging a commit with itself makes no sense, so we'd rather prune down to one parent and hopefully prune the merge commit, but we do need to worry about whether there are changes in the commit and whether the original merge commit also merged something with itself. We have similar cases for dealing with a merge of some commit with its own ancestor: if the original topology did the same, or the merge commit has additional file changes, then we cannot remove the commit. But, otherwise, the commit can be pruned. Add testcases covering the variety of changes that can occur to make sure we get them right. Signed-off-by: Elijah Newren <newren@gmail.com>	2019-04-29 09:56:38 -07:00
Elijah Newren	49732e8b5f	filter-repo: add testcases dealing with commit pruning There are several cases to worry about with commit pruning; commits that start empty and had no parent, commits that start empty and had a parent which may or may not get pruned, commits which had changes but became empty, commits which were merges but lost a line of ancestry and have no changes of their own, etc. Add testcases covering these cases, though most topology related ones will be deferred to a later set of tests. Signed-off-by: Elijah Newren <newren@gmail.com>	2019-04-29 09:56:37 -07:00
Elijah Newren	4635102d0f	filter-repo: add more path-related testcases Add some testcases for multiple --path arguments, for --path-glob, and for --path-regex. Signed-off-by: Elijah Newren <newren@gmail.com>	2019-04-29 09:56:37 -07:00
Elijah Newren	73e91edecc	filter-repo: add text removal (or replacement) via file of expressions Make it easy for users to search and replace text throughout the repository history. Instead of inventing some new syntax, reuse the same syntax used by BFG repo filter's --replace-text option, namely, a file with one expression per line of the form [regex:|glob:|literal:]$MATCH_EXPR[==>$REPLACEMENT_EXPR] Where "$MATCH_EXPR" is by default considered to be literal text, but could be a regex or a glob if the appropriate prefix is used. Also, $REPLACEMENT_EXPR defaults to '***REMOVED***' if not specified. If you want a literal '==>' to be part of your $MATCH_EXPR, then you must also manually specify a replacement expression instead of taking the default. Some examples: sup3rs3kr3t (replaces 'sup3rs3kr3t' with '***REMOVED***') HeWhoShallNotBeNamed==>Voldemort (replaces 'HeWhoShallNotBeNamed' with 'Voldemort') very==> (replaces 'very' with the empty string) regex:(\d{2})/(\d{2})/(\d{4})==>\2/\1/\3 (replaces '05/17/2012' with '17/05/2012', and vice-versa) The format for regex is as from re.sub(<pattern>, <repl>, <string>) from https://docs.python.org/2/library/re.html The <string> comes from file contents of the repo, and you specify the <pattern> and <repl>. glob:Copy*t==>Cartel (replaces 'Copyright' or 'Copyleft' or 'Copy my st' with 'Cartel') Signed-off-by: Elijah Newren <newren@gmail.com>	2019-04-26 07:56:03 -07:00
Elijah Newren	dd438dc455	filter-repo: add mailmap handling Signed-off-by: Elijah Newren <newren@gmail.com>	2019-04-26 07:56:03 -07:00
Elijah Newren	17a2f7102d	filter-repo: add some basic tests, with git-style test-lib.sh Signed-off-by: Elijah Newren <newren@gmail.com>	2019-03-12 14:19:38 -07:00
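
The expression-file syntax described in 73e91edecc can be illustrated with a small Python sketch. This is not filter-repo's implementation: the names parse_expression_line and apply_expressions are hypothetical, the sketch works on str rather than on the bytes found in repository contents, and the glob handling (only '*' and '?' are translated) is an assumption made for illustration.

```python
import re

# Default replacement named in the commit message.
DEFAULT_REPLACEMENT = "***REMOVED***"


def parse_expression_line(line):
    """Hypothetical helper: turn one expression line into (pattern, repl).

    repl is a template string for 'regex:' lines and a function returning
    the replacement verbatim for literal and glob lines.
    """
    match_expr, sep, replacement = line.partition("==>")
    if not sep:
        # No '==>' present, so fall back to the default replacement.
        replacement = DEFAULT_REPLACEMENT

    if match_expr.startswith("regex:"):
        # Regex lines follow re.sub semantics, including \1-style groups.
        return re.compile(match_expr[len("regex:"):]), replacement

    if match_expr.startswith("glob:"):
        # Assumed glob dialect: '*' is any run of characters, '?' is one.
        escaped = re.escape(match_expr[len("glob:"):])
        pattern = escaped.replace(r"\*", ".*").replace(r"\?", ".")
    else:
        # Literal text is the default; the 'literal:' prefix is optional.
        if match_expr.startswith("literal:"):
            match_expr = match_expr[len("literal:"):]
        pattern = re.escape(match_expr)

    # Return the replacement verbatim so backslashes are not interpreted.
    return re.compile(pattern), (lambda _m, r=replacement: r)


def apply_expressions(text, lines):
    """Apply every expression line, in order, to text."""
    for line in lines:
        pattern, repl = parse_expression_line(line)
        text = pattern.sub(repl, text)
    return text


if __name__ == "__main__":
    rules = ["sup3rs3kr3t", "HeWhoShallNotBeNamed==>Voldemort"]
    print(apply_expressions("pw=sup3rs3kr3t by HeWhoShallNotBeNamed", rules))
```

Splitting on the first '==>' mirrors the note in the commit message that a literal '==>' inside the match expression requires an explicit replacement.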

12 Commits
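
A callback of the kind mentioned in cefeef1c0a can be sketched as a standalone Python function. The helper name fix_bogus_timezone and the sample timestamp are hypothetical; the "+051800" and "+0261" values are the ones quoted in the commit messages. Because bytes objects are immutable, the rewritten value has to be assigned back to commit.author_date for the callback to have any effect.

```python
def fix_bogus_timezone(date, bogus=b"+051800", fixed=b"+0261"):
    """Hypothetical helper: rewrite a trailing bogus timezone in a raw date.

    date is a raw-format date as bytes, e.g. b"<seconds> <timezone>".
    Dates that do not end with the bogus timezone are returned unchanged.
    """
    if date.endswith(bogus):
        # Replace only the trailing timezone, leaving the seconds untouched.
        return date[: -len(bogus)] + fixed
    return date


# Inside a --commit-callback body, this would be used roughly as:
#     commit.author_date = fix_bogus_timezone(commit.author_date)

if __name__ == "__main__":
    print(fix_bogus_timezone(b"1112911993 +051800"))
```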