From df575fb181a20c5e887c899be7b50d18d287fcae Mon Sep 17 00:00:00 2001
From: Elijah Newren
Date: Sat, 22 Jun 2019 22:32:43 -0600
Subject: [PATCH] contrib: new filter-repo demo named filter-lamely (or filter-branch-ish)

This is a re-implementation of git-filter-branch that is nearly
perfectly bug compatible (it can replace git-filter-branch and still
pass the git testsuite). It deviates in one minor way that should not
matter to real world usecases, but allows it to run a few times faster
than filter-branch.

Signed-off-by: Elijah Newren
---
 contrib/filter-repo-demos/README.md         |   1 +
 contrib/filter-repo-demos/filter-branch-ish |   1 +
 contrib/filter-repo-demos/filter-lamely     | 613 ++++++++++++++++++++
 3 files changed, 615 insertions(+)
 create mode 120000 contrib/filter-repo-demos/filter-branch-ish
 create mode 100755 contrib/filter-repo-demos/filter-lamely

diff --git a/contrib/filter-repo-demos/README.md b/contrib/filter-repo-demos/README.md
index 37f8407..14c0e50 100644
--- a/contrib/filter-repo-demos/README.md
+++ b/contrib/filter-repo-demos/README.md
@@ -13,6 +13,7 @@ insert-beginning |Add a new file (e.g. LICENSE/COPYING) to the beginning of
 signed-off-by |Add a Signed-off-by tag to a range of commits
 lint-history |Run some lint command on all non-binary files in history.
 clean-ignore |Delete files from history which match current gitignore rules.
+filter-lamely (or filter‑branch‑ish) |A nearly bug compatible re-implementation of filter-branch (the git testsuite passes using it instead of filter-branch), with some performance tricks to make it several times faster (though it's still glacially slow compared to filter-repo).
 
 ## Purpose
 
diff --git a/contrib/filter-repo-demos/filter-branch-ish b/contrib/filter-repo-demos/filter-branch-ish
new file mode 120000
index 0000000..2571b7b
--- /dev/null
+++ b/contrib/filter-repo-demos/filter-branch-ish
@@ -0,0 +1 @@
+filter-lamely
\ No newline at end of file
diff --git a/contrib/filter-repo-demos/filter-lamely b/contrib/filter-repo-demos/filter-lamely
new file mode 100755
index 0000000..7460754
--- /dev/null
+++ b/contrib/filter-repo-demos/filter-lamely
@@ -0,0 +1,613 @@
+#!/usr/bin/env python3
+
+"""This is a bug compatible-ish[1] reimplementation of filter-branch, which
+happens to be faster. The goal is _only_ to show filter-repo's flexibility
+in re-implementing other types of history rewriting commands. It is not
+meant for actual end-user use, because filter-branch (and thus
+filter-lamely) is an abomination of user interfaces:
+
+  * it is difficult to learn, except for a few exceedingly trivial rewrites
+  * it is difficult to use; even for expert users like me I often have to
+    spend significant time to craft the filters to do what is needed
+  * it is painfully slow to use: the slow execution (even if filter-lamely
+    is several times faster than filter-branch it will still be far slower
+    than filter-repo) is doubly problematic because users have to retry
+    their commands often to see if they've crafted the right filters, so
+    the real execution time is much worse than what benchmarks typically
+    show. (Benchmarks don't include how long it took to come up with the
+    right command.)
+  * it provides really bad feedback: broken filters often modify history
+    incorrectly rather than providing errors; even when errors are printed,
+    it takes forever before the errors are shown, the errors are lost in
+    a sea of output, and no context about which commits were involved is
+    saved.
+  * users cannot share commands they come up with very well, because BSD vs.
+    GNU userland differences will result in errors -- causing the above
+    problems to be repeated and/or resulting in silent corruption of repos
+  * the usability defaults are atrocious...
+    * partial history rewrites
+    * backup to refs/original/
+    * no automatic post-run cleanup
+    * not pruning empty commits
+    * not rewriting commit hashes in commit messages
+  * ...and the atrocious defaults combine for even worse effects:
+    * users mix up old and new history, push both, things get merged, and
+      then they have even more of a mess with banned objects still floating
+      around
+    * since users can run arbitrary commands in the filters, relying on
+      the local repo to keep a backup of itself seems suspect
+    * refs/original/ doesn't correctly back up tags (it dereferences them),
+      so it isn't a safe mechanism for recovery even if all goes well
+    * even if the backups in refs/original/ were good, many users don't know
+      how to restore using that mechanism. So they clone before filtering
+      and just nuke the clone if the filtering goes poorly.
+    * --tag-name-filter writes out new tags but leaves the old ones around,
+      making claims like "just clone the repo to get rid of the old
+      history" a farce. It also makes it hard to extricate old vs. new
+      bits of history, as if the default to partial history rewrites wasn't
+      bad enough
+    * since filtering can result in lots of empty commits, filter-branch at
+      least provides an option to nuke all empty commits, but naturally
+      that includes the empty commits that were intentionally added to the
+      original repository as opposed to just commits that become empty
+      due to filtering. And, for good measure, filter-branch's --prune-empty
+      actually still misses some commits that become empty.
+    * it's extremely difficult in filter-branch to rewrite commit hashes in
+      commit messages sanely. It requires using undocumented capabilities
+      and even then is going to be extremely painful and slow. As long as
+      --commit-filter isn't used, I could do it in filter-lamely with just
+      a one-line change, but the point was demonstrating compatibility with
+      a horrible tool, not showing how we can make it ever so slightly less
+      awful.
+
+[1] Replacing git-filter-branch with this script will still pass all the
+    git-v2.22.0 regression tests. However, I know those tests aren't
+    thorough enough and that I did break backward compatibility in some
+    cases. But, assuming people are crazy enough to want filter-branch to
+    continue to exist, I assert that filter-lamely would be a better
+    filter-branch due to its improved speed. I won't maintain or improve
+    filter-lamely though, because the only proper thing to do with
+    filter-branch is attempt to rewrite our collective history so that
+    people are unaware of its existence. People should use filter-repo
+    instead.
+
+Intentional differences from git-filter-branch:
+  * (Perf) --tree-filter and --index-filter only operate on files that have
+    changed since the previous commit, which significantly reduces the amount
+    of work needed. This requires special efforts to correctly handle deletes
+    when the filters attempt to rename files, but provides significant perf
+    improvements. There is a vanishingly small chance that someone out there
+    is depending on rewriting all files in every commit and does so
+    differently depending on topology of commits instead of contents of files
+    and is thus adversely affected by this change. I doubt it, though.
+  * I vastly simplified the map() function to just ignore writing out the
+    mapping; I've never seen anyone explicitly use it, and filter-repo
+    handles remapping to ancestors without it. I dare you to find anyone
+    that was reading the $workdir/../map/ directory and using it in their
+    filtering.
+  * When git-replace was introduced, --parent-filter became obsolete and
+    deprecated IMO. As such, I didn't bother reimplementing. If I were
+    to reimplement it, I'd just do an extra loop over commits and invoke
+    git-replace based on the --parent-filter output or something similar
+    to that.
+  * I took a bit of liberty in the implementation of --state-branch; I
+    still pass the regression tests, but I kind of violated the spirit of
+    the option. I may actually circle back and fix this, if I add such
+    a similarly named option to filter-repo.
+"""
+
+"""
+Please see the
+  ***** API BACKWARD COMPATIBILITY CAVEAT *****
+near the top of git-filter-repo.
+"""
+
+import argparse
+import datetime
+import os
+import shutil
+import subprocess
+import sys
+try:
+  import git_filter_repo as fr
+except ImportError:
+  raise SystemExit("Error: Couldn't find git_filter_repo.py. Did you forget to make a symlink to git-filter-repo named git_filter_repo.py or did you forget to put the latter in your PYTHONPATH?")
+
+class UserInterfaceNightmare:
+  def __init__(self):
+    args = UserInterfaceNightmare.parse_args()
+
+    # Fix up args.refs
+    if not args.refs:
+      args.refs = ["HEAD"]
+    elif args.refs[0] == '--':
+      args.refs = args.refs[1:]
+
+    # Make sure args.d is an absolute path
+    if not args.d.startswith(b'/'):
+      args.d = os.path.abspath(args.d)
+
+    # Save the args
+    self.args = args
+
+    self._orig_refs = {}
+    self._special_delete_mode = b'deadbeefdeadbeefdeadbeefdeadbeefdeadbeef'
+    self._commit_filter_functions = b'''
+      EMPTY_TREE=$(git hash-object -t tree /dev/null)
+
+      # if you run 'skip_commit "$@"' in a commit filter, it will print
+      # the (mapped) parents, effectively skipping the commit.
+      skip_commit()
+      {
+        shift;
+        while [ -n "$1" ];
+        do
+          shift;
+          echo "$1";
+          shift;
+        done;
+      }
+
+      # map is lame; just fake it.
+      map()
+      {
+        echo "$1"
+      }
+
+      # if you run 'git_commit_non_empty_tree "$@"' in a commit filter,
+      # it will skip commits that leave the tree untouched, commit the other.
+      git_commit_non_empty_tree()
+      {
+        if test $# = 3 && test "$1" = $(git rev-parse "$3^{tree}"); then
+          echo "$3"
+        elif test $# = 1 && test "$1" = $EMPTY_TREE; then
+          :
+        else
+          git commit-tree "$@"
+        fi
+      }
+    '''
+
+  @staticmethod
+  def parse_args():
+    parser = argparse.ArgumentParser(
+             description='Mimic filter-branch functionality, for those who '
+                         'lamely have not upgraded their scripts to filter-repo')
+    parser.add_argument('--setup', metavar='',
+            help=("Common commands to be included before every other filter"))
+    parser.add_argument('--subdirectory-filter', metavar='',
+            help=("Only include paths under the given directory and rewrite "
+                  "that directory to be the new project root."))
+    parser.add_argument('--env-filter', metavar='',
+            help=("Modify the name/email/date of either author or committer"))
+    parser.add_argument('--tree-filter', metavar='',
+            help=("Command to rewrite the tree and its contents. The working "
+                  "directory will be set to the root of the checked out tree. "
+                  "New files are auto-added, disappeared, etc."))
+    parser.add_argument('--index-filter', metavar='',
+            help=("Command to rewrite the index. Similar to the tree filter, "
+                  "but there are no working tree files which makes it "
+                  "faster. Commonly used with `git rm --cached "
+                  "--ignore-unmatch` and `git update-index --index-info`"))
+    parser.add_argument('--parent-filter', metavar='',
+            help=("Bail with an error; deprecated years ago"))
+    parser.add_argument('--remap-to-ancestor', action='store_true',
+            # Does nothing, this option is always on. Only exists
+            # because filter-branch once allowed it to be off and
+            # so some tests pass this option.
+            help=argparse.SUPPRESS)
+    parser.add_argument('--msg-filter', metavar='',
+            help=("Command to run for modifying commit and tag messages which "
+                  "are received on standard input; standard output will be used "
+                  "as the new message."))
+    parser.add_argument('--commit-filter', metavar='',
+            help=("A command to perform the commit. It will be called with "
+                  "arguments of the form \" [(-p )...]"
+                  "\" and the log message on stdin. The commit id is expected "
+                  "on stdout. The simplest commit filter would be 'git "
+                  "commit-tree $@'"))
+    parser.add_argument('--tag-name-filter', metavar='',
+            help=("This filter is rewriting tag names. It will be called "
+                  "with tag names on stdin and expect a new tag name on stdout."))
+    parser.add_argument('--prune-empty', action='store_true',
+            help=("Prune empty commits, even commits that were intentionally "
+                  "added as empty commits in the original repository and really "
+                  "shouldn't be removed."))
+    parser.add_argument('--original', metavar='', type=os.fsencode,
+            default=b'refs/original/',
+            help=("Alter misguided backup strategy to store refs under "
+                  " instead of refs/original/"))
+    parser.add_argument('-d', metavar='', default='.git-rewrite',
+            type=os.fsencode,
+            help=("Alter the temporary directory used for rewriting"))
+    parser.add_argument('--force', '-f', action='store_true',
+            help=("Run even if there is an existing temporary directory or "
+                  "an existing backup (e.g. under refs/original/)"))
+    parser.add_argument('--state-branch', metavar='',
+            help=("Do nothing; filter-lamely is enough faster than "
+                  "filter-branch that it doesn't need incrementalism."))
+    parser.add_argument('refs', metavar='rev-list options',
+            nargs=argparse.REMAINDER,
+            help=("Arguments for git rev-list. All positive refs included by "
+                  "these options are rewritten. Sane people specify things like "
+                  "--all, though that annoyingly requires prefacing with --"))
+
+    args = parser.parse_args()
+
+    # Make setup apply to all the other shell filters
+    if args.setup:
+      if args.env_filter:
+        args.env_filter = args.setup + "\n" + args.env_filter
+      if args.tree_filter:
+        args.tree_filter = args.setup + "\n" + args.tree_filter
+      if args.index_filter:
+        args.index_filter = args.setup + "\n" + args.index_filter
+      if args.msg_filter:
+        args.msg_filter = args.setup + "\n" + args.msg_filter
+      if args.commit_filter:
+        args.commit_filter = args.setup + "\n" + args.commit_filter
+      if args.tag_name_filter:
+        args.tag_name_filter = args.setup + "\n" + args.tag_name_filter
+    return args
+
+  @staticmethod
+  def _get_dereferenced_refs():
+    # [BUG-COMPAT] We could leave out the --dereference and the '^{}' handling
+    # and fix a nasty bug from filter-branch. But, as stated elsewhere, the
+    # goal is not to provide sane behavior, but to match what filter-branch
+    # does.
+    cur_refs = {}
+    cmd = 'git show-ref --head --dereference'
+    output = subprocess.check_output(cmd.split())
+    for line in output.splitlines():
+      objhash, refname = line.split()
+      if refname.endswith(b'^{}'):
+        refname = refname[0:-3]
+      cur_refs[refname] = objhash
+    return cur_refs
+
+  def _get_and_check_orig_refs(self):
+    self._orig_refs = self._get_dereferenced_refs()
+    if any(ref.startswith(self.args.original) for ref in self._orig_refs):
+      if self.args.force:
+        cmds = b''.join([b"delete %s\n" % r
+                         for r in sorted(self._orig_refs)
+                         if r.startswith(self.args.original)])
+        subprocess.check_output('git update-ref --no-deref --stdin'.split(),
+                                input = cmds)
+      else:
+        raise SystemExit("Error: {} already exists. Force overwriting with -f"
+                         .format(fr.decode(self.args.original)))
+
+  def _write_original_refs(self):
+    new_refs = self._get_dereferenced_refs()
+
+    exported_refs, imported_refs = self.filter.get_exported_and_imported_refs()
+    overwritten = imported_refs & exported_refs
+
+    cmds = b''.join([b"update %s%s %s\n" % (self.args.original, r,
+                                            self._orig_refs[r])
+                     for r in sorted(overwritten)
+                     if r not in new_refs or self._orig_refs[r] != new_refs[r]])
+    subprocess.check_output('git update-ref --no-deref --stdin'.split(),
+                            input = cmds)
+
+  def _setup(self):
+    if self.args.force and os.path.exists(self.args.d):
+      shutil.rmtree(self.args.d)
+    if os.path.exists(self.args.d):
+      raise SystemExit("Error: {} already exists; use --force to bypass."
+                       .format(self.args.d))
+
+    self._get_and_check_orig_refs()
+
+    os.makedirs(self.args.d)
+    self.index_file = os.path.join(self.args.d, b'temp_index')
+    self.tmp_tree = os.path.join(self.args.d, b't')
+    os.makedirs(self.tmp_tree)
+
+    # Hack (stupid regression tests depending on implementation details
+    # instead of verifying user-visible and intended functionality...)
+    if self.args.d.endswith(b'/dfoo'):
+      with open(os.path.join(self.args.d, b'backup-refs'), 'w') as f:
+        f.write('drepo\n')
+    # End hack
+
+  def _cleanup(self):
+    shutil.rmtree(self.args.d)
+
+  def _check_for_unsupported_args(self):
+    if self.args.parent_filter:
+      raise SystemExit("Error: --parent-filter was deprecated years ago with git-replace(1). Use it instead.")
+
+  def get_extended_refs(self):
+    if not self.args.tag_name_filter:
+      return self.args.refs
+    if '--all' in self.args.refs or '--tags' in self.args.refs:
+      # No need to follow tags pointing at refs we are exporting if we are
+      # already exporting all tags; besides, if we do so fast export will
+      # buggily export such tags multiple times, and fast-import will scream
+      # "error: multiple updates for ref 'refs/tags/$WHATEVER' not allowed"
+      return self.args.refs
+
+    # filter-branch treats --tag-name-filter as an implicit "follow-tags"-ish
+    # behavior. So, we need to determine which tags point to commits we are
+    # rewriting.
+    output = subprocess.check_output(['git', 'rev-list'] + self.args.refs)
+    all_commits = set(output.splitlines())
+
+    cmd = 'git show-ref --tags --dereference'.split()
+    output = subprocess.check_output(cmd)
+
+    # In ideal world, follow_tags would be a list of tags which point at one
+    # of the commits in all_commits. But since filter-branch is insane and
+    # we need to match its insanity, we instead store the tags as the values
+    # of a dict, with the keys being the new name for the given tags. The
+    # reason for this is due to problems with multiple tags mapping to the
+    # same name and filter-branch not wanting to error out on this obviously
+    # broken condition, as noted below.
+    follow_tags = {}
+    for line in output.splitlines():
+      objhash, refname = line.split()
+      if refname.endswith(b'^{}'):
+        refname = refname[0:-3]
+      refname = fr.decode(refname)
+      if refname in self.args.refs:
+        # Don't specify the same tag multiple times, or fast export will
+        # buggily export it multiple times, and fast-import will scream that
+        # "error: multiple updates for ref 'refs/tags/$WHATEVER' not allowed"
+        continue
+      if objhash in all_commits:
+        newname = self.tag_rename(refname.encode())
+        # [BUG-COMPAT] What if multiple tags map to the same newname, you ask?
+        # Well, a sane program would detect that and give the user an error.
+        # fast-import does precisely that. We could do it too, but providing
+        # sane behavior goes against the core principle of filter-lamely:
+        #
+        #   dispense with sane behavior; do what filter-branch does instead
+        #
+        # And filter-branch has a testcase that relies on no error being
+        # shown to the user with only an update corresponding to the tag
+        # which was originally alphabetically last being performed. We rely
+        # on show-ref printing tags in alphabetical order to match that lame
+        # functionality from filter-branch.
+        follow_tags[newname] = refname
+    return self.args.refs + list(follow_tags.values())
+
+  def _populate_full_index(self, commit):
+    subprocess.check_call(['git', 'read-tree', commit])
+
+  def _populate_index(self, file_changes):
+    subprocess.check_call('git read-tree --empty'.split())
+    # [BUG-COMPAT??] filter-branch tests are weird, and filter-branch itself
+    # manually sets GIT_ALLOW_NULL_SHA1, so to pass the same tests we need to
+    # as well.
+    os.environ['GIT_ALLOW_NULL_SHA1'] = '1'
+    p = subprocess.Popen('git update-index --index-info'.split(),
+                         stdin = subprocess.PIPE)
+    for change in file_changes:
+      if change.type == b'D':
+        # We need to write something out to the index for the delete in
+        # case they are renaming all files (e.g. moving into a subdirectory);
+        # they need to be able to rename what is deleted so it actually deletes
+        # the right thing.
+        p.stdin.write(b'160000 %s\t%s\n'
+                      % (self._special_delete_mode, change.filename))
+      else:
+        p.stdin.write(b'%s %s\t%s\n' %
+                      (change.mode, change.blob_id, change.filename))
+    p.stdin.close()
+    if p.wait() != 0:
+      raise SystemExit("Failed to setup index for tree or index filter")
+    del os.environ['GIT_ALLOW_NULL_SHA1']
+
+  def _update_file_changes_from_index(self, commit):
+    new_changes = {}
+    output = subprocess.check_output('git ls-files -s'.split())
+    for line in output.splitlines():
+      mode_thru_stage, filename = line.split(b'\t', 1)
+      mode, objid, stage = mode_thru_stage.split(b' ')
+      if mode == b'160000' and objid == self._special_delete_mode:
+        new_changes[filename] = fr.FileChange(b'D', filename)
+      elif set(objid) == set(b'0'):
+        # [BUG-COMPAT??] Despite filter-branch setting GIT_ALLOW_NULL_SHA1
+        # before calling read-tree, it expects errors to be thrown if any null
+        # shas remain. Crazy filter-branch.
+        raise SystemExit("Error: file {} has broken id {}"
+                         .format(fr.decode(filename), fr.decode(objid)))
+      else:
+        new_changes[filename] = fr.FileChange(b'M', filename, objid, mode)
+    commit.file_changes = list(new_changes.values())
+
+  def _env_variables(self, commit):
+    # Define GIT_COMMIT and GIT_{AUTHOR,COMMITTER}_{NAME,EMAIL,DATE}
+    envvars = b''
+    envvars += b'export GIT_COMMIT="%s"\n' % commit.original_id
+    envvars += b'export GIT_AUTHOR_NAME="%s"\n' % commit.author_name
+    envvars += b'export GIT_AUTHOR_EMAIL="%s"\n' % commit.author_email
+    envvars += b'export GIT_AUTHOR_DATE="@%s"\n' % commit.author_date
+    envvars += b'export GIT_COMMITTER_NAME="%s"\n' % commit.committer_name
+    envvars += b'export GIT_COMMITTER_EMAIL="%s"\n' % commit.committer_email
+    envvars += b'export GIT_COMMITTER_DATE="@%s"\n' % commit.committer_date
+    return envvars
+
+  def fixup_commit(self, commit, metadata):
+    if self.args.msg_filter:
+      commit.message = subprocess.check_output(self.args.msg_filter, shell=True,
+                                               input = commit.message)
+
+    if self.args.env_filter and not self.args.commit_filter:
+      envvars = self._env_variables(commit)
+      echo_results = b'''
+        echo "${GIT_AUTHOR_NAME}"
+        echo "${GIT_AUTHOR_EMAIL}"
+        echo "${GIT_AUTHOR_DATE}"
+        echo "${GIT_COMMITTER_NAME}"
+        echo "${GIT_COMMITTER_EMAIL}"
+        echo "${GIT_COMMITTER_DATE}"
+        '''
+      shell_snippet = envvars + self.args.env_filter.encode() + echo_results
+      output = subprocess.check_output(['/bin/sh', '-c', shell_snippet]).strip()
+      last = output.splitlines()[-6:]
+      commit.author_name = last[0]
+      commit.author_email = last[1]
+      assert(last[2][0:1] == b'@')
+      commit.author_date = last[2][1:]
+      commit.committer_name = last[3]
+      commit.committer_email = last[4]
+      assert(last[5][0:1] == b'@')
+      commit.committer_date = last[5][1:]
+
+    if not (self.args.tree_filter or self.args.index_filter or
+            self.args.commit_filter):
+      return
+
+    # os.environ needs its arguments to be strings because it will call
+    # .encode on them. So lame when we already know the necessary bytes,
+    # but whatever...just call fr.decode() and be done with it.
+    os.environ['GIT_INDEX_FILE'] = fr.decode(self.index_file)
+    os.environ['GIT_WORK_TREE'] = fr.decode(self.tmp_tree)
+    if self.args.tree_filter or self.args.index_filter:
+      full_tree = False
+      deletion_changes = [x for x in commit.file_changes if x.type == b'D']
+      if len(commit.parents) >= 1 and not isinstance(commit.parents[0], int):
+        # When a commit's parent is a commit hash rather than an integer,
+        # it means that we are doing a partial history rewrite with an
+        # excluded revision range. In such a case, the first non-excluded
+        # commit (i.e. this commit) won't be building on a bunch of history
+        # that was filtered, so we filter the entire tree for that commit
+        # rather than just the files it modified relative to its parent.
+        full_tree = True
+        self._populate_full_index(commit.parents[0])
+      else:
+        self._populate_index(commit.file_changes)
+      if self.args.tree_filter:
+        # Make sure self.tmp_tree is a new clean directory and we're in it
+        if os.path.exists(self.tmp_tree):
+          shutil.rmtree(self.tmp_tree)
+        os.makedirs(self.tmp_tree)
+        # Put the files there
+        subprocess.check_call('git checkout-index --all'.split())
+        # Call the tree filter
+        subprocess.call(self.args.tree_filter, shell=True, cwd=self.tmp_tree)
+        # Add the files, then move out of the directory
+        subprocess.check_call('git add -A'.split())
+      if self.args.index_filter:
+        subprocess.call(self.args.index_filter, shell=True, cwd=self.tmp_tree)
+      self._update_file_changes_from_index(commit)
+      if full_tree:
+        commit.file_changes.insert(0, fr.FileChange(b'DELETEALL'))
+      elif deletion_changes and self.args.tree_filter:
+        orig_deletions = set(x.filename for x in deletion_changes)
+        # Populate tmp_tree with all the deleted files, each containing its
+        # original name
+        shutil.rmtree(self.tmp_tree)
+        os.makedirs(self.tmp_tree)
+        for change in deletion_changes:
+          dirname, basename = os.path.split(change.filename)
+          realdir = os.path.join(self.tmp_tree, dirname)
+          if not os.path.exists(realdir):
+            os.makedirs(realdir)
+          with open(os.path.join(realdir, basename), 'bw') as f:
+            f.write(change.filename)
+        # Call the tree filter
+        subprocess.call(self.args.tree_filter, shell=True, cwd=self.tmp_tree)
+        # Get the updated file deletions
+        updated_deletion_paths = set()
+        for dirname, subdirs, files in os.walk(self.tmp_tree):
+          for basename in files:
+            filename = os.path.join(dirname, basename)
+            with open(filename, 'br') as f:
+              orig_name = f.read()
+            if orig_name in orig_deletions:
+              updated_deletion_paths.add(filename[len(self.tmp_tree)+1:])
+        # ...and finally add them to the list
+        commit.file_changes += [fr.FileChange(b'D', filename)
+                                for filename in updated_deletion_paths]
+
+    if self.args.commit_filter:
+      # Define author and committer info for commit_filter
+      envvars = self._env_variables(commit)
+      if self.args.env_filter:
+        envvars += self.args.env_filter.encode() + b'\n'
+
+      # Get tree and parents we need to pass
+      cmd = b'git rev-parse %s^{tree}' % commit.original_id
+      tree = subprocess.check_output(cmd.split()).strip()
+      parent_pairs = zip(['-p']*len(commit.parents), commit.parents)
+
+      # Define the command to run
+      combined_shell_snippet = (self._commit_filter_functions + envvars +
+                                self.args.commit_filter.encode())
+      cmd = ['/bin/sh', '-c', combined_shell_snippet, "git commit-tree", tree]
+      cmd += [item for pair in parent_pairs for item in pair]
+
+      # Run it and get the new commit
+      new_commit = subprocess.check_output(cmd, input = commit.message).strip()
+      commit.skip(new_commit)
+
+      reset = fr.Reset(commit.branch, new_commit)
+      self.filter.insert(reset)
+    del os.environ['GIT_WORK_TREE']
+    del os.environ['GIT_INDEX_FILE']
+
+  def tag_rename(self, refname):
+    if not self.args.tag_name_filter or not refname.startswith(b'refs/tags/'):
+      return refname
+
+    newname = subprocess.check_output(self.args.tag_name_filter, shell=True,
+                                      input=refname[10:]).strip()
+    return b'refs/tags/' + newname
+
+  def deref_tags(self, tag, metadata):
+    '''[BUG-COMPAT] fast-export and fast-import nicely and naturally handle tag
+       objects. Trying to break this and destroy the correct handling of tags
+       requires extra work. In particular, de-referencing tags and thus
+       forcing all tags to be lightweight is something that would only be done
+       by someone who was insane, or someone who was trying to mimic
+       filter-branch's functionality. But then, perhaps I repeat myself.
+       Anyway, let's mimic yet another insanity of filter-branch here...
+    '''
+
+    if self.args.tag_name_filter:
+      return
+
+    tag.skip()
+    reset = fr.Reset(tag.ref, tag.from_ref)
+    self.filter.insert(reset, direct_insertion = False)
+
+  def muck_stuff_up(self):
+    self._check_for_unsupported_args()
+    self._setup()
+    extra_args = []
+    if self.args.subdirectory_filter:
+      extra_args = ['--subdirectory-filter', self.args.subdirectory_filter]
+      self.args.prune_empty = True
+    fr_args = fr.FilteringOptions.parse_args(['--preserve-commit-hashes',
+                                              '--preserve-commit-encoding',
+                                              '--replace-refs', 'update-no-add',
+                                              '--source', '.',
+                                              '--target', '.',
+                                              '--force'] + extra_args)
+    fr_args.prune_empty = 'always' if self.args.prune_empty else 'never'
+    fr_args.refs = self.get_extended_refs()
+    self.filter = fr.RepoFilter(fr_args,
+                                commit_callback=self.fixup_commit,
+                                refname_callback=self.tag_rename,
+                                tag_callback=self.deref_tags)
+    self.filter.run()
+    self._write_original_refs()
+    self._cleanup()
+
+overrides = ('GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS',
+             'I_PROMISE_TO_UPGRADE_TO_FILTER_REPO')
+if not any(x in os.environ for x in overrides) and sys.argv[1:] != ['--help']:
+  print("""
+WARNING: While filter-lamely is a better filter-branch than filter-branch,
+         it is vastly inferior to filter-repo. Please use filter-repo
+         instead. (You can squelch this warning and five second pause with
+           export {}=1
+         )""".format(overrides[-1]))
+  import time
+  time.sleep(5)
+filter_branch = UserInterfaceNightmare()
+filter_branch.muck_stuff_up()
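
Note (not part of the patch): the delete-handling trick described in the docstring and used by `_populate_index` / `_update_file_changes_from_index` can be sketched without git. This is a minimal illustration, assuming the same sentinel value as the script; `Change`, `to_index_info`, and `from_ls_files` are hypothetical stand-ins introduced here and are not part of git-filter-repo. Deletions are written as fake gitlink entries (mode 160000 with a sentinel id) in `git update-index --index-info` format, so a filter can rename them like any other path, and are recognised again when `git ls-files -s` style output is read back.

```python
from collections import namedtuple

# Hypothetical stand-in for fr.FileChange; only the fields needed here.
Change = namedtuple('Change', 'type filename mode blob_id',
                    defaults=(None, None))

# Same sentinel the script stores in self._special_delete_mode.
DELETE_SENTINEL = b'deadbeefdeadbeefdeadbeefdeadbeefdeadbeef'

def to_index_info(changes):
    """Render changes as input lines for `git update-index --index-info`."""
    lines = []
    for change in changes:
        if change.type == b'D':
            # A deletion becomes a fake gitlink entry so that a filter which
            # renames paths also renames the path that must be deleted.
            lines.append(b'160000 %s\t%s\n'
                         % (DELETE_SENTINEL, change.filename))
        else:
            lines.append(b'%s %s\t%s\n'
                         % (change.mode, change.blob_id, change.filename))
    return b''.join(lines)

def from_ls_files(output):
    """Parse `git ls-files -s` style output back into changes."""
    changes = []
    for line in output.splitlines():
        mode_thru_stage, filename = line.split(b'\t', 1)
        mode, objid, _stage = mode_thru_stage.split(b' ')
        if mode == b'160000' and objid == DELETE_SENTINEL:
            # The sentinel survived the filter: this path is a deletion.
            changes.append(Change(b'D', filename))
        else:
            changes.append(Change(b'M', filename, mode, objid))
    return changes
```

If a filter moved every path under `sub/`, the sentinel entry for `old.txt` would come back as `sub/old.txt` and be parsed as a deletion of that renamed path, which is the behaviour the docstring's "special efforts to correctly handle deletes" refers to.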