This is a re-implementation of BFG Repo Cleaner, with some changes...
New features:
* pruning unwanted objects streamlined (automatic repack) and made more robust
(BFG makes the user repack manually, and while it provides instructions on
how to do so, it won't successfully remove large objects in cases like
unpacked refs, loose objects, or use of --no-blob-protection; the robustness
details are bug fixes, so they are covered below.)
* pruning of commits which become empty (or become degenerate and empty)
* creation of new replace refs so folks can access new commits using old
(unabbreviated) commit hashes
* respects and uses grafts and replace refs in the rewrite to make them
permanent (this is half new feature, half bug fix; thus also mentioned
in bugfixes below)
* auto-update of commit encoding to utf-8 (as per fast-export's default;
could pass --preserve-commit-encoding to FilteringOptions.parse_args() if
this isn't wanted...)
Bug fixes:
* Works for both packfiles and loose objects
(With BFG, if you don't repack before running, large blobs may be retained.)
(With BFG, any files larger than core.bigFileThreshold are thus hard to
remove since they will not be packed by a gc or a repack.)
* Works for both packed-refs and loose refs
(As per BFG issue #221, BFG fails to properly walk history unless refs are
packed.)
* Works with replace refs
(BFG operates directly on packfiles and packed-refs, and does not
understand replace refs; see BFG issue #82)
* Updates both index and working tree at end of rewrite
(With BFG and --no-blob-protection, these are still left out of date. This
is a double-whammy principle-of-least-astonishment violation: (1) users
are likely to accidentally commit the "staged" changes, re-introducing the
large blobs or removed passwords, (2) even if they don't commit the
changes, the index holding them will prevent gc from shrinking the repo.
Fixing these two glaring problems not only makes --no-blob-protection
safe to recommend, it makes it safe to make it the default.)
* Fixes the "protection" defaults
(With BFG, it won't rewrite the tree for HEAD; it can't reasonably switch
to doing so because of the bugs mentioned above with updating the index
and working tree. However, this behavior comes with a surprise for users:
if HEAD is "protected" because users should manually update it first, why
isn't that also true of the other branches? In my opinion, there's no
user-facing distinction that makes sense for such a difference in
handling. "Protecting" HEAD can also be an error-prone requirement for
users -- why do they have to manually edit all files the same way
--replace-text is doing and why do they have to risk dirty diffs if they
get it slightly different (or a useless and ugly empty commit if they
manage to get it right)? Finally, a third reason this was, in my opinion,
a bad default is that it works really poorly in conjunction with other
types of history rewrites, e.g. --subdirectory-filter,
--to-subdirectory-filter, --convert-to-git-lfs, --path-rename, etc. For
all three of these reasons, and the fixes mentioned above to make it safe,
--no-blob-protection is made the default.)
* Implements privacy improvements, defaulting to true
(In BFG issue #139, one of the BFG maintainers notes problems with the
"privacy" handling in BFG, suggesting options which could be
added to improve the story. I implemented those options, except that I
felt --private should be the default and made the various non-private
choices individual options; see the --use-* options.)
Other changes:
* Removed the --convert-to-git-lfs option
(As per BFG issues #116 and #215, and git-lfs issue #1589, LFS conversion
is handled poorly in BFG and is not recommended; other tools are suggested
even by the BFG authors.)
* Removed the --strip-biggest-blobs option
(I philosophically disagree with offering such an option when no
mechanism is provided to see what the N biggest blobs are. How is the
user supposed to select N? Even if they know they have three files
which have been large, they may be unaware of others in history. Even
if there aren't any other files in history and the user requests to
remove the largest three blobs, it might not be what they want: one of
the files might have had multiple versions, in which case their request
would only remove some versions of the largest file from history and
leave all versions of the second and third largest files as well as all
but three versions of the largest file. Finally, on a more minor note,
what is done in the case of a tie -- remove more than N, fewer than N, or
just pick one of the objects tying for Nth largest at random? It's
ill-defined.)
...even with all these improvements, I think filter-repo is the better tool,
and thus I suggest folks use it. I have no plans to improve bfg-ish
further. However, bfg-ish serves as a nice demonstration of the ability to
use filter-repo to write different filtering tools, which was its purpose.
"""
"""
Please see the
***** API BACKWARD COMPATIBILITY CAVEAT *****
near the top of git-filter-repo.
"""
import argparse
import fnmatch
import os
import re
import subprocess
import tempfile
try:
  import git_filter_repo as fr
except ImportError:
  raise SystemExit("Error: Couldn't find git_filter_repo.py. Did you forget to make a symlink to git-filter-repo named git_filter_repo.py or did you forget to put the latter in your PYTHONPATH?")
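
# Illustrative sketch only; not part of bfg-ish or of the git_filter_repo API.
# The --replace-text help text describes an expressions file with one
# expression per line: a literal by default, or a 'regex:' or 'glob:'
# prefixed pattern, with '==>' introducing a replacement other than the
# default '***REMOVED***'.  The hypothetical helper below shows one way a
# single line of that format could be interpreted.  Assumptions: input is
# str (real filtering operates on bytes), the first '==>' separates the
# expression from the replacement, and globs support only '*' and '?'.
import re  # already imported above; repeated so the sketch stands alone

def _example_parse_expression(line, default_replacement='***REMOVED***'):
  """Return (compiled_pattern, replacement) for one expressions-file line."""
  line = line.rstrip('\n')
  # Split off an explicit replacement, if one was given.
  expr, sep, replacement = line.partition('==>')
  if not sep:
    replacement = default_replacement
  if expr.startswith('regex:'):
    # Use the remainder of the line as a regular expression, unchanged.
    pattern = re.compile(expr[len('regex:'):])
  elif expr.startswith('glob:'):
    # Escape everything, then turn the escaped wildcards back into regex.
    escaped = re.escape(expr[len('glob:'):])
    pattern = re.compile(escaped.replace(r'\*', '.*').replace(r'\?', '.'))
  else:
    # No prefix: match the text literally.
    pattern = re.compile(re.escape(expr))
  return pattern, replacement

# Possible use, passing the replacement through a function so that any
# backslashes in it are not treated as regex group references:
#   pattern, repl = _example_parse_expression('regex:pass\\d+==>XXX')
#   cleaned = pattern.sub(lambda m: repl, text)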
help=("filter content of files, replacing matched text. Match expressions should be listed in the file, one expression per line - by default, each expression is treated as a literal, but 'regex:' & 'glob:' prefixes are supported, with '==>' to specify a replacement string other than the default of '***REMOVED***'."))
help=("Do not filter the trees for final commit of the specified refs, only in the history before those commits (by default, filtering options affect all commits, even those at ref tips). This is not recommended."))
help=("allow the BFG to modify even your *latest* commit. Not only is this highly recommended, it is the default. As such, this option does not actually do anything and is provided solely for compatibility with BFG. To undo this option, use --preserve-ref-tips and specify HEAD or the current branch name"))
help=("when updating commit hashes in commit messages, also add a [formerly OLDHASH] text, possibly violating commit message line length guidelines and providing an inferior way to look up old hashes (replace references are much preferred as git itself will understand them)"))
help=("append a `Former-commit-id:` footer to commit messages. This is an inferior way to look up old hashes (replace references are much preferred as git itself will understand them)"))
help=("replace any removed file with a `<filename>.REMOVED.git-id` file. Makes history ugly as it litters it with replacement files for each one you want removed, but has a small chance of being useful if you find you pruned something incorrectly."))