Lots o' updates
parent
856e7ada33
commit
56d3009d41
@ -0,0 +1,51 @@
Subject 1: filter-repo: history rewriting tool OR tool for writing history rewriting tools?
Subject 2: filter-repo versatility


Hi everyone,

A while ago, Jonathan expressed a worry that making filter-repo a core
command could discourage experimentation with history-rewriting, much
as he felt filter-branch did. So, I came up with a crazy idea to
demonstrate why I think including filter-repo may actually do the
opposite:

  I re-wrote BFG and filter-branch as scripts on top of filter-repo.

You can see these scripts at t/t9392/git-bfgish and
t/t9392/git-really-bad-idea in the filter-repo repository. In BFG's
case, I left out BFG's nice post-run reports but believe I implemented
everything else and lightly tested on a couple of cases to verify I got
the same results[1]. In filter-branch's case, what I implemented is
technically not backwards compatible -- it creates trees and indexes
with only the subset of files that changed in any given commit and
without access to the full 'git-log' of history to that point. BUT,
I've never seen a filter-branch invocation that made use of either in
modifying history, so it gives results that match filter-branch for
all practical intents and purposes[2].

Crazy? Genius? I don't know.

Maybe most people will just use filter-repo as a simple tool and I'm
the only one interested in this kind of flexibility, but filter-repo
is certainly far more versatile than filter-branch in addition to being
faster and (in my opinion) having much better usability.


[1] Of course, I had to disable empty commit pruning, make sure to
only match on file basenames rather than full paths, and add the
concept of blob protection in order to match, but none of this
needed changes to filter-repo core. Only the match on blob size
needed a change to filter-repo core, and it was relatively small and
something I wanted anyway.
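
To make the basename point in [1] concrete, here is a minimal shell
sketch of the difference between matching a pattern against a file's
basename and matching it against the full path. The helper names and
sample paths are made up for illustration; this is not code from BFG
or from filter-repo.

```shell
# Hypothetical helpers, for illustration only.

# Succeeds if the last path component matches the glob pattern in $2.
matches_basename() {
    # ${1##*/} strips everything up to and including the final slash.
    case "${1##*/}" in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

# Succeeds if the whole path matches the glob pattern in $2.
matches_fullpath() {
    case "$1" in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

# Basename matching catches the file wherever it lives in the tree...
matches_basename "src/config/secrets.txt" "secrets.txt" && echo "basename: match"
# ...while the same pattern against the full path only hits a top-level file.
matches_fullpath "src/config/secrets.txt" "secrets.txt" || echo "fullpath: no match"
```

With basename semantics, a single pattern removes every copy of a
file regardless of directory, which is why the two modes can give
different end results on the same repository.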

[2] Also, to match filter-branch I did have to turn off automatic
commit message updating, use less-accurate prune-empty logic, slow
history rewriting to a crawl by forking zillions of shell commands
(though it's still a lot faster than filter-branch), implement
infuriating defaults, add usability pitfalls for users, etc., but it
allowed me to compare end results (e.g. git show-ref) and verify
that they were identical.
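
The comparison mentioned in [2] could be done along these lines. This
is only a sketch: it assumes the `git show-ref` output of each
rewritten repository has already been saved to a file, and the helper
name and file names are hypothetical.

```shell
# Hypothetical helper: succeed if two saved `git show-ref` listings
# contain the same "<object id> <refname>" lines, ignoring order.
same_refs() {
    [ "$(sort "$1")" = "$(sort "$2")" ]
}
```

For example, after running `git -C repo-a show-ref > a.refs` and
`git -C repo-b show-ref > b.refs` (repo-a and repo-b being
placeholders for the two rewritten clones), `same_refs a.refs b.refs`
succeeds only when every ref points at the same object id in both.
Since a commit id covers its tree, message, and parents, matching ids
imply matching rewritten history.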
@ -1,183 +0,0 @@
----- Short version -----

As suggested by Ævar[1], I am proposing git repo-filter for inclusion
in git.git. I hope that my documentation included in the repo-filter
repository[2] can answer questions you have about it; if it does not,
that may indicate I need to supplement its documentation. However, I
am happy to answer any and all questions you may have about the tool;
fire away.


Basic Info:

git repo-filter is a tool for rewriting history that includes some
capabilities I have not found anywhere else. It is most similar to
filter-branch, though it has a significantly different taste in
usability. Also, being based on fast-export/fast-import, it is orders of
magnitude faster (it has speed roughly comparable to BFG repo cleaner,
but isn't multi-threaded).

repo-filter is a ~2500 (FIXME) line single-file python script,
depending only on the python standard library (and execution of git
commands), all of which is designed to make build/installation
trivial: you just need to copy it into your $PATH.


[1] https://public-inbox.org/git/87r2fq3b9t.fsf@evledraar.gmail.com/
[2] Currently tracked at https://github.com/newren/git-repo-filter,
but the plan would be to instead point people at git.git if it is
merged. (And if it is merged, the merge should just delete its
antique fork of t/test-lib.sh and its README.md.)


----- Intermediate length version -----

As suggested by Ævar[1], I am proposing git repo-filter[2] for inclusion
in git.git. There are a few issues that make me wonder if the git
community will want it; I've done my best to explain and address
these below.

Sorry for the lengthy email; feel free to skim for whatever bits seem
relevant to you.


Basic background
----------------

git repo-filter is a tool for rewriting history. It has a significantly
different taste in usability than filter-branch, and being based on
fast-export/fast-import, is orders of magnitude faster (it has speed
roughly comparable to BFG repo cleaner, but isn't multi-threaded). It
includes some capabilities I have not found anywhere else.


Important inclusion information
-------------------------------

1. Build: No special build rules required; it's a single-file script
   to simplify build/installation. Its only dependencies are
   git and python. This python script only uses the python
   standard library, so no extra python packages are needed.

2. Tests: (FIXME) git-style end-to-end tests (using an ancient fork of
   test-lib.sh from git.git) are in use, making the inclusion
   into git trivial. There are also some python-style unit
   tests, though these are also invoked from a test in the
   end-to-end suite so no additional tooling is needed.

3. Documentation: (FIXME) Built-in help and git-style asciidoc man-page
   are already included.


Possible reasons to exclude from git.git
----------------------------------------

1. Portability: repo-filter is written in Python, which I've heard
   is difficult to support on some platforms where git is run.

2. Maintainability/EOL decisions: repo-filter is (currently) written
   in Python 2 rather than Python 3.

3. User story: Since repo-filter will not and cannot be backward
   compatible with filter-branch, we inevitably would have two tools
   for rewriting history. Some may see that as confusing to users,
   especially since I didn't just implement a slightly different
   feature set: I fixed usability warts by changing a few basic
   underlying assumptions.


Counter-arguments against exclusion
-----------------------------------

1) Portability:

  1a) repo-filter only uses the python standard library, simplifying
      the porting story significantly.
  1b) repo-filter is a single file script. While it is even longer than
      git-send-email.perl, putting it on the big side, this does mean
      no special build instructions are needed.
  1c) repo-filter is not a daily-use tool, nor is it a collaboration
      tool. It's a tool that one person on your team uses once in
      maybe five years, then shares the results with everyone once. Thus,
      portability to esoteric platforms is perhaps less critical than it
      is for other components of git.

2) *shrug*. repo-filter was started by importing git-fast-filter[3]
   (which was in Python 2), and I haven't bothered porting. I have often
   worked with older enterprise distros, so I am a bit of a laggard with
   the Python 3 transition. If others find this worrisome, I can work on
   porting.

3) I've already made this email too long so I'll summarize; let me
   know if you want more detail. In short: repo-filter enables
   usage on repositories for which filter-branch is just completely
   impractical, and also has new capabilities that I cannot even
   emulate within filter-branch. But it's more than just that.
   While filter-branch is a nifty easy-to-use tool for a few very
   simple cases and has enough versatility to sometimes handle more
   complex cases, the complexity increases rapidly and some of
   the underlying assumptions make for greater user confusion and/or
   cause problems in trying to use several different features for
   the same filtering operation. As such, I think a tool designed
   for larger filtering operations or less sophisticated users of
   necessity needs to change some basic things about how
   filter-branch operates, which implies it must be a new, different
   tool.


So...thoughts?

Thanks,
Elijah


[1] https://public-inbox.org/git/87r2fq3b9t.fsf@evledraar.gmail.com/
[2] Currently tracked at https://github.com/newren/git-repo-filter,
but the plan would be to instead point people at git.git if it's
included.
[3] https://public-inbox.org/git/51419b2c0904072035u1182b507o836a67ac308d32b9@mail.gmail.com/




Background:
  Desire to combine, split-apart, or clean up repositories
    Examples: pgdev, nucleus, willamette
  Example, want:
    Only certain paths (a specific directory)
    move into a subdirectory
    rename tags to not conflict
  Filter-branch command (takes 65.950 seconds, or 15.594 seconds):

    time git filter-branch --tree-filter 'mkdir -p modules && git ls-files | grep -v ^src/main/java/com/palantir/annotation | xargs git rm -f -q && ls -d * | grep -v modules | xargs -I files mv files modules/' --tag-name-filter 'echo "table-helper-$(cat)"' --prune-empty -- --all
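
The --tag-name-filter in the command above relies on filter-branch
passing the original tag name on stdin and taking the new name from
stdout. A minimal stand-in, with a made-up function name and made-up
tag names, shows what the filter does to each tag:

```shell
# Same body as the --tag-name-filter above: read the old tag name from
# stdin, write the prefixed name to stdout.
rename_tag() {
    echo "table-helper-$(cat)"
}

# Feed two example tag names through the filter, one at a time.
for tag in v1.0 v1.1; do
    printf '%s\n' "$tag" | rename_tag
done
# prints:
#   table-helper-v1.0
#   table-helper-v1.1
```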

  Faster version (takes 37.802 seconds, or 6.287 seconds):

    time git filter-branch --index-filter 'git ls-files | grep -v ^src/main/java/com/palantir/annotation | xargs git rm -q --cached; git ls-files -s | sed "s-$(printf \\t)-&modules/-" | git update-index --index-info; git ls-files | grep -v ^modules/ | xargs -r git rm -q --cached' --tag-name-filter 'echo "table-helper-$(cat)"' --prune-empty -- --all
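
The least obvious piece of the index-filter above is the sed
expression. `git ls-files -s` prints each entry as
"<mode> <object> <stage><TAB><path>", and the sed command inserts
"modules/" right after the tab so that `git update-index --index-info`
records the entry under the new path. The sketch below runs just that
step on a made-up entry; the function name is hypothetical and the
object id is a placeholder, not a real blob.

```shell
# Prefix every path in `git ls-files -s`-style input with modules/.
prefix_paths() {
    # '-' is the sed delimiter; '&' re-emits the matched tab, so the
    # replacement is "<TAB>modules/". Only the first tab on each line
    # (the one separating the stage number from the path) is touched.
    sed "s-$(printf '\t')-&modules/-"
}

# Sample entry in "<mode> <object> <stage><TAB><path>" form.
printf '100644 0123456789abcdef0123456789abcdef01234567 0\tsrc/Foo.java\n' | prefix_paths
```

The path field of the output becomes modules/src/Foo.java while the
mode, object id, and stage are left untouched.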

Caveats:
  Really complicated to come up with
  Googled solutions may be subtly OS- or case-specific (sed, xargs, '*' above)
    (I know git & bash & GNU vs. BSD, fixed filter-branch, etc.)
Error Prone:
  mixing old and new history
  safety -- how to restore (refs/original hard; annotated tags may be missing)
  pruning of empty commits overeager
Painful, but possible:
  selecting stuff to keep (as opposed to removing)
  renaming files
  figuring out what to remove (--analyze)
  shrinking (man-page is misleading...)
Limiting:
  speed
  commit message rewriting
Compare:

    git repo-filter --analyze

    time git repo-filter --path src/main/java/com/palantir/annotation --subdirectory-filter modules
@ -0,0 +1,14 @@
filter-branch questions:
https://stackoverflow.com/questions/53413645/filter-branch-wont-delete-orphan-branches
https://stackoverflow.com/questions/53691547/keeping-history-of-splitted-repository-on-renamed-folder
https://stackoverflow.com/questions/6638019/detach-subdirectory-that-was-renamed-into-a-new-repo
https://stackoverflow.com/questions/53200708/retaining-original-folder-with-git-subdirectory-filter
https://stackoverflow.com/questions/53502654/how-do-i-run-a-code-formatter-over-my-source-without-modifying-git-history
https://stackoverflow.com/questions/52505480/how-can-i-convert-this-git-filter-branch-command-from-tree-filter-to-index-filte


BFG questions:
https://stackoverflow.com/questions/54310566/cant-get-rid-of-a-big-file-in-gitlab-repository
https://stackoverflow.com/questions/54139438/bfg-is-there-any-way-to-replace-text-on-files-on-a-specific-path
https://stackoverflow.com/questions/53821522/before-git-push-how-can-we-delete-big-files-using-bfg-including-protected-dir
https://stackoverflow.com/questions/50288203/github-cleaning-history-of-unwanted-files