Various todo-related files
commit
caffe46d77
@ -0,0 +1,197 @@
|
||||
Before widely announcing:
|
||||
- Notes on splitting
|
||||
- exporter needs to know the pipe combination for commit message rewriting
|
||||
- commit message rewriting gets weird if commits held in memory for later
|
||||
- pruning gets weird too
|
||||
- need to handle 1 export -> 2 imports
|
||||
- Test setup
|
||||
- Add several more tests, particularly around:
|
||||
- commit pruning
|
||||
- pruning commits that become empty
|
||||
- pruning commits that started empty and have no parent
|
||||
- not pruning commits that have changes or remain a merge commit
|
||||
- pruning parent(s) of a merge
|
||||
- coalescing common commits of a merge
|
||||
- coalescing parents of a merge when one is an ancestor of the other
|
||||
- ref pruning
|
||||
- tags pointing at commits which are pruned along with their history
|
||||
- refs pointing at commits which are pruned along with their history
|
||||
- refs or tags behind a negative revision specification
|
||||
- commit message rewriting
|
||||
- renaming, particularly when it causes collisions
|
||||
- use coverage.py to direct test writing
|
||||
- Check whether the version of git in use supports the appropriate flags
|
||||
- Rewrite history
|
||||
- Remove tests from older commits until they would actually work
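A possible starting point for the commit message rewriting tests listed above: a minimal sketch of replacing old commit IDs (full or abbreviated) in a message. The `old_to_new` dict and the function name are hypothetical, not taken from git-repo-filter, and the linear scan over the map is only for illustration.

```python
import re

# Candidate commit IDs: 7 to 40 lowercase hex digits as a standalone word.
HASH_RE = re.compile(r'\b[0-9a-f]{7,40}\b')


def rewrite_hashes(message, old_to_new):
    """Replace old commit IDs found in message with their new IDs.

    old_to_new is a hypothetical dict: full old hash -> full new hash.
    Abbreviations keep their original length after rewriting.
    """
    def replace(match):
        abbrev = match.group(0)
        candidates = {new for old, new in old_to_new.items()
                      if old.startswith(abbrev)}
        if len(candidates) != 1:
            return abbrev  # unknown or ambiguous: leave untouched
        return candidates.pop()[:len(abbrev)]

    return HASH_RE.sub(replace, message)
```

Words that merely look like hashes but are not in the map are left alone, which is one of the cases a test would want to pin down.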
|
||||
|
||||
Generate upstream patches:
|
||||
- Tags of tags of commits fail to export:
|
||||
- In git.git, try:
|
||||
$ git fast-export --no-data --use-done-feature --signed-tags=strip \
|
||||
--tag-of-filtered-object=rewrite-feature v1.0rc1 >/dev/null
|
||||
fatal: tag 5f4cd4ca015dc795b9f7f4fed11b3f80a60ac175 tags unexported tag!
|
||||
|
||||
Bigger ideas
|
||||
- 1st step, create local branches for each remote tracking branch:
|
||||
git fetch . refs/remotes/origin/*:refs/heads/*
|
||||
also, nuke refs/remotes/origin/*; it won't match upstream anyway
|
||||
- Performance:
|
||||
- Smarter record_remapping -- do it lazily
|
||||
- Unnecessary re-computation of 'epoch' (calling fromtimestamp)
|
||||
...and perhaps just unnecessary use of FixedTimeZone when most of the time
|
||||
it will not be checked or modified?
|
||||
- What part of _parse_commit takes so much time?
|
||||
- What part of commit.dump takes so much time?
|
||||
- Speedup _parse_optional_filechange using str.split(None, 3) instead of re
|
||||
- Which wait() are we waiting on?
|
||||
- Smarter become-empty checks; only do more expensive checks if:
|
||||
- First parent is no longer original first parent or ancestor thereof
|
||||
- e.g. first-parent history empty, second parent becomes first parent
|
||||
- e.g. --parent-filter causes some kind of graft operation (although
|
||||
maybe we don't want to prune in this case anyway...)
|
||||
- Blob filtering is active AND the only file_changes involved correspond
|
||||
to filenames that have previously been modified.
|
||||
- Regex optimization
|
||||
- memoize (or just outright store?) filename remapping
|
||||
- memoize net result: dequote -> do mods -> requote
|
||||
- Work with submodules
|
||||
- Important features
|
||||
- paths-from-file (--paths-from-file <(git ls-tree -r HEAD))
|
||||
- include-old-names-of-specified-files
|
||||
- so users don't have to look for rename data from --analyze
|
||||
- Do git rev-list --count to get idea of amount of work; show progress
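For the `str.split(None, 3)` idea in the performance list above, a minimal sketch comparing it with a regex on a fast-export `M <mode> <dataref> <path>` line. The function names and the regex are assumptions for illustration; this is not the actual `_parse_optional_filechange` code, and it only covers the modify case.

```python
import re

# Regex equivalent, shown only for comparison with the split-based parser.
MODIFY_RE = re.compile(r'M (\d+) (\S+) (.*)\n$')


def parse_modify_with_split(line):
    """Parse 'M <mode> <dataref> <path>' without a regex; None if no match."""
    if not line.startswith('M '):
        return None
    # Drop the newline first: with maxsplit, trailing whitespace would
    # otherwise stay attached to the last field.
    fields = line.rstrip('\n').split(None, 3)
    if len(fields) != 4:
        return None
    _, mode, dataref, path = fields
    return mode, dataref, path


def parse_modify_with_regex(line):
    """Same result via the regex, for checking the two agree."""
    match = MODIFY_RE.match(line)
    return match.groups() if match else None
```

Whether the split version is actually faster would need measuring with `timeit` or the cProfile runs described in these notes.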
|
||||
|
||||
Left over bits:
|
||||
- Fix up --analyze
|
||||
* shouldn't allow running --analyze with negative refspecs
|
||||
* add a --no-detect-renames option (for performance)
|
||||
- metadata
|
||||
- On second and subsequent runs, update metadata instead of overwriting
|
||||
- for maps, give beginning_hash -> end_hash, not intermediate hashes
|
||||
- OR error out if .git/repo-filter already created?
|
||||
- error out if any progress messages in stream (can't deal with them unless
|
||||
we can pass --cat-blob-fd to fast-import, and that seems non-portable)
|
||||
|
||||
More path stuff, maybe
|
||||
--path-rename-regex
|
||||
--path-stream-rename (invoked once; must read one line then print)
|
||||
--path-stream-filter (invoked once per commit with new files)
|
||||
--path-tree-filter
|
||||
Ref stuff
|
||||
--ref-rename
|
||||
--ref-stream-rename
|
||||
Blob filter
|
||||
--tree-filter
|
||||
|
||||
|
||||
Safety stuff
|
||||
--keep-excluded-revisions
|
||||
--keep-excluded-refs
|
||||
--store-backup
|
||||
--empty-pruning={no/off,auto,always/on}
|
||||
--negative-refs={drop,reference}
|
||||
|
||||
Other things:
|
||||
- add a filename_callback too, for just editing file names
|
||||
- add --skip-cleanup (pruning, gc, etc.; keep reset --hard) for speed compare
|
||||
- get rid of user-run fast-export & fast-import; don't want to have to
|
||||
update two callsites.
|
||||
|
||||
Performance notes:
|
||||
* On rails:
|
||||
* 1) time git fast-export --show-original-ids --signed-tags=strip \
|
||||
--tag-of-filtered-object=rewrite --no-data \
|
||||
--use-done-feature --all >/dev/null
|
||||
* 2) time git fast-export --show-original-ids --signed-tags=strip \
|
||||
--tag-of-filtered-object=rewrite --no-data \
|
||||
--use-done-feature --all >saved_output
|
||||
* 3a) time git fast-export --show-original-ids --signed-tags=strip \
|
||||
--tag-of-filtered-object=rewrite --no-data \
|
||||
--use-done-feature --all \
|
||||
| sed -e s/+051800/+0261/ >/dev/null
|
||||
* 3b) time git fast-export --show-original-ids --signed-tags=strip \
|
||||
--tag-of-filtered-object=rewrite --no-data \
|
||||
--use-done-feature --all \
|
||||
| stupid.py >/dev/null
|
||||
* 4) time git fast-export --show-original-ids --signed-tags=strip \
|
||||
--tag-of-filtered-object=rewrite --no-data \
|
||||
--use-done-feature --all \
|
||||
| sed -e s/+051800/+0261/ \
|
||||
| git fast-import --force --quiet >/dev/null
|
||||
* 5) time git repo-filter --invert-paths --path pushgems.rb
|
||||
(with early quit right before removing unused refs)
|
||||
* 6) time python -m cProfile -o repo-filter.profile \
|
||||
~/floss/git-repo-filter/git-repo-filter \
|
||||
--invert-paths --path pushgems.rb
|
||||
* 7) time java -jar ~/Downloads/bfg-1.13.0.jar --delete-files pushgems.rb
|
||||
|
||||
|
||||
1: 3.910 fast-export
|
||||
2: 3.958 fast-export + save output
|
||||
3: 4.128 fast-export + sed (but toss output)
|
||||
3a: 4.234 fast-export + python stdin using 'for' iterator
|
||||
3b: 4.189 fast-export + python stdin using readline
|
||||
3c:27.796 fast-export + python from subprocess using readline
|
||||
3d: 4.196 fast-export + python from subprocess using 'for' iterator
|
||||
3e: 4.580 fast-export + python3 from subprocess using readline
|
||||
3f: 5.334 fast-export + python3 from subprocess using 'for' iterator
|
||||
3g: 4.264 fast-export + python from subprocess using readline & bufsize
|
||||
4: 11.279 fast-export + sed + fast-import
|
||||
5: 64.098 filter-repo
|
||||
5: 35.914 filter-repo, after bufsize=-1 for subprocess stuff
|
||||
6: 69.150 filter-repo run under cProfile
|
||||
7: 20.155 bfg
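The gap between 3c and 3g above comes down to pipe buffering. A minimal sketch of reading a child's output with `bufsize=-1`, using a small Python child process as a stand-in for `git fast-export`. Note the assumption: these timings were taken on Python 2, where `Popen` defaulted to `bufsize=0`; Python 3 already defaults to `-1`, so there the argument is only explicit documentation.

```python
import subprocess
import sys


def read_lines(cmd, bufsize=-1):
    """Run cmd and return its stdout lines, reading through a buffered pipe."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, bufsize=bufsize)
    lines = [line for line in proc.stdout]  # 'for' iterator over the pipe
    proc.stdout.close()
    proc.wait()
    return lines


# Stand-in child process; a real run would pass a git fast-export command.
child = [sys.executable, '-c', 'for i in range(1000): print(i)']
lines = read_lines(child)
```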
|
||||
|
||||
Other Notes:
|
||||
* cProfile:
|
||||
python -m cProfile -o repo-filter.profile \
|
||||
~/floss/git-repo-filter/git-repo-filter \
|
||||
--invert-paths --path pushgems.rb
|
||||
python
|
||||
>>> import pstats
|
||||
>>> p = pstats.Stats('repo-filter.profile')
|
||||
>>> p.strip_dirs().sort_stats('cumtime').print_stats()
|
||||
* reports 64.2% of time in readline()
|
||||
* reports 37.0% of time under _advance_currentline
|
||||
|
||||
|
||||
Argument parsing stuff:
|
||||
# NOT YET IMPLEMENTED OPTIONS BELOW
misc.add_argument('--empty-pruning', choices=['always', 'auto', 'never'],
                  default='auto',
                  help='''The default, auto, will check if filtering
                          causes commits to become empty (have no file
                          changes and only have one parent) and prune them
                          if so. This pruning can also cause merge
                          commits to have fewer parents and possibly
                          become empty themselves, and thus be pruned.
                          Further, any branch or tag whose entire history
                          is pruned due to becoming empty will be pruned.
                          However, auto will not prune commits which
                          started out empty in the original repo and have
                          a non-pruned parent.''')
misc.add_argument('--store-backup', default=None,
                  metavar='NAMESPACE', dest='backup',
                  help='Store a copy of original refs under refs/NAMESPACE/')
misc.add_argument('--keep-excluded-refs', action='store_true',
                  help='''If refs are excluded either explicitly (e.g.
                          ^master) or implicitly (e.g. a branch in the
                          history of an excluded ref/revision, or a branch
                          not listed in the set of revisions to filter),
                          then that ref will be deleted by the filtering
                          process. Use --keep-excluded-refs to retain
                          such refs.''')

misc.add_argument('--keep-excluded-revisions', action='store_true',
                  help='''If negative revisions are provided to exclude
                          the range of history we are filtering over (e.g.
                          negative_branch..master or ^negative_branch_1
                          ^negative_branch_2 master develop), then by
                          default any commits in the history of those
                          revisions are excluded from the filtered history
                          (resulting in the first not-excluded commit in
                          history becoming a root commit and often
                          containing an unusually large number of file
                          changes). With --keep-excluded-revisions, those
                          commits are all retained (in their unfiltered
                          form).''')
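The option definitions above are fragments that assume an existing `misc` argument group. A minimal sketch of how they might be wired into a standalone parser; the program name, the group title, and the shortened help strings are assumptions, and the options remain unimplemented in the tool itself.

```python
import argparse


def build_parser():
    """Build a parser holding only the not-yet-implemented options."""
    parser = argparse.ArgumentParser(prog='git-repo-filter')
    # Hypothetical group title; the notes only show the variable name 'misc'.
    misc = parser.add_argument_group('Miscellaneous options')
    misc.add_argument('--empty-pruning', choices=['always', 'auto', 'never'],
                      default='auto',
                      help='Whether to prune commits that become empty')
    misc.add_argument('--store-backup', default=None,
                      metavar='NAMESPACE', dest='backup',
                      help='Store a copy of original refs under '
                           'refs/NAMESPACE/')
    misc.add_argument('--keep-excluded-refs', action='store_true',
                      help='Retain refs that filtering would delete')
    misc.add_argument('--keep-excluded-revisions', action='store_true',
                      help='Retain excluded commits in unfiltered form')
    return parser
```

For example, `build_parser().parse_args(['--store-backup', 'orig'])` stores the namespace under `args.backup` because of `dest='backup'`.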
|
@ -0,0 +1,96 @@
|
||||
#!/bin/bash

# Require two repositories and an optional --summary flag.
if [[ $# -lt 2 || $# -gt 3 ]]; then
  echo "Syntax:"
  echo "  $0 REPO1 REPO2 [--summary]"
  exit 1
fi
repo1="$1"
repo2="$2"
detail=1
if [ $# -eq 3 ]; then
  if [ "$3" != "--summary" ]; then
    echo "Unrecognized argument: $3"
    exit 1
  fi
  detail=
fi

if ( ! (cd "$repo1" && git rev-parse --git-dir > /dev/null) ); then
  echo "$repo1 is not a directory or does not have a git repository!"
  exit 1
fi
if ( ! (cd "$repo2" && git rev-parse --git-dir > /dev/null) ); then
  echo "$repo2 is not a directory or does not have a git repository!"
  exit 1
fi

tempfile=$(mktemp)
# Remove the temporary file on every exit path, including the early exit below.
trap 'rm -f "$tempfile"' EXIT

#
# Compare branches and tags for identicalness
#
diff -u <(cd "$repo1" && git show-ref -h --heads --tags) <(cd "$repo2" && git show-ref -h --heads --tags) > "$tempfile"
if [ $? != 0 ]; then
  echo -n "Branches & tags do not match"
  if [ -n "$detail" ]; then
    echo "; differences:"
    cat "$tempfile"
  else
    echo "."
  fi
else
  echo "* Branches and tags match exactly"
  exit 0
fi

#
# Compare branch names
#
diff -u <(cd "$repo1" && git for-each-ref --format="%(refname)" | grep refs/heads/) <(cd "$repo2" && git for-each-ref --format="%(refname)" | grep refs/heads/) > "$tempfile"
if [ $? != 0 ]; then
  echo -n "Branch names do not match"
  if [ -n "$detail" ]; then
    echo "; differences:"
    cat "$tempfile"
  else
    echo "."
  fi
else
  echo "* Branch names match"
fi

#
# Compare trees of branches
#
diff -u <(cd "$repo1" && git rev-parse $(git for-each-ref --format="%(refname)" | grep refs/heads/ | sed -e 's/$/^{tree}/')) <(cd "$repo2" && git rev-parse $(git for-each-ref --format="%(refname)" | grep refs/heads/ | sed -e 's/$/^{tree}/')) > "$tempfile"
if [ $? != 0 ]; then
  echo -n "Trees of branches do not match"
  if [ -n "$detail" ]; then
    echo "; differences:"
    cat "$tempfile"
  else
    echo "."
  fi
else
  echo "* Trees of branches match"
fi

#
# Compare number of commits on each branch
#
diff -u <(cd "$repo1" && for i in $(git for-each-ref --format="%(refname)" | grep refs/heads/); do count=$(git rev-list "$i" | wc -l); printf "%5d %s\n" $count "$i"; done) <(cd "$repo2" && for i in $(git for-each-ref --format="%(refname)" | grep refs/heads/); do count=$(git rev-list "$i" | wc -l); printf "%5d %s\n" $count "$i"; done) > "$tempfile"
if [ $? != 0 ]; then
  echo -n "Branch commit counts do not match"
  if [ -n "$detail" ]; then
    echo "; differences:"
    cat "$tempfile"
  else
    echo "."
  fi
else
  echo "* Branch commit counts match"
fi
|
@ -0,0 +1,102 @@
|
||||
Background:
|
||||
Desire to combine, split-apart, or clean up repositories
|
||||
Examples: pgdev, nucleus, willamette
|
||||
Example, want:
|
||||
Only certain paths (a specific directory)
|
||||
move into a subdirectory
|
||||
rename tags to not conflict
|
||||
Filter-branch command (takes 65.950 seconds, or 15.594 seconds):
|
||||
|
||||
|
||||
time git filter-branch --tree-filter 'mkdir -p modules && git ls-files | grep -v ^src/main/java/com/palantir/annotation | xargs git rm -f -q && ls -d * | grep -v modules | xargs -I files mv files modules/' --tag-name-filter 'echo "table-helper-$(cat)"' --prune-empty -- --all
|
||||
|
||||
|
||||
Faster version (takes 37.802 seconds, or 6.287 seconds):
|
||||
|
||||
|
||||
time git filter-branch --index-filter 'git ls-files | grep -v ^src/main/java/com/palantir/annotation | xargs git rm -q --cached; git ls-files -s | sed "s-$(printf \\t)-&modules/-" | git update-index --index-info; git ls-files | grep -v ^modules/ | xargs -r git rm -q --cached' --tag-name-filter 'echo "table-helper-$(cat)"' --prune-empty -- --all
|
||||
|
||||
|
||||
Caveats:
|
||||
Really complicated to come up with
|
||||
Googled solutions may be subtly os- or case- specific (sed, xargs, '*' above)
|
||||
(I know git & bash & gnu vs. bsd, fixed filter-branch, etc.)
|
||||
Error Prone:
|
||||
mixing old and new history
|
||||
safety -- how to restore (refs/original hard; annotated tags may be missing)
|
||||
pruning of empty commits overeager
|
||||
Painful, but possible:
|
||||
selecting stuff to keep (as opposed to removing)
|
||||
renaming files
|
||||
figuring out what to remove (--analyze)
|
||||
shrinking (man-page is misleading...)
|
||||
Limiting:
|
||||
speed
|
||||
commit message rewriting
|
||||
Compare:
|
||||
|
||||
git repo-filter --analyze
|
||||
|
||||
time git repo-filter --path src/main/java/com/palantir/annotation --subdirectory-filter modules
|
||||
|
||||
|
||||
|
||||
**********************************************************************
|
||||
|
||||
Before demo tomorrow:
|
||||
Submit git patch
|
||||
Come up with basic demo and what to discuss
|
||||
|
||||
issues:
|
||||
common:
|
||||
no up-front report to help find what to remove
|
||||
painful to select things to keep
|
||||
shrinking is extra painful step
|
||||
|
||||
git-filter-branch issues:
|
||||
doesn't rewrite commit messages
|
||||
slow
|
||||
mixes old and new history (& needs help to remove big objects)
|
||||
pruning of empty commits is possible but overbearing hammer
|
||||
painful to rename
|
||||
safety: if using '--tag-name-filter cat', annotated tags NOT backed up
|
||||
|
||||
bfg:
|
||||
cannot rename
|
||||
does not prune empty commits
|
||||
|
||||
git-filter-branch:
|
||||
|
||||
65.950 time git filter-branch --tree-filter 'mkdir -p modules && git ls-files | grep -v ^src/main/java/com/palantir/annotation | xargs git rm -f -q && ls -d * | grep -v modules | xargs -I files mv files modules/' --tag-name-filter 'echo "table-helper-$(cat)"' --prune-empty -- --all
|
||||
|
||||
37.802 time git filter-branch --index-filter 'git ls-files | grep -v ^src/main/java/com/palantir/annotation | xargs git rm -q --cached; git ls-files -s | sed "s-$(printf \\t)-&modules/-" | git update-index --index-info; git ls-files | grep -v ^modules/ | xargs git rm -q --cached' --tag-name-filter 'echo "table-helper-$(cat)"' --prune-empty -- --all
|
||||
|
||||
|
||||
time git clone ../whatever newcopy
|
||||
du -ks .git
|
||||
git for-each-ref --format="delete %(refname)" refs/tags/ | grep -v refs/tags/foo- | git update-ref --stdin
|
||||
time git gc --prune=now
|
||||
du -ks .git
|
||||
|
||||
0.660 time git repo-filter --path src/main/java/com/palantir/annotation --path-rename :modules/
|
||||
|
||||
git for-each-ref --format="delete %(refname)" refs/tags/ | grep -v refs/tags/foo- | git update-ref --stdin
|
||||
|
||||
git for-each-ref --format="delete %(refname)" refs/original/ | git update-ref --stdin
|
||||
git reflog expire --expire=now --all
|
||||
git gc --prune=now
|
||||
|
||||
BFG:
|
||||
bfg --delete-from <(git rev-list --objects --all | awk {print\$2} | grep -v ^$ | sort | uniq | grep -v $DIR_OF_INTEREST)
|
||||
git fetch . refs/tags/*:refs/tags/foo-*
|
||||
git show-ref --tags | awk {print\$2} | grep -v refs/tags/foo- | sed -e 's/^/delete /' | git update-ref --stdin
|
||||
git reflog expire --expire-unreachable=now
|
||||
git gc --prune=now
|
||||
|
||||
***************************************************************************
|
||||
|
||||
rails:
|
||||
5252.036 time git filter-branch --tree-filter 'rm -f pushgems.rb' --tag-name-filter cat -- --all
|
||||
1962.735 time git filter-branch --index-filter 'git rm --cached --ignore-unmatch pushgems.rb' --tag-name-filter cat -- --all
|
||||
39.715 time git repo-filter --invert-paths --path pushgems.rb
|
||||
33.169 <same, but with early exit>
|
@ -0,0 +1,48 @@
|
||||
git-filter-branch
|
||||
Ease-of-use differences:
|
||||
Easier path selection and renaming
|
||||
Rewrite sha1sums (and abbreviations) in commit messages
|
||||
Defaults to pruning empty commits (but only BECOME empty commits)
|
||||
- (Technical notes, on kinds of empty:
|
||||
- Empty due to blob filtering resulting in later patch becoming empty
|
||||
- Empty due to path filtering
|
||||
- Empty branch causing merge to lose parent(s) -- 3 styles
|
||||
- One or more parents had no changes themselves or in their history
|
||||
- Most recent non-empty commit on all branches was either the
|
||||
merge-base or an ancestor (i.e. keeping the merge commit would
|
||||
mean merging a commit with itself)
|
||||
- Most recent non-empty commit on one parent's side of history is
|
||||
an ancestor of another parent (i.e. that side no longer has any
|
||||
interesting changes, and the parent corresponding to the empty
|
||||
side should be removed)
|
||||
- Empty ref due to entire history before it being empty)
|
||||
Deletes stuff not requested in the rewrite (unless overridden), so that
|
||||
it doesn't confuse user or accidentally get re-pushed
|
||||
Typically far faster to execute
|
||||
Bails if not in a clean clone by default
|
||||
- Users have a far easier time restoring if they can just nuke the clone
|
||||
- Avoids the default need for users to mess with backups of original refs
|
||||
(either for restoration, or for pruning to make sure repo is clean)
|
||||
Repacks and shrinks repo for you (unless overridden)
|
||||
- Makes it easier to ensure you've cleaned out unwanted stuff
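The "only BECOME empty" pruning default in the list above can be sketched as a small decision function. This is a simplified reading of these notes and of the planned `--empty-pruning=auto` help text, not the tool's actual logic; the function name and arguments are hypothetical, and `parents` is assumed to already reflect any parent pruning or coalescing.

```python
def should_prune(had_changes_originally, file_changes, parents):
    """Decide whether a filtered commit is dropped under 'auto' pruning.

    had_changes_originally: True if the commit changed files before filtering.
    file_changes: file changes remaining after filtering.
    parents: parents remaining after pruning and coalescing.
    """
    if file_changes:
        return False  # the commit still changes something
    if len(parents) >= 2:
        return False  # the commit remains a merge
    if had_changes_originally:
        return True   # became empty because of filtering
    # Started out empty: keep it unless no parent is left at all.
    return len(parents) == 0
```

The merge-related cases (a parent that is now an ancestor of another parent, or all parents collapsing to the merge-base) would have to be resolved before calling this, since they change what `parents` contains.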
|
||||
|
||||
Advantages over git-repo-filter:
|
||||
- Filters every file once per revision even if unmodified between commits;
|
||||
allows filtering differently for different commits.
|
||||
|
||||
|
||||
|
||||
bfg repo-cleaner
|
||||
Ease-of-use differences:
|
||||
Automatic repack and shrink repo (instead of documenting extra steps)
|
||||
No stupid 'fix your current branch first manually, then run'
|
||||
Pathname inclusion, not just exclusion
|
||||
Full pathname matching, instead of just *basename* (globs for basename)
|
||||
|
||||
Capability differences:
|
||||
Prunes commits which become empty due to filtering
|
||||
Lots of general filtering options outside of removing a few big files
|
||||
|
||||
Advantages of BFG repo cleaner:
|
||||
- Very focused on just removing crazy big files, and sensitive data
|
||||
-
|
@ -0,0 +1,65 @@
|
||||
rails (git clone https://github.com/rails/rails)
|
||||
Timings of: time git repo-filter --invert-paths --path pushgems.rb
|
||||
|
||||
64.098 Starting point
|
||||
35.914 After using bufsize=-1 on output only subprocess stuff
|
||||
27.777 After removing fi_input/fi_output write/read for sha1sum mapping
|
||||
20.980 After removing fi_input/fi_output write/read for check_merge_if_empty
|
||||
|
||||
Other important factors:
|
||||
|
||||
Am I calling is_ancestor too much? (Only call with pruned parents)
|
||||
Unnecessary re-computation of 'epoch' (calling fromtimestamp)
|
||||
Excessive calls to re.compile
|
||||
Why is posix.waitpid so long?
|
||||
Can parse_user be sped up by if..endswith rather than try..except?
|
||||
Memoize filename remapping in order to speed up tweak_commit?
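One way the memoization question above could be explored is with `functools.lru_cache` (Python 3). A minimal sketch, where the rename rule and the constants are hypothetical and the cached function stands in for the whole dequote, modify, requote chain rather than reproducing `newname` from the profile below.

```python
from functools import lru_cache

# Hypothetical rename rule: move every path under modules/.
OLD_PREFIX = ''
NEW_PREFIX = 'modules/'


@lru_cache(maxsize=None)
def newname(path):
    """Return the remapped path; repeated paths are served from the cache."""
    if path.startswith(OLD_PREFIX):
        return NEW_PREFIX + path[len(OLD_PREFIX):]
    return path
```

Since the same filenames recur across many commits, `newname.cache_info()` gives a quick read on the hit rate; the unbounded cache trades memory for speed on repositories with very many distinct paths.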
|
||||
|
||||
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    83488    1.830    0.000   19.022    0.000 git-repo-filter:989(_parse_commit)
    33314    1.192    0.000    1.650    0.000 git-repo-filter:123(is_ancestor)
   997617    1.108    0.000    1.108    0.000 {method 'match' of '_sre.SRE_Pattern' objects}
  1020663    1.083    0.000    1.083    0.000 {method 'readline' of 'file' objects}
   334486    0.995    0.000    1.535    0.000 {built-in method fromtimestamp}
  1081102    0.985    0.000    0.991    0.000 re.py:230(_compile)
    83476    0.902    0.000    3.564    0.000 git-repo-filter:490(dump)
       11    0.855    0.078    0.855    0.078 {posix.waitpid}
   417904    0.803    0.000    2.685    0.000 git-repo-filter:807(_parse_optional_filechange)
   167255    0.640    0.000    1.066    0.000 git-repo-filter:56(__init__)
   167255    0.586    0.000    3.186    0.000 git-repo-filter:871(_parse_user)
   997598    0.560    0.000    2.589    0.000 re.py:138(match)
  1284279    0.529    0.000    0.529    0.000 {method 'write' of 'file' objects}
   167231    0.492    0.000    1.629    0.000 git-repo-filter:42(_write_date)
   668972    0.485    0.000    0.485    0.000 git-repo-filter:83(dst)
    83488    0.463    0.000    1.006    0.000 git-repo-filter:2255(tweak_commit)
   334394    0.428    0.000    0.654    0.000 git-repo-filter:410(dump)
    83488    0.417    0.000    0.674    0.000 collections.py:50(__init__)
  1020663    0.408    0.000    1.492    0.000 git-repo-filter:766(_advance_currentline)
    83488    0.353    0.000    0.377    0.000 {method 'sub' of '_sre.SRE_Pattern' objects}
   334416    0.331    0.000    0.439    0.000 git-repo-filter:381(__init__)
  1796776    0.304    0.000    0.304    0.000 {method 'startswith' of 'str' objects}
        1    0.271    0.271   19.961   19.961 git-repo-filter:1367(run)
   100432    0.260    0.000    0.618    0.000 git-repo-filter:784(_parse_optional_parent_ref)
   334416    0.254    0.000    0.497    0.000 git-repo-filter:2267(newname)
|
||||
|
||||
|
||||
Python commands:
|
||||
|
||||
$ python -m cProfile -o repo-filter.profile \
|
||||
~/floss/git-repo-filter/git-repo-filter \
|
||||
--invert-paths --path pushgems.rb
|
||||
|
||||
Just showing basic stats ('cumtime' and 'tottime' seem to be what matter):
|
||||
import pstats
|
||||
p = pstats.Stats('repo-filter.profile')
|
||||
p.strip_dirs().sort_stats('cumtime').print_stats()
|
||||
|
||||
Writing to some other string instead of stdout:
|
||||
import cStringIO
a = cStringIO.StringIO()
|
||||
p = pstats.Stats('repo-filter.profile', stream=a)
|
||||
p.strip_dirs().sort_stats('tottime').print_stats()
|
||||
|
||||
Get various data out of the written output
|
||||
lines = a.getvalue().splitlines()[7:-2]
|
||||
sum(float(line.split(None, 5)[1]) for line in lines)
|
||||
print('\n'.join(' '.join(line.split(None, 5)[1:6:4]) for line in lines))
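The snippets above are Python 2 (`cStringIO`). A rough Python 3 equivalent is sketched below, assuming `io.StringIO` as the stream and profiling a small stand-in function instead of loading `repo-filter.profile`; the helper names are hypothetical.

```python
import cProfile
import io
import pstats


def busy(n):
    """Stand-in workload so the sketch needs no saved profile file."""
    return sum(i * i for i in range(n))


def profile_report(func, *args):
    """Profile func(*args) and return the pstats report as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.strip_dirs().sort_stats('tottime').print_stats()
    return stream.getvalue()


report = profile_report(busy, 10000)
```

To post-process a saved profile instead, `pstats.Stats('repo-filter.profile', stream=stream)` takes the place of passing the profiler object.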
|
@ -0,0 +1,183 @@
|
||||
----- Short version -----
|
||||
|
||||
As suggested by Ævar[1], I am proposing git repo-filter for inclusion
|
||||
in git.git. I hope that my documentation included in the repo-filter
|
||||
repository[2] can answer questions you have about it; if it does not,
|
||||
that may indicate I need to supplement its documentation. However, I
|
||||
am happy to answer any and all questions you may have about the tool;
|
||||
fire away.
|
||||
|
||||
|
||||
Basic Info:
|
||||
|
||||
git repo-filter is a tool for rewriting history that includes some
|
||||
capabilities I have not found anywhere else. It is most similar to
|
||||
filter-branch, though it has a significantly different taste in
|
||||
usability. Also, being based on fast-export/fast-import, it is orders of
|
||||
magnitude faster (it has speed roughly comparable to BFG repo cleaner,
|
||||
but isn't multi-threaded).
|
||||
|
||||
repo-filter is a ~2500 (FIXME) line single-file python script,
|
||||
depending only on the python standard library (and execution of git
|
||||
commands), all of which is designed to make build/installation
|
||||
trivial: you just need to copy it into your $PATH.
|
||||
|
||||
|
||||
[1] https://public-inbox.org/git/87r2fq3b9t.fsf@evledraar.gmail.com/
|
||||
[2] Currently tracked at https://github.com/newren/git-repo-filter,
|
||||
but the plan would be to instead point people at git.git if it is
|
||||
merged. (And if it is merged, the merge should just delete its
|
||||
antique fork of t/test-lib.sh and its README.md.)
|
||||
|
||||
|
||||
----- Intermediate length version -----
|
||||
|
||||
As suggested by Ævar[1], I am proposing git repo-filter[2] for inclusion
|
||||
in git.git. There are a few issues that make me wonder if the git
|
||||
community will want it; I've done my best to explain and address
|
||||
these below.
|
||||
|
||||
Sorry for the lengthy email; feel free to skim for whatever bits seem
|
||||
relevant to you.
|
||||
|
||||
|
||||
Basic background
|
||||
----------------
|
||||
|
||||
git repo-filter is a tool for rewriting history. It has a significantly
|
||||
different taste in usability than filter-branch, and being based on
|
||||
fast-export/fast-import, is orders of magnitude faster (it has speed
|
||||
roughly comparable to BFG repo cleaner, but isn't multi-threaded). It
|
||||
includes some capabilities I have not found anywhere else.
|
||||
|
||||
|
||||
Important inclusion information
|
||||
-------------------------------
|
||||
|
||||
1. Build: No special build rules required; it's a single-file script
|
||||
to simplify build/installation. Its only dependencies are
|
||||
git and python. This python script only uses the python
|
||||
standard library, so no extra python packages are needed.
|
||||
|
||||
2. Tests: (FIXME) git-style end-to-end tests (using an ancient fork of
|
||||
test-lib.sh from git.git) are in use, making the inclusion
|
||||
into git trivial. There are also some python-style unit
|
||||
tests, though these are also invoked from a test in the
|
||||
end-to-end suite so no additional tooling is needed.
|
||||
|
||||
3. Documentation: (FIXME) Built-in help and git-style asciidoc man-page
|
||||
already included.
|
||||
|
||||
|
||||
Possible reasons to exclude from git.git
|
||||
----------------------------------------
|
||||
|
||||
1. Portability: repo-filter is written in Python, which I've heard
|
||||
is difficult for some platforms where git is run.
|
||||
|
||||
2. Maintainability/EOL decisions: repo-filter is (currently) written
|
||||
in Python 2 rather than Python 3.
|
||||
|
||||
3. User story: Since repo-filter will not and can not be backward
|
||||
compatible with filter-branch, we inevitably would have two tools
|
||||
for rewriting history. Some may see that as confusing to users,
|
||||
especially since I didn't just implement a slightly different
|
||||
feature set: I fixed usability warts by changing a few basic
|
||||
underlying assumptions.
|
||||
|
||||
|
||||
Counter-arguments against exclusion
|
||||
-----------------------------------
|
||||
|
||||
1) Portability:
|
||||
|
||||
1a) repo-filter only uses the python standard library, simplifying
|
||||
the porting story significantly.
|
||||
1b) repo-filter is a single file script. While it is even longer than
|
||||
git-send-email.perl, putting it on the big side, this does mean
|
||||
no special build instructions are needed.
|
||||
1c) repo-filter is not a daily-use tool, nor is it a collaboration
|
||||
tool. It's a tool that one person on your team uses once in
|
||||
maybe five years, then shares the results with everyone once. Thus,
|
||||
portability to esoteric platforms is perhaps less critical than it
|
||||
is for other components of git.
|
||||
|
||||
2) *shrug*. repo-filter was started by importing git-fast-filter[3]
|
||||
(which was in Python 2), and I haven't bothered porting. I have often
|
||||
worked with older enterprise distros, so I am a bit of a laggard with
|
||||
the Python 3 transition. If others find this worrisome, I can work on
|
||||
porting.
|
||||
|
||||
3) I've already made this email too long so I'll summarize; let me
|
||||
know if you want more detail. In short: repo-filter enables
|
||||
usage on repositories for which filter-branch is just completely
|
||||
impractical, and also has new capabilities that I cannot even
|
||||
emulate within filter-branch. But it's more than just that.
|
||||
While filter-branch is a nifty easy-to-use tool for a few very
|
||||
simple cases and has enough versatility to sometimes handle more
|
||||
complex cases, the complexity increases rapidly and some of
|
||||
the underlying assumptions make for greater user confusion and/or
|
||||
cause problems in trying to use several different features for
|
||||
the same filtering operation. As such, I think a tool designed
|
||||
for larger filtering operations or less sophisticated users of
|
||||
necessity needs to change some basic things about how
|
||||
filter-branch operates, which implies it must be a new different
|
||||
tool.
|
||||
|
||||
|
||||
So...thoughts?
|
||||
|
||||
Thanks,
|
||||
Elijah
|
||||
|
||||
|
||||
[1] https://public-inbox.org/git/87r2fq3b9t.fsf@evledraar.gmail.com/
|
||||
[2] Currently tracked at https://github.com/newren/git-repo-filter,
|
||||
but the plan would be to instead point people at git.git if it's
|
||||
included.
|
||||
[3] https://public-inbox.org/git/51419b2c0904072035u1182b507o836a67ac308d32b9@mail.gmail.com/
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
Background:
|
||||
Desire to combine, split-apart, or clean up repositories
|
||||
Examples: pgdev, nucleus, willamette
|
||||
Example, want:
|
||||
Only certain paths (a specific directory)
|
||||
move into a subdirectory
|
||||
rename tags to not conflict
|
||||
Filter-branch command (takes 65.950 seconds, or 15.594 seconds):
|
||||
|
||||
|
||||
time git filter-branch --tree-filter 'mkdir -p modules && git ls-files | grep -v ^src/main/java/com/palantir/annotation | xargs git rm -f -q && ls -d * | grep -v modules | xargs -I files mv files modules/' --tag-name-filter 'echo "table-helper-$(cat)"' --prune-empty -- --all
|
||||
|
||||
|
||||
Faster version (takes 37.802 seconds, or 6.287 seconds):
|
||||
|
||||
|
||||
time git filter-branch --index-filter 'git ls-files | grep -v ^src/main/java/com/palantir/annotation | xargs git rm -q --cached; git ls-files -s | sed "s-$(printf \\t)-&modules/-" | git update-index --index-info; git ls-files | grep -v ^modules/ | xargs -r git rm -q --cached' --tag-name-filter 'echo "table-helper-$(cat)"' --prune-empty -- --all
|
||||
|
||||
|
||||
Caveats:
|
||||
Really complicated to come up with
|
||||
Googled solutions may be subtly os- or case- specific (sed, xargs, '*' above)
|
||||
(I know git & bash & gnu vs. bsd, fixed filter-branch, etc.)
|
||||
Error Prone:
|
||||
mixing old and new history
|
||||
safety -- how to restore (refs/original hard; annotated tags may be missing)
|
||||
pruning of empty commits overeager
|
||||
Painful, but possible:
|
||||
selecting stuff to keep (as opposed to removing)
|
||||
renaming files
|
||||
figuring out what to remove (--analyze)
|
||||
shrinking (man-page is misleading...)
|
||||
Limiting:
|
||||
speed
|
||||
commit message rewriting
|
||||
Compare:
|
||||
|
||||
git repo-filter --analyze
|
||||
|
||||
time git repo-filter --path src/main/java/com/palantir/annotation --subdirectory-filter modules
|