Before widely announcing:
  - Cleanups:
    - Remove usage of 'codecs' library; PathQuoting is better
    - sanity_checks() call in run() should be moved to constructor
    - is_bare should check for self._args.target repo
    - orig_refs is also only relevant for self._args.target repo
  - Notes on splitting
    - exporter needs to know the pipe combination for commit message rewriting
    - commit message rewriting gets weird if commits held in memory for later
    - pruning gets weird too
    - need to handle both 2 exports -> 1 import, 1 export -> 2 imports,
      no exports (except one created manually) -> 1 import
  - Test setup
    - Fix lib-oriented tests
    - Add several more tests, particularly around:
      - commit pruning
        - pruning commits that become empty
        - pruning commits that started empty and have no parent
        - not pruning commits that have changes or remain a merge commit
        - pruning parent(s) of a merge
        - coalescing common commits of a merge
        - coalescing parents of a merge when one is an ancestor of the other
      - ref pruning
        - tags pointing at commits which are pruned along with their history
        - refs pointing at commits which are pruned along with their history
        - refs or tags behind a negative revision specification
      - commit message rewriting
      - renaming, particularly when it causes collisions
    - use coverage.py to direct test writing
  - Check whether the version of git in use supports the appropriate flags
  - Blob rewriting (BFG-like replacements); will need to update check-if-empty
  - Rewrite history
    - rename to git-repo-filter
    - Change newren@palantir.com to newren@gmail.com
    - remove stupid files
    - rename t9302-fast-filter.sh to t9391-repo-filter-lib-usage.sh
    - s/testcases/t/, for sucking into git
    - do renames in analysis; modify file contents as necessary for those changes
    - prefix commit messages with "repo-filter:"
    - postfix messages with Signed-off-by (and add enewren@sandia.gov for Jim's)

Generate upstream patches:
  - Tags of tags of commits fail to export:
    - In git.git, try:
      $ git fast-export --no-data --use-done-feature --signed-tags=strip \
          --tag-of-filtered-object=rewrite-feature v1.0rc1 >/dev/null
      fatal: tag 5f4cd4ca015dc795b9f7f4fed11b3f80a60ac175 tags unexported tag!

Bigger ideas
  - 1st step, create local branches for each remote tracking branch:
      git fetch . refs/remotes/origin/*:refs/heads/*
    also, nuke refs/remotes/origin/*; it won't match upstream anyway
  - Performance:
    - Smarter record_remapping -- do it lazily
    - Unnecessary re-computation of 'epoch' (calling fromtimestamp)
      ...and perhaps just unnecessary use of FixedTimeZone when most of the
      time it will not be checked or modified?
    - What part of _parse_commit takes so much time?
    - What part of commit.dump takes so much time?
    - Speed up _parse_optional_filechange using str.split(None, 3) instead of re
    - Which wait() are we waiting on?
    - Smarter become-empty checks; only do more expensive checks if:
      - First parent is no longer original first parent or ancestor thereof
        - e.g. first-parent history empty, second parent becomes first parent
        - e.g. --parent-filter causes some kind of graft operation (although
          maybe we don't want to prune in this case anyway...)
      - Blob filtering is active AND the only file_changes involved correspond
        to filenames that have previously been modified.
    - Regex optimization
      - memoize (or just outright store?) filename remapping
      - memoize net result: dequote -> do mods -> requote
  - Work with submodules
  - Important features
    - paths-from-file (--paths-from-file <(git ls-tree -r HEAD))
    - include-old-names-of-specified-files
      - so users don't have to look for rename data from --analyze
    - --use-mailmap (point to "MAPPING AUTHORS" in git-shortlog)
  - Do git rev-list --count to get idea of amount of work; show progress

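A rough sketch of the str.split(None, 3) idea from the Performance list, for
the "M <mode> <dataref> <path>" filechange lines in a fast-export stream. The
function names and the regex are illustrative assumptions, not the actual
_parse_optional_filechange code. Paths are returned still quoted; dequoting
would remain a separate step.

```python
import re

# Hypothetical regex of the kind a parser might use for modify lines.
_MODIFY_RE = re.compile(rb'M (\d+) (\S+) (.*)\n$')


def parse_modify_re(line):
    """Regex-based parse; returns (mode, dataref, path) or None."""
    m = _MODIFY_RE.match(line)
    if not m:
        return None
    return m.group(1), m.group(2), m.group(3)


def parse_modify_split(line):
    """Same parse using split; maxsplit=3 keeps spaces in the path intact."""
    fields = line.rstrip(b'\n').split(None, 3)
    if len(fields) != 4 or fields[0] != b'M':
        return None
    return fields[1], fields[2], fields[3]
```

Both functions should agree on well-formed modify lines, so timeit on a saved
fast-export stream would show whether the split version is worth adopting.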
Left over bits:
  - Fix up --analyze
    * shouldn't allow running --analyze with negative refspecs
    * add a --no-detect-renames option (for performance)
  - renames & copies can cause commits to become empty
  - metadata
    - On second and subsequent runs, update metadata instead of overwriting
      - for maps, give beginning_hash -> end_hash, not intermediate hashes
      - OR error out if .git/repo-filter already created?
  - error out if any progress messages in stream (can't deal with them unless
    we can pass --cat-blob-fd to fast-import, and that seems non-portable)

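One possible reading of the "beginning_hash -> end_hash" item above, as a
sketch. It assumes the commit maps are available as plain dicts of old hash
to new hash; the on-disk metadata format is not specified here, and
compose_commit_maps is a hypothetical helper name.

```python
def compose_commit_maps(previous, latest):
    """Combine maps from two runs into original -> final.

    previous: original hash -> hash after the earlier run
    latest:   hash before the latest run -> hash after it
    """
    combined = {}
    for original, intermediate in previous.items():
        # Follow the intermediate hash through the latest run, if it changed.
        combined[original] = latest.get(intermediate, intermediate)
    # Commits that only show up in the latest run keep their own mapping.
    intermediates = set(previous.values())
    for old, new in latest.items():
        if old not in intermediates:
            combined.setdefault(old, new)
    return combined
```

With this shape, intermediate hashes never appear as keys in the result, which
is the property the item asks for.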
More path stuff, maybe
  --path-rename-regex
  --path-stream-rename  (invoked once; must read one line then print)
  --path-stream-filter  (invoked once per commit with new files)
  --path-tree-filter
Ref stuff
  --ref-rename
  --ref-stream-rename
Blob filter
  --tree-filter

Safety stuff
  --keep-excluded-revisions
  --keep-excluded-refs
  --store-backup
  --empty-pruning={no/off,auto,always/on}
  --negative-refs={drop,reference}

Other things:
  / when implementing renames, check for collisions.
  - add a filename_callback too, for just editing file names
  - add --skip-cleanup (pruning, gc, etc.; keep reset --hard) for speed
    comparisons
  - get rid of user-run fast-export & fast-import; don't want to have to
    update two callsites.
  - Nuke 'WIP' in commit messages

Late stage stuff:
  Naming
    filter-repo (like filter-branch)
    repo-filter (for preliminary version?)

Performance notes:
  * On rails:
    * 1) time git fast-export --show-original-ids --signed-tags=strip \
                              --tag-of-filtered-object=rewrite --no-data \
                              --use-done-feature --all >/dev/null
    * 2) time git fast-export --show-original-ids --signed-tags=strip \
                              --tag-of-filtered-object=rewrite --no-data \
                              --use-done-feature --all >saved_output
    * 3a) time git fast-export --show-original-ids --signed-tags=strip \
                               --tag-of-filtered-object=rewrite --no-data \
                               --use-done-feature --all \
                               | sed -e s/+051800/+0261/ >/dev/null
    * 3b) time git fast-export --show-original-ids --signed-tags=strip \
                               --tag-of-filtered-object=rewrite --no-data \
                               --use-done-feature --all \
                               | stupid.py >/dev/null
    * 4) time git fast-export --show-original-ids --signed-tags=strip \
                              --tag-of-filtered-object=rewrite --no-data \
                              --use-done-feature --all \
                              | sed -e s/+051800/+0261/ \
                              | git fast-import --force --quiet >/dev/null
    * 5) time git repo-filter --invert-paths --path pushgems.rb
         (with early quit right before removing unused refs)
    * 6) time python -m cProfile -o repo-filter.profile \
                     ~/floss/git-repo-filter/git-repo-filter \
                     --invert-paths --path pushgems.rb
    * 7) time java -jar ~/Downloads/bfg-1.13.0.jar --delete-files pushgems.rb

    1:   3.910 fast-export
    2:   3.958 fast-export + save output
    3:   4.128 fast-export + sed (but toss output)
    3a:  4.234 fast-export + python stdin using 'for' iterator
    3b:  4.189 fast-export + python stdin using readline
    3c: 27.796 fast-export + python from subprocess using readline
    3d:  4.196 fast-export + python from subprocess using 'for' iterator
    3e:  4.580 fast-export + python3 from subprocess using readline
    3f:  5.334 fast-export + python3 from subprocess using 'for' iterator
    3g:  4.264 fast-export + python from subprocess using readline & bufsize
    4:  11.279 fast-export + sed + fast-import
    5:  64.098 filter-repo
    5:  35.914 filter-repo, after bufsize=-1 for subprocess stuff
    6:  69.150 filter-repo run under cProfile
    7:  20.155 bfg

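The gap between 3c and 3g, and the drop from 64.098 to 35.914 for filter-repo,
are consistent with Python 2's subprocess.Popen defaulting to bufsize=0
(unbuffered pipes), which makes each readline() expensive. A minimal sketch of
reading a child's output with an explicit bufsize follows; count_lines is a
hypothetical helper, and the child command in the usage note is only a
stand-in for git fast-export.

```python
import subprocess


def count_lines(cmd, bufsize=-1):
    """Run cmd and count its stdout lines via readline().

    bufsize=-1 asks for the system default buffer size; bufsize=0 gives an
    unbuffered pipe, which was the Python 2 default for Popen.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, bufsize=bufsize)
    count = 0
    # iter() with a sentinel keeps calling readline() until EOF (b'').
    for _line in iter(proc.stdout.readline, b''):
        count += 1
    proc.stdout.close()
    proc.wait()
    return count
```

For example, count_lines([sys.executable, '-c', 'for i in range(1000):
print(i)']) can be timed with bufsize=0 and bufsize=-1 to compare the two
settings on a given interpreter.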
Other Notes:
  * cProfile:
      python -m cProfile -o repo-filter.profile \
             ~/floss/git-repo-filter/git-repo-filter \
             --invert-paths --path pushgems.rb
      python
      >>> import pstats
      >>> p = pstats.Stats('repo-filter.profile')
      >>> p.strip_dirs().sort_stats('cumtime').print_stats()
    * reports 64.2% of time in readline()
    * reports 37.0% of time under _advance_currentline

Argument parsing stuff:
    # NOT YET IMPLEMENTED OPTIONS BELOW
    misc.add_argument('--empty-pruning', choices=['always', 'auto', 'never'],
                      default='auto',
                      help='''The default, auto, will check if filtering
                           causes commits to become empty (have no file
                           changes and only have one parent) and prune them
                           if so.  This pruning can also cause merge
                           commits to have fewer parents and possibly
                           become empty themselves, and thus be pruned.
                           Further, any branch or tag whose entire history
                           is pruned due to becoming empty will be pruned.
                           However, auto will not prune commits which
                           started out empty in the original repo and have
                           a non-pruned parent.''')
    misc.add_argument('--store-backup', default=None,
                      metavar='NAMESPACE', dest='backup',
                      help='Store a copy of original refs under refs/NAMESPACE/')
    misc.add_argument('--keep-excluded-refs', action='store_true',
                      help='''If refs are excluded either explicitly (e.g.
                           ^master) or implicitly (e.g. a branch in the
                           history of an excluded ref/revision, or a branch
                           not listed in the set of revisions to filter),
                           then that ref will be deleted by the filtering
                           process.  Use --keep-excluded-refs to retain
                           such refs.''')

    misc.add_argument('--keep-excluded-revisions', action='store_true',
                      help='''If negative revisions are provided to exclude
                           the range of history we are filtering over (e.g.
                           negative_branch..master or ^negative_branch_1
                           ^negative_branch_2 master develop), then by
                           default any commits in the history of those
                           revisions are excluded from the filtered history
                           (resulting in the first not-excluded commit in
                           history becoming a root commit and often
                           containing an unusually large number of file
                           changes).  With --keep-excluded-revisions, those
                           commits are all retained (in their unfiltered
                           form).''')
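The --empty-pruning help text above describes a small decision rule, which
could be sketched roughly as below. This is only an illustration under
assumptions: should_prune and its parameters are hypothetical names, and the
behavior of 'always' and 'never' is guessed from the option names since the
help text only spells out 'auto'.

```python
def should_prune(mode, has_file_changes, num_parents, started_empty,
                 has_unpruned_parent):
    """Decide whether a filtered commit should be dropped.

    mode: 'always', 'auto' or 'never' (the --empty-pruning choices)
    num_parents: parent count after filtering
    started_empty: commit had no file changes in the original repo
    has_unpruned_parent: at least one original parent survived filtering
    """
    if mode == 'never':  # assumed meaning: never drop anything
        return False
    # "Empty" per the help text: no file changes and not a merge.
    is_empty = not has_file_changes and num_parents <= 1
    if not is_empty:
        return False
    if mode == 'always':  # assumed meaning: drop every empty commit
        return True
    # auto: keep commits that were already empty originally, as long as
    # they still have a surviving parent to hang off.
    if started_empty and has_unpruned_parent:
        return False
    return True
```

A table of cases like the ones listed under "commit pruning" in the test
setup notes could be run against a function of this shape.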