
173 Commits (rebase-i-autosquash-rebase-merges-fails)
 

Author SHA1 Message Date
Elijah Newren a38eb1c3a3 Rename t9302 to t9391 6 years ago
Elijah Newren 37da3044d9 Final work on collab 6 years ago
Elijah Newren 234ed471f7 Initial work towards collab 6 years ago
Elijah Newren 3852bcfb87 Simpler import command for lib-usage; just one example 6 years ago
Elijah Newren cbfca1bb94 Make create_fast_export_output testcase work 6 years ago
Elijah Newren 6c4d15d4fd Add RepoFilter.finish() function for split RepoFilter cases 6 years ago
Elijah Newren 81d91522f9 Move original_id last so external callers can create objects without specifying 6 years ago
Elijah Newren 18686dc4ab WIP hacks that make splice_repo work 6 years ago
Elijah Newren 29ff2972eb FIXME: Workaround to the fact that splicing repo introduces unknown commits 6 years ago
Elijah Newren 6a2a724ea6 fixup (FIXME: find commit) -- sanity_check() doesn't need git_dir; it overwrites it 6 years ago
Elijah Newren 3534a6a877 Initial work towards supporting --source and --target 6 years ago
Elijah Newren 669e971fa0 fixup! Initial version of testing, copied from git.git 6 years ago
Elijah Newren 4fb71e76ea More WIP 6 years ago
Elijah Newren 203a6f312d repo-filter: allow RepoFilter class chaining
Allow each instance to be just input or just output so that we can splice
repos together or split one into multiple different repos.
6 years ago
Elijah Newren a5ca925731 WIP 6 years ago
Elijah Newren ff35c7b737 FIXME: Partial implementation of supporting blob callback 6 years ago
Elijah Newren 35be6e6028 WIP towards making lib-usage work with new style 6 years ago
Elijah Newren 82822cfa89 Temporary untracked files 6 years ago
Elijah Newren b16d416689 Allow RepoFilter.run to be passed callbacks 6 years ago
Elijah Newren 5e50f65692 Group high-level repo filtering functions into a class 6 years ago
Elijah Newren ae6f85e8ad Group repo analysis functions into a class 6 years ago
Elijah Newren ed955138c4 Move sanity_check to put analyze functions before filtering ones 6 years ago
Elijah Newren fdd769fb8a Collect various short functions into a GitUtils helper class 6 years ago
Elijah Newren aa3d1122e5 restructure argument parsing for re-use 6 years ago
Elijah Newren bdb766cacc perf hack -- avoid expensive empty pruning checks
If a commit was a non-merge commit previously, then since we do not do
any kind of blob modifications (or funny parent grafting), there is no
way for a filemodify instruction to introduce the same version of the
file that already existed in the parent; as such, the only check we need
to do to determine whether a commit becomes empty is whether
file_changes is empty.  Subsequent more expensive checks can be skipped.
6 years ago
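Here is a minimal Python sketch of the shortcut described above. It assumes a commit's filtered changes are available as a list and that the expensive emptiness check can be passed in as a callable. `becomes_empty` and `expensive_check` are hypothetical names, not the tool's API.

```python
from typing import Callable, List


def becomes_empty(file_changes: List[str],
                  was_merge: bool,
                  expensive_check: Callable[[], bool]) -> bool:
    """Decide whether a filtered commit has become empty.

    Assumes, as the commit message does, that blobs are not modified and
    parents are not grafted, so a surviving filemodify on a non-merge
    commit can never reproduce the parent's version of a file.
    """
    if not was_merge:
        # Fast path: with no remaining changes the commit is empty,
        # and with any remaining change it cannot be.
        return len(file_changes) == 0
    # Merges still need the full (more expensive) comparison.
    return expensive_check()
```

The callable is only invoked for merges, so non-merge commits never pay for the slow path.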
Elijah Newren d39d0ff7fb perf hack -- do minimal amount of quoting required by fast-import 6 years ago
Elijah Newren 909001cee8 Be more thorough about path quoting, and handle non-ascii 6 years ago
Elijah Newren 84d13364f8 Restructure empty pruning
Split a lot of the logic out into separate functions, and avoid
flattening parents when the original commit history itself had
redundant parents (such as --no-ff merges).
6 years ago
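The next sketch illustrates the difference between intentional redundancy and redundancy caused by pruning. It collapses duplicate parents only when the original commit did not already list a parent twice. `dedupe_parents` is a hypothetical helper. It covers only identical parents, not the --no-ff case where one parent is an ancestor of the other, which would need a reachability query.

```python
from typing import List, Optional


def dedupe_parents(orig_parents: List[str],
                   new_parents: List[Optional[str]]) -> List[str]:
    # Hypothetical convention: new_parents[i] is the rewritten form of
    # orig_parents[i], or None if that parent's whole ancestry was pruned.
    kept = [p for p in new_parents if p is not None]
    if len(set(orig_parents)) < len(orig_parents):
        # The original commit already listed some parent twice, so the odd
        # shape is intentional: preserve it instead of flattening.
        return kept
    # Otherwise any duplicate was created by pruning: keep first occurrences.
    result: List[str] = []
    for p in kept:
        if p not in result:
            result.append(p)
    return result
```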
Elijah Newren 8ddaf4cef0 Track skipped/pruned commits 6 years ago
Elijah Newren 2dceeba37c Modify parse_optional_parent_ref to return original parent too
Commits may not have any parents at all.  As such,
parse_optional_parent_ref() is used expecting that it will sometimes
return None.

Now, when commits are skipped, we have a scheme to translate anything that
depends on such commits to instead depend on the nearest ancestor of
such commits.  If a skipped commit's entire ancestry was skipped along
with it, then that commit will be translated to None, which is
indistinguishable from there having been no parent to begin with.
Sometimes our scheme needs to distinguish between a commit that started
with no parents and one which ended up with no parents, so we need a way
to tell these apart.

Also, not knowing the original parent makes it hard for us to
determine if the original had the same weird topology that the current
commit does.  For example, it is possible for a merge commit to have
one parent be the ancestor of another (particularly when --no-ff is
passed to git merge), or even for a merge commit to have the same
commit used as both parents (if you use low-level commands to create
a crazy commit).  There are cases where the pruning of some commits
could cause either of these situations to arise, and it's useful to be
able to distinguish between intentionally "weird" history and history
that has been made weird due to other pruning, because the latter we
may have reason to do additional pruning on.
6 years ago
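The following sketch shows the nearest-ancestor scheme and why returning the original parent alongside the translated one resolves the ambiguity. For simplicity it follows first parents only and represents history as a plain dict. `translate`, `parent_info`, and the `first_parent` map are hypothetical names.

```python
from typing import Dict, Optional, Set, Tuple


def translate(commit: Optional[str],
              first_parent: Dict[str, Optional[str]],
              skipped: Set[str]) -> Optional[str]:
    """Return commit itself, or its nearest non-skipped first-parent ancestor.

    Returns None when the commit and its whole first-parent ancestry were
    skipped, or when commit is None to begin with.
    """
    current = commit
    while current is not None and current in skipped:
        current = first_parent.get(current)
    return current


def parent_info(commit: str,
                first_parent: Dict[str, Optional[str]],
                skipped: Set[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (translated parent, original parent) for a commit.

    (None, None) means a root commit; (None, 'x') means the parent 'x'
    existed, but it and its ancestry were all pruned away.
    """
    orig = first_parent.get(commit)
    return translate(orig, first_parent, skipped), orig
```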
Elijah Newren d7ebca27ac Add a couple minor clarifications 6 years ago
Elijah Newren e4c0c2a29d Fix crazy timezone issues
Oh, boy, timezone +051800 exists in the wild.  Is that 0518 hours and 00
minutes?  Or 05 hours and 1800 minutes?  Or 051 hours and 800 minutes?
Attempt to do something sane with these broken commits that fast-import
barfs on.  Also, fix an old bug in the handling of ahead-of-UTC timezones.
6 years ago
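Git records timezones as `+HHMM` or `-HHMM`. The sketch below parses that form into signed minutes, applying the sign to the whole offset. It flags anything else, such as `+051800`, as malformed. The fallback to `+0000` in `sanitize_tz` is a hypothetical repair policy chosen for illustration. The commit does not say which policy the tool adopted.

```python
import re
from typing import Optional

# Git's timezone field: a sign followed by exactly two hour and two minute digits.
_TZ_RE = re.compile(r"([+-])([0-9]{2})([0-9]{2})")


def tz_offset_minutes(tz: str) -> Optional[int]:
    """Convert '+HHMM' / '-HHMM' into signed minutes east of UTC.

    Returns None for malformed values such as '+051800'.
    """
    m = _TZ_RE.fullmatch(tz)
    if m is None:
        return None
    sign = 1 if m.group(1) == "+" else -1
    # The sign applies to the whole offset, hours and minutes together.
    return sign * (int(m.group(2)) * 60 + int(m.group(3)))


def sanitize_tz(tz: str, fallback: str = "+0000") -> str:
    # Hypothetical repair policy: replace anything unparseable by a fixed zone.
    return tz if tz_offset_minutes(tz) is not None else fallback
```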
Elijah Newren 067522c06c buffer subprocess stdout to significantly improve performance
Apparently, the default for subprocess stdout is unbuffered; switching
it to buffered yields a huge 40% speedup.  Doing this also exposes the
need to add fi_input.flush() calls, highlighting another performance
issue.  We may be able to have fewer such calls with some refactoring,
but that is a bigger separate change.  Just having them highlighted to
remind about them as a performance issue is good for now.
6 years ago
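With buffered pipes, a request written to a child process can sit in the parent's buffer. Any write that expects an answer therefore needs an explicit flush. The sketch below uses a small Python child as a hypothetical stand-in for a process like fast-import that answers queries. `ask` and `ECHO_CHILD` are illustrative names. `bufsize=-1` is already Python 3's default and is spelled out only for clarity.

```python
import subprocess
import sys

# Hypothetical child standing in for a process that answers each request
# line as soon as it arrives.
ECHO_CHILD = (
    "import sys\n"
    "while True:\n"
    "    line = sys.stdin.readline()\n"
    "    if not line:\n"
    "        break\n"
    "    sys.stdout.write('ack ' + line)\n"
    "    sys.stdout.flush()\n"
)


def ask(queries):
    proc = subprocess.Popen(
        [sys.executable, "-c", ECHO_CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        bufsize=-1,  # buffered pipes (already Python 3's default)
    )
    answers = []
    for q in queries:
        proc.stdin.write(q.encode() + b"\n")
        # The request may still be sitting in our buffer; flush before
        # waiting for the reply, or both processes block forever.
        proc.stdin.flush()
        answers.append(proc.stdout.readline().decode().rstrip("\r\n"))
    proc.stdin.close()
    proc.stdout.close()
    proc.wait()
    return answers
```

Each flush is a point where buffering is given up, which is why the commit message treats the added flush calls as a performance issue worth revisiting.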
Elijah Newren e57e9aef96 Notify user we are writing reports at start, since it can take a while 6 years ago
Elijah Newren 841f7b4be8 Switch --analyze to use rev-list|diff-tree pipeline
As suggested by Peff, use rev-list & diff-tree to get the information we
need, instead of relying on fast-export (with some out-of-tree patches)
to get that information.

Parsing the quoted filename strings was slightly tricky.  See
https://stackoverflow.com/a/51904799 for discussion of codecs.  I didn't
do the final utf-8 conversion because of the following investigation:

    s = 'naïve \\t test'
now comparing
    ' '.join(hex(ord(x))[2:] for x in s)
    ' '.join(hex(ord(x))[2:] for x in codecs.decode(s, 'unicode_escape'))
    ' '.join(hex(ord(x))[2:] for x in codecs.decode(s, 'unicode_escape').encode('latin-1'))
    ' '.join(hex(ord(x))[2:] for x in codecs.decode(s, 'unicode_escape').encode('latin-1').decode('utf-8'))
I saw the following:
    '6e 61 c3 af 76 65 20 5c 74 20 74 65 73 74'
    '6e 61 c3 af 76 65 20 9 20 74 65 73 74'
    '6e 61 c3 af 76 65 20 9 20 74 65 73 74'
    '6e 61 ef 76 65 20 9 20 74 65 73 74'
also printing the four related strings from python shows:
    naïve \t test
    naïve 	 test
    naïve 	 test
    naïve 	 test

In other words, the 'unicode_escape' correctly translated 5c 74 ('\t')
into 09 (a tab character), the encoding into latin-1 didn't change any
bytes, but the final decode into utf-8 did.  Also, the translation into
latin-1 correctly prints the string, so let's just keep it.
6 years ago
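Here is a Python 3 sketch of the same decoding steps, applied to a quoted path read as raw bytes. `dequote_git_path` is a hypothetical helper. It assumes the input is one path exactly as git printed it. Like the commit, it stops before any final UTF-8 decode.

```python
import codecs


def dequote_git_path(raw: bytes) -> bytes:
    """Undo git's C-style path quoting and return the original path bytes.

    Git wraps a path in double quotes when it contains special or
    non-ASCII bytes, escaping them as \\t, \\", \\\\ or \\ooo (octal).
    'unicode_escape' maps each escape, and each unescaped byte, to the code
    point with the same value; encoding to latin-1 then turns those code
    points back into bytes one-for-one.
    """
    if len(raw) < 2 or not (raw.startswith(b'"') and raw.endswith(b'"')):
        return raw  # unquoted paths are already literal bytes
    return codecs.decode(raw[1:-1], "unicode_escape").encode("latin-1")
```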
Elijah Newren f1813edbda Show progress parsing blob sizes 6 years ago
Elijah Newren 84e8da5d3d Add ProgressWriter class and switch FastExportFilter to it 6 years ago
Elijah Newren cd8615389b Add packed sizes to --analyze reports 6 years ago
Elijah Newren 3e0846198e Split analysis reports into separate files 6 years ago
Elijah Newren 7a48f4321b Handle tags pointing at commits pruned along with their history
If a tag points at a commit whose changes are all filtered out and thus
becomes empty and gets pruned, and all of its ancestors are likewise
pruned, then there is no need for the tag; just nuke it.
6 years ago
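The following sketch drops such tags. It assumes a hypothetical `rewritten` map that sends each original commit to its rewritten commit or nearest kept ancestor, and to None when the commit and its whole ancestry were pruned. `filter_tags` is an illustrative name.

```python
from typing import Dict, Optional


def filter_tags(tags: Dict[str, str],
                rewritten: Dict[str, Optional[str]]) -> Dict[str, str]:
    # tags maps tag name -> original commit; rewritten is assumed to cover
    # every tagged commit (a missing entry is treated like None here).
    result: Dict[str, str] = {}
    for name, target in tags.items():
        new_target = rewritten.get(target)
        if new_target is None:
            # The commit and all of its ancestors were pruned: nuke the tag.
            continue
        result[name] = new_target
    return result
```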
Elijah Newren 4d57be572b Add some preventative sanity checks 6 years ago
Elijah Newren 97eface514 Ensure we parse all merge parents, even if some became pruned 6 years ago
Elijah Newren 22fe167701 Add optional newline to make --dry-run output easier to parse 6 years ago
Elijah Newren ea06d423cc Add --subdirectory-filter and --to-subdirectory-filter 6 years ago
Elijah Newren 433fa39667 Add tag renaming 6 years ago
Elijah Newren 9457735069 Start revamping the --help page 6 years ago
Elijah Newren 6367604f96 Rename to repo-filter and strip old stuff 6 years ago
Elijah Newren 0b187cf667 Add README.md explaining new repo-filter tool 6 years ago
Elijah Newren ad59fffed0 Switchover to git test-lib.sh style testing 6 years ago
Elijah Newren 66bd860d5a Nuke git_fast_filter.py 6 years ago