git-filter-repo

Commit Graph

Author	SHA1	Message	Date
Elijah Newren	f164f2b2e6	Merge branch 'kf/fix-example-typo' into master Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Kate F	420aa32dac	git-filter-repo.txt: Fix typo for example Signed-off-by: Kate F <kate@elide.org>	5 years ago
Elijah Newren	3a394ca152	Makefile: a few sanity checks for releasing Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	9928b7cb3e	t9390: add missing '&&' in command chain Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	e11343e504	filter-repo: handle typechange modifications when first parent is pruned Commit `509a624b` (filter-repo: fix issue with pruning of empty commits, 2019-10-03) added code to get a new list of file changes when the first parent was pruned. However, this logic did not handle cases where one of the file modifications was a typechange. Add the necessary logic to handle that case. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	4f84a74ada	filter-repo: use more expensive prunability checks when needed When users are inserting new objects into the stream, we cannot make as many assumptions and need to do more careful checks for whether commits become empty or not. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	b1fae4819a	filter-repo: relax the definition of freshly packed transfer.unpackLimit defaults to 100, meaning that if less than 100 objects exist in the repository, git will automatically unpack the objects to be loose as part of the clone operation. So, if there are no packs and less than 100 objects, consider the repo to be freshly packed for purposes of our fresh clone sanity checks. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	fe33fc42b3	filter-repo: avoid dying with --analyze on commits with unseen parents analyze_commit() calls add_commit_and_parents() which does a sanity check that we have seen all parents previously. --refs breaks that assumption, so we need to work around that check when ref limiting is in effect. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	46549e7d3f	lint-history: point people to issue with more linting examples Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	4c28ed6b8a	Merge branch 'sb/setup-idempotency' into master Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Sirio Balmelli	9cf87ae036	setup.py: test for FileExistsError on symlink Multiple runs of setuptools encounter a FileExistsError exception trying to re-symlink the same files. This exception is safe to ignore: the files were already symlinked so the call can be considered successful. Signed-off-by: Sirio Balmelli <sirio@b-ad.ch>	5 years ago
Elijah Newren	b9c62540b7	filter-repo: fix cache of file renames Users may have long lists of --path, --path-rename, --path-regex, etc. flags (or even a --paths-from-file option with a lot of entries in the file). In such cases, we may have to compare any given path against a lot of different values. In order to avoid having to repeat that long list of comparisons every time a given path is updated, we long ago added a cache of the renames so that we can compute the new name for a path once and then just reuse it each time a new commit updates the old filepath. Sadly, I flubbed the implementation and instead of setting cache[oldname] = newname I somehow did the boneheaded cache[newname] = newname For most repositories and rewrites, this would just have the effect of making the cache useless, but it could wreak various kinds of havoc if a newname matched the oldname of some other file. Make sure we record the mapping from OLDNAME to newname to fix these issues. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	85c8e3660d	filter-repo: accelerate is_ancestor() for --analyze mode The --analyze mode was extremely slow for the freebsd/freebsd repo on github; digging in, the is_ancestor() function was being called a huge number of times -- about 22 times per commit on average (and about 17 million times overall). The analyze mode uses is_ancestor() to determine whether a rename equivalency class should be broken (i.e. renaming A->B means all versions of A and B are just different versions of the same file, but if someone adds a new A in some commit which contains the A->B rename in its history then this equivalence class no longer holds). Each is_ancestor() call potentially has to walk a tree of dependencies all the way back to a sufficient depth where it can realize that the commit cannot be an ancestor; this can be a very long walk. We can speed this up by keeping track of some previous is_ancestor() results. If commit F is not an ancestor of commit G, then F cannot be an ancestor of children of G (unless that child has multiple parents; but even in that case F can only be an ancestor through one of the parents other than G). Similarly, if F is an ancestor of commit G, then F will always be an ancestor of any children of G. Cache results from previous calls to is_ancestor() and use them to accelerate subsequent calls. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	f2dccbc2ef	filter-repo: avoid repeatedly translating the same string with --analyze Translating "Processed %d blob sizes" or "Processed %d commits" hundreds of thousands or millions of times is a waste and turns out to be pretty expensive. Translate it once, cache the string, and then re-use it. Note that a similar issue was noted in commit `3999349be4` (filter-repo: fix perf regression; avoid excessive translation, 2019-05-21), but I did not think to check --analyze mode for similar issues back then. Fix it now. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	9d3d99593c	lint-history: avoid dying when we get file deletions When a file is deleted, there is nothing to lint, so we can just keep the deletion as-is. Reported-by: Thorben Kröger <dev@thorben.net> Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	4ea19c0bf8	filter-repo (README): streamline prerequisite wording a little bit Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	bcd9964537	filter-repo (README): link to upstream docs Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	96e217355c	Contributing.md: start with git guidelines, then mention exceptions Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	18f98295e4	git-filter-repo.txt: fix nested bullets to render correctly Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	1dae85ee9a	filter-repo: permit trailing slash for --[to-]subdirectory-filter argument There was code to allow the argument of --to-subdirectory-filter and --subdirectory-filter to have a trailing slash, but it was broken due to a bug in python3's bytestring design: b'somestring/'[-1] != b'/', despite that being the obvious expectation. One either has to compare b'somestring/'[-1:] to b'/' or else compare b'somestring/'[-1] to b'/'[0]. So lame. Note that this is essentially a follow-up to commit `385b0586ca` ("filter-repo (python3): bytestr splicing and iterating is different", 2019-04-27). Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	a1d20f8e77	INSTALL: a few small tweaks and clarifications Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	9d51a90648	filter-repo: fix pruning of empty commits with blob callbacks Blob callbacks, either implicit (via e.g. --replace-text) or explicit, can modify blobs in ways that make them match other blobs, which in turn can result in some commits becoming empty. We need to detect such cases and ensure we prune these empty commits when --prune-empty=auto. Reported-by: John Gietzen <john@gietzen.us> Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	3a3cd3d15e	git-filter-repo.txt: fix example of editing blob contents You can call bytes.replace() or re.sub(), but you can't call bytes.sub(). Oops. Fix the example in the documentation. Reported-by: John Gietzen <john@gietzen.us> Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	8994b4e55d	filter-repo: fix bad column label in path-all-sizes.txt report Reported-by: John Gietzen <john@gietzen.us> Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	5e04dff097	filter-repo: add new --no-ff option Some projects have a strict --no-ff merging policy. With the default behavior of --prune-degenerate, we can prune merge commits in a way that transforms the history into a fast-forward merge. Consider this example: * There are two independent commits or branches, named B & C, which are both built on top of A so that history looks like this diagram: A \ \ \ B \ -C * Someone runs the following sequence of commands: * git checkout A * git merge --no-ff B * git merge --no-ff C * This will result in a history that looks like: A---AB---AC \ \ / / \ B / \ / -C- * Later, someone comes along and runs filter-repo, specifying to remove the only path(s) that were modified by B. That would naturally remove commit B and the no-longer-necessary merge commit AB. For someone using a strict no-ff policy, the desired history is A---AC \ / C However, the default handling for --prune-degenerate would notice that AC merely merges C into its own ancestor A, whereas the original AC merged C into something separate (namely, AB). So, it would say that AC has become degenerate and prune it, leaving the simple history of A \ C For projects not using a strict no-ff policy, this simpler history is probably better, but for folks that want a strict no-ff policy, it is unfortunate. Provide a --no-ff option to tweak the --prune-degenerate behavior so that it ignores the first parent being an ancestor of another parent (leaving the first parent unpruned even if it is or becomes degenerate in this fashion). Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	41787ff365	Merge branch 'kl/mailmap-corner-case-and-misc-fixes' into master Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Karl Lenz	caf85b68ec	filter-repo: allow --dry-run and --debug to be used together Prior to this commit, git-filter-repo could only be used with either the --dry-run flag or the --debug flag, not both. When run in debug mode, git-filter-repo expected to be able to read from the output stream, which obviously isn't created when doing a dry run, so it stack traced when it tried to use the non-existent output stream. This commit fixes that bug with an equally simple sanity check for the existence of the output stream when run in debug mode. Signed-off-by: Karl Lenz <xorangekiller@gmail.com>	5 years ago
Karl Lenz	780c74b218	filter-repo: parse mailmap entries with no email address The mailmap format parsed by the "git shortlog" command allows for matching mailmap entries with no email address. This is admittedly an edge case, because most Git commits will have an email address associated with them as well as a name, but technically the address isn't required, and "git shortlog" accommodates that in its mailmap format. This commit teaches git-filter-repo to do the same thing. Signed-off-by: Karl Lenz <xorangekiller@gmail.com>	5 years ago
Karl Lenz	5c960b5a64	.gitignore: ignore the test result directories Signed-off-by: Karl Lenz <xorangekiller@gmail.com>	5 years ago
Elijah Newren	99432eb5ef	Merge branch 'as/update-gpl-address' into master Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	7cfef09e9b	filter-repo: warn users who try to use invalid path components It's hard to be exhaustive, but if users try something like: --path-rename foo/bar/baz:. or --path ../other-dir then bad things happen. In the first case, filter-repo will try to ask fast-import to create a directory named '.' and move everything from foo/bar/baz/ into it but of course '.' is a reserved directory name so we can't create it. In the second case, they are probably running from a subdirectory, but filter-repo doesn't work from a subdirectory. I hard-coded the assumption that everything was in the toplevel directory and all paths were relative from there pretty early on. So, if the user tries to use any of these components anywhere, just throw an early error. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	3bdfa91768	Contributing.md: clarify reasons for using git.git submission guidelines Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	dab9386c47	contrib: update bfg-ish and filter-lamely with windows workaround In commit `f2729153` (filter-repo: workaround Windows' insistence that cwd not be a bytestring, 2019-10-19), filter-repo was made to use a special SubprocessWrapper class instead of the normal subprocess calls, due to what appear to be bugs in the Python implementation on Windows, which does not work with bytestring arguments. Add the same workarounds to bfg-ish and filter-lamely. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	f9ebe6a3f7	filter-repo: avoid clobbering files whose names differ in case only git fast-import, in an attempt to be friendly, allows the same file to be specified multiple times within a commit and just takes the last version of the file listed. It determines whether files are the same via fspathncmp, which is defined differently depending on the setting of core.ignorecase. Unfortunately, this means that if someone is trying to do filtering of history and using a broken (case-insensitive) filesystem and the history they are filtering has some paths that differed in case only, then fast-import will delete whichever of the "colliding" files is listed first. Avoid these problems by just turning off core.ignorecase while fast-import is running. This will prevent silently modifying the repo in an unexpected way. Users on such filesystems may have difficulty checking out commits with files which differ in case only, but that is a separate problem for them to deal with after rewriting history. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	b1a35a3057	Merge branch 'jb/release-to-pypi' into master Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	525ecc8f8e	release: tweak packaging scripts for uploading to PyPI Clean up the PyPI dist packages, remove unnecessary files, and streamline the release process: * Avoid adding extra unnecessary files to the repo; setup.py is code and can copy the necessary files into place. * Make sure README.md is included so we don't get an UNKNOWN Description field. * Add a long_description_content_type to avoid parsing errors on the README.md file and rejecting the upload. * Define the license and platform fields so they don't show up as UNKNOWN either. * Remove unnecessary pyproject.toml. This makes sense for most python projects, but since I already have a Makefile with installation rules (because I'm trying to be more compatible with git.git just in case we ever get merged into it), the pyproject.toml file is somewhat duplicative. Sure, the Makefile won't specify the exact versions needed but...meh. * Split the release target of the Makefile into github_release and pypi_release substeps, to allow them to be run semi-independently. Make the pypi_release run a few more steps for me. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Julian Berman	6f4fc07d53	release: add packaging scripts for uploading to PyPI Signed-off-by: Julian Berman <Julian@GrayVines.com> Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	975419288b	Merge branch 'en/fix-empty-pruning-for-realz' into master Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	a9a93d9d83	filter-repo: actually fix issue with pruning of empty commits In commit `509a624` (filter-repo: fix issue with pruning of empty commits, 2019-10-03), it was noted that when the first parent is pruned away, then we need to generate a corrected list of file changes relative to the new first parent. Unfortunately, we did not apply our set of file filters to that new list of file changes, causing us to possibly introduce many unwanted files from the second parent into the history. The testcase added at the time was rather lax and totally missed this problem (which possibly exacerbated the original bug being fixed rather than helping). Tighten the testcase, and fix the error by filtering the generated list of file changes. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	2b32276ca3	filter-repo: move file filtering out of _tweak_commit() for re-use RepoFilter._tweak_commit() was a bit unwieldy, and we have a reason for wanting to re-use the file filtering logic in it, so break that out into a separate function. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Andreas Schneider	65103890d5	Update GPL license file The license file is outdated pointing to an incorrect FSF address. Signed-off-by: Andreas Schneider <asn@cryptomilk.org>	5 years ago
Elijah Newren	eec9b081ee	filter-repo: don't have analyze choke on typechange types The analyze mode will handle type changes (e.g. normal file to symlink) in combination with adds and modifies, but the similar logic below didn't allow for type changes in combination with renames. Fix the oversight. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	b56ca0437a	Contributing.md: clarify notes about PEP-8 Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	e1d126a1ea	Reference package managers in installation instructions Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	0590c4193d	contrib: clarify a few points of usage Make it clearer that absolute paths should not be used for pathnames within a git repository. Also, fix the comment about how the insert-beginning script could be implemented as a one-liner; the example commented-out code should have used bytestrings. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	117dd28883	Merge branch 'en/flesh-out-docs' into master The prerequisites and installation docs were not quite detailed enough, and no code of conduct or contribution guidelines were included. Flesh out the docs to cover these issues. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	d07a2fe2ea	Contributing.md: mention testsuite line coverage Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	64aa9359ed	run_coverage: prefer coverage3 to python3-coverage Some of the systems I ran on had a 'python3-coverage' and some had a 'coverage3' program. More were of the latter name, but more importantly, the upstream tarball only creates the latter name; apparently the former was just added by some distros. So, switch to the more official name of the program. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	b3eb2cf461	filter-repo (README): add code of conduct and contributing guidelines Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	1762b99573	Explain how to use a python3 executable not named "python3" Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
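
Commit 1dae85ee9a above turns on a Python 3 behavior: indexing a bytes object yields an int, while slicing yields bytes. The following is a minimal sketch of that behavior, not the code from filter-repo; the helper name `strip_trailing_slash` is hypothetical and exists only for illustration.

```python
def strip_trailing_slash(path: bytes) -> bytes:
    """Drop one trailing b'/' from a bytestring path (hypothetical helper)."""
    # Slice instead of indexing: path[-1] is an int and never equals b'/'.
    # Slicing also works on an empty bytestring, where indexing would raise.
    if path[-1:] == b'/':
        return path[:-1]
    return path


tail = b'somestring/'
print(tail[-1])              # 47, the integer value of the last byte
print(tail[-1] == b'/')      # False: an int compared with a bytes object
print(tail[-1:] == b'/')     # True: a one-byte slice compared with bytes
print(tail[-1] == b'/'[0])   # True: an int compared with an int
print(strip_trailing_slash(b'src/lib/'))  # b'src/lib'
```

Either comparison named in the commit message works; the slice form has the small advantage of being safe on empty input.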


412 Commits (cefeef1c0a72aa9a928801815fbe330cc61a5774)
