git-filter-repo

mirror of https://github.com/newren/git-filter-repo.git synced 2024-11-19 03:25:33 +00:00

Author	SHA1	Message	Date
Elijah Newren	a1d20f8e77	INSTALL: a few small tweaks and clarifications Signed-off-by: Elijah Newren <newren@gmail.com>	2020-01-11 12:14:01 -08:00
Elijah Newren	9d51a90648	filter-repo: fix pruning of empty commits with blob callbacks Blob callbacks, either implicit (via e.g. --replace-text) or explicit, can modify blobs in ways that make them match other blobs, which in turn can result in some commits becoming empty. We need to detect such cases and ensure we prune these empty commits when --prune-empty=auto. Reported-by: John Gietzen <john@gietzen.us> Signed-off-by: Elijah Newren <newren@gmail.com>	2020-01-11 11:45:43 -08:00
Elijah Newren	3a3cd3d15e	git-filter-repo.txt: fix example of editing blob contents You can call bytes.replace() or re.sub(), but you can't call bytes.sub(). Oops. Fix the example in the documentation. Reported-by: John Gietzen <john@gietzen.us> Signed-off-by: Elijah Newren <newren@gmail.com>	2020-01-11 11:45:43 -08:00
Elijah Newren	8994b4e55d	filter-repo: fix bad column label in path-all-sizes.txt report Reported-by: John Gietzen <john@gietzen.us> Signed-off-by: Elijah Newren <newren@gmail.com>	2020-01-11 11:45:43 -08:00
Elijah Newren	5e04dff097	filter-repo: add new --no-ff option Some projects have a strict --no-ff merging policy. With the default behavior of --prune-degenerate, we can prune merge commits in a way that transforms the history into a fast-forward merge. Consider this example: * There are two independent commits or branches, named B & C, which are both built on top of A so that history looks like this diagram: A \ \ \ B \ -C * Someone runs the following sequence of commands: * git checkout A * git merge --no-ff B * git merge --no-ff C * This will result in a history that looks like: A---AB---AC \ \ / / \ B / \ / -C- * Later, someone comes along and runs filter-repo, specifying to remove the only path(s) that were modified by B. That would naturally remove commit B and the no-longer-necessary merge commit AB. For someone using a strict no-ff policy, the desired history is A---AC \ / C However, the default handling for --prune-degenerate would notice that AC merely merges C into its own ancestor A, whereas the original AC merged C into something separate (namely, AB). So, it would say that AC has become degenerate and prune it, leaving the simple history of A \ C For projects not using a strict no-ff policy, this simpler history is probably better, but for folks that want a strict no-ff policy, it is unfortunate. Provide a --no-ff option to tweak the --prune-degenerate behavior so that it ignores the first parent being an ancestor of another parent (leaving the first parent unpruned even if it is or becomes degenerate in this fashion). Signed-off-by: Elijah Newren <newren@gmail.com>	2020-01-01 10:49:56 -08:00
Elijah Newren	41787ff365	Merge branch 'kl/mailmap-corner-case-and-misc-fixes' into master Signed-off-by: Elijah Newren <newren@gmail.com>	2019-12-27 07:47:07 -08:00
Karl Lenz	caf85b68ec	filter-repo: allow --dry-run and --debug to be used together Prior to this commit, git-filter-repo could only be used with either the --dry-run flag or the --debug flag, not both. When run in debug mode, git-filter-repo expected to be able to read from the output stream, which obviously isn't created when doing a dry run, so it stack traced when it tried to use the non-existent output stream. This commit fixes that bug with an equally simple sanity check for the existence of the output stream when run in debug mode. Signed-off-by: Karl Lenz <xorangekiller@gmail.com>	2019-12-27 09:29:49 -05:00
Karl Lenz	780c74b218	filter-repo: parse mailmap entries with no email address The mailmap format parsed by the "git shortlog" command allows for matching mailmap entries with no email address. This is admittedly an edge case, because most Git commits will have an email address associated with them as well as a name, but technically the address isn't required, and "git shortlog" accommodates that in its mailmap format. This commit teaches git-filter-repo to do the same thing. Signed-off-by: Karl Lenz <xorangekiller@gmail.com>	2019-12-27 09:25:25 -05:00
Karl Lenz	5c960b5a64	.gitignore: ignore the test result directories Signed-off-by: Karl Lenz <xorangekiller@gmail.com>	2019-12-27 09:25:25 -05:00
Elijah Newren	99432eb5ef	Merge branch 'as/update-gpl-address' into master Signed-off-by: Elijah Newren <newren@gmail.com>	2019-12-26 16:01:28 -08:00
Elijah Newren	7cfef09e9b	filter-repo: warn users who try to use invalid path components It's hard to be exhaustive, but if users try something like: --path-rename foo/bar/baz:. or --path ../other-dir then bad things happen. In the first case, filter-repo will try to ask fast-import to create a directory named '.' and move everything from foo/bar/baz/ into it but of course '.' is a reserved directory name so we can't create it. In the second case, they are probably running from a subdirectory, but filter-repo doesn't work from a subdirectory. I hard-coded the assumption that everything was in the toplevel directory and all paths were relative from there pretty early on. So, if the user tries to use any of these components anywhere, just throw an early error. Signed-off-by: Elijah Newren <newren@gmail.com>	2019-12-26 15:54:47 -08:00
Elijah Newren	3bdfa91768	Contributing.md: clarify reasons for using git.git submission guidelines Signed-off-by: Elijah Newren <newren@gmail.com>	2019-12-26 15:54:47 -08:00
Elijah Newren	dab9386c47	contrib: update bfg-ish and filter-lamely with windows workaround In commit `f2729153` (filter-repo: workaround Windows' insistence that cwd not be a bytestring, 2019-10-19), filter-repo was made to use a special SubprocessWrapper class instead of the normal subprocess calls, due to what appear to be bugs in the python implementation on Windows not working with arguments being bytestrings. Add the same workarounds to bfg-ish and filter-lamely. Signed-off-by: Elijah Newren <newren@gmail.com>	2019-12-26 15:54:47 -08:00
Elijah Newren	f9ebe6a3f7	filter-repo: avoid clobbering files whose names differ in case only git fast-import, in an attempt to be friendly, allows the same file to be specified multiple times within a commit and just takes the last version of the file listed. It determines whether files are the same via fspathncmp, which is defined differently depending on the setting of core.ignorecase. Unfortunately, this means that if someone is trying to do filtering of history and using a broken (case-insensitive) filesystem and the history they are filtering has some paths that differed in name only, then fast-import will delete whichever of the "colliding" files is listed first. Avoid these problems by just turning off core.ignorecase while fast-import is running. This will prevent silently modifying the repo in an unexpected way. Users on such filesystems may have difficulty checking out commits with files which differ in case only, but that is a separate problem for them to deal with after rewriting history. Signed-off-by: Elijah Newren <newren@gmail.com>	2019-12-26 13:17:55 -08:00
Elijah Newren	b1a35a3057	Merge branch 'jb/release-to-pypi' into master Signed-off-by: Elijah Newren <newren@gmail.com>	2019-12-26 12:47:50 -08:00
Elijah Newren	525ecc8f8e	release: tweak packaging scripts for uploading to PyPI Clean up the PyPI dist packages, remove unnecessary files, and streamline the release process: * Avoid adding extra unnecessary files to the repo; setup.py is code and can copy the necessary files into place. * Make sure README.md is included so we don't get an UNKNOWN Description field. * Add a long_description_content_type to avoid parsing errors on the README.md file and rejecting the upload. * Define the license and platform fields so they don't show up as UNKNOWN either. * Remove unnecessary pyproject.toml. This makes sense for most python projects, but since I already have a Makefile with installation rules (because I'm trying to be more compatible with git.git just in case we ever get merged into it), the pyproject.toml file is somewhat duplicative. Sure, the Makefile won't specify the exact versions needed but...meh. * Split the release target of the Makefile into github_release and pypi_release substeps, to allow them to be run semi-independently. Make the pypi_release run a few more steps for me. Signed-off-by: Elijah Newren <newren@gmail.com>	2019-12-26 12:32:53 -08:00
Julian Berman	6f4fc07d53	release: add packaging scripts for uploading to PyPI Signed-off-by: Julian Berman <Julian@GrayVines.com> Signed-off-by: Elijah Newren <newren@gmail.com>	2019-12-26 12:07:26 -08:00
Elijah Newren	975419288b	Merge branch 'en/fix-empty-pruning-for-realz' into master Signed-off-by: Elijah Newren <newren@gmail.com>	2019-12-25 19:11:40 -08:00
Elijah Newren	a9a93d9d83	filter-repo: actually fix issue with pruning of empty commits In commit `509a624` (filter-repo: fix issue with pruning of empty commits, 2019-10-03), it was noted that when the first parent is pruned away, then we need to generate a corrected list of file changes relative to the new first parent. Unfortunately, we did not apply our set of file filters to that new list of file changes, causing us to possibly introduce many unwanted files from the second parent into the history. The testcase added at the time was rather lax and totally missed this problem (which possibly exacerbated the original bug being fixed rather than helping). Tighten the testcase, and fix the error by filtering the generated list of file changes. Signed-off-by: Elijah Newren <newren@gmail.com>	2019-12-25 09:10:46 -08:00
Elijah Newren	2b32276ca3	filter-repo: move file filtering out of _tweak_commit() for re-use RepoFilter._tweak_commit() was a bit unwieldy, and we have a reason for wanting to re-use the file filtering logic in it, so break that out into a separate function. Signed-off-by: Elijah Newren <newren@gmail.com>	2019-12-25 07:59:43 -08:00
Andreas Schneider	65103890d5	Update GPL license file The license file is outdated pointing to an incorrect FSF address. Signed-off-by: Andreas Schneider <asn@cryptomilk.org>	2019-12-22 18:33:01 +01:00
Elijah Newren	eec9b081ee	filter-repo: don't have analyze choke on typechange types The analyze mode will handle type changes (e.g. normal file to symlink) in combination with adds and modifies, but the similar logic below didn't allow for type changes in combination with renames. Fix the oversight. Signed-off-by: Elijah Newren <newren@gmail.com>	2019-12-09 14:36:49 -08:00
Elijah Newren	b56ca0437a	Contributing.md: clarify notes about PEP-8 Signed-off-by: Elijah Newren <newren@gmail.com>	2019-11-25 10:38:46 -08:00
Elijah Newren	e1d126a1ea	Reference package managers in installation instructions Signed-off-by: Elijah Newren <newren@gmail.com>	2019-11-25 10:38:46 -08:00
Elijah Newren	0590c4193d	contrib: clarify a few points of usage Make it clearer that absolute paths should not be used for pathnames within a git repository. Also, fix the comment about how the insert-beginning script could be implemented as a one-liner; the example commented-out code should have used bytestrings. Signed-off-by: Elijah Newren <newren@gmail.com>	2019-11-25 08:42:29 -08:00
Elijah Newren	117dd28883	Merge branch 'en/flesh-out-docs' into master The prerequisites and installation docs were not quite detailed enough, and no code of conduct or contribution guidelines were included. Flesh out the docs to cover these issues. Signed-off-by: Elijah Newren <newren@gmail.com>	2019-11-22 13:33:04 -08:00
Elijah Newren	d07a2fe2ea	Contributing.md: mention testsuite line coverage Signed-off-by: Elijah Newren <newren@gmail.com>	2019-11-21 16:19:26 -08:00
Elijah Newren	64aa9359ed	run_coverage: prefer coverage3 to python3-coverage Some of the systems I ran on had a 'python3-coverage' and some had a 'coverage3' program. More were of the latter name, but more importantly, the upstream tarball only creates the latter name; apparently the former was just added by some distros. So, switch to the more official name of the program. Signed-off-by: Elijah Newren <newren@gmail.com>	2019-11-21 16:19:26 -08:00
Elijah Newren	b3eb2cf461	filter-repo (README): add code of conduct and contributing guidelines Signed-off-by: Elijah Newren <newren@gmail.com>	2019-11-21 16:19:26 -08:00
Elijah Newren	1762b99573	Explain how to use a python3 executable not named "python3" Signed-off-by: Elijah Newren <newren@gmail.com>	2019-11-21 16:19:26 -08:00
Elijah Newren	5c35bb7a8d	filter-repo (README): add sections on prerequisites and installation Signed-off-by: Elijah Newren <newren@gmail.com>	2019-11-21 16:19:26 -08:00
Elijah Newren	1810051a58	Merge branch 'mh/generated-readme-typo-fix' into master Signed-off-by: Elijah Newren <newren@gmail.com>	2019-11-21 07:48:12 -08:00
Matthisk Heimensen	22cc153395	filter-repo: fix typo in generated analysis README Signed-off-by: Matthisk Heimensen <m@tthisk.nl>	2019-11-21 10:55:33 +01:00
Elijah Newren	33cf19376d	Merge branch 'bf/installation-fixes' into master Signed-off-by: Elijah Newren <newren@gmail.com>	2019-11-12 17:26:42 -08:00
Benoit Fouletier	2cbd4a46a7	Makefile: fix path installation issues - quote paths that may have spaces - force ln in case the file already exists Signed-off-by: Benoit Fouletier <bennews@free.fr>	2019-11-13 00:32:21 +01:00
Benoit Fouletier	ca2fd07dfa	Makefile: fix documentation installation - correct paths to include the missing "Documentation/" prefix - use fully specified "origin/docs" branch in case the "docs" branch is not checked out locally Signed-off-by: Benoit Fouletier <bennews@free.fr>	2019-11-13 00:32:21 +01:00
Elijah Newren	8d8410e2b2	Makefile: use the right token environment variable Signed-off-by: Elijah Newren <newren@gmail.com>	2019-11-02 23:49:48 -07:00
Elijah Newren	84fddfe262	git-filter-repo.txt: fix typesetting of --partial Signed-off-by: Elijah Newren <newren@gmail.com>	2019-10-31 14:03:02 -07:00
Elijah Newren	ceb924ea8f	filter-repo (README): add link to predecessor project Signed-off-by: Elijah Newren <newren@gmail.com>	2019-10-31 14:03:02 -07:00
Elijah Newren	2fc5596455	filter-repo (README): add note about requiring a recent version of git Signed-off-by: Elijah Newren <newren@gmail.com>	2019-10-31 11:03:39 -07:00
Elijah Newren	904e03f963	filter-repo: workaround Windows' insistence that command args be strings It appears that in addition to Windows requiring cwd be a string (and not a bytestring), it also requires the command line arguments to be unicode strings. This appears to be a python-on-Windows issue at the surface (attempts to quote things that assume the arguments are all strings), but whether it's solely a python-on-Windows issue or there is also a deeper Windows issue, we can work around this brain-damage by extending the SubprocessWrapper slightly. As with the cwd changes, only apply this on Windows and not elsewhere because there are perfectly legitimate reasons to pass non-unicode parameters (e.g. filenames that are not valid unicode). Signed-off-by: Elijah Newren <newren@gmail.com>	2019-10-30 09:14:02 -07:00
Elijah Newren	f2729153fe	filter-repo: workaround Windows' insistence that cwd not be a bytestring Unfortunately, it appears that Windows does not allow the 'cwd' argument of various subprocess calls to be a bytestring. That may be functional on Windows since Windows-related filesystems are allowed to require that all file and directory names be valid unicode, but not all platforms enforce such restrictions. As such, I certainly cannot change cwd=directory to cwd=decode(directory) because that could break on other platforms (and perhaps even on Windows if someone is trying to read a non-native filesystem). Instead, create a SubprocessWrapper class that will always call decode on the cwd argument before passing along to the real subprocess class. Use these wrappers on Windows, and do not use them elsewhere. Signed-off-by: Elijah Newren <newren@gmail.com>	2019-10-22 08:51:04 -07:00
Elijah Newren	da2a969157	Makefile: add a few new targets to streamline my release workflow Signed-off-by: Elijah Newren <newren@gmail.com>	2019-10-21 14:51:22 -07:00
Elijah Newren	d70b29a165	filter-repo: fix import sort order During the python3 transition, StringIO was renamed to io -- but the import wasn't moved to preserve appropriate sorting. Signed-off-by: Elijah Newren <newren@gmail.com>	2019-10-21 09:09:44 -07:00
Elijah Newren	e333be7b17	filter-repo: consistently use bytestrings for directory names Signed-off-by: Elijah Newren <newren@gmail.com>	2019-10-21 09:09:23 -07:00
Elijah Newren	e6dd613e3f	filter-repo: add a --version option Note that this isn't a version number or even the more generalized version string that folks are used to seeing, but a version hash (or leading portion thereof). A few important points: * These version hashes are not strictly monotonically increasing values. Like I said, these aren't version numbers. If that bothers you, read on... * This scheme has incredibly nice semantics satisfying a pair of properties that most version schemes would assume are mutually incompatible: This scheme works even if the user doesn't have a clone of filter-repo and doesn't require any build step to inject the version into the program; it works even if people just download git-filter-repo.py off GitHub without any of the other sources. And: This scheme means that a user is running precisely version X of the code, with the version not easily faked or misrepresented when third parties edit the code. Given the wonderful semantics provided by satisfying this pair of properties that all other versioning schemes seem to miss out on, I think I should name this scheme. How about "Semantic Versioning"? (Hehe...) * The version hash is super easy to use; I just go to my own clone of filter-repo and run either: git show $VERSION_HASH or git describe $VERSION_HASH * A human consumable version might suggest to folks that this software is something they might frequently use and upgrade. This program should only be used in exceptional cases (because rewriting history is not for the faint of heart). * A human consumable version (i.e. a version number or even the more relaxed version strings in more common use) might suggest to folks that they can rely on strict backward compatibility. It's nice to subtly undercut any such assumption. * Despite all that, I will make releases (downloadable tarballs with real version numbers in the tarball name; I'm just going to re-use whatever version git is released with at the time). But those version numbers won't be used by the --version option; instead the version hash will. Signed-off-by: Elijah Newren <newren@gmail.com>	2019-10-19 14:06:08 -07:00
Elijah Newren	1e21d6e2ec	Add installation instructions Try to make it a little more friendly for distros to package. Signed-off-by: Elijah Newren <newren@gmail.com>	2019-10-17 18:59:23 -07:00
Elijah Newren	62c311c69f	filter-repo: fix an unmarked bytestring to be marked as such Signed-off-by: Elijah Newren <newren@gmail.com>	2019-10-17 18:56:37 -07:00
Elijah Newren	e0140bb2ad	git-filter-repo.txt: minor updates to docs A few changes: * Include notes about git-2.24.0 changes * Make it clearer that messing with the first parent could have negative side-effects if the file_changes aren't also updated. * Fix wrapping of a line that was too long. Also, update the README.md: * Note the upstream improvements made in (not yet released) git-2.24.0 Signed-off-by: Elijah Newren <newren@gmail.com>	2019-10-17 18:55:09 -07:00
Elijah Newren	320c85f941	filter-repo: improve support for partial history rewrites Partial history rewrites were possible before with the (previously hidden) --refs flag, but the defaults were wrong. That could be worked around with the --source or --target flags, but that disabled --no-data for fast-export and thus slowed things down, and also would require overriding --replace-refs. And the defaults for --source and --target may diverge further from what is wanted/needed for partial history rewrites in the future. So, add --partial as a first-class supported option with scary documentation about how it permits mixing new and old history. Make --refs imply that flag. Make the behavioral similarities (in regards to which steps are skipped) between --source, --target, and --partial more clear. Add relevant documentation to round it out. Signed-off-by: Elijah Newren <newren@gmail.com>	2019-10-17 18:55:09 -07:00
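
Commit 3a3cd3d15e in the log above notes that blob contents can be edited with bytes.replace() or re.sub(), but not with bytes.sub(), which does not exist. The following is a minimal standalone sketch of that distinction in plain Python. It is not the corrected documentation example itself, and the helper names and sample data are hypothetical.

```python
import re

# Hypothetical helpers operating on raw blob contents (bytes, not str).

def redact_literal(data: bytes) -> bytes:
    # bytes.replace() performs a literal substring substitution.
    return data.replace(b"secret", b"REDACTED")

def redact_pattern(data: bytes) -> bytes:
    # re.sub() accepts bytes when the pattern and the replacement are bytes too.
    return re.sub(rb"password=\S+", b"password=***", data)

sample = b"user=alice password=hunter2 secret\n"
literal_result = redact_literal(sample)
pattern_result = redact_pattern(sample)

# bytes objects have no sub() method, so data.sub(...) raises AttributeError.
has_bytes_sub = hasattr(bytes, "sub")
```

Mixing a str pattern with bytes data in re.sub() raises a TypeError, so the pattern and replacement should stay bytestrings throughout.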

442 Commits
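
Commit 7cfef09e9b in the log above describes raising an early error when a user-supplied path contains a '.' or '..' component. A rough sketch of such a check might look like the following. The function name is hypothetical, the code is not taken from filter-repo itself, and paths are assumed to be bytestrings separated by '/'.

```python
# Hypothetical check, not filter-repo's actual implementation.
INVALID_COMPONENTS = (b".", b"..")

def has_invalid_component(path: bytes) -> bool:
    # Split on the path separator and look for reserved directory names.
    return any(part in INVALID_COMPONENTS for part in path.split(b"/"))

# The two problem cases mentioned in the commit message:
rename_target_is_dot = has_invalid_component(b".")            # target of foo/bar/baz:.
escapes_toplevel = has_invalid_component(b"../other-dir")
ordinary_path = has_invalid_component(b"foo/bar/baz")
```

Comparing whole components rather than searching for substrings keeps names such as "..bar" or "foo.txt" valid.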