442 Commits

Author SHA1 Message Date
Elijah Newren
a1d20f8e77 INSTALL: a few small tweaks and clarifications
Signed-off-by: Elijah Newren <newren@gmail.com>
2020-01-11 12:14:01 -08:00
Elijah Newren
9d51a90648 filter-repo: fix pruning of empty commits with blob callbacks
Blob callbacks, either implicit (via e.g. --replace-text) or explicit,
can modify blobs in ways that make them match other blobs, which in turn
can result in some commits becoming empty.  We need to detect such cases
and ensure we prune these empty commits when --prune-empty=auto.

Reported-by: John Gietzen <john@gietzen.us>
Signed-off-by: Elijah Newren <newren@gmail.com>
2020-01-11 11:45:43 -08:00
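The situation described in 9d51a90648 can be pictured with a small sketch. It is not filter-repo's code: the names `apply_blob_callback` and `commit_is_empty` are hypothetical, trees are modelled as plain dicts, and blobs are modelled by their contents rather than by object IDs.

```python
import re

def apply_blob_callback(data: bytes) -> bytes:
    # Hypothetical stand-in for a --replace-text style rule:
    # every secret value is rewritten to the same placeholder.
    return re.sub(rb"password=\w+", b"password=***", data)

def commit_is_empty(parent_tree: dict, commit_tree: dict) -> bool:
    # In this model a commit is empty when its tree equals its parent's tree.
    return parent_tree == commit_tree

# Before filtering, the child commit changes config.txt, so it is not empty.
parent = {b"config.txt": b"password=hunter2\n"}
child = {b"config.txt": b"password=swordfish\n"}
assert not commit_is_empty(parent, child)

# After the callback both blobs have identical contents, so the child
# commit no longer changes anything and becomes a candidate for pruning.
filtered_parent = {p: apply_blob_callback(d) for p, d in parent.items()}
filtered_child = {p: apply_blob_callback(d) for p, d in child.items()}
```

Comparing `filtered_parent` with `filtered_child` shows why emptiness has to be re-checked after blob callbacks run, not only after path filtering.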
Elijah Newren
3a3cd3d15e git-filter-repo.txt: fix example of editing blob contents
You can call bytes.replace() or re.sub(), but you can't call
bytes.sub().  Oops.  Fix the example in the documentation.

Reported-by: John Gietzen <john@gietzen.us>
Signed-off-by: Elijah Newren <newren@gmail.com>
2020-01-11 11:45:43 -08:00
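The distinction behind 3a3cd3d15e, shown as a short sketch using only the Python standard library. The sample data is made up for illustration.

```python
import re

data = b"old-name appears twice: old-name\n"

# bytes.replace() handles literal substitutions.
literal = data.replace(b"old-name", b"new-name")

# re.sub() handles pattern substitutions; the pattern and the replacement
# must both be bytes when the input is bytes.
pattern = re.sub(rb"old-(\w+)", rb"new-\1", data)

# bytes objects have no sub() method, so data.sub(...) would raise
# AttributeError.
has_sub = hasattr(data, "sub")
```

Both calls produce the same result here; `re.sub()` is only needed when the text to replace is described by a pattern.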
Elijah Newren
8994b4e55d filter-repo: fix bad column label in path-all-sizes.txt report
Reported-by: John Gietzen <john@gietzen.us>
Signed-off-by: Elijah Newren <newren@gmail.com>
2020-01-11 11:45:43 -08:00
Elijah Newren
5e04dff097 filter-repo: add new --no-ff option
Some projects have a strict --no-ff merging policy.  With the default
behavior of --prune-degenerate, we can prune merge commits in a way that
transforms the history into a fast-forward merge.  Consider this
example:
  * There are two independent commits or branches, named B & C, which
    are both built on top of A so that history looks like this diagram:
        A
        \ \
         \ B
          \
           -C
  * Someone runs the following sequence of commands:
    * git checkout A
    * git merge --no-ff B
    * git merge --no-ff C
  * This will result in a history that looks like:
        A---AB---AC
        \ \ /   /
         \ B   /
          \   /
           -C-
  * Later, someone comes along and runs filter-repo, specifying to
    remove the only path(s) that were modified by B.  That would
    naturally remove commit B and the no-longer-necessary merge
    commit AB.  For someone using a strict no-ff policy, the desired
    history is
        A---AC
         \ /
          C
    However, the default handling for --prune-degenerate would
    notice that AC merely merges C into its own ancestor A, whereas
    the original AC merged C into something separate (namely, AB).
    So, it would say that AC has become degenerate and prune it,
    leaving the simple history of
        A
         \
          C
    For projects not using a strict no-ff policy, this simpler history
    is probably better, but for folks that want a strict no-ff policy,
    it is unfortunate.

Provide a --no-ff option to tweak the --prune-degenerate behavior so
that it ignores the first parent being an ancestor of another parent
(leaving the first parent unpruned even if it is or becomes degenerate
in this fashion).

Signed-off-by: Elijah Newren <newren@gmail.com>
2020-01-01 10:49:56 -08:00
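The effect of --no-ff in 5e04dff097 can be sketched on the example history. This is a simplified model, not filter-repo's implementation: `ancestors` and `trim_parents` are hypothetical helpers, and the history is a plain dict mapping each commit to its parents.

```python
def ancestors(graph, commit):
    """Return every commit reachable from `commit` through parent links."""
    seen, stack = set(), list(graph[commit])
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph[c])
    return seen

def trim_parents(graph, parents, no_ff=False):
    """Drop parents that are ancestors of another parent.

    With no_ff=True the first parent is kept even when it is an
    ancestor of another parent, so the merge commit survives.
    """
    kept = []
    for i, p in enumerate(parents):
        redundant = any(p in ancestors(graph, q) for q in parents if q != p)
        if redundant and not (no_ff and i == 0):
            continue
        kept.append(p)
    return kept

# History after B and AB were filtered away: C sits on A, and AC now
# merges A (its rewritten first parent) with C.
graph = {"A": [], "C": ["A"]}
default_parents = trim_parents(graph, ["A", "C"])
no_ff_parents = trim_parents(graph, ["A", "C"], no_ff=True)
```

With the default, AC is left with the single parent C and so counts as degenerate; with `no_ff=True` it keeps both parents and remains a merge commit.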
Elijah Newren
41787ff365 Merge branch 'kl/mailmap-corner-case-and-misc-fixes' into master
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-12-27 07:47:07 -08:00
Karl Lenz
caf85b68ec filter-repo: allow --dry-run and --debug to be used together
Prior to this commit, git-filter-repo could only be used with either the
--dry-run flag or the --debug flag, not both. When run in debug mode,
git-filter-repo expected to be able to read from the output stream,
which obviously isn't created when doing a dry run, so it stack traced
when it tried to use the non-existent output stream. This commit fixes
that bug with an equally simple sanity check for the existence of the
output stream when run in debug mode.

Signed-off-by: Karl Lenz <xorangekiller@gmail.com>
2019-12-27 09:29:49 -05:00
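A minimal sketch of the kind of sanity check caf85b68ec describes. The function name `collect_debug_output` is hypothetical, and this model assumes the missing stream is represented as `None` on a dry run.

```python
import io
from typing import Optional

def collect_debug_output(output: Optional[io.BytesIO], debug: bool) -> list:
    """Return lines from the output stream for debug display.

    `output` is None in this sketch when no stream was created
    (the dry-run case).
    """
    if not debug or output is None:
        # Guard: there is nothing to read on a dry run.
        return []
    output.seek(0)
    return output.read().splitlines()

dry_run_lines = collect_debug_output(None, debug=True)
real_run_lines = collect_debug_output(
    io.BytesIO(b"progress 1\nprogress 2\n"), debug=True)
```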
Karl Lenz
780c74b218 filter-repo: parse mailmap entries with no email address
The mailmap format parsed by the "git shortlog" command allows for
matching mailmap entries with no email address. This is admittedly an
edge case, because most Git commits will have an email address
associated with them as well as a name, but technically the address
isn't required, and "git shortlog" accommodates that in its mailmap
format. This commit teaches git-filter-repo to do the same thing.

Signed-off-by: Karl Lenz <xorangekiller@gmail.com>
2019-12-27 09:25:25 -05:00
Karl Lenz
5c960b5a64 .gitignore: ignore the test result directories
Signed-off-by: Karl Lenz <xorangekiller@gmail.com>
2019-12-27 09:25:25 -05:00
Elijah Newren
99432eb5ef Merge branch 'as/update-gpl-address' into master
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-12-26 16:01:28 -08:00
Elijah Newren
7cfef09e9b filter-repo: warn users who try to use invalid path components
It's hard to be exhaustive, but if users try something like:
   --path-rename foo/bar/baz:.
or
   --path ../other-dir
then bad things happen.  In the first case, filter-repo will try to
ask fast-import to create a directory named '.' and move everything
from foo/bar/baz/ into it but of course '.' is a reserved directory
name so we can't create it.  In the second case, they are probably
running from a subdirectory, but filter-repo doesn't work from a
subdirectory.  I hard-coded the assumption that everything was in the
toplevel directory and all paths were relative from there pretty
early on.  So, if the user tries to use any of these components
anywhere, just throw an early error.

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-12-26 15:54:47 -08:00
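The early error described in 7cfef09e9b could look roughly like the following. `check_path` is a hypothetical helper, not the function filter-repo uses, and it only rejects the two components named in the message.

```python
def check_path(path: bytes) -> bytes:
    """Reject repository-relative paths containing '.' or '..' components."""
    components = path.split(b"/")
    for bad in (b".", b".."):
        if bad in components:
            raise ValueError(f"invalid path component {bad!r} in {path!r}")
    return path

# Paths are bytes because repository paths need not be valid unicode.
ok = check_path(b"foo/bar/baz")
```

Both examples from the message are rejected by this check: the rename target `.` and the path `../other-dir`.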
Elijah Newren
3bdfa91768 Contributing.md: clarify reasons for using git.git submission guidelines
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-12-26 15:54:47 -08:00
Elijah Newren
dab9386c47 contrib: update bfg-ish and filter-lamely with windows workaround
In commit f2729153 (filter-repo: workaround Windows' insistence that cwd
not be a bytestring, 2019-10-19), filter-repo was made to use a special
SubprocessWrapper class instead of the normal subprocess calls, due to
what appears to be bugs in the python implementation on Windows not
working with arguments being bytestrings.  Add the same workarounds to
bfg-ish and filter-lamely.

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-12-26 15:54:47 -08:00
Elijah Newren
f9ebe6a3f7 filter-repo: avoid clobbering files whose names differ in case only
git fast-import, in an attempt to be friendly, allows the same file to
be specified multiple times within a commit and just takes the last
version of the file listed.  It determines whether files are the same
via fspathncmp, which is defined differently depending on the setting of
core.ignorecase.  Unfortunately, this means that if someone is trying to
do filtering of history and using a broken (case-insensitive) filesystem
and the history they are filtering has some paths that differed in case
only, then fast-import will delete whichever of the "colliding" files is
listed first.

Avoid these problems by just turning off core.ignorecase while
fast-import is running.  This will prevent silently modifying the repo
in an unexpected way.  Users on such filesystems may have difficulty
checking out commits with files which differ in case only, but that is
a separate problem for them to deal with after rewriting history.

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-12-26 13:17:55 -08:00
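The clobbering described in f9ebe6a3f7 can be modelled in a few lines. This is an illustration of the behaviour, not fast-import's code: `dedupe_last_wins` is a hypothetical function, and lowercasing stands in for the case-insensitive comparison.

```python
def dedupe_last_wins(entries, ignorecase):
    """Keep the last entry per path, comparing case-insensitively if asked."""
    result = {}
    for path, blob in entries:
        key = path.lower() if ignorecase else path
        result[key] = (path, blob)
    return sorted(result.values())

# Two files in one commit whose names differ only in case.
entries = [(b"README", b"blob-1"), (b"Readme", b"blob-2")]
collapsed = dedupe_last_wins(entries, ignorecase=True)
preserved = dedupe_last_wins(entries, ignorecase=False)
```

With case-insensitive comparison only the last-listed file survives; with case-sensitive comparison both do. git's general `-c` option (for example `git -c core.ignorecase=false fast-import`) is one way to override the setting for a single invocation; the message does not say which mechanism filter-repo uses.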
Elijah Newren
b1a35a3057 Merge branch 'jb/release-to-pypi' into master
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-12-26 12:47:50 -08:00
Elijah Newren
525ecc8f8e release: tweak packaging scripts for uploading to PyPI
Clean up the PyPI dist packages, remove unnecessary files, and
streamline the release process:
  * Avoid adding extra unnecessary files to the repo; setup.py is code
    and can copy the necessary files into place.
  * Make sure README.md is included so we don't get an UNKNOWN
    Description field.
  * Add a long_description_content_type to avoid parsing errors on the
    README.md file and rejecting the upload.
  * Define the license and platform fields so they don't show up as
    UNKNOWN either.
  * Remove unnecessary pyproject.toml.  This makes sense for most python
    projects, but since I already have a Makefile with installation
    rules (because I'm trying to be more compatible with git.git just in
    case we ever get merged into it), the pyproject.toml file is
    somewhat duplicative.  Sure, the Makefile won't specify the exact
    versions needed but...meh.
  * Split the release target of the Makefile into github_release and
    pypi_release substeps, to allow them to be run semi-independently.
    Make the pypi_release run a few more steps for me.

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-12-26 12:32:53 -08:00
Julian Berman
6f4fc07d53 release: add packaging scripts for uploading to PyPI
Signed-off-by: Julian Berman <Julian@GrayVines.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-12-26 12:07:26 -08:00
Elijah Newren
975419288b Merge branch 'en/fix-empty-pruning-for-realz' into master
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-12-25 19:11:40 -08:00
Elijah Newren
a9a93d9d83 filter-repo: actually fix issue with pruning of empty commits
In commit 509a624 (filter-repo: fix issue with pruning of empty commits,
2019-10-03), it was noted that when the first parent is pruned away,
then we need to generate a corrected list of file changes relative to
the new first parent.  Unfortunately, we did not apply our set of file
filters to that new list of file changes, causing us to possibly
introduce many unwanted files from the second parent into the history.
The testcase added at the time was rather lax and totally missed this
problem (which possibly exacerbated the original bug being fixed rather
than helping).  Tighten the testcase, and fix the error by filtering the
generated list of file changes.

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-12-25 09:10:46 -08:00
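The bug fixed in a9a93d9d83 can be illustrated with a toy model. Trees are dicts from path to blob ID, and `diff_trees` and `filter_changes` are hypothetical helpers; the prefix test stands in for whatever path filters the user supplied.

```python
def diff_trees(old: dict, new: dict) -> dict:
    """Return {path: new_blob_or_None} for every path that differs."""
    changes = {}
    for path in old.keys() | new.keys():
        if old.get(path) != new.get(path):
            changes[path] = new.get(path)  # None means the path was deleted
    return changes

def filter_changes(changes: dict, wanted_prefix: bytes) -> dict:
    """Keep only changes under the wanted path prefix."""
    return {p: b for p, b in changes.items() if p.startswith(wanted_prefix)}

# The original first parent was pruned, so the file changes are
# regenerated relative to the new first parent.
new_first_parent = {b"src/a.c": b"1"}
merge_tree = {b"src/a.c": b"2", b"docs/big.pdf": b"9"}

unfiltered = diff_trees(new_first_parent, merge_tree)
filtered = filter_changes(unfiltered, b"src/")
```

Using `unfiltered` directly would reintroduce `docs/big.pdf`, a file the user asked to remove; applying the same filter to the regenerated list avoids that.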
Elijah Newren
2b32276ca3 filter-repo: move file filtering out of _tweak_commit() for re-use
RepoFilter._tweak_commit() was a bit unwieldy, and we have a reason for
wanting to re-use the file filtering logic in it, so break that out into
a separate function.

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-12-25 07:59:43 -08:00
Andreas Schneider
65103890d5 Update GPL license file
The license file is outdated and points to an incorrect FSF address.

Signed-off-by: Andreas Schneider <asn@cryptomilk.org>
2019-12-22 18:33:01 +01:00
Elijah Newren
eec9b081ee filter-repo: don't have analyze choke on typechange types
The analyze mode will handle type changes (e.g. normal file to symlink)
in combination with adds and modifies, but the similar logic below
didn't allow for type changes in combination with renames.  Fix the
oversight.

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-12-09 14:36:49 -08:00
Elijah Newren
b56ca0437a Contributing.md: clarify notes about PEP-8
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-11-25 10:38:46 -08:00
Elijah Newren
e1d126a1ea Reference package managers in installation instructions
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-11-25 10:38:46 -08:00
Elijah Newren
0590c4193d contrib: clarify a few points of usage
Make it clearer that absolute paths should not be used for pathnames
within a git repository.  Also, fix the comment about how the
insert-beginning script could be implemented as a one-liner; the
example commented-out code should have used bytestrings.

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-11-25 08:42:29 -08:00
Elijah Newren
117dd28883 Merge branch 'en/flesh-out-docs' into master
The prerequisites and installation docs were not quite detailed enough,
and no code of conduct or contribution guidelines were included.  Flesh
out the docs to cover these issues.

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-11-22 13:33:04 -08:00
Elijah Newren
d07a2fe2ea Contributing.md: mention testsuite line coverage
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-11-21 16:19:26 -08:00
Elijah Newren
64aa9359ed run_coverage: prefer coverage3 to python3-coverage
Some of the systems I ran on had a 'python3-coverage' and some had a
'coverage3' program.  More were of the latter name, but more
importantly, the upstream tarball only creates the latter name;
apparently the former was just added by some distros.  So, switch to the
more official name of the program.

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-11-21 16:19:26 -08:00
Elijah Newren
b3eb2cf461 filter-repo (README): add code of conduct and contributing guidelines
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-11-21 16:19:26 -08:00
Elijah Newren
1762b99573 Explain how to use a python3 executable not named "python3"
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-11-21 16:19:26 -08:00
Elijah Newren
5c35bb7a8d filter-repo (README): add sections on prerequisites and installation
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-11-21 16:19:26 -08:00
Elijah Newren
1810051a58 Merge branch 'mh/generated-readme-typo-fix' into master
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-11-21 07:48:12 -08:00
Matthisk Heimensen
22cc153395 filter-repo: fix typo in generated analysis README
Signed-off-by: Matthisk Heimensen <m@tthisk.nl>
2019-11-21 10:55:33 +01:00
Elijah Newren
33cf19376d Merge branch 'bf/installation-fixes' into master
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-11-12 17:26:42 -08:00
Benoit Fouletier
2cbd4a46a7 Makefile: fix path installation issues
- quote paths that may have spaces
- force ln in case the file already exists

Signed-off-by: Benoit Fouletier <bennews@free.fr>
2019-11-13 00:32:21 +01:00
Benoit Fouletier
ca2fd07dfa Makefile: fix documentation installation
- correct paths to include the missing "Documentation/" prefix
- use fully specified "origin/docs" branch in case the "docs" branch is
not checked out locally

Signed-off-by: Benoit Fouletier <bennews@free.fr>
2019-11-13 00:32:21 +01:00
Elijah Newren
8d8410e2b2 Makefile: use the right token environment variable
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-11-02 23:49:48 -07:00
Elijah Newren
84fddfe262 git-filter-repo.txt: fix typesetting of --partial
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-10-31 14:03:02 -07:00
Elijah Newren
ceb924ea8f filter-repo (README): add link to predecessor project
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-10-31 14:03:02 -07:00
Elijah Newren
2fc5596455 filter-repo (README): add note about requiring a recent version of git
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-10-31 11:03:39 -07:00
Elijah Newren
904e03f963 filter-repo: workaround Windows' insistence that command args be strings
It appears that in addition to Windows requiring cwd be a string (and
not a bytestring), it also requires the command line arguments to be
unicode strings.  This appears to be a python-on-Windows issue at the
surface (attempts to quote things that assume the arguments are all
strings), but whether it's solely a python-on-Windows issue or there is
also a deeper Windows issue, we can work around this brain-damage by
extending the SubprocessWrapper slightly.  As with the cwd changes, only
apply this on Windows and not elsewhere because there are perfectly
legitimate reasons to pass non-unicode parameters (e.g. filenames that
are not valid unicode).

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-10-30 09:14:02 -07:00
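A rough sketch of the wrapper idea from 904e03f963 and f2729153fe. The class name SubprocessWrapper comes from the commit messages, but the body below is an assumption about how such a wrapper might look, and `decode` and `prepare` are hypothetical helpers.

```python
import os
import subprocess

def decode(value):
    """Decode bytes to str; leave str (and None) untouched."""
    return value.decode("utf-8") if isinstance(value, bytes) else value

class SubprocessWrapper:
    """Converts bytes arguments and cwd to str before delegating."""

    @staticmethod
    def prepare(args, kwargs):
        new_args = [decode(a) for a in args]
        new_kwargs = dict(kwargs)
        if "cwd" in new_kwargs:
            new_kwargs["cwd"] = decode(new_kwargs["cwd"])
        return new_args, new_kwargs

    @staticmethod
    def check_output(args, **kwargs):
        new_args, new_kwargs = SubprocessWrapper.prepare(args, kwargs)
        return subprocess.check_output(new_args, **new_kwargs)

# Use the wrapper only on Windows; elsewhere keep the plain module so
# that byte arguments that are not valid unicode still work.
subproc = SubprocessWrapper if os.name == "nt" else subprocess

prepared_args, prepared_kwargs = SubprocessWrapper.prepare(
    [b"git", b"status"], {"cwd": b"repo"})
```

Callers would then use `subproc.check_output(...)` everywhere and get the decoding only where the platform needs it.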
Elijah Newren
f2729153fe filter-repo: workaround Windows' insistence that cwd not be a bytestring
Unfortunately, it appears that Windows does not allow the 'cwd' argument
of various subprocess calls to be a bytestring.  That may be functional
on Windows since Windows-related filesystems are allowed to require that
all file and directory names be valid unicode, but not all platforms
enforce such restrictions.  As such, I certainly cannot change
   cwd=directory
to
   cwd=decode(directory)
because that could break on other platforms (and perhaps even on Windows
if someone is trying to read a non-native filesystem).  Instead, create
a SubprocessWrapper class that will always call decode on the cwd
argument before passing along to the real subprocess class.  Use these
wrappers on Windows, and do not use them elsewhere.

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-10-22 08:51:04 -07:00
Elijah Newren
da2a969157 Makefile: add a few new targets to streamline my release workflow
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-10-21 14:51:22 -07:00
Elijah Newren
d70b29a165 filter-repo: fix import sort order
During the python3 transition, StringIO was renamed to io -- but the
import wasn't moved to preserve appropriate sorting.

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-10-21 09:09:44 -07:00
Elijah Newren
e333be7b17 filter-repo: consistently use bytestrings for directory names
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-10-21 09:09:23 -07:00
Elijah Newren
e6dd613e3f filter-repo: add a --version option
Note that this isn't a version *number* or even the more generalized
version string that folks are used to seeing, but a version hash (or
leading portion thereof).

A few important points:

  * These version hashes are not strictly monotonically increasing
    values.  Like I said, these aren't version numbers.  If that
    bothers you, read on...

  * This scheme has incredibly nice semantics satisfying a pair of
    properties that most version schemes would assume are mutually
    incompatible:
       This scheme works even if the user doesn't have a clone of
       filter-repo and doesn't require any build step to inject the
       version into the program; it works even if people just download
       git-filter-repo.py off GitHub without any of the other sources.
    And:
       This scheme means that a user is running precisely version X of
       the code, with the version not easily faked or misrepresented
       when third parties edit the code.
    Given the wonderful semantics provided by satisfying this pair of
    properties that all other versioning schemes seem to miss out on, I
    think I should name this scheme.  How about "Semantic Versioning"?
    (Hehe...)

  * The version hash is super easy to use; I just go to my own clone of
    filter-repo and run either:
        git show $VERSION_HASH
    or
        git describe $VERSION_HASH

  * A human consumable version might suggest to folks that this software
    is something they might frequently use and upgrade.  This program
    should only be used in exceptional cases (because rewriting history
    is not for the faint of heart).

  * A human consumable version (i.e. a version number or even the
    more relaxed version strings in more common use) might suggest to
    folks that they can rely on strict backward compatibility.  It's
    nice to subtly undercut any such assumption.

  * Despite all that, I will make releases (downloadable tarballs with
    real version numbers in the tarball name; I'm just going to re-use
    whatever version git is released with at the time).  But those
    version numbers won't be used by the --version option; instead the
    version hash will.

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-10-19 14:06:08 -07:00
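The message for e6dd613e3f does not say how the version hash is derived. One scheme that would have both properties it lists is to hash the script's own contents the way git hashes a blob; the sketch below assumes exactly that, and `git_blob_id` and `version_hash` are hypothetical names.

```python
import hashlib

def git_blob_id(contents: bytes) -> str:
    """Compute the ID git assigns to a blob (SHA-1 repositories)."""
    header = b"blob %d\0" % len(contents)
    return hashlib.sha1(header + contents).hexdigest()

def version_hash(script_path: str, length: int = 12) -> str:
    """Abbreviated blob ID of the script file itself (assumed scheme)."""
    with open(script_path, "rb") as f:
        return git_blob_id(f.read())[:length]

# The well-known ID of the empty blob serves as a quick self-check.
empty_blob = git_blob_id(b"")
```

Under this assumption no build step is needed, because the hash is computed from the file as downloaded, and any third-party edit to the script changes the reported hash.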
Elijah Newren
1e21d6e2ec Add installation instructions
Try to make it a little more friendly for distros to package.

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-10-17 18:59:23 -07:00
Elijah Newren
62c311c69f filter-repo: fix an unmarked bytestring to be marked as such
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-10-17 18:56:37 -07:00
Elijah Newren
e0140bb2ad git-filter-repo.txt: minor updates to docs
A few changes:
  * Include notes about git-2.24.0 changes
  * Make it clearer that messing with the first parent could have
    negative side-effects if the file_changes aren't also updated.
  * Fix wrapping of a line that was too long.

Also, update the README.md:
  * Note the upstream improvements made in (not yet released) git-2.24.0

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-10-17 18:55:09 -07:00
Elijah Newren
320c85f941 filter-repo: improve support for partial history rewrites
Partial history rewrites were possible before with the (previously
hidden) --refs flag, but the defaults were wrong.  That could be worked
around with the --source or --target flags, but that disabled --no-data
for fast-export and thus slowed things down, and also would require
overriding --replace-refs.  And the defaults for --source and --target
may diverge further from what is wanted/needed for partial history
rewrites in the future.

So, add --partial as a first-class supported option with scary
documentation about how it permits mixing new and old history.  Make
--refs imply that flag.  Make the behavioral similarities (in regards to
which steps are skipped) between --source, --target, and --partial more
clear.  Add relevant documentation to round it out.

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-10-17 18:55:09 -07:00