523 Commits (main)
 

Author SHA1 Message Date
Elijah Newren 4cfc765eb1 filter-repo: allow removing .git directories from history
Commit 7cfef09e9b (filter-repo: warn users who try to use invalid path
components, 2019-12-26) attempted to protect against using invalid path
components, but also added a check against a path that has sometimes
been valid in the past and which users might want to be able to remove
from their history.  Relax the check so that users can remove '.git'
directories in subdirectories (or even at the toplevel) from their
history.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren db9ac1fffe git-filter-repo.txt: add documentation of --no-ff option
Commit 5e04dff097 (filter-repo: add new --no-ff option, 2020-01-01)
added support for a --no-ff option, but only added documentation in the
built-in output, not in the intended-to-be-more-complete manual.  Add
documentation to the manual for this option.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 2833ef275f filter-repo: throw an error if user specifies any path starting with a slash
All paths are intended to be relative paths, relative to the project
root, not to the filesystem root.  There have been a few people who
didn't understand this, and then ended up with fast-import crashes that
are not very clear.  Check for it early and throw a simple error message
instead.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
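The early check described in 2833ef275f can be pictured with a minimal sketch. The function name `check_relative_path` and the error wording are hypothetical, chosen for illustration only; they are not taken from filter-repo itself.

```python
def check_relative_path(path: bytes) -> bytes:
    """Return the path unchanged, or fail early if it starts with a slash.

    Hypothetical helper: paths are meant to be relative to the project
    root, so a leading slash is rejected before anything reaches
    fast-import.
    """
    if path.startswith(b'/'):
        raise ValueError(
            "Error: path %r starts with '/'; paths must be relative "
            "to the project root" % path)
    return path


if __name__ == '__main__':
    print(check_relative_path(b'src/main.c'))
```

Failing at argument-validation time keeps the error close to the user's mistake instead of surfacing it later as an unclear crash.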
Elijah Newren 764e0e00dd git-filter-repo.txt: add examples for --[to-]subdirectory-filter
I had lots of examples of these being horribly mis-used and being used in place
of each other; add some examples with some description of the repository layout
to try to avoid all that confusion.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren e834379254 filter-repo: clarify usage of --use-base-name
fast-export/fast-import only work with filenames (using full path from
the root of the repository); thus that's all that filter-repo works
with.  Full pathnames implicitly include all leading directories as part
of the pathname, which is what allows us to match against directories.
However, it obviously means --use-base-name can't be used to match paths
against directories.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 7c877cd750 filter-repo: make --version more robust against modified shebangs
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren e9c2d9adb5 filter-repo: ensure we write final newline after final progress update
We try to write 'Parsed %d commits' messages only after enough time has
passed to avoid writing to stdout becoming a bottleneck.  However, there
was a slight logic error that would cause it to only print the final
newline if there was a new message since the last progress update,
leaving a small race condition where we might miss it.

Reported-by: Valentyn Shtronda (@valiko-ua)
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 011c646ee8 filter-repo: suggest --no-local when cloning local repos
Cloning local repos by default makes a bunch of hardlinks, giving you a
non-packed repository, and leading folks to use and suggest --force.
That, of course, bypasses the important fresh clone checks to prevent
people from accidentally and irrecoverably deleting their non-backed-up
data.  Let's make it easier for people to avoid (and suggest) that
mistake.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren c0c37a7656 filter-repo: fix bitrotted documentation links
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 427b265195 Merge branch 'mr/filter-lamely-and-special-filenames' into master
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Marius Renner 3427ee171b contrib: fix special character handling in filter-lamely
filter-lamely does not handle filenames with special characters (such as
äöü or even \n and \t) properly when using a tree filter or index
filter. It either does not quote the input to git correctly or parses
git output incorrectly, causing affected filenames to be mangled with
extraneous double quotes in the history or even crashing the program.

Make filter-lamely correctly handle such filenames by using
NUL-delimited input and output modes for the affected git commands.

Signed-off-by: Marius Renner <marius@mariusrenner.de>
4 years ago
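To illustrate why NUL-delimited modes help in 3427ee171b: NUL is the one byte that cannot appear in a path, so splitting on it needs no quoting rules at all. The sketch below parses such output; the helper name `parse_nul_delimited` and the sample bytes are assumptions for illustration, not code from filter-lamely.

```python
from typing import List


def parse_nul_delimited(output: bytes) -> List[bytes]:
    """Split NUL-terminated records into a list of raw filenames.

    Hypothetical helper: filenames stay as bytes, so tabs, newlines and
    non-ASCII characters survive untouched and no unquoting is needed.
    """
    if not output:
        return []
    entries = output.split(b'\0')
    # Each record is NUL-terminated, so the final split piece is empty.
    if entries[-1] == b'':
        entries.pop()
    return entries


if __name__ == '__main__':
    sample = b'a\tb.txt\0caf\xc3\xa9\n.txt\0'
    print(parse_nul_delimited(sample))
```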
Elijah Newren f164f2b2e6 Merge branch 'kf/fix-example-typo' into master
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Kate F 420aa32dac git-filter-repo.txt: Fix typo for example
Signed-off-by: Kate F <kate@elide.org>
4 years ago
Elijah Newren 3a394ca152 Makefile: a few sanity checks for releasing
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 9928b7cb3e t9390: add missing '&&' in command chain
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren e11343e504 filter-repo: handle typechange modifications when first parent is pruned
Commit 509a624b (filter-repo: fix issue with pruning of empty commits,
2019-10-03) added code to get a new list of file changes when the first
parent was pruned.  However, this logic did not handle cases where one
of the file modifications was a typechange.  Add the necessary logic to
handle that case.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 4f84a74ada filter-repo: use more expensive prunability checks when needed
When users are inserting new objects into the stream, we cannot make as
many assumptions and need to do more careful checks for whether commits
become empty or not.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren b1fae4819a filter-repo: relax the definition of freshly packed
transfer.unpackLimit defaults to 100, meaning that if less than 100
objects exist in the repository, git will automatically unpack the
objects to be loose as part of the clone operation.  So, if there are no
packs and less than 100 objects, consider the repo to be freshly packed
for purposes of our fresh clone sanity checks.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren fe33fc42b3 filter-repo: avoid dying with --analyze on commits with unseen parents
analyze_commit() calls add_commit_and_parents() which does a sanity
check that we have seen all parents previously.  --refs breaks that
assumption, so we need to work around that check when ref limiting is in
effect.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 46549e7d3f lint-history: point people to issue with more linting examples
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 4c28ed6b8a Merge branch 'sb/setup-idempotency' into master
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Sirio Balmelli 9cf87ae036 setup.py: test for FileExistsError on symlink
Multiple runs of setuptools encounter a FileExistsError exception
trying to re-symlink the same files.

This exception is safe to ignore: the files were already symlinked
so the call can be considered successful.

Signed-off-by: Sirio Balmelli <sirio@b-ad.ch>
4 years ago
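The idea in 9cf87ae036 can be sketched as follows. The function name `symlink_idempotent` is hypothetical, and the real setup.py may structure this differently; the sketch only shows the "ignore FileExistsError" pattern the message describes.

```python
import os


def symlink_idempotent(target: str, link_name: str) -> None:
    """Create link_name -> target, tolerating a link left by an earlier run.

    Hypothetical helper: a FileExistsError means something already sits
    at link_name, which this sketch treats as success, as the commit
    message describes for repeated setuptools runs.
    """
    try:
        os.symlink(target, link_name)
    except FileExistsError:
        pass
```

Note that this sketch does not verify that the existing entry points at the same target; it assumes, as the commit message does, that it came from a previous run.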
Elijah Newren b9c62540b7 filter-repo: fix cache of file renames
Users may have long lists of --path, --path-rename, --path-regex, etc.
flags (or even a --paths-from-file option with a lot of entries in the
file).  In such cases, we may have to compare any given path against a
lot of different values.  In order to avoid having to repeat that long
list of comparisons every time a given path is updated, we long ago
added a cache of the renames so that we can compute the new name for a
path once and then just reuse it each time a new commit updates the old
filepath.

Sadly, I flubbed the implementation and instead of setting
   cache[oldname] = newname
I somehow did the boneheaded
   cache[newname] = newname
For most repositories and rewrites, this would just have the effect of
making the cache useless, but it could wreak various kinds of havoc if
a newname matched the oldname of some other file.

Make sure we record the mapping from OLDNAME to newname to fix these
issues.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
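A minimal sketch of the corrected caching from b9c62540b7, keyed on the old name. The names `make_cached_renamer` and `compute_newname` are hypothetical stand-ins for however the real code computes a rename from the user's path options.

```python
from typing import Callable, Dict


def make_cached_renamer(
        compute_newname: Callable[[bytes], bytes]) -> Callable[[bytes], bytes]:
    """Wrap a possibly expensive rename function with a per-path cache."""
    cache: Dict[bytes, bytes] = {}

    def rename(oldname: bytes) -> bytes:
        if oldname not in cache:
            # The key must be the OLD name.  Storing cache[newname] would
            # make the cache useless and could return wrong results when
            # a new name equals some other file's old name.
            cache[oldname] = compute_newname(oldname)
        return cache[oldname]

    return rename


if __name__ == '__main__':
    rename = make_cached_renamer(lambda p: p.replace(b'old/', b'new/', 1))
    print(rename(b'old/file.c'))
```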
Elijah Newren 85c8e3660d filter-repo: accelerate is_ancestor() for --analyze mode
The --analyze mode was extremely slow for the freebsd/freebsd repo on
github; digging in, the is_ancestor() function was being called a huge
number of times -- about 22 times per commit on average (and about 17
million times overall).  The analyze mode uses is_ancestor() to
determine whether a rename equivalency class should be broken (i.e.
renaming A->B means all versions of A and B are just different versions
of the same file, but if someone adds a new A in some commit which
contains the A->B rename in its history then this equivalence class no
longer holds).  Each is_ancestor() call potentially has to walk a tree
of dependencies all the way back to a sufficient depth where it can
realize that the commit cannot be an ancestor; this can be a very long
walk.

We can speed this up by keeping track of some previous is_ancestor()
results.  If commit F is not an ancestor of commit G, then F cannot be
an ancestor of children of G (unless that child has multiple parents;
but even in that case F can only be an ancestor through one of the
parents other than G).  Similarly, if F is an ancestor of commit G, then
F will always be an ancestor of any children of G.  Cache results from
previous calls to is_ancestor() and use them to accelerate subsequent
calls.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
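The caching idea in 85c8e3660d can be sketched roughly as below. This is not filter-repo's implementation: the `parents` mapping, the function names, and the choice to treat a commit as its own ancestor are all assumptions made for illustration. A failed walk records a negative result for every commit it fully explored, and later walks stop as soon as they reach a commit with a known answer.

```python
from typing import Callable, Dict, List, Tuple


def make_is_ancestor(
        parents: Dict[str, List[str]]) -> Callable[[str, str], bool]:
    """Build an is_ancestor() that remembers earlier answers."""
    cache: Dict[Tuple[str, str], bool] = {}

    def is_ancestor(possible_ancestor: str, commit: str) -> bool:
        stack = [commit]
        seen = set()
        explored = []  # commits whose parents were all queued for a visit
        found = False
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            known = cache.get((possible_ancestor, current))
            if current == possible_ancestor or known is True:
                found = True
                break
            if known is False:
                continue  # nothing behind this commit can match
            explored.append(current)
            stack.extend(parents.get(current, ()))
        if found:
            cache[(possible_ancestor, commit)] = True
        else:
            # The walk finished without a match, so none of the explored
            # commits has possible_ancestor anywhere in its history.
            for current in explored:
                cache[(possible_ancestor, current)] = False
        return found

    return is_ancestor
```

The walk is iterative rather than recursive so that long histories do not hit Python's recursion limit.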
Elijah Newren f2dccbc2ef filter-repo: avoid repeatedly translating the same string with --analyze
Translating "Processed %d blob sizes" or "Processed %d commits" hundreds
of thousands or millions of times is a waste and turns out to be pretty
expensive.  Translate it once, cache the string, and then re-use it.
Note that a similar issue was noted in commit 3999349be4 (filter-repo:
fix perf regression; avoid excessive translation, 2019-05-21), but I did
not think to check --analyze mode for similar issues back then.  Fix it
now.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 9d3d99593c lint-history: avoid dying when we get file deletions
When a file is deleted, there is nothing to lint, so we can just keep
the deletion as-is.

Reported-by: Thorben Kröger <dev@thorben.net>
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 4ea19c0bf8 filter-repo (README): streamline prerequisite wording a little bit
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren bcd9964537 filter-repo (README): link to upstream docs
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 96e217355c Contributing.md: start with git guidelines, then mention exceptions
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 18f98295e4 git-filter-repo.txt: fix nested bullets to render correctly
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 1dae85ee9a filter-repo: permit trailing slash for --[to-]subdirectory-filter argument
There was code to allow the argument of --to-subdirectory-filter and
--subdirectory-filter to have a trailing slash, but it was broken due to
a bug in python3's bytestring design: b'somestring/'[-1] != b'/',
despite that being the obvious expectation.  One either has to compare
b'somestring/'[-1:] to b'/' or else compare b'somestring/'[-1] to
b'/'[0].  So lame.  Note that this is essentially a follow-up to commit
385b0586ca ("filter-repo (python3): bytestr splicing and iterating is
different", 2019-04-27).

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
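The bytestring behavior described in 1dae85ee9a is easy to reproduce. In Python 3, indexing a bytes object yields an int, while slicing yields bytes. The helper name `strip_trailing_slash` is hypothetical and only illustrates one way to handle the trailing slash.

```python
def strip_trailing_slash(path: bytes) -> bytes:
    """Drop one trailing slash, if present (hypothetical helper)."""
    # Slicing keeps the bytes type, so this comparison behaves as expected.
    if path[-1:] == b'/':
        return path[:-1]
    return path


if __name__ == '__main__':
    path = b'somestring/'
    assert path[-1] != b'/'       # indexing gives an int, not bytes
    assert path[-1] == b'/'[0]    # int compared with int
    assert path[-1:] == b'/'      # slice compared with bytes
    assert path.endswith(b'/')    # avoids indexing entirely
    print(strip_trailing_slash(path))
```

Using a slice also handles the empty bytestring safely, since `b''[-1:]` is `b''` whereas `b''[-1]` raises IndexError.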
Elijah Newren a1d20f8e77 INSTALL: a few small tweaks and clarifications
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 9d51a90648 filter-repo: fix pruning of empty commits with blob callbacks
Blob callbacks, either implicit (via e.g. --replace-text) or explicit,
can modify blobs in ways that make them match other blobs, which in turn
can result in some commits becoming empty.  We need to detect such cases
and ensure we prune these empty commits when --prune-empty=auto.

Reported-by: John Gietzen <john@gietzen.us>
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 3a3cd3d15e git-filter-repo.txt: fix example of editing blob contents
You can call bytes.replace() or re.sub(), but you can't call
bytes.sub().  Oops.  Fix the example in the documentation.

Reported-by: John Gietzen <john@gietzen.us>
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
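For reference, the two calls that 3a3cd3d15e says do work look like this on bytes data. The sample contents and the helper names are made up for illustration; the corrected example in the manual may differ.

```python
import re


def redact_literal(data: bytes) -> bytes:
    """Replace a fixed byte sequence using bytes.replace()."""
    return data.replace(b'hunter2', b'***')


def redact_pattern(data: bytes) -> bytes:
    """Replace a regex match using re.sub() with a bytes pattern."""
    return re.sub(rb'password=\S+', b'password=***', data)


if __name__ == '__main__':
    blob = b'user=alice\npassword=hunter2\n'
    print(redact_literal(blob))
    print(redact_pattern(blob))
    # bytes has replace() but no sub() method.
    assert not hasattr(bytes, 'sub')
```

When the data is bytes, the pattern and the replacement passed to `re.sub()` must be bytes as well; mixing str and bytes raises TypeError.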
Elijah Newren 8994b4e55d filter-repo: fix bad column label in path-all-sizes.txt report
Reported-by: John Gietzen <john@gietzen.us>
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 5e04dff097 filter-repo: add new --no-ff option
Some projects have a strict --no-ff merging policy.  With the default
behavior of --prune-degenerate, we can prune merge commits in a way that
transforms the history into a fast-forward merge.  Consider this
example:
  * There are two independent commits or branches, named B & C, which
    are both built on top of A so that history looks like this diagram:
        A
        \ \
         \ B
          \
           -C
  * Someone runs the following sequence of commands:
    * git checkout A
    * git merge --no-ff B
    * git merge --no-ff C
  * This will result in a history that looks like:
        A---AB---AC
        \ \ /   /
         \ B   /
          \   /
           -C-
  * Later, someone comes along and runs filter-repo, specifying to
    remove the only path(s) that were modified by B.  That would
    naturally remove commit B and the no-longer-necessary merge
    commit AB.  For someone using a strict no-ff policy, the desired
    history is
        A---AC
         \ /
          C
    However, the default handling for --prune-degenerate would
    notice that AC merely merges C into its own ancestor A, whereas
    the original AC merged C into something separate (namely, AB).
    So, it would say that AC has become degenerate and prune it,
    leaving the simple history of
        A
         \
          C
    For projects not using a strict no-ff policy, this simpler history
    is probably better, but for folks that want a strict no-ff policy,
    it is unfortunate.

Provide a --no-ff option to tweak the --prune-degenerate behavior so
that it ignores the first parent being an ancestor of another parent
(leaving the first parent unpruned even if it is or becomes degenerate
in this fashion).

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 41787ff365 Merge branch 'kl/mailmap-corner-case-and-misc-fixes' into master
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Karl Lenz caf85b68ec filter-repo: allow --dry-run and --debug to be used together
Prior to this commit, git-filter-repo could only be used with either the
--dry-run flag or the --debug flag, not both. When run in debug mode,
git-filter-repo expected to be able to read from the output stream,
which obviously isn't created when doing a dry run, so it stack traced
when it tried to use the non-existent output stream. This commit fixes
that bug with an equally simple sanity check for the existence of the
output stream when run in debug mode.

Signed-off-by: Karl Lenz <xorangekiller@gmail.com>
4 years ago
Karl Lenz 780c74b218 filter-repo: parse mailmap entries with no email address
The mailmap format parsed by the "git shortlog" command allows for
matching mailmap entries with no email address. This is admittedly an
edge case, because most Git commits will have an email address
associated with them as well as a name, but technically the address
isn't required, and "git shortlog" accommodates that in its mailmap
format. This commit teaches git-filter-repo to do the same thing.

Signed-off-by: Karl Lenz <xorangekiller@gmail.com>
4 years ago
Karl Lenz 5c960b5a64 .gitignore: ignore the test result directories
Signed-off-by: Karl Lenz <xorangekiller@gmail.com>
4 years ago
Elijah Newren 99432eb5ef Merge branch 'as/update-gpl-address' into master
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 7cfef09e9b filter-repo: warn users who try to use invalid path components
It's hard to be exhaustive, but if users try something like:
   --path-rename foo/bar/baz:.
or
   --path ../other-dir
then bad things happen.  In the first case, filter-repo will try to
ask fast-import to create a directory named '.' and move everything
from foo/bar/baz/ into it but of course '.' is a reserved directory
name so we can't create it.  In the second case, they are probably
running from a subdirectory, but filter-repo doesn't work from a
subdirectory.  I hard-coded the assumption that everything was in the
toplevel directory and all paths were relative from there pretty
early on.  So, if the user tries to use any of these components
anywhere, just throw an early error.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 3bdfa91768 Contributing.md: clarify reasons for using git.git submission guidelines
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren dab9386c47 contrib: update bfg-ish and filter-lamely with windows workaround
In commit f2729153 (filter-repo: workaround Windows' insistence that cwd
not be a bytestring, 2019-10-19), filter-repo was made to use a special
SubprocessWrapper class instead of the normal subprocess calls, due to
what appear to be bugs in the Python implementation on Windows not
working with arguments being bytestrings.  Add the same workarounds to
bfg-ish and filter-lamely.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren f9ebe6a3f7 filter-repo: avoid clobbering files whose names differ in case only
git fast-import, in an attempt to be friendly, allows the same file to
be specified multiple times within a commit and just takes the last
version of the file listed.  It determines whether files are the same
via fspathncmp, which is defined differently depending on the setting of
core.ignorecase.  Unfortunately, this means that if someone is trying to
do filtering of history and using a broken (case-insensitive) filesystem
and the history they are filtering has some paths that differ in case
only, then fast-import will delete whichever of the "colliding" files is
listed first.

Avoid these problems by just turning off core.ignorecase while
fast-import is running.  This will prevent silently modifying the repo
in an unexpected way.  Users on such filesystems may have difficulty
checking out commits with files which differ in case only, but that is
a separate problem for them to deal with after rewriting history.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren b1a35a3057 Merge branch 'jb/release-to-pypi' into master
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 525ecc8f8e release: tweak packaging scripts for uploading to PyPI
Clean up the PyPI dist packages, remove unnecessary files, and
streamline the release process:
  * Avoid adding extra unnecessary files to the repo; setup.py is code
    and can copy the necessary files into place.
  * Make sure README.md is included so we don't get an UNKNOWN
    Description field.
  * Add a long_description_content_type to avoid parsing errors on the
    README.md file and rejecting the upload.
  * Define the license and platform fields so they don't show up as
    UNKNOWN either.
  * Remove unnecessary pyproject.toml.  This makes sense for most python
    projects, but since I already have a Makefile with installation
    rules (because I'm trying to be more compatible with git.git just in
    case we ever get merged into it), the pyproject.toml file is
    somewhat duplicative.  Sure, the Makefile won't specify the exact
    versions needed but...meh.
  * Split the release target of the Makefile into github_release and
    pypi_release substeps, to allow them to be run semi-independently.
    Make the pypi_release run a few more steps for me.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Julian Berman 6f4fc07d53 release: add packaging scripts for uploading to PyPI
Signed-off-by: Julian Berman <Julian@GrayVines.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 975419288b Merge branch 'en/fix-empty-pruning-for-realz' into master
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren a9a93d9d83 filter-repo: actually fix issue with pruning of empty commits
In commit 509a624 (filter-repo: fix issue with pruning of empty commits,
2019-10-03), it was noted that when the first parent is pruned away,
then we need to generate a corrected list of file changes relative to
the new first parent.  Unfortunately, we did not apply our set of file
filters to that new list of file changes, causing us to possibly
introduce many unwanted files from the second parent into the history.
The testcase added at the time was rather lax and totally missed this
problem (which possibly exacerbated the original bug being fixed rather
than helping).  Tighten the testcase, and fix the error by filtering the
generated list of file changes.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago