Not all setups have `dos2unix`. Most notably, the Ubuntu and macOS
agents of GitHub Actions don't.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The test case t9391.12 specifically wants to test LF vs CR/LF line
ending issues, expecting `core.autoCRLF` to default to `false`. This is
true on Linux and macOS and pretty much everywhere else, except on
Windows.
Let's make sure that the test operates with the `core.autoCRLF` value it
assumes to operate under.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
On Windows, there is no absolute path `/fake/path`, but MSYS2 (which Git
for Windows uses e.g. for running Bash scripts) pretends that it exists.
This only works within MSYS2 applications, of course, so... when MSYS2
sees that we hand a parameter to a non-MSYS2 application in a shell
script, it helpfully converts it to the full path (prepending MSYS2's
pseudo root directory).
Let's work around that by using a Win32-compatible path to begin with:
`$(pwd)` produces that on Windows. On other platforms, it still works.
As a bonus, this safeguards our test against a setup where `/fake/path`
_actually exists_. Stranger things have been seen in the wild, after
all.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
MSYS2 tries to be very helpful, and in most cases it even works, by
converting parameters passed from inside an MSYS2 Bash to a non-MSYS2
application (such as `git.exe`) if they look like Unix-style paths or
path lists.
Sometimes, however, this automatic path conversion is unhelpful, e.g.
when passing the parameter `foo:.` to Git, which MSYS2 will readily
convert to a Windows-style path list: `foo;.` (i.e. using a semicolon
instead of a colon).
Happily, there is a way to avoid that: the `MSYS_NO_PATHCONV` variable.
Let's use it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
While it is true that `colrm` is available on macOS by default, and even
in Ubuntu (thanks to the `bsdmainutils` package), it is not available on
Windows.
Let's use `cut` instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The problem with this is that on Windows, we use the MSYS2 Bash which
uses the POSIX emulation layer called "MSYS2 runtime" that pretends that
there _is_ something like the `/dev/fd/` namespace, and tells `git.exe`
about it, but `git.exe` does not use the POSIX emulation layer, and
hence has no idea what Bash is talking about.
Besides, we should avoid pipes, just as we do in the Git project.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In that test case, we expect the line count to be 5, but it is actually
6 lines that we should expect:

    numbers/medium.num
    numbers/small.num
    sequence/know
    whatever
    words/know
Note the empty line at the top: this list is generated via `git log
--format=%n`, and that `%n` stands for "newline", meaning that we _must_
expect an empty line.
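To illustrate the count, here is a small Python sketch. The sample lines are invented, and it assumes the output is sorted and de-duplicated before counting (as `sort -u` would do); the actual test pipeline may differ.

```python
# Invented sample: the empty strings stand for the lines produced by %n,
# the rest stand for file names listed alongside them.
raw_output = [
    "",
    "numbers/medium.num",
    "numbers/small.num",
    "",
    "sequence/know",
    "whatever",
    "",
    "words/know",
    "whatever",
]

# Sort and de-duplicate; the empty line survives as one entry of its own.
unique_lines = sorted(set(raw_output))

print(len(unique_lines))
```

The empty string sorts first and counts as a line, which is why the total is one more than the number of distinct file names.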
This expectation seems to have been broken already in the commit that
added the test case: b6a35f8 (filter-repo: implement
--strip-blobs-with-ids, 2019-05-30). It was hidden for such a long time
by a broken &&-chain, which we will fix next.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Some commits may have a valid author email, but no valid author name.
Old versions of git didn't enforce a non-empty name.
Setting the author data from the committer is wrong in this case.
Also add a test case for this to t9390.
Example: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c6295cdf656de63d6d1123def71daba6cd91939c
(en: replaced with a dedicated test instead of tweaking existing ones)
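A minimal sketch of the idea, not the actual filter-repo parser: it assumes the header line looks like `author <email> <date>` when the name is empty, and the regular expression and `parse_user_line` helper are hypothetical. The point is that an empty name is accepted as-is rather than treated as a parse failure.

```python
import re

# Hypothetical pattern: the name (and its trailing space) is optional,
# while the email and date parts are required.
USER_LINE = re.compile(rb"^(author|committer) (?:(.*?) )?<(.*?)> (.*)$")

def parse_user_line(line):
    """Split an author/committer line into (kind, name, email, date)."""
    m = USER_LINE.match(line)
    if m is None:
        raise ValueError("unparseable line: %r" % line)
    kind, name, email, when = m.groups()
    # An absent name becomes an empty bytes string instead of an error.
    return kind, name or b"", email, when

print(parse_user_line(b"author <someone@example.com> 1234567890 +0000"))
print(parse_user_line(b"author A U Thor <author@example.com> 1234567890 +0000"))
```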
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
When the user specifies some kind of criteria to filter commits by (e.g.
--subdirectory-filter mysubdir), we rewrite parent commits that are
entirely filtered out to the most recent ancestor that still exists, or
just prune the parent if there isn't one. That works great when the
parent is a commit, but nested tags have parents that are tags. If we
only prune the first tag (i.e. the tag of a commit), then letting any
tags through that had that tag as a parent will result in a fast-import
crash with a message of the form
fatal: mark :35390 not declared
Ensure that when a tag gets pruned, the pruning is recorded as such, so
that any child tags will get pruned as well.
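The bookkeeping can be sketched roughly as follows. This is a simplified illustration with hypothetical ids and a hypothetical `prune_tags` helper, not the code in filter-repo; it assumes tags arrive after the objects they point to.

```python
def prune_tags(tags, pruned_targets):
    """Drop tags whose target was pruned, transitively.

    tags: list of (tag_id, target_id) pairs in stream order.
    pruned_targets: ids of objects that were already filtered out.
    """
    pruned = set(pruned_targets)
    kept = []
    for tag_id, target_id in tags:
        if target_id in pruned:
            # Record the pruning so that tags of this tag are pruned too.
            pruned.add(tag_id)
        else:
            kept.append(tag_id)
    return kept, pruned

# Tag 10 points at pruned commit 5; tag 11 is a nested tag pointing at tag 10.
kept, pruned = prune_tags([(10, 5), (11, 10), (12, 7)], {5})
print(kept, sorted(pruned))
```

Without the `pruned.add(tag_id)` step, tag 11 would be let through while referring to a mark that was never declared.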
Signed-off-by: Elijah Newren <newren@gmail.com>
In repositories with annotated tags, filter-lamely crashes with the
message: "NameError: name 'Reset' is not defined".
This is because of a missing "fr" module prefix in the code, which this
commit adds.
Signed-off-by: Marius Renner <marius@mariusrenner.de>
When filtering with --refs, parents can be a hash rather than an
integer. There was a code path in RepoFilter._prunable() that was
written assuming the first parent would always be an integer; fix it to
handle a hash as well.
Reported-by: Niklas Hambüchen <mail@nh2.me>
Signed-off-by: Elijah Newren <newren@gmail.com>
fast-import gained a new raw-permissive date format explicitly for
allowing people to import repositories as-is. Make use of the flag, and
stop rewriting the bogus timezone found in rails.git.
If users do not like these bogus times, they can of course write a
filter to fix them (or even make them bogus in a different way). For
example:
  git filter-repo ... --commit-callback '
    if commit.author_date.endswith(b"+051800"):
      commit.author_date = commit.author_date.replace(b"+051800", b"+0261")
    '
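Note that Python bytes objects are immutable, so a callback has to assign the result of `replace()` back to the attribute for the change to take effect. A minimal sketch, using a hypothetical `FakeCommit` stand-in for the object passed to the callback and a made-up timestamp:

```python
class FakeCommit:
    """Hypothetical stand-in for the commit object a callback receives."""
    def __init__(self, author_date):
        self.author_date = author_date

def fix_timezone(commit):
    # bytes.replace() returns a new object; assign it back to the attribute.
    if commit.author_date.endswith(b"+051800"):
        commit.author_date = commit.author_date.replace(b"+051800", b"+0261")

commit = FakeCommit(b"1112911993 +051800")  # made-up date value
fix_timezone(commit)
print(commit.author_date)
```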
Signed-off-by: Elijah Newren <newren@gmail.com>
"no-op" might suggest that it doesn't do anything, when in reality it
does exactly what filter-repo does. Rename it to barebones-example.
Signed-off-by: Elijah Newren <newren@gmail.com>
Homebrew and scoop are both package managers and package repositories.
Fedora 32 is not a package manager, but does map to a package
repository. Clarify wording that the list from repology.org is a list
of package repositories, not package managers.
Signed-off-by: Elijah Newren <newren@gmail.com>
Useful commit and reference mappings are created on every run. These are
helpful in a number of situations, and should be documented so that
end-users and Git hosts can understand how to use the output.
The commit-map is particularly useful for Git hosts to override
retention mechanisms, like hidden refs. This allows end-users to purge
large files and sensitive data.
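As a rough illustration of how a host might consume the commit-map, here is a small Python sketch. It assumes the file consists of a header line followed by lines of the form `<old hash> <new hash>`; that layout, the `parse_commit_map` helper, and the hashes are assumptions made for the example.

```python
def parse_commit_map(text):
    """Return {old_hash: new_hash} from commit-map style text."""
    mapping = {}
    for line in text.splitlines()[1:]:  # assumption: first line is a header
        fields = line.split()
        if len(fields) == 2:
            mapping[fields[0]] = fields[1]
    return mapping

# Made-up hashes, for illustration only.
sample = "\n".join([
    "old" + " " * 38 + "new",
    "a" * 40 + " " + "b" * 40,
    "c" * 40 + " " + "d" * 40,
])
commit_map = parse_commit_map(sample)
print(commit_map["a" * 40])
```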
Signed-off-by: James Ramsay <james@jramsay.com.au>
Apparently, despite the fact that *overwrite* *repo* *history* are three
important words that each individually convey a lot of important
meaning, people ignore them and instinctively add --force. Insert the
word "destructively" to get people to pause.
Further, end the warning not with instructions for getting around it in
the current repository, but with a suggestion to operate on a fresh
clone instead, and only then mention as a side comment that the --force
flag can be used to override.
Signed-off-by: Elijah Newren <newren@gmail.com>
Online blogs/articles/Q&A as well as direct feedback suggest that
people use the --force flag rather cavalierly. Add words like
"irreversible" and "immediate pruning" to discourage such blithe
application of this flag. I hope this encourages folks to either learn
the ramifications of irreversibly rewriting a repository's entire
history first, or to follow the recommendation of only operating on a
fresh clone.
Signed-off-by: Elijah Newren <newren@gmail.com>
People keep trying to read this file, unaware that it is the source code
for generating the documentation, not the generated documentation. Add
a comment at the top that explains this and points people in the right
direction.
Signed-off-by: Elijah Newren <newren@gmail.com>
Allow lines starting with '#' to be treated as a comment and be ignored.
Update the documentation to note that both blank lines and comment lines
are ignored, and mention how filenames starting with '#' can be matched
(namely, the same way that filenames starting with 'regex:', 'glob:',
or 'literal:' can be -- by prefixing the filename with 'literal:').
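A simplified sketch of these rules in Python follows. The `parse_path_lines` helper is hypothetical and ignores everything else the file format may support; it only shows how blank lines, comment lines, and the 'literal:' escape interact.

```python
def parse_path_lines(lines):
    """Return (kind, pattern) pairs, skipping blank and comment lines."""
    specs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        for kind in ("literal", "glob", "regex"):
            prefix = kind + ":"
            if line.startswith(prefix):
                specs.append((kind, line[len(prefix):]))
                break
        else:
            # No prefix: treat the whole line as a literal path.
            specs.append(("literal", line))
    return specs

example = ["# a comment", "", "literal:#notes.txt", "glob:*.c", "README"]
print(parse_path_lines(example))
```

The third entry shows how a file whose name really starts with '#' can still be selected.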
Signed-off-by: Elijah Newren <newren@gmail.com>
I added special code to filter-repo so that --path expressions could
match filenames or some leading directory name. --path-regex, since it
does not implicitly add anchorings, can also match a leading path, and
can thus be used to match against directories. --path-glob could not be
used to match a leading directory of a path, since fnmatch.fnmatch()
requires the full string to match. But users like being able to specify
directory names, such as '*/bin', so let's take any glob expression and
treat it as two: '<glob>' and '<glob>/*' and try to match against either
one; this will allow it to match against file or directory names like
the other two types of path matching.
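The idea can be sketched in a few lines of Python. The `glob_matches` helper is a hypothetical name, and this is an illustration of the two-pattern approach rather than the exact code in filter-repo.

```python
import fnmatch

def glob_matches(path, glob):
    """Match the glob against the full path or against a leading directory."""
    # fnmatch requires the whole string to match, so also try '<glob>/*'
    # to cover paths that live underneath a matching directory.
    return fnmatch.fnmatch(path, glob) or fnmatch.fnmatch(path, glob + "/*")

print(glob_matches("src/bin/tool", "*/bin"))     # file below a matching directory
print(glob_matches("src/bin", "*/bin"))          # the directory name itself
print(glob_matches("src/binary/tool", "*/bin"))  # unrelated path
```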
Signed-off-by: Elijah Newren <newren@gmail.com>
Make use of `git --man-path` and `git --html-path` to simplify the
manual installation instructions a bit. Also, there appears to be a
site.getsitepackages() call in python to give similar information about
where git_filter_repo.py can be installed.
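For example, the candidate directories could be listed with a short Python snippet like the one below. The `candidate_install_dirs` helper is a hypothetical name, and the `getattr` fallbacks are there on the assumption that some environments lack these functions in their `site` module.

```python
import site

def candidate_install_dirs():
    """List directories where git_filter_repo.py could plausibly be placed."""
    dirs = []
    # Either function may be missing in some environments, so look them up
    # defensively instead of calling them directly.
    get_global = getattr(site, "getsitepackages", None)
    if get_global is not None:
        dirs.extend(get_global())
    get_user = getattr(site, "getusersitepackages", None)
    if get_user is not None:
        dirs.append(get_user())
    return dirs

for directory in candidate_install_dirs():
    print(directory)
```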
Signed-off-by: Elijah Newren <newren@gmail.com>
Flags like --local, --shared, --reference (and --dissociate), and
--origin would all mess up the fresh clone checker. Attempting to
defend against all of them would not only be costly, but make it harder
to draw the line about guesses as to whether a repository is a fresh
clone or not. --origin also has problems in that filter-repo has
special handling for the 'origin' remote that I don't want to apply to
other random remotes.
Flags like --depth, --single-branch, and --no-tags could prevent enough
data from being downloaded to do a full rewrite and result in a
partially rewritten or possibly even corrupt history (no idea how
shallow clones interact; probably badly). --filter would also make the
repo start without enough info though it'd at least be downloaded on
demand; it'd still be a really slow way to do it, though, so it's a bad
idea.
filter-repo doesn't really provide an easy mechanism to rewrite a repo
and its submodule simultaneously, so recursing submodules seems useless
and unhelpful. --shallow-submodules would be bad for at least the same
reasons --depth is for the parent module, assuming we handled
submodules. --remote-submodules just provides a way to make the repo
dirty to start, which is counter-productive. --jobs could be useful, if
recursing submodules was.
--no-checkout might be safe to use and --sparse might also be okay as
long as it only affects the working tree, but in both cases why not
go --bare or --mirror if you're doing that? Likewise, --no-hardlinks is
useless given that we're already saying people need to use --no-local.
-b would be okay to use, but why wouldn't you just change the default
branch on the server rather than just within this one clone used for
rewriting the history? Whether you push back to the original repository
or to a new repo, you'd have to take a separate step to change it in
that remote repo. And if you really will use this new local repository
as the official source, then you can switch branches at the end of the
rewrite just as easily.
--separate-git-dir and --template might be okay to use, I haven't
tested. If either doesn't work now, or breaks at any point in the
future, I feel much better being able to say, "I told you to only use
these three flags to git clone."
-u only affects the ability to receive the clone; it's fine to use.
Also, -q only affects the console output during the clone operation, so
you could use it.
There will probably be more flags added to git-clone over time. Testing
against all of them is insanity. Recommend people only use --no-local,
--bare, and --mirror, with the first only needed when cloning from a
local filesystem, and the other two never needed but allowed for those
that prefer.
Signed-off-by: Elijah Newren <newren@gmail.com>
This reverts commit df6c8652a2. The
motivating example was wrong; path renaming should not be involved in
path filtering, it only says how paths should be renamed if they happen
to be selected. A subsequent commit will improve the documentation.
Signed-off-by: Elijah Newren <newren@gmail.com>
The desire to format or lint code throughout history has arisen several
times. It's more natural to do this in filter-branch since it somewhat
forces people to run external commands. However, we have an example
contrib demo that shows how to run an external command on each file in
history; I created it even before any of these requests came in, and yet
I still periodically get requests about it.
Make lint-history ever-so-slightly easier to apply to a subset of
filenames, and include its usage as an extra cheat sheet comparison for
filter-branch-vs-filter-repo commands.
Signed-off-by: Elijah Newren <newren@gmail.com>