51 Commits (main)

Author SHA1 Message Date
Elijah Newren f57759de44 Merge branch 'jm/grammo-fix'
Signed-off-by: Elijah Newren <newren@gmail.com>
2 years ago
Jonathan Malmaud 6731278c45 git-filter-repo.txt: fix small grammar error
Signed-off-by: Jonathan Malmaud <malmaud@gmail.com>
2 years ago
Elijah Newren 16abc01792 Merge branch 'mg/documentation-typos'
Signed-off-by: Elijah Newren <newren@gmail.com>
2 years ago
Todd Zullinger 70d781b05d git-filter-repo.txt: add missing `git-` prefix to fast-import link
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
2 years ago
Mo-Gul 7b2840a948 doc: fix some typos
Signed-off-by: Stefan Pinnow <Mo-Gul@gmx.net>
2 years ago
Markus Heidelberg 3fe2b5c3c9 filter-repo: prepend the header line to the "ref-map" file
The existence of a header has already been specified in the documentation.
Further adapt it to the real text implemented now.

Signed-off-by: Markus Heidelberg <markus.heidelberg@web.de>
2 years ago
Elijah Newren 933475ecf1 Make it clearer that --path* do not follow renames
The wording "exact paths" appears to not be clear enough for folks, and I
keep getting bug reports about filter-repo not following renames.  Make
it very explicit.

Signed-off-by: Elijah Newren <newren@gmail.com>
3 years ago
Gwyneth Morgan 129a3bcb8b filter-repo: add new --replace-message option
Like --replace-text, add an option --replace-message which replaces text
in commit/tag message bodies, so that users can easily replace text
without constructing a --message-callback.

Signed-off-by: Gwyneth Morgan <gwymor@tilde.club>
Signed-off-by: Elijah Newren <newren@gmail.com>
3 years ago
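The effect described in commit 129a3bcb8b can be pictured with a small sketch. This is not filter-repo's code: the `apply_replacements` helper and the sample rules are hypothetical, and only literal (non-regex) replacements are modeled.

```python
def apply_replacements(message: bytes, rules: list) -> bytes:
    """Apply each (old, new) literal replacement to a commit or tag message."""
    for old, new in rules:
        # bytes.replace returns a new object, so rebind the name each time.
        message = message.replace(old, new)
    return message


# Hypothetical rules, e.g. renaming a project mentioned in old messages.
rules = [
    (b"OldProject", b"NewProject"),
    (b"foo@example.com", b"bar@example.com"),
]
original = b"OldProject: fix crash\n\nReported-by: foo@example.com\n"
rewritten = apply_replacements(original, rules)
```

Doing the same thing with a --message-callback would mean writing this loop by hand, which is the inconvenience the option is meant to remove.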
Cody Martin 8abc4770e7 git-filter-repo.txt: fix typo in paths-from-file example
The "Filtering based on many paths" section includes this code snippet,
```
regex:^.*/.*/[0-9]{4}-[0-9]{2}-[0-9]{2}.txt$
```
and this text
```
files whose name
was of the form YYYY.MM-DD.txt at least two subdirectories deep
```
Update the text to YYYY-MM-DD.txt to correctly match the regex
in the code snippet.

Signed-off-by: Cody Martin <codytylermartin@gmail.com>
3 years ago
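The mismatch that commit 8abc4770e7 corrects can be checked directly with Python's `re` module. The pattern is copied from the commit message; the sample paths are made up for illustration.

```python
import re

# Pattern copied from the commit message.  The unescaped "." before "txt"
# matches any single character, not only a literal dot.
pattern = re.compile(rb"^.*/.*/[0-9]{4}-[0-9]{2}-[0-9]{2}.txt$")


def matches(path: bytes) -> bool:
    """Return True if the path matches the documented regex."""
    return pattern.match(path) is not None


samples = [
    b"logs/2019/2019-07-04.txt",   # YYYY-MM-DD, two directories deep
    b"logs/2019-07-04.txt",        # only one directory deep
    b"logs/2019/2019.07-04.txt",   # YYYY.MM-DD, the form the old text named
]
results = {path: matches(path) for path in samples}
```

Only the first sample matches, which is why the prose had to say YYYY-MM-DD.txt rather than YYYY.MM-DD.txt.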
Elijah Newren 75e67bcd44 git-filter-repo.txt: link to GitHub docs on purging old history
Signed-off-by: Elijah Newren <newren@gmail.com>
3 years ago
Elijah Newren 12743def48 git-filter-repo.txt: add some clarifications around replace refs
Signed-off-by: Elijah Newren <newren@gmail.com>
3 years ago
Elijah Newren 3f181531df README.md: link to external formatting of user manual
Some people don't like htmlpreview.github.io.  I once or twice saw a
case where it appeared to be affected by load limits.  Since external
sites are making the manual available, and it's unlikely there are too
many changes between the last release and the current manual, just link
to it as an alternative for folks.

Signed-off-by: Elijah Newren <newren@gmail.com>
3 years ago
Elijah Newren 9282a33a02 git-filter-repo.txt: regexes & globs apply to entire file, not to lines
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Tom Matthews 96959d1174 converting-from-bfg-repo-cleaner.md: fix typo
Signed-off-by: Tom Matthews <trcm@pm.me>
4 years ago
Elijah Newren ed6f410088 Contributing.md: link to Nicolai Hähnle's code review comments
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren cefeef1c0a filter-repo: use new --date-format=raw-permissive fast-import option
fast-import gained a new raw-permissive date format explicitly for
allowing people to import repositories as-is.  Make use of the flag, and
stop rewriting the bogus timezone found in rails.git.

If users do not like these bogus times, they can of course write a
filter to fix them (or even make them bogus in a different way).  For
example:

    git filter-repo ... --commit-callback '
      if commit.author_date.endswith(b"+051800"):
        commit.author_date = commit.author_date.replace(b"+051800", b"+0261")
    '

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren b74eb6b69d Merge branch 'jr/document-commit-and-ref-map' into main
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
James Ramsay f867bb6ad7 git-filter-repo.txt: document mapping output
Useful commit and reference mappings are created on every run. These are
helpful in a number of situations, and should be documented so that
end-users and Git hosts can understand how to use the output.

The commit-map is particularly useful for Git hosts to override
retention mechanisms, like hidden refs. This allows end-users to purge
large files and sensitive data.

Signed-off-by: James Ramsay <james@jramsay.com.au>
4 years ago
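As an illustration of how the mapping output from commit f867bb6ad7 might be consumed, here is a minimal sketch of a commit-map reader. It assumes the file begins with a single header line and that each following line holds an old and a new commit hash separated by whitespace. The function name and the sample hashes are hypothetical.

```python
def parse_commit_map(text: str) -> dict:
    """Return {old_hash: new_hash} from commit-map style text."""
    mapping = {}
    lines = text.splitlines()
    for line in lines[1:]:  # assumption: the first line is a header
        fields = line.split()
        if len(fields) == 2:
            old, new = fields
            mapping[old] = new
    return mapping


# Made-up sample content; real files contain actual commit hashes.
sample = (
    "old                                      new\n"
    + "1" * 40 + " " + "a" * 40 + "\n"
    + "2" * 40 + " " + "b" * 40 + "\n"
)
commit_map = parse_commit_map(sample)
```

A Git host could use a lookup table like this to find which old commits were rewritten and should no longer be retained.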
Elijah Newren 8abf8faec8 git-filter-repo.txt: be more forceful on the wording of --force
Online blogs/articles/Q&A as well as direct feedback suggest that
people use the --force flag rather cavalierly.  Add words like
"irreversible" and "immediate pruning" to discourage such blithe
application of this flag.  I hope this encourages folks to either learn
the ramifications of irreversible full-repository entire history
rewrites first, or to follow the recommendation of only operating on a
fresh clone.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren f8c14d159c git-filter-repo.txt: point people at the generated documentation
People keep trying to read this file, unaware that it is the source code
for generating the documentation, not the generated documentation.  Add
a comment at the top that explains this and points people in the right
direction.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 38e70b69e8 filter-repo: ignore comment lines in --paths-from-file
Allow lines starting with '#' to be treated as a comment and be ignored.
Update the documentation to note that both blank lines and comment lines
are ignored, and mention how filenames starting with '#' can be matched
(namely, the same way that filenames starting with 'regex:', 'glob:',
or 'literal:' can be -- by prefixing the filename with 'literal:').

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
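The parsing rules described in commit 38e70b69e8 can be sketched as follows. This is a simplified model rather than filter-repo's parser: `parse_paths_file` is a hypothetical name, unprefixed lines are treated as literal paths, and rename entries are not handled.

```python
def parse_paths_file(lines):
    """Yield (kind, pattern) pairs, skipping blank lines and '#' comments."""
    prefixes = ("literal:", "glob:", "regex:")
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        for prefix in prefixes:
            if line.startswith(prefix):
                # Strip only the first prefix; the rest is the pattern itself.
                yield prefix[:-1], line[len(prefix):]
                break
        else:
            # No prefix: treated as a literal path in this sketch.
            yield "literal", line


sample = [
    "# generated files\n",
    "\n",
    "build/\n",
    "glob:*.o\n",
    "literal:#notes.txt\n",       # a filename that starts with '#'
    "literal:regex:odd-name\n",   # a filename that starts with 'regex:'
]
entries = list(parse_paths_file(sample))
```

The last two sample lines show the escape hatch from the commit message: prefixing with 'literal:' lets a filename begin with '#' or with one of the reserved prefixes.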
Elijah Newren a238e3b7e6 git-filter-repo.txt: discourage use of random clone flags
Flags like --local, --shared, --reference (and --dissociate), and
--origin would all mess up the fresh clone checker.  Attempting to
defend against all of them would not only be costly, but make it harder
to draw the line about guesses as to whether a repository is a fresh
clone or not.  --origin also has problems in that filter-repo has
special handling for the 'origin' remote that I don't want to apply to
other random remotes.

Flags like --depth, --single-branch, and --no-tags could prevent enough
data from being downloaded to do a full rewrite and result in a
partially rewritten or possibly even corrupt history (no idea how
shallow clones interact; probably badly).  --filter would also make the
repo start without enough info though it'd at least be downloaded on
demand; it'd still be a really slow way to do it, though, so it's a bad
idea.

filter-repo doesn't really provide an easy mechanism to rewrite a repo
and its submodule simultaneously, so recursing submodules seems useless
and unhelpful.  --shallow-submodules would be bad for at least the same
reasons --depth is for the parent module, assuming we handled
submodules.  --remote-submodules just provides a way to make the repo
dirty to start, which is counter-productive.  --jobs could be useful, if
recursing submodules was.

--no-checkout might be safe to use and --sparse might also be okay as
long as it only affects the working tree, but in both cases why not
go --bare or --mirror if you're doing that?  Likewise, --no-hardlinks is
useless given that we're already saying people need to use --no-local.

-b would be okay to use, but why wouldn't you just change the default
branch on the server rather than just within this one clone used for
rewriting the history?  Whether you push back to the original repository
or to a new repo, you'd have to take a separate step to change it in
that remote repo.  And if you really will use this new local repository
as the official source, then you can switch branches at the end of the
rewrite just as easily.

--separate-git-dir and --template might be okay to use, I haven't
tested.  If either doesn't work now, or breaks at any point in the
future, I feel much better being able to say, "I told you to only use
these three flags to git clone."

-u only affects the ability to receive the clone; it's fine to use.
Also, -q only affects the console output during the clone operation, so
you could use it.

There will probably be more flags added to git-clone over time.  Testing
against all of them is insanity.  Recommend people only use --no-local,
--bare, and --mirror, with the first only needed when cloning from a
local filesystem, and the other two never needed but allowed for those
that prefer.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 49d6f02ff8 filter-repo: clarify interactions between path filtering and path renaming
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren a4c12253a8 git-filter-repo.txt: briefly explain steps for pushing to original url
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 86569ee7ac Contributing.md: add a small clarification about line coverage
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 23bec32283 contrib, docs: make discovery of code formatting and linting easier
The desire to format or lint code throughout history has arisen several
times.  It's more natural to do this in filter-branch since it somewhat
forces people to run external commands, but we have an example contrib
demo that shows how to run an external command on each file in history
that I created even before any of these requests came in and yet I still
periodically get requests about it.

Make lint-history ever-so-slightly easier to apply to a subset of
filenames, and include its usage as an extra cheat sheet comparison for
filter-branch-vs-filter-repo commands.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 859e66ae1c converting-from-filter-branch.md: add a small clarification
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren d32f6258a8 converting-from-bfg-repo-cleaner.md: add a small clarification
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren d87b665ed4 git-filter-repo.txt: connect --no-local and fresh clones more thoroughly
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 2bfb9cf261 git-filter-repo.txt: fix extraneous space
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 1e2d0e91cb Documentation: add more detailed explanation of safety checks and --force
I occasionally get people doing special things, or see people
recommending to others to just use --force.  Add some explanations
behind the safety checks so that those doing special things know when
it's okay, and to explain why it's a really bad idea to casually or
haphazardly recommend others use --force.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 7e1184cd42 git-filter-repo.txt: add more --paths-from-file examples with large filtering lists
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 5c4637ff81 Documentation: add guides for people converting from filter-branch or BFG
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren db9ac1fffe git-filter-repo.txt: add documentation of --no-ff option
Commit 5e04dff097 (filter-repo: add new --no-ff option, 2020-01-01)
added support for a --no-ff option, but only added documentation in the
built-in output, not in the intended-to-be-more-complete manual.  Add
documentation to the manual for this option.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 764e0e00dd git-filter-repo.txt: add examples for --[to-]subdirectory-filter
I had lots of examples of these being horribly mis-used and being used in place
of each other; add some examples with some description of the repository layout
to try to avoid all that confusion.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren e834379254 filter-repo: clarify usage of --use-base-name
fast-export/fast-import only work with filenames (using full path from
the root of the repository); thus that's all that filter-repo works
with.  Full pathnames implicitly include all leading directories as part
of the pathname, which is what allows us to match against directories.
However, it obviously means --use-base-name can't be used to match paths
against directories.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
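The limitation explained in commit e834379254 follows from how basenames work. The sketch below is a simplified model of path selection, not filter-repo's matching code; the `selected` function and the sample path are hypothetical.

```python
import posixpath


def selected(path: str, wanted: str, use_base_name: bool) -> bool:
    """Decide whether `path` is selected by `wanted` (simplified model)."""
    if use_base_name:
        # Only the final component is compared, so directories never match.
        return posixpath.basename(path) == wanted
    # Full-path mode: an exact file match, or `wanted` as a leading directory.
    return path == wanted or path.startswith(wanted.rstrip("/") + "/")


path = "src/lib/util.py"
by_directory = selected(path, "src", use_base_name=False)
by_directory_basename = selected(path, "src", use_base_name=True)
by_file_basename = selected(path, "util.py", use_base_name=True)
```

Because the basename of a full pathname drops every leading directory, a directory name such as `src` has nothing left to match against once basenames are in use.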
Elijah Newren 011c646ee8 filter-repo: suggest --no-local when cloning local repos
Cloning local repos by default makes a bunch of hardlinks, giving you a
non-packed repository, and leading folks to use and suggest --force.
That, of course, bypasses the important fresh clone checks to prevent
people from accidentally and irrecoverably deleting their non-backed-up
data.  Let's make it easier for people to avoid (and suggest) that
mistake.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Kate F 420aa32dac git-filter-repo.txt: Fix typo for example
Signed-off-by: Kate F <kate@elide.org>
4 years ago
Elijah Newren 96e217355c Contributing.md: start with git guidelines, then mention exceptions
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 18f98295e4 git-filter-repo.txt: fix nested bullets to render correctly
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren 3a3cd3d15e git-filter-repo.txt: fix example of editing blob contents
You can call bytes.replace() or re.sub(), but you can't call
bytes.sub().  Oops.  Fix the example in the documentation.

Reported-by: John Gietzen <john@gietzen.us>
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
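The distinction behind commit 3a3cd3d15e is easy to verify in plain Python. The sample data below is made up; only `bytes.replace` and `re.sub` from the standard library are used.

```python
import re

data = b"password=hunter2\nuser=alice\n"

# Literal substitution: bytes.replace returns a new bytes object.
literal_result = data.replace(b"hunter2", b"***")

# Pattern substitution: re.sub accepts bytes patterns and replacements.
regex_result = re.sub(rb"password=\S+", b"password=***", data)

# bytes objects have no sub() method, which is the mistake the commit fixed.
has_sub = hasattr(data, "sub")
```

Both calls leave `data` untouched and return a new value, so a callback has to assign the result back to the attribute it wants to change.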
Elijah Newren 3bdfa91768 Contributing.md: clarify reasons for using git.git submission guidelines
Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
Elijah Newren f9ebe6a3f7 filter-repo: avoid clobbering files whose names differ in case only
git fast-import, in an attempt to be friendly, allows the same file to
be specified multiple times within a commit and just takes the last
version of the file listed.  It determines whether files are the same
via fspathncmp, which is defined differently depending on the setting of
core.ignorecase.  Unfortunately, this means that if someone is trying to
do filtering of history and using a broken (case-insensitive) filesystem
and the history they are filtering has some paths that differed in case
only, then fast-import will delete whichever of the "colliding" files is
listed first.

Avoid these problems by just turning off core.ignorecase while
fast-import is running.  This will prevent silently modifying the repo
in an unexpected way.  Users on such filesystems may have difficulty
checking out commits with files which differ in case only, but that is
a separate problem for them to deal with after rewriting history.

Signed-off-by: Elijah Newren <newren@gmail.com>
4 years ago
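To see which paths would be at risk in the situation described by commit f9ebe6a3f7, one could group a commit's paths by their case-folded form. This is an illustrative sketch, not part of filter-repo; `case_collisions` and the sample paths are hypothetical.

```python
from collections import defaultdict


def case_collisions(paths):
    """Return groups of paths that differ only in letter case."""
    groups = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]


paths = [
    "README.md",
    "Readme.md",
    "src/main.c",
    "docs/Guide.txt",
    "docs/guide.txt",
]
collisions = case_collisions(paths)
```

Each returned group is a set of files that a case-insensitive comparison would treat as the same file, which is the condition under which one of them was silently dropped.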
Elijah Newren b56ca0437a Contributing.md: clarify notes about PEP-8
Signed-off-by: Elijah Newren <newren@gmail.com>
5 years ago
Elijah Newren d07a2fe2ea Contributing.md: mention testsuite line coverage
Signed-off-by: Elijah Newren <newren@gmail.com>
5 years ago
Elijah Newren b3eb2cf461 filter-repo (README): add code of conduct and contributing guidelines
Signed-off-by: Elijah Newren <newren@gmail.com>
5 years ago
Elijah Newren 84fddfe262 git-filter-repo.txt: fix typesetting of --partial
Signed-off-by: Elijah Newren <newren@gmail.com>
5 years ago
Elijah Newren e0140bb2ad git-filter-repo.txt: minor updates to docs
A few changes:
  * Include notes about git-2.24.0 changes
  * Make it clearer that messing with the first parent could have
    negative side-effects if the file_changes aren't also updated.
  * Fix wrapping of a line that was too long.

Also, update the README.md:
  * Note the upstream improvements made in (not yet released) git-2.24.0

Signed-off-by: Elijah Newren <newren@gmail.com>
5 years ago
Elijah Newren 320c85f941 filter-repo: improve support for partial history rewrites
Partial history rewrites were possible before with the (previously
hidden) --refs flag, but the defaults were wrong.  That could be worked
around with the --source or --target flags, but that disabled --no-data
for fast-export and thus slowed things down, and also would require
overriding --replace-refs.  And the defaults for --source and --target
may diverge further from what is wanted/needed for partial history
rewrites in the future.

So, add --partial as a first-class supported option with scary
documentation about how it permits mixing new and old history.  Make
--refs imply that flag.  Make the behavioral similarities (with regard to
which steps are skipped) between --source, --target, and --partial more
clear.  Add relevant documentation to round it out.

Signed-off-by: Elijah Newren <newren@gmail.com>
5 years ago
Elijah Newren 71bb8d26a9 filter-repo: add a --state-branch option for incremental exporting
Allow folks to periodically update the export of a live repo without
re-exporting from the beginning.  This is a performance improvement, but
can also be important for collaboration.  For example, for sensitivity
reasons, folks might want to export a subset of a repo and update the
export periodically.  While this could be done by just re-exporting the
repository anew each time, there is a risk that the paths used to
specify the wanted subset might need to change in the future; making the
user verify that their paths (including globs or regexes) don't also
pick up anything from history that was previously excluded so that they
don't get a divergent history is not very user friendly.  Allowing them
to just export stuff that is new since the last export works much better
for them.

Signed-off-by: Elijah Newren <newren@gmail.com>
5 years ago