This is by far the largest python3 change; it consists basically of
* using b'<str>' instead of '<str>' in lots of places
* adding a .encode() if we really do work with a string but need to
get it converted to a bytestring
* replace uses of .format() with interpolation via the '%' operator,
since bytestrings don't have a .format() method.
Signed-off-by: Elijah Newren <newren@gmail.com>
While the underlying fast-export and fast-import streams explicitly
separate 'from' commit (first parent) and 'merge' commits (all other
parents), foisting that separation into the Commit object for
filter-repo forces additional places in the code to deal with that
distinction. It results in less clear code, and especially does not
make sense to push upon folks who may want to use filter-repo as a
library.
Signed-off-by: Elijah Newren <newren@gmail.com>
Use UTF-8 chars in user names, filenames, branch names, tag names, and
file contents. Also include invalid UTF-8 in file contents; should be
able to handle binary data.
Signed-off-by: Elijah Newren <newren@gmail.com>
The sorting order of entries written to files in the analysis directory
didn't specify a secondary sort, thus making the order dependent on the
random-ish sorting order of dictionaries and making it inconsistent
between python versions. While the secondary order didn't matter much,
having a defined order makes it slightly easier to define a single
testcase that can work across versions.
Signed-off-by: Elijah Newren <newren@gmail.com>
Assuming filter-repo will be merged into git.git, use "git" for the
TEXTDOMAIN, and assume its build system will replace "@@LOCALEDIR@@"
appropriately.
Note that the xgettext command used to grab string translations is
nearly identical to the one for C files in git.git; just use
--language=python instead and add --join-existing to avoid overwriting
the po/git.pot file. In other words, use the command:
xgettext -o../git/po/git.pot --join-existing --force-po \
--add-comments=TRANSLATORS: \
--msgid-bugs-address="Git Mailing List <git@vger.kernel.org>" \
--from-code=UTF-8 --language=python \
--keyword=_ --keyword=N_ --keyword="Q_:1,2" \
git-filter-repo
To create or update the translation, go to git.git/po and run either of:
msginit --locale=XX
msgmerge --add-location --backup=off -U XX.po git.pot
Once you've updated the translation, within git.git just build as
normal. That's all that's needed.
Signed-off-by: Elijah Newren <newren@gmail.com>
The AncestryGraph setup assumed we had previously seen all commits which
would be used as parents; that interacted badly with doing an
incremental import. Add a function which can be used to record external
commits, each of which we'll treat like a root commit (i.e. depth 1 and
having no parents of its own). Add a test to prevent regressions.
Signed-off-by: Elijah Newren <newren@gmail.com>
There are a number of things not present in "normal" imports that we
nevertheless support and need to be tested:
* broken timezone adjustment (+051800->+0261; observed in the wild
in real repos, and adjustment prevents fast-import from dying)
* commits missing an author (observed in the wild in a real repo;
just sets author to committer)
* optional additional linefeeds in the input allowed by
git-fast-import but usually not written by git-fast-export
* progress and checkpoint objects
* progress, checkpoint, and 'everything' callbacks
Signed-off-by: Elijah Newren <newren@gmail.com>
This also generates line coverage statistics for t/t9391/*.py, but the
point is line coverage of git-filter-repo.
Signed-off-by: Elijah Newren <newren@gmail.com>
The test does check the exact output for the report, meaning if the
output is changed at all this test will need to be updated, but it at
least makes sure we are getting all the right kinds of information. I
do not expect the output format will change very often.
Signed-off-by: Elijah Newren <newren@gmail.com>
Pruning of commits which become empty can result in a variety of
topology changes: a merge may have lost all its ancestors corresponding
to one of (or more) of its parents, a merge may end up merging a commit
with itself, or a merge may end up merging a commit with its own
ancestor. Merging a commit with itself makes no sense, so we'd rather
prune down to one parent and hopefully prune the merge commit, but we do
need to worry about whether the are changes in the commit and whether
the original merge commit also merged something with itself. We have
similar cases for dealing with a merge of some commit with its own
ancestor: if the original topology did the same, or the merge commit has
additional file changes, then we cannot remove the commit. But,
otherwise, the commit can be pruned.
Add testcases covering the variety of changes that can occur to make
sure we get them right.
Signed-off-by: Elijah Newren <newren@gmail.com>
There are several cases to worry about with commit pruning; commits
that start empty and had no parent, commits that start empty and
had a parent which may or may not get pruned, commits which had
changes but became empty, commits which were merges but lost a line
of ancestry and have no changes of their own, etc. Add testcases
covering these cases, though most topology related ones will be
deferred to a later set of tests.
Signed-off-by: Elijah Newren <newren@gmail.com>
Many of the callback functions might only be a single line, and as such
instead of forcing the user to write a full blown program with an import
and everything, let them just specify the body of the callback function
as a command line parameter. Add several tests of this functionality as
well.
Signed-off-by: Elijah Newren <newren@gmail.com>
Python wants filenames with underscores instead of hyphens and with a
.py extension. We really want the main file named git-filter-repo, but
we can add a git_filter_repo.py symlink. Doing so dramatically
simplifies the steps needed to import it as a library in external python
scripts.
Signed-off-by: Elijah Newren <newren@gmail.com>
Most filtering operations are not interested in the time that commits
were authored or committed, or when tags were tagged. As such,
translating the string representation of the date into a datetime object
is wasted effort, and causes us to waste more time later as we have to
translate it back into a string.
Instead, provide string_to_date() and date_to_string() functions so that
callers can perform the translation if wanted, and let the normal case
be fast.
Provides a small but noticable speedup when just filtering based on
paths; about a 3.5% improvement in execution time for writing the new
history.
Signed-off-by: Elijah Newren <newren@gmail.com>
Make it easy for users to search and replace text throughout the
repository history. Instead of inventing some new syntax, reuse the
same syntax used by BFG repo filter's --replace-text option, namely,
a file with one expression per line of the form
[regex:|glob:|literal:]$MATCH_EXPR[==>$REPLACEMENT_EXPR]
Where "$MATCH_EXPR" is by default considered to be literal text, but
could be a regex or a glob if the appropriate prefix is used. Also,
$REPLACEMENT_EXPR defaults to '***REMOVED***' if not specified. If
you want a literal '==>' to be part of your $MATCH_EXPR, then you
must also manually specify a replacement expression instead of taking
the default. Some examples:
sup3rs3kr3t
(replaces 'sup3rs3kr3t' with '***REMOVED***')
HeWhoShallNotBeNamed==>Voldemort
(replaces 'HeWhoShallNotBeNamed' with 'Voldemort')
very==>
(replaces 'very' with the empty string)
regex:(\d{2})/(\d{2})/(\d{4})==>\2/\1/\3
(replaces '05/17/2012' with '17/05/2012', and vice-versa)
The format for regex is as from
re.sub(<pattern>, <repl>, <string>) from
https://docs.python.org/2/library/re.html
The <string> comes from file contents of the repo, and you specify
the <pattern> and <repl>.
glob:Copy*t==>Cartel
(replaces 'Copyright' or 'Copyleft' or 'Copy my st' with 'Cartel')
Signed-off-by: Elijah Newren <newren@gmail.com>