git-filter-repo

Commit Graph

Author	SHA1	Message	Date
Johannes Schindelin	6967fad156	t9390: avoid using `colrm` While it is true that `colrm` is available on macOS by default, and even in Ubuntu (thanks to the `bsdmainutils` package), it is not available on Windows. Let's use `cut` instead. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	4 years ago
Johannes Schindelin	e6ffeded2e	t9390: avoid using Bash-ism `<(...)` The problem with this is that on Windows, we use the MSYS2 Bash which uses the POSIX emulation layer called "MSYS2 runtime" that pretends that there _is_ something like the `/dev/fd/` namespace, and tells `git.exe` about it, but `git.exe` does not use the POSIX emulation layer, and hence has no idea what Bash is talking about. Besides, we should avoid pipes, just as we do in the Git project. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	4 years ago
Johannes Schindelin	8bc195673c	t9390: close link of broken &&-chain Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	4 years ago
Johannes Schindelin	f1ee28d78f	t9390: expect the correct line count in `--strip-blobs-with-ids` In that test case, we expect the line count to be 5, but it is actually 6 lines that we should expect: numbers/medium.num numbers/small.num sequence/know whatever words/know Note the empty line at the top: this list is generated via `git log --format=%n`, and that `%n` stands for "newline", meaning that we _must_ expect an empty line. This expectation seems to have been broken already in the commit that added the test case: `b6a35f8` (filter-repo: implement --strip-blobs-with-ids, 2019-05-30). It was hidden for such a long time by a broken &&-chain, which we will fix next. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	4 years ago
Johannes Schindelin	6c475a7e09	t9390: use the correct prereq when using "funny" file names Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	4 years ago
Elijah Newren	93ee4ae907	Merge branch 'mw/empty-author-name' into main Signed-off-by: Elijah Newren <newren@gmail.com>	4 years ago
Martin Wilck	282f8ddb9b	filter-repo: only set author from committer if author email not set Some commits may have a valid author email, but no valid author name. Old versions of git didn't enforce a non-empty name. Setting the author data from the committer is wrong in this case. Also add a test case for this to t9390. Example: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c6295cdf656de63d6d1123def71daba6cd91939c (en: replaced with a dedicated test instead of tweaking existing ones) Signed-off-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Elijah Newren <newren@gmail.com>	4 years ago
Elijah Newren	7eaaf191de	filter-repo: correctly prune nested tags not matching filtering criteria When the user specifies some kind of criteria to filter commits by (e.g. --subdirectory-filter mysubdir), we rewrite parent commits that are entirely filtered out to the most recent ancestor that still exists, or just prune the parent if there isn't one. That works great when the parent is a commit, but nested tags have parents that are tags. If we only prune the first tag (i.e. the tag of a commit), then letting any tags through that had that tag as a parent will result in a fast-import crash with a message of the form fatal: mark :35390 not declared Ensure that when a tag gets pruned, the pruning is recorded as such...so that any children tags will get pruned as well. Signed-off-by: Elijah Newren <newren@gmail.com>	4 years ago
Elijah Newren	d79ea709b7	filter-repo: fix crash from assuming parent is an int When filtering with --refs, parents can be a hash rather than an integer. There was a code path in RepoFilter._prunable() that was written assuming the first parent would always be an integer; fix it to handle a hash as well. Reported-by: Niklas Hambüchen <mail@nh2.me> Signed-off-by: Elijah Newren <newren@gmail.com>	4 years ago
Elijah Newren	38e70b69e8	filter-repo: ignore comment lines in --paths-from-file Allow lines starting with '#' to be treated as a comment and be ignored. Update the documentation to note that both blank lines and comment lines are ignored, and mention how filenames starting with '#' can be matched (namely, the same way that filenames starting with 'regex:', 'glob:', or 'literal:' can be -- by prefixing the filename with 'literal:'). Signed-off-by: Elijah Newren <newren@gmail.com>	4 years ago
Elijah Newren	25b226b1de	t9390: make tests individually re-runnable Signed-off-by: Elijah Newren <newren@gmail.com>	4 years ago
Elijah Newren	49d6f02ff8	filter-repo: clarify interactions between path filtering and path renaming Signed-off-by: Elijah Newren <newren@gmail.com>	4 years ago
Elijah Newren	3e1bff264c	Revert "filter-repo: fix ugly bug with mixing path filtering and renaming" This reverts commit `df6c8652a2`. The motivating example was wrong; path renaming should not be involved in path filtering, it only says how paths should be renamed if they happen to be selected. A subsequent commit will improve the documentation. Signed-off-by: Elijah Newren <newren@gmail.com>	4 years ago
Elijah Newren	df6c8652a2	filter-repo: fix ugly bug with mixing path filtering and renaming There's also a fix in here to make sure to throw an error if users are trying to rename paths and use --invert-paths; it's not clear at all what that would even mean. But that also becomes important later... Due to the ability to either filter wanted paths (default), or to just specify unwanted paths (with --invert-paths), I keep a special args.inclusive variable to track whether a "match" means we want the path or not. There are some special cases, notably when there are no filters present (meaning e.g. no --path specifications, at most there are some --path-rename values provided). When there are no filters present, that means we should keep paths even if we don't "find a match" against any of the filters. Now, since the rename code was embedded in the same loop as the filter checks, it unfortunately was also being checked against the args.inclusive setting despite never setting whether it found a match. That happened to work in the special case that there were no filtering paths but only because of the special logic for that case. Since renaming only makes sense if --invert-paths is not specified, any path we rename is one we always want to keep. Make sure we do. Reported-by: Nadège (@nagreme on GitHub) Signed-off-by: Elijah Newren <newren@gmail.com>	4 years ago
Elijah Newren	2833ef275f	filter-repo: throw an error if user specifies any path starting with a slash All paths are intended to be relative paths, relative to the project root, not to the filesystem root. There have been a few people who didn't understand this, and then ended up with fast-import crashes that are not very clear. Check for it early and throw a simple error message instead. Signed-off-by: Elijah Newren <newren@gmail.com>	4 years ago
Elijah Newren	011c646ee8	filter-repo: suggest --no-local when cloning local repos Cloning local repos by default makes a bunch of hardlinks, giving you a non-packed repository, and leading folks to use and suggest --force. That, of course, bypasses the important fresh clone checks to prevent people from accidentally and irrecoverably deleting their non-backed-up data. Let's make it easier for people to avoid (and suggest) that mistake. Signed-off-by: Elijah Newren <newren@gmail.com>	4 years ago
Elijah Newren	9928b7cb3e	t9390: add missing '&&' in command chain Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	e11343e504	filter-repo: handle typechange modifications when first parent is pruned Commit `509a624b` (filter-repo: fix issue with pruning of empty commits, 2019-10-03) added code to get a new list of file changes when the first parent was pruned. However, this logic did not handle cases where one of the file modifications was a typechange. Add the necessary logic to handle that case. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	85c8e3660d	filter-repo: accelerate is_ancestor() for --analyze mode The --analyze mode was extremely slow for the freebsd/freebsd repo on github; digging in, the is_ancestor() function was being called a huge number of times -- about 22 times per commit on average (and about 17 million times overall). The analyze mode uses is_ancestor() to determine whether a rename equivalency class should be broken (i.e. renaming A->B means all versions of A and B are just different versions of the same file, but if someone adds a new A in some commit which contains the A->B rename in its history then this equivalence class no longer holds). Each is_ancestor() call potentially has to walk a tree of dependencies all the way back to a sufficient depth where it can realize that the commit cannot be an ancestor; this can be a very long walk. We can speed this up by keeping track of some previous is_ancestor() results. If commit F is not an ancestor of commit G, then F cannot be an ancestor of children of G (unless that child has multiple parents; but even in that case F can only be an ancestor through one of the parents other than G). Similarly, if F is an ancestor of commit G, then F will always be an ancestor of any children of G. Cache results from previous calls to is_ancestor() and use them to accelerate subsequent calls. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	1dae85ee9a	filter-repo: permit trailing slash for --[to-]subdirectory-filter argument There was code to allow the argument of --to-subdirectory-filter and --subdirectory-filter to have a trailing slash, but it was broken due to a bug in python3's bytestring design: b'somestring/'[-1] != b'/', despite that being the obvious expectation. One either has to compare b'somestring/'[-1:] to b'/' or else compare b'somestring/'[-1] to b'/'[0]. So lame. Note that this is essentially a follow-up to commit `385b0586ca` ("filter-repo (python3): bytestr splicing and iterating is different", 2019-04-27). Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	9d51a90648	filter-repo: fix pruning of empty commits with blob callbacks Blob callbacks, either implicit (via e.g. --replace-text) or explicit, can modify blobs in ways that make them match other blobs, which in turn can result in some commits becoming empty. We need to detect such cases and ensure we prune these empty commits when --prune-empty=auto. Reported-by: John Gietzen <john@gietzen.us> Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	8994b4e55d	filter-repo: fix bad column label in path-all-sizes.txt report Reported-by: John Gietzen <john@gietzen.us> Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	5e04dff097	filter-repo: add new --no-ff option Some projects have a strict --no-ff merging policy. With the default behavior of --prune-degenerate, we can prune merge commits in a way that transforms the history into a fast-forward merge. Consider this example: * There are two independent commits or branches, named B & C, which are both built on top of A so that history look like this diagram: A \ \ \ B \ -C * Someone runs the following sequence of commands: * git checkout A * git merge --no-ff B * git merge --no-ff C * This will result in a history that looks like: A---AB---AC \ \ / / \ B / \ / -C- * Later, someone comes along and runs filter-repo, specifying to remove the only path(s) that were modified by B. That would naturally remove commit B and the no-longer-necessary merge commit AB. For someone using a strict no-ff policy, the desired history is A---AC \ / C However, the default handling for --prune-degenerate would notice that AC merely merges C into its own ancestor A, whereas the original AC merged C into something separate (namely, AB). So, it would say that AC has become degenerate and prune it, leaving the simple history of A \ C For projects not using a strict no-ff policy, this simpler history is probably better, but for folks that want a strict no-ff policy, it is unfortunate. Provide a --no-ff option to tweak the --prune-degenerate behavior so that it ignores the first parent being an ancestor of another parent (leaving the first parent unpruned even if it is or becomes degenerate in this fashion). Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Karl Lenz	caf85b68ec	filter-repo: allow --dry-run and --debug to be used together Prior to this commit, git-filter-repo could only be used with either the --dry-run flag or the --debug flag, not both. When run in debug mode, git-filter-repo expected to be able to read from the output stream, which obviously isn't created when doing a dry run, so it stack traced when it tried to use the non-existent output stream. This commit fixes that bug with an equally simple sanity check for the existence of the output stream when run in debug mode. Signed-off-by: Karl Lenz <xorangekiller@gmail.com>	5 years ago
Karl Lenz	780c74b218	filter-repo: parse mailmap entries with no email address The mailmap format parsed by the "git shortlog" command allows for matching mailmap entries with no email address. This is admittedly an edge case, because most Git commits will have an email address associated with them as well as a name, but technically the address isn't required, and "git shortlog" accommodates that in its mailmap format. This commit teaches git-filter-repo to do the same thing. Signed-off-by: Karl Lenz <xorangekiller@gmail.com>	5 years ago
Elijah Newren	7cfef09e9b	filter-repo: warn users who try to use invalid path components It's hard to be exhaustive, but if users try something like: --path-rename foo/bar/baz:. or --path ../other-dir then bad things happen. In the first case, filter-repo will try to ask fast-import to create a directory named '.' and move everything from foo/bar/baz/ into it but of course '.' is a reserved directory name so we can't create it. In the second case, they are probably running from a subdirectory, but filter-repo doesn't work from a subdirectory. I hard-coded the assumption that everything was in the toplevel directory and all paths were relative from there pretty early on. So, if the user tries to use any of these components anywhere, just throw an early error. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	a9a93d9d83	filter-repo: actually fix issue with pruning of empty commits In commit `509a624` (filter-repo: fix issue with pruning of empty commits, 2019-10-03), it was noted that when the first parent is pruned away, then we need to generate a corrected list of file changes relative to the new first parent. Unfortunately, we did not apply our set of file filters to that new list of file changes, causing us to possibly introduce many unwanted files from the second parent into the history. The testcase added at the time was rather lax and totally missed this problem (which possibly exacerbated the original bug being fixed rather than helping). Tighten the testcase, and fix the error by filtering the generated list of file changes. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	e6dd613e3f	filter-repo: add a --version option Note that this isn't a version number or even the more generalized version string that folks are used to seeing, but a version hash (or leading portion thereof). A few important points: * These version hashes are not strictly monotonically increasing values. Like I said, these aren't version numbers. If that bothers you, read on... * This scheme has incredibly nice semantics satisfying a pair of properties that most version schemes would assume are mutually incompatible: This scheme works even if the user doesn't have a clone of filter-repo and doesn't require any build step to inject the version into the program; it works even if people just download git-filter-repo.py off GitHub without any of the other sources. And: This scheme means that a user is running precisely version X of the code, with the version not easily faked or misrepresented when third parties edit the code. Given the wonderful semantics provided by satisfying this pair of properties that all other versioning schemes seem to miss out on, I think I should name this scheme. How about "Semantic Versioning"? (Hehe...) * The version hash is super easy to use; I just go to my own clone of filter-repo and run either: git show $VERSION_HASH or git describe $VERSION_HASH * A human consumable version might suggest to folks that this software is something they might frequently use and upgrade. This program should only be used in exceptional cases (because rewriting history is not for the faint of heart). * A human consumable version (i.e. a version number or even the more relaxed version strings in more common use) might suggest to folks that they can rely on strict backward compatibility. It's nice to subtly undercut any such assumption. * Despite all that, I will make releases (downloadable tarballs with real version numbers in the tarball name; I'm just going to re-use whatever version git is released with at the time). But those version numbers won't be used by the --version option; instead the version hash will. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	62c311c69f	filter-repo: fix an unmarked bytestring to be marked as such Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	509a624b6a	filter-repo: fix issue with pruning of empty commits In order to build the correct tree for a commit, git-fast-import always takes a list of file changes for a merge commit relative to the first parent. When the entire first-parent history of a merge commit is pruned away and the merge had paths with no difference relative to the first parent but which differed relative to later parents, then we really need to generate a new list of file changes in order to have one of those other parents become the new first parent. An example might help clarify... Let's say that there is a merge commit, and: * it resolved differences in pathA between its two parents by taking the version of pathA from the first parent. * pathB was added in the history of the second parent (it is not present in the first parent) and is NOT included in the merge commit (either being deleted, or via rename treated as deleted and added as something else) For this merge commit, neither pathA nor pathB differ from the first parent, and thus wouldn't appear in the list of file changes shown by fast-export. However, when our filtering rules determine that the first parent (and all its parents) should be pruned away, then the second parent has to become the new first parent of the merge commit. But to end up with the right files in the merge commit despite using a different parent, we need a list of file changes that specifies the changes for both pathA and pathB. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	71bb8d26a9	filter-repo: add a --state-branch option for incremental exporting Allow folks to periodically update the export of a live repo without re-exporting from the beginning. This is a performance improvement, but can also be important for collaboration. For example, for sensitivity reasons, folks might want to export a subset of a repo and update the export periodically. While this could be done by just re-exporting the repository anew each time, there is a risk that the paths used to specify the wanted subset might need to change in the future; making the user verify that their paths (including globs or regexes) don't also pick up anything from history that was previously excluded so that they don't get a divergent history is not very user friendly. Allowing them to just export stuff that is new since the last export works much better for them. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	7a12d7a38b	filter-repo: add ability to parse and dump encoding Commit `346f2ba891` (filter-repo: make reencoding of commit messages togglable, 2019-05-11) made reencoding of commit messages togglable but forgot to add parsing and outputting of the encoding header itself. Add such ability now. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	88c1269d5a	filter-repo: ensure branches are updated as we go When we prune a commit for being empty, there is no update to the branch associated with the commit in the fast-import stream. If the parent commit had been associated with a different branch, then the branch associated with the pruned commit would not be updated without additional measures. In the past, we resolved this by recording that the branch needed an update in _seen_refs. While this works, it is a bit more complicated than just issuing an immediate Reset. Also, note that we need to avoid calling callbacks on that Reset because those could rename branches (again, if the commit-callback already renamed once) causing us to not update the intended branch. There was actually one testcase where the old method didn't work: when a branch was pruned away to nothing. A testcase accidentally encoded the wrong behavior, hiding this problem. Fix the testcase to check for correct behavior. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	b6a35f8dcd	filter-repo: implement --strip-blobs-with-ids Add a flag allowing for specifying a file filled with blob-ids which will be stripped from the repository. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	89f9fbbb6d	filter-repo: partial repo filtering considerations Fix a few issues and add a token testcase for partial repo filtering. Add a note about how I think this is not a particularly interesting or core usecase for filter-repo, even if I have put some good effort into the fast-export side to ensure it worked. If there is a core usecase that can be addressed without causing usability problems (particularly the "don't mix old and new history" edict for normal rewrites), then I'll be happy to add more testcases, document it better, etc. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	1a887c5c13	filter-repo: more careful handling of --source and --target Make several fixes around --source and --target: * Explain steps we skip when source or target locations are specified * Only write reports to the target directory, never the source * Query target git repo for final ref values, not the source * Make sure --debug messages avoid throwing TypeErrors due to mixing strings and bytes * Make sure to include entries in ref-map that weren't in the original target repo * Don't: * worry about mixing old and new history (i.e. nuking refs that weren't updated, expiring reflogs, gc'ing) * attempt to map refs/remotes/origin/* -> refs/heads/* * disconnect origin remote * Continue (but only in target repo): * fresh-clone sanity checks * writing replace refs * doing a 'git reset --hard' Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	587f727d19	filter-repo: implement --strip-blobs-bigger-than Add a flag for filtering out blobs based on their size, and allow the size to be specified using 'K', 'M', or 'G' suffixes. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	6fb7da0f0a	filter-repo: rename to --prune-empty and --prune-degenerate Imperative form sounds better than --empty-pruning and --degenerate-pruning, and it probably works better with command line completion. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	4c25fe7a37	filter-repo: handle reset to specific ref and deletion The reset directive can specify a commit hash for the 'from' directive, which can be used to reset to a specific commit, or, if the hash is all zeros, then it can be used to delete the ref. Support such operations. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	2472d1c93f	filter-repo: implement --paths-from-file This allows the user to put a whole bunch of paths they want to keep (or want to remove) in a file and then just provide the path to it. They can also use globs or regexes (similar to --replace-text) and can also do renames. In fact, this allows regex renames, despite the fact that I never added a --path-rename-regex option. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	9744c57106	filter-repo: change --path-rename to work on matches instead of prefixes Using an exact path (file or directory) for --path-rename instead of a prefix removes an ugly caveat from the documentation, makes it operate similarly to --path, and will make it easier to reuse common code when I add the --paths-from-file option. Switch over, and replace the startswith() check by a call to filename_matches(). Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	092d0163d4	filter-repo: implement --use-base-name This new flag allows people to filter files solely based on their basename rather than on their full path within the repo, making it easier to e.g. remove all .DS_Store files or keep all README.md files. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	76b71fe92d	filter-repo: allow rewriting of hashes in commit messages to be toggled Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	1fa8c2c70b	filter-repo: add --replace-refs option This adds the ability to automatically add new replacement refs for each rewritten commit (as well as delete or update replacement refs that existed before the run). This will allow users to use either new or old commit hashes to reference commits locally, though old commit hashes will need to be unabbreviated. The only requirement for this to work, is that the person who does the rewrite also needs to push the replace refs up where other users can grab them, and users who want to use them need to modify their fetch refspecs to grab the replace refs. However, other tools external to git may not understand replace refs... Tools like Gerrit and GitHub apparently do not yet natively understand replace refs. Trying to view "commits" by the replacement ref will yield various forms of "Not Found" in each tool. One has to instead try to view it as a branch with an odd name (including "refs/replace/"), and often branches are accessed via a different URL style than commits so it becomes very non-obvious to users how to access the info associated with an old commit hash. * In Gerrit, instead of being able to search on the sha1sum or use a pre-defined URL to search and auto-redirect to the appropriate code review with https://gerrit.SITE.COM/#/q/${OLD_SHA1SUM},n,z one instead has to have a special plugin and go to a URL like https://gerrit.SITE.COM/plugins/gitiles/ORG/REPO/+/refs/replace/${OLD_SHA1SUM} but then the user isn't shown the actual code review and will need to guess which link to click on to get to it (and it'll only be there if the user included a Change-Id in the commit message). * In GitHub, instead of being able to go to a URL like https://github.SITE.COM/ORG/REPO/commit/${OLD_SHA1SUM} one instead has to navigate based on branch using https://github.SITE.COM/ORG/REPO/tree/refs/replace/${OLD_SHA1SUM} but that will show a listing of commits instead of information about a specific commit; the user has to manually click on the first commit to get to the desired location. For now, providing replace refs at least allows users to access information locally using old IDs; perhaps in time, as other external tools gain a better understanding of how to use replace refs, the barrier to history rewrites will decrease enough that big projects that really need it (e.g. those that have committed many sins by committing stupidly large useless binary blobs) can at least seriously contemplate the undertaking. History rewrites will always have some drawbacks and pain associated with them, as they should, but when warranted it's nice to have transition plans that are more smooth than a massive flag day. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	2c8f763426	filter-repo: allow users to adjust pruning of empty & degenerate commits We have a good default for pruning of empty commits and degenerate merge commits: only pruning such commits that didn't start out that way (i.e. that couldn't intentionally have been empty or degenerate). However, users may have reasons to want to aggressively prune such commits (maybe they used BFG repo filter or filter-branch previously and have lots of cruft commits that they want removed), and we may as well allow them to specify that they don't want pruning too, just to be flexible. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	89e5c43805	filter-repo: include additional worktrees in sanity startup check Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	35052f673d	filter-repo (python3): replace strings with bytestrings This is by far the largest python3 change; it consists basically of * using b'<str>' instead of '<str>' in lots of places * adding a .encode() if we really do work with a string but need to get it converted to a bytestring * replace uses of .format() with interpolation via the '%' operator, since bytestrings don't have a .format() method. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	bec6bd8d3c	filter-repo: add testcase with funny characters Use UTF-8 chars in user names, filenames, branch names, tag names, and file contents. Also include invalid UTF-8 in file contents; should be able to handle binary data. Signed-off-by: Elijah Newren <newren@gmail.com>	6 years ago
Elijah Newren	0ca3988953	filter-repo: specify sorting order in greater detail The sorting order of entries written to files in the analysis directory didn't specify a secondary sort, thus making the order dependent on the random-ish sorting order of dictionaries and making it inconsistent between python versions. While the secondary order didn't matter much, having a defined order makes it slightly easier to define a single testcase that can work across versions. Signed-off-by: Elijah Newren <newren@gmail.com>	6 years ago
Elijah Newren	4cb3bc3459	filter-repo: mark messages for translation Signed-off-by: Elijah Newren <newren@gmail.com>	6 years ago
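
Note on commit 1dae85ee9a: the bytestring behaviour it describes can be reproduced in a few lines of Python 3. The sketch below is only an illustration; `strip_trailing_slash` is a hypothetical helper name, not a function taken from git-filter-repo.

```python
def strip_trailing_slash(path: bytes) -> bytes:
    """Hypothetical helper: drop a single trailing slash from a bytes path."""
    # Indexing a bytes object yields an int, so `path[-1] == b'/'` is never true.
    # Slicing yields a bytes object, so this comparison works as intended.
    if path[-1:] == b'/':
        return path[:-1]
    return path


if __name__ == "__main__":
    print(b'somestring/'[-1])              # 47 (an int, the byte value of '/')
    print(b'somestring/'[-1] == b'/')      # False
    print(b'somestring/'[-1:] == b'/')     # True
    print(b'somestring/'[-1] == b'/'[0])   # True
    print(strip_trailing_slash(b'subdir/'))  # b'subdir'
```

Slicing also behaves sensibly on an empty bytestring, where indexing with `[-1]` would raise an `IndexError`.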


71 Commits (6967fad156dacf790fc04eb668e5be7c4b8bbf75)
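
Note on commit 38e70b69e8: the line handling it describes for --paths-from-file (blank lines and lines starting with '#' are ignored, and a 'literal:' prefix lets a filename that starts with '#' be matched) might look roughly like the sketch below. This is not the project's actual parser. The function name `parse_paths_file` and the returned (kind, pattern) tuples are hypothetical, treating unprefixed lines as literal paths is an assumption, and rename entries are not handled.

```python
def parse_paths_file(lines):
    """Hypothetical parser: return (kind, pattern) pairs for each usable line."""
    entries = []
    for raw in lines:
        line = raw.rstrip(b'\n')
        # Blank lines and comment lines are skipped entirely.
        if not line or line.startswith(b'#'):
            continue
        for prefix in (b'literal:', b'glob:', b'regex:'):
            if line.startswith(prefix):
                # Record the match kind and strip the prefix from the pattern.
                entries.append((prefix[:-1].decode(), line[len(prefix):]))
                break
        else:
            # Assumption: a line without a prefix is a literal path.
            entries.append(('literal', line))
    return entries


if __name__ == "__main__":
    sample = [b'# keep only these\n', b'\n', b'src/\n',
              b'literal:#odd-name\n', b'glob:*.txt\n']
    for kind, pattern in parse_paths_file(sample):
        print(kind, pattern)
```

Because the comment check runs before the prefix check, `literal:#odd-name` survives as a pattern while a bare `#odd-name` line would be dropped as a comment.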