Make sure to consistently calculate tiebreak scores based on the
original line.
This change may not be preferable if you filter aligned tabular input on
a subset of columns using --nth. However, if we calculate length
tiebreak only on the matched components instead of the entire line, the
result can be very confusing when multiple --nth components are
specified, so let's keep it simple and consistent.
Close#926
This is a breaking change, but I believe it makes much more sense. It is
almost impossible to predict which entries will be filtered out due to
a fuzzy inverse term. You can still perform inverse-fuzzy-match by
prepending `!'` to the term.
| Token | Match type | Description |
| -------- | -------------------------- | --------------------------------- |
| `sbtrkt` | fuzzy-match | Items that match `sbtrkt` |
| `^music` | prefix-exact-match | Items that start with `music` |
| `.mp3$` | suffix-exact-match | Items that end with `.mp3` |
| `'wild` | exact-match (quoted) | Items that include `wild` |
| `!fire` | inverse-exact-match | Items that do not include `fire` |
| `!.mp3$` | inverse-suffix-exact-match | Items that do not end with `.mp3` |
- Make structs smaller
- Introduce Result struct and use it to represent matched items instead of
reusing Item struct for that purpose
- Avoid unnecessary memory allocation
- Avoid growing slice from the initial capacity
- Code cleanup
In the best case (all ascii), this reduces the memory footprint by 60%
and the response time by 15% to 20%. In the worst case (every line has
non-ascii characters), 3 to 4% overhead is observed.
When we prepend a single quote to our query in --exact mode, we are not
supposed to limit the scope of the new search to the previous
exact-match result.
Based on the patch by Matt Westcott (@mjwestcott).
But with a more conservative approach:
- Does not use linearly increasing penalties; It is agreed upon that we
should prefer matching characters at the beginnings of the words, but
it's not always clear that the relevance is inversely proportional to
the distance from the beginning.
- The approach here is more conservative in that the bonus is never
large enough to override the matchlen, so it can be thought of as the
first implicit tiebreak criterion.
- One may argue the change breaks the contract of --tiebreak, but the
judgement depends on the definition of "tie".
This change improves sort ordering for aligned tabular input.
Given the following input:
apple juice 100
apple pie 200
fzf --nth=2 will now prefer the one with pie. Before this change fzf
compared "juice " and "pie ", both of which have the same length.
I profiled fzf and it turned out that it was spending significant amount
of time repeatedly converting character arrays into Unicode codepoints.
This commit greatly improves search performance after the initial scan
by memoizing the converted results.
This commit also addresses the problem of unbounded memory usage of fzf.
fzf is a short-lived process that usually processes small input, so it
was implemented to cache the intermediate results very aggressively with
no notion of cache expiration/eviction. I still think a proper
implementation of caching scheme is definitely an overkill. Instead this
commit introduces limits to the maximum size (or minimum selectivity) of
the intermediate results that can be cached.
When fzf works in filtering mode (--filter) and sorting is disabled
(--no-sort), there's no need to block until input is complete. This
commit makes fzf print the matches on-the-fly when the following
condition is met:
--filter FILTER --no-sort [--no-tac --no-sync]
or simply:
-f FILTER +s
This removes unnecessary delay in use cases like the following:
fzf -f xxx +s | head -5
However, in this case, fzf processes the input lines sequentially, so it
cannot utilize multiple cores, which makes it slightly slower than the
previous mode of execution where filtering is done in parallel after the
entire input is loaded. If the user is concerned about the performance
problem, one can add --sync option to re-enable buffering.