This primarily consists of some spies added to ensure that the
LanguageSupport plugin is actually being called at the right time.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
This plugin provides support for Japanese deinflection during lookup as
well as making long-hold word selection actually select whole words
properly. With this plugin, word lookups in Japanese text in KOReader
become much easier, and no longer requires users to use special
dictionaries that have synonym-based deinflection rules defined (which
were always fairly annoying to use).
The basic idea and deinflection data for this plugin come from
Yomichan (which is also a GPL-3.0+ project), but everything was
implemented specifically for KOReader.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
This creates a new plugin system which hooks into a handful of reader
operations in order to allow plugins to add language-specific support
where the default reader falls short. The two hooks added are:
* During hold-without-pan taps, language plugins can modify the
selection in order to better match what users expect koreader to
highlight when selecting a single word.
The vast majority of CJK language words are more than one character,
but KOReader treats all CJK characters as a single word by default,
so adding this hook means that readers no longer need to manually
select the whole word every time they need to look something.
* During dictionary lookup, language plugins can propose alternative
candidate words to look up if the selected word could not be found in
the dictionary.
This is pretty necessary for Japanese and Korean, both of which are
highly agglutinative languages and the fuzzy searching system of
StarDict is simply not usable because often the inflection of the
word is so much longer than the dictionary form that sdcv decides to
chop off the actual word and search for the inflection (which yields
useless results).
This system is of particular interest for readers of CJK languages
(without this, looking up words using KOReader was fairly painful) but
this system is designed to be minimal and language-agnostic enough that
other languages could make use of it by creating their own plugins if
the default "whole word" highlight and fuzzy-search system doesn't match
their needs.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
This is necessary in order to allow the language support module to be
added to the menu outside of the common_settings menu table definition.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
In order to make startSdcv usable for plugins that might need to do
dictionary lookups in order to work, it is necessary to split out the
core of startSdcv into another method which returns the raw data from
sdcv.
In addition, in order to make it possible to amortise the cost of each
lookup (which could be fairly expensive) make it possible to pass
multiple words to rawSdcv at the same time. Sdcv supports passing
multiple words as arguments (which it then looks up in order and returns
a separate JSON array per line for each word) so we just need to tweak
the return style accordingly.
All of the deduplication and dummy results handling remains in startSdcv
because plugins might strongly depend on whether sdcv returned actual
results.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
There were two ways of specifing selected text for a highlight depending
on whether it was a "single word" or text selected using hold-and-pan.
In addition to being a bit more complicated than is necessary, with the
addition of the language support plugin system (where the "single word"
selected might be expanded), it makes more sense to simply use the same
logic and table structure for both cases.
The dictionary lookup special case (hold-without-pan triggering a
dictionary lookup by default) still works as before.
In addition, this patch fixes a minor inefficiency during dictionary
quick lookup -- before this patch, the highlight would be re-selected
because the quick lookup window is run concurrently and tries to fetch
ReaderHighlight.selected_text but this is set to nil immediately after
triggering the lookup. This is unnecessary because :clear() will be
called anyway when the quick pop-up closes, and so clearing this can be
left until then.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
There were a handful of cases where if there was no cached kctx there
was no fallback and several KoptInterface methods would return nil,
causing issues in various parts of KOReader (this happened with the
migration to selected_text everywhere but it's unclear how that change
caused this regression).
In any case, from a correctness perspective it makes sense to have the
corresponding fallback paths to create a new kctx if we couldn't find a
cached one.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
With the latest koreader-base update, we can now create native
selections using getTextFromXPointers. In order to make the wrapper less
annoying to use, always enable segmented selection if selections are
enabled (to match getTextFromPositions).
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
It is a bit cleaner to do all of the necessary looping over lists of
Geoms within a straight-forward Geom.boundingBox function rather than
looping over :combine every time (or reimplementing :combine in some
cases). Geom:combine can be trivially reimplemented in terms of
Geom.boundingBox as well.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Previously the CJK character detection defined only characters in the
range U+4000..U+AFFF as "CJK characters". This excludes an incredibly
large number of CJK characters within the BMP, let alone the whole two
planes dedicated to rarer CJK characters (the SIP and TIP). As a result,
a very large number of Chinese, Japanese, and Korean characters were not
detected as being CJK characters.
While slightly less elegant-looking, it is far more accurate to compute
the codepoint from the utf8 character and then see if it falls within
one of the defined CJK blocks. This is not future-proof against future
CJK ideograph extensions in future Unicode versions, but there is no
real way to accurately predict such changes so this is the best we can
do without accidentally treating characters explicitily defined as being
non-CJK in Unicode as CJK.
While we're at it, copy Lua 5.3's utf8.charpattern constant definition
so that we can more easily write utf8 iterators with string.gmatch (at
least in the interim until there is a rework of utf8 handling in
KOReader and everything is rebuilt on top of utf8proc).
Some unit tests are added for Korean and Japanese text, and the existing
unit tests needed a minor adjustment to handle the fact that
isSplittable now correctly detects CJK punctuation as a character to
compare against the forbidden split rules.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
* Use paintRect and plain colors instead of lightenRect and a weird
dimming factor.
* Update call sites to the new API
* Handle FP maths properly (i.e., floor coordinates & ceil dimensions at
the latest possible time).
* Fix border handling in the fill bar (make sure we actually honor it
when computin the x position, and that we won't overflow into it when
computing the width).
* Update docs
Reword "Purge .sdr" to "Reset settings".
When purging, remove only the known document metadata
files, and not those for a document with the same name but
a different suffix.
Bookmarks list:
- page numbers are displayed
- page bookmarks are marked with a star
- new setting: Sort by largest page number (default: checked)
New bookmark setting: Add page number / timestamp to bookmark
- If enabled (default), bookmark name is 'Page # notes @ time'.
- If disabled, bookmark name is equal to the notes field.
Rename bookmark dialog:
- page number and timestamp are displayed in the input
dialog description
- blank input renames bookmark to the default name in
accordance with the new setting
Also fix: changing boundaries of the highlight: the name of the
highlight is not changed if it was previously edited by the user.