Previously getTextFromBoxes would just pass the first and last three
bytes of the current and previous words when trying to detect CJK
characters (which shouldn't have spaces inserted).
However, this handling was not correct because CJK characters can be
longer than 3 bytes, and internally BaseUtil.utf8charcode doesn't ensure
that it was only given a single utf8 character (it blindly does the bit
operations on whatever length code you give it).
As a result, before this patch selections in PDF documents would have
lots of spaces stripped because getTextFromBoxes would think that almost
all characters were CJK characters.
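To illustrate the problem, here is a minimal Python sketch (not
KOReader's Lua code; first_utf8_char and last_utf8_char are hypothetical
helper names, and valid non-empty UTF-8 input is assumed) showing why a
fixed three-byte slice is unreliable and how the first and last complete
characters can be extracted instead:

```python
def first_utf8_char(data: bytes) -> bytes:
    """Return the first complete UTF-8 character (assumes valid input)."""
    lead = data[0]
    if lead < 0x80:
        length = 1  # ASCII
    elif lead >> 5 == 0b110:
        length = 2
    elif lead >> 4 == 0b1110:
        length = 3
    else:
        length = 4
    return data[:length]


def last_utf8_char(data: bytes) -> bytes:
    """Return the last complete UTF-8 character (assumes valid input)."""
    i = len(data) - 1
    # Walk backwards over continuation bytes (0b10xxxxxx) to the lead byte.
    while i > 0 and (data[i] & 0xC0) == 0x80:
        i -= 1
    return data[i:]


latin = "word".encode("utf-8")
rare = "\U0002000B".encode("utf-8")  # a 4-byte CJK Extension B character

# A fixed 3-byte slice covers three ASCII characters, or cuts a 4-byte
# character short, so it is never guaranteed to be exactly one character.
three_byte_slice_latin = latin[:3]
three_byte_slice_rare = rare[:3]
```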
Fixes: 6f1b70e5eb ("util.utf8: improve CJK character detection")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
This layout is far more commonly used on mobile devices, and allows for
much easier typing. The keyboard primarily functions through gestures in
the four cardinal directions to choose which vowel kana to input. In
addition, users can cycle through each kana row by tapping the key
within a 2-second window (this is the equivalent of T9 input for
Japanese phone keyboards).
This also resolves the long-standing issue that the old keyboard did not
correctly handle dakuten (there was a standalone dakuten key which added
a stray dakuten mark, and the umlaut mode which added dakuten to all of
the keys it could) and could not input handakuten characters at all.
In order to allow adding dakuten and cycling through the various
modifiers for the previous kana, we need to wrap the input-box (similar
to Korean) but luckily we don't need any state machine magic since we
just need to modify the last character in the character buffer. However
because the tap timeout for T9-like-cycling needs to be reset after any
non-tap key we need to add some basic wrappers around a few other
input-box methods.
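As a rough illustration of the tap-cycling behaviour, here is a minimal
Python sketch (not the actual Lua implementation; TapCycler and its
method names are hypothetical). It only modifies the last character in
the buffer, and reset() stands in for what the wrappers do after any
non-tap key:

```python
CYCLE_WINDOW_SECONDS = 2.0
A_ROW = ("あ", "い", "う", "え", "お")


class TapCycler:
    def __init__(self, window=CYCLE_WINDOW_SECONDS):
        self.window = window
        self.last_row = None
        self.last_tap = None
        self.index = 0

    def tap(self, row, buffer, now):
        """Cycle the last character if the same key is tapped in time,
        otherwise append the first kana of the row."""
        in_window = (self.last_tap is not None
                     and now - self.last_tap <= self.window)
        if row == self.last_row and in_window and buffer:
            self.index = (self.index + 1) % len(row)
            buffer[-1] = row[self.index]
        else:
            self.index = 0
            buffer.append(row[0])
        self.last_row = row
        self.last_tap = now

    def reset(self):
        """Called after any non-tap key so the next tap starts fresh."""
        self.last_row = None
        self.last_tap = None


cycler = TapCycler()
chars = []
cycler.tap(A_ROW, chars, now=0.0)  # appends the first kana of the row
cycler.tap(A_ROW, chars, now=1.0)  # within the window: cycles in place
cycler.tap(A_ROW, chars, now=4.0)  # window expired: appends again
```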
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
A layout might want to make some specific feature configurable, so
create an addToMainMenu-like system for allowing layouts to add their
own configuration sub-menu to the keyboard configuration menu.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
This allows for InputText wrappers (namely the Japanese keyboard which
needs to be able to apply modifiers to the character before the cursor)
to nicely access the character list.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
In some cases, it's useful to be able to wrap a function and either
replace its contents entirely or have some callback be run before
calling the underlying function.
The most obvious users for this feature are the Japanese and Korean
keyboards (both of which need to wrap the inputbox methods with either
their own versions or have basic callbacks be run before the method is
executed).
This is loosely based on how busted/luassert spies work.
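The idea can be sketched in Python as follows (an illustration only, not
the Lua utility added by this patch; wrap, replacement, before and
revert are hypothetical names):

```python
def wrap(obj, name, replacement=None, before=None):
    """Wrap obj.<name>; return a function that restores the original."""
    original = getattr(obj, name)

    def wrapper(*args, **kwargs):
        if before is not None:
            # Callback run before the underlying function.
            before(*args, **kwargs)
        if replacement is not None:
            # Full replacement; it receives the original so it can
            # still delegate to it if it wants to.
            return replacement(original, *args, **kwargs)
        return original(*args, **kwargs)

    setattr(obj, name, wrapper)

    def revert():
        setattr(obj, name, original)

    return revert


class InputBox:
    def __init__(self):
        self.chars = []

    def add_chars(self, text):
        self.chars.append(text)


seen = []
box = InputBox()
revert = wrap(box, "add_chars", before=seen.append)
box.add_chars("a")  # callback runs, then the original method
revert()
box.add_chars("b")  # original behaviour restored, no callback
```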
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
It's possible for the user to have selected nothing, and trying to
operate on the nil highlight can cause confusion or crashes. This
restores the behaviour before commit 7a0e3d5e68 ("readerhighlight:
remove selected_word and use selected_text everywhere"), which missed
this case.
In addition, add some debug guards to ReaderHighlight methods which
cannot handle selected_text being nil (or at least, shouldn't be called
with selected_text being nil).
Fixes: 7a0e3d5e68 ("readerhighlight: remove selected_word and use selected_text everywhere")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
The previous version of JMdict comes from 2009 and doesn't appear to
work at all when trying to do basic lookups (likely due to some kind of
encoding problem). In addition, the license information and sourcing was
not really in line with the requirements specified by the JMdict
license. This version is far more up-to-date and also includes synonym-based
deinflection (though because KOReader has a Japanese plugin now, this is
technically not necessary).
Since there didn't exist a nicely-maintained place to download these
dictionaries (because StarDict is not widely used for Japanese
dictionaries), I've set up a personal GitHub repository where I've
hosted them. Note that we're intentionally not pinning the commit hash
because GitHub only recommends we use gh-pages for CDN purposes, and one
of the requirements of the JMdict license is that you need to be able to
update to later versions.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
If we had a local preferred AP with a higher RSSI, we attempted to
associate with it even while wpa_supplicant was already attempting to
associate with its own preferred AP.
That... failed horribly.
Also adapt to the new lj-wpaclient API, fixing a few other edge-cases,
and making the whole thing slightly faster because we no longer
uselessly sleep.
And more reliable because we now actually wait for replies to our
requests.
Bump base
https://github.com/koreader/koreader-base/pull/1424
screen.
Otherwise, on the Sage, weird flash glitches may happen, depending on
what was on screen...
(e.g., there's some weird update merging shenanigans going on
despite those updates being flagged NO_MERGE...).
The second argument is a ddjvu_render_mode_t
Try to actually honor the user settings instead of enforcing COLOR
while we're there.
Fix #8376
Regression since #8250
Now that FileManager registers its UI modules in the same way as Reader,
this shouldn't be necessary but this protects us against some other app
creating a ReaderDictionary instance without having ui.languagesupport
registered properly.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
With the addition of the language support module, ReaderDictionary
tries to use modules registered with the UI instance, but the
FileManager doesn't provide key-based registration of its UI modules.
In order to allow the same code to be used by both FileManager and
Reader seamlessly, copy the :registerPlugin() method from Reader and use
it with FileManager. This will ensure any other hidden assumptions about
UI module registration are handled properly.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
- Add an icon to distinguish between page bookmarks, plain
highlights, and highlights with an added note
- Bookmark details: show both highlighted text and added note
- Bookmark list: allow filtering by type and/or by keyword
- New bookmark selection mode, to allow multiple removals
- New option: show separator line
This creates a new plugin system which hooks into a handful of reader
operations in order to allow plugins to add language-specific support
where the default reader falls short. The two hooks added are:
* During hold-without-pan taps, language plugins can modify the
selection in order to better match what users expect KOReader to
highlight when selecting a single word.
The vast majority of CJK language words are more than one character,
but KOReader treats all CJK characters as a single word by default,
so adding this hook means that readers no longer need to manually
select the whole word every time they need to look something up.
* During dictionary lookup, language plugins can propose alternative
candidate words to look up if the selected word could not be found in
the dictionary.
This is pretty necessary for Japanese and Korean, both of which are
highly agglutinative languages and the fuzzy searching system of
StarDict is simply not usable because often the inflection of the
word is so much longer than the dictionary form that sdcv decides to
chop off the actual word and search for the inflection (which yields
useless results).
This system is of particular interest for readers of CJK languages
(without this, looking up words using KOReader was fairly painful) but
this system is designed to be minimal and language-agnostic enough that
other languages could make use of it by creating their own plugins if
the default "whole word" highlight and fuzzy-search system doesn't match
their needs.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
This is necessary in order to allow the language support module to be
added to the menu outside of the common_settings menu table definition.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
In order to make startSdcv usable for plugins that might need to do
dictionary lookups in order to work, it is necessary to split out the
core of startSdcv into another method which returns the raw data from
sdcv.
In addition, in order to make it possible to amortise the cost of each
lookup (which could be fairly expensive) make it possible to pass
multiple words to rawSdcv at the same time. Sdcv supports passing
multiple words as arguments (which it then looks up in order and returns
a separate JSON array per line for each word) so we just need to tweak
the return style accordingly.
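A minimal Python sketch of consuming such output, assuming one JSON
array per line in the same order as the words passed in. The function
name and the sample data (including its field names) are made up for
illustration and are not taken from the patch:

```python
import json


def parse_multi_word_output(output, words):
    """Return one list of results per looked-up word, in order."""
    lines = [line for line in output.splitlines() if line.strip()]
    if len(lines) != len(words):
        raise ValueError(
            "expected %d result lines, got %d" % (len(words), len(lines))
        )
    return [json.loads(line) for line in lines]


# Made-up sample output for two words; the second one has no results.
sample = (
    '[{"dict": "Example", "word": "食べる", "definition": "to eat"}]\n'
    "[]\n"
)
results = parse_multi_word_output(sample, ["食べる", "食べられなかった"])
```

Keeping empty arrays in place (rather than dropping them) lets the
caller tell which of the candidate words actually returned results.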
All of the deduplication and dummy results handling remains in startSdcv
because plugins might strongly depend on whether sdcv returned actual
results.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
There were two ways of specifying selected text for a highlight depending
on whether it was a "single word" or text selected using hold-and-pan.
In addition to being a bit more complicated than is necessary, with the
addition of the language support plugin system (where the "single word"
selected might be expanded), it makes more sense to simply use the same
logic and table structure for both cases.
The dictionary lookup special case (hold-without-pan triggering a
dictionary lookup by default) still works as before.
In addition, this patch fixes a minor inefficiency during dictionary
quick lookup -- before this patch, the highlight would be re-selected
because the quick lookup window is run concurrently and tries to fetch
ReaderHighlight.selected_text but this is set to nil immediately after
triggering the lookup. This is unnecessary because :clear() will be
called anyway when the quick pop-up closes, and so clearing this can be
left until then.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
There were a handful of cases where if there was no cached kctx there
was no fallback and several KoptInterface methods would return nil,
causing issues in various parts of KOReader (this happened with the
migration to selected_text everywhere but it's unclear how that change
caused this regression).
In any case, from a correctness perspective it makes sense to have the
corresponding fallback paths to create a new kctx if we couldn't find a
cached one.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
With the latest koreader-base update, we can now create native
selections using getTextFromXPointers. In order to make the wrapper less
annoying to use, always enable segmented selection if selections are
enabled (to match getTextFromPositions).
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
It is a bit cleaner to do all of the necessary looping over lists of
Geoms within a straight-forward Geom.boundingBox function rather than
looping over :combine every time (or reimplementing :combine in some
cases). Geom:combine can be trivially reimplemented in terms of
Geom.boundingBox as well.
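For illustration, a minimal Python sketch of the same idea (Rect,
bounding_box and combine are stand-ins, not the Lua Geom API):

```python
from dataclasses import dataclass


@dataclass
class Rect:
    x: int
    y: int
    w: int
    h: int


def bounding_box(rects):
    """Smallest Rect containing every Rect in the list (None if empty)."""
    if not rects:
        return None
    left = min(r.x for r in rects)
    top = min(r.y for r in rects)
    right = max(r.x + r.w for r in rects)
    bottom = max(r.y + r.h for r in rects)
    return Rect(left, top, right - left, bottom - top)


def combine(a, b):
    """Combining two rects is just the two-element bounding box."""
    return bounding_box([a, b])
```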
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Previously the CJK character detection defined only characters in the
range U+4000..U+AFFF as "CJK characters". This excludes an incredibly
large number of CJK characters within the BMP, let alone the whole two
planes dedicated to rarer CJK characters (the SIP and TIP). As a result,
a very large number of Chinese, Japanese, and Korean characters were not
detected as being CJK characters.
While slightly less elegant-looking, it is far more accurate to compute
the codepoint from the utf8 character and then see if it falls within
one of the defined CJK blocks. This is not future-proof against CJK
ideograph extensions added in later Unicode versions, but there is no
real way to accurately predict such changes so this is the best we can
do without accidentally treating characters explicitly defined as being
non-CJK in Unicode as CJK.
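A minimal Python sketch of block-based detection follows. The list of
ranges is an illustrative subset of Unicode blocks, not the exact table
used by this patch, and is_cjk_char is a hypothetical name:

```python
# (first, last) codepoints of some CJK-related Unicode blocks/planes.
CJK_RANGES = [
    (0x3000, 0x303F),    # CJK Symbols and Punctuation
    (0x3040, 0x309F),    # Hiragana
    (0x30A0, 0x30FF),    # Katakana
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xAC00, 0xD7AF),    # Hangul Syllables
    (0x20000, 0x2FFFF),  # Supplementary Ideographic Plane (SIP)
    (0x30000, 0x3FFFF),  # Tertiary Ideographic Plane (TIP)
]


def is_cjk_char(char):
    """True if the single character falls in one of the listed ranges."""
    codepoint = ord(char)
    return any(first <= codepoint <= last for first, last in CJK_RANGES)
```

Note that hiragana such as U+3042 and Hangul syllables such as U+D55C
both fall outside the old U+4000..U+AFFF range.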
While we're at it, copy Lua 5.3's utf8.charpattern constant definition
so that we can more easily write utf8 iterators with string.gmatch (at
least in the interim until there is a rework of utf8 handling in
KOReader and everything is rebuilt on top of utf8proc).
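For reference, Lua 5.3 defines utf8.charpattern as
"[\0-\x7F\xC2-\xF4][\x80-\xBF]*". A rough Python equivalent of iterating
over characters with that pattern (iter_utf8_chars is a hypothetical
name, and valid UTF-8 input is assumed):

```python
import re

# Same byte classes as Lua 5.3's utf8.charpattern: one lead byte
# followed by any number of continuation bytes.
UTF8_CHAR_PATTERN = re.compile(rb"[\x00-\x7F\xC2-\xF4][\x80-\xBF]*")


def iter_utf8_chars(data):
    """Split UTF-8 encoded bytes into a list of single characters."""
    return [
        match.group().decode("utf-8")
        for match in UTF8_CHAR_PATTERN.finditer(data)
    ]
```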
Some unit tests are added for Korean and Japanese text, and the existing
unit tests needed a minor adjustment to handle the fact that
isSplittable now correctly detects CJK punctuation as a character to
compare against the forbidden split rules.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
* Use paintRect and plain colors instead of lightenRect and a weird
dimming factor.
* Update call sites to the new API
* Handle FP maths properly (i.e., floor coordinates & ceil dimensions at
the latest possible time).
* Fix border handling in the fill bar (make sure we actually honor it
when computing the x position, and that we won't overflow into it when
computing the width).
* Update docs
Reword "Purge .sdr" to "Reset settings".
When purging, remove only the known document metadata
files, and not those for a document with the same name but
a different suffix.
Bookmarks list:
- page numbers are displayed
- page bookmarks are marked with a star
- new setting: Sort by largest page number (default: checked)
New bookmark setting: Add page number / timestamp to bookmark
- If enabled (default), bookmark name is 'Page # notes @ time'.
- If disabled, bookmark name is equal to the notes field.
Rename bookmark dialog:
- page number and timestamp are displayed in the input
dialog description
- blank input renames bookmark to the default name in
accordance with the new setting
Also fix: changing boundaries of the highlight: the name of the
highlight is not changed if it was previously edited by the user.
- ButtonTable, ButtonDialog, ButtonDialogTitle
- ConfirmBox, MultiConfirmBox, SkimToWidget
- KeyboardLayoutDialog (and initially move the dialog
down to show the title in landscape mode)
- InputText's Clipboard dialog
Also: Notification: truncate long text
cores
* Only keep a single core online most of the time.
* Device: Add an enableCPUCores method to allow controlling the amount of
online CPU cores.
* Move the initial core onlining setup to Kobo:init, instead of the startup script.
* Enable two CPU cores while hinting new (e.g., cache miss) pages in PDF land.
* Enable two CPU cores while processing book metadata.
* Drive-by fix to isolate the DocCache pressure check to KoptInterface
and actually apply it when it matters most (e.g., k2pdfopt stuff).
* Make sure we have a BB to measure in getSize, in case the instance is
recycled. (fix #8241)
* nil `line_num_to_image` early in `:free`
* Hide the _renderText calls that are used across the whole module to
simply update the text layout & instantiate the inner bb behind a
wrapper function with a slightly less obscure name.
Discussion in #6409.
Highlight action renamed to Long-press on text and moved from Gear - Document to Gear - Taps and gestures.
Added action Do nothing.
Removed menu item Typeset - Highlighting - Allow highlighting.
Fixed untranslated strings in the Cycle highlight action notification.
Long-press on images always opens ImageViewer. Closes #6409.
* Gestures: standardize hold to long-press
* Common settings menu: standardize hold to long-press
* Readerstyletweak: standardize hold to long-press
* Readersearch: standardize hold to long-press
* Geom:transformByScale:
* Apply the right scaling factor to the y axis
* Round in a more sensible fashion (à la fz_round_rect, since we pretty much exclusively use it in a similar fashion).
* Bump base (https://github.com/koreader/koreader-base/pull/1407)
Also: Text editor now closes its keyboard when calling Find
and Go to line (which open their own keyboards) to avoid
conflicts between multiple keyboards.
Includes among others:
- (Upstream) Various CHM handling fixes, and others
- HTML documents: rebuild TOC from headings after load
- Font: use metrics for underline offset and thickness
- epub.css, html5.css: tweak ruby styling
- CSS: fix EPUB's head>style content encoding
- CSS: add support for 'box-sizing: content-box/border-box'
- CSS: support for styling the <html> element
Also bump KoboUSBMS to v1.2.2 and FBInk to v1.24.0.
ReaderFont's "Generate font test document": update the
generated HTML so its ToC is built from proper HTML headings.
Store list of layouts in settings file as array of enabled
layouts only (up to 4 elements). Optimize code.
Allows sorting of the abbreviations in the globe popup:
just check layouts in the desired order (the first checked
will be northeast).
Requires a one-time migration to clean up the settings.
If this is not done, the URL when the file is downloaded will be
something like hostdir/path, rather than host/dir/path.
Also add a debug log to make it more clear when a bogus URL
is being fetched.
* Cleanup util.secondsFrom*Clock stuff (simpler maths, tail calls, meaningful printf tokens).
* Use util.secondsToClockDuration in ReadTimer instead of reinventing the wheel three different ways.
* Reschedule unexpired timers properly on resume (as best as we can, given the unreliable nature of REALTIME).
* Make clock timers tick on the dot, instead of at the same second as when being set.
* Speaking of clock timers, leave the math to os.date & os.time, don't reinvent the wheel yet again.
We should always unschedule suspend before scheduling it again (i.e.,
use rescheduleSuspend ;)).
Fix #8097 (many thanks to @Mel-kior for the detailed repro!)
When enabled, if the book has some supported language tag
in its metadata, use it as the source language. Otherwise,
fall back to the current settings (auto-detect or selected
source language).
Before: when holding the input box in input dialogs
for calling the Clipboard, hold release was passed to
MovableContainer and input dialog moved a little bit.
If the button text would be truncated, try to avoid
it by reducing the font size, and even switching to
a 2-lines TextBoxWidget.
TextBoxWidget: fix possible glyph truncation when
a small line_height is used. Also avoid some bad
results from getFontSizeToFitHeight(), possibly due
to some rounding errors.
* keep_dialog_open, default to false.
Set to true to keep dialog open upon pressing any button, except Cancel and dismissable tap.
* other_buttons_first, default to false.
Set to true to put other buttons above Cancel - OK row
Go to the directory of the deleted file, instead of the folder you happened to switch into the reader from, as this may have changed (via changing books from history, etc.).
It turns out that the kernel needs a little push now that the dedicated
wifi power control module is gone ;).
Issue was only exposed if you booted KOReader while the Wi-Fi was down.
* Decode EV_KEY:KEY_BATTERY
* Input: Only drop hovering *pen* events.
There are currently too many broken 0-pressure *finger* events being
reported on the Elipsa, making a dumb rejection highly annoying.
* Bump base
https://github.com/koreader/koreader-base/pull/1393
* Rely on actual events to detect loss of contact for the "snow"
protocol.
Allows simplifying the whole thing.
* Use `ipairs` over `pairs` for pure arrays.
Make "Taps and gestures - Page turns" available only in reader.
Move there other page turn related menu items from Navigation.
Remove duplicated code. Added standard "star" for default RTL.
TileCacheItem: add created_ts property.
Document: manage a tile_cache_validity_ts and ignore
older cached tiles.
This timestamp is updated when highlights are written
as annotations in, or deleted from, the PDF, so we can
get the most current rendered bitmap from MuPDF and
avoid highlight ghosts on old tiles.
Save this timestamp in doc settings so older tiles cached to
disk will also be ignored across re-openings.
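The mechanism can be sketched in Python as follows (an illustration
only, not the actual Lua code; the class and method names are
hypothetical, while created_ts and tile_cache_validity_ts mirror the
names above):

```python
from dataclasses import dataclass


@dataclass
class Tile:
    key: str
    created_ts: float


class TileCache:
    def __init__(self, tile_cache_validity_ts=0.0):
        # Restored from doc settings so tiles cached to disk before
        # the last annotation change are ignored too.
        self.tile_cache_validity_ts = tile_cache_validity_ts
        self.tiles = {}

    def put(self, key, now):
        self.tiles[key] = Tile(key, created_ts=now)

    def get(self, key):
        tile = self.tiles.get(key)
        if tile is None or tile.created_ts < self.tile_cache_validity_ts:
            return None  # missing or rendered before the last change
        return tile

    def invalidate_older_tiles(self, now):
        """Call when highlights are written to or deleted from the PDF."""
        self.tile_cache_validity_ts = now
```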
Bump base for: mupdf.lua: update frontend pboxes with
MuPDF adjusted ones.
We may get multiple boxes when selecting texts, one for each
word, and we have to add spaces between the extracted words
ourselves. Previously, we were only adding a space if the
last char of the previous word was ASCII, thus missing spaces
after accented characters or Greek words.
Try to do better by measuring the distances between boxes
and comparing to box heights, with a few heuristics.
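A minimal Python sketch of this kind of heuristic, for boxes on a single
line in reading order. The gap_ratio threshold and the box layout
(text, x0, x1, height) are assumptions made for illustration and are
not the values or structures used by this patch:

```python
def join_boxes(boxes, gap_ratio=0.15):
    """Join word boxes, adding a space where the horizontal gap between
    neighbours is large relative to the box height."""
    if not boxes:
        return ""
    text = boxes[0][0]
    for prev, cur in zip(boxes, boxes[1:]):
        _, _, prev_x1, prev_h = prev
        cur_text, cur_x0, _, cur_h = cur
        gap = cur_x0 - prev_x1
        # Compare the gap to the taller of the two boxes.
        if gap > gap_ratio * max(prev_h, cur_h):
            text += " "
        text += cur_text
    return text
```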
Bump base for cre.cpp cleanup and utf8proc FFI.
Add a checkbutton for case sensitive search in FileBrowser,
and use Utf8Proc.lowercase() for case insensitive search.
Also use it in ReaderUserHyph as a replacement for
crengine getLowercasedWord().
- bump crengine: findText(): add support for regular
expression search.
- bump base: add thirdparty/srell/srell.hpp, a C++ library
that provides Unicode regex support, used by crengine.
- ReaderSearch: with credocuments, add checkboxes for case
sensitive and regular expression search.
I encountered an issue where Calibre Content Server's OPDS feed produced a ``text/fb2-xml`` mimetype. I don't know whether Calibre is actually to blame, but this simple fix should save other people some time.
- New way to hide the VirtualKeyboard: to hide the keyboard
tap any point of the screen outside the inputbox and above
the keyboard; to show the keyboard tap the inputbox.
(Removed hacky "holding the arrow-down key" which is no
longer needed).
- InputDialog windows are movable/translucent by default
- Redesign of the Clipboard dialog
get any boxes
Exposed by #7624, but we were arguably putting garbage in the Cache
before that anyway, so, it wasn't all that great either ;p.
Fix #7850