mirror of https://github.com/koreader/koreader synced 2024-11-04 12:00:25 +00:00
koreader/frontend/document
Aleksa Sarai 5709b4c2f1
kopt: correctly handle CJK character detection for space insertion (#8438)
Previously getTextFromBoxes would just pass the first and last three
bytes of the current and previous words when trying to detect CJK
characters (which shouldn't have spaces inserted).

However, this handling was not correct because CJK characters can be
longer than 3 bytes, and internally BaseUtil.utf8charcode doesn't check
that it was given only a single UTF-8 character (it blindly does the bit
operations on a code of whatever length you give it).

As a result, before this patch, selections in PDF documents would have
lots of spaces stripped because getTextFromBoxes would think that almost
all characters were CJK characters.
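
The failure mode described above can be illustrated with a minimal Python
sketch (this is not the KOReader Lua code). naive_charcode is a hypothetical
stand-in for a decoder that chooses its bit operations purely from the input
length, as the message describes for BaseUtil.utf8charcode; is_cjk,
first_char and last_char are illustrative helpers, and the CJK ranges are
deliberately simplified assumptions.

```python
def naive_charcode(raw: bytes) -> int:
    """Hypothetical length-dispatched decoder: trusts len(raw) and never
    checks that raw really is one well-formed UTF-8 character."""
    if len(raw) == 1:
        return raw[0] & 0x7F
    if len(raw) == 2:
        return ((raw[0] & 0x1F) << 6) | (raw[1] & 0x3F)
    if len(raw) == 3:
        return ((raw[0] & 0x0F) << 12) | ((raw[1] & 0x3F) << 6) | (raw[2] & 0x3F)
    if len(raw) == 4:
        return (((raw[0] & 0x07) << 18) | ((raw[1] & 0x3F) << 12)
                | ((raw[2] & 0x3F) << 6) | (raw[3] & 0x3F))
    raise ValueError("expected 1 to 4 bytes")


def is_cjk(code: int) -> bool:
    """Simplified check (assumption): CJK Unified Ideographs and Extension B."""
    return 0x4E00 <= code <= 0x9FFF or 0x20000 <= code <= 0x2A6DF


def first_char(word: bytes) -> bytes:
    """Return the bytes of the first UTF-8 character, sized from the lead byte."""
    lead = word[0]
    if lead < 0x80:
        size = 1
    elif lead >> 5 == 0b110:
        size = 2
    elif lead >> 4 == 0b1110:
        size = 3
    else:
        size = 4
    return word[:size]


def last_char(word: bytes) -> bytes:
    """Return the bytes of the last UTF-8 character by skipping backwards
    over continuation bytes (those of the form 10xxxxxx)."""
    i = len(word) - 1
    while i > 0 and (word[i] & 0xC0) == 0x80:
        i -= 1
    return word[i:]


# Three ASCII bytes fed to the naive decoder land inside the CJK range.
print(hex(naive_charcode(b"wor")), is_cjk(naive_charcode(b"wor")))
# Decoding exactly one character gives the expected answer.
print(is_cjk(naive_charcode(first_char(b"word"))))
```

In this sketch the three ASCII bytes of "wor" decode to 0x7bf2, which falls
inside the CJK Unified Ideographs block, so a plain English word would be
treated as CJK and lose its separating space. Slicing out exactly one
character first avoids that, and also handles 4-byte characters that a fixed
3-byte slice would cut in half.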

Fixes: 6f1b70e5eb ("util.utf8: improve CJK character detection")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2021-11-11 16:09:05 +01:00
canvascontext.lua add hasSystemFonts device property (#7535) 2021-04-19 09:04:31 +02:00
credocument.lua credocument: update getTextFromXPointers wrapper to support selections 2021-10-23 15:49:54 +02:00
djvudocument.lua Kobo/Elipsa: More fine-grained control over the amount of online CPU 2021-09-25 02:47:06 +02:00
doccache.lua DocCache: Only compute cache size once 2021-09-12 00:30:16 +02:00
document.lua Kobo/Elipsa: More fine-grained control over the amount of online CPU 2021-09-25 02:47:06 +02:00
documentregistry.lua DocumentRegistry: Downgrade refcount warnings to debug logging. 2021-05-21 01:58:00 +02:00
koptinterface.lua kopt: correctly handle CJK character detection for space insertion (#8438) 2021-11-11 16:09:05 +01:00
pdfdocument.lua Kobo/Elipsa: More fine-grained control over the amount of online CPU 2021-09-25 02:47:06 +02:00
picdocument.lua DocumentRegistry: Downgrade refcount warnings to debug logging. 2021-05-21 01:58:00 +02:00
tilecacheitem.lua PDF written highlights: fix boxes, trash cached tiles 2021-07-20 15:19:59 +02:00