Mirror of https://github.com/koreader/koreader (synced 2024-11-04 12:00:25 +00:00)
Commit 5709b4c2f1
Previously, getTextFromBoxes passed just the first and last three
bytes of the current and previous words to its CJK detection (spaces
should not be inserted between CJK characters).
However, this was incorrect: a CJK character can be longer than three
bytes in UTF-8, and BaseUtil.utf8charcode does not check that it was
given exactly one UTF-8 character (it blindly applies its bit
operations to input of any length).
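Correct handling means locating the whole first and last UTF-8 character of each word instead of slicing a fixed three bytes. The following is a minimal sketch of that idea in Lua. The helpers `seq_len`, `decode_at`, `first_char` and `last_char` are hypothetical and are not KOReader's actual code. They read the sequence length from the lead byte (1 to 4 bytes) and return nil for malformed input. For brevity they do not reject overlong encodings or surrogates.

```lua
-- Hypothetical helpers for illustration; not KOReader's implementation.
-- Arithmetic is used instead of bit operators so this runs on LuaJIT/Lua 5.1
-- as well as Lua 5.3+.

-- Length of a UTF-8 sequence judged from its lead byte, or nil if the byte
-- cannot start a sequence (continuation byte 10xxxxxx, or 0xF8..0xFF).
local function seq_len(lead)
  if lead < 0x80 then return 1 end
  if lead < 0xC0 then return nil end
  if lead < 0xE0 then return 2 end
  if lead < 0xF0 then return 3 end
  if lead < 0xF8 then return 4 end
  return nil
end

-- Decode exactly one character starting at byte i; nil if malformed.
local function decode_at(s, i)
  local b1 = s:byte(i)
  if not b1 then return nil end
  local n = seq_len(b1)
  if not n or i + n - 1 > #s then return nil end
  -- Keep the payload bits of the lead byte: 7, 5, 4 or 3 bits for n = 1..4.
  local cp = b1 % ({0x80, 0x20, 0x10, 0x08})[n]
  for k = i + 1, i + n - 1 do
    local b = s:byte(k)
    if b < 0x80 or b >= 0xC0 then return nil end -- must be 10xxxxxx
    cp = cp * 64 + (b % 64)
  end
  return cp
end

-- Code point of the first whole character of a word.
local function first_char(s)
  if #s == 0 then return nil end
  return decode_at(s, 1)
end

-- Code point of the last whole character: step back over continuation bytes.
local function last_char(s)
  if #s == 0 then return nil end
  local i = #s
  while i > 1 do
    local b = s:byte(i)
    if b < 0x80 or b >= 0xC0 then break end
    i = i - 1
  end
  return decode_at(s, i)
end
```

With this approach, `first_char("Hello")` yields 72 (`H`), `last_char("世界")` yields 0x754C, and a four-byte character such as U+20000 (CJK Extension B) decodes correctly. A fixed three-byte slice would cut that last character in half.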
As a result, before this patch, text selections in PDF documents had
many of their spaces stripped, because getTextFromBoxes treated almost
all characters as CJK characters.
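This illustrates how an ASCII word can be misdetected as CJK. The sketch uses a hypothetical decoder, `naive_decode3`, which is not KOReader's code. Like the behavior the commit message describes, it applies the three-byte UTF-8 bit layout (1110xxxx 10xxxxxx 10xxxxxx) to whatever bytes it receives. `is_cjk` is also a hypothetical, simplified check that covers only a few common Unicode blocks.

```lua
-- Hypothetical illustration, not KOReader's code: decode three bytes as if
-- they were one 3-byte UTF-8 character, without validating them.
-- Expects a string of at least three bytes.
local function naive_decode3(s)
  local b1, b2, b3 = s:byte(1, 3)
  -- (b1 & 0x0F) << 12 | (b2 & 0x3F) << 6 | (b3 & 0x3F), via arithmetic.
  return (b1 % 16) * 4096 + (b2 % 64) * 64 + (b3 % 64)
end

-- Simplified, hypothetical CJK test covering a few common blocks only.
local function is_cjk(cp)
  return (cp >= 0x3040 and cp <= 0x30FF)     -- Hiragana, Katakana
      or (cp >= 0x3400 and cp <= 0x4DBF)     -- CJK Unified Ideographs Ext. A
      or (cp >= 0x4E00 and cp <= 0x9FFF)     -- CJK Unified Ideographs
      or (cp >= 0xAC00 and cp <= 0xD7AF)     -- Hangul Syllables
      or (cp >= 0x20000 and cp <= 0x2A6DF)   -- CJK Unified Ideographs Ext. B
end

-- First three bytes of the plain ASCII word "Hello".
local cp = naive_decode3("Hel")
print(string.format("U+%04X", cp), is_cjk(cp)) -- prints U+896C and true
```

The bytes `H`, `e`, `l` (0x48 0x65 0x6C) are reinterpreted as U+896C, which lies inside the CJK Unified Ideographs block. The space before or after such a word is therefore suppressed. Decoding only one whole character avoids this misdetection.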
Fixes:
canvascontext.lua
credocument.lua
djvudocument.lua
doccache.lua
document.lua
documentregistry.lua
koptinterface.lua
pdfdocument.lua
picdocument.lua
tilecacheitem.lua