33.2.0 bug with Cyrillic text and digits

_yuyu_
Lunatic
Posts: 262
Joined: 2015-03-02, 14:18

Unread post by _yuyu_ » 2024-06-24, 23:34

In Pale Moon 33.2.0 on Windows, when a series of digits occurs in Cyrillic text, the digits cannot be highlighted individually, only as a whole block. For example: абвгд 12345 иклмн
That was not the case with PM 33.1.1.

RealityRipple
Keeps coming back
Posts: 853
Joined: 2018-05-17, 02:34
Location: Los Berros Canyon, California

Unread post by RealityRipple » 2024-06-25, 00:44

Sounds like it's related to the changes in UXP PR#2514, since that dealt with preventing multi-character sequence "clusters" from being split up, but I don't know why standard Cyrillic characters would trigger it. There is a known problem with some emoji characters not being detected due to the underlying Unicode (ICU) version, but I don't think it would mess up so badly as to mark any of the Cyrillic set as part of the various Emoji and Pictographic lists.

Currently, I'm running into issues just trying to get a working build (some Visual Studio update baloney) but I'll try to do some tests and see if I can narrow this down.

Tested. Yes, this seems to be caused by the latest Emoji-related changes. I'm narrowing the scope of the problem down, comparing FF behavior in some areas, and determining the best course of action.

Firefox does suffer from the same bug, but only with the skin tone set, which is the way their IsClusterExtender() function works. Apparently, so does Chrome. And Notepad in Windows 11. I'm detecting a theme.
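
One possible reason plain digits get caught by emoji cluster logic: in Unicode's emoji data, the ASCII digits, "#" and "*" carry the Emoji property because they are the bases of keycap sequences, and so does U+2197 (↗), although none of them has Emoji_Presentation. The Python sketch below is a toy model only, not the UXP or Firefox code; the `segment` function and both hand-picked character sets are hypothetical. It shows that gluing adjacent Emoji-property characters together would turn 12345 into one selection unit, while a test based on Emoji_Presentation would not.

```python
# Toy model, NOT the UXP implementation: all names here are hypothetical.

# Hand-picked subset of code points that carry the Unicode Emoji property.
# Digits, '#' and '*' are included because they start keycap sequences;
# U+2197 is included although it defaults to text presentation.
EMOJI_PROPERTY = set("0123456789#*") | {"\u2197", "\U0001F600"}

# Hand-picked subset with Emoji_Presentation (drawn as emoji by default).
EMOJI_PRESENTATION = {"\U0001F600"}


def segment(text, is_emoji):
    """Split text into selection units, gluing runs of adjacent
    characters for which is_emoji() is true into a single unit."""
    clusters = []
    for ch in text:
        if clusters and is_emoji(ch) and is_emoji(clusters[-1][-1]):
            clusters[-1] += ch  # extend the current run
        else:
            clusters.append(ch)  # start a new unit
    return clusters


def naive(ch):
    # Overly broad test: anything with the Emoji property counts.
    return ch in EMOJI_PROPERTY


def strict(ch):
    # Narrower test: only characters shown as emoji by default count.
    return ch in EMOJI_PRESENTATION


sample = "абвгд 12345 иклмн"
naive_clusters = segment(sample, naive)
strict_clusters = segment(sample, strict)
print(naive_clusters)
print(strict_clusters)
```

With the naive test the five digits come out as one unit, which matches the reported symptom; with the strict test every character stays individually selectable.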

Moonchild
Pale Moon guru
Posts: 37494
Joined: 2011-08-28, 17:27
Location: Motala, SE

Unread post by Moonchild » 2024-06-25, 08:55

Off-topic:
RealityRipple wrote:
2024-06-25, 00:44
only with the skin tone set
If you ask me, this whole skin tone modifier stuff is stupid and unnecessary. Aside from creating racial profiling by people using certain emoji, which I'm sure was the opposite of what was intended when people pushed hard for being "inclusive" in their emoji sets, demanding human emoji get their skin colour instead of the generic, accepted, banana yellow for all, why would this be necessary to express anything?

RealityRipple
Keeps coming back
Posts: 853
Joined: 2018-05-17, 02:34
Location: Los Berros Canyon, California

Unread post by RealityRipple » 2024-06-25, 11:33

Moonchild wrote:
2024-06-25, 08:55
Off-topic:
RealityRipple wrote:
2024-06-25, 00:44
only with the skin tone set
If you ask me, this whole skin tone modifier stuff is stupid and unnecessary. Aside from creating racial profiling by people using certain emoji, which I'm sure was the opposite of what was intended when people pushed hard for being "inclusive" in their emoji sets, demanding human emoji get their skin colour instead of the generic, accepted, banana yellow for all, why would this be necessary to express anything?
I completely agree, but Unicode has decided to make everyone's life hell instead.

Also, I've opened a long-winded, overly verbose issue that offers either a quick solution or a better-than-other-browsers solution that needs a little research: https://repo.palemoon.org/MoonchildProductions/UXP/issues/2538

Moonchild
Pale Moon guru
Posts: 37494
Joined: 2011-08-28, 17:27
Location: Motala, SE

Unread post by Moonchild » 2024-06-25, 11:55

Thanks for making that extensive analysis and write-up. See my comment on it to answer your questions there, and keep implementation discussion on the repo.

SemiKebab
Moonbather
Posts: 58
Joined: 2021-05-30, 03:48

Unread post by SemiKebab » 2024-06-28, 12:07

Another problem: I am encountering issues in a textarea on Wikipedia where, in some cases, I cannot properly navigate or select the characters. It behaves a bit like RTL text (the caret "jumps" to the opposite side), and it is even impossible to make the precise text selection I want.

It seems to occur around sequences of digits, and when the « ↗ » character is present in the textarea (though whether the bug occurs depends on the surrounding spaces and newlines).

I strongly suspect this bug is related to the recent changes to the handling of emoji. I suppose you already have better test cases than the one I reported here, though.
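
For anyone reducing this to a test case, a small Python helper (the name `describe` is made up for this sketch) can list the code point, Unicode name, general category, and bidirectional class of each character, using only the standard `unicodedata` module. It shows that none of the characters involved is actually right-to-left, so the RTL-like caret jumps are unlikely to come from real bidi handling.

```python
import unicodedata


def describe(text):
    """Return (code point, name, general category, bidi class) per character."""
    rows = []
    for ch in text:
        rows.append((
            "U+%04X" % ord(ch),
            unicodedata.name(ch, "<unnamed>"),
            unicodedata.category(ch),
            unicodedata.bidirectional(ch),
        ))
    return rows


# A digit, the arrow from the report, and a Cyrillic letter.
rows = describe("1\u2197\u0430")
for row in rows:
    print(row)
```

The digit reports bidi class EN (European Number), the arrow ON (Other Neutral), and the Cyrillic letter L (Left-to-Right).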

SemiKebab
Moonbather
Posts: 58
Joined: 2021-05-30, 03:48

Unread post by SemiKebab » 2024-06-28, 12:20

RealityRipple, thank you for the great work you have done on this in issue #2538 and PR #2539!

As a side note, about this comment:
Moonchild wrote: (not that I think filenames with emoji in them is in any way sane, but that's something for a different discussion...)
Just to provide an example, creators on YouTube often put emoji in their video titles. And indeed, I dislike this too; it causes many issues (for example, with such filenames on Android phones).

Moonchild
Pale Moon guru
Posts: 37494
Joined: 2011-08-28, 17:27
Location: Motala, SE

Unread post by Moonchild » 2024-06-28, 12:41

The fact that people use them in filenames and that YouTube is too dumb to translate or sanitize video titles to file names doesn't make the practice any more sane.