Hyphenation rules

Memophenon
Hobby Astronomer
Posts: 15
Joined: 2020-12-25, 13:15

Hyphenation rules

Post by Memophenon » 2023-03-26, 20:12

By accident, I noticed that PM (32.1.0 on 64-bit Win10) breaks hyphenated (U+002D) words differently from Firefox and Chromium browsers. PM's rules seem to be: the left part must be at least 5 characters, the right part at least 6. It is impervious to my attempts to override this behavior with CSS properties in a webpage, such as:

Code:

  -moz-hyphenate-limit-chars: 5 2 2 !important;
  -webkit-hyphenate-limit-before: 2 !important;
  -webkit-hyphenate-limit-after: 2 !important;
  -webkit-hyphenate-limit-chars: 5 2 2 !important;
  hyphenate-limit-chars: 5 2 2 !important;
Soft hyphens (U+00AD) are always respected as a breakpoint, but U+2010 is treated like U+002D, that is, according to the same 12-5-6 rule (a word of at least 12 characters, with at least 5 before and 6 after the break).

Is it possible, as a user, to change the settings of PM in this respect? For clarity, I'm not talking about any knowledge of the vocabulary of the language at hand. It's just about dealing with hyphen-like Unicode characters.
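
For what it's worth, a page can at least check whether a browser recognizes the standard property before relying on it. A minimal sketch using a feature query; the .prose class name is made up for illustration, and a match only means the declaration parses, not that the browser applies it to breaks at an explicit U+002D or U+2010:

```css
/* Hypothetical selector; .prose is not from the original page. */
.prose {
  hyphens: auto;
}

/* Only browsers that parse the declaration apply this block. */
@supports (hyphenate-limit-chars: 5 2 2) {
  .prose {
    /* word length >= 5, >= 2 characters before the break, >= 2 after */
    hyphenate-limit-chars: 5 2 2;
  }
}
```

A browser that ignores the property simply skips the @supports block and falls back to its built-in limits.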

Moonchild
Pale Moon guru
Posts: 35484
Joined: 2011-08-28, 17:27
Location: Motala, SE

Re: Hyphenation rules

Post by Moonchild » 2023-03-26, 20:58

Hyphenation is not configurable in Pale Moon. We do not listen to CSS keywords trying to manipulate this.

Our algorithm is basically the same as the algorithm used in TeX (Knuth), IIRC.

We are neither Firefox nor Chromium; why do you expect our behaviour to be the same? And why does it matter to you exactly how words are hyphenated by different browsers?
"Sometimes, the best way to get what you want is to be a good person." -- Louis Rossmann
"Seek wisdom, not knowledge. Knowledge is of the past; wisdom is of the future." -- Native American proverb
"Linux makes everything difficult." -- Lyceus Anubite

Memophenon
Hobby Astronomer
Posts: 15
Joined: 2020-12-25, 13:15

Re: Hyphenation rules

Post by Memophenon » 2023-03-26, 22:08

Oh, it's not a matter that keeps me awake. For my part, even though Knuth's preferences are not necessarily mine, it's OK as it is, since word-breaking will never be perfect in all circumstances.

To answer your questions: (1) I expect PM to behave the same, because in most cases it does, on the face of it. (2) When designing webpages (on PM) to be seen by other people, I also pay attention to the details of how my contraptions are rendered by the 'usual' browsers (Ff, Chr, Saf). So I noticed the difference. Just that.

Thad E G
Moongazer
Posts: 8
Joined: 2022-10-23, 10:38

Re: Hyphenation rules

Post by Thad E G » 2023-03-26, 22:45

As a one-time phototypesetter who took pride in what I did, I do value the traditional rules and guidance, which developed out of aesthetics and concern for readability. I don't really remember much now, but I seem to remember not breaking proper nouns, not leaving or carrying over less than three letters, and never changing the sense of a word: e.g. don't hyphenate "the-rapist."

It all went down the pan when writers started doing their own thing in word processors and levels of subeditors were thrown on the heap. One learns to live with the result. It's not a hill that I'll die on, but it is nice to see people caring about it.

Moonchild
Pale Moon guru
Posts: 35484
Joined: 2011-08-28, 17:27
Location: Motala, SE

Re: Hyphenation rules

Post by Moonchild » 2023-03-26, 23:13

Memophenon wrote:
2023-03-26, 22:08
I expect PM to behave the same, because in most cases it does, on the face of it.
There are many ways in which Pale Moon does not behave the same. Chromium is not some kind of "gold standard" (far from it, actually; it does plenty of things objectively wrong or undesirable) and, unlike Google-funded Mozilla, we don't necessarily always "do what they do". If that were our ultimate goal, we might just as well fold today.
Memophenon wrote:
2023-03-26, 22:08
When designing webpages (on PM) to be seen by other people, I also pay attention to the details of how my contraptions are rendered by the 'usual' browsers
Well it's good to know you're actually designing on Pale Moon! It makes perfect sense to also verify against the usual suspects, of course.
Thad E G wrote:
2023-03-26, 22:45
writers started doing their own thing in word processors and levels of subeditors were thrown on the heap
Unfortunately, it seems that in the case of hyphenation in web browsers more of the same is going on, with newly introduced CSS keywords that would override the traditional rules.

Either way, I don't see any reason to deviate from what we currently have. It works, it works well, and it follows (very) consistent rules even for exceptions in other languages, as far as I can tell. I'm not sure why another steering wheel would be necessary in an already obscenely sprawling and complex set of standards.

Memophenon
Hobby Astronomer
Posts: 15
Joined: 2020-12-25, 13:15

Re: Hyphenation rules

Post by Memophenon » 2023-03-27, 14:28

Thad E G wrote:
2023-03-26, 22:45
not leaving or carrying over less than three letters
These settings look more reasonable than 5/6. However, my paradise may be someone else's hell. My use case is a bit special: bilingual webpages (Dutch & English). So I've set hyphens: manual; and soft-hyphenate long words like programmeertaalonderzoek (= programming language research) manually as programmeertaal[SHY]onderzoek or even programmeer[SHY]taal[SHY]onderzoek. I have become aware now that I could discriminate in favour of Dutch by making the hyphens property dependent on the language used in a text fragment, classifying each fragment as "dutch" (auto) or "english" (manual). I simply hadn't thought of such micromanagement so far. First objection: who guarantees all browsers will listen to hyphens: auto;?
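
A rough sketch of that per-language idea, using the :lang() selector instead of separate "dutch"/"english" classes. It assumes every fragment, or one of its ancestors, carries a lang attribute, which is my assumption and not how the pages are marked up today:

```css
/* Assumes markup such as <p lang="nl"> or <span lang="en">. */
:lang(nl) {
  -webkit-hyphens: auto;   /* prefixed form for older Safari */
  hyphens: auto;           /* let the browser's Dutch rules decide */
}

:lang(en) {
  -webkit-hyphens: manual;
  hyphens: manual;         /* break only at explicit or soft hyphens */
}
```

Whether a given browser actually ships a hyphenation dictionary for nl is a separate matter, so the objection above still stands.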

Now that the whole thing has caught my attention, I did some experiments with the word programmeertaal. It can be hyphenated in 3 places: pro|gram|meer|taal. The last one is perfectly fine, the others are correct but debatable from the readability point of view. I tried these four test cases with <html lang="nl"> in Pale Moon, Firefox, Chrome on Windows, and Safari on macOS (HYMIN = U+002D, HY = U+2010, SHY = U+00AD):
  1. programmeertaal
  2. programmeer[HYMIN]taal
  3. programmeer[HY]taal
  4. programmeer[SHY]taal
(Re 2 & 3: programmeertaal should never be written as programmeer-taal on one line; I only did this for the sake of experimentation.) Results with hyphens: auto;:
  1. PM, Ff, Chr: pro-/grammeertaal, program-/meertaal, programmeer-/taal; Saf: (no breaks)
  2. PM: pro-/grammeer-taal, program-/meer-taal; Ff, Saf: programmeer-/taal; Chr: pro-/grammeertaal, program-/meer-taal, programmeer-/taal
  3. PM: pro-/grammeer-taal, program-/meer-taal; Ff, Saf: programmeer-/taal; Chr: pro-/grammeertaal, program-/meer-taal, programmeer-/taal
  4. PM, Chr, Saf: programmeer-/taal; Ff: program-/meertaal, programmeer-/taal (and why not pro-/grammeertaal then?)
Results with hyphens: manual;:
  1. All: (no breaks)
  2. PM: (no breaks); Ff, Chr, Saf: programmeer-/taal
  3. PM: (no breaks); Ff, Chr, Saf: programmeer-/taal
  4. All: programmeer-/taal
Chromium and Safari are willing to split a T-shirt, BTW.
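
For anyone who wants to repeat this, a test page along these lines should do. This is only a sketch, not the exact stylesheet I used; the class names probe, auto and manual are made up, and the box is made much narrower than the word so that the browser has to take whatever break opportunities it allows:

```css
/* Hypothetical markup, one paragraph per test case:
   <p class="probe auto" lang="nl">programmeertaal</p>
   <p class="probe manual" lang="nl">programmeer&shy;taal</p> */
.probe {
  width: 1ch;              /* far narrower than the word */
  font-family: monospace;
  outline: 1px dotted;     /* makes the box edge visible */
}

.probe.auto {
  -webkit-hyphens: auto;
  hyphens: auto;
}

.probe.manual {
  -webkit-hyphens: manual;
  hyphens: manual;
}
```

Widening the box step by step then shows which single break each browser prefers at a given width.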

https://www.w3schools.com/cssref/css3_pr_hyphens.php describes hyphens: manual; as: "Default. Words are only hyphenated at &hyphen; or &shy; (if needed)". I have a lot of thoughts now after this tiny investigation, but I'll just say: "All browsers are different."
