PM doesn't switch encoding when opening utf-8 text file Topic is solved

Users and developers helping users with generic and technical Pale Moon issues on all operating systems.

Moderator: trava90

tijara
Moonbather
Posts: 70
Joined: 2019-12-01, 15:25

PM doesn't switch encoding when opening utf-8 text file

Post by tijara » 2021-06-22, 21:41

Perhaps this is my fault (as always).
I can reproduce it like this:
1) I make or grab a Unicode-encoded text file (usually something with Chinese characters)
2) I verify the encoding with

$ file -bi file.txt 
text/plain; charset=utf-8
3) Open it raw in PM via CTRL+O
Ideally, PM should recognize the encoding and switch from Western to utf-8.
But PM stays on Western and displays garbled text.
Switching the encoding via the View menu displays the file correctly.
Is this intended?
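
For illustration, the garbled text described above can be reproduced outside the browser. This is only a sketch: it assumes the "Western" setting corresponds to windows-1252, and the helper name `show_as_western` is made up for this example.

```python
def show_as_western(text: str) -> str:
    """Return what UTF-8 text looks like when misread as a Western code page.

    Assumption: "Western" is treated as windows-1252 here.
    """
    raw = text.encode("utf-8")  # the bytes as they are stored in the file
    # Bytes that windows-1252 leaves undefined become U+FFFD instead of raising.
    return raw.decode("windows-1252", errors="replace")


garbled = show_as_western("中文")
print(repr(garbled))  # six Western characters instead of two Chinese ones
```

Each Chinese character takes three bytes in UTF-8, and a single-byte code page shows every byte as its own character, which is why the result looks like noise.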

RealityRipple
Astronaut
Posts: 659
Joined: 2018-05-17, 02:34
Location: Los Berros Canyon, California

Re: PM doesn't switch encoding when opening utf-8 text file

Post by RealityRipple » 2021-06-22, 23:02

Use the UTF-8 BOM (Byte Order Mark) encoding or save option in your text editor (or insert 0xEF 0xBB 0xBF at the beginning of your file) to mark the file as UTF-8. Pale Moon will detect it.
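
If your editor has no such save option, the three bytes can be prepended with a few lines of Python. This is a minimal sketch, not an official tool; the function name `add_utf8_bom` is hypothetical, and it assumes the file is small enough to read into memory.

```python
import codecs
from pathlib import Path


def add_utf8_bom(path: str) -> bool:
    """Prepend the UTF-8 BOM (EF BB BF) to a file unless it already has one.

    Returns True if the file was changed, False if a BOM was already present.
    """
    p = Path(path)
    data = p.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False
    # Refuse to tag a file that is not valid UTF-8 (raises UnicodeDecodeError).
    data.decode("utf-8")
    p.write_bytes(codecs.BOM_UTF8 + data)
    return True
```

Calling `add_utf8_bom("file.txt")` rewrites the file in place, so work on a copy if the original matters. Some tools (shell scripts, certain parsers) dislike a leading BOM, so only add it to files meant for viewing.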

Moonchild
Pale Moon guru
Posts: 35593
Joined: 2011-08-28, 17:27
Location: Motala, SE

Re: PM doesn't switch encoding when opening utf-8 text file

Post by Moonchild » 2021-06-23, 02:08

Pale Moon is a web browser, not a file browser.
Pale Moon will normally rely on HTTP headers that indicate the encoding of a file. Obviously, with a local plain text file those headers don't exist, so if there is no BOM either (as RealityRipple pointed out), nothing declares the encoding.
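
To illustrate what such a header looks like, here is a rough sketch of a local server built on Python's standard `http.server` module that declares UTF-8 for text files. The class name `Utf8TextHandler`, the helper `make_server` and port 8000 are assumptions made for this example, not anything Pale Moon provides.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class Utf8TextHandler(SimpleHTTPRequestHandler):
    """Serve files as usual, but declare UTF-8 for text/* responses."""

    def guess_type(self, path):
        ctype = super().guess_type(path)
        # Assumption: every text file in the served directory is UTF-8.
        if ctype.startswith("text/") and "charset=" not in ctype:
            ctype += "; charset=utf-8"
        return ctype


def make_server(directory: str, port: int = 8000) -> ThreadingHTTPServer:
    """Build a server bound to localhost only; the caller starts it."""
    handler = partial(Utf8TextHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Running `make_server(".").serve_forever()` and opening `http://127.0.0.1:8000/file.txt` should then deliver the file with `Content-Type: text/plain; charset=utf-8`, which is the kind of declaration a browser expects from the transfer protocol.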

Without such meta information available a file viewer will have to guess.

Many text editors use some brute-force logic to detect that a file is UTF-8 (e.g. by scanning for extended Unicode characters in the first x bytes) if there is no start-of-file encoding signature. Pale Moon normally operates on the web, where this kind of metadata is attached in the HTTP headers (or, in the case of HTML, in a <meta> tag in the document head). So it doesn't try to brute-force detect the encoding, but rather uses the default encoding setting for its language (which will most likely be US-ASCII, i.e. the Western script code page).
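
The kind of brute-force check an editor might do can be sketched roughly as follows. This is not how any particular editor or Pale Moon works; the function name `guess_encoding`, the 4096-byte sample size and the windows-1252 fallback are all assumptions for illustration.

```python
import codecs


def guess_encoding(data: bytes, sample_size: int = 4096,
                   fallback: str = "windows-1252") -> str:
    """Guess an encoding: BOM first, then a strict UTF-8 check, else fallback."""
    # 1) A start-of-file signature settles it immediately.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # 2) Otherwise try to decode the first sample_size bytes as strict UTF-8.
    sample = data[:sample_size]
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # If the sample was cut short, tolerate a multi-byte character that
        # is split at the cut; if it is the whole input, it must be complete.
        decoder.decode(sample, final=len(data) <= sample_size)
    except UnicodeDecodeError:
        # 3) Not valid UTF-8: fall back to the default code page.
        return fallback
    return "utf-8"
```

A check like this is a guess, not proof: short Western text can happen to be valid UTF-8 too, which is one reason an explicit declaration is more reliable than sniffing.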

If you look at the console, it will even tell you so:
The character encoding of the plain text document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the file needs to be declared in the transfer protocol or file needs to use a byte order mark as an encoding signature.

tijara
Moonbather
Posts: 70
Joined: 2019-12-01, 15:25

Re: PM doesn't switch encoding when opening utf-8 text file

Post by tijara » 2021-06-23, 14:44

I always learn something new here :D
Thanks to both of you.
