PM doesn't switch encoding when opening utf-8 text file

Unread post by **tijara** » 2021-06-22, 21:41

Perhaps this is my fault (as always).
I can reproduce this as follows:
1) I create or grab a UTF-8 encoded text file (usually something with Chinese characters).
2) I verify the encoding with:

Code:

$ file -bi file.txt 
text/plain; charset=utf-8

3) I open it directly in PM via Ctrl+O.
Ideally, PM should recognize the encoding and switch from Western to UTF-8.
But PM stays on Western and displays garbled text.
Switching the encoding via the View menu displays the file correctly.
Is this intended?

Unread post by **RealityRipple** » 2021-06-22, 23:02

Use the UTF-8 BOM (Byte Order Mark) encoding or save option in your text editor (or insert 0xEF 0xBB 0xBF at the beginning of your file) to mark the file as UTF-8. Pale Moon will detect it.
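For files you cannot re-save from an editor, the byte insertion described above could be scripted. The following is only a minimal sketch: the function name `add_utf8_bom` is made up for illustration, and it assumes the file is small enough to read into memory and is already valid UTF-8.

```python
import codecs
from pathlib import Path


def add_utf8_bom(path):
    """Prepend the UTF-8 BOM (EF BB BF) unless the file already starts with it.

    Hypothetical helper for illustration. Returns True if the file was modified.
    """
    p = Path(path)
    data = p.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False  # already marked as UTF-8, leave the file alone
    # Refuse to mark a file as UTF-8 if it is not valid UTF-8:
    # this raises UnicodeDecodeError for other encodings.
    data.decode("utf-8")
    p.write_bytes(codecs.BOM_UTF8 + data)
    return True
```

Calling `add_utf8_bom("file.txt")` rewrites the file in place, so keep a copy if the original bytes matter. Note that some tools treat a leading BOM as content, which is one reason many editors do not write it by default.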

Unread post by **Moonchild** » 2021-06-23, 02:08

Pale Moon is a web browser, not a file browser.
Pale Moon will normally rely on HTTP headers that indicate the encoding of a file. Obviously a local plain text file has no such headers, so unless it starts with a BOM (as RealityRipple pointed out), nothing declares its encoding.

Without such meta information available a file viewer will have to guess.

Many text editors use some brute-force logic to detect that a file is UTF-8 when there is no start-of-file encoding signature (e.g. by scanning the first x bytes for multi-byte UTF-8 sequences). Pale Moon normally operates on the web, where content like this normally comes with metadata attached in headers (or, in the case of HTML, in a <meta> tag in the document head), so it doesn't try to brute-force detect the encoding. Instead it uses the default encoding setting for its language (most likely the Western code page, windows-1252, which is a superset of US-ASCII).

If you look at the console, it will even tell you so:

The character encoding of the plain text document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the file needs to be declared in the transfer protocol or file needs to use a byte order mark as an encoding signature.
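The kind of brute-force detection that text editors use (and that Pale Moon deliberately does not) might look roughly like the sketch below. It is an illustration only and not taken from any particular editor: the function name `guess_text_encoding`, the 4096-byte sample size, and the `windows-1252` fallback are all assumptions.

```python
import codecs


def guess_text_encoding(path, sample_size=4096, fallback="windows-1252"):
    """Guess an encoding from the first sample_size bytes of a file.

    Hypothetical heuristic: BOM first, then a strict UTF-8 validity check,
    otherwise the assumed fallback code page.
    """
    with open(path, "rb") as f:
        sample = f.read(sample_size)

    # An explicit signature wins over any guessing.
    if sample.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"

    # Use an incremental decoder with final=False so that a multi-byte
    # sequence cut off at the end of the sample is not treated as an error.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return fallback  # invalid as UTF-8, assume the legacy code page
    return "utf-8"
```

The check works because text in legacy single-byte encodings rarely happens to form valid UTF-8 multi-byte sequences, but it is still a guess: a pure ASCII sample is reported as UTF-8 even if non-ASCII bytes in another encoding appear later in the file.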

Unread post by **tijara** » 2021-06-23, 14:44

I always learn something new here.

Thanks to both of you.