Legacy Web - Document character encoding

Shadow
Moon lover
Posts: 80
Joined: 2023-03-16, 13:21

Post by Shadow » 2024-09-16, 11:08

Didn't know whether I should post this or not but figured why not. :wave:

Operating system: Win 7
Browser version: 33.3.1
32-bit or 64-bit browser?: 64
Problem URL: http://solitonfilm.web.fc2.com/festj.htm

PM/Bask: Garbled

"The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol."

FF: Correctly guesses the encoding and renders

"The character encoding of the document was not declared, so the encoding was guessed from content. The character encoding needs to be declared in the Content-Type HTTP header, using a meta tag, or using a byte order mark."

Moonchild
Pale Moon guru
Posts: 37636
Joined: 2011-08-28, 17:27
Location: Motala, SE

Post by Moonchild » 2024-09-16, 15:34

The character encoding of the page must be declared in the document or in the transfer protocol.
If the webmaster doesn't do that, then one can expect incorrect display.
If you know what character set is supposed to be used, then you can select it from Web Developer -> Character encoding to fix the display.
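Picking the right character set from the menu works because the page's bytes never change. Only their interpretation does. A minimal Python sketch using a sample Japanese string (映画祭, "film festival"; an assumed example, not text taken from the page in question):

```python
# Sample Japanese text (hypothetical, not taken from the page in question).
text = "映画祭"  # "film festival"
raw = text.encode("shift_jis")  # the bytes a Shift_JIS page would actually contain

# Interpreting those bytes as ISO-8859-1 (a Western default) produces garbled text.
garbled = raw.decode("iso-8859-1")

# Interpreting them as Shift_JIS, as when choosing it manually, recovers the original.
correct = raw.decode("shift_jis")

print(garbled)
print(correct)
```

The three kanji take two bytes each in Shift_JIS, so the Western reading turns them into six unrelated single-byte characters.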
"A dead end street is a place to turn around and go into a new direction" - Anonymous
"Seek wisdom, not knowledge. Knowledge is of the past; wisdom is of the future." -- Native American proverb
"Linux makes everything difficult." -- Lyceus Anubite

satrow
Forum staff
Posts: 1925
Joined: 2011-09-08, 11:27

Post by satrow » 2024-09-16, 16:08

It looks like it's one of the Japanese character sets; the page name ends in 'j', but there's no 'e' (English) equivalent. Compare http://solitonfilm.web.fc2.com/anhomej.htm to http://solitonfilm.web.fc2.com/anhomee.htm

vannilla
Moon Magic practitioner
Posts: 2401
Joined: 2018-05-05, 13:29

Post by vannilla » 2024-09-16, 16:30

View -> Character Encoding -> Japanese (Shift_JIS).
satrow is correct; the specific encoding for this website is Shift JIS.
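One rough way to narrow down which Japanese encoding a page uses is to see which candidates decode its bytes without errors. The following Python sketch is a crude heuristic, not how the browser's auto-detector works; the candidate list and the function name `decodable_as` are assumptions for illustration:

```python
# Japanese encodings commonly seen on older pages (an assumed, non-exhaustive list).
CANDIDATES = ["shift_jis", "euc_jp", "iso2022_jp"]


def decodable_as(raw: bytes) -> list:
    """Return the candidate encodings under which raw decodes without errors.

    Crude heuristic: surviving a strict decode does not prove the encoding
    is right, and pure-ASCII input decodes under all of the candidates.
    """
    matches = []
    for name in CANDIDATES:
        try:
            raw.decode(name)  # strict mode raises on invalid byte sequences
        except UnicodeDecodeError:
            continue
        matches.append(name)
    return matches


sample = "映画祭".encode("shift_jis")  # hypothetical sample bytes
print(decodable_as(sample))
```

A human check of the result, as done in this thread, is still needed to confirm the pick.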

Moonchild
Pale Moon guru
Posts: 37636
Joined: 2011-08-28, 17:27
Location: Motala, SE

Post by Moonchild » 2024-09-16, 17:12

So the webmaster could fix this with a META tag (the easiest way), which was most definitely a thing back in 2001 when that page was created:

<html lang="ja">
<head>
<meta charset="Shift_JIS">
<!-- Pre-HTML5 equivalent: <meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS"> -->
</head>
<!-- ... rest of page -->
</html>
, or a web server response header indicating the charset:

Content-Type: text/html; charset=Shift_JIS
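To make the two declaration methods above concrete, here is a minimal Python sketch (not Pale Moon's actual code) that looks for a declaration roughly in the order the HTML specification uses: a byte order mark, then the Content-Type header, then a meta tag within the first 1024 bytes. The function name `declared_encoding` is hypothetical, and the regular expressions are simplified compared to the real prescan algorithm:

```python
import codecs
import re
from typing import Optional, Tuple

# Byte order marks that unambiguously identify a Unicode encoding.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Simplified: matches both <meta charset="..."> and
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET = re.compile(rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE)
HEADER_CHARSET = re.compile(r"charset\s*=\s*[\"']?([^\s;\"']+)", re.IGNORECASE)


def declared_encoding(body: bytes, content_type: Optional[str] = None) -> Tuple[Optional[str], Optional[str]]:
    """Return (encoding, where_it_was_found), or (None, None) if undeclared."""
    for bom, name in BOMS:
        if body.startswith(bom):
            return name, "byte order mark"
    if content_type:
        match = HEADER_CHARSET.search(content_type)
        if match:
            return match.group(1), "HTTP header"
    # Only the start of the document is scanned for a meta declaration.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii"), "meta tag"
    return None, None
```

For the page in this thread none of the checks succeed, which is exactly the situation both browsers warn about.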
Of note, this MUST be done because HTTP/1.1 (the protocol used) explicitly sets the default charset to ISO-8859-1 (Western, Roman).
It has been good practice in browsers, though, to default to the user's preferred encoding set in the browser options, slightly departing from that hard default. But it still remains that there can only be one default.
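The fallback logic described here can be sketched in a few lines of Python: use the declared charset if there is one, otherwise the user's preferred encoding, otherwise the HTTP/1.1 ISO-8859-1 default mentioned above. The function name `effective_charset` and its `preferred` parameter are hypothetical; `email.message.Message` is used only because it already parses the `charset` parameter of a Content-Type value:

```python
from email.message import Message
from typing import Optional


def effective_charset(content_type: Optional[str], preferred: Optional[str] = None) -> str:
    """Pick the charset to decode a text/* response with.

    content_type: the raw Content-Type header value, or None if absent.
    preferred: the user's configured fallback encoding (hypothetical setting).
    """
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        declared = msg.get_content_charset()  # lower-cased charset parameter, or None
        if declared:
            return declared
    # Nothing declared: browser practice is the user's preference,
    # otherwise the HTTP/1.1 default of ISO-8859-1.
    return preferred or "iso-8859-1"


print(effective_charset("text/html; charset=Shift_JIS"))
print(effective_charset("text/html"))
```

Either way only one fallback applies, so an unlabeled page in any other encoding will be misread unless the user steps in.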

From the W3C page about this:
Documents transmitted with HTTP that are of type text, such as text/html, text/plain, etc., can send a charset parameter in the HTTP header to specify the character encoding of the document.

It is very important to always label Web documents explicitly. HTTP 1.1 says that the default charset is ISO-8859-1. But there are too many unlabeled documents in other encodings, so browsers use the reader's preferred encoding when there is no explicit charset parameter.
Alternatively, the webmaster can just convert everything to UTF-8 (there are handy scripts available for this everywhere) and avoid this problem altogether. It's still recommended to add a charset in the page/headers to be explicit, but UTF-8 is very broadly accepted and supports all languages.
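As a rough idea of what such a conversion script does, here is a minimal Python sketch (not one of the scripts referred to above). It assumes the source encoding is known to be Shift_JIS, re-encodes the page as UTF-8, and rewrites or adds the meta declaration. The function name `convert_page_to_utf8` is hypothetical, and regex-based HTML editing is fragile, so treat this as illustration only:

```python
import re


def convert_page_to_utf8(raw: bytes, source_encoding: str = "shift_jis") -> bytes:
    """Re-encode a legacy page as UTF-8 and label it explicitly."""
    text = raw.decode(source_encoding)  # strict: fails loudly if the source guess is wrong

    # Rewrite an existing <meta charset=...> or http-equiv charset value, if any.
    text, count = re.subn(
        r"(<meta[^>]+charset\s*=\s*[\"']?)[A-Za-z0-9_.:-]+",
        r"\1UTF-8",
        text,
        count=1,
        flags=re.IGNORECASE,
    )
    if count == 0:
        # No declaration found: add one right after <head> (assumes the page has one).
        text = re.sub(r"(<head\b[^>]*>)", r'\1<meta charset="UTF-8">', text, count=1, flags=re.IGNORECASE)

    return text.encode("utf-8")
```

If the server also sends a Content-Type header with the old charset, that header would need updating too, since it takes precedence over the meta tag.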
Shadow
Moon lover
Posts: 80
Joined: 2023-03-16, 13:21

Post by Shadow » 2024-09-17, 10:27

Found there's an option similar to the FF one available.

Is it considered safe to leave "Text Encoding/Character Encoding (Name differs in PM/Bask) > Auto-Detect > Japanese" on?

moonbat
Knows the dark side
Posts: 5583
Joined: 2015-12-09, 15:45

Post by moonbat » 2024-09-18, 00:39

There's nothing unsafe about it. At most, you will get garbled text on some other site if it wrongly auto-detects Japanese, but if it works on this one, give it a try.
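To illustrate why the worst case is only garbled text: a wrong guess misreads the page's bytes but does not modify them, so switching the encoding back restores the text. A minimal Python sketch with hypothetical sample text from an unlabeled Western page:

```python
# Hypothetical text from an unlabeled Western (ISO-8859-1) page.
western = "Café crème"
raw = western.encode("iso-8859-1")

# A wrong guess of Shift_JIS misreads the accented-letter bytes (0xE9, 0xE8),
# which are lead bytes of double-byte characters in Shift_JIS.
misread = raw.decode("shift_jis", errors="replace")
print(misread)

# The underlying bytes are unchanged, so decoding them correctly restores the text.
restored = raw.decode("iso-8859-1")
print(restored)
```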
"One hosts to look them up, one DNS to find them and in the darkness BIND them."

KDE Neon on a Slimbook Excalibur (Ryzen 7 8845HS, 64 GB RAM)
AutoPageColor|PermissionsPlus|PMPlayer|Pure URL|RecordRewind|TextFX