Japanese fonts appearing as garbage on WayBack Machine... Topic is solved

Sessh
Fanatic
Posts: 140
Joined: 2018-01-11, 18:43

Japanese fonts appearing as garbage on WayBack Machine...

Post by Sessh » 2023-02-05, 19:06

I am using the most recent PM version (I always update as soon as it comes in) on Windows 7.

(The below site doesn't always like to load the first or second time for me, but will eventually)

https://web.archive.org/web/20050106180 ... Index.HTML

This site is for a very old video game and displays the untranslated Japanese text from the game as you go through the pages. However, all the Japanese text appears as garbage, like this:
[Attachment: Untitled2.jpg]
... whereas in Firefox Nightly and a Chromium-based browser, it appears normally:
[Attachment: Untitled1.jpg]
That's just a cropped section of the title page; you can click through it a bit and see a bunch of garbage text instead of the proper Japanese characters. I saw another font-related thread here, but didn't know if it was related, so I just made a new thread.

It's also interesting that I do Japanese lessons in Pale Moon and Japanese text shows up flawlessly everywhere else. I can type Japanese text without issue, too. This is the first site I've seen with this problem.

Is it just me, or is there something I need to tweak? This also happens offline: I downloaded this particular site for offline viewing, and loading it from the HDD has the same issue. Even the page title on the tab is a garbled mess. Is this just a font issue that can't be fixed at this time?

Thanks.

Mæstro
Lunatic
Posts: 459
Joined: 2019-08-13, 00:30
Location: Casumia

Re: Japanese fonts appearing as garbage on WayBack Machine...

Post by Mæstro » 2023-02-05, 20:10

I can confirm that the same garbled text appears when running Pale Moon 32 on Linux. What follows is my speculation, likely quite wrong, meant to help someone more knowledgeable diagnose the problem.

Many older Japanese sites use Shift JIS or other Japanese encodings which predate Unicode and have since become obsolete. I have never seen an old Japanese site that is still online render like this, but I suspect that the Archive applies UTF-8 encoding to all sites, including those on the Wayback Machine. Something similar has happened, in my experience, on some old, live (not archived) Russian sites with English text, where non-ASCII characters (curved quotation marks and diacritics especially) would render incorrectly; I suspect this has the same cause.
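
To illustrate why the choice of encoding matters here, the following minimal Python sketch shows what would happen if UTF-8 were forced onto Shift JIS bytes. The sample string is an assumption chosen for illustration, not text taken from the archived page, and this only demonstrates the general mismatch, not what the Archive actually does.

```python
# Illustration only: the sample string is an assumption, not text from the page.
text = "日本語"

# Bytes as an old Japanese page might store them on disk.
sjis_bytes = text.encode("shift_jis")

# Decoding with the matching encoding round-trips cleanly.
correct = sjis_bytes.decode("shift_jis")

# The same bytes are not valid UTF-8, so a lenient decoder substitutes
# U+FFFD replacement characters for the parts it cannot interpret.
forced_utf8 = sjis_bytes.decode("utf-8", errors="replace")

print(correct)
print(forced_utf8)
```

A strict UTF-8 decode of the same bytes would raise `UnicodeDecodeError` instead of producing replacement characters.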
Browser: Pale Moon (Pusser’s repository for Debian)
Operating System: Linux Mint Debian Edition 4 (amd64)
※Receiving Debian 10 LTS security upgrades
Hardware: HP Pavilion DV6-7010 (1400 MHz, 6 GB)
Formerly user TheRealMaestro: æsc is the best letter.

Moonchild
Pale Moon guru
Posts: 35402
Joined: 2011-08-28, 17:27
Location: Motala, SE

Re: Japanese fonts appearing as garbage on WayBack Machine...

Post by Moonchild » 2023-02-05, 20:38

The console can be your friend sometimes:

21:23:58.841 The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol. 1 SF3ConversationIndex.HTML
No character set declaration means the browser has to guess what to use. The default fallback encoding is whatever is the default for the browser's locale. Since the browser's locale tends to be en-US and the page hasn't specified a lang attribute either, it will fall back to the en-US default (Western, windows-1252).
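
The effect of that fallback can be reproduced outside the browser. The sketch below is a simplification, not the browser's actual algorithm: `decode_page` is a hypothetical helper, the sample string is made up, and it assumes the en-US fallback is windows-1252 (Python's `cp1252` codec).

```python
from typing import Optional


def decode_page(raw: bytes, declared: Optional[str] = None,
                fallback: str = "cp1252") -> str:
    """Decode page bytes with the declared charset, else a locale fallback.

    Hypothetical helper for illustration; a real browser has more steps
    (HTTP headers, BOM sniffing, meta prescan, user overrides).
    """
    encoding = declared or fallback
    # errors="replace" mimics a browser that renders something regardless.
    return raw.decode(encoding, errors="replace")


# Assumed sample: Japanese text saved as Shift JIS with no declaration.
raw = "日本語".encode("shift_jis")

garbled = decode_page(raw)                      # undeclared: Western fallback
fixed = decode_page(raw, declared="shift_jis")  # what a manual override does

print(garbled)
print(fixed)
```

Picking the encoding by hand in the browser corresponds to passing `declared="shift_jis"` here: the bytes never change, only their interpretation does.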

You can change the character encoding by going into Web Developer and, once the page has loaded, selecting the appropriate entry under Character Encoding (in this case Japanese Shift-JIS, going by the headers archive.org added?).

The way to fix this would be for the site to declare a character encoding.
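
As a rough way to check whether a saved page declares anything at all, here is a small Python sketch. `declared_charset` is a hypothetical helper, and the regular expressions are a loose approximation of what a browser looks for (a `charset` parameter in the Content-Type header, or a `<meta>` declaration near the top of the document), not a spec-complete parser.

```python
import re
from typing import Optional

# Loosely matches <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
_META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)
_HEADER_CHARSET = re.compile(r'charset\s*=\s*"?([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def declared_charset(html: bytes, content_type: Optional[str] = None) -> Optional[str]:
    """Return the declared charset label in lower case, or None if absent."""
    # A charset in the transfer protocol (HTTP header) is checked first.
    if content_type:
        match = _HEADER_CHARSET.search(content_type)
        if match:
            return match.group(1).lower()
    # Otherwise look for a <meta> declaration near the start of the document.
    match = _META_CHARSET.search(html[:1024])
    return match.group(1).decode("ascii").lower() if match else None


print(declared_charset(b"<html><head><title>x</title></head></html>"))
print(declared_charset(b'<html><head><meta charset="Shift_JIS"></head></html>'))
```

A page like the one in this thread would fall into the first case: no declaration found, so the browser is left to guess.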

What mainstream browsers have done is add a character encoding detector based on large-volume training data (guess who pioneered that...?), basically scanning content for non-ASCII bytes and then guessing which language it belongs to.
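
For a feel of what "guessing" means, here is a deliberately naive Python sketch. It is not how the browser detectors work (those are statistical and trained on large amounts of data); it only tries a short, assumed list of candidate encodings in order and keeps the first one that decodes without errors. `guess_encoding` and its candidate list are hypothetical.

```python
from typing import Sequence


def guess_encoding(raw: bytes,
                   candidates: Sequence[str] = ("utf-8", "shift_jis"),
                   fallback: str = "cp1252") -> str:
    """Return the first candidate that decodes raw strictly, else fallback.

    Naive trial decoding for illustration only: it cannot tell apart
    encodings that both accept the same bytes.
    """
    for name in candidates:
        try:
            raw.decode(name)  # strict decode: raises on invalid byte sequences
        except UnicodeDecodeError:
            continue
        return name
    return fallback


print(guess_encoding("日本語".encode("shift_jis")))
print(guess_encoding("日本語".encode("utf-8")))
```

The order of candidates matters: UTF-8 goes first because its byte patterns are strict enough that text in other encodings rarely passes as valid UTF-8 by accident.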
"Sometimes, the best way to get what you want is to be a good person." -- Louis Rossmann
"Seek wisdom, not knowledge. Knowledge is of the past; wisdom is of the future." -- Native American proverb
"Linux makes everything difficult." -- Lyceus Anubite

Sessh

Re: Japanese fonts appearing as garbage on WayBack Machine...

Post by Sessh » 2023-02-05, 20:53

Haha, thanks, the one time I don't look in the console is when I really should have. I did look in the Prefs / Content section, but I didn't look in Advanced and probably wouldn't have guessed to change that setting.

Thanks! Problem appears solved.

Moonchild

Re: Japanese fonts appearing as garbage on WayBack Machine...

Post by Moonchild » 2023-02-05, 21:14

I just updated my post. It's probably easier to just adjust it on the fly from Web Developer if you regularly use different language encodings, instead of setting the default in prefs all the time.

Sessh

Re: Japanese fonts appearing as garbage on WayBack Machine...

Post by Sessh » 2023-02-05, 21:25

That does work as well, but I have to reset it every time I reload the page or navigate through the site. If there isn't a way to make that setting stick for this tab, it's probably better to set it in the Content tab. At least I'll know what the issue is if a site that isn't supposed to suddenly renders in all Japanese, in which case the Web Developer > Character Encoding method would be the corrective measure. I suppose it depends on how often that happens, so I guess we'll see.

Moonchild

Re: Japanese fonts appearing as garbage on WayBack Machine...

Post by Moonchild » 2023-02-05, 21:30

Yup, it depends entirely on how often you end up on legacy content sites of differing language encodings. I'm guessing you'd be less likely to visit legacy Cyrillic or Tamil sites without character encoding if you frequent Japanese ones ;)

jobbautista9
Keeps coming back
Posts: 780
Joined: 2020-11-03, 06:47
Location: Philippines

Re: Japanese fonts appearing as garbage on WayBack Machine...

Post by jobbautista9 » 2023-02-06, 02:25

Btw if you have the menu bar enabled, the Character Encoding submenu is in the View menu.

merry mimas

XUL add-ons developer. You can find a list of add-ons I manage at http://rw.rs/~job/software.html.

Mima avatar by 絵虎. Pixiv post: https://www.pixiv.net/en/artworks/15431817
