I created a Python 3 port of UXP out of boredom...

Discussions about the development and maturation of the platform code (UXP).
Warning: may contain highly-technical topics.

Moderators: trava90, athenian200

athenian200
Contributing developer
Posts: 1682
Joined: 2018-10-28, 19:56
Location: Georgia

Re: I created a Python 3 port of UXP out of boredom...

Post by athenian200 » 2026-04-02, 22:43

__Sandra__ wrote:
2026-04-02, 19:23
This patch solved the problem of stopping the build, but the "powershell.exe not find" message still appears.
So we at least got past the immediate str/bytes issues, that's good. It was kind of a rushed patch because I was trying to get ready for company arriving at noon and didn't have much time... glad it was really just what it looked like.

Well, I definitely didn't add that powershell check, so that's going to be hard to track down... but if it's not breaking anything, then perhaps we can leave it alone.
There is also a problem with the display in the console. There used to be Russian letters here.

[Attachment: codepage.png]
That's definitely worth investigating, sounds like a Unicode/locale issue... which obviously wouldn't come up in my environment because I'm a native English speaker and thus everything was either UTF-8 or CP-1252 during the entire testing run. But yeah, these are the kind of edge cases I was a little paranoid about already, so nothing here is particularly shocking... the issues seem to be coming up mostly on things like Windows 7, 32-bit builds, and non-English locales.
"The Athenians, however, represent the unity of these opposites; in them, mind or spirit has emerged from the Theban subjectivity without losing itself in the Spartan objectivity of ethical life. With the Athenians, the rights of the State and of the individual found as perfect a union as was possible at all at the level of the Greek spirit." -- Hegel's philosophy of Mind

jobbautista9
Board Warrior
Posts: 1184
Joined: 2020-11-03, 06:47
Location: Philippines

Re: I created a Python 3 port of UXP out of boredom...

Post by jobbautista9 » 2026-04-03, 03:17

athenian200 wrote:
2026-04-02, 16:08
I guess I should have tested more Windows versions... sorry guys. Anyway, I didn't have my schedule clear today to work on this past noon, so I might not be able to more thoroughly test Windows 7 and 8 until later tonight. I kind of got blindsided by this and was hoping that a major issue like this would be caught in testing.
To be fair to you, we don't really support Windows versions < 10 as a build environment. We never supported VS2019 either (which seems to be the most recent Visual Studio that Windows 7 supports). http://developer.palemoon.org/build/windows/

So don't be harsh on yourself! :thumbup:

Tired of creating stuff!

Avatar artwork by Shinki669: https://www.pixiv.net/artworks/113645617

XUL add-ons developer. You can find a list of add-ons I manage at http://rw.rs/~job/software.html.

athenian200
Contributing developer
Posts: 1682
Joined: 2018-10-28, 19:56
Location: Georgia

Re: I created a Python 3 port of UXP out of boredom...

Post by athenian200 » 2026-04-03, 05:19

jobbautista9 wrote:
2026-04-03, 03:17
To be fair to you, we don't really support Windows versions < 10 as a build environment. We never supported VS2019 either (which seems to be the most recent Visual Studio that Windows 7 supports). http://developer.palemoon.org/build/windows/

So don't be harsh on yourself! :thumbup:
I appreciate the encouragement. :)

Thankfully, it turned out this was actually a 32-bit Windows build issue I overlooked, and a one-line fix that had nothing to do with Windows 7 at all. Which meant I didn't have to set up a Windows 7 build environment to reproduce it.

So that means the only issue remaining to be solved is actually why he can't see Cyrillic characters in his compiler output anymore, and I already have a working theory on that, because I suspect that's also not limited to Windows 7.

Essentially, the problem is this. MSYS1 itself appears to be a pure ASCII environment (at least that's what it is looking like so far). Just tested in all kinds of ways, couldn't get any non-ASCII characters to print normally inside MSYS1 no matter what I set LANG, LC_ALL, or any of that stuff to. A lot of the issues with that were papered over because I'm a native English speaker (and you can guess what that means for the triple ASCII/CP1252/UTF-8 safe zone/blindspot) and thus only noticed the UnicodeDecodeErrors themselves, and basically just fixed/bypassed them well enough to get Python 3 to "shut up," but didn't realize what I had to do to make it actually display correctly for speakers of other languages.

And my research so far has come up with this theory. That basically, Python 2 was able to (easily, as a quirky result of using ASCII bytestrings as its default string mode) bypass MSYS1 completely and somehow send raw bytes to the Windows ConHost that MSYS1 is running in. Because if MSYS1 "sees them," it won't know what to do with anything that's not either ASCII or a Latin-1 like encoding that can be degraded gracefully to ASCII (much like the default behavior of... either XTerm or Konsole, don't remember which... on an old Linux distro from the early 2000s). On Python 3... this is still possible, but not very easy because you have to have actual decoded text to perform string operations at all, and so you have to make sure most internal consumers within the build system get strings as text, but that whatever you have is bytes by the time it hits ConHost so MSYS1 doesn't intercept it and decode to ASCII. I think I have located the main place where the text from something like compiler output is being captured and printed to the console as text (which we can't have on MSYS1), and possibly fixed it... though it's kinda hard to tell in an English CP1252 environment whether my fix did anything.
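A minimal sketch of the "keep it as bytes until it hits the console" idea from the paragraph above. `write_raw` is a hypothetical helper name for illustration, not something taken from the UXP build system; the only standard piece it relies on is the `.buffer` attribute that Python 3 text streams expose for their binary layer.

```python
import sys


def write_raw(data, stream=None):
    """Write bytes to a stream without decoding them first.

    Hypothetical helper for illustration; not part of the build system.
    """
    stream = sys.stdout if stream is None else stream
    # Text streams expose their binary layer as .buffer; plain binary
    # streams (for example io.BytesIO) lack it and are used directly.
    binary = getattr(stream, "buffer", stream)
    binary.write(data)
    binary.flush()
```

Called as `write_raw(captured_bytes)`, this hands the tool's output to the console untouched, so whether it renders as Cyrillic depends on the console code page rather than on any decoding step in Python.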

__Sandra__
Apollo supporter
Posts: 40
Joined: 2022-05-16, 08:00
Location: Chernihiv, Ukraine

Re: I created a Python 3 port of UXP out of boredom...

Post by __Sandra__ » 2026-04-03, 06:02

athenian200 wrote:
2026-04-02, 22:43
That's definitely worth investigating, sounds like a Unicode/locale issue... which obviously wouldn't come up in my environment because I'm a native English speaker and thus everything was either UTF-8 or CP-1252 during the entire testing run. But yeah, these are the kind of edge cases I was a little paranoid about already, so nothing here is particularly shocking... the issues seem to be coming up mostly on things like Windows 7, 32-bit builds, and non-English locales.
Unfortunately the compilation failed.
[1775163143.0783317, "build_output", {"line": "\u040f\u0430\u0401\u00ac\u0490\u0437\u00a0\u00ad\u0401\u0490: \u045e\u0404\u00ab\u043e\u0437\u0490\u00ad\u0401\u0490 \u0434\u00a0\u00a9\u00ab\u00a0: c:\\pm_src\\platform\\media\\ffvpx\\libavutil\\mem_internal.h"}]
[1775163143.0783317, "build_output", {"line": "\u040f\u0430\u0401\u00ac\u0490\u0437\u00a0\u00ad\u0401\u0490: \u045e\u0404\u00ab\u043e\u0437\u0490\u00ad\u0401\u0490 \u0434\u00a0\u00a9\u00ab\u00a0: c:\\pm_src\\platform\\media\\ffvpx\\libavutil\\tx_template.c"}]
[1775163143.0783317, "build_output", {"line": "mozavutil.dll"}]
[1775163143.5307324, "build_output", {"line": "Traceback (most recent call last):"}]
[1775163143.5307324, "build_output", {"line": " File \"c:/pm_src/platform/config/expandlibs_exec.py\", line 355, in <module>"}]
[1775163143.5307324, "build_output", {"line": " exit(main(sys.argv[1:]))"}]
[1775163143.5307324, "build_output", {"line": " File \"c:/pm_src/platform/config/expandlibs_exec.py\", line 348, in main"}]
[1775163143.5307324, "build_output", {"line": " sys.stderr.write(stdout.decode(encoding='utf-8'))"}]
[1775163143.5307324, "build_output", {"line": "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x91 in position 3: invalid start byte"}]
[1775163143.5619326, "build_output", {"line": "mozmake.EXE[5]: *** [c:/pm_src/platform/config/rules.mk;765: mozavutil.dll] Error 1"}]
[1775163143.5619326, "build_output", {"line": "mozmake.EXE[5]: *** Deleting file 'mozavutil.dll'"}]
[1775163143.5619326, "build_output", {"line": "mozmake.EXE[4]: *** [c:/pm_src/platform/config/recurse.mk;71: media/ffvpx/libavutil/target] Error 2"}]
[1775163143.5619326, "build_output", {"line": "mozmake.EXE[3]: *** [c:/pm_src/platform/config/recurse.mk;33: compile] Error 2"}]
[1775163143.5619326, "build_output", {"line": "mozmake.EXE[2]: *** [c:/pm_src/platform/config/rules.mk;493: default] Error 2"}]
[1775163143.5775325, "build_output", {"line": "mozmake.EXE[1]: *** [c:/pm_src/client.mk;406: realbuild] Error 2"}]
[1775163143.5775325, "build_output", {"line": "mozmake.EXE: *** [client.mk;164: build] Error 2"}]
[1775163143.6087327, "warning_summary", {"count": 142}]

Moonchild
Project founder
Posts: 39121
Joined: 2011-08-28, 17:27
Location: Sweden

Re: I created a Python 3 port of UXP out of boredom...

Post by Moonchild » 2026-04-03, 07:13

I'm having no build issues myself with the python 3 as it is now, but I did notice that the command window title changes to powershell while building. I'm a bit confused why powershell would even be involved in the build process at all when using Python 3. No idea where to start looking for this though.
"There is no point in arguing with an idiot, because then you're both idiots." - Anonymous
"Seek wisdom, not knowledge. Knowledge is of the past; wisdom is of the future." -- Native American proverb
"Linux makes everything difficult." -- Lyceus Anubite

__Sandra__
Apollo supporter
Posts: 40
Joined: 2022-05-16, 08:00
Location: Chernihiv, Ukraine

Re: I created a Python 3 port of UXP out of boredom...

Post by __Sandra__ » 2026-04-03, 08:30

I tried to make a build with the latest version of the patch "Follow-up: Fix 32-bit Windows builds" but it also ended with errors.

[1775203281.165765, "build_output", {"line": "\u040f\u0430\u0401\u00ac\u0490\u0437\u00a0\u00ad\u0401\u0490: \u045e\u0404\u00ab\u043e\u0437\u0490\u00ad\u0401\u0490 \u0434\u00a0\u00a9\u00ab\u00a0: c:\\pm_src\\platform\\media\\ffvpx\\libavutil\\tx_template.c"}]
[1775203281.165765, "build_output", {"line": "mozavutil.dll"}]
[1775203281.649366, "build_output", {"line": "Traceback (most recent call last):"}]
[1775203281.649366, "build_output", {"line": " File \"c:/pm_src/platform/config/expandlibs_exec.py\", line 355, in <module>"}]
[1775203281.649366, "build_output", {"line": " exit(main(sys.argv[1:]))"}]
[1775203281.649366, "build_output", {"line": " File \"c:/pm_src/platform/config/expandlibs_exec.py\", line 348, in main"}]
[1775203281.649366, "build_output", {"line": " sys.stderr.write(stdout.decode(encoding='utf-8'))"}]
[1775203281.649366, "build_output", {"line": "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x91 in position 3: invalid start byte"}]
[1775203281.680566, "build_output", {"line": "mozmake.EXE[5]: *** [c:/pm_src/platform/config/rules.mk;765: mozavutil.dll] Error 1"}]
[1775203281.680566, "build_output", {"line": "mozmake.EXE[5]: *** Deleting file 'mozavutil.dll'"}]
[1775203281.680566, "build_output", {"line": "mozmake.EXE[4]: *** [c:/pm_src/platform/config/recurse.mk;71: media/ffvpx/libavutil/target] Error 2"}]
[1775203283.92697, "build_output", {"line": "Traceback (most recent call last):"}]
[1775203283.94257, "build_output", {"line": " File \"c:/pm_src/platform/config/expandlibs_exec.py\", line 355, in <module>"}]
[1775203283.94257, "build_output", {"line": " exit(main(sys.argv[1:]))"}]
[1775203283.94257, "build_output", {"line": " File \"c:/pm_src/platform/config/expandlibs_exec.py\", line 348, in main"}]
[1775203283.94257, "build_output", {"line": " sys.stderr.write(stdout.decode(encoding='utf-8'))"}]
[1775203283.94257, "build_output", {"line": "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x91 in position 3: invalid start byte"}]
[1775203283.9581702, "build_output", {"line": "mozmake.EXE[5]: *** [c:/pm_src/platform/config/rules.mk;765: icu78.dll] Error 1"}]
[1775203283.9581702, "build_output", {"line": "mozmake.EXE[5]: *** Deleting file 'icu78.dll'"}]
[1775203283.9581702, "build_output", {"line": "mozmake.EXE[4]: *** [c:/pm_src/platform/config/recurse.mk;71: config/external/icu/target] Error 2"}]
[1775203283.9581702, "build_output", {"line": "mozmake.EXE[3]: *** [c:/pm_src/platform/config/recurse.mk;33: compile] Error 2"}]
[1775203283.9581702, "build_output", {"line": "mozmake.EXE[2]: *** [c:/pm_src/platform/config/rules.mk;493: default] Error 2"}]
[1775203283.9581702, "build_output", {"line": "mozmake.EXE[1]: *** [c:/pm_src/client.mk;406: realbuild] Error 2"}]
[1775203283.97377, "build_output", {"line": "mozmake.EXE: *** [client.mk;164: build] Error 2"}]
[1775203284.00497, "warning_summary", {"count": 142}]

__Sandra__
Apollo supporter
Posts: 40
Joined: 2022-05-16, 08:00
Location: Chernihiv, Ukraine

Re: I created a Python 3 port of UXP out of boredom...

Post by __Sandra__ » 2026-04-03, 11:15

I removed "sys.stderr.write(stdout.decode(encoding='utf-8'))" line from file expandlibs_exec.py and the build was successful.

I also noticed a peculiarity: the source files have “0x0A” line endings, but after building and packaging, all files have “0x0D 0x0A” line endings.

Drugwash
Lunatic
Posts: 355
Joined: 2016-01-28, 12:08
Location: Ploieşti, Romania

Re: I created a Python 3 port of UXP out of boredom...

Post by Drugwash » 2026-04-03, 11:28

__Sandra__ wrote:
2026-04-03, 08:30
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x91 in position 3: invalid start byte
Was about to suggest detecting the byte string's encoding first using chardet.detect() and selecting the best confidence, then passing the selected encoding to decode() instead of utf-8. However, a quick test revealed that chardet isn't available in a default Python 3 installation (it's a third-party package, not a standard library module). Substitutes could be charset_normalizer or cchardet, but those are external modules too. Dunno of any similar module in the standard library that'd be available in all [usable] Python 3 versions. Which means even trying to install any of those modules in the venv might fail if the user has no Internet access. :(

Or am I way off-track here? Can a prebuilt venv be deployed with the UXP code so that all necessary Python modules would be readily available? Maybe that's already the case, dunno. :?
I removed "sys.stderr.write(stdout.decode(encoding='utf-8'))" line from file expandlibs_exec.py and the build was successful.
That line seems to only serve for debugging so it seems safe to omit it, but it may be better to use the correct character decoding so any error/debug/info message could be displayed as intended.
I also noticed a peculiarity: the source files have “0x0A” line endings, but after building and packaging, all files have “0x0D 0x0A” line endings.
Source files use *nix line endings by default. Since you're using Windows, it makes sense that the resulting files would have Windows line endings.
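For the "use the correct character decoding" point, here is a rough sketch that needs no external module. It tries a list of candidate encodings in order and never raises. The function name and the default candidate list are assumptions for illustration; a real fix would presumably pass in the console code page rather than a hardcoded "cp866".

```python
def decode_tool_output(data, encodings=("utf-8", "cp866")):
    """Decode bytes from a tool, trying each candidate encoding in order.

    The candidate list is an assumption for illustration; a real build
    would pass the detected console code page instead of "cp866".
    """
    for name in encodings:
        try:
            return data.decode(name)
        except UnicodeDecodeError:
            continue
    # Last resort: never raise, mark undecodable bytes with U+FFFD.
    return data.decode(encodings[0], errors="replace")
```

UTF-8 goes first because legacy single-byte text rarely happens to be valid UTF-8, while a single-byte codec such as cp866 accepts any byte sequence and therefore has to come last. Unlike deleting the `sys.stderr.write(...)` line, this keeps the tool's message visible.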

athenian200
Contributing developer
Posts: 1682
Joined: 2018-10-28, 19:56
Location: Georgia

Re: I created a Python 3 port of UXP out of boredom...

Post by athenian200 » 2026-04-03, 15:32

__Sandra__ wrote:
2026-04-03, 08:30
I tried to make a build with the latest version of the patch "Follow-up: Fix 32-bit Windows builds" but it also ended with errors.

[1775203281.165765, "build_output", {"line": "\u040f\u0430\u0401\u00ac\u0490\u0437\u00a0\u00ad\u0401\u0490: \u045e\u0404\u00ab\u043e\u0437\u0490\u00ad\u0401\u0490 \u0434\u00a0\u00a9\u00ab\u00a0: c:\\pm_src\\platform\\media\\ffvpx\\libavutil\\tx_template.c"}]
[1775203281.165765, "build_output", {"line": "mozavutil.dll"}]
[1775203281.649366, "build_output", {"line": "Traceback (most recent call last):"}]
[1775203281.649366, "build_output", {"line": " File \"c:/pm_src/platform/config/expandlibs_exec.py\", line 355, in <module>"}]
[1775203281.649366, "build_output", {"line": " exit(main(sys.argv[1:]))"}]
[1775203281.649366, "build_output", {"line": " File \"c:/pm_src/platform/config/expandlibs_exec.py\", line 348, in main"}]
[1775203281.649366, "build_output", {"line": " sys.stderr.write(stdout.decode(encoding='utf-8'))"}]
[1775203281.649366, "build_output", {"line": "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x91 in position 3: invalid start byte"}]
[1775203281.680566, "build_output", {"line": "mozmake.EXE[5]: *** [c:/pm_src/platform/config/rules.mk;765: mozavutil.dll] Error 1"}]
[1775203281.680566, "build_output", {"line": "mozmake.EXE[5]: *** Deleting file 'mozavutil.dll'"}]
[1775203281.680566, "build_output", {"line": "mozmake.EXE[4]: *** [c:/pm_src/platform/config/recurse.mk;71: media/ffvpx/libavutil/target] Error 2"}]
[1775203283.92697, "build_output", {"line": "Traceback (most recent call last):"}]
[1775203283.94257, "build_output", {"line": " File \"c:/pm_src/platform/config/expandlibs_exec.py\", line 355, in <module>"}]
[1775203283.94257, "build_output", {"line": " exit(main(sys.argv[1:]))"}]
[1775203283.94257, "build_output", {"line": " File \"c:/pm_src/platform/config/expandlibs_exec.py\", line 348, in main"}]
[1775203283.94257, "build_output", {"line": " sys.stderr.write(stdout.decode(encoding='utf-8'))"}]
[1775203283.94257, "build_output", {"line": "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x91 in position 3: invalid start byte"}]
[1775203283.9581702, "build_output", {"line": "mozmake.EXE[5]: *** [c:/pm_src/platform/config/rules.mk;765: icu78.dll] Error 1"}]
[1775203283.9581702, "build_output", {"line": "mozmake.EXE[5]: *** Deleting file 'icu78.dll'"}]
[1775203283.9581702, "build_output", {"line": "mozmake.EXE[4]: *** [c:/pm_src/platform/config/recurse.mk;71: config/external/icu/target] Error 2"}]
[1775203283.9581702, "build_output", {"line": "mozmake.EXE[3]: *** [c:/pm_src/platform/config/recurse.mk;33: compile] Error 2"}]
[1775203283.9581702, "build_output", {"line": "mozmake.EXE[2]: *** [c:/pm_src/platform/config/rules.mk;493: default] Error 2"}]
[1775203283.9581702, "build_output", {"line": "mozmake.EXE[1]: *** [c:/pm_src/client.mk;406: realbuild] Error 2"}]
[1775203283.97377, "build_output", {"line": "mozmake.EXE: *** [client.mk;164: build] Error 2"}]
[1775203284.00497, "warning_summary", {"count": 142}]
Yes, what you're finding is pretty similar to what I'm finding. This is purely an encoding mismatch with Russian Windows I believe... and yeah, the way I tried to deal with this initially was to detect the system locale in the places where I know for sure it writes to the console, but that's proving not to be very reliable because it really seems like either a lot of people who speak other languages have their system's codepage set to CP1252 regardless and just have their native language as a langpack for MSVC itself, or else every check for the system codepage fails inside MSYS1 and it reports CP1252 regardless of what you have set.

About the only way I was able to get the right character display so far was to force the console into 1251 mode with chcp, and then hunt down every instance of "getpreferredencoding" and just hardcode 1251 instead... obviously that wouldn't be a good upstream fix, but until I think of something better, all I can really suggest is that people building in non-English languages might have to locally patch the build system's encoding detection to be hardcoded to their language's ANSI codepage (1251 for Russian, maybe 932 for Japanese, etc.) and use chcp with that same codepage (chcp 1251 for Russian) in MozillaBuild's .bat files on top of that to try and get the display in their own output language. Basically, this doesn't support Unicode (but does support ANSI codepages), and Python 3 assumes Unicode... that's the mismatch.

The intermediate solution... would be to detect the console's encoding rather than the system-wide Windows encoding, and use that. But that would still rely on the user having set that properly in the .bat file that runs MozillaBuild, because it probably isn't set correctly by default. I know for me, system console encoding was set to good-old DOS-based CP437 by default (which is what gave me a false positive on a test suggesting we had a pure ASCII environment earlier) even though the system itself was set to CP1252 by default.

athenian200
Contributing developer
Posts: 1682
Joined: 2018-10-28, 19:56
Location: Georgia

Re: I created a Python 3 port of UXP out of boredom...

Post by athenian200 » 2026-04-03, 15:52

Drugwash wrote:
2026-04-03, 11:28
__Sandra__ wrote:
2026-04-03, 08:30
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x91 in position 3: invalid start byte
Was about to suggest detecting the byte string's encoding first using chardet.detect() and selecting the best confidence, then passing the selected encoding to decode() instead of utf-8. However, a quick test revealed that chardet isn't available in a default Python 3 installation (it's a third-party package, not a standard library module). Substitutes could be charset_normalizer or cchardet, but those are external modules too. Dunno of any similar module in the standard library that'd be available in all [usable] Python 3 versions. Which means even trying to install any of those modules in the venv might fail if the user has no Internet access. :(

Or am I way off-track here? Can a prebuilt venv be deployed with the UXP code so that all necessary Python modules would be readily available? Maybe that's already the case, dunno. :?
We actually don't use pip or anything, we just slide external modules we want into the build system and wire them in via generated .pth files that are created when the venv is created, we have a mechanism for that already. So if you find an external Python module that can work around this problem, I'm definitely interested. Question is whether something like chardet would work reliably on the small snippets of text the build system sometimes receives from tools.
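To illustrate the .pth mechanism in general terms, here is a small sketch using only the standard `site` module. The helper name, file name, and directory layout are all hypothetical; the build system's actual .pth generation isn't shown in this thread and may well look different.

```python
import os
import site


def wire_in_vendored(vendor_dir, site_dir, pth_name="vendored.pth"):
    """Make modules in vendor_dir importable via a .pth file in site_dir.

    All names here are hypothetical; this is not the build system's code.
    """
    os.makedirs(site_dir, exist_ok=True)
    pth_path = os.path.join(site_dir, pth_name)
    with open(pth_path, "w", encoding="utf-8") as handle:
        # Each line of a .pth file names one directory to add to sys.path.
        handle.write(os.path.abspath(vendor_dir) + "\n")
    # addsitedir() adds site_dir to sys.path and processes its .pth files.
    site.addsitedir(site_dir)
    return pth_path
```

In a real venv the .pth file would sit in site-packages and be picked up at interpreter startup, so no pip or network access is involved; the explicit `site.addsitedir()` call is only there so the sketch works in an already-running interpreter.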

Drugwash
Lunatic
Posts: 355
Joined: 2016-01-28, 12:08
Location: Ploieşti, Romania

Re: I created a Python 3 port of UXP out of boredom...

Post by Drugwash » 2026-04-03, 16:24

athenian200 wrote:
2026-04-03, 15:52
Question is whether something like chardet would work reliably on the small snippets of text the build system sometimes receives from tools.
You could build an ad-hoc test script for that. I've no idea how long/complicated the byte strings would be. Personally I use charset_normalizer in some short scripts that deal with subtitles embedded in media files, but any of those modules mentioned above should do the job. Maybe a thorough test would prove which is most reliable (as in most accurate detection) for the task at hand.

On my system, building Pale Moon takes almost three hours, so it would be extremely difficult for me to test any changes to the build system in a reasonable manner.

If you need some head start here's a site I just found that deals with chardet and text encoding: link.

athenian200
Contributing developer
Posts: 1682
Joined: 2018-10-28, 19:56
Location: Georgia

Re: I created a Python 3 port of UXP out of boredom...

Post by athenian200 » 2026-04-03, 17:20

One piece of info I'm curious about but haven't checked out yet:

If someone on a non-English Windows (or rather more specifically, one outside the range of CP1252) does this in MozillaBuild:

Code:

$ python3
Python 3.7.9 (tags/v3.7.9:13c94747c7, Aug 17 2020, 18:58:18) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding(False)
'cp1252'
>>>
Does it output cp1252, or the correct codepage? Knowing either way will help me eliminate possibilities here... one of my working theories as to why this is failing is that MozillaBuild/MSYS1 is lying to Python about the system codepage and hardcoding it to cp1252 somehow. The other theory I have in mind would be much simpler to fix if that's not the case.
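If it helps anyone answer the question above, a slightly broader check could collect everything in one go. This is only a sketch: `report_encodings` is a made-up name, and the Win32 calls (`GetACP`, `GetOEMCP`, `GetConsoleCP`, `GetConsoleOutputCP`) are reached through `ctypes` only when Python reports `win32`.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python and (on Windows) the Win32 API report."""
    report = {
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
        "sys.getfilesystemencoding()": sys.getfilesystemencoding(),
    }
    if sys.platform == "win32":
        import ctypes

        kernel32 = ctypes.windll.kernel32
        # System-wide code pages versus the ones attached to this console.
        report["GetACP (system ANSI)"] = kernel32.GetACP()
        report["GetOEMCP (system OEM)"] = kernel32.GetOEMCP()
        report["GetConsoleCP (console input)"] = kernel32.GetConsoleCP()
        report["GetConsoleOutputCP (console output)"] = kernel32.GetConsoleOutputCP()
    return report
```

Running `for key, value in report_encodings().items(): print(key, value)` inside MozillaBuild and pasting the result would show at a glance whether the system and console code pages disagree.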

athenian200
Contributing developer
Posts: 1682
Joined: 2018-10-28, 19:56
Location: Georgia

Re: I created a Python 3 port of UXP out of boredom...

Post by athenian200 » 2026-04-03, 19:12

I've been looking into this problem more, and I think I managed to confirm some facts that actually snap everything into focus.

The Russian user's console encoding is CP866, but his system encoding is CP1251, just as we'd expect. The mojibake actually gives me enough info to confirm this if I look at it the right way. This... isn't a weird situation at all, and it's surprising it's not handled correctly by Python here. But after doing some local monkeypatching to force everything into CP1251, I figured out the original phrase being turned into mojibake was:

Примечание: включение файла

Which I can't actually read (for obvious reasons), but can at least visually recognize the shape of as looking like proper Cyrillic character rendering.

The closest match I could reproduce by playing with Python 3 and deliberately decoding things wrong with Russian encodings was:

ЏаЁ¬Ґз ­ЁҐ: ўЄ«оз­ЁҐ д ©« 

But on his machine, it apparently degrades a bit further with weird replacement characters and becomes:

?аЁ¬Ґз -­Ё?: ўЄ<оз-­Ё? д c<

More or less.

So Python 3 is decoding CP866 as CP1251, because MSVC sees the console encoding is CP866 and outputs that, but Python checks the system codepage instead of the console codepage and interprets CP866 as CP1251, resulting in the garbage the user saw. The funny thing is, MSVC is perfectly capable of outputting either CP1251 or CP866, but Python somehow isn't telling MSVC what it should be outputting here, just letting it guess wrong based on the console encoding and then taking that output and decoding based on the system encoding.
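The mismatch described above can be reproduced in a few lines on any platform, since both codecs ship with Python. The helper name is made up for illustration. The garbled result should match the escaped string in the build log earlier in the thread, including the no-break spaces and soft hyphens that are invisible when printed.

```python
def misdecode(text, produced_as, decoded_as):
    """Encode text the way the producer did, then decode it the wrong way."""
    return text.encode(produced_as).decode(decoded_as)


original = "Примечание: включение файла"

# What the compiler emitted (CP866 bytes), read as if it were CP1251.
garbled = misdecode(original, "cp866", "cp1251")

# Undo the damage: back to the same bytes, then decode with the right codec.
recovered = garbled.encode("cp1251").decode("cp866")
```

The round trip only works because every byte involved happens to be defined in CP1251; once replacement characters have crept in (as in the further-degraded version on the user's machine), the original text can't be recovered this way.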

So that means... there are two basic workarounds that will probably work. One is just adding chcp 1251 to the batch file used to launch MSYS so that the terminal environment is actually 1251 like Python 3 thinks it is... and MSVC will also pick that up and get the hint that it needs to output CP1251 to avoid confusing Python. The other... is that I set up the build system on Windows to do something like:

Code:

import ctypes  # Windows-only: ctypes.windll does not exist elsewhere
encoding = "cp%d" % ctypes.windll.kernel32.GetConsoleOutputCP()
or similar rather than use getpreferredencoding(False).

So yes, turns out this can be fixed... the missing piece of the puzzle is that on Windows, console encoding != system encoding, and that MSVC in MSYS1 only cares about the former, while Python's encoding/locale detection only cares about the latter.
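A slightly more defensive version of that one-liner might look like the sketch below. It is an assumption about how such a helper could be shaped, not the patch itself: it falls back to `locale.getpreferredencoding(False)` when there is no console attached (`GetConsoleOutputCP` returns 0 in that case), when Python has no codec for the reported code page, or when not running on Windows at all.

```python
import codecs
import locale
import sys


def console_output_encoding():
    """Return a codec name for the attached console, else the locale's."""
    if sys.platform == "win32":
        import ctypes

        codepage = ctypes.windll.kernel32.GetConsoleOutputCP()
        if codepage:  # 0 means the call failed, e.g. no console attached
            name = "utf-8" if codepage == 65001 else "cp%d" % codepage
            try:
                codecs.lookup(name)
                return name
            except LookupError:
                pass  # Python has no codec for this code page
    return locale.getpreferredencoding(False)
```

The fallback matters for cases like output being redirected from a process started without a console, where there is no console code page to ask about.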

adoxa
Astronaut
Posts: 619
Joined: 2019-03-16, 13:26
Location: Qld, Aus.

Re: I created a Python 3 port of UXP out of boredom...

Post by adoxa » 2026-04-03, 23:48

Would chcp 65001 & set PYTHONUTF8=1 work? That makes both console & Python use UTF-8.
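The Python half of that suggestion is easy to check in isolation. The sketch below (helper name made up) starts a child interpreter with `PYTHONUTF8` set and reports what it sees; `PYTHONUTF8` and `sys.flags.utf8_mode` exist from Python 3.7 onward. It says nothing about whether cl and link actually emit UTF-8 under `chcp 65001` inside MSYS1, which would still need testing on an affected machine.

```python
import os
import subprocess
import sys


def probe_child(utf8_mode):
    """Run a child interpreter and report its UTF-8 mode flag and encoding."""
    env = dict(os.environ)
    env["PYTHONUTF8"] = "1" if utf8_mode else "0"
    script = (
        "import locale, sys; "
        "print(sys.flags.utf8_mode, locale.getpreferredencoding(False))"
    )
    output = subprocess.check_output([sys.executable, "-c", script], env=env)
    flag, encoding = output.decode("ascii").split()
    return int(flag), encoding
```

With `probe_child(True)` the child should report UTF-8 mode enabled and a UTF-8 preferred encoding regardless of the system ANSI code page.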

micwoj92
Fanatic
Posts: 198
Joined: 2020-12-22, 20:57

Re: I created a Python 3 port of UXP out of boredom...

Post by micwoj92 » 2026-04-05, 01:00

Thank you for this effort. Writing this from Pale Moon 34.3.0a1 from commit 8332170eea5158ebc11635ca6264db494d966aaf built with python 3.14.3

Moonchild
Project founder
Posts: 39121
Joined: 2011-08-28, 17:27
Location: Sweden

Re: I created a Python 3 port of UXP out of boredom...

Post by Moonchild » 2026-04-05, 07:47

Keep in mind this is the start of a new dev cycle so we got plenty of time to work this out. People building from source for daily use should just use the release branch if they run into python3 issues and are just trying to compile, not work through solving or helping solve.
As it is, I was sent some more breakage issues by email; I'll forward those to you, probably just on the repo in a new issue.

athenian200
Contributing developer
Posts: 1682
Joined: 2018-10-28, 19:56
Location: Georgia

Re: I created a Python 3 port of UXP out of boredom...

Post by athenian200 » 2026-04-06, 07:34

So, I was trying to look into the foreign text display issues before I realized there were more pressing Linux issues that have to be dealt with first...

https://repo.palemoon.org/athenian200/U ... experiment

I experimented with a Windows function called GetConsoleOutputCP... the reason I zeroed in on it, is because that's what MSVC's cl and link use to determine their own output encoding, and not GetACP which is what Python and some other tools seem to trust instead. The catch... is that you have to be careful of environment variables getting pulled in as the wrong encoding and breaking things, but really in most cases it's better to rely on file I/O on Windows rather than trust the integrity of an environment variable in situations where you need tight control of the encoding or byte-level representation.

I... think I got carried away with this because I was still in "hunt through the build system's guts and carefully change big things" mode rather than "practical fix" mode mentally, and possibly took it further than would be practical to upstream, but at the very least I think the patch existing for people to play around with as an alternative to trying to make the console match their system ANSI encoding won't hurt anything.

In any case, this hasn't seen a lot of testing and may take me a while to get back to if anyone is interested or runs into issues with it. Seriously, I should get my mind refocused in a more useful direction... messing with the build system too much has made me feel something like the programming equivalent of those people who play too much Tetris and start seeing falling blocks in their dreams, only with me it's old Mozilla code and the flow of the build system now...

__Sandra__
Apollo supporter
Posts: 40
Joined: 2022-05-16, 08:00
Location: Chernihiv, Ukraine

Re: I created a Python 3 port of UXP out of boredom...

Post by __Sandra__ » 2026-04-06, 09:22

athenian200 wrote:
2026-04-06, 07:34
So, I was trying to look into the foreign text display issues before I realized there were more pressing Linux issues that have to be dealt with first...

https://repo.palemoon.org/athenian200/U ... experiment
So far everything looks good!

__Sandra__
Apollo supporter
Posts: 40
Joined: 2022-05-16, 08:00
Location: Chernihiv, Ukraine

Re: I created a Python 3 port of UXP out of boredom...

Post by __Sandra__ » 2026-04-06, 11:33

The build completed successfully.

The message about "powershell.exe" keeps appearing. The good news is that fewer “garbage” messages are output to the console and the size of the build log has been greatly reduced (from ~200 MB to ~1 MB).

athenian200
Contributing developer
Posts: 1682
Joined: 2018-10-28, 19:56
Location: Georgia

Re: I created a Python 3 port of UXP out of boredom...

Post by athenian200 » 2026-04-06, 16:36

__Sandra__ wrote:
2026-04-06, 11:33
The build completed successfully.

The message about "powershell.exe" keeps appearing. The good news is that fewer “garbage” messages are output to the console and the size of the build log has been greatly reduced (from ~200 MB to ~1 MB).
:thumbup: Glad to hear it.

Yeah, that was the thing I took the longest to figure out. Those messages about the Russian equivalent of "Note: Including header" were actually always supposed to be filtered out, and they always have been on English builds. It just wasn't happening on Russian Windows because Mozilla used an environment variable to store an AC_SUBST from autoconf, and forgot that on Windows, that variable will be mangled for anyone whose native language isn't English (though it has a chance of working for people in the Latin-1/CP850 territories) because the console codepage never matches the ANSI codepage unless the user explicitly sets that up. So that's why you've been seeing garbage this whole time. The remaining output from cl and link should look like Cyrillic characters instead of mojibake now.
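To make the filtering failure concrete, here is a toy version of a prefix filter. The function name and the way the prefix is supplied are assumptions for illustration, not the build system's actual code. The point is only that a prefix comparison succeeds or fails depending on whether the compiler's bytes and the stored prefix end up in the same encoding.

```python
# Hypothetical stand-in for the localized "Note: including file:" prefix.
INCLUDE_PREFIX = "Примечание: включение файла:"


def filter_include_notes(raw_lines, encoding, prefix=INCLUDE_PREFIX):
    """Drop compiler lines that start with the include-note prefix."""
    kept = []
    for raw in raw_lines:
        text = raw.decode(encoding, errors="replace")
        if text.startswith(prefix):
            continue  # dependency noise, not meant for the log
        kept.append(text)
    return kept


# Two lines as a CP866 console would deliver them.
raw_output = [
    "Примечание: включение файла: c:\\x\\mem_internal.h".encode("cp866"),
    "mozavutil.dll".encode("cp866"),
]
```

Decoding `raw_output` as "cp866" leaves only the `mozavutil.dll` line, while decoding it as "cp1251" lets the include note through as mojibake, which is the kind of thing that could bloat a log from ~1 MB to ~200 MB.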

That odd powershell.exe message is one I haven't been seeing on my system and for whatever reason haven't been able to reproduce. The reason it's particularly odd is that I don't recall ever adding anything powershell-related or touching any files that mention it.