Cutting back l10n / Transifex is a bust

Moonchild
Pale Moon guru
Posts: 35474
Joined: 2011-08-28, 17:27
Location: Motala, SE

Cutting back l10n / Transifex is a bust

Post by Moonchild » 2016-01-15, 03:04

Some bad news: Our localization solution (Transifex) has proven to have such extremely poor file format support that most of our actively used translations have been mangled to hell.
We are cutting back our localization support to only those languages we actually have translators for, as well.

Transifex's file format support for both DTD and Properties files is terribly broken. We cannot use this system any longer, and I'll be completely pulling out. Transifex is not usable at all for either format because it automatically converts special characters and escaped Unicode entities that should NOT be touched.

I'm sorry if I wasted anyone's time with this. I'll be working overtime to get as many languages as possible into acceptable shape for the v26 release, using the exports of the languages that have been worked on here, and I do thank everyone for their time. I'll have to look into another solution that actually supports our formats, or set time aside to make a translator application myself.
I'm sad that in 2016, we still have such terribly poor support for relatively simple and straightforward file formats in what could be considered professional frameworks :(
"Sometimes, the best way to get what you want is to be a good person." -- Louis Rossmann
"Seek wisdom, not knowledge. Knowledge is of the past; wisdom is of the future." -- Native American proverb
"Linux makes everything difficult." -- Lyceus Anubite

jumba

Re: Cutting back l10n / Transifex is a bust

Post by jumba » 2016-01-15, 10:30

That's unfortunate! What is the status of babelzilla being used for localization?

I found one alternative service that sounds promising. Have you heard of it?
http://zanata.org/
https://github.com/zanata/zanata-server

Supported project types are listed in here:
http://docs.zanata.org/en/release/user- ... ect-types/

Moonchild

Re: Cutting back l10n / Transifex is a bust

Post by Moonchild » 2016-01-15, 10:45

jumba wrote:That's unfortunate!
Understatement of the year ;)
jumba wrote:What is the status of babelzilla being used for localization?
We can't. From the start they have refused full language packs and have urged me to find a different solution.

Their underlying WTS system is available on GitHub, but I have not been able to get it to work. If someone can help with that, then that would be a good solution; I have server capacity for it.
jumba wrote:I found one alternative service that sounds promising. Have you heard of it?
http://zanata.org/
I haven't heard of it but I'll have to look for alternatives later. The main problem is that most available translation server software doesn't support our language formats.

glezos

Re: Cutting back l10n / Transifex is a bust

Post by glezos » 2016-01-15, 17:44

Hi, I'm Dimitris, the founder of Transifex.

As a heads up, this bug (DTD support for special characters in `<!ENTITY` tags) has been added to our backlog for development. It's a corner-case to us (the majority of the Transifex projects (open source or not) are not using this), so it hasn't surfaced as something urgent to implement. The issue is that Transifex prefers escaped characters (`&amp;` instead of `&`) in the entities. As a workaround until there is a solution, you can run a small script to escape these characters before pushing to Transifex.

Regarding the .properties file, I believe this was a bug in the Windows Transifex client which has been fixed? I'm not sure though, since this was reported back in October and there was a client release since then.

If you decide to go, we'll understand, and feel bad we couldn't have served you guys better.

Moonchild

Re: Cutting back l10n / Transifex is a bust

Post by Moonchild » 2016-01-15, 22:08

Thanks for poking your head in here, Dimitris.
glezos wrote:As a heads up, this bug (DTD support for special characters in `<!ENTITY` tags) has been added to our backlog for development. It's a corner-case to us (the majority of the Transifex projects (open source or not) are not using this), so it hasn't surfaced as something urgent to implement. The issue is that Transifex prefers escaped characters (`&amp;` instead of `&`) in the entities. As a workaround until there is a solution, you can run a small script to escape these characters before pushing to Transifex.
The problem is that these characters should be left alone. They should neither be escaped nor unescaped if they occur as parameters inside tags! Doing so in any document with this kind of XML-based structure will cause problems, not just for DTDs.

As I already explained to Nina, you will find both escaped and unescaped tokens in the DTD entities, by design. Any sort of conversion either way will mangle this. You should leave those strings alone!

== Excerpt from my mail to Nina ==
Some strings have escaped entities that would be used literally in XUL windows. e.g.:

<!ENTITY securityView.privacy.header "Privacy &amp; History">
<!ENTITY button-next-win.label "Next &gt;">
Some strings have unescaped entities because the contents are parameters to be replaced at run-time (calling DTD entities within other DTD entities). e.g.:

<!ENTITY brandFullName "Pale Moon">
<!ENTITY aboutDialog.title "About &brandFullName;">
(becomes "About Pale Moon")
Some strings have additional unescaped entities because they are literal HTML snippets. e.g.:

<!ENTITY certerror.introPara1 "You have asked &brandShortName; to connect securely to <b>#1</b>, but we can't confirm that your connection is secure.">
If Transifex's format handler unescapes them before inserting them in the database then the former becomes incorrect in the database.
Escaping upon exporting from the database creates another, bigger, problem in that all run-time parameters become escaped and as such invalid for use in the application.

Just blindly un/escaping all regardless of where in the DTD the strings occur is a bug in your format handler, as explained in my previous mail. You should never, never, ever convert entities inside parameter strings as you are doing.

Even in the case of website translation this would be absolutely wrong!

<a href="http://example.com/file.php?param=value&param2=value2">
after your translation would become

<a href="http://example.com/file.php?param=value&amp;param2=value2">
which is dead wrong.

=== end excerpt ===
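To make the two failure modes concrete, here is a minimal Python sketch (not part of Pale Moon's or Transifex's tooling) that expands DTD entities the way an XML parser does at run time, using the standard library's `xml.etree.ElementTree`. It places the example declarations from the mail, in their correctly escaped form, into the internal DTD subset of a throwaway document; the `resolve_entity` helper and that wrapper document are hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET


def resolve_entity(dtd: str, name: str) -> str:
    """Expand entity `name` as an XML parser would when the UI references it.

    Hypothetical helper: the declarations go into the internal subset of a
    throwaway document; referencing &name; in the body makes expat expand it,
    including nested references such as &brandFullName;.
    """
    doc = f"<!DOCTYPE root [\n{dtd}\n]>\n<root>&{name};</root>"
    root = ET.fromstring(doc)
    return "".join(root.itertext())


# Declarations as they should appear in the DTD files.
GOOD = """
<!ENTITY brandFullName "Pale Moon">
<!ENTITY aboutDialog.title "About &brandFullName;">
<!ENTITY securityView.privacy.header "Privacy &amp; History">
<!ENTITY button-next-win.label "Next &gt;">
"""

# Failure 1: &amp; unescaped to a bare "&", so the DTD is no longer well-formed.
UNESCAPED = '<!ENTITY securityView.privacy.header "Privacy & History">'

# Failure 2: the run-time parameter escaped, so it is shown literally.
ESCAPED = """
<!ENTITY brandFullName "Pale Moon">
<!ENTITY aboutDialog.title "About &amp;brandFullName;">
"""

if __name__ == "__main__":
    print(resolve_entity(GOOD, "aboutDialog.title"))            # About Pale Moon
    print(resolve_entity(GOOD, "securityView.privacy.header"))  # Privacy & History
    print(resolve_entity(GOOD, "button-next-win.label"))        # Next >
    try:
        resolve_entity(UNESCAPED, "securityView.privacy.header")
    except ET.ParseError as err:
        print("not well-formed:", err)
    print(resolve_entity(ESCAPED, "aboutDialog.title"))         # About &brandFullName;
```

Either direction of blanket conversion therefore breaks a different class of strings, which is why the values have to be stored and exported untouched.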
The problem I indicated, which may not have come across, is that strings inside tags as parameters should never be converted. Only strings between tags (in the actual document body) should be converted.

1)

<!ENTITY name "Value &parameter; is OK">
is a literal string inside a tag
(the common format for Mozilla language DTD files for localization)
2)

<TAG>Value &parameter; is OK</TAG>
is a literal string between tags
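Put differently, a DTD format handler should treat each entity value in case 1) as an opaque string: extract it, let the translator edit it, and write it back with no entity conversion at all. The following Python sketch is a hypothetical illustration of that rule, not Transifex's or Pale Moon's code; the function names are made up, and it assumes the simple layout shown above (one double-quoted value per `<!ENTITY>` declaration, no `"` inside values).

```python
import re

# Simple declarations like <!ENTITY name "value">. Assumption: values are
# double-quoted and contain no '"'; comments and other DTD constructs are
# copied through untouched because they never match.
ENTITY_RE = re.compile(r'<!ENTITY\s+([^\s"]+)\s+"([^"]*)"\s*>')


def extract(dtd: str) -> dict[str, str]:
    """Return {entity name: value} exactly as written: no un/escaping."""
    return {name: value for name, value in ENTITY_RE.findall(dtd)}


def rebuild(dtd: str, translations: dict[str, str]) -> str:
    """Swap in translated values, copying everything else verbatim."""

    def swap(m: re.Match) -> str:
        new_value = translations.get(m.group(1), m.group(2))
        # Offsets of the value inside the matched declaration.
        start = m.start(2) - m.start(0)
        end = m.end(2) - m.start(0)
        return m.group(0)[:start] + new_value + m.group(0)[end:]

    return ENTITY_RE.sub(swap, dtd)


SOURCE = """<!ENTITY brandFullName "Pale Moon">
<!ENTITY aboutDialog.title "About &brandFullName;">
<!ENTITY securityView.privacy.header "Privacy &amp; History">
"""

if __name__ == "__main__":
    # Hypothetical Dutch values, entered by the translator as raw DTD text.
    print(rebuild(SOURCE, {
        "aboutDialog.title": "Over &brandFullName;",
        "securityView.privacy.header": "Privacy &amp; geschiedenis",
    }))
```

Because nothing is decoded on the way in or encoded on the way out, `&brandFullName;` stays a run-time parameter and `&amp;` stays escaped, and re-exporting an untranslated file reproduces it exactly.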
glezos wrote:Regarding the .properties file, I believe this was a bug in the Windows Transifex client which has been fixed?
I never used the client because I could never get it to work; it would either flat out refuse, or it would not be allowed to push source files to the server -- I've done all my file submissions through the web interface directly to your server, after that.
So this, too, is likely a format handler issue, since it also deals with unintended conversions: e.g. \u0020 for a hard leading or trailing space is stripped, and other \u values are converted to their Unicode/UTF-8 characters. They are in the files as escaped sequences for a damned good reason.
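For `.properties` files the same principle applies: a `\uXXXX` escape is part of the stored value. The Python sketch below uses hypothetical helpers and keys (it is not the Transifex client or Pale Moon's build code) to contrast reading values raw with a naive handler that decodes the escapes and then trims whitespace, which is how a `\u0020` hard space disappears. It assumes simple `key=value` or `key: value` lines and ignores line continuations and other corner cases of the format.

```python
import re

# "key=value" or "key: value" lines. Assumption: no line continuations,
# no escaped separators in keys, no whitespace-only separator.
LINE_RE = re.compile(r"^\s*([^=:\s]+)\s*[=:]\s*(.*)$")
UNICODE_ESCAPE_RE = re.compile(r"\\u([0-9a-fA-F]{4})")


def parse_raw(text: str) -> dict[str, str]:
    """Keep each value exactly as written, \\uXXXX escapes included."""
    entries: dict[str, str] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "!")):
            continue  # blank line or comment
        m = LINE_RE.match(line)
        if m:
            entries[m.group(1)] = m.group(2)
    return entries


def naive_import(value: str) -> str:
    """The mangling behaviour: decode \\uXXXX escapes, then trim whitespace."""
    decoded = UNICODE_ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), value)
    return decoded.strip()


# Hypothetical keys, for illustration only.
SAMPLE = r"""
example.separator=\u0020-\u0020
example.ellipsis=Loading\u2026
"""

if __name__ == "__main__":
    raw = parse_raw(SAMPLE)
    print(raw["example.separator"])                       # \u0020-\u0020
    print(repr(naive_import(raw["example.separator"])))  # '-'
```

The raw reader round-trips the escapes; the naive one loses both hard spaces and replaces `\u2026` with the literal character.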
glezos wrote:If you decide to go, we'll understand, and feel bad we couldn't have served you guys better.
Considering you can offer neither proper DTD nor properties support, and that I have run into the additional issue that equally-named entities in different locations in a monolithic file (which was required because your client never worked) are merged into a single entry (context and uniqueness of entries are ignored; that is an essential flaw!), Transifex simply doesn't work for us. Entities with the same tag don't necessarily have to be translated to the same target string if they end up in a different file or context, so treating them as 100% match repetitions is wrong.
The latter in particular forces me to go through and compare against the previous sources and reference material to correct everything, a rather tedious and time-consuming task given the volume. I do not want to ever do that more than once, so whatever we end up using will have to handle this flawlessly after this disaster.
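The uniqueness problem comes down to how translation units are keyed. A minimal Python sketch, with hypothetical file paths and helper names: if units are keyed by entity name alone, identically named entities from different files collapse into one entry; keying by (file, entity name) keeps each one separately translatable. This illustrates the requirement described above; it is not how Transifex or any particular tool is implemented.

```python
# Hypothetical monolithic upload: two files that both declare close.label.
FILES: dict[str, dict[str, str]] = {
    "hypothetical/tabs.dtd": {"close.label": "Close"},
    "hypothetical/dialog.dtd": {"close.label": "Close"},
}


def units_by_name(files: dict[str, dict[str, str]]) -> dict[str, str]:
    """Naive keying: identically named entities collapse into one unit."""
    units: dict[str, str] = {}
    for entities in files.values():
        units.update(entities)  # later files silently overwrite earlier ones
    return units


def units_by_context(files: dict[str, dict[str, str]]) -> dict[tuple[str, str], str]:
    """Context-aware keying: (file, entity name) stays unique per location."""
    return {
        (path, name): value
        for path, entities in files.items()
        for name, value in entities.items()
    }


if __name__ == "__main__":
    print(len(units_by_name(FILES)))     # 1: one translation for both places
    print(len(units_by_context(FILES)))  # 2: each place translated on its own
```

With the context-aware key, a translator can give the two `close.label` entries different target strings even though their source text is identical.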

JustOff

Re: Cutting back l10n / Transifex is a bust

Post by JustOff » 2016-01-16, 09:53

Why not try Crowdin? This premium service supports all the required formats and is free for open source projects.
