Database for "dtd" files
Posted: 2016-09-29, 23:47
I've assembled the raw data to create a hierarchical database for Pale Moon's dtd/localization files (including the "dtd" files in the 30+ Language Packs). The following is what this database might look like:
The "FileName" database contains one record for each "ENTITY" included in a given dtd file. It does not hold the ENTITY statements themselves, only a pointer or link to those statements, which are kept in another database file: the "ENTITY" database for the default "en-US" locale, or the "Locale" database for the Language Packs.
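To make the pointer arrangement concrete, here is a minimal sketch of the three databases as tables. It uses Python's built-in sqlite3 module purely for illustration (the post itself assumes a spreadsheet), and the table names, column names, entity names and sample strings are all hypothetical.

```python
import sqlite3

# Hypothetical layout: "file_name" holds only pointers (entity names),
# while the strings live in "entity" (en-US) and "locale" (Language Packs).
SCHEMA = """
CREATE TABLE entity (
    entity_name TEXT PRIMARY KEY,
    value       TEXT NOT NULL
);
CREATE TABLE locale (
    locale      TEXT NOT NULL,
    entity_name TEXT NOT NULL REFERENCES entity(entity_name),
    value       TEXT NOT NULL,
    PRIMARY KEY (locale, entity_name)
);
CREATE TABLE file_name (
    file_name   TEXT NOT NULL,
    entity_name TEXT NOT NULL REFERENCES entity(entity_name),
    PRIMARY KEY (file_name, entity_name)
);
"""


def build_demo_db():
    """Create an in-memory database filled with made-up sample records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO entity VALUES (?, ?)",
        [("example.ok", "OK"), ("example.cancel", "Cancel")],
    )
    conn.executemany(
        "INSERT INTO locale VALUES (?, ?, ?)",
        [("de", "example.ok", "OK"), ("de", "example.cancel", "Abbrechen")],
    )
    conn.executemany(
        "INSERT INTO file_name VALUES (?, ?)",
        [("example.dtd", "example.ok"), ("example.dtd", "example.cancel")],
    )
    return conn


def lookup(conn, file_name, locale=None):
    """Follow the pointers for one dtd file; en-US strings when locale is None."""
    if locale is None:
        sql = (
            "SELECT f.entity_name, e.value FROM file_name f "
            "JOIN entity e ON e.entity_name = f.entity_name "
            "WHERE f.file_name = ? ORDER BY f.entity_name"
        )
        params = (file_name,)
    else:
        sql = (
            "SELECT f.entity_name, l.value FROM file_name f "
            "JOIN locale l ON l.entity_name = f.entity_name "
            "WHERE f.file_name = ? AND l.locale = ? ORDER BY f.entity_name"
        )
        params = (file_name, locale)
    return conn.execute(sql, params).fetchall()
```

Calling lookup(conn, "example.dtd") resolves the file's pointers against the en-US strings, and passing a locale code resolves the same pointers against the Language Pack strings instead.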
The "EntityName" is both the record key-field for a specific "ENTITY" and part of the "ENTITY" data itself. Thus, the EntityName key-field must be unique. The key remains unique even when the same "EntityName" is used in multiple "dtd" files, as long as its string value is the same in all of those files. That is, the same "EntityName" cannot be used to say one thing in one dtd file (like "chocolate") and something quite different in another dtd file (like "vanilla"). There are a number of such conflicting duplicate keys in Pale Moon's dtd files. Fortunately, these duplicates mostly appear in a small number of pairs of files (such as "colors.dtd" and "fonts.dtd"). The EntityNames would have to be changed in one or both of these files; alternatively, one or both of these files could be left out of the database.
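The uniqueness rule can be checked mechanically. The following is a rough sketch, assuming the data is already available as (file name, EntityName, value) rows; the function name and the sample entity names in the usage note are made up for illustration.

```python
from collections import defaultdict


def find_conflicts(rows):
    """Return the EntityNames that carry more than one distinct string value.

    rows: iterable of (file_name, entity_name, value) tuples.
    Result: {entity_name: {value: [file names using that value]}}.
    """
    values_by_name = defaultdict(lambda: defaultdict(list))
    for file_name, entity_name, value in rows:
        values_by_name[entity_name][value].append(file_name)

    # A name is only a problem when its values differ between files;
    # the same name with the same value everywhere is still a valid key.
    return {
        name: {value: sorted(files) for value, files in values.items()}
        for name, values in values_by_name.items()
        if len(values) > 1
    }
```

For example, a row ("colors.dtd", "demo.title", "chocolate") together with ("fonts.dtd", "demo.title", "vanilla") would be reported under "demo.title", while a name that has the same value in both files would not be reported at all.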
Obtaining the data for this database is administratively very simple, taking just a few minutes for each locale. The database can also be kept in LibreOffice or Microsoft Office instead of a fancier or more expensive solution. To create the database data:
1. Obtain the text editor "Notepad++". Add the "Column sorting" and "Combine" plugins to Notepad++. Also obtain the "Bandizip Archive Manager" if you intend to open "omni.ja" files. For Language Packs or other "xpi" files, most other archive managers will work.
2. Copy the locale files from the Language Pack to a separate folder. There are two groups of locale files in each Language Pack (corresponding to the two "omni.ja" files in Pale Moon). Delete all files in this new folder that are not "dtd" files.
3. Drag and drop the folder with your "dtd" files onto an open Notepad++ window. All "dtd" files should now be open in Notepad++. Optionally, clean up the files by replacing all double spaces <space><space> in all open files with a single space. Repeat until there are "0" instances of double spaces. Then click "Combine" on the plugins menu to combine all the dtd files into one new file, and click "Column sorting" to sort the file. Delete all lines that are not "ENTITY" lines. You're basically done at this point, except for "ENTITY" statements that occupied more than one line prior to the sort. An "ENTITY" statement must be terminated by "> and, because the sort works line by line, it separates the first line of a multi-line statement from the line that carries this termination. It's usually easier to find and fix these truncated "ENTITY" statements before moving your combined dtd file to a spreadsheet (which is the next step).
4. The Notepad++ file is now a 1-column list of ENTITY statements. Copy the list and paste it into a spreadsheet. Next, edit the ENTITY statements in Notepad++ until all that remains on each line is the "EntityName", followed by one space, followed by one quote mark, followed by the ENTITY value (there should be no "> terminations). Then replace the space and quote mark with a tab on each line. Paste this edited ENTITY list into an empty column to the right of the ENTITY statements previously pasted. The "EntityName" should now be split out into its own column, followed by the ENTITY string value, also in its own column.
5. Next, repeat these steps for the remaining group of locale files.
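For anyone who would rather script the steps above than do them by hand in Notepad++, here is a rough Python equivalent. It is only a sketch under a few assumptions: the dtd files are UTF-8, only plain <!ENTITY name "value"> statements are wanted (parameter entities that start with % are skipped), and runs of whitespace inside a value may be collapsed to a single space, which also takes care of the multi-line statements. The function names and the output layout are my own.

```python
import csv
import re
from pathlib import Path

# Comments are stripped first so that commented-out ENTITY lines are ignored.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
# Name, then a double- or single-quoted value, then the closing ">".
# The quoted part may span several lines.
ENTITY_RE = re.compile(
    r"""<!ENTITY\s+([^\s%"']+)\s+(?:"([^"]*)"|'([^']*)')\s*>"""
)


def parse_entities(text):
    """Return (entity_name, value) pairs found in the text of one dtd file."""
    text = COMMENT_RE.sub("", text)
    pairs = []
    for match in ENTITY_RE.finditer(text):
        name = match.group(1)
        raw = match.group(2) if match.group(2) is not None else match.group(3)
        # Collapse line breaks and repeated spaces, like the double-space cleanup.
        pairs.append((name, " ".join(raw.split())))
    return pairs


def export_folder(folder, out_path):
    """Write file name, EntityName and value as tab-separated columns."""
    rows = []
    for path in sorted(Path(folder).rglob("*.dtd")):
        text = path.read_text(encoding="utf-8-sig")
        rows.extend((path.name, name, value) for name, value in parse_entities(text))
    rows.sort(key=lambda row: row[1])  # sorted by EntityName, like the sort step
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle, delimiter="\t").writerows(rows)
    return rows
```

Running export_folder once per group of locale files gives one tab-separated file per group, which LibreOffice or Excel can open directly with the EntityName and the string value already in separate columns.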