Interchangeable katakana and hiragana in Japanese searches

steverowland

Interchangeable katakana and hiragana in Japanese searches

Unread post by steverowland » 2016-02-24, 23:22

Hello, would it be possible to make katakana and hiragana interchangeable in searches, similar to upper and lower case?
One browser that has this functionality is Chrome (and its derivatives).

Long explanation:
To explain what I mean: in Japanese, terms borrowed from other languages are written in katakana. For example, on a game-related website "hunter" would be written ハンター (hanta-). The default input method for typing Japanese, however, is hiragana, the set used for native Japanese words. So when you want to quickly search for just the first few syllables on a page, in my example I would type はん (han, "hun"). Chrome lets me search that way, but in Pale Moon I either have to input the katakana exactly or convert the text after typing it, which adds an unnecessary delay to every search.
This feature would also be particularly useful for searching online communities (forums, 2ch), since many words, especially internet slang and onomatopoeia, can be written in either set and different people use different ones.

Thank you.

Moonchild
Pale Moon guru
Posts: 35648
Joined: 2011-08-28, 17:27
Location: Motala, SE

Re: Interchangeable katakana and hiragana in Japanese searches

Unread post by Moonchild » 2016-02-25, 05:14

I'm not even sure where to start for this -- seems like it would need huge translation tables -- is Chrome talking to Google Translate for this, or something? Do you know?
"Sometimes, the best way to get what you want is to be a good person." -- Louis Rossmann
"Seek wisdom, not knowledge. Knowledge is of the past; wisdom is of the future." -- Native American proverb
"Linux makes everything difficult." -- Lyceus Anubite

_Poke_

Re: Interchangeable katakana and hiragana in Japanese searches

Unread post by _Poke_ » 2016-02-25, 05:39

It's not a translation, just an alphabet. Basically the same as being case insensitive.

So あ=ア, い=イ, etc.

Hiragana are U+3041 through U+3096, and it looks to me that they match perfectly with the Katakana at U+30A1 through U+30F6. (It'd be nice to get confirmation from a native speaker though? There are a few leftover kana which I don't recognise...)

EDIT: Sourced from https://en.wikipedia.org/wiki/Katakana#Unicode and https://en.wikipedia.org/wiki/Hiragana#Unicode
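
The alignment of the two ranges can be checked mechanically instead of by eye. The following is a minimal Python sketch, assuming that a matching Unicode character name is good enough evidence of a matching pair; `mismatched_pairs` and `KANA_OFFSET` are hypothetical names made up for this illustration.

```python
import unicodedata

HIRAGANA_FIRST, HIRAGANA_LAST = 0x3041, 0x3096
# Distance between the start of the two ranges quoted above (an assumption
# that the same distance holds for every character is what we are testing).
KANA_OFFSET = 0x30A1 - 0x3041


def mismatched_pairs():
    """Return (hiragana, katakana) pairs whose Unicode names disagree."""
    bad = []
    for cp in range(HIRAGANA_FIRST, HIRAGANA_LAST + 1):
        hira, kata = chr(cp), chr(cp + KANA_OFFSET)
        # e.g. "HIRAGANA LETTER A" is expected to pair with "KATAKANA LETTER A"
        expected = unicodedata.name(hira).replace("HIRAGANA", "KATAKANA")
        if unicodedata.name(kata) != expected:
            bad.append((hira, kata))
    return bad


print(mismatched_pairs())
```

An empty list means every hiragana in the range has a katakana of the same name at the same distance. This only checks the names in the Unicode database; it says nothing about how the characters are used in practice.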

steverowland

Re: Interchangeable katakana and hiragana in Japanese searches

Unread post by steverowland » 2016-02-25, 06:26

Oh no, it isn't translating anything; these are just the two ways to type in Japanese.
Basically, in the terms I'm trying to explain, the difference between 'あ' and 'ア' is like the difference between 'a' and 'A'.
What Chrome does is just "merge" the Unicode tables, I guess; I'm not enough of a programmer to know how that would work.

If you look at this table for hiragana https://en.wikipedia.org/wiki/Hiragana_ ... e_block%29 and this table for katakana https://en.wikipedia.org/wiki/Katakana_ ... e_block%29, the characters from U+3041 (ぁ) to U+3096 (ゖ) should be interchangeable with U+30A1 (ァ) to U+30F6 (ヶ).

If you look at this wiki page https://ja.wikipedia.org/wiki/あ and then the pages for the other characters (as seen in the table on the right side), you can see they always list both forms and their Unicode code points. So all the entries in the table that have two characters next to each other are the interchangeable ones (basically all except the small variants of some katakana letters, but those are rarely used and don't have a hiragana counterpart, so they're not important).

Here is a picture of how it looks in Chrome when I search for あ on that page: you can see it finds and highlights both variants, even though their Unicode code points are different.
[Image: screenshot of the Chrome search]

Also, one additional thing: the default way to type numbers on a Japanese keyboard is full-width, １２３４５６７８９０, and these are currently not interchangeable with the standard numbers 1234567890 either. Once again, Google Chrome manages to merge these code points for the same characters.
This matters, for example, when searching websites for dates, as some websites will write them as ８月１５日 while others use 8月15日 (15th of August), or for other numerical values displayed on forums and similar sites.
So once again, having these be interchangeable in searches would help a lot on Japanese websites.

There are other characters from the Japanese set that you might want to look into, such as +-*= being ＋－＊＝. There are more of these, and as far as I know all of them are interchangeable in Chrome (I have never had a search fail to match them).

Oh, and that reminds me: all the possible character widths too. There is, for example, ア and ｱ, and the same goes for A and Ａ. These should also be interchangeable, even the roman letters, as one website might list タイプA and another タイプＡ; both are "type A", but one is in regular width and one in full width.
Also, a lot of mobile-friendly websites use half-width katakana, so タイプ turns into ﾀｲﾌﾟ, because it takes less space on smaller screens. So once again, all of these should be interchangeable (and they are in Chrome).
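
One standard way to ignore width is Unicode compatibility normalization (NFKC), which maps full-width digits, letters and symbols to their ordinary forms and half-width katakana to full-width katakana. The Python sketch below shows the idea; `fold_width` is a hypothetical helper name, and whether Chrome actually uses NFKC for this is not known from this thread.

```python
import unicodedata


def fold_width(text: str) -> str:
    """Map full-width and half-width variants to their standard forms.

    NFKC is broader than width alone (it also folds ligatures and other
    compatibility characters), so treat this as an approximation.
    """
    return unicodedata.normalize("NFKC", text)


for sample in ["１２３４５６７８９０", "８月１５日", "＋－＊＝", "タイプＡ", "ﾀｲﾌﾟ"]:
    print(sample, "->", fold_width(sample))
```

Note that the folded string can be shorter than the input: half-width ﾌ followed by the half-width mark ﾟ becomes the single character プ. A real find-in-page implementation would have to map match positions back to the original text for highlighting.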

I hope I explained it well enough.

I really love Pale Moon, but for this reason I need to browse Japanese websites in Chrome, as searches not matching can get really annoying.

Moonchild

Re: Interchangeable katakana and hiragana in Japanese searches

Unread post by Moonchild » 2016-02-25, 06:37

If I recall correctly though, it's not that simple.
I'm not sure if just ignoring alphabet by default is very user-friendly. Aren't most meaningful Japanese words written in kana usually only written in either hiragana or katakana, not both?
I'm afraid ignoring this will also hit completely meaningless strings.

steverowland

Re: Interchangeable katakana and hiragana in Japanese searches

Unread post by steverowland » 2016-02-25, 06:54

While most are, it's the other cases that make searching harder. On very formal, official, grammatically and typographically perfect websites you will probably not run into many problems, but on discussion forums, blogs, and similar informal websites people use whatever they like. (To take an extreme example, it is as if you visited an English blog whose author capitalizes every noun: without case-insensitive search you would have a very hard time searching that website.)
And often it is not even incorrect to use the other form, as I mentioned in my previous post about mobile-friendly websites displaying half-width katakana instead of full-width.

Perhaps there could be an option to turn this off, just like the existing "Match case" option, in case someone really wants to search for exactly the case and width they typed.

_Poke_

Re: Interchangeable katakana and hiragana in Japanese searches

Unread post by _Poke_ » 2016-02-25, 07:03

I suggest attaching it to the Match case checkbox. (It might be a little more user-friendly to detect kana/Japanese input mode and add another dedicated checkbox, but wanting to control them independently sounds like an edge case.)
There may be a few cases where the same spelling means different things based on the alphabet, but there are many more cases where people will type in the wrong alphabet, whether it's to be casual or a deliberate stylistic choice.
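
To make the suggestion concrete, here is a minimal Python sketch of a plain-string search where a single Match case flag controls both letter case and kana type. `fold` and `find_all` are hypothetical names for this illustration; this is not Pale Moon or Chrome code, and a real find-in-page works on the document, not on one string.

```python
# Hiragana U+3041..U+3096 mapped onto katakana by a fixed distance (assumed 0x60).
_HIRA_TO_KATA = {cp: cp + 0x60 for cp in range(0x3041, 0x3097)}


def fold(text, match_case=False):
    """Return the form of `text` that is actually compared."""
    if match_case:
        return text  # exact comparison: neither case nor kana type is ignored
    return text.translate(_HIRA_TO_KATA).lower()


def find_all(haystack, needle, match_case=False):
    """Return the start index of every (possibly overlapping) match."""
    h, n = fold(haystack, match_case), fold(needle, match_case)
    hits, start = [], h.find(n)
    while n and start != -1:
        hits.append(start)
        start = h.find(n, start + 1)
    return hits


page = "ハンターはんたー"
print(find_all(page, "はん"))                   # kana type ignored
print(find_all(page, "はん", match_case=True))  # exact match only
```

With the flag off, typing はん finds both the katakana and the hiragana spelling; with it on, only the exact hiragana occurrence is reported.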

Moonchild

Re: Interchangeable katakana and hiragana in Japanese searches

Unread post by Moonchild » 2016-02-25, 07:25

Ignoring the width is probably a good thing to do, regardless, since that won't cause any difference in meaning.
I guess I can attach the katakana<->hiragana difference to the "case sensitive" checkbox, indeed -- and leave fine-grained options to extensions if people really need it.
Since I'm not Japanese and don't have Japanese input methods available (nor know how to interpret whether something is actually matched correctly), I'll have to rely on you to verify that something works as intended. Are you willing to give some potentially unstable betas a spin once I've had time to look at this?

Are these alphabet characters all offset by a fixed amount in unicode? If so that would make it a lot simpler to code.

_Poke_

Re: Interchangeable katakana and hiragana in Japanese searches

Unread post by _Poke_ » 2016-02-25, 07:41

Is a set offset something specific? You should be able to add 60 (in hex) to any hiragana to get its corresponding katakana. The half-width characters are in a different order, so they will probably need the more complex approach.
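
The difference between the two cases can be seen by measuring the distances directly. This Python sketch assumes that NFKC normalization gives the right full-width partner for each half-width katakana letter (U+FF66 to U+FF9D); `HALFWIDTH_TO_FULL` and `distinct_offsets` are names invented for the example.

```python
import unicodedata

# Hiragana: one fixed distance works, e.g. は (U+306F) + 0x60 = ハ (U+30CF).
print(chr(ord("は") + 0x60))

# Half-width katakana letters ｦ .. ﾝ, each paired with its full-width form.
HALFWIDTH_TO_FULL = {
    chr(cp): unicodedata.normalize("NFKC", chr(cp))
    for cp in range(0xFF66, 0xFF9E)
}

# If a single offset worked here too, this set would have exactly one element.
distinct_offsets = {ord(full) - ord(half) for half, full in HALFWIDTH_TO_FULL.items()}
print(len(distinct_offsets))
```

Because the half-width block yields many different distances, a lookup table such as `HALFWIDTH_TO_FULL` is needed there, while hiragana can be handled with simple arithmetic.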

I can test when the time comes, though I don't have a physical Japanese keyboard (just the IME conversion from using a US keyboard) so not sure I can test every relevant character.

(If you want to set up the IME keyboard on your own computer you can find it in the language settings.)

steverowland

Re: Interchangeable katakana and hiragana in Japanese searches

Unread post by steverowland » 2016-02-25, 08:12

Yes I can also help testing, thanks for looking into this!

Moonchild

Re: Interchangeable katakana and hiragana in Japanese searches

Unread post by Moonchild » 2016-02-26, 09:52

Apparently there's a Bugzilla bug for this from 2001: bug #71893!
Status "assigned" meaning that "someone is working on it and will supply patches soon" for 15 years and counting :P

steverowland

Re: Interchangeable katakana and hiragana in Japanese searches

Unread post by steverowland » 2016-02-29, 14:32

Maybe the assigned person traveled to Japan to become a monk and study the moon runes in the temples there, in order to better understand the issue.