This program accepts input in ローマ字 (rōmaji) and produces the equivalent ひらがな (hiragana) and カタカナ (katakana) output. This program does not map arbitrary English phonemes into the Japanese sound system, nor does it translate from English to Japanese.

Transliteration uses an associative array (hash table) that maps syllables written in Latin letters to 仮名 (kana). Lowercase input produces ひらがな; uppercase input produces カタカナ. Some punctuation is also allowed, though the Romanization for the lesser-used brackets is quite non-standard.
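
As a rough illustration of the table-driven approach, the following Python sketch does a greedy longest-match lookup against a dictionary. The table contents, the names HIRAGANA, TABLE, and transliterate, and the trick of deriving the uppercase カタカナ entries by shifting code points are all assumptions made for this example; the program's actual table and lookup strategy may differ.

```python
# Hypothetical, heavily truncated mapping table (the real one is far larger).
HIRAGANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "si": "し", "shi": "し",   # Nihon-siki/Kunrei-siki and Hepburn spellings
    "ti": "ち", "chi": "ち",
    "na": "な", "ni": "に",
    "kya": "きゃ",
}


def to_katakana(hiragana: str) -> str:
    """Shift hiragana (U+3041..U+3096) up by 0x60 into the katakana block."""
    return "".join(
        chr(ord(c) + 0x60) if "\u3041" <= c <= "\u3096" else c for c in hiragana
    )


# Uppercase keys map to katakana, lowercase keys to hiragana.
KATAKANA = {key.upper(): to_katakana(kana) for key, kana in HIRAGANA.items()}
TABLE = {**HIRAGANA, **KATAKANA}
MAX_KEY = max(len(key) for key in TABLE)


def transliterate(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so "kya" wins over "k" + "ya".
        for size in range(min(MAX_KEY, len(text) - i), 0, -1):
            kana = TABLE.get(text[i:i + size])
            if kana is not None:
                out.append(kana)
                i += size
                break
        else:
            out.append(text[i])  # no match: pass the character through
            i += 1
    return "".join(out)
```

With this toy table, transliterate("kakiku") gives かきく and transliterate("KAKI") gives カキ, while "sika" and "shika" both give しか.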

Some attempt has been made to simultaneously support the most common Romanization systems: 修正ヘボン式ローマ字 (Modified Hepburn), 訓令式ローマ字 (Kunrei-siki), and 日本式ローマ字 (Nihon-siki), though completely meeting that goal is impossible. Where different systems conflict, precedence has been given to 日本式ローマ字 since it is the most regular and has a 1-to-1 relation between 仮名 and rōmaji.

Unrecognized characters in the input will be returned unprocessed but highlighted in a different color.
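
One possible way to implement this pass-through is sketched below in Python. It assumes HTML output in which unrecognized runs are wrapped in a span; the class name "unknown", the tiny TABLE, and the function name transliterate_html are hypothetical and not taken from the program itself.

```python
from html import escape

# Hypothetical excerpt of the mapping table.
TABLE = {"ka": "か", "na": "な", "sa": "さ"}
MAX_KEY = max(len(key) for key in TABLE)


def transliterate_html(text: str) -> str:
    out = []
    unknown = []  # current run of consecutive unrecognized characters

    def flush() -> None:
        # Emit the pending unrecognized run, escaped and wrapped for styling.
        if unknown:
            out.append('<span class="unknown">' + escape("".join(unknown)) + "</span>")
            unknown.clear()

    i = 0
    while i < len(text):
        for size in range(min(MAX_KEY, len(text) - i), 0, -1):
            kana = TABLE.get(text[i:i + size])
            if kana is not None:
                flush()
                out.append(kana)
                i += size
                break
        else:
            unknown.append(text[i])  # keep the character unprocessed
            i += 1
    flush()
    return "".join(out)
```

A stylesheet rule for the assumed class (a different text color, for instance) would then make the unprocessed characters stand out.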

Quotes

Apostrophes (single quotes) are used to differentiate ん (syllabic n) from a doubled consonant when it appears before one of な に ぬ ね の. They are similarly used to prevent ん from binding with vowels and y syllables. For example, 「んな」, 「んあ」, and 「んや」 ("n'na", "n'a", and "n'ya") all require an apostrophe, while 「っな」, 「な」, and 「にゃ」 ("nna", "na", and "nya") do not.
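
The apostrophe rule can be sketched in Python as follows. This is a minimal illustration, not the program's actual logic: the SYLLABLES excerpt and the function name convert are hypothetical, and treating a bare "n" before any other consonant (or at the end of input) as ん is an assumption that the text above does not state explicitly.

```python
# Hypothetical excerpt of the syllable table (lowercase only).
SYLLABLES = {"a": "あ", "ya": "や", "na": "な", "nya": "にゃ", "ka": "か"}
VOWELS = "aiueo"


def convert(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        # "n'" is an explicit syllabic n; the apostrophe is consumed.
        if text.startswith("n'", i):
            out.append("ん")
            i += 2
            continue
        # A doubled consonant (including "nn") yields a small っ.
        if (i + 1 < len(text) and text[i] == text[i + 1]
                and text[i].isalpha() and text[i] not in VOWELS):
            out.append("っ")
            i += 1
            continue
        # Otherwise take the longest matching syllable.
        for size in range(min(3, len(text) - i), 0, -1):
            kana = SYLLABLES.get(text[i:i + size])
            if kana is not None:
                out.append(kana)
                i += size
                break
        else:
            # Assumption: a leftover "n" is a syllabic n; anything else passes through.
            out.append("ん" if text[i] == "n" else text[i])
            i += 1
    return "".join(out)
```

Run against the six examples above, this sketch produces んな, んあ, and んや for "n'na", "n'a", and "n'ya", and っな, な, and にゃ for "nna", "na", and "nya".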

The program attempts to translate Latin quotes, as used in English, into left and right corner brackets (「」『』) as appropriate. However, there is a peculiarity in the way that white corner brackets (『 and 』) are represented:

Normally single quotes would be used to represent white corner brackets, but this is not possible since single quotes have already been used as described previously. Instead, white corner brackets are represented by doubled double quotes (""). This is admittedly non-standard and may change if a better solution is discovered.

A limitation of the quote translation is that quotes must be balanced. That is, if a set of regular corner brackets is opened, it is necessary to close it before the next set of the same type of bracket may be opened. For example, this ordering of brackets is possible: 「『』『』」「」 (and may be generated with this rather improbable sequence: "'""'""'""'""'"'"'") while this one is not: 「『」』.
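
One way to enforce this balancing rule is a small stack of open bracket types, sketched below in Python. Several things here are assumptions rather than documented behavior: that a lone apostrophe between quote tokens acts as a separator and emits nothing (inferred from the example sequence above), that "" is matched before a single ", and that a mis-nested sequence raises an error. The function name convert_quotes is hypothetical.

```python
BRACKETS = {'"': ("「", "」"), '""': ("『", "』")}


def convert_quotes(text: str) -> str:
    out = []
    stack = []  # quote tokens that are currently open, innermost last
    i = 0
    while i < len(text):
        if text[i] == "'":
            i += 1  # assumption: the apostrophe only separates tokens
            continue
        if text.startswith('""', i):
            token = '""'
        elif text[i] == '"':
            token = '"'
        else:
            out.append(text[i])  # not a quote: pass through
            i += 1
            continue
        opening, closing = BRACKETS[token]
        if stack and stack[-1] == token:
            stack.pop()  # closes the innermost open bracket
            out.append(closing)
        elif token in stack:
            # Same type is open further out, e.g. 「『」 (assumed to be an error).
            raise ValueError("mis-nested quotes at position %d" % i)
        else:
            stack.append(token)
            out.append(opening)
        i += len(token)
    return "".join(out)
```

Joining the tokens ", "", "", "", "", ", ", " with apostrophes reproduces the example sequence above, and this sketch turns it into 「『』『』」「」; the tokens for 「『」 are rejected.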

X Characters

Some characters are difficult to represent in rōmaji. The Romanization of these characters is prefixed with an "x". The cases where "x" is used are:

Output

Several output encodings are available. UTF-8 is the preferred encoding, and all modern web browsers should natively support it. ISO-8859-1 encoding with XHTML &#codes; is provided as a second option for older browsers that do not properly handle Unicode. In addition, the following legacy encodings are provided: EUC-JP, Shift_JIS, ISO-2022-JP. Please note that the legacy encodings have not been as thoroughly tested.
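
For illustration, the encoding options above map fairly directly onto Python's standard codecs. The sketch below is an assumption about how such a step could look, not the program's own code; encode_output is a hypothetical name. The ISO-8859-1 case relies on the "xmlcharrefreplace" error handler, which replaces unencodable characters with decimal &#...; references.

```python
def encode_output(text: str, encoding: str = "utf-8") -> bytes:
    """Encode converted text for delivery in the requested encoding."""
    if encoding.lower() in ("iso-8859-1", "latin-1"):
        # Kana cannot be stored in ISO-8859-1, so emit numeric character references.
        return text.encode("iso-8859-1", errors="xmlcharrefreplace")
    return text.encode(encoding)


if __name__ == "__main__":
    sample = "ひらがな"
    for name in ("UTF-8", "ISO-8859-1", "EUC-JP", "Shift_JIS", "ISO-2022-JP"):
        print(name, encode_output(sample, name))
```

Encoding ひ (U+3072) as ISO-8859-1 this way yields the bytes for &#12402;, while the three legacy encodings round-trip kana through their own byte sequences.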

All pages except the Source Code page inherit the encoding set on the main page. The Source Code page always uses UTF-8 since that is what the script is actually encoded in and a different encoding may break it.

Meta

The idea for this program was borrowed from Joel Yliluoma (Bisqwit)'s Romaji to hiragana converter. Though the source for Bisqwit's program is generously provided, no actual code was borrowed.

The main reasons for writing this program were to gain a deeper understanding of rōmaji to 仮名 transliteration, to play with Unicode and other encoding methods for non-English languages, to work more with 仮名, and to start working with 漢字 (kanji), which for now appear only in the comments and documentation.

The previously linked Wikipedia articles, as well as several others linked from the Japanese Writing entry, were helpful in researching how transliteration is done. The Kotoeri input method used on Mac OS X was also helpful in deciding how to deal with various edge cases.