
I've recently started learning Japanese in my free time, and since I'm a developer, I figured "hey, let's build my own study thingy".

Among other things, I implemented a tokenizer that lets me split Japanese sentences into lexemes, so I can easily identify each and every grammatical component in the sentences I use.

But I'm currently facing one major issue: the tokens I get from the Kuromoji tokenizer are, well, in Japanese, and despite all my research, I haven't found any viable translation of those tokens.
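As a minimal sketch of what those tokens look like: kuromoji.js returns each token as a plain object whose `pos` and `pos_detail_1` to `pos_detail_3` fields hold the IPADIC part-of-speech chain, with `*` marking unused levels. The helper below (`describeToken` and `posLabels` are hypothetical names, and the table is only a tiny excerpt) turns that chain into English and keeps the raw Japanese for any tag not yet translated, so gaps in the table stay visible.

```javascript
// Tiny hypothetical excerpt of a part-of-speech translation table.
const posLabels = {
  名詞: 'noun',
  一般: 'common',
  動詞: 'verb',
  自立: 'independent',
  助詞: 'particle',
  格助詞: 'case-marking particle',
};

// Translate the IPADIC part-of-speech chain of a kuromoji.js-style token.
// Levels equal to '*' are unused and skipped; unknown tags stay in Japanese.
function describeToken(token) {
  return [token.pos, token.pos_detail_1, token.pos_detail_2, token.pos_detail_3]
    .filter((tag) => tag && tag !== '*')
    .map((tag) => posLabels[tag] ?? tag)
    .join(' / ');
}

// Example token shaped like kuromoji.js output for 黒文字.
const token = {
  surface_form: '黒文字',
  pos: '名詞',
  pos_detail_1: '一般',
  pos_detail_2: '*',
  pos_detail_3: '*',
};
console.log(describeToken(token)); // noun / common
```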

I've tried using Google Translate, Jisho and Romajidesu to "guess" the various meanings, but it's far from a perfect solution.

That's why I came here, hoping someone could help me get at least a translation of those tokens. I know Stack Exchange users tend to dislike this kind of post, and I apologize for it, but I figured this was probably the best place to get some help.

Without further ado, I proudly present what I came up with:

Short version (most recurring bits):

接続 became 'conjunction' (based on Jisho's translation of 接続語)

副詞 became 'adverb' and 副詞化 became 'adverbization'. I did not actually find a translation for that word, but according to Jisho 化 may mean 'action of making something; -ification', so I understood it as 'something that turns words into adverbs'. 副詞可能 became 'potential adverb form', because 可能形 translates (again according to Jisho) to 'potential form' in linguistic terminology.

助動詞 became 'inflecting dependent word', because that seems to be what it means when talking about Japanese (for other languages it would mean 'auxiliary').

助詞 became 'particle' (unless a translation would lead me to a very specific terminology)

命令 became 'imperative'

促音便 became 'sokuonbin', since there's no actual translation (as far as I know, or rather as far as Jisho knows).

基本形 became 'uninflected word' for a reason I can no longer remember as I write these lines. I probably meant that the "fundamental form" of a word is its unchanged, and therefore "uninflected", form.

未然 became 'imperfective form'

特殊 became 'irregular', because it translated to 'unique, peculiar...'

連用 became 'continuative', since 連用形 translates to 'continuative form'

カ変 became 'kahen'. I considered translating it as 'irregular ka', but found that less understandable, especially once "kuru", for instance, is added to the mix.

一段・得ル: I kept Google Translate's version for this one, but I highly doubt it should be translated as "ichidan profit".

上二 became 'upper inflection' because 上二段活用 translates to "conjugation (inflection, declension) of nidan verbs (resulting in a stem of either "i" or "u" for every conjugation)"

下二 became 'lower inflection' for the same reason as above.

五段 became godan

イ音便 became ionbin, because it has no real translation.

五段・カ行促音便ユク: godan ka line sokuonbin yuku. I'm guessing the "yuku" is some kind of suffix or form.

文語 became 'formal'. That's what I came up with, since it seems to refer to things one would say in written Japanese.
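Several of the tags above (イ音便, 促音便, and the ユク variant) name sound changes that godan verbs undergo before て and た. As a rough illustration using standard textbook rules (not Kuromoji's own code; `godanTeForm` is a hypothetical helper), the te-form shows these changes side by side. The ユク tag presumably relates to 行く (ゆく), the main ka-line verb that takes 促音便 (行って) instead of イ音便.

```javascript
// Te-form of a godan (五段) verb, spelling out the euphonic changes (音便)
// that IPADIC conjugated_type tags refer to. Simplified sketch: assumes the
// input is a godan verb in dictionary form and ignores verbs such as 問う,
// which take ウ音便 (問うて) and are tagged 五段・ワ行ウ音便.
function godanTeForm(verb) {
  const stem = verb.slice(0, -1);
  // 行く is the ka-line exception: 促音便 (行って) instead of イ音便.
  if (verb.endsWith('行く') || verb === 'いく') {
    return stem + 'って';
  }
  switch (verb.slice(-1)) {
    case 'く':
      return stem + 'いて'; // イ音便: 書く → 書いて
    case 'ぐ':
      return stem + 'いで'; // イ音便 (voiced): 泳ぐ → 泳いで
    case 'う':
    case 'つ':
    case 'る':
      return stem + 'って'; // 促音便 (gemination): 待つ → 待って
    case 'ぬ':
    case 'ぶ':
    case 'む':
      return stem + 'んで'; // 撥音便 (moraic n): 読む → 読んで
    case 'す':
      return stem + 'して'; // no euphonic change: 話す → 話して
    default:
      throw new Error(`not a godan dictionary form: ${verb}`);
  }
}

console.log(['書く', '泳ぐ', '待つ', '読む', '話す', '行く'].map(godanTeForm).join(' '));
// 書いて 泳いで 待って 読んで 話して 行って
```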

Long version (the three objects with all their variants, at least the ones found in Harry Potter):

const pos = {
  その他: 'other',
  アルファベット: 'alphabet',
  サ変接続: 'sahen conjunction',
  ナイ形容詞語幹: 'nai adjective',
  フィラー: 'filler',
  一般: 'common',
  並立助詞: 'parallel marker',
  人名: 'person name',
  代名詞: 'pronoun',
  係助詞: 'particle',
  副助詞: 'adverbial particle',
  '副助詞/並立助詞/終助詞': 'adverbial particle / parallel marker / sentence-ending particle',
  副詞: 'adverb',
  副詞化: 'adverbization',
  副詞可能: 'frequency adverb',
  助動詞: 'inflecting dependent word',
  助動詞語幹: 'inflecting dependent word stem',
  助数詞: 'measure word',
  助詞: 'particle',
  助詞類接続: 'particle conjunction',
  動詞: 'verb',
  動詞接続: 'verb conjunction',
  動詞非自立的: 'verb non-independent',
  句点: 'period',
  名: 'name',
  名詞: 'noun',
  名詞接続: 'noun conjunction',
  固有名詞: 'proper noun',
  国: 'country',
  地域: 'area',
  姓: 'surname',
  引用: 'quote',
  形容動詞語幹: 'adjectival noun',
  形容詞: 'adjective',
  形容詞接続: 'adjective conjunction',
  感動詞: 'interjection',
  括弧閉: 'closed parentheses',
  括弧開: 'open parentheses',
  接尾: 'suffix',
  接続助詞: 'conjunction particle',
  接続詞: 'conjunction',
  接続詞的: 'conjunction',
  接頭詞: 'prefix',
  数: 'number',
  数接続: 'number conjunction',
  格助詞: 'case-marking particle',
  特殊: 'irregular',
  空白: 'blank',
  終助詞: 'sentence-ending particle',
  組織: 'organization',
  縮約: 'contraction',
  自立: 'independent',
  記号: 'symbol',
  読点: 'comma',
  連体化: 'attributive form',
  連体詞: 'adnominal adjective',
  連語: 'collocation',
  間投: 'intermittent throw',
  非自立: 'non independent',
};

const conjugatedForms = {
  ガル接続: 'garu conjunction',
  仮定形: 'imaginary form', 仮定縮約1: 'contraction', 仮定縮約2: 'contraction',
  体言接続: 'word conjunction', 体言接続特殊: 'irregular conjunction', 体言接続特殊2: 'irregular conjunction',
  命令e: 'imperative e', 命令i: 'imperative i', 命令ro: 'imperative ro', 命令yo: 'imperative yo',
  基本形: 'uninflected word', '基本形-促音便': 'uninflected word - sokuonbin', 文語基本形: 'uninflected words',
  未然ウ接続: 'imperfective u conjunction', 未然ヌ接続: 'imperfective nu conjunction', 未然レル接続: 'imperfective reru conjunction',
  未然形: 'imperfective form', 未然特殊: 'irregular imperfective',
  現代基本形: 'modern uninflected words',
  連用ゴザイ接続: 'continuative gozai conjunction', 連用タ接続: 'continuative ta conjunction',
  連用テ接続: 'continuative te conjunction', 連用デ接続: 'continuative de conjunction',
  連用ニ接続: 'continuative ni conjunction', 連用形: 'continuative form',
  音便基本形: 'uninflected word',
};

const conjugatedTypes = {
  'カ変・クル': 'kahen kuru', 'カ変・来ル': 'kahen kuru',
  'サ変・−スル': 'sahen suru', 'サ変・−ズル': 'sahen -zuru', 'サ変・スル': 'sahen suru',
  ラ変: 'rahen',
  一段: 'ichidan', '一段・クレル': 'ichidan kureru', '一段・得ル': 'ichidan profit',
  '上二・ダ行': 'nidan upper inflection da line',
  '下二・ガ行': 'nidan lower inflection ga line', '下二・タ行': 'nidan lower inflection ta line', '下二・ダ行': 'nidan lower inflection da line',
  不変化型: 'invariant',
  '五段・カ行イ音便': 'godan ka line ionbin', '五段・カ行促音便': 'godan ka line sokuonbin', '五段・カ行促音便ユク': 'godan ka line sokuonbin + yuku',
  '五段・ガ行': 'godan ga line', '五段・サ行': 'godan sa line', '五段・タ行': 'godan ta line', '五段・ナ行': 'godan na line',
  '五段・バ行': 'godan ba line', '五段・マ行': 'godan ma line',
  '五段・ラ行': 'godan ra line', '五段・ラ行アル': 'godan ra line + aru', '五段・ラ行特殊': 'godan ra line irregular',
  '五段・ワ行ウ音便': 'godan wa conjugation', '五段・ワ行促音便': 'godan wa line sokuonbin',
  '四段・ハ行': 'yodan ha line',
  '形容詞・アウオ段': 'adjective auodan', '形容詞・イイ': 'ii adjective', '形容詞・イ段': 'adjective idan',
  '文語・キ': 'formal ki', '文語・ケリ': 'formal keri', '文語・ゴトシ': 'formal gotoshi', '文語・ナリ': 'formal nari',
  '文語・ベシ': 'formal beshi', '文語・マジ': 'formal maji', '文語・リ': 'formal ri', '文語・ル': 'formal ru',
  '特殊・ジャ': 'irregular ja', '特殊・タ': 'irregular ta', '特殊・タイ': 'irregular tai', '特殊・ダ': 'irregular da',
  '特殊・デス': 'irregular desu', '特殊・ナイ': 'irregular nai', '特殊・ヌ': 'irregular nu', '特殊・マス': 'irregular masu',
  '特殊・ヤ': 'irregular ya',
};
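Many conjugatedTypes keys are compounds joined by ・, so one alternative to listing every combination by hand is to translate the parts separately. The sketch below is a hypothetical helper (`translateConjugatedType`, with deliberately partial tables); anything it does not recognise is left in Japanese.

```javascript
// Hypothetical partial tables; extend them as new tags turn up.
const heads = {
  五段: 'godan', 四段: 'yodan', 一段: 'ichidan',
  上二: 'upper nidan', 下二: 'lower nidan', 文語: 'kobun', 特殊: 'irregular',
};
// Romanized kana rows, as in カ行 ("ka line") or ラ行 ("ra line").
const rows = {
  カ: 'ka', ガ: 'ga', サ: 'sa', タ: 'ta', ダ: 'da', ナ: 'na',
  ハ: 'ha', バ: 'ba', マ: 'ma', ラ: 'ra', ワ: 'wa',
};
const euphonicChanges = {
  イ音便: 'i euphonic change',
  ウ音便: 'u euphonic change',
  促音便: 'double consonant euphonic change',
};

// Translate a compound tag such as '五段・カ行イ音便' part by part.
// Unrecognized parts are kept in Japanese.
function translateConjugatedType(tag) {
  const [head, tail] = tag.split('・');
  const parts = [heads[head] ?? head];
  if (tail !== undefined) {
    const match = tail.match(/^(.)行(.*)$/);
    if (match && rows[match[1]]) {
      parts.push(`${rows[match[1]]} line`);
      if (match[2]) parts.push(euphonicChanges[match[2]] ?? match[2]);
    } else {
      parts.push(tail);
    }
  }
  return parts.join(' ');
}

console.log(translateConjugatedType('五段・カ行イ音便')); // godan ka line i euphonic change
```

Compared with the hand-written object, this keeps the row romanization in one place, at the cost of less natural labels for one-off tags such as 一段・得ル, which would still need an explicit entry.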

This was a long message and if you read it all, I thank you for your patience.

I'm French, so if I messed up some of my sentences to the point where you can't understand them, feel free to ask me to rephrase them.

If you feel this kind of post doesn't belong on this site, then again, I apologize, and I'll gladly accept any link to other websites meant for this kind of question.

Thanks again to anyone willing to help me out.

*** UPDATE: Answers to @naruto's questions and a couple of follow-up questions ***

Yes, these tokens are based on the IPADIC dictionary. I should have mentioned that, but I completely forgot; sorry about that. I had tried to find information like the page you linked, but found nothing: only technical programming material or dead links.

If I may, I'd like to ask you a couple more things while answering your questions:

  • I don't remember seeing "imperfective" used anywhere. It's just the word I came up with while matching the various translations I found to the French and English grammatical terms I know. It reminded me of English's "perfect" / "pluperfect" and French's "imparfait" / "plus-que-parfait" tenses.

  • I'm guessing that する is the only important word belonging to サ変, just like 来る for カ変?

  • Regarding 上二 / 下二, I made a mistake in my "short version": I actually translated them like this: '上二・ダ行': 'nidan upper inflection da line', which is very... verbose. Based on your translation, I guess I could go with 'upper / lower nidan Xa line'?

  • Regarding 数接続, I really don't know. Not only do I have almost no notion of Japanese grammar (yet!), but the tokens I collected were also retrieved out of context. As I said in my first post, I fed my script all the Harry Potter books at once, then printed the tokens it found along with how many times it found each one.
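The counting step described in the last point can be sketched as follows: group tokens by their tag combination and sort by frequency. The token objects are assumed to follow kuromoji.js field names (`pos`, `pos_detail_1`, `conjugated_type`, `conjugated_form`); `countTagCombinations` and the sample data are hypothetical.

```javascript
// Count how often each tag combination occurs in kuromoji.js-style tokens.
function countTagCombinations(tokens) {
  const counts = new Map();
  for (const t of tokens) {
    const key = [t.pos, t.pos_detail_1, t.conjugated_type, t.conjugated_form].join(' | ');
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  // Most frequent combinations first (sort is stable, so ties keep first-seen order).
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Hypothetical sample; real input would come from tokenizing the full text.
const sample = [
  { pos: '助詞', pos_detail_1: '格助詞', conjugated_type: '*', conjugated_form: '*' },
  { pos: '動詞', pos_detail_1: '自立', conjugated_type: '一段', conjugated_form: '連用形' },
  { pos: '助詞', pos_detail_1: '格助詞', conjugated_type: '*', conjugated_form: '*' },
];
console.log(countTagCombinations(sample));
```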

Here's an updated version of my objects:

const partOfSpeech = {
  その他: 'other',
  アルファベット: 'alphabet',
  サ変接続: 'sahen conjunction',
  ナイ形容詞語幹: 'nai adjective',
  フィラー: 'filler',
  一般: 'common',
  並立助詞: 'parallel marker',
  人名: 'person name',
  代名詞: 'pronoun',
  係助詞: 'particle',
  副助詞: 'adverbial particle',
  '副助詞/並立助詞/終助詞': 'adverbial particle / parallel marker / sentence-ending particle',
  副詞: 'adverb',
  副詞化: 'adverbization',
  副詞可能: 'adverbizable',
  助動詞: 'auxiliary',
  助動詞語幹: 'auxiliary stem',
  助数詞: 'counter',
  助詞: 'particle',
  助詞類接続: 'particle conjunction',
  動詞: 'verb',
  動詞接続: 'verb conjunction',
  動詞非自立的: 'verb non-independent',
  句点: 'period',
  名: 'name',
  名詞: 'noun',
  名詞接続: 'noun conjunction',
  固有名詞: 'proper noun',
  国: 'country',
  地域: 'area',
  姓: 'surname',
  引用: 'quote',
  形容動詞語幹: 'adjectival noun',
  形容詞: 'adjective',
  形容詞接続: 'adjective conjunction',
  感動詞: 'interjection',
  括弧閉: 'closed parentheses',
  括弧開: 'open parentheses',
  接尾: 'suffix',
  接続助詞: 'conjunction particle',
  接続詞: 'conjunction',
  接続詞的: 'conjunction',
  接頭詞: 'prefix',
  数: 'number',
  数接続: 'suffix to a number',
  格助詞: 'case-marking particle',
  特殊: 'irregular',
  空白: 'blank',
  終助詞: 'sentence-ending particle',
  組織: 'organization',
  縮約: 'contraction',
  自立: 'independent',
  記号: 'symbol',
  読点: 'comma',
  連体化: 'attributive form',
  連体詞: 'adnominal adjective',
  連語: 'collocation',
  間投: 'interjectory',
  非自立: 'non independent',
};

const conjugatedForms = {
  ガル接続: 'garu conjunction',
  仮定形: 'imaginary form', 仮定縮約1: 'contraction', 仮定縮約2: 'contraction',
  体言接続: 'word conjunction', 体言接続特殊: 'irregular conjunction', 体言接続特殊2: 'irregular conjunction',
  命令e: 'imperative e', 命令i: 'imperative i', 命令ro: 'imperative ro', 命令yo: 'imperative yo',
  基本形: 'uninflected word', '基本形-促音便': 'uninflected word - double consonant euphonic change',
  文語基本形: 'kobun uninflected word',
  未然ウ接続: 'irrealis u conjunction', 未然ヌ接続: 'irrealis nu conjunction', 未然レル接続: 'irrealis reru conjunction',
  未然形: 'irrealis form', 未然特殊: 'irrealis irregular',
  現代基本形: 'modern Japanese uninflected word',
  連用ゴザイ接続: 'continuative gozai conjunction', 連用タ接続: 'continuative ta conjunction',
  連用テ接続: 'continuative te conjunction', 連用デ接続: 'continuative de conjunction',
  連用ニ接続: 'continuative ni conjunction', 連用形: 'continuative form',
  音便基本形: 'uninflected word',
};

const conjugatedTypes = {
  'カ変・クル': 'kahen kuru', 'カ変・来ル': 'kahen kuru',
  'サ変・−スル': 'sahen suru', 'サ変・−ズル': 'sahen -zuru', 'サ変・スル': 'sahen suru',
  ラ変: 'rahen',
  一段: 'ichidan', '一段・クレル': 'ichidan kureru', '一段・得ル': 'ichidan profit',
  '上二・ダ行': 'upper nidan da line',
  '下二・ガ行': 'lower nidan ga line', '下二・タ行': 'lower nidan ta line', '下二・ダ行': 'lower nidan da line',
  不変化型: 'invariant',
  '五段・カ行イ音便': 'godan ka line i euphonic change',
  '五段・カ行促音便': 'godan ka line double consonant euphonic change',
  '五段・カ行促音便ユク': 'godan ka line double consonant euphonic change + yuku',
  '五段・ガ行': 'godan ga line', '五段・サ行': 'godan sa line', '五段・タ行': 'godan ta line', '五段・ナ行': 'godan na line',
  '五段・バ行': 'godan ba line', '五段・マ行': 'godan ma line',
  '五段・ラ行': 'godan ra line', '五段・ラ行アル': 'godan ra line + aru', '五段・ラ行特殊': 'godan ra line irregular',
  '五段・ワ行ウ音便': 'godan wa line u euphonic change',
  '五段・ワ行促音便': 'godan wa line double consonant euphonic change',
  '四段・ハ行': 'yodan ha line',
  '形容詞・アウオ段': 'adjective auodan', '形容詞・イイ': 'ii adjective', '形容詞・イ段': 'adjective idan',
  '文語・キ': 'kobun ki', '文語・ケリ': 'kobun keri', '文語・ゴトシ': 'kobun gotoshi', '文語・ナリ': 'kobun nari',
  '文語・ベシ': 'kobun beshi', '文語・マジ': 'kobun maji', '文語・リ': 'kobun ri', '文語・ル': 'kobun ru',
  '特殊・ジャ': 'irregular ja', '特殊・タ': 'irregular ta', '特殊・タイ': 'irregular tai', '特殊・ダ': 'irregular da',
  '特殊・デス': 'irregular desu', '特殊・ナイ': 'irregular nai', '特殊・ヌ': 'irregular nu', '特殊・マス': 'irregular masu',
  '特殊・ヤ': 'irregular ya',
};
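To make some of the conjugatedForms names concrete, here is a hand-written illustration of how they apply to the ichidan verb 食べる. It uses standard textbook endings rather than Kuromoji output, and the mapping from endings to IPADIC form names is an assumption worth checking against real tokens; `ichidanForms` is a hypothetical helper.

```javascript
// Hand-written endings for an ichidan verb; the keys follow the
// conjugatedForms tags, but this is an illustration, not tokenizer output.
const ichidanEndings = {
  未然形: '', // irrealis: 食べ(ない)
  連用形: '', // continuative: 食べ(ます), 食べ(た)
  基本形: 'る', // uninflected / dictionary form: 食べる
  仮定形: 'れ', // hypothetical: 食べれ(ば)
  命令ro: 'ろ', // imperative: 食べろ
  命令yo: 'よ', // imperative, more literary: 食べよ
};

// Build the forms above for an ichidan verb given in dictionary form.
function ichidanForms(dictionaryForm) {
  if (!dictionaryForm.endsWith('る')) {
    throw new Error(`expected an ichidan dictionary form ending in る: ${dictionaryForm}`);
  }
  const stem = dictionaryForm.slice(0, -1);
  return Object.fromEntries(
    Object.entries(ichidanEndings).map(([form, ending]) => [form, stem + ending]),
  );
}

console.log(ichidanForms('食べる'));
```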

  • Related? https://japanese.meta.stackexchange.com/q/352/9831 – chocolate Sep 22 '20 at 01:27
  • @Chocolate Since it's 4 AM here I'll check it out tomorrow, but from what I've seen this does look quite promising.

    I don't know whether it'll perfectly fit my needs, but thank you regardless.

    – VoodooFrancis Sep 22 '20 at 02:20


These word classes seem to be roughly based on IPA品詞体系. It includes domain-specific abbreviations and rarer classes from archaic Japanese. Some seem to be coined terms to handle minor exceptions rather than standard grammar terms.

  • 副詞可能 seems to be a domain-specific coined abbreviation to me. Here it means "can be used (also) as an adverb" (or "adverbizable"?). It refers to words like 昨日, 来年, etc.
  • 文語 refers to classical Japanese (also known as kobun).
  • 促音便 refers to "geminating (促) euphonic change (音便)" explained here. 促 means "gemination" or "double consonant".
  • 未然形 is often translated as "irrealis form". See Wikipedia, for example. "Imperfective" may not be wrong but did you see it used somewhere?
  • カ変 is short for カ行変格活用 ("irregular inflection of the ka-line"), but 来る is the only important word that belongs to this.
  • If you translate 五段 as "godan", why not translate 上二 as "upper nidan" and 下二 as "lower nidan"?
  • 助数詞 is usually just called "counter" (本, 人, 枚, etc).
  • Isn't 数接続 a subclass of 接頭詞 (prefix)? Then it's "(prefix that) attaches to a number" (約 "approx.", およそ "about", 計 "in total", etc.). It's an explanation rather than an established noun.
  • 間投 is "interjectory".
  • 助動詞 is often translated as "auxiliary verb" or just "auxiliary". See my previous answer for details. "Inflecting dependent word" may not be wrong as a description, though.
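To illustrate the 数接続 point: such a prefix comes right before a number, so code consuming the tokens might attach it to the number that follows. A hypothetical sketch (`attachNumberPrefixes` is not a kuromoji.js function), assuming kuromoji.js field names and that numbers are tagged 名詞 / 数 as in the pos tables above:

```javascript
// Merge a 数接続 prefix (e.g. 約 "approx.") with the number token that follows it.
// Assumes kuromoji.js-style tokens where numbers carry pos 名詞 / pos_detail_1 数.
function attachNumberPrefixes(tokens) {
  const out = [];
  for (let i = 0; i < tokens.length; i++) {
    const t = tokens[i];
    const next = tokens[i + 1];
    const isNumberPrefix = t.pos === '接頭詞' && t.pos_detail_1 === '数接続';
    if (isNumberPrefix && next && next.pos === '名詞' && next.pos_detail_1 === '数') {
      out.push({
        surface_form: t.surface_form + next.surface_form,
        prefix: t.surface_form,
        number: next.surface_form,
      });
      i++; // the number token has been consumed
    } else {
      out.push({ surface_form: t.surface_form });
    }
  }
  return out;
}

// Hypothetical tokens for 約3時間 ("about three hours").
const tokens = [
  { surface_form: '約', pos: '接頭詞', pos_detail_1: '数接続' },
  { surface_form: '3', pos: '名詞', pos_detail_1: '数' },
  { surface_form: '時間', pos: '名詞', pos_detail_1: '接尾' },
];
console.log(attachNumberPrefixes(tokens).map((t) => t.surface_form)); // [ '約3', '時間' ]
```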
naruto
  • Hello, thank you for your answer! Since I can't really write a clean reply as a comment on your answer, I'll put everything in an "update" section of my original post.

    I will now take a look at your various links! Thanks again!

    – VoodooFrancis Sep 22 '20 at 13:34
  • @VoodooFrancis 数接続 appears only as a subclass of 接頭詞 (prefix) in the 品詞体系 table, and it's not a common grammatical term anyway. Perhaps "pre-number" is a good candidate for 数接続 although it's not very literal. – naruto Sep 23 '20 at 06:13
  • OK, thanks a lot for your answers, and sorry I didn't reply earlier; I've been quite busy.

    I'll update everything :)

    – VoodooFrancis Sep 25 '20 at 11:30