I have around 11,000 Anki cards dedicated to learning Japanese. 4,000 of these are words I discovered on my own outside of study materials: found in books, heard in conversation, etc. These cards are mostly targeted towards recognition, which gives me a ton of breadth, but one thing I consistently struggle with is recall.
This is a search index over my Anki cards. It uses a few language models and other heuristics to find good search matches:
- meaning - two embedding models, one multilingual and one Japanese-specific, turn each card into vectors. This works well for synonyms, but doesn’t match well when I only remember part of the word I want to find.
- readings - embedding models treat kana and kanji as unrelated text, but a lot of words in Japanese “optionally” have kanji. Matching on readings means searching くりかえす finds 繰り返す. Same word, different spelling.
- sound - sometimes I misremember a word and type something that only sounds close. I once misremembered ぐにゃり (flabby) as ぐんやり. This converts kana to IPA and compares the words by phonemes, so typos that sound the same still match.
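To make the reading and sound ideas concrete, here is a minimal sketch rather than the actual implementation. It assumes each card stores a kana `reading` field, and it uses a deliberately tiny, hypothetical kana-to-phoneme table (`KANA_PHONEMES`) in place of a full kana-to-IPA converter. The helper names (`to_hiragana`, `to_phonemes`, `edit_distance`, `sound_similarity`) are made up for illustration.

```python
# Hypothetical, deliberately tiny kana-to-phoneme table. A real index would
# cover the whole syllabary; the symbols here are only a rough transcription.
KANA_PHONEMES = {
    "ぐ": ["g", "u"],
    "に": ["ɲ", "i"],
    "にゃ": ["ɲ", "a"],
    "ん": ["ɴ"],
    "や": ["j", "a"],
    "り": ["ɾ", "i"],
    "さ": ["s", "a"],
    "か": ["k", "a"],
    "な": ["n", "a"],
}


def to_hiragana(text: str) -> str:
    """Fold katakana into hiragana so both scripts compare equal."""
    return "".join(
        chr(ord(ch) - 0x60) if "ァ" <= ch <= "ヶ" else ch
        for ch in text
    )


def to_phonemes(kana: str) -> list[str]:
    """Convert kana to a phoneme list, preferring digraphs such as にゃ."""
    kana = to_hiragana(kana)
    phonemes: list[str] = []
    i = 0
    while i < len(kana):
        for size in (2, 1):
            chunk = kana[i:i + size]
            if chunk in KANA_PHONEMES:
                phonemes.extend(KANA_PHONEMES[chunk])
                i += len(chunk)
                break
        else:
            # Characters missing from the table pass through unchanged.
            phonemes.append(kana[i])
            i += 1
    return phonemes


def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance over two sequences of symbols."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def sound_similarity(query: str, reading: str) -> float:
    """Similarity in [0, 1] based on phoneme-level edit distance."""
    q, r = to_phonemes(query), to_phonemes(reading)
    return 1 - edit_distance(q, r) / max(len(q), len(r), 1)


# Assumed card shape: the word as written plus its kana reading.
cards = [
    {"word": "繰り返す", "reading": "くりかえす"},
    {"word": "ぐにゃり", "reading": "ぐにゃり"},
    {"word": "魚", "reading": "さかな"},
]

# Reading match: look the query up by normalized kana, not by surface spelling.
by_reading = {to_hiragana(c["reading"]): c["word"] for c in cards}
print(by_reading.get(to_hiragana("クリカエス")))

# Sound match: rank cards by how close they sound to a misremembered query.
ranked = sorted(cards, key=lambda c: sound_similarity("ぐんやり", c["reading"]), reverse=True)
print([c["word"] for c in ranked])
```

Comparing phonemes instead of raw kana is what lets a slip like ぐんやり still land near ぐにゃり: two of the four kana differ, but only a small part of the phoneme sequence does.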
All of these are combined with reciprocal rank fusion, so a card that ranks well in any index floats to the top, and kana queries lean harder on reading and sound than on meaning.