What 24,963 flashcards reveal about learning 81 languages

June 17, 2026 · EverFlip · open data

The EverFlip flashcard corpus contains 24,963 cards across 81 languages, organised into 1,621 themed decks and exam ladders. Nearly half of the cards (11,780, or 47%) carry a separate pronunciation or reading field because they belong to one of the 55 non-Latin-script languages in the set. The whole corpus is released under CC-BY-4.0 with a citable DOI, so anyone, including teams building AI training pipelines, can use and verify it.

The corpus at a glance

Every figure on this page is computed directly from the open dataset (see the dataset and how to cite it) by a reproducible script, so each number can be checked independently. We report the composition of the corpus, not learner outcomes, which we do not measure.

Cards per language — the ten largest courses

Japanese: 2,678 cards · 134 decks
French: 608 cards · 30 decks
Spanish: 604 cards · 29 decks
Portuguese: 463 cards · 27 decks
Italian: 459 cards · 27 decks
German: 451 cards · 27 decks
Korean: 408 cards · 28 decks
Russian: 386 cards · 22 decks
Mandarin: 372 cards · 26 decks
Arabic: 370 cards · 21 decks
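
As a quick sanity check on the list above, the following sketch adds up the ten largest courses and compares them with the stated corpus total. It uses only the figures printed on this page. The variable names are illustrative and are not taken from the EverFlip script.

```python
# Figures copied from the top-ten list above: (cards, decks) per language.
TOP_TEN = {
    "Japanese": (2678, 134),
    "French": (608, 30),
    "Spanish": (604, 29),
    "Portuguese": (463, 27),
    "Italian": (459, 27),
    "German": (451, 27),
    "Korean": (408, 28),
    "Russian": (386, 22),
    "Mandarin": (372, 26),
    "Arabic": (370, 21),
}
TOTAL_CARDS = 24_963  # corpus-wide total stated in the article

# Cards held by the ten largest courses, and their share of the whole corpus.
top_ten_cards = sum(cards for cards, _ in TOP_TEN.values())
top_ten_share = top_ten_cards / TOTAL_CARDS

# Average deck size per language (cards divided by decks).
cards_per_deck = {lang: cards / decks for lang, (cards, decks) in TOP_TEN.items()}

print(f"Top ten: {top_ten_cards} cards, {top_ten_share:.1%} of the corpus")
for lang, size in cards_per_deck.items():
    print(f"{lang}: {size:.1f} cards per deck")
```

On these figures the ten largest courses hold 6,799 cards, a little over a quarter of the corpus, which leaves the remaining 71 languages with roughly three quarters of the cards.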

A long tail by design

The median language has 285 cards and the mean is 308.2, but the distribution is heavily skewed: Japanese alone holds 2,678 cards (its writing system and exam ladders demand the depth), while the smallest course, Klingon, is a focused 54-card starter set. That spread is deliberate: high-demand languages get full ladders, while the long tail of languages that most apps ignore still gets a real, if smaller, course.
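
The mean follows directly from the totals on this page, and a small sketch shows why a single large course pulls the mean above the median. The reported median of 285 cannot be recomputed here because that needs all 81 per-language counts. The `sample` list below is only an illustration built from the ten largest courses plus Klingon, not the full distribution, and `skew_summary` is a hypothetical helper name.

```python
from statistics import mean, median

TOTAL_CARDS = 24_963     # stated in the article
LANGUAGES = 81           # stated in the article
REPORTED_MEDIAN = 285    # stated in the article, not recomputed here
JAPANESE_CARDS = 2_678   # stated in the article

# Mean cards per language, derived from the two totals.
corpus_mean = TOTAL_CARDS / LANGUAGES

# How dominant the largest course is.
japanese_share = JAPANESE_CARDS / TOTAL_CARDS
japanese_vs_median = JAPANESE_CARDS / REPORTED_MEDIAN


def skew_summary(counts):
    """Return (mean, median) for a list of per-language card counts."""
    return mean(counts), median(counts)


# Illustrative sample only: the ten largest courses plus Klingon (54 cards).
sample = [2678, 608, 604, 463, 459, 451, 408, 386, 372, 370, 54]
sample_mean, sample_median = skew_summary(sample)

print(f"Corpus mean: {corpus_mean:.1f} cards per language")
print(f"Japanese: {japanese_share:.1%} of all cards, "
      f"{japanese_vs_median:.1f}x the reported median")
print(f"Sample mean {sample_mean} vs sample median {sample_median}")
```

In the sample, the mean sits well above the median because Japanese is so much larger than every other course. The full corpus shows the same pattern in a milder form, with a mean of 308.2 against a median of 285.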

Script diversity is the defining feature

55 of the 81 languages use a non-Latin script, and 47% of all cards (11,780) therefore carry a romanization or reading alongside the native form and the English meaning — a three-field structure (script ↔ reading ↔ meaning) that a Latin-script flashcard does not need. For anyone building or training on multilingual learning data, that field structure, not raw card count, is what makes the corpus useful across writing systems.
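
The three-field structure can be pictured with a minimal sketch. The class and field names (`Card`, `script`, `reading`, `meaning`) are assumptions chosen for illustration. The actual column names in the released dataset may differ, so check the dataset documentation before relying on them.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Card:
    """One flashcard; field names are hypothetical, not the dataset schema."""
    script: str                    # native-script form of the word
    meaning: str                   # English meaning
    reading: Optional[str] = None  # romanization or reading; None if not needed

    def field_count(self) -> int:
        # Non-Latin-script cards carry three fields, Latin-script cards two.
        return 3 if self.reading is not None else 2


def reading_share(cards: Iterable[Card]) -> float:
    """Fraction of cards that carry a separate reading field."""
    cards = list(cards)
    if not cards:
        return 0.0
    return sum(card.reading is not None for card in cards) / len(cards)


examples = [
    Card(script="猫", reading="neko", meaning="cat"),  # script, reading, meaning
    Card(script="gato", meaning="cat"),                # Latin script: no reading
]

print([card.field_count() for card in examples])
print(f"Share with a reading in this toy sample: {reading_share(examples):.0%}")

# The corpus-wide share stated in the article: 11,780 of 24,963 cards.
print(f"Corpus share with a reading: {11_780 / 24_963:.1%}")
```

Modelling the reading as an optional field keeps one record type for every language, so code that consumes the data can treat Latin-script and non-Latin-script cards uniformly and branch only where the reading matters.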

Use the data

The full corpus is free to use and to train on under CC-BY-4.0, with attribution. It is mirrored on Hugging Face and Kaggle and archived on Zenodo with a permanent DOI:

Cite this dataset

EverFlip. (2026). EverFlip Multilingual Flashcard Corpus (Version 1.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.20703251

Download the dataset + BibTeX →