Why flashcards work: the science of spaced repetition and retrieval practice

Flashcards seem almost too simple to be a serious learning tool. A question on one side, an answer on the other, shuffled through over and over. And yet, when paired with two specific principles, they consistently outperform almost every other technique cognitive scientists have measured for committing facts to long-term memory.

Those two principles are spaced repetition (reviewing material at increasing intervals) and retrieval practice (forcing yourself to recall an answer rather than re-reading it). Each has more than a century of evidence behind it. Together, they are why a deck of flashcards, used correctly, can move information into memory that lasts years rather than hours.

This article walks through the evidence — what we know, who showed it, and where the limits are.

The forgetting curve

In 1879, a German psychologist named Hermann Ebbinghaus began an extraordinary series of experiments on himself. Over several years, he memorized thousands of nonsense syllables — meaningless three-letter combinations like DAX, BUP, or LOC — and then carefully measured how much he could relearn after various delays. The results were published in 1885 as Über das Gedächtnis (translated into English in 1913 as Memory: A Contribution to Experimental Psychology).

What he found is now called the forgetting curve. Memory loss is not gradual or steady. It is steep at first, and then almost flat. In Ebbinghaus's own words, "forgetting would be very rapid at the beginning of the process and very slow at the end".

The specific numbers, in his own data, were stark. One hour after learning a list, he had to expend roughly half the original effort to relearn it. In other words, about half of the work of the first session had stopped paying off within an hour. After that, the rate of loss slowed. "After 24 hours about one third was always remembered; after 6 days about one fourth, and after a whole month fully one fifth of the first work persisted in effect".
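The steep-then-flat shape is easy to see by turning the quoted figures into a rate of loss per hour. The sketch below is illustrative only: it uses the rounded fractions from the passage above, assumes that savings are complete immediately after learning, and uses hypothetical function and variable names.

```python
# Rounded savings figures quoted above: (hours since learning, fraction saved).
# "Savings" is the share of the original learning effort that does NOT have
# to be repeated when the list is relearned.
SAVINGS = [
    (0, 1.0),          # immediately after learning (assumed starting point)
    (1, 1 / 2),        # one hour
    (24, 1 / 3),       # one day
    (6 * 24, 1 / 4),   # six days
    (31 * 24, 1 / 5),  # one month, taken as 31 days
]


def savings(original_effort, relearning_effort):
    """Savings score: fraction of the original effort saved on relearning."""
    return (original_effort - relearning_effort) / original_effort


def loss_per_hour(points):
    """Average savings lost per hour between consecutive measurements."""
    rates = []
    for (t0, s0), (t1, s1) in zip(points, points[1:]):
        rates.append((s0 - s1) / (t1 - t0))
    return rates


for (t0, _), (t1, _), rate in zip(SAVINGS, SAVINGS[1:], loss_per_hour(SAVINGS)):
    print(f"{t0:>4}h to {t1:>4}h: {rate:.5f} of the original work lost per hour")
```

Each successive rate is far smaller than the one before it, which is the forgetting curve in numerical form.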

For more than a century, this was treated almost as folk wisdom: quoted everywhere, but rarely re-tested. In 2015, Jaap Murre and Joeri Dros at the University of Amsterdam ran a careful modern replication. One subject learned 70 lists of 104 nonsense syllables across roughly 70 hours of experimental work, with relearning intervals of 20 minutes, 1 hour, 9 hours, 1 day, 2 days, 6 days, and 31 days, the same intervals Ebbinghaus had used. Their conclusion: "the Ebbinghaus forgetting curve has indeed been replicated". Savings dropped sharply across the short intervals and then leveled off, just as Ebbinghaus had described 130 years earlier.

This is the foundation of every spaced repetition system. If you do nothing, most of what you learn today will be gone by tomorrow. But the shape of the curve hints at the fix: forgetting slows down dramatically once material has survived a delay. If you review an item while you can still just recall it, you reset the curve, and each subsequent decline is shallower than the last.

The spacing effect

Reviewing material is not equally useful at every moment. Spacing — separating study sessions in time rather than cramming them together — produces dramatically better long-term retention.

The canonical evidence is a 2006 meta-analysis by Nicholas Cepeda, Harold Pashler, Edward Vul, John Wixted, and Doug Rohrer, published in Psychological Bulletin. They synthesized 839 assessments of distributed practice across 317 experiments in 184 articles — to date the most comprehensive quantitative review of the spacing effect in verbal recall.

The finding was striking in its consistency. Only 12 of 271 comparisons of massed and spaced performance showed no effect or a negative effect from spacing. In other words, in over 95% of comparisons, spacing your study sessions beat cramming them. For studies with retention intervals longer than one month, the average benefit of distributed practice over massed practice was 15% in recall accuracy.

But the deeper finding from Cepeda's review was about how to space. The optimal gap between study sessions is not fixed. "The ISI [interstudy interval] producing maximal retention increased as retention interval increased". In plain English: the longer you need to remember something, the longer the gap between study sessions should be. If you need to remember something for a week, study it again in a day. If you need to remember it for a year, study it again in weeks.

This is exactly what a well-tuned flashcard system does for you automatically. You do not have to calculate the optimal gap; the algorithm does.

Retrieval practice (the testing effect)

Spacing is half the story. The other half is what you do during a study session.

There are two basic options. You can re-read the material — let your eyes pass over the page or the front of the card with the answer visible. Or you can retrieve it — look at the question, force yourself to produce the answer from memory, and only then check whether you got it right.

For decades, students have assumed re-reading is the safer bet. The research says the opposite, and the most cited demonstration is Jeffrey Karpicke and Henry Roediger's 2008 paper in Science, "The critical importance of retrieval for learning".

Karpicke and Roediger had university students learn Swahili–English vocabulary pairs. Students were assigned to one of four conditions, which differed in what happened to an item once it had been correctly recalled: keep studying and testing it, drop it from testing (study only), drop it from studying (test only), or drop it entirely. A week later, they tested everyone.

The result was unambiguous. "Repeated studying after learning had no effect on delayed recall, but repeated testing produced a large positive effect". Items that had been retrieved repeatedly were remembered roughly twice as well as items that had only been re-studied. The authors concluded their work demonstrated "the critical role of retrieval practice in consolidating learning".

This is the testing effect, and it has been replicated in dozens of subsequent studies. It is also why a flashcard with the answer hidden is fundamentally different from a flashcard with the answer visible. The act of trying — even of failing and then seeing the answer — is doing the work.

A flashcard deck, used properly, layers both effects together. Each card is a retrieval attempt (testing effect), scheduled at growing intervals (spacing effect). This is unusual. Most study techniques exploit one principle or the other. Flashcards exploit both at once.

From SuperMemo SM-2 to FSRS

The practical history of spaced-repetition software starts with one person: Piotr Wozniak. In 1985, as a student in Poland, Wozniak began experimenting with optimal review schedules for his own studies. His results became the SuperMemo project, and in December 1987 he wrote the first version of the program in Turbo Pascal.

The scheduling algorithm in those early versions, used in SuperMemo 1.0–3.0 between December 13, 1987 and March 9, 1989, is now known as SM-2. Despite its age, SM-2 remained the default scheduler in Anki — the dominant open-source flashcard program — for roughly two decades, and many other flashcard apps still use it.

The SM-2 logic is straightforward. Each card has an easiness factor (E-Factor, written EF in the formula that follows). New items start with an E-Factor of 2.5, and the value is not allowed to fall below 1.3. After each review, you grade how well you remembered the card on a 0–5 scale, and the E-Factor is adjusted up or down. The next interval follows the rule I(1) = 1 day, I(2) = 6 days, and for n > 2, I(n) = I(n-1) × EF. Each successive interval therefore grows by an approximately constant multiplier specific to that card.
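The rules above fit in a few lines of code. The sketch below follows Wozniak's published SM-2 description. The E-Factor update formula, the restart on a grade below 3, and rounding intervals up to whole days come from that description rather than from the summary above, and the class and function names are hypothetical. Implementations differ on details such as whether a failed review also lowers the E-Factor, so treat this as an illustration and not as any particular app's scheduler.

```python
import math
from dataclasses import dataclass


@dataclass
class Sm2Card:
    """Per-card scheduling state (hypothetical names)."""
    repetitions: int = 0     # consecutive successful reviews so far
    interval_days: int = 0   # gap that preceded the upcoming review
    easiness: float = 2.5    # E-Factor; new items start at 2.5


def sm2_review(card: Sm2Card, grade: int) -> Sm2Card:
    """Return the card's state after a review graded 0 (blackout) to 5 (perfect)."""
    if not 0 <= grade <= 5:
        raise ValueError("grade must be between 0 and 5")

    if grade < 3:
        # Failed recall: start the repetitions over. This sketch leaves the
        # E-Factor untouched, which is one reading of the description.
        return Sm2Card(repetitions=0, interval_days=1, easiness=card.easiness)

    # I(1) = 1 day, I(2) = 6 days, then I(n) = I(n-1) * EF, rounded up.
    if card.repetitions == 0:
        interval = 1
    elif card.repetitions == 1:
        interval = 6
    else:
        interval = math.ceil(card.interval_days * card.easiness)

    # E-Factor update from the published description, floored at 1.3.
    miss = 5 - grade
    easiness = card.easiness + (0.1 - miss * (0.08 + miss * 0.02))
    easiness = max(1.3, easiness)

    return Sm2Card(card.repetitions + 1, interval, easiness)


card = Sm2Card()
for grade in (5, 4, 4, 3):
    card = sm2_review(card, grade)
    print(card)
```

This sketch computes each interval from the E-Factor as it stood before the current review's adjustment. Some implementations apply the adjustment first, which shifts the intervals slightly.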

SM-2 works. It is also crude. It treats every card the same in early reviews, ignores the actual probability of forgetting, and gives no easy way to target a specific retention rate.

In 2022, Jarrett Ye and the open-spaced-repetition community released FSRS (the Free Spaced Repetition Scheduler). FSRS is built around an explicit memory model with three variables per card — difficulty, stability, and retrievability — fitted to data rather than chosen by intuition. Each card has its own forgetting curve, and the next review is scheduled when the predicted probability of recall hits a user-chosen target (e.g., 90%).
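To make "scheduled when the predicted probability of recall hits a target" concrete, here is a small sketch using the power-law forgetting curve published for FSRS-4.5. The constants are an assumption taken from that one version (other FSRS versions use different curves), and the function names are hypothetical. Difficulty, and the fitted rules that update stability after each review, are left out.

```python
# Power-law forgetting curve in the form published for FSRS-4.5 (assumed here).
# The constants are chosen so that retrievability is 90% when the elapsed
# time equals the card's stability.
DECAY = -0.5
FACTOR = 19 / 81


def retrievability(elapsed_days: float, stability_days: float) -> float:
    """Predicted probability of recall after `elapsed_days` without review."""
    return (1 + FACTOR * elapsed_days / stability_days) ** DECAY


def next_interval(stability_days: float, target_retention: float = 0.9) -> float:
    """Days until predicted recall falls to `target_retention`."""
    if not 0 < target_retention < 1:
        raise ValueError("target_retention must be between 0 and 1")
    return stability_days / FACTOR * (target_retention ** (1 / DECAY) - 1)


for target in (0.95, 0.90, 0.80):
    days = next_interval(stability_days=10, target_retention=target)
    print(f"target {target:.0%}: review after about {days:.1f} days")
```

With these constants, a 90% target gives an interval equal to the card's stability. Raising the target shortens the interval, and lowering it stretches the interval, which is the trade-off between workload and retention that the user-chosen target controls.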

The most striking thing about FSRS is the scale of evidence behind it. The open-spaced-repetition team maintains a public benchmark using 349,923,850 reviews across 9,999 Anki user collections — by a wide margin the largest comparative dataset of spaced-repetition algorithms ever assembled. On that benchmark, FSRS produces lower prediction error than SM-2 across virtually every collection size and review pattern. Anki integrated FSRS as a built-in option starting with version 23.10 in late 2023.
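"Prediction error" here means how far an algorithm's predicted recall probabilities fall from what actually happened at review time. The sketch below uses log loss, a standard score for probabilistic predictions. Whether it matches the benchmark's exact metrics is an assumption, the two schedulers are hypothetical, and the review data is made up for illustration.

```python
import math


def log_loss(predicted, recalled):
    """Mean negative log-likelihood of the observed outcomes (lower is better).

    predicted: probabilities of recall issued before each review (0 < p < 1)
    recalled:  True if the card was actually remembered, else False
    """
    if not predicted or len(predicted) != len(recalled):
        raise ValueError("need one outcome per prediction, and at least one")
    total = 0.0
    for p, ok in zip(predicted, recalled):
        # Penalize by the probability the scheduler gave to what happened.
        total += -math.log(p if ok else 1 - p)
    return total / len(predicted)


# Made-up example: five reviews, four remembered and one forgotten.
outcomes = [True, True, True, False, True]
cautious = [0.9, 0.85, 0.9, 0.4, 0.8]           # hypothetical scheduler A
overconfident = [0.99, 0.99, 0.99, 0.99, 0.99]  # hypothetical scheduler B

print(f"A: {log_loss(cautious, outcomes):.3f}")
print(f"B: {log_loss(overconfident, outcomes):.3f}")
```

Scheduler B is punished heavily for the one card it was nearly certain about and got wrong. A scheduler that predicts recall more accurately can place reviews closer to the moment they are actually needed.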

The practical difference for a learner is that FSRS converges on the schedule that actually works for your memory and this card, rather than the average behavior assumed in 1987.

Practical evidence that flashcards work

Laboratory results are one thing. Real classrooms are another. Two strands of evidence are particularly relevant.

First, the vocabulary evidence: the Karpicke and Roediger work cited above was conducted with university students learning foreign-language vocabulary, where repeated retrieval produced large gains in one-week delayed recall while repeated study after initial learning had no effect. Vocabulary acquisition is exactly the use case flashcards are built for, so the result applies to them directly.

Second, the cumulative weight of the meta-analysis. Cepeda et al.'s 839-assessment review covered classroom and laboratory contexts, children and adults, and a range of materials including text, lists, and educational content, and the spacing benefit held across them. The spacing effect is one of the most robust findings in cognitive psychology, not a niche laboratory artifact.

Medical education has converged on the same conclusion independently. Anki decks are now widely used to prepare for the United States Medical Licensing Examination, and the underlying mechanism (spaced retrieval of factual associations) is exactly what the Ebbinghaus, Cepeda, and Karpicke literatures predict should work.

Honest limitations

None of this means flashcards are the right tool for every learning task. They are not.

They are excellent for: vocabulary, definitions, formulas, dates, anatomy, kanji, pharmacology, capitals, names, arbitrary mappings, and any other domain where you need to retrieve a specific piece of information from a specific cue, reliably, for a long time. The Karpicke result was on vocabulary pairs precisely because that is where retrieval practice has the cleanest effect.

They are weaker for: conceptual understanding that requires building mental models, skills that depend on transfer (like programming or writing), and one-off facts you genuinely will not need again. Karpicke and Roediger's own subsequent work, and broader reviews of the testing effect, note that the benefits are clearest for the kind of discrete, paired-associate material that flashcards naturally encode. For deep conceptual learning, flashcards can support memory of the building blocks but cannot replace working through problems, writing things out, or teaching the material to someone else.

They are also overkill for material you genuinely use daily. If you speak a language every day, you do not need to schedule reviews of its common words. The forgetting curve only matters when material would otherwise decay in the absence of natural use.

The bottom line: spaced repetition with active retrieval is one of the best-evidenced tools in cognitive psychology for moving discrete information into long-term memory. It is not a substitute for thinking, practicing, or doing. Used for what it is good at, it works almost embarrassingly well.

How EverFlip shows your progress

Every deck has a small progress bar with four states. Here is exactly what each color means:

⬜ New — cards you haven't started yet.
🔵 Learning — cards you've seen, but that haven't yet survived enough spaced reviews to stick. They're bedding into memory.
🟡 Due — cards that are ready to review right now. These are your cards for today.
🟢 Mastered — cards the algorithm is confident you'll still remember weeks from now (a stability of at least 21 days).

Mastery is earned over time, not in one sitting. This is the most common misunderstanding, so it's worth being blunt: a card does not become "mastered" because you pressed Easy. It becomes mastered because you've recalled it correctly across reviews that are spaced further and further apart, until the algorithm calculates it will survive at least three weeks without a reminder. You can't shortcut that by hammering Easy in a single session — the calendar has to do its work. That's not a limitation; it's the whole point. Real memory is built by spacing, and the bar is honestly reflecting that.

The three modes

Study (the default) — the real thing. You see the cards that are new or due, you rate how each recall went, and your answers set when you'll see each card next. This is the only mode that drives your schedule forward.
Rehearse — re-runs cards you've already studied, in random order. Your ratings still update the schedule, so it's useful before a test — but it won't instantly turn cards green; mastery still matures over real time.
Practice — pure, pressure-free drilling through every card. It has zero effect on your stats or schedule. Use it to warm up or browse a deck without consequences.

How often should you review?

Once a day is the honest answer. EverFlip introduces a handful of new cards per deck each day (7 by default) and shows you whatever is due. That's your session. Once you've cleared a deck's due cards, you're done with it for the day: re-reviewing cards that aren't due yet doesn't help, because the spacing is the medicine. Come back tomorrow and the algorithm will have the right cards waiting. Little and often beats long cramming sessions. That's the core finding of the spacing literature above, applied to your day.
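As a rough illustration of what "a handful of new cards plus whatever is due" means, here is a minimal sketch of how a daily session could be assembled. It is not EverFlip's actual code: the `Card` fields, the function name, and the order of the session are all assumptions made for the example. Only the default of 7 new cards comes from the text above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Card:
    """Hypothetical card record."""
    front: str
    due: Optional[date] = None  # None means the card has never been studied


def todays_session(deck: List[Card], today: date, new_per_day: int = 7) -> List[Card]:
    """Due cards first, then at most `new_per_day` cards not yet started."""
    due = [c for c in deck if c.due is not None and c.due <= today]
    new = [c for c in deck if c.due is None][:new_per_day]
    return due + new


today = date(2026, 1, 10)
deck = [
    Card("overdue", date(2026, 1, 9)),
    Card("due today", date(2026, 1, 10)),
    Card("due tomorrow", date(2026, 1, 11)),
] + [Card(f"new {i}") for i in range(10)]

for card in todays_session(deck, today):
    print(card.front)
```

Cards scheduled for a later date are deliberately left out of the session, which mirrors the advice above: reviewing them early adds work without adding spacing.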

References

Ebbinghaus, H. (1885 / 1913). Memory: A Contribution to Experimental Psychology (H. Ruger & C. Bussenius, Trans.). New York: Teachers College, Columbia University. archive.org · online edition: psychclassics.yorku.ca
Murre, J. M. J., & Dros, J. (2015). Replication and analysis of Ebbinghaus' forgetting curve. PLOS ONE, 10(7), e0120644. plos.org
Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in verbal recall tasks: A review and quantitative synthesis. Psychological Bulletin, 132(3), 354–380. laplab.ucsd.edu
Karpicke, J. D., & Roediger, H. L. (2008). The critical importance of retrieval for learning. Science, 319(5865), 966–968. science.org · PubMed
Wozniak, P. A. (1990). Application of a computer to improve the results obtained in working with the SuperMemo method (algorithm SM-2). super-memory.com
Open Spaced Repetition. (2023–). SRS Benchmark. GitHub repository. github.com/open-spaced-repetition/srs-benchmark

Last updated 2026-05-31. Citations verified by adversarial 3-vote check: 25/25 claims confirmed against primary sources, 0 refuted.