
Conversation

@Luc-Mcgrady (Contributor)

Forum link:

@snorpdorp notified me that the logic behind the ratio graph is flawed as it doesn't properly account for the number of cards that are memorized to begin with.

I have implemented his (and @1DWalker's) proposed solution.

[screenshot: the updated efficiency graph]

The graph is now inverted so that a higher value means greater efficiency. In @snorpdorp's own words:

> The previous equation was flawed in that it displayed the total knowledge at the end of the simulation, not the relative gain in knowledge from the amount of studying done during the simulation timeframe. The new (correct) equation is (relative gain in knowledge as a result of studying during the simulation) / (time spent studying during the simulation). (Source: https://discord.com/channels/368267295601983490/1443273721878937650/1444447278344437833)
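
In symbols (a sketch of the formula just described; the notation is added here, not from the PR), with $R$ the simulated total knowledge and $t_{\text{end}}$ the simulation end date:

$$
\text{efficiency} = \frac{R_{\text{with reviews}}(t_{\text{end}}) - R_{\text{no reviews}}(t_{\text{end}})}{\text{time spent studying during the simulation}}
$$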

@Luc-Mcgrady changed the title from "Feat/Different method for ratio graph in dr graph." to "Feat/Different method for ratio graph in Help Me Decide." on Nov 29, 2025
@user1823 (Contributor) commented Nov 30, 2025

For me, the "per time" graph is not interpretable at all because the value of time recorded doesn't match the time I spend reviewing. So, as dae said in #4199 (comment), I suggest replacing it with a "per review" graph.

> For the time graph, you've already pointed out that it's wrong for a bunch of people, and keeping it around adds more visual noise for everyone. Given that reviews should be a roughly equivalent curve, wouldn't it make more sense to focus on the most useful modes that don't have caveats?

I read Luc's comment in https://forums.ankiweb.net/t/replace-cmrr-with-workload-vs-dr-graph-more/63234/81:

> I don’t want to make per review the only option. I think the impact of failed reviews taking longer is a key part of what the graph is trying to display.

I don't think "failed reviews taking longer" is a significant enough factor (the graphs below are very similar). But if you still want to keep both graphs, you could rename "Memorized / Time Ratio" to "Efficiency" and add a toggle below it to choose between reviews and time; a rough sketch follows.
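
For illustration, here's a minimal sketch of what such a toggle could compute, reusing the field names visible in the diff below (the `EfficiencyMode` type and `efficiency` function are invented for this sketch, not actual Anki code):

```ts
// Both modes share the same numerator: knowledge gained relative to
// doing no reviews at all (the "reviewless" baseline).
interface SimPoint {
    memorized: number; // knowledge at simulation end, with reviews
    reviewless_end_memorized: number; // knowledge at simulation end, no reviews
    timeCost: number; // total study time, in seconds
    count: number; // total number of reviews
}

type EfficiencyMode = "perHour" | "per100Reviews";

function efficiency(d: SimPoint, mode: EfficiencyMode): number {
    const gain = d.memorized - d.reviewless_end_memorized;
    return mode === "perHour"
        ? (60 * 60 * gain) / d.timeCost // cards memorized per hour of study
        : (100 * gain) / d.count; // cards memorized per 100 reviews
}
```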

My opinion is that we should keep only the "per review" graph, because it shows the same information and is not affected by bad review-time data.

Here's a diff for my changes:

diff --git a/ftl/core/deck-config.ftl b/ftl/core/deck-config.ftl
index f4aca0ec1..0acce8541 100644
--- a/ftl/core/deck-config.ftl
+++ b/ftl/core/deck-config.ftl
@@ -513,8 +513,9 @@ deck-config-save-options-to-preset-confirm = Overwrite the options in your curre
 # to show the total number of cards that can be recalled or retrieved on a
 # specific date.
 deck-config-fsrs-simulator-radio-memorized = Memorized
-deck-config-fsrs-simulator-radio-ratio2 = Memorized / Time Ratio
-deck-config-fsrs-simulator-ratio-tooltip2 = { $time } memorized cards per hour
+deck-config-fsrs-simulator-radio-efficiency = Efficiency
+deck-config-fsrs-simulator-efficiency-hour-tooltip = { $efficiency } cards memorized per hour
+deck-config-fsrs-simulator-efficiency-reviews-tooltip = { $efficiency } cards memorized per 100 reviews

 ## Messages related to the FSRS scheduler’s health check. The health check determines whether the correlation between FSRS predictions and your memory is good or bad. It can be optionally triggered as part of the "Optimize" function.
diff --git a/ts/routes/deck-options/SimulatorModal.svelte b/ts/routes/deck-options/SimulatorModal.svelte
index 434001f6f..374e1b37b 100644
--- a/ts/routes/deck-options/SimulatorModal.svelte
+++ b/ts/routes/deck-options/SimulatorModal.svelte
@@ -571,7 +571,7 @@ License: GNU AGPL, version 3 or later; http://www.gnu.org/licenses/agpl.html
                                         value={SimulateWorkloadSubgraph.ratio}
                                         bind:group={simulateWorkloadSubgraph}
                                     />
-                                    {tr.deckConfigFsrsSimulatorRadioRatio2()}
+                                    {tr.deckConfigFsrsSimulatorRadioEfficiency()}
                                 </label>
                                 <label>
                                     <input
diff --git a/ts/routes/graphs/simulator.ts b/ts/routes/graphs/simulator.ts
index b76237459..2d7dec3b4 100644
--- a/ts/routes/graphs/simulator.ts
+++ b/ts/routes/graphs/simulator.ts
@@ -69,7 +69,7 @@ export function renderWorkloadChart(
     const subgraph_data = ({
         [SimulateWorkloadSubgraph.ratio]: data.map(d => ({
             ...d,
-            y: (60 * 60 * (d.memorized - d.reviewless_end_memorized)) / d.timeCost,
+            y: (100 * (d.memorized - d.reviewless_end_memorized)) / d.count,
         })),
         [SimulateWorkloadSubgraph.time]: data.map(d => ({ ...d, y: d.timeCost / d.learnSpan })),
         [SimulateWorkloadSubgraph.count]: data.map(d => ({ ...d, y: d.count / d.learnSpan })),
@@ -91,7 +91,7 @@ export function renderWorkloadChart(

     const formatY: (value: number) => string = ({
         [SimulateWorkloadSubgraph.ratio]: (value: number) =>
-            tr.deckConfigFsrsSimulatorRatioTooltip2({ time: value.toFixed(2) }),
+            tr.deckConfigFsrsSimulatorEfficiencyReviewsTooltip({ efficiency: value.toFixed(2) }),
         [SimulateWorkloadSubgraph.time]: (value: number) =>
             tr.statisticsMinutesPerDay({ count: parseFloat((value / 60).toPrecision(2)) }),
         [SimulateWorkloadSubgraph.count]: (value: number) => tr.statisticsReviewsPerDay({ count: Math.round(value) }),

Not at all important, but `reviewless_end_memorized` feels slightly clunky. What about renaming it to `end_knowledge_no_reviews`? To mirror that, rename `memorized` to `end_knowledge`.

@user1823 (Contributor) commented Nov 30, 2025

For future reference, I want to record the reasoning for using `reviewless_end_memorized` here:

> I'm not against ΣR{start}; I just think that it is more correct to use a sum of R with no reviews, at the same simulation end time.
> Example: I have 100 knowledge. In 1 year, I'll have 50 due to decay. If I do 50 reviews instead, I'd have 100 knowledge again. What is my efficiency? Is it 0, because I did not gain new knowledge? Or 50, because if I had not done these reviews, I would have lost knowledge?
>
> If the objective is to increase sum(R_at_end_of_simulation) as much as possible per unit workload, it is natural to optimize for (R(end of simulation, cost) - R(end of simulation, 0 cost)) / cost.

Written by @1DWalker in https://discord.com/channels/368267295601983490/1443273721878937650/1443293525436403815 & https://discord.com/channels/368267295601983490/1443273721878937650/1443345283382640885
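
Plugging that example into the formula (arithmetic added for clarity): with no reviews, knowledge decays from 100 to 50 by the end of the year; with the 50 reviews, it stays at 100, so

$$
\text{efficiency} = \frac{R(\text{end}, \text{cost}) - R(\text{end}, 0)}{\text{cost}} = \frac{100 - 50}{50} = 1,
$$

i.e. the reviews are credited with the knowledge they preserved, rather than scoring 0 for merely maintaining it.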

@snorpdorp

> For me, the "per time" graph is not interpretable at all because the value of time recorded doesn't match the time I spend reviewing. So, as dae said in #4199 (comment), I suggest replacing it with a "per review" graph.

As of right this second, the graph in the public release uses a flawed formula that shows nonsense results in a way that users will believe is meaningful, which could lead them to make misinformed decisions about the #1 most important parameter for their study plans.

Some change needs to go through ASAP to get a graph into the upcoming public release.

I think the order should be: first A) get a change through, and then B) worry about "per review" vs. "per time" vs. having both, even if the time estimation is imperfect. (There are pros and cons to both the current implementation and your suggestion, but ultimately the two graphs are very similar and lead the user to the same decisions, and one of them needs to go through now.)

> doesn't match the time I spend reviewing

Is it a completely unrelated number, or just imprecise? It's been a while since I saw the calibration graph of simulator-estimated review time vs. actual review time. I would assume the time estimation was done in such a way that it is mostly accurate, at least in the DR region where most of the user's reviews happen.

> Not at all important, but `reviewless_end_memorized` feels slightly clunky. What about renaming it to `end_knowledge_no_reviews`? To mirror that, rename `memorized` to `end_knowledge`.

I have no objections. It's basically the same thing and I will leave the code readability questions to the software developers.

@user1823 (Contributor)

> I think the order should be: first A) get a change through, and then B) worry about "per review" vs. "per time" vs. having both, even if the time estimation is imperfect.

A new release isn't going to happen so soon anyway, so we will have sufficient time to decide whether it should be "per review", "per time", or both.

But I agree that if a new release is made before the decision, it would be a good idea to merge this PR as-is and open a new PR for B.

> I would assume the time estimation was done in such a way that it is mostly accurate, at least in the DR region where most of the user's reviews happen.

There are many factors that aren't even captured by Anki. For example, if the user follows a Pomodoro-like reviewing strategy, Anki records only the time spent reviewing, but the total time spent (as perceived by the user) also includes the breaks. As another example, the maximum answer seconds setting caps the time recorded by Anki, but the user might consistently be taking longer than that limit.

@snorpdorp commented Nov 30, 2025

> A new release isn't going to happen so soon anyway.

I believe this fix alone is enough to justify speeding up the release schedule to have a new release ASAP.

As I wrote above:

> As of right this second, the graph in the public release uses a flawed formula that shows nonsense results in a way that users will believe is meaningful, which could lead them to make misinformed decisions about the #1 most important parameter for their study plans.

> the maximum answer seconds setting caps the time recorded by Anki, but the user might consistently be taking longer than that limit

Isn't the default setting for that 60s? I'm no neuroscientist, but if somebody consistently takes more than that to recall something from long-term memory into short-term memory, they're definitely a severe outlier.

Like, in my own personal history, if I can't recall something within 15s of trying, I have only an extremely low chance of recalling it within 30s. Out of my gajillions of reviews, I don't think I've ever taken 30-60s to actually recall information. (Although I might get distracted and look away from my Anki screen during a review, so that such a review length gets recorded.)

@user1823 (Contributor)

> if somebody consistently takes more than that to recall something from long-term memory into short-term memory, they're definitely a severe outlier

What if the card asks them to explain something, not just to recall a fact? Explaining something can easily take longer than 60s.

Secondly, as you said, the user might get distracted, which can cause them to take longer. If the user has to do more reviews, they have a higher chance of getting distracted in between, so we should account for that time too when calculating efficiency. But the recorded times are capped at 60s, which can hurt interpretability.

@snorpdorp commented Nov 30, 2025

> What if the card asks them to explain something, not just to recall a fact? Explaining something can easily take longer than 60s.

I was under the impression that SRS/FSRS/Anki were all designed around the brain's forgetting curve, which only applies to recalling a basic fact from long-term memory into short-term memory, and that "explaining something" is not suitable for SRS, and/or that SRS would be (horribly) inefficient used that way. As such, I don't think many users use it in that manner, and if they do, they shouldn't. (Unless there's some reason it's actually effective that I'm unaware of; I've had horrible experiences with anything more than 1-2 basic facts at a time on a card.)

> Secondly, as you said, the user might get distracted, which can cause them to take longer. If the user has to do more reviews, they have a higher chance of getting distracted in between, so we should account for that time too when calculating efficiency. But the recorded times are capped at 60s, which can hurt interpretability.

One way or another, there is going to be a mismatch between the time the app sees and the time the user spends wandering off to do something else while the app is running. Simply having a 60000ms cutoff seems like a crude-but-sufficient approach to the situation.

I suppose a more rigorous approach would be to take all the user's review times, put them into bins, fit a Gaussian curve, filter out the outliers, refit the Gaussian, remove outliers again, and repeat until μ converges, then use μ as the effective time. But that's... way over-engineering the solution to a relatively minor problem. (Although it can be done.)

Even then, the result would still be biased and subject to mis-fits. I think you'd need a mathematician to look at a user's actual time-spent-per-review histogram, fit Gaussians to see which process produces each bump, and judge whether any of the data is meaningful at all.
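
To make the idea concrete, here's a minimal sketch of that fit-clip-refit loop, simplified to plain sigma-clipping on raw review times (no binning or true Gaussian fit; the function name and defaults are invented, not a real proposal for Anki):

```ts
// Iteratively drop review times more than `sigmas` standard deviations
// from the mean, then recompute, until the mean stops moving.
// Assumes `times` is non-empty.
function robustMeanReviewTimeMs(times: number[], sigmas = 3, maxIter = 20): number {
    let kept = times.slice();
    let mu = kept.reduce((a, b) => a + b, 0) / kept.length;
    for (let i = 0; i < maxIter; i++) {
        const sd = Math.sqrt(kept.reduce((a, t) => a + (t - mu) ** 2, 0) / kept.length);
        const next = kept.filter((t) => Math.abs(t - mu) <= sigmas * sd);
        const nextMu = next.reduce((a, b) => a + b, 0) / next.length;
        if (next.length === kept.length || Math.abs(nextMu - mu) < 1e-6) {
            return nextMu; // μ has converged
        }
        kept = next;
        mu = nextMu;
    }
    return mu;
}
```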

However, at the end of the day, "time spent studying" is a simple idea for the end user to grasp, and probably what they actually want. Simpler is better. "Memorized per 100 reviews" might be more mathematically accurate, but it's harder for users to translate into meaningful action, and the added complexity would discourage people from using the feature at all, limiting it to power users.

For a couple percent of error in the time graph (assuming it really is just a few percent), I think time is the way to go. I'd only suggest "/100 reviews" if the time estimation turned out to be way off (say, 15% or more). (You could also include both, for power users who really want to inspect the statistical accuracy of the prediction.)

There's also the fact that lower-R cards tend to have longer recall times. I think users care more about total time than about the number of recalls, even if the estimate is slightly inaccurate.

@JSchoreels (Contributor) commented Nov 30, 2025

To be fair, I'd like to have both options, to see efficiency per review or per unit time.

Sure, time seems to make more sense, but a lot of the time people focus on "how many reviews per day", and showing that would save a lot of future questions from people who (like me) like to go AFK mid-review or do a few cards between two Discord discussions :)
