NoCha: A Novel Challenge for Long-Context Language Models



NoCha measures how well long-context language models can verify claims written about fictional books. Check out our paper and GitHub repo for more details.

About the benchmark: NoCha contains 1001 narrative minimal pairs written about recently published novels, where one claim is true and the other is false. Given the book text and a claim, a model is instructed to verify whether the claim is true or false. A model gets credit for a pair only if it correctly labels both the true and the false claim.
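
The pair-level scoring rule can be sketched in Python as follows. This is a minimal illustration, not the official NoCha evaluation code: the `ClaimPair` record and the `pair_is_correct` and `pair_accuracy` helpers are hypothetical names, and model outputs are assumed to be already parsed into booleans (True meaning the model answered TRUE).

```python
from dataclasses import dataclass


@dataclass
class ClaimPair:
    # Hypothetical record: one minimal pair plus the model's parsed labels.
    pair_id: str
    pred_on_true_claim: bool   # model's label for the true claim
    pred_on_false_claim: bool  # model's label for the false claim


def pair_is_correct(pair: ClaimPair) -> bool:
    # Credit only if the true claim is labeled TRUE and the false claim FALSE.
    return pair.pred_on_true_claim and not pair.pred_on_false_claim


def pair_accuracy(pairs: list[ClaimPair]) -> float:
    # Fraction of pairs in which both claims were labeled correctly.
    if not pairs:
        return 0.0
    return sum(pair_is_correct(p) for p in pairs) / len(pairs)


pairs = [
    ClaimPair("p1", True, False),   # both correct -> credit
    ClaimPair("p2", True, True),    # false claim labeled TRUE -> no credit
    ClaimPair("p3", False, False),  # true claim labeled FALSE -> no credit
    ClaimPair("p4", True, False),   # both correct -> credit
]
print(pair_accuracy(pairs))  # 0.5
```

Because a credited pair needs two correct labels, pair accuracy can never exceed per-claim accuracy. In this toy example, 6 of 8 individual labels are correct (75%), but only 2 of 4 pairs are credited.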

Data update: The NoCha dataset was updated in November 2024 with 164 new claim pairs from 13 new books. Models that performed above random on the original dataset were evaluated on the new claims. When several models from the same family qualified, only the latest one was included.
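
As a worked example of what "above random" means under pair scoring, assume a baseline that labels each claim TRUE or FALSE independently with probability 1/2. This is an illustrative assumption; the exact baseline used for this cutoff is not specified here. A pair is credited only when the true claim is labeled TRUE and the false claim is labeled FALSE, so:

```latex
P(\text{pair correct})
  = P(\mathrm{TRUE} \mid \text{true claim}) \cdot P(\mathrm{FALSE} \mid \text{false claim})
  = \tfrac{1}{2} \cdot \tfrac{1}{2}
  = \tfrac{1}{4}
```

This corresponds to 25% pair accuracy. A degenerate model that answers TRUE to every claim scores 0%, because it never labels a false claim correctly.
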
The default leaderboard view ranks models by their accuracy on the pairs they attempted. A model can attempt a pair only if the book (1) fits within its context window and (2) does not trigger its content guardrails. The leaderboard controls let you compare selected models fairly on the common set of pairs that all of them attempted. Use the easy-split and hard-split filters to view model performance on claims whose evidence is (1) easier or (2) more difficult to retrieve.
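
The fair-comparison control amounts to intersecting each model's set of attempted pairs before scoring. The sketch below shows that idea under stated assumptions: the per-model results dictionaries, the `splits` mapping with values "easy" and "hard", and the function name `compare_on_common_pairs` are all hypothetical, and the leaderboard's actual implementation is not described here.

```python
from typing import Optional

# Hypothetical per-model results: pair_id -> True if both claims in the pair
# were labeled correctly. Pairs a model could not attempt (book too long for
# its context window, or blocked by guardrails) are simply absent.
Results = dict[str, bool]


def compare_on_common_pairs(
    results: dict[str, Results],
    splits: dict[str, str],
    split: Optional[str] = None,
) -> dict[str, float]:
    """Accuracy of each model on the pairs that every selected model attempted."""
    if not results:
        return {}
    # Intersect the attempted pair ids across all selected models.
    common = set.intersection(*(set(r) for r in results.values()))
    # Optionally restrict to one split, e.g. "easy" or "hard".
    if split is not None:
        common = {pid for pid in common if splits.get(pid) == split}
    if not common:
        return {}  # accuracy is undefined on an empty set of pairs
    return {
        model: sum(r[pid] for pid in common) / len(common)
        for model, r in results.items()
    }


results = {
    "model_a": {"p1": True, "p2": False, "p3": True},
    "model_b": {"p1": True, "p2": True},  # p3 not attempted
}
splits = {"p1": "easy", "p2": "hard", "p3": "hard"}

print(compare_on_common_pairs(results, splits))
# {'model_a': 0.5, 'model_b': 1.0}
print(compare_on_common_pairs(results, splits, split="hard"))
# {'model_a': 0.0, 'model_b': 1.0}
```

In this toy data, model_a scores 2/3 on its own attempted pairs but 0.5 on the common set. Gaps like this are what the common-subset comparison is meant to expose.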

The NoCha dataset will not be publicly released, to prevent data contamination. A small data sample drawn from public-domain books is included in our GitHub repo. We are committed to adding new models to the leaderboard and new books to the dataset. Please contact us if you would like your model to appear on the leaderboard (API credits are certainly welcome!).

     Example pair

Book: "The Tainted Cup" by Robert Jackson Bennett

True claim: Despite her skills as an Apoth, Nusis is unable to reverse engineer the type of portal opened by the reagents key found in Rona's wooden chest.

False claim: By using her skills as an Apoth, Nusis is able to reverse engineer the type of portal opened by the reagents key found in Rona's wooden chest.

Human-written explanation from NoCha: The reagents key is in fact not a key at all but the cure for dappleglass poisoning, which explains why Nusis is unable to figure out what type of portal it opens.

    Main prompt (explain+answer)

You are provided with a context and a statement. Your task is to carefully read the context and then determine whether the statement is true or false.

Answer TRUE if the statement is true in its entirety based on the context provided.
Answer FALSE if any part of the statement is false based on the context provided.

<context> {book text} </context>
<statement> {claim} </statement>
<question> Based on the context provided, is the above statement TRUE or FALSE? </question> First provide an explanation of your decision-making process in at most one paragraph, and then provide your final answer. Use the following format:
<explanation> YOUR EXPLANATION </explanation>
<answer> YOUR ANSWER </answer>

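An evaluation harness for the main prompt needs to fill in the template and extract the label from the `<answer>` tag. The sketch below is a hypothetical helper, not the authors' code. `MAIN_PROMPT`, `build_prompt`, and `parse_answer` are illustrative names. The placeholder `{book text}` is renamed to `{book_text}` so it is a plain identifier. Returning `None` for output without a recognizable `<answer>` tag is an assumption about how malformed responses might be handled.

```python
import re
from typing import Optional

# Main (explain+answer) prompt; placeholders renamed to valid identifiers.
MAIN_PROMPT = """You are provided with a context and a statement. Your task is to carefully read the context and then determine whether the statement is true or false.

Answer TRUE if the statement is true in its entirety based on the context provided.
Answer FALSE if any part of the statement is false based on the context provided.

<context> {book_text} </context>
<statement> {claim} </statement>
<question> Based on the context provided, is the above statement TRUE or FALSE? </question> First provide an explanation of your decision-making process in at most one paragraph, and then provide your final answer. Use the following format:
<explanation> YOUR EXPLANATION </explanation>
<answer> YOUR ANSWER </answer>"""

# Matches a TRUE/FALSE label inside <answer> ... </answer>, any case.
ANSWER_RE = re.compile(r"<answer>\s*(TRUE|FALSE)\s*</answer>", re.IGNORECASE)


def build_prompt(book_text: str, claim: str) -> str:
    # Only the template's own braces are interpreted by str.format,
    # so braces inside the book text are inserted unchanged.
    return MAIN_PROMPT.format(book_text=book_text, claim=claim)


def parse_answer(response: str) -> Optional[bool]:
    # Take the last tagged label; None if no valid <answer> tag is found.
    matches = ANSWER_RE.findall(response)
    if not matches:
        return None
    return matches[-1].upper() == "TRUE"


response = (
    "<explanation> The reagents key is a cure, not a key. </explanation>\n"
    "<answer> FALSE </answer>"
)
print(parse_answer(response))  # False
```

The regex requires a literal TRUE or FALSE inside the tag, so an echoed `<answer> YOUR ANSWER </answer>` line from the format instructions is never mistaken for a label.
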
    Simple prompt (answer only)

You are provided with a context and a statement. Your task is to carefully read the context and then determine whether the statement is true or false.

Answer TRUE if the statement is true in its entirety based on the context provided.
Answer FALSE if any part of the statement is false based on the context provided.

<context> {book text} </context>
<statement> {claim} </statement>
<question> Based on the context provided, is the above statement TRUE or FALSE? </question>

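The simple prompt has no answer tag, so the label must be recovered from free text. One hedged approach, assumed here rather than taken from NoCha, is to accept a response only if exactly one of TRUE or FALSE appears as a whole word (in any case). The function name `parse_simple_answer` is hypothetical.

```python
import re
from typing import Optional

# Whole-word, case-insensitive match for the two labels.
LABEL_RE = re.compile(r"\b(TRUE|FALSE)\b", re.IGNORECASE)


def parse_simple_answer(response: str) -> Optional[bool]:
    # Collect the distinct labels mentioned anywhere in the response.
    labels = {m.upper() for m in LABEL_RE.findall(response)}
    if len(labels) != 1:
        # No label, or both labels mentioned: treat as unparseable.
        return None
    return labels.pop() == "TRUE"


print(parse_simple_answer("FALSE"))                 # False
print(parse_simple_answer("The statement is true."))  # True
print(parse_simple_answer("It could be TRUE or FALSE."))  # None
```

Being strict about ambiguous responses avoids silently crediting hedged answers. The cost is that more responses are left unparsed.
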
    Team