A widow in Mombasa types *222*4# on a feature phone. She is checking whether her cash transfer has landed for the month. The number was fixed on 1 January 2025 at two thousand shillings, paid through M-Pesa, into the SIM card registered in her name. It is the seventh of the month, and she is owed it.
Earlier that week, on her son's phone, she had asked an AI assistant how to renew her health insurance. NHIF, she said. The model walked her, patient and certain, through a portal that no longer exists. NHIF was abolished on 1 October 2024. Its replacement, the Social Health Authority, has different log-in pages, different premiums, different appeals.
She is uninsured this morning, and does not know it.
· · ·This is the kind of thing that does not show up in benchmarks. There is no row of MMLU that asks whether a model knows that NHIF was replaced by SHA, or that PM-KISAN's 22nd installment was paid on the thirteenth of March, or that NHIA inherited NHIS in 2022 and that the latter's enrolment portal has been dark for three years. There is no row of HumanEval that asks whether the model leads with the helpline number or buries it. There is no Chatbot Arena vote that registers a missed welfare payment as a wrong answer.
And so the wrong answers, when they come, come quietly.
I went looking for the benchmark that should already exist. I searched HuggingFace for PM-KISAN, for Inua Jamii, for NHIA, for SHA Kenya, for Ayushman Bharat eligibility QA. There were zero benchmark hits. The closest analogue is a UK-only dataset called CitizenQuery whose authors describe it, plainly, as "the first benchmark for citizen queries", and flag multilingual extension as future work.
That sentence is the gap. There is no public LLM benchmark that measures whether a frontier model can give correct, locally anchored welfare guidance to a resident of India, Nigeria, or Kenya. A population of roughly 1.65 billion people. About one in five humans alive today. PM-KISAN has disbursed ₹4.27 lakh crore since 2019. Ayushman Bharat covers 12 crore families. Inua Jamii pays 1.21 million elderly Kenyans every month. NHIA covers 21.7 million Nigerians. The deployment surface is enormous. The evaluation surface is empty.
Three countries. Five hundred ground-truth rows per country, written from scratch against named government sources — the 22nd PM-KISAN installment of ₹18,640 crore on 13 March 2026, Inua Jamii's KSh 2,000 transfer to 1.21 million beneficiaries via M-Pesa *222#, NHIA's 21.7 million enrolees under the Act that replaced NHIS in 2022. A shared 64-column schema. Eleven sections per country. Crisis rows that lead with the helpline before any empathic preamble. Adversarial fraud refusals where the correct answer is no. Status-transition rows that test whether the model knows NHIF is gone. Register variants in Sheng, in Swahili, in Kikuyu, in Luo, in Pidgin, in Hinglish, in eight Indic scripts.
What follows is a reading copy.