ai4data/datause-extraction holdout test set performance
World Bank Datause Monitoring Dashboard
Threshold: 0.40
About the Benchmark Dataset (ai4data/datause-holdout)
The evaluation is conducted on the canonical, project-cleaned Holdout v10 dataset (ai4data/datause-holdout). It consists of 1,149 prose-only text chunks (465 positive, 684 negative control records) systematically extracted from diverse humanitarian and development reports.
Costa Rica Results in Education (CORE) PAD (P181174)
Rwanda Socio-economic Inclusion Project II PAD (P509677)
Gold Mining Spillovers in Ghana (Benshaul-Tolonen 2019)
Precision: 0.0% (share of predicted mentions that are correct)
Recall: 0.0% (dataset coverage)
F0.5 Score: 0.000 (precision-weighted)
Filtered Matches: 0 (matching filter criteria)
Sub-category counts: TP: 0, FP: 0, FN: 0
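As a rough illustration of how the three scores relate to the TP/FP/FN counts, the sketch below uses the standard F-beta definition with beta = 0.5, which weights precision more heavily than recall. The function name `precision_recall_fbeta` and the choice to return 0.0 when a ratio is undefined are assumptions made for this example; the dashboard's own implementation is not shown in the source.

```python
def precision_recall_fbeta(tp: int, fp: int, fn: int, beta: float = 0.5):
    """Return (precision, recall, F-beta) from raw counts.

    Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, fbeta


# All counts zero, i.e. nothing matches the current filter.
print(precision_recall_fbeta(0, 0, 0))

# Hypothetical counts, for illustration only (not results from the holdout set).
p, r, f = precision_recall_fbeta(tp=8, fp=2, fn=8)
print(f"{p:.1%} {r:.1%} {f:.3f}")
```

With beta below 1, a false positive costs more than a false negative, which fits a setting where spurious dataset mentions are the main concern.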
About the Training Dataset (ai4data/datause-train)
This is the final dataset of 2,558 records used to fine-tune the model adapter. It combines balanced positive dataset mentions with 120 pinned hard negatives to minimize out-of-domain hallucinations.