Mar 5 update: is now open. Tell us what APE is doing wrong, suggest research directions, or critique a paper.

Can we automate policy evaluation?

AI may soon be capable of producing rigorous economic research. If that happens, policy evaluation could scale dramatically: highlighting what works, what fails, and what harms, far faster than human researchers alone.

We want to find out whether an autonomous system can generate, replicate, and revise empirical policy research, with everything made public.

This is an experiment in building reliable AI research systems.

2,248 Ideas (+374 this week)
942 Papers (+190 this week)
17k+ Matches
4% Win Rate

Last updated: April 4, 2026

Most policies, probably millions of them globally, are never rigorously evaluated. Data is plentiful, but there aren't enough researchers. Could AI help? We genuinely don't know, so we're running an experiment: an AI system attempts to produce economics research at scale, using publicly available data.

Will any of it be good? How would we even know? Ideally, PhDs or editors of top journals would evaluate every paper, but they are busy. So we run an automated tournament that evaluates the papers against human benchmarks from top journals. This could help with triage and get us to a "you know it when you see it" moment faster.

Most importantly, everything is public: papers, code, data, failures. The more people look, the faster mistakes get caught. And we want feedback! In fact, the core thesis is that recursive self-improvement is possible and can be enhanced by human feedback.

The next milestone: generate 1,000 papers, evaluate them, and share the lessons in a report. Can policy evaluation be automated? Or is hallucinated slop unavoidable? Let's find out!
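The page does not spell out the judging protocol here, but a head-to-head tournament like the one described above can be sketched as a round-robin of pairwise matches with Elo updates. This is a minimal sketch under stated assumptions: the `judge` callable (a stand-in for whatever automated comparison the tournament actually uses), the K-factor of 32, the 1500 starting rating, and the paper names are all hypothetical. The leaderboard also reports μ and σ, which points to a separate Bayesian-style rating alongside Elo; that part is not modeled here.

```python
import itertools
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical judge: returns 1.0 if paper a beats paper b, 0.0 if b wins, 0.5 for a draw.
Judge = Callable[[str, str], float]


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expectation: a 400-point gap gives 10:1 odds (about 91%)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> Tuple[float, float]:
    """Move both ratings toward the observed result; the update is zero-sum."""
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


def run_tournament(papers: List[str], judge: Judge, rounds: int = 1,
                   k: float = 32.0, initial: float = 1500.0, seed: int = 0) -> Dict[str, float]:
    """Round-robin tournament: every pair meets once per round, in shuffled order."""
    rng = random.Random(seed)
    ratings = {p: initial for p in papers}
    pairs = list(itertools.combinations(papers, 2))
    for _ in range(rounds):
        rng.shuffle(pairs)
        for a, b in pairs:
            ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], judge(a, b), k)
    return ratings


if __name__ == "__main__":
    # Toy example: hidden "quality" scores stand in for a real (hypothetical) judge.
    quality = {"human_benchmark": 3, "ai_paper_1": 2, "ai_paper_2": 1}
    toy_judge: Judge = lambda a, b: 1.0 if quality[a] > quality[b] else 0.0
    final = run_tournament(list(quality), toy_judge, rounds=5)
    for paper, rating in sorted(final.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{paper}: {rating:.0f}")
```

Because each update adds to one paper exactly what it subtracts from the other, the total rating pool stays constant; only the spread between papers changes as results accumulate.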

⚠️ Warning: We are learning how to build a reliable, autonomous research system. Expect bugs, errors, hallucinations, and trashable papers. None of the generated papers has been peer-reviewed, and they should not be used for evidence-based policymaking.
What does "autonomous" even mean?



How the Tournament Works

Ranking Metrics

Review Status


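48h: Rank change over the last 48 hours.
μ: Estimated skill rating. Higher values indicate better research quality based on pairwise comparisons.
σ: Uncertainty. Lower values mean higher confidence in the rating.
Cons.: Conservative rating (μ - 3σ), adjusted for integrity penalties. Used for ranking.
Elo: Elo rating. Standard chess-style rating, where a 400-point difference corresponds to roughly a 90% win probability.
MP: Matches played. Valid head-to-head comparisons, excluding annulled matches against papers flagged with severe issues during automated code review.
Status: ✅ Peer reviewed · 🔎 Awaiting review · 🧐 Issues detected · 🚫 Critical errors

| Rank | 48h | Paper | μ | σ | Cons. | Elo | MP | Status |
|---|---|---|---|---|---|---|---|---|
| 1 | | | 40.0 | 1.7 | 35.0 | 2102 | 408 | |
| 2 | | | 37.6 | 1.6 | 32.9 | 2004 | 368 | |
| 3 | | | 35.3 | 1.2 | 31.7 | 1911 | 407 | |
| 4 | | | 35.0 | 1.2 | 31.5 | 1902 | 416 | |
| 5 | 1 | | 35.4 | 1.4 | 31.4 | 1917 | 362 | |
| 6 | 2 | | 34.6 | 1.1 | 31.1 | 1883 | 419 | |
| 7 | 2 | | 34.5 | 1.2 | 31.0 | 1879 | 364 | |
| 8 | 1 | | 34.6 | 1.2 | 31.0 | 1885 | 350 | |
| 9 | 1 | | 34.0 | 1.2 | 30.4 | 1862 | 373 | |
| 10 | 1 | | 33.8 | 1.1 | 30.4 | 1853 | 399 | |
| 11 | 13 | | 33.5 | 1.1 | 30.1 | 1841 | 399 | |
| 12 | 5 | | 33.5 | 1.2 | 30.0 | 1841 | 376 | |
| 13 | | | 33.5 | 1.2 | 30.0 | 1840 | 344 | |
| 14 | | | 33.0 | 1.1 | 29.8 | 1822 | 387 | |
| 15 | | | 32.7 | 1.1 | 29.5 | 1809 | 379 | |
| 16 | 2 | | 32.7 | 1.1 | 29.4 | 1810 | 404 | |
| 17 | 1 | | 32.4 | 1.0 | 29.3 | 1796 | 431 | |
| 18 | 5 | | 32.2 | 1.1 | 29.0 | 1788 | 421 | |
| 19 | 1 | | 32.2 | 1.1 | 29.0 | 1789 | 452 | |
| 20 | 8 | | 32.2 | 1.1 | 29.0 | 1788 | 422 | |
| 21 | 4 | | 32.0 | 1.0 | 29.0 | 1781 | 373 | |
| 22 | 3 | | 32.1 | 1.0 | 29.0 | 1783 | 391 | |
| 23 | 2 | | 31.7 | 1.0 | 28.7 | 1769 | 468 | |
| 24 | 4 | | 31.7 | 1.1 | 28.4 | 1767 | 100 | |
| 25 | 20 | | 31.6 | 1.1 | 28.3 | 1763 | 398 | |
| 26 | 4 | | 31.5 | 1.1 | 28.1 | 1760 | 94 | |
| 27 | 2 | | 31.1 | 1.0 | 28.1 | 1744 | 414 | |
| 28 | 6 | | 31.1 | 1.0 | 28.1 | 1743 | 429 | |
| 29 | 2 | | 31.1 | 1.0 | 28.1 | 1744 | 409 | |
| 30 | 2 | | 30.8 | 1.0 | 27.8 | 1733 | 394 | |
| 31 | 5 | | 30.6 | 1.0 | 27.7 | 1725 | 406 | |
| 32 | 1 | | 29.7 | 1.0 | 26.8 | 1689 | 430 | |
| 33 | 2 | | 30.0 | 1.1 | 26.8 | 1698 | 110 | |
| 34 | 3 | | 29.8 | 1.0 | 26.7 | 1690 | 391 | |
| 35 | 1 | | 29.4 | 0.9 | 26.6 | 1677 | 447 | |
| 36 | 2 | | 29.4 | 1.0 | 26.6 | 1678 | 441 | |
| 37 | 1 | | 29.1 | 0.9 | 26.3 | 1664 | 490 | |
| 38 | 1 | | 29.7 | 1.1 | 25.9 | 1689 | 68 | |
| 39 | 12 | | 28.1 | 0.9 | 25.4 | 1625 | 477 | |
| 40 | NEW | | 34.5 | 3.0 | 25.3 | 1880 | 14 | |
| 41 | 1 | | 28.0 | 0.9 | 25.3 | 1619 | 531 | |
| 42 | NEW | | 33.8 | 3.0 | 25.0 | 1854 | 16 | |
| 43 | NEW | | 37.7 | 4.2 | 25.0 | 2008 | 10 | |
| 44 | 2 | | 28.4 | 1.2 | 24.9 | 1634 | 80 | |
| 45 | 4 | | 27.5 | 0.9 | 24.8 | 1600 | 528 | |
| 46 | 8 | | 28.6 | 1.4 | 24.6 | 1646 | 68 | |
| 47 | NEW | | 37.0 | 4.2 | 24.5 | 1980 | 12 | |
| 48 | NEW | | 34.2 | 3.3 | 24.4 | 1869 | 12 | |
| 49 | 6 | | 27.0 | 0.9 | 24.3 | 1581 | 483 | |
| 50 | 2 | | 28.6 | 1.5 | 24.1 | 1645 | 58 | |
| 51 | NEW | | 36.6 | 4.2 | 24.1 | 1963 | 12 | |
| 52 | 7 | | 28.3 | 1.4 | 24.1 | 1630 | 54 | |
| 53 | 7 | | 27.8 | 1.2 | 24.1 | 1612 | 78 | |
| 54 | 4 | | 28.7 | 1.6 | 24.0 | 1646 | 60 | |
| 55 | 8 | | 27.5 | 1.2 | 24.0 | 1602 | 74 | |
| 56 | NEW | | 33.7 | 3.4 | 23.6 | 1848 | 10 | |
| 57 | 4 | | 26.9 | 1.1 | 23.6 | 1575 | 104 | |
| 58 | 2 | | 26.2 | 0.9 | 23.6 | 1547 | 560 | |
| 59 | 15 | | 28.2 | 1.6 | 23.5 | 1630 | 50 | |
| 60 | 11 | AEJ: Policy | 26.0 | 0.9 | 23.4 | 1541 | 540 | |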
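The ranking columns can be checked with a few lines of arithmetic. This minimal sketch recomputes the conservative rating μ - 3σ for three leaderboard rows and the Elo win probability implied by two ratings. The page says Cons. is "adjusted for integrity penalties" without giving the formula, so the `penalty` argument (a plain subtraction) is a hypothetical placeholder. Recomputed values differ from the displayed Cons. column by about a tenth because μ and σ are shown rounded. The Elo curve gives 10/11 ≈ 0.91 for a 400-point gap, which matches the "roughly 90%" rule of thumb in the column description.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    label: str
    mu: float     # estimated skill (μ)
    sigma: float  # uncertainty (σ)


def conservative_rating(mu: float, sigma: float, penalty: float = 0.0) -> float:
    """μ - 3σ, minus a hypothetical integrity penalty (its real form is not documented)."""
    return mu - 3.0 * sigma - penalty


def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def rank_by_conservative(entries: List[Entry]) -> List[Entry]:
    """Sort highest conservative rating first, as the leaderboard does."""
    return sorted(entries, key=lambda e: conservative_rating(e.mu, e.sigma), reverse=True)


# Rounded μ and σ values copied from three rows of the leaderboard.
entries = [
    Entry("rank 43 (NEW)", 37.7, 4.2),
    Entry("rank 60 (AEJ: Policy)", 26.0, 0.9),
    Entry("rank 2", 37.6, 1.6),
]

for e in rank_by_conservative(entries):
    print(f"{e.label}: Cons. = {conservative_rating(e.mu, e.sigma):.1f}")

# Chance that the rank-1 paper (Elo 2102) beats the AEJ: Policy benchmark (Elo 1541).
print(f"{elo_win_probability(2102, 1541):.2f}")
```

The new paper at rank 43 has a slightly higher μ than the rank-2 paper, but its larger σ (after only 10 matches) lowers its conservative rating by about 7.7 points relative to rank 2. That is why recently added papers with few matches sit lower than their μ alone would suggest. The Elo comparison prints 0.96.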

Total tokens used for the tournament (excluding paper-generation tokens): 1,434,101,596