Artifact Format Evaluation Harness

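This report compares template artifacts with API-key-free agent-corpus artifacts. Reader scores combine answer accuracy, findability, visual checks, and interaction smoke tests. They are automated proxies, not results from a human study. In each "Where HTML helped" block, comprehension and reviewability are each HTML format's Reader and Review scores minus markdown's for the same slice (up to rounding), so negative values mean the HTML format scored lower than markdown.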

code-review · templates

Profile Winners

human_reviewer: markdown (0.916)
agent_reader: markdown (0.912)
security_sensitive: markdown (0.927)
accessibility_first: markdown (0.950)
cost_sensitive: markdown (0.858)

Where HTML helped

html-static: comprehension 0.000, reviewability 0.058
html-svg: comprehension -0.060, reviewability 0.117
html-interactive: comprehension -0.030, reviewability 0.143

Format Scores

Format	Reader	Review	A11y	Security	Cost	Best profile
markdown	1.000	0.782	1.000	1.000	0.723	accessibility_first: 0.950
html-static	1.000	0.840	1.000	1.000	0.289	accessibility_first: 0.931
html-svg	0.940	0.899	1.000	1.000	0.200	accessibility_first: 0.924
html-interactive	0.970	0.925	1.000	1.000	0.273	accessibility_first: 0.932
json-renderer	1.000	0.364	1.000	1.000	0.218	accessibility_first: 0.904
notebook	0.850	0.659	1.000	1.000	0.536	accessibility_first: 0.920
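The accessibility_first line under Profile Winners is consistent with selecting the format that has the highest score in the Best profile column. Here is a minimal sketch of that selection over this table's values. The `pick_winner` helper and its tie-breaking rule (the earliest-listed format wins ties) are assumptions, not documented harness behavior.

```python
from typing import Dict, Tuple

# accessibility_first scores from the Best profile column of code-review · templates.
ACCESSIBILITY_FIRST: Dict[str, float] = {
    "markdown": 0.950,
    "html-static": 0.931,
    "html-svg": 0.924,
    "html-interactive": 0.932,
    "json-renderer": 0.904,
    "notebook": 0.920,
}


def pick_winner(profile_scores: Dict[str, float]) -> Tuple[str, float]:
    """Return (format, score) with the highest profile score.

    max() returns the first maximal item it meets, so ties go to the
    format listed earliest (dicts preserve insertion order).
    """
    fmt = max(profile_scores, key=profile_scores.__getitem__)
    return fmt, profile_scores[fmt]


print("accessibility_first: %s (%.3f)" % pick_winner(ACCESSIBILITY_FIRST))
```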

scores.by-format.json · evidence.by-format.json

code-review · agent-corpus

Profile Winners

human_reviewer: markdown (0.935)
agent_reader: markdown (0.933)
security_sensitive: markdown (0.936)
accessibility_first: markdown (0.958)
cost_sensitive: markdown (0.905)

Where HTML helped

html-static: comprehension -0.220, reviewability -0.465
html-svg: comprehension 0.000, reviewability -0.569
html-interactive: comprehension 0.000, reviewability -0.503

Format Scores

Format	Reader	Review	A11y	Security	Cost	Best profile
markdown	1.000	0.818	1.000	1.000	0.842	accessibility_first: 0.958
html-static	0.780	0.354	1.000	1.000	0.401	accessibility_first: 0.891
html-svg	1.000	0.249	1.000	1.000	0.200	accessibility_first: 0.897
html-interactive	1.000	0.315	1.000	1.000	0.227	accessibility_first: 0.902
json-renderer	0.740	0.232	1.000	0.000	0.458	accessibility_first: 0.783
notebook	1.000	0.741	1.000	1.000	0.775	accessibility_first: 0.951
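In this slice, json-renderer is the only format with a Security score of 0.000. The same is true in every other agent-corpus slice, while every template-slice row scores 1.000. A reader who wants a hard security gate could drop such rows before ranking a profile. The sketch below does this with a hypothetical `min_security` threshold of 1.0 and hypothetical helper names. This report does not document how the harness itself treats a failing Security score.

```python
from typing import Dict, List

# Security column from the code-review · agent-corpus Format Scores table.
SECURITY: Dict[str, float] = {
    "markdown": 1.000,
    "html-static": 1.000,
    "html-svg": 1.000,
    "html-interactive": 1.000,
    "json-renderer": 0.000,
    "notebook": 1.000,
}

# accessibility_first scores from the Best profile column of the same table.
ACCESSIBILITY_FIRST: Dict[str, float] = {
    "markdown": 0.958,
    "html-static": 0.891,
    "html-svg": 0.897,
    "html-interactive": 0.902,
    "json-renderer": 0.783,
    "notebook": 0.951,
}


def failing_security(security: Dict[str, float], min_security: float = 1.0) -> List[str]:
    """List formats whose Security score is below a hypothetical pass threshold."""
    return [fmt for fmt, score in security.items() if score < min_security]


def gate(profile_scores: Dict[str, float], security: Dict[str, float],
         min_security: float = 1.0) -> Dict[str, float]:
    """Drop formats that fail the security threshold before ranking a profile."""
    failed = set(failing_security(security, min_security))
    return {fmt: s for fmt, s in profile_scores.items() if fmt not in failed}


print(failing_security(SECURITY))
print(sorted(gate(ACCESSIBILITY_FIRST, SECURITY)))
```

The winner does not change here, because markdown already leads on accessibility_first. A gate of this kind matters only when a security-failing format would otherwise rank first.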

scores.by-format.json · evidence.by-format.json

dashboard-editor · templates

Profile Winners

human_reviewer: markdown (0.907)
agent_reader: markdown (0.904)
security_sensitive: markdown (0.922)
accessibility_first: markdown (0.941)
cost_sensitive: markdown (0.857)

Where HTML helped

html-static: comprehension 0.040, reviewability 0.058
html-svg: comprehension -0.060, reviewability 0.117
html-interactive: comprehension 0.070, reviewability 0.143

Format Scores

Format	Reader	Review	A11y	Security	Cost	Best profile
markdown	0.900	0.782	1.000	1.000	0.736	accessibility_first: 0.941
html-static	0.940	0.840	1.000	1.000	0.289	accessibility_first: 0.925
html-svg	0.840	0.899	1.000	1.000	0.200	accessibility_first: 0.914
html-interactive	0.970	0.925	1.000	1.000	0.261	accessibility_first: 0.931
json-renderer	0.900	0.368	1.000	1.000	0.224	accessibility_first: 0.895
notebook	0.750	0.659	1.000	1.000	0.539	accessibility_first: 0.910

scores.by-format.json · evidence.by-format.json

dashboard-editor · agent-corpus

Profile Winners

human_reviewer: markdown (0.926)
agent_reader: markdown (0.924)
security_sensitive: markdown (0.932)
accessibility_first: markdown (0.948)
cost_sensitive: markdown (0.903)

Where HTML helped

html-static: comprehension -0.220, reviewability -0.460
html-svg: comprehension 0.090, reviewability -0.565
html-interactive: comprehension 0.100, reviewability -0.503

Format Scores

Format	Reader	Review	A11y	Security	Cost	Best profile
markdown	0.900	0.818	1.000	1.000	0.851	accessibility_first: 0.948
html-static	0.680	0.358	1.000	1.000	0.397	accessibility_first: 0.881
html-svg	0.990	0.254	1.000	1.000	0.200	accessibility_first: 0.897
html-interactive	1.000	0.315	1.000	1.000	0.222	accessibility_first: 0.902
json-renderer	0.640	0.232	1.000	0.000	0.467	accessibility_first: 0.774
notebook	0.900	0.741	1.000	1.000	0.780	accessibility_first: 0.941

scores.by-format.json · evidence.by-format.json

incident-report · templates

Profile Winners

human_reviewer: markdown (0.919)
agent_reader: markdown (0.915)
security_sensitive: markdown (0.928)
accessibility_first: markdown (0.951)
cost_sensitive: markdown (0.866)

Where HTML helped

html-static: comprehension 0.000, reviewability 0.058
html-svg: comprehension -0.060, reviewability 0.117
html-interactive: comprehension -0.030, reviewability 0.143

Format Scores

Format	Reader	Review	A11y	Security	Cost	Best profile
markdown	1.000	0.782	1.000	1.000	0.748	accessibility_first: 0.951
html-static	1.000	0.840	1.000	1.000	0.297	accessibility_first: 0.932
html-svg	0.940	0.899	1.000	1.000	0.200	accessibility_first: 0.924
html-interactive	0.970	0.925	1.000	1.000	0.265	accessibility_first: 0.931
json-renderer	1.000	0.368	1.000	1.000	0.232	accessibility_first: 0.905
notebook	0.850	0.659	1.000	1.000	0.545	accessibility_first: 0.920

scores.by-format.json · evidence.by-format.json

incident-report · agent-corpus

Profile Winners

human_reviewer: markdown (0.937)
agent_reader: markdown (0.935)
security_sensitive: markdown (0.937)
accessibility_first: markdown (0.959)
cost_sensitive: markdown (0.910)

Where HTML helped

html-static: comprehension -0.220, reviewability -0.451
html-svg: comprehension 0.000, reviewability -0.555
html-interactive: comprehension 0.000, reviewability -0.489

Format Scores

Format	Reader	Review	A11y	Security	Cost	Best profile
markdown	1.000	0.818	1.000	1.000	0.856	accessibility_first: 0.959
html-static	0.780	0.367	1.000	1.000	0.401	accessibility_first: 0.891
html-svg	1.000	0.263	1.000	1.000	0.200	accessibility_first: 0.898
html-interactive	1.000	0.329	1.000	1.000	0.227	accessibility_first: 0.903
json-renderer	0.740	0.232	1.000	0.000	0.469	accessibility_first: 0.784
notebook	1.000	0.741	1.000	1.000	0.783	accessibility_first: 0.951

scores.by-format.json · evidence.by-format.json

prior-auth · templates

Profile Winners

human_reviewer: markdown (0.916)
agent_reader: markdown (0.910)
security_sensitive: markdown (0.927)
accessibility_first: markdown (0.950)
cost_sensitive: markdown (0.854)

Where HTML helped

html-static: comprehension 0.000, reviewability 0.054
html-svg: comprehension 0.000, reviewability 0.113
html-interactive: comprehension -0.030, reviewability 0.138

Format Scores

Format	Reader	Review	A11y	Security	Cost	Best profile
markdown	1.000	0.786	1.000	1.000	0.710	accessibility_first: 0.950
html-static	1.000	0.840	1.000	1.000	0.279	accessibility_first: 0.931
html-svg	1.000	0.899	1.000	1.000	0.200	accessibility_first: 0.930
html-interactive	0.970	0.925	1.000	1.000	0.258	accessibility_first: 0.931
json-renderer	1.000	0.382	1.000	1.000	0.206	accessibility_first: 0.904
notebook	0.960	0.659	1.000	1.000	0.532	accessibility_first: 0.931

scores.by-format.json · evidence.by-format.json

prior-auth · agent-corpus

Profile Winners

human_reviewer: markdown (0.936)
agent_reader: markdown (0.933)
security_sensitive: markdown (0.937)
accessibility_first: markdown (0.958)
cost_sensitive: markdown (0.904)

Where HTML helped

html-static: comprehension -0.220, reviewability -0.483
html-svg: comprehension 0.000, reviewability -0.587
html-interactive: comprehension 0.000, reviewability -0.525

Format Scores

Format	Reader	Review	A11y	Security	Cost	Best profile
markdown	1.000	0.823	1.000	1.000	0.839	accessibility_first: 0.958
html-static	0.780	0.340	1.000	1.000	0.396	accessibility_first: 0.890
html-svg	1.000	0.235	1.000	1.000	0.200	accessibility_first: 0.897
html-interactive	1.000	0.297	1.000	1.000	0.220	accessibility_first: 0.901
json-renderer	0.740	0.250	1.000	0.000	0.457	accessibility_first: 0.784
notebook	1.000	0.741	1.000	1.000	0.779	accessibility_first: 0.951

scores.by-format.json · evidence.by-format.json

research-explainer · templates

Profile Winners

human_reviewer: markdown (0.918)
agent_reader: markdown (0.913)
security_sensitive: markdown (0.928)
accessibility_first: markdown (0.951)
cost_sensitive: markdown (0.861)

Where HTML helped

html-static: comprehension 0.000, reviewability 0.054
html-svg: comprehension -0.060, reviewability 0.113
html-interactive: comprehension -0.030, reviewability 0.138

Format Scores

Format	Reader	Review	A11y	Security	Cost	Best profile
markdown	1.000	0.786	1.000	1.000	0.730	accessibility_first: 0.951
html-static	1.000	0.840	1.000	1.000	0.289	accessibility_first: 0.931
html-svg	0.940	0.899	1.000	1.000	0.200	accessibility_first: 0.924
html-interactive	0.970	0.925	1.000	1.000	0.264	accessibility_first: 0.931
json-renderer	1.000	0.400	1.000	1.000	0.255	accessibility_first: 0.908
notebook	0.850	0.659	1.000	1.000	0.535	accessibility_first: 0.920

scores.by-format.json · evidence.by-format.json

research-explainer · agent-corpus

Profile Winners

human_reviewer: markdown (0.937)
agent_reader: markdown (0.934)
security_sensitive: markdown (0.937)
accessibility_first: markdown (0.958)
cost_sensitive: markdown (0.907)

Where HTML helped

html-static: comprehension -0.220, reviewability -0.483
html-svg: comprehension 0.000, reviewability -0.587
html-interactive: comprehension 0.000, reviewability -0.521

Format Scores

Format	Reader	Review	A11y	Security	Cost	Best profile
markdown	1.000	0.823	1.000	1.000	0.846	accessibility_first: 0.958
html-static	0.780	0.340	1.000	1.000	0.396	accessibility_first: 0.890
html-svg	1.000	0.235	1.000	1.000	0.200	accessibility_first: 0.897
html-interactive	1.000	0.302	1.000	1.000	0.223	accessibility_first: 0.901
json-renderer	0.740	0.255	1.000	0.000	0.481	accessibility_first: 0.786
notebook	1.000	0.741	1.000	1.000	0.777	accessibility_first: 0.951

scores.by-format.json · evidence.by-format.json
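Across all five artifact types, the reviewability delta flips sign between slices. Every HTML format gains on templates and loses on agent-corpus. The following minimal sketch averages the html-interactive reviewability deltas reported above for each slice. The per-slice mean is an illustrative summary and not a statistic the harness reports.

```python
from statistics import mean
from typing import Dict

# html-interactive reviewability deltas copied from each "Where HTML helped" block.
REVIEWABILITY_DELTAS: Dict[str, Dict[str, float]] = {
    "templates": {
        "code-review": 0.143,
        "dashboard-editor": 0.143,
        "incident-report": 0.143,
        "prior-auth": 0.138,
        "research-explainer": 0.138,
    },
    "agent-corpus": {
        "code-review": -0.503,
        "dashboard-editor": -0.503,
        "incident-report": -0.489,
        "prior-auth": -0.525,
        "research-explainer": -0.521,
    },
}


def summarize(deltas: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Mean delta per slice across artifact types, rounded to three decimals."""
    return {slice_name: round(mean(per_artifact.values()), 3)
            for slice_name, per_artifact in deltas.items()}


for slice_name, avg in summarize(REVIEWABILITY_DELTAS).items():
    print(f"{slice_name}: mean html-interactive reviewability delta {avg:+.3f}")
```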