Artifact Format Evaluation Harness
This report compares template artifacts with API-key-free agent-corpus artifacts. Reader scores combine answer accuracy, findability, visual checks, and interaction smoke tests; they are not human-study results.
code-review · templates
Profile Winners
- human_reviewer: markdown (0.916)
- agent_reader: markdown (0.912)
- security_sensitive: markdown (0.927)
- accessibility_first: markdown (0.950)
- cost_sensitive: markdown (0.858)
Where HTML helped
- html-static: comprehension 0.000, reviewability 0.058
- html-svg: comprehension -0.060, reviewability 0.117
- html-interactive: comprehension -0.030, reviewability 0.143
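The deltas listed here appear to be each HTML format's score minus the markdown score in the same scenario, with "comprehension" taken from the Reader column and "reviewability" from the Review column. That mapping is inferred from the numbers in this report, not stated by the harness, so treat the following as a minimal sketch under that assumption. The function name `html_deltas` and the dictionary layout are hypothetical.

```python
# Reader and Review scores copied from the code-review · templates table.
# Assumption: "comprehension" = Reader delta, "reviewability" = Review delta,
# both measured against the markdown baseline.
scores = {
    "markdown": {"reader": 1.000, "review": 0.782},
    "html-static": {"reader": 1.000, "review": 0.840},
    "html-svg": {"reader": 0.940, "review": 0.899},
    "html-interactive": {"reader": 0.970, "review": 0.925},
}


def html_deltas(scores, baseline="markdown"):
    """Return per-format score differences relative to the baseline format."""
    base = scores[baseline]
    return {
        fmt: {
            "comprehension": round(vals["reader"] - base["reader"], 3),
            "reviewability": round(vals["review"] - base["review"], 3),
        }
        for fmt, vals in scores.items()
        if fmt != baseline
    }


deltas = html_deltas(scores)
for fmt, d in deltas.items():
    print(f"- {fmt}: comprehension {d['comprehension']:.3f}, "
          f"reviewability {d['reviewability']:.3f}")
```

Read this way, a positive value means the HTML variant scored above markdown on that dimension and a negative value means it scored below, which is why the agent-corpus sections show mostly negative reviewability figures under the same heading.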
Format Scores
| Format | Reader | Review | A11y | Security | Cost | Best profile |
| markdown | 1.000 | 0.782 | 1.000 | 1.000 | 0.723 | accessibility_first: 0.950 |
| html-static | 1.000 | 0.840 | 1.000 | 1.000 | 0.289 | accessibility_first: 0.931 |
| html-svg | 0.940 | 0.899 | 1.000 | 1.000 | 0.200 | accessibility_first: 0.924 |
| html-interactive | 0.970 | 0.925 | 1.000 | 1.000 | 0.273 | accessibility_first: 0.932 |
| json-renderer | 1.000 | 0.364 | 1.000 | 1.000 | 0.218 | accessibility_first: 0.904 |
| notebook | 0.850 | 0.659 | 1.000 | 1.000 | 0.536 | accessibility_first: 0.920 |
scores.by-format.json · evidence.by-format.json
code-review · agent-corpus
Profile Winners
- human_reviewer: markdown (0.935)
- agent_reader: markdown (0.933)
- security_sensitive: markdown (0.936)
- accessibility_first: markdown (0.958)
- cost_sensitive: markdown (0.905)
Where HTML helped
- html-static: comprehension -0.220, reviewability -0.465
- html-svg: comprehension 0.000, reviewability -0.569
- html-interactive: comprehension 0.000, reviewability -0.503
Format Scores
| Format | Reader | Review | A11y | Security | Cost | Best profile |
| markdown | 1.000 | 0.818 | 1.000 | 1.000 | 0.842 | accessibility_first: 0.958 |
| html-static | 0.780 | 0.354 | 1.000 | 1.000 | 0.401 | accessibility_first: 0.891 |
| html-svg | 1.000 | 0.249 | 1.000 | 1.000 | 0.200 | accessibility_first: 0.897 |
| html-interactive | 1.000 | 0.315 | 1.000 | 1.000 | 0.227 | accessibility_first: 0.902 |
| json-renderer | 0.740 | 0.232 | 1.000 | 0.000 | 0.458 | accessibility_first: 0.783 |
| notebook | 1.000 | 0.741 | 1.000 | 1.000 | 0.775 | accessibility_first: 0.951 |
scores.by-format.json · evidence.by-format.json
dashboard-editor · templates
Profile Winners
- human_reviewer: markdown (0.907)
- agent_reader: markdown (0.904)
- security_sensitive: markdown (0.922)
- accessibility_first: markdown (0.941)
- cost_sensitive: markdown (0.857)
Where HTML helped
- html-static: comprehension 0.040, reviewability 0.058
- html-svg: comprehension -0.060, reviewability 0.117
- html-interactive: comprehension 0.070, reviewability 0.143
Format Scores
| Format | Reader | Review | A11y | Security | Cost | Best profile |
| markdown | 0.900 | 0.782 | 1.000 | 1.000 | 0.736 | accessibility_first: 0.941 |
| html-static | 0.940 | 0.840 | 1.000 | 1.000 | 0.289 | accessibility_first: 0.925 |
| html-svg | 0.840 | 0.899 | 1.000 | 1.000 | 0.200 | accessibility_first: 0.914 |
| html-interactive | 0.970 | 0.925 | 1.000 | 1.000 | 0.261 | accessibility_first: 0.931 |
| json-renderer | 0.900 | 0.368 | 1.000 | 1.000 | 0.224 | accessibility_first: 0.895 |
| notebook | 0.750 | 0.659 | 1.000 | 1.000 | 0.539 | accessibility_first: 0.910 |
scores.by-format.json · evidence.by-format.json
dashboard-editor · agent-corpus
Profile Winners
- human_reviewer: markdown (0.926)
- agent_reader: markdown (0.924)
- security_sensitive: markdown (0.932)
- accessibility_first: markdown (0.948)
- cost_sensitive: markdown (0.903)
Where HTML helped
- html-static: comprehension -0.220, reviewability -0.460
- html-svg: comprehension 0.090, reviewability -0.565
- html-interactive: comprehension 0.100, reviewability -0.503
Format Scores
| Format | Reader | Review | A11y | Security | Cost | Best profile |
| markdown | 0.900 | 0.818 | 1.000 | 1.000 | 0.851 | accessibility_first: 0.948 |
| html-static | 0.680 | 0.358 | 1.000 | 1.000 | 0.397 | accessibility_first: 0.881 |
| html-svg | 0.990 | 0.254 | 1.000 | 1.000 | 0.200 | accessibility_first: 0.897 |
| html-interactive | 1.000 | 0.315 | 1.000 | 1.000 | 0.222 | accessibility_first: 0.902 |
| json-renderer | 0.640 | 0.232 | 1.000 | 0.000 | 0.467 | accessibility_first: 0.774 |
| notebook | 0.900 | 0.741 | 1.000 | 1.000 | 0.780 | accessibility_first: 0.941 |
scores.by-format.json · evidence.by-format.json
incident-report · templates
Profile Winners
- human_reviewer: markdown (0.919)
- agent_reader: markdown (0.915)
- security_sensitive: markdown (0.928)
- accessibility_first: markdown (0.951)
- cost_sensitive: markdown (0.866)
Where HTML helped
- html-static: comprehension 0.000, reviewability 0.058
- html-svg: comprehension -0.060, reviewability 0.117
- html-interactive: comprehension -0.030, reviewability 0.143
Format Scores
| Format | Reader | Review | A11y | Security | Cost | Best profile |
| markdown | 1.000 | 0.782 | 1.000 | 1.000 | 0.748 | accessibility_first: 0.951 |
| html-static | 1.000 | 0.840 | 1.000 | 1.000 | 0.297 | accessibility_first: 0.932 |
| html-svg | 0.940 | 0.899 | 1.000 | 1.000 | 0.200 | accessibility_first: 0.924 |
| html-interactive | 0.970 | 0.925 | 1.000 | 1.000 | 0.265 | accessibility_first: 0.931 |
| json-renderer | 1.000 | 0.368 | 1.000 | 1.000 | 0.232 | accessibility_first: 0.905 |
| notebook | 0.850 | 0.659 | 1.000 | 1.000 | 0.545 | accessibility_first: 0.920 |
scores.by-format.json · evidence.by-format.json
incident-report · agent-corpus
Profile Winners
- human_reviewer: markdown (0.937)
- agent_reader: markdown (0.935)
- security_sensitive: markdown (0.937)
- accessibility_first: markdown (0.959)
- cost_sensitive: markdown (0.910)
Where HTML helped
- html-static: comprehension -0.220, reviewability -0.451
- html-svg: comprehension 0.000, reviewability -0.555
- html-interactive: comprehension 0.000, reviewability -0.489
Format Scores
| Format | Reader | Review | A11y | Security | Cost | Best profile |
| markdown | 1.000 | 0.818 | 1.000 | 1.000 | 0.856 | accessibility_first: 0.959 |
| html-static | 0.780 | 0.367 | 1.000 | 1.000 | 0.401 | accessibility_first: 0.891 |
| html-svg | 1.000 | 0.263 | 1.000 | 1.000 | 0.200 | accessibility_first: 0.898 |
| html-interactive | 1.000 | 0.329 | 1.000 | 1.000 | 0.227 | accessibility_first: 0.903 |
| json-renderer | 0.740 | 0.232 | 1.000 | 0.000 | 0.469 | accessibility_first: 0.784 |
| notebook | 1.000 | 0.741 | 1.000 | 1.000 | 0.783 | accessibility_first: 0.951 |
scores.by-format.json · evidence.by-format.json
prior-auth · templates
Profile Winners
- human_reviewer: markdown (0.916)
- agent_reader: markdown (0.910)
- security_sensitive: markdown (0.927)
- accessibility_first: markdown (0.950)
- cost_sensitive: markdown (0.854)
Where HTML helped
- html-static: comprehension 0.000, reviewability 0.054
- html-svg: comprehension 0.000, reviewability 0.113
- html-interactive: comprehension -0.030, reviewability 0.138
Format Scores
| Format | Reader | Review | A11y | Security | Cost | Best profile |
| markdown | 1.000 | 0.786 | 1.000 | 1.000 | 0.710 | accessibility_first: 0.950 |
| html-static | 1.000 | 0.840 | 1.000 | 1.000 | 0.279 | accessibility_first: 0.931 |
| html-svg | 1.000 | 0.899 | 1.000 | 1.000 | 0.200 | accessibility_first: 0.930 |
| html-interactive | 0.970 | 0.925 | 1.000 | 1.000 | 0.258 | accessibility_first: 0.931 |
| json-renderer | 1.000 | 0.382 | 1.000 | 1.000 | 0.206 | accessibility_first: 0.904 |
| notebook | 0.960 | 0.659 | 1.000 | 1.000 | 0.532 | accessibility_first: 0.931 |
scores.by-format.json · evidence.by-format.json
prior-auth · agent-corpus
Profile Winners
- human_reviewer: markdown (0.936)
- agent_reader: markdown (0.933)
- security_sensitive: markdown (0.937)
- accessibility_first: markdown (0.958)
- cost_sensitive: markdown (0.904)
Where HTML helped
- html-static: comprehension -0.220, reviewability -0.483
- html-svg: comprehension 0.000, reviewability -0.587
- html-interactive: comprehension 0.000, reviewability -0.525
Format Scores
| Format | Reader | Review | A11y | Security | Cost | Best profile |
| markdown | 1.000 | 0.823 | 1.000 | 1.000 | 0.839 | accessibility_first: 0.958 |
| html-static | 0.780 | 0.340 | 1.000 | 1.000 | 0.396 | accessibility_first: 0.890 |
| html-svg | 1.000 | 0.235 | 1.000 | 1.000 | 0.200 | accessibility_first: 0.897 |
| html-interactive | 1.000 | 0.297 | 1.000 | 1.000 | 0.220 | accessibility_first: 0.901 |
| json-renderer | 0.740 | 0.250 | 1.000 | 0.000 | 0.457 | accessibility_first: 0.784 |
| notebook | 1.000 | 0.741 | 1.000 | 1.000 | 0.779 | accessibility_first: 0.951 |
scores.by-format.json · evidence.by-format.json
research-explainer · templates
Profile Winners
- human_reviewer: markdown (0.918)
- agent_reader: markdown (0.913)
- security_sensitive: markdown (0.928)
- accessibility_first: markdown (0.951)
- cost_sensitive: markdown (0.861)
Where HTML helped
- html-static: comprehension 0.000, reviewability 0.054
- html-svg: comprehension -0.060, reviewability 0.113
- html-interactive: comprehension -0.030, reviewability 0.138
Format Scores
| Format | Reader | Review | A11y | Security | Cost | Best profile |
| markdown | 1.000 | 0.786 | 1.000 | 1.000 | 0.730 | accessibility_first: 0.951 |
| html-static | 1.000 | 0.840 | 1.000 | 1.000 | 0.289 | accessibility_first: 0.931 |
| html-svg | 0.940 | 0.899 | 1.000 | 1.000 | 0.200 | accessibility_first: 0.924 |
| html-interactive | 0.970 | 0.925 | 1.000 | 1.000 | 0.264 | accessibility_first: 0.931 |
| json-renderer | 1.000 | 0.400 | 1.000 | 1.000 | 0.255 | accessibility_first: 0.908 |
| notebook | 0.850 | 0.659 | 1.000 | 1.000 | 0.535 | accessibility_first: 0.920 |
scores.by-format.json · evidence.by-format.json
research-explainer · agent-corpus
Profile Winners
- human_reviewer: markdown (0.937)
- agent_reader: markdown (0.934)
- security_sensitive: markdown (0.937)
- accessibility_first: markdown (0.958)
- cost_sensitive: markdown (0.907)
Where HTML helped
- html-static: comprehension -0.220, reviewability -0.483
- html-svg: comprehension 0.000, reviewability -0.587
- html-interactive: comprehension 0.000, reviewability -0.521
Format Scores
| Format | Reader | Review | A11y | Security | Cost | Best profile |
| markdown | 1.000 | 0.823 | 1.000 | 1.000 | 0.846 | accessibility_first: 0.958 |
| html-static | 0.780 | 0.340 | 1.000 | 1.000 | 0.396 | accessibility_first: 0.890 |
| html-svg | 1.000 | 0.235 | 1.000 | 1.000 | 0.200 | accessibility_first: 0.897 |
| html-interactive | 1.000 | 0.302 | 1.000 | 1.000 | 0.223 | accessibility_first: 0.901 |
| json-renderer | 0.740 | 0.255 | 1.000 | 0.000 | 0.481 | accessibility_first: 0.786 |
| notebook | 1.000 | 0.741 | 1.000 | 1.000 | 0.777 | accessibility_first: 0.951 |
scores.by-format.json · evidence.by-format.json