Final Review · Paper 10 — Calibrating Trust in AI. A Co-Designed Checklist for Task-Specific Reliance

Calibrating Trust in AI. A Co-Designed Checklist for Task-Specific Reliance · Final Review · Score 30/30

Three complementary methods (in-depth interviews, a survey with N=109, and group workshops) are deliberately triangulated, and the rationale for each is well argued.

The Appendix E change log (V1 wording, V2 wording, type, source), keyed by item ID, gives precise, verifiable traceability.

The anonymized per-participant notes in Appendix G tie each revision to a concrete participant statement.

Items retained against feedback (Section 4.3) are documented with reasons, a hallmark of honest co-design.

Concrete, well-traced design moves (for example, the ML engineer's silently degraded fraud model leading to versioning item ASF-3) make the changes credible.

The survey is honestly scoped as a directional check, and the worked professor case maps items onto genuinely differentiated practice.

− The co-design cycle stops at V2: only one revision round is produced, where peers went to V3.

− The tool is purely reflective, with no gating, scoring, or stop/revise/delegate/escalate outcomes, so it is less actionable than the decision-rule submissions.

− The N=109 survey drives essentially no item revision; it only confirms patterns, a thin payoff for the effort.

− Key data (survey CSV, V1/V2 PDFs) live only in an external Google Drive ZIP, so claims like the 87% figure cannot be verified in-paper.

− Several V2 items are compound two-sentence prompts (e.g., TC-6), which strains usability.

− The "profession-agnostic" claim is tested on a single profession (the professor).


The Pros

The Cons