Final Review

Review Nº 10

What factors determine whether a worker's trust in an AI system is well calibrated for a given task?
Authors: Nikola Kandev, Petru Nacea, Nicolò Lodigiani, Nicolò Moiso, Riccardo Ferrero, Tommaso Palena, Davide Pandino, Pietro Montorsi, and Leonardo Vezzù
Score: 28/30
The revision substantially addresses the most critical reviewer consensus points (methodology transparency, the PRISMA diagram, duplicate references, redundancy, the worked use case, the AI-use disclosure, and specific limitations), a clear and meaningful improvement over the 25/30 first submission; residual gaps in the stakeholder table, explicit source classification, and cross-study synthesis prevent a perfect score.
High Distinction

The Pros

9 Items
+
Methodology section is now verifiable: databases, six explicit search strings, publication window with justified pre-2015 exceptions, and clear inclusion/exclusion criteria are stated.
+
A PRISMA-ScR flow diagram (Appendix D) reports exact counts (60 → 44 → 35 → 24 → 19) with exclusion reasons.
+
Duplicate citations flagged in Round 1 (4/15, 6/12/19, 7/13) are resolved; the bibliography now contains 19 distinct entries.
+
The duplicated Section 3.1.4 block is gone and the framework is restructured into five clean, non-overlapping categories plus a cross-cutting subsection.
+
The worked use case (Section 4) is genuinely operationalized: tasks are mapped to stakes, framework dimensions are cited inline, and survey data (Figs. 3–4) is integrated to support the argument.
+
Limitations (Section 5) move from generic statements to six concrete, named research gaps, each with the relevant citations.
+
The AI-use disclosure (Appendix C) is rewritten to directly address Review 2's objection that AI shortcuts the reading process.
+
Section 3.6 introduces a useful synthesis layer (miscalibration, organizational pressure, bias) that did not previously exist.
+
The survey (n > 100) is integrated as a second empirical pillar, supporting the use case with primary data.

The Cons

6 Items
The stakeholder table (Table 1) still lists only four actors, omits AI developers and vendors, and applies the explicit "influence on overtrust/undertrust" framing only to the worker row.
Sources are not classified by type (empirical / theoretical / practice-oriented) in the paper itself; the mandatory quota raised by Review 2 is not visibly demonstrated.
Critical synthesis remains uneven: a few productive contrasts exist (Hoff & Bashir vs. Glikson & Woolley) but most categories still read as accumulation rather than comparison.
Section 3.7 ("Calibration Continuum") makes interpretive claims about "structural proxies" and "institutional signals" without anchoring them to specific cited evidence.
The empirical survey is announced as a methodological pillar but its sampling, instrument validation, and consent details are not summarized in the main text.
Minor formatting inconsistencies persist (spacing before commas, keyword spacing in the abstract, figure caption alignment).

Suggested Changes

12 Pointers
01
High
Location
Section 3.5.3 / Table 1
Issue
Stakeholder table still omits AI developers and vendors, and only the "Worker" row links explicitly to overtrust/undertrust
Suggested Fix
Add rows for AI developers and AI vendors/providers, and add a dedicated column "Influence on Overtrust/Undertrust" filled in for every row to make the link to calibration explicit
02
High
Location
Section 2.1 (Method) or Appendix
Issue
Sources are not classified by type within the paper, so reviewers cannot verify the empirical/theoretical/practice mix
Suggested Fix
Add a short summary table listing each of the 19 sources with a "Type" column (empirical / theoretical / practice-oriented / regulatory) and report the totals in-text
03
High
Location
Section 3 (subsections 3.2, 3.3, 3.4)
Issue
Synthesis is still mostly accumulative; only one explicit contrast (Hoff & Bashir vs. Glikson & Woolley) is developed
Suggested Fix
Add at least one comparative sentence per subsection naming where studies converge or disagree, e.g., on the relative weight of explainability vs. reliability, or of cultural vs. individual factors
04
Medium
Location
Section 3.7 (Calibration Continuum)
Issue
Claims about "structural proxies", "institutional signals", and "default trust" are interpretive and not anchored to specific citations
Suggested Fix
Either tie each claim to a numbered reference (e.g., [11], [13], [19]) or reframe the section as an explicitly labeled authors' synthesis with a clear hedging sentence
05
Medium
Location
Section 2.2 (Empirical Survey)
Issue
Survey is announced as a second methodological pillar but sampling, instrument validation, and consent are not described in the main text
Suggested Fix
Add 3–4 sentences summarizing recruitment, demographics, response rate, and any ethical-review/consent procedure; cross-reference the appendix for the full instrument
06
Medium
Location
Section 4.1 (Task Mapping)
Issue
Each subtask names a stakes level but does not consistently map to all five framework categories
Suggested Fix
For each of 4.1.1–4.1.3, add a one-line "Framework dimensions activated: [3.1, 3.2, 3.3, 3.5.1, 3.5.2]" tag so the mapping is mechanical and verifiable
07
Medium
Location
Section 4.4 / Figures 3 and 4
Issue
Survey figures are described qualitatively, but no sample size, response rate, or basic statistics are reported in the captions
Suggested Fix
Add n, response rate, and where relevant median/mode to figure captions; note explicitly whether the pattern is statistically meaningful or descriptive only
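A minimal sketch of the caption statistics, assuming 1–5 Likert responses; the response values and invitation count below are hypothetical placeholders, not the paper's data:

    from statistics import median, mode

    responses = [4, 5, 3, 4, 4, 2, 5, 4, 3, 4]  # hypothetical Likert answers
    invited = 120                                # hypothetical invitations sent

    n = len(responses)
    caption = (f"n = {n}; response rate = {n / invited:.0%}; "
               f"median = {median(responses)}; mode = {mode(responses)}")
    print(caption)  # -> "n = 10; response rate = 8%; median = 4.0; mode = 4"

Reporting the figures this way also makes the descriptive-only caveat easy to keep honest in each caption.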
08
Low
Location
Section 5 (Limitations)
Issue
The six gaps are well identified but not prioritized for future work
Suggested Fix
Close Section 5 with a short paragraph ranking the gaps by tractability or impact (e.g., longitudinal calibration > white-collar transferability > cultural comparison) so future researchers have a clear roadmap
09
High
Location
Reference [17] (EU AI Act) versus citation in Section 3.3
Issue
Section 3.3 cites the EU AI Act as [16], but the reference list places the AI Act at [17] and Floridi et al. at [16]
Suggested Fix
Audit all in-text citation numbers against the reference list and correct any mismatches; this is a substantive accuracy issue, not stylistic
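A minimal sketch of such an audit, assuming numeric bracket citations (e.g., "[16]") in the body and reference-list entries that open with their number; the file names body.txt and references.txt are hypothetical placeholders:

    import re

    def audit_citations(body_text: str, reference_text: str) -> None:
        # Citation numbers used in the body, e.g. "[16]" or "[11], [13]".
        cited = {int(n) for n in re.findall(r"\[(\d+)\]", body_text)}
        # Numbers that open a reference-list entry, e.g. "[17] EU AI Act ...".
        listed = {int(m.group(1)) for m in
                  re.finditer(r"^\[(\d+)\]", reference_text, re.MULTILINE)}
        print("cited but missing from the list:", sorted(cited - listed) or "none")
        print("listed but never cited:", sorted(listed - cited) or "none")

    # Hypothetical usage; point these at the manuscript's actual text exports.
    with open("body.txt") as body, open("references.txt") as refs:
        audit_citations(body.read(), refs.read())

Note the limit: a set-difference check only catches dangling or orphaned numbers; a swap like [16]/[17], where both numbers exist in both places, still requires a manual pass matching each in-text citation's context against the entry it points to.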
10
Low
Location
Abstract and keywords
Issue
Keyword list contains a stray space ("AI Transparency , Trust Calibration") and the abstract does not state the framework's five categories or the survey's role
Suggested Fix
Tighten keywords (remove extra spaces) and add one sentence to the abstract naming the five framework categories and the survey (n > 100) as a complementary empirical input
11
Medium
Location
Appendix C (AI Use Disclosure)
Issue
The disclosure is much improved but does not specify which sections used AI assistance or the verification step performed
Suggested Fix
Add one sentence stating which phases (e.g., initial screening summaries, language polishing) used AI and that final author-of-record verification was performed by at least one named team member per section
12
Low
Location
Section 3.6.1 (Trust Miscalibration)
Issue
The "common structural cause" claim — that users evaluate AI holistically — is asserted rather than cited
Suggested Fix
Add a citation (likely [6] or [11]) supporting the holistic-evaluation claim, or hedge it as the authors' interpretive synthesis