Final Review · Submission #4

Review Nº 04

Practical Criteria for AI Task Delegation: A Scoping Review of Task Complexity, Performance Reliability, and Regulatory Compliance
Authors: Jacopo Caravaggio, Gabriele Gallo, Sila Gulec, Ahmet Tolga Birand, Giacomo Simbula, Nicolò Fiori, Benedetta Stellato, Kutay Gunel, Behrad Galedari and Pietro De Francesco
Score · 31/30
The revision is comprehensive and surgical: virtually every consensus point is fully addressed (visuals, generality, use case, conclusion, regulation in the use case, PRISMA-ScR, RQ-to-theme link, expanded corpus), with only the interaction-effects critique left mostly to "future work" rather than substantively analyzed.
Over the Top · 31/30

The Pros

8 Items
+ Added both a PRISMA-ScR flow diagram and an inputs/mediators/outcomes framework diagram, resolving the unanimous "no visuals" complaint.
+ Cleaned the framework (Sec. 4) of programmer-specific examples; the generality rule is now respected and the use case (Sec. 5) feels novel rather than redundant.
+ Use case substantially upgraded: dual scenario (sorting vs. login), theme-by-theme application, and an explicit 3-tier sub-task recommendation — directly answering Reviews 3, 7, and 8.
+ Cost & Efficiency (Sec. 4.3) restructured into three coherent levels (individual / temporal / organizational) and grounded with strong new empirical sources (Brynjolfsson 2025, ILO 2023, Anthropic Index 2025).
+ Corpus expanded from 24 to 39 sources, including the specifically requested Horowitz & Kahn (2024) study, addressing Review 2's call for empirical anchoring of the automation bias curve.
+ Standalone Conclusion (Sec. 7) added; verification cost is elevated to a unifying threshold concept.
+ Explicit bridge between the three RQ criteria and the five operational themes added in the introduction (Review 3's specific ask).
+ EU AI Act and NIST references are now operationalized inside the use case (Sec. 5.2 — Accountability), not just listed in the framework.

The Cons

5 Items
− Factor-interaction critique (Reviews 2, 8) is acknowledged but mostly punted to "future work" (Sec. 6); only one concrete interaction (time pressure × complexity, [32]) is analyzed in the body.
− The "automation bias curve" is named in Sec. 4.5 but still not visualized — a small chart or schematic would close Review 2's loop.
− The Framework Diagram (Fig. 2) is included but not described in prose; the reader has to infer what "inputs / process mediators / outcomes" mean and how the feedback loop operates.
− AI Use Disclosure (Appendix F) is more detailed than before but still somewhat generic — Review 9's specific point about elaborating it is only partially addressed.
− Geographical scope gap (Review 10) is now mentioned in Sec. 6, but the methodology itself does not document any geographic filtering or language coverage beyond "English".

Suggested Changes

12 Pointers
01 · High
Location: Section 4.5 (Human–AI Collaboration), "automation bias curve" paragraph
Issue: The curve is named and attributed to Horowitz & Kahn but never visualized, which was Review 2's central ask.
Suggested Fix: Add a small inline figure plotting reliance/over-reliance vs. AI literacy with the moderate-knowledge peak labeled, and reference it from the text.
02 · High
Location: Section 6 (Future Gaps), "Systematic Neglect of Interacting Factors"
Issue: The interaction critique is acknowledged, but only one interaction (time pressure × complexity) is actually analyzed in the body.
Suggested Fix: Promote one or two concrete interactions (e.g., AI literacy × verification cost; accountability salience × task complexity) into Section 4 with at least one cited example, so interactions are demonstrated, not just flagged.
03 · High
Location: Appendix E (Framework Diagram, Figure 2)
Issue: The diagram is included but not described in prose; the labels "inputs / process mediators / outcomes" and the feedback loop are not explained anywhere in the paper.
Suggested Fix: Add a 3–5 sentence caption, or a short paragraph in Section 4, walking the reader through how the five themes map onto inputs, mediators, and outcomes, and what the feedback loop represents.
04 · Medium
Location: Section 5.3 (Judgement), tiered recommendation
Issue: The three tiers (full / partial-with-review / discouraged) are stated but not tied to verification cost, which the Conclusion later claims is the unifying threshold.
Suggested Fix: Add one sentence per tier explicitly grounding it in the verification-cost-to-task-effort ratio, so the use case foreshadows the conclusion's framing.
05 · Medium
Location: Section 4.2 (AI Performance), bias–aversion cycle paragraph
Issue: The Logg et al. [23] finding (algorithm appreciation) is presented as "a different side of the story" but is not reconciled with the Goddard [16] / Dietvorst [11] bias-then-aversion arc.
Suggested Fix: Add 1–2 sentences explaining the boundary conditions (task type, expertise) under which appreciation vs. aversion dominates, so the section reads as synthesis rather than juxtaposition.
06 · Medium
Location: Section 4.3 (Cost & Efficiency), organizational-level paragraph
Issue: The skill-erosion claim from Rinta-Kahila [33] is cited four times in two sentences, which reads as over-reliance on a single source for a strong macro-level claim.
Suggested Fix: Triangulate with at least one additional source (empirical or theoretical) supporting long-term skill degradation, or soften the claim to match the evidence base.
07 · Medium
Location: Section 2.2 (Inclusion/Exclusion), exception for [16]
Issue: The exception for the pre-2015 Goddard source is stated, but the justification ("foundational source... widely cited") is brief and was specifically queried by Review 4.
Suggested Fix: Expand by one sentence stating concretely what theoretical contribution from [16] is irreplaceable (e.g., the operational definition of automation bias used throughout §4.1).
08 · Medium
Location: Section 2 (Methodology)
Issue: There is no mention of the geographical or linguistic scope of the included studies, which Review 10 raised; "English-only" is stated, but the geographic distribution of the 39 sources is not.
Suggested Fix: Add one sentence in §2.4 or §3 quantifying the geographic spread of the empirical sources, and acknowledge it as a scope limitation feeding into §6.
09 · Medium
Location: Section 5.2 (Applying the Framework — Accountability)
Issue: The claim that "both the EU AI Act and the NIST framework mandate that code handling personal data undergo documented human review" is strong and would benefit from a precise pointer.
Suggested Fix: Add the specific Article (e.g., EU AI Act Art. 14 on human oversight) and the NIST RMF function (Manage / Measure) to make the regulatory mapping verifiable.
10 · Low
Location: Appendix F (AI Use Disclosure)
Issue: Review 9 specifically flagged this as vague; the current revision adds detail but still uses general phrases like "checked by team members".
Suggested Fix: Specify the verification protocol per use (e.g., "each search string was independently re-executed by two team members and the yields compared"; "every AI-generated citation was cross-checked against the DOI/publisher record").
11 · Low
Location: Section 1 (Introduction), bullet list of five themes
Issue: The themes are introduced via bullets, but the paper otherwise uses prose; the bullets break the academic register set elsewhere.
Suggested Fix: Convert the bullet list into a single dense paragraph that names and defines each theme in one line, preserving the explicit 3→5 expansion.
12 · Low
Location: Section 7 (Conclusion)
Issue: The conclusion is now strong but ends on future work without restating the operational answer to the RQ in one sentence.
Suggested Fix: Add a closing sentence of the form "Workers should delegate when X, Y, and Z hold; otherwise, retain manual execution," giving the reader a takeaway rule.
The Index · Grandi Sfide · 2026