Final Review · Submission #8 — Calibrating Trust in AI for Drafting-Related Professional Ta…

Methodological rigor is visibly upgraded: the PRISMA-ScR diagram, the explicit Methods/Results split, the transparent Boolean search log in Table 1 and Appendix A, and the study-type coding in Appendix D together address concerns raised by several reviewers.

The Verification-Value Paradox is now actionable through Table 7's decision criteria and the task-stakes classification in Table 6, directly resolving a criticism shared by Reviews 2, 5, and 7.

AI-type distinction (LLM / narrow ML / XAI) is properly integrated as a cross-cutting modifier in Section 3.5 and Table 4, rather than added cosmetically.

Use case is substantially sharpened with specific tools (Harvey, Lexis+ AI, CoCounsel) and an explicit junior/senior role distinction that turns abstract calibration into operational guidance.

Framework figure (Figure 2) now correctly conveys interconnection and bidirectionality with a central Calibrated Trust node, replacing the prior linear top-to-bottom diagram criticised by Reviews 1, 6, and 9.

Cross-domain Table 5 provides concrete medicine and architecture parallels, and the transferability claim is appropriately softened to "potential transferability."

The corpus has been expanded from 21 to 50 sources, addressing concerns about evidence-base thinness raised by Review 9.

Empirical contradictions (Naiseh vs. Senoner; Bansal et al.) are now explicitly surfaced and resolved through user-expertise and cognitive-load moderators rather than glossed over.

Concrete, testable hypotheses in Section 6 replace the generic "future work" prose of the original, addressing Reviews 2 and 4.

Balance between overtrust and undertrust is restored through integration of algorithm appreciation (Logg 2019) and explicit treatment of senior-lawyer undertrust as costly miscalibration in Section 4.

− The domain-agnostic claim is better supported but still rests on illustrative tables rather than a second, fully developed worked use case (e.g., a clinical or architectural drafting scenario walked through all four dimensions).

− Demographic, jurisdictional, and cultural variation is acknowledged in Section 6 but not analysed; no synthesis is performed across the empirical studies that were included, leaving the concern raised in Reviews 2 and 4 only partially addressed.

− The corpus skews heavily toward policy/institutional sources (20/50, i.e., 40%) versus 15 empirical studies (15/50, i.e., 30%); the evidentiary weight of the framework still leans on guidance documents.

− Several 2026 references (Liebherr et al. 2026; Pal et al. 2026 with anomalous DOI; Chaos Group 2026; VirtualSpaces 2026; Chang 2026) warrant verification. Given the paper's own emphasis on hallucinated citations, source authenticity is a meta-credibility concern.

− UI/UX factors in Intrinsic Trustworthiness (Review 5's point about warnings, highlights, indicators) are mentioned only briefly via Kim et al. and Küper et al. and are not developed as a sub-theme.

− The Discussion (Section 5) does not explicitly loop back to the four sub-questions posed in the Introduction, leaving the closure of the original research framing implicit.

Priority: High
Location: Section 4 (Worked Use Case)
Issue: Only legal drafting is fully worked through; the paper's domain-agnostic claim still rests largely on Table 5 illustrations.
Suggested Fix: Add a compact second worked case (about half a page), for example a radiology summary or an architectural compliance memo, that walks through all four dimensions so that transferability is demonstrated rather than asserted.

Priority: High
Location: References list (entries dated 2026)
Issue: Several forthcoming/2026 citations (Liebherr et al. 2026; Pal et al. 2026 with anomalous DOI "10.1038/s41598-026-44167-3"; Chaos Group 2026; VirtualSpaces 2026; Chang 2026) require verification, especially given the paper's central concern with hallucinated authorities.
Suggested Fix: Re-verify each 2026 citation against publisher pages; correct DOIs; if any source is a preprint or an industry blog, label it accordingly in Table 11 and the reference list.
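
As a first pass on this reference check, a syntactic screen can separate malformed DOIs from well-formed ones before anyone opens a publisher page. The sketch below is illustrative only: the function names are hypothetical, and the pattern is the commonly cited Crossref-style regular expression for modern DOIs, used here as an assumption rather than a complete validator. A well-formed DOI can still fail to resolve, so passing this check does not establish that a source exists.

```python
import re

# Assumption: Crossref-style pattern for modern DOIs; it checks shape only.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def is_well_formed_doi(doi: str) -> bool:
    """Return True if the string has the shape of a modern DOI."""
    return bool(DOI_PATTERN.match(doi.strip()))


def resolver_url(doi: str) -> str:
    """Build the doi.org link a checker would open to confirm the record by hand."""
    return "https://doi.org/" + doi.strip()


flagged = "10.1038/s41598-026-44167-3"  # DOI quoted in the review
print(is_well_formed_doi(flagged))
print(resolver_url(flagged))
```

The flagged DOI passes this shape check, which is why resolution against the publisher record remains the deciding step.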

Priority: High
Location: Section 6 (Gaps and Future Work), demographic paragraph
Issue: Cultural/jurisdictional/seniority variation is acknowledged but never synthesised across the 15 empirical studies already included.
Suggested Fix: Add a short paragraph (or column in Appendix D) reporting the country/sample/profession of each empirical study, then summarise what the corpus does and does not cover demographically.

Priority: Medium
Location: Section 3.2 (Intrinsic Trustworthiness)
Issue: Interface/UI design cues (warnings, source highlights, uncertainty indicators) are mentioned only in passing through Kim et al. and Küper et al.
Suggested Fix: Add a short paragraph treating UI/UX features (visible source links, confidence displays, hallucination warnings) as an explicit sub-component of Intrinsic Trustworthiness, since they directly mediate calibration.

Priority: Medium
Location: Section 5 (Discussion) closing
Issue: The Discussion does not explicitly answer the four sub-questions stated in the Introduction.
Suggested Fix: Insert a brief closing paragraph that revisits each sub-question (contextual, intrinsic, calibration, stakes) and states the one-sentence finding, closing the rhetorical loop.

Priority: Medium
Location: Figure 2 caption
Issue: Caption is terse ("dimensions interact bidirectionally rather than forming a fixed sequence").
Suggested Fix: Expand to identify what the AI-type modifier arrow does to each dimension, so the figure is self-contained when read in isolation.

Priority: Medium
Location: Section 3.5 (AI Type and Cross-Domain Evidence)
Issue: The cross-cutting AI-type modifier is introduced verbally but not visually integrated with Figure 2.
Suggested Fix: Either annotate Figure 2 with the AI-type ribbon explicitly, or add a small inset showing how each AI type shifts the verification threshold within each dimension.

Priority: Medium
Location: Table 7 (Decision criterion for the Verification-Value Paradox)
Issue: Currently presented as a 4-row table embedded in the Discussion; reviewers asked specifically for a usable practical tool.
Suggested Fix: Promote it to a labelled "Decision Tool" with a short worked numeric example (e.g., drafting time saved vs. verification time required) so practitioners can apply it.
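
To illustrate what such a worked numeric example might look like, here is a minimal sketch. It assumes the simplest possible reading of the comparison named above (drafting time saved against verification time required); the class, the function names, and all minute values are hypothetical and are not taken from Table 7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DraftingTask:
    name: str
    minutes_saved: float      # hypothetical drafting time saved by using AI
    minutes_to_verify: float  # hypothetical time needed to verify the output


def net_minutes(task: DraftingTask) -> float:
    """Time saved that remains after verification; negative means a net loss."""
    return task.minutes_saved - task.minutes_to_verify


def worth_using_ai(task: DraftingTask) -> bool:
    """Assumed rule: AI use pays off only when net time saved is positive."""
    return net_minutes(task) > 0


# Illustrative values only.
tasks = [
    DraftingTask("routine cover letter", minutes_saved=30, minutes_to_verify=5),
    DraftingTask("brief with case citations", minutes_saved=60, minutes_to_verify=90),
]
for task in tasks:
    print(task.name, net_minutes(task), worth_using_ai(task))
```

A time-only rule of this kind ignores task stakes, so a fuller version would presumably be read alongside the task-stakes classification in Table 6.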

Priority: Medium
Location: Section 4 (Worked Use Case)
Issue: Empirical legal studies (Magesh 2025; Kennedy 2025; Chang 2026) are referenced, but the use case still leans on policy guidance for normative claims.
Suggested Fix: Tie each of the four dimension paragraphs in Section 4 to at least one empirical finding, so the case study is grounded in research rather than guidance.

Priority: Low
Location: Composition of corpus (Table 2)
Issue: 20/50 sources are policy/institutional/domain guidance, which is a high proportion for a scoping review.
Suggested Fix: Add one sentence justifying the policy weight (e.g., that practice-oriented professional governance is itself a primary object of analysis under the "contextual conditions" theme), pre-empting the imbalance critique.
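
The proportions behind this point can be recomputed from the counts given in this review. The sketch below is a simple arithmetic check, assuming the policy and empirical categories do not overlap; the variable and function names are illustrative, and the remaining sources are left as a single unlabelled group because the review does not break them down.

```python
TOTAL_SOURCES = 50  # corpus size reported in the review

counts = {
    "policy/institutional/domain guidance": 20,
    "empirical studies": 15,
}
# Sources in neither category; their breakdown is not given in this review.
counts["other (not broken down here)"] = TOTAL_SOURCES - sum(counts.values())


def share(count: int, total: int = TOTAL_SOURCES) -> float:
    """Percentage of the corpus represented by `count` sources."""
    return 100 * count / total


for category, count in counts.items():
    print(f"{category}: {count}/{TOTAL_SOURCES} = {share(count):.0f}%")
```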

Priority: Low
Location: Appendix E (AI Use Disclosure)
Issue: Disclosure is comprehensive, but the connection between authors' own AI workflow and the paper's framework is implicit.
Suggested Fix: Add one sentence explicitly mapping the authors' verification practice to the four dimensions of their own framework; this turns a compliance note into a self-consistent demonstration.

Priority: Low
Location: Section 1 (Introduction), definition of "drafting-related professional tasks"
Issue: Definition appears mid-introduction; readers from non-legal domains may still skim past it.
Suggested Fix: Move the definition to the first or second sentence of Section 1 (or set it apart as an italicised one-line definition) so the scope is unmissable.
