Ethical guidance for reporting and evaluating claims of AI outperforming human doctors

Published by Nature and other sources, 2024-10-02 18:39

Claims of AI outperforming medical practitioners are under scrutiny, as the evidence supporting many of these claims is not convincing or transparently reported. These claims often lack specificity, contextualization, and empirical grounding. In this comment, we offer constructive ethical guidance that can benefit authors, journal editors, and peer reviewers when reporting and evaluating findings in studies comparing AI to physician performance.

The guidance provided here forms an essential addition to current reporting guidelines for healthcare studies using machine learning.

Introduction

An increasing number of academic reports contend that Artificial Intelligence (AI), in particular machine learning systems, surpasses medical practitioners’ performance in various clinical tasks and specialisms [1,2,3]. These outperformance claims, as we will refer to them in this article, vary in their formulation.

Commonly used terms include ‘outperform,’ ‘surpass,’ ‘exceed,’ ‘better than,’ and ‘superior to,’ but all claims are based on the fundamental assumption that AI can be directly compared to and exceed the expertise of medical practitioners on some level (for examples, see Table 1). These reports have contributed to excitement about AI’s potential value for medical contexts and have raised hopes about automating specific diagnostic tasks [4,5].

The claims about outperformance have also been met with skepticism because of methodological flaws in AI studies. For example, it remains uncertain whether AI can outperform medical practitioners in clinical practice because model performance is often evaluated in unrealistic settings [4]. Furthermore, many studies fail to transparently report the circumstances under which AI is compared to medical practitioners, making it hard to verify claims of outperformance [1].

Several scholars have concluded that reports on AI’s performance in medicine are “exaggerated” [6,7] and that it is “time to reality check” these kinds of claims to distinguish genuine potential from hype [4]. Some scholars even warn against using terms like ‘outperform’ because overpromising language risks being misinterpreted by the media and the public [1,6,7,8] and may result in “sidestepping ethical concerns leaving no space for issues and criticism” on AI’s functioning in medical practice [8].

Table 1 Examples of outperformance claims

While the concern.

References

Liu, X. et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digital Health 1, e271–e297 (2019).

Lebovitz, S., Levina, N. & Lifshitz-Assaf, H. Is AI ground truth really true? The dangers of training and evaluating AI tools based on experts’ know-what. MIS Q. 45, 1501–1525 (2021).

Han, R. et al. Randomised controlled trials evaluating artificial intelligence in clinical practice: a scoping review. Lancet Digital Health 6, e367–e373 (2024).

Wilkinson, J. et al. Time to reality check the promises of machine learning-powered precision medicine. Lancet Digital Health 2, e677–e680 (2020).

Fogel, A. L. & Kvedar, J. C. Artificial intelligence powers digital medicine. NPJ Digital Med. 1, 5 (2018).

BMJ. Concerns over ‘exaggerated’ study claims of AI outperforming doctors: Misleading claims fuel hype and pose a patient safety risk, warn researchers, www.sciencedaily.com/releases/2020/03/200325212159.htm (2020).

Nagendran, M. et al. Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ 368, m689 (2020).

Bunz, M. & Braghieri, M. The AI doctor will see you now: assessing the framing of AI in news coverage. AI Society 37, 9–22 (2022).

Morley, J. et al. Operationalising AI ethics: barriers, enablers and next steps. AI Society 38, 411–423 (2023).

Dhiman, P. et al. Overinterpretation of findings in machine learning prediction model studies in oncology: a systematic review. J. Clin. Epidemiol. 157, 120–133 (2023).

Gasulla, Ó. et al. Enhancing physicians’ radiology diagnostics of COVID-19’s effects on lung health by leveraging artificial intelligence. Front. Bioeng. Biotechnol. 11, 1010679 (2023).

Dorr, F. et al. COVID-19 pneumonia accurately detected on chest radiographs with artificial intelligence. Intell.-Based Med. 3-4, 100014 (2020).

Kong, Y. et al. Constructing an automatic diagnosis and severity-classification model for acromegaly using facial photographs by deep learning. J. Hematol. Oncol. 13, 88 (2020).

Angkurawaranon, S. et al. A comparison of performance between a deep learning model with residents for localization and classification of intracranial hemorrhage. Sci. Rep. 13, 9975 (2023).

Collins, G. S. et al. TRIPOD+ AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 385, e078378 (2024).

Sounderajah, V. et al. Developing a reporting guideline for artificial intelligence-centred diagnostic test accuracy studies: the STARD-AI protocol. BMJ Open 11, e047709 (2021).

Rivera, S. C. et al. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Lancet Digital Health 2, e549–e560 (2020).

EQUATOR network. Enhancing the QUAlity and Transparency Of health Research, https://www.equator-network.org (2024).

Klontzas, M. E., Gatti, A. A., Tejani, A. S. & Kahn, C. E. Jr AI reporting guidelines: how to select the best one for your research. Radiology: Artif. Intell. 5, e230055 (2023).

Flanagin, A. et al. Reporting use of AI in research and scholarly publication—JAMA Network Guidance. JAMA 331, 1096–1098 (2024).

Bian, Y. et al. Artificial Intelligence to Predict Lymph Node Metastasis at CT in Pancreatic Ductal Adenocarcinoma. Radiology 306, 160–169 (2022).

Urakawa, T. et al. Detecting intertrochanteric hip fractures with orthopedist-level accuracy using a deep convolutional neural network. Skelet. Radiol. 48, 239–244 (2019).

Iwaki, T. et al. Deep Learning Models for Cystoscopic Recognition of Hunner Lesion in Interstitial Cystitis. Eur. Urol. Open Sci. 49, 44–50 (2023).

Ding, L. et al. Artificial intelligence system of faster region-based convolutional neural network surpassing senior radiologists in evaluation of metastatic lymph nodes of rectal cancer. Chin. Med. J. 132, 379–387 (2019).

Kaddoura, T. et al. Acoustic diagnosis of pulmonary hypertension: automated speech-recognition-inspired classification algorithm outperforms physicians. Sci. Rep. 6, 33182 (2016).

Hung, J.-Y. et al. An outperforming artificial intelligence model to identify referable blepharoptosis for general practitioners. J. Personalized Med. 12, 283 (2022).

Nishida, N. et al. Artificial intelligence (AI) models for the ultrasonographic diagnosis of liver tumors and comparison of diagnostic accuracies between AI and human experts. J. Gastroenterol. 57, 309–321 (2022).

Crowson, M. G. et al. Paediatric sleep apnea event prediction using nasal air pressure and machine learning. J. Sleep. Res. 32, e13851 (2023).

Eskreis-Winkler, S. et al. Breast MRI Background Parenchymal Enhancement Categorization Using Deep Learning: Outperforming the Radiologist. J. Magn. Reson. Imaging 56, 1068–1076 (2022).

Soydan, Z. et al. An AI based classifier model for lateral pillar classification of Legg–Calve–Perthes. Sci. Rep. 13, 6870 (2023).

Zhang, J., Chen, Z., Wu, J. & Liu, K. An intelligent decision-making support system for the detection and staging of prostate cancer in developing countries. Computational Math. Methods Med. 2020, 5363549 (2020).

Banja, J. D., Hollstein, R. D. & Bruno, M. A. When Artificial Intelligence Models Surpass Physician Performance: Medical Malpractice Liability in an Era of Advanced Artificial Intelligence. J. Am. Coll. Radiol. 19, 816–820 (2022).

Froomkin, A. M., Kerr, I. & Pineau, J. When AIs outperform doctors: confronting the challenges of a tort-induced over-reliance on machine learning. Ariz. L. Rev. 61, 33 (2019).

Acknowledgements

This study was funded by the Dutch Research Council (Nederlandse Organisatie voor Wetenschappelijk Onderzoek, NWO), project number 406.DI.19.089. We want to thank our fellow RAIDIO project members (Sally Wyatt, Flora Lysen, and Shoko Vos) and the members of the AI ethics seminars at the UMC Utrecht for their insightful thoughts and suggestions, which helped to improve the manuscript.

Author information

Authors and Affiliations

University Medical Center, Utrecht, The Netherlands

Jojanneke Drogt, Megan Milota, Anne van den Brink & Karin Jongsma

Contributions

J.D., K.R., and M.M. were responsible for developing the initial concept for this manuscript. A.B. and J.D. analyzed the literature; J.D. was responsible for drafting and revising the manuscript. K.R., M.M., and A.B. provided critical intellectual input and revisions. All authors approved the manuscript.

Corresponding author

Correspondence to Jojanneke Drogt.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material.

You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Drogt, J., Milota, M., van den Brink, A. et al. Ethical guidance for reporting and evaluating claims of AI outperforming human doctors. npj Digit. Med. 7, 271 (2024). https://doi.org/10.1038/s41746-024-01255-w

Received: 27 May 2024

Accepted: 08 September 2024

Published: 02 October 2024

DOI: https://doi.org/10.1038/s41746-024-01255-w