
A generalist vision–language foundation model for diverse biomedical tasks

Published in Nature Medicine, 7 August 2024

Abstract

Traditional biomedical artificial intelligence (AI) models, designed for specific tasks or modalities, often exhibit limited flexibility in real-world deployment and struggle to utilize holistic information. Generalist AI holds the potential to address these limitations due to its versatility in interpreting different data types and generating tailored outputs for diverse needs.


However, existing biomedical generalist AI solutions are typically heavyweight and closed source to researchers, practitioners and patients. Here, we describe BiomedGPT, the first open-source and lightweight vision–language foundation model, designed as a generalist capable of performing various biomedical tasks.


BiomedGPT achieved state-of-the-art results in 16 out of 25 experiments while maintaining a computing-friendly model scale. We also conducted human evaluations to assess the capabilities of BiomedGPT in radiology visual question answering, report generation and summarization. BiomedGPT exhibits robust prediction ability with a low error rate of 3.8% in question answering, satisfactory performance with an error rate of 8.3% in writing complex radiology reports, and competitive summarization ability with a nearly equivalent preference score to human experts.


Our method demonstrates that effective training with diverse data can lead to more practical biomedical AI for improving diagnosis and workflow efficiency.



Fig. 1: BiomedGPT can process diverse modalities and perform versatile tasks.

Fig. 2: An overview of BiomedGPT: workflow, performance and pretraining datasets.

Fig. 3: BiomedGPT performs fine-tuning for vision–language and medical-image-classification downstream tasks.

Fig. 4: BiomedGPT performs few-epoch transfer learning for clinical-text understanding and summarization and generates a response through zero-shot transfer learning.

Fig. 5: Human evaluation of the VQA, text-summarization and captioning tasks.

Fig. 6: Results of the ablation study on the impact of diversity of pretraining datasets and tasks and a graphical demonstration of BiomedGPT's design.

Data availability


All data in this study are publicly available and can be accessed from: IU X-ray and Peir Gross (https://github.com/nlpaueb/bioCaption), MedICat (https://github.com/allenai/medicat), PathVQA (https://huggingface.co/datasets/flaviagiammarino/path-vqa), SLAKE 1.0 (https://www.med-vqa.com/slake/), DeepLesion (https://nihcc.app.box.com/v/DeepLesion), OIA-DDR (https://github.com/nkicsl/OIA), CheXpert- v1.0-small (https://www.kaggle.com/datasets/willarevalo/chexpert-v10-small), CytoImageNet (https://www.kaggle.com/datasets/stanleyhua/cytoimagenet), ISIC 2020 (https://challenge2020.isic-archive.com), Retinal Fundus (https://www.kaggle.com/c/diabetic-retinopathy-detection), MIMIC-III Clinic Notes (https://paperswithcode.com/dataset/hospital-admission-notes-from-mimic-iii), NCBI BioNLP (https://www.ncbi.nlm.nih.gov/research/bionlp/Data/), PubMed abstracts derived from the BLUE benchmark (https://github.com/ncbi-nlp/BLUE_Benchmark), VQA-RAD (https://osf.io/89kps/), CBIS-DDSM (https://www.kaggle.com/datasets/awsaf49/cbis-ddsm-breast-cancer-image-dataset), SZ-CXR and MC-CXR (access can be requested via the contact at http://archive.nlm.nih.gov/repos/chestImages.php), MIMIC-CXR (https://physionet.org/content/mimic-cxr-jpg/2.1.0/), MedNLI (https://physionet.org/content/mednli/1.0.0/), TREC 2022 (https://www.trec-cds.org/2022.html), SEER (https://seer.cancer.gov), MIMIC-III (https://physionet.org/content/mimiciii/1.4/), HealthcareMagic (https://huggingface.co/datasets/UCSD26/medical_dialog), MeQSum (https://huggingface.co/datasets/sumedh/MeQSum), MedMNIST v2 (https://medmnist.com) and ROCO (https://github.com/razorx89/roco-dataset).


A randomly sampled subset of the RSNA Pneumonia Detection Challenge (2018) was used for zero-shot prediction (https://www.rsna.org/rsnai/ai-image-challenge/rs.


Code availability


The pretrained and fine-tuned models, as well as source code for training, inference and data preprocessing, can be accessed at https://github.com/taokz/BiomedGPT.



Authors

Rong Zhou, Eashan Adhikarla, Zhiling Yan, Yixin Liu, Jun Yu, Zhengliang Liu, Xun Chen, Brian D. Davison, Hui Ren, Jing Huang, Chen Chen, Yuyin Zhou, Sunyang Fu, Wei Liu, Tianming Liu, Xiang Li, Yong Chen, Lifang He, James Zou, Quanzheng Li, Hongfang Liu and Lichao Sun

Contributions

K.Z. and L.S. designed the study. K.Z., R.Z. and E.A. carried out data collection, data preprocessing, model construction and model validation. J.Y., Z.Y., Y.L. and Z.L. carried out the data analysis and benchmarking. X.C., B.D.D., J.H., C.C., Y.Z., S.F., W.L., T.L., X.L., Y.C., L.H., J.Z., Q.L.


and H.L. provided knowledge support and interpreted the findings. H.R. carried out the human evaluation for the generated text from BiomedGPT as well as GPT-4V. L.S. provided knowledge support, interpreted the findings and supervised the study. All authors contributed to manuscript writing and reviewed and approved the final version.


L.H., X.L. and L.S. co-supervised the study.

Corresponding authors

Correspondence to Xiang Li, Lifang He or Lichao Sun.

Ethics declarations

Competing interests


The research was conducted independently of any commercial or financial relationships that could be construed as a potential conflict of interest. Although X.C. is employed by Samsung, the company was not involved in any aspect of this research. The other authors declare no competing interests.


Peer review


Peer review information


Nature Medicine thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Lorenzo Righetto, in collaboration with the Nature Medicine team.


Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1: Statistics of pretraining and fine-tuning datasets. (a) Modality distribution of pretraining data used in BiomedGPT.


(b) For the training and testing splits of the datasets used in downstream fine-tuning, we typically follow the format "number of training samples/number of validation samples/number of test samples" to detail each dataset. More details of the data split are described in Supplementary Table 7.

Extended Data Fig. 2: Overview of BiomedGPT's model configuration and architecture. (a) Detailed model configuration of BiomedGPT. Here, '#' indicates number of; 'Att.', 'Enc.' and 'Dec.' indicate Attention, Encoder and Decoder, respectively. The hidden size is the size of the embeddings and the size of the output of each self-attention and feed-forward layer.

The first layer of the FFN expands the hidden size to the intermediate size, and the second layer contracts it back to the hidden size. This expansion and contraction allow the network to create more complex representations. During the pretraining phase, image processing involves resizing and cropping the images to varying resolutions, corresponding to the input sizes listed in the table.

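The expand-then-contract pattern described above can be sketched as follows. This is a minimal NumPy illustration, not BiomedGPT's actual code; the hidden and intermediate sizes here are arbitrary placeholders, not the configuration listed in the table.

```python
import numpy as np

# Sketch of a transformer feed-forward network (FFN) block:
# the first layer expands hidden -> intermediate, a nonlinearity
# is applied, and the second layer contracts back to hidden.
# Sizes are illustrative only (hypothetical, not BiomedGPT's).
hidden, intermediate = 8, 32
rng = np.random.default_rng(0)
W1 = rng.normal(size=(hidden, intermediate))   # expansion weights
W2 = rng.normal(size=(intermediate, hidden))   # contraction weights

def ffn(x):
    h = x @ W1                 # (..., hidden) -> (..., intermediate)
    h = np.maximum(h, 0.0)     # ReLU-style nonlinearity
    return h @ W2              # contract back to (..., hidden)

x = rng.normal(size=(4, hidden))   # four token embeddings
y = ffn(x)
print(y.shape)                     # (4, 8): hidden size is preserved
```

The wider intermediate layer is what lets the block form richer representations; the input and output widths stay equal so blocks can be stacked.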

It should be noted that during fine-tuning and inference stages, the input resolution of BiomedGPT can be flexibly adjusted according to the specific requirements of the task. (b) The neural network architecture of BiomedGPT, which includes bidirectional encoder blocks and autoregressive decoder blocks.


The number of blocks varies for different model scales.

Extended Data Fig. 3: Graphical illustrations of the key components in BiomedGPT. (a) Head-

Nat Med (2024). https://doi.org/10.1038/s41591-024-03185-2

Received: 29 January 2024; Accepted: 10 July 2024; Published: 7 August 2024