Qing Li, Lei Li, Yu Li. Developing ChatGPT for biology and medicine: a complete review of biomedical question answering. Biophysics Reports. doi: 10.52601/bpr.2024.240004

Developing ChatGPT for biology and medicine: a complete review of biomedical question answering

doi: 10.52601/bpr.2024.240004
More Information
  • Corresponding author: liyu@cse.cuhk.edu.hk (Y. Li)
  • Received Date: 15 January 2024
  • Accepted Date: 19 February 2024
  • Available Online: 28 March 2024
  • ChatGPT explores a strategic blueprint of question answering (QA) to deliver medical diagnoses, treatment recommendations, and other healthcare support. This is achieved through the increasing incorporation of medical domain data via natural language processing (NLP) and multimodal paradigms. By shifting the distribution of text, images, videos, and other modalities from the general domain to the medical domain, these techniques have accelerated the progress of medical domain question answering (MDQA). They bridge the gap between human natural language and sophisticated medical domain knowledge or expert-provided manual annotations, handling large-scale, diverse, unbalanced, or even unlabeled data analysis scenarios in medical contexts. Our focus is the use of language models and multimodal paradigms for medical question answering, with the aim of guiding the research community in selecting appropriate mechanisms for their specific medical research requirements. Specialized unimodal tasks such as question answering, reading comprehension, reasoning, diagnosis, relation extraction, and probability modeling, as well as multimodal tasks such as visual question answering, image captioning, cross-modal retrieval, and report summarization and generation, are discussed in detail. Each section examines the specifics of the method under consideration. This paper highlights the structures and advancements of medical domain methods relative to general domain methods, emphasizing their applications across different tasks and datasets. It also outlines current challenges and opportunities for future medical domain research, paving the way for continued innovation and application in this rapidly evolving field. This comprehensive review serves not only as an academic resource but also charts a course for future investigation and application in the field of medical question answering.

  • Qing Li, Lei Li and Yu Li declare that they have no conflict of interest.
    This article does not contain any studies with human or animal subjects performed by any of the authors.
    Qing Li and Lei Li contributed equally to this work.


    Figures(3)  / Tables(3)
