
Monday, May 6, 2024

Med-Gemini: Google's AI Models Specifically Designed for Medical Diagnosis and Clinical Reasoning





Med-Gemini is a new family of highly capable multimodal medical models introduced by Google, built upon Gemini. Med-Gemini demonstrates important advancements in clinical reasoning, multimodal understanding, and long-context capabilities. The models are further fine-tuned to make use of web search for current information and can be customized to novel medical modalities through modality-specific encoders.


Artificial intelligence (AI) in medicine is revolutionizing how medical professionals handle complex tasks such as diagnosing patients and planning treatments using the latest research. Google DeepMind introduces Med-Gemini, its latest medical AI model family, tailored specifically for healthcare. Med-Gemini is adept at interpreting the complexity of medical dialogue and supporting accurate diagnosis. Advanced AI models promise to enhance healthcare by increasing accuracy and efficiency: they can effectively process and interpret the large volumes of medical data found in modern practice, such as images, videos, and electronic health records, carefully and accurately.

This post is based on a paper published by Google DeepMind. Click here


Challenges for Medical AI


Medicine is a multifaceted endeavor. A clinician’s day-to-day work involves patient consultations, where clear communication of diagnoses, treatment plans, and empathy are essential for building trust. Complex cases necessitate deeper understanding of the patient’s history within the electronic medical record, along with multimodal reasoning from medical images and other diagnostics. To guide their decisions under uncertainty, clinicians must stay abreast of the latest medical information from a wide variety of authoritative sources that can range from research publications to procedural videos.

Excelling in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge, and understanding of complex multimodal data. Gemini models, with their strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini 1.0 and Gemini 1.5, Google introduces Med-Gemini, a family of highly capable multimodal models that are specialized in medicine with the ability to seamlessly integrate the use of web search.


Before the Med-Gemini Era

Although artificial intelligence (AI) systems can assist with individual medical tasks and show early promise towards multimodal, multi-task "generalist" medical use, more sophisticated reasoning, multimodal, and long-context understanding capabilities would enable significantly more intuitive and helpful assistive tools for clinicians and patients alike. Google has already brought AI to healthcare with multiple models in the field, such as Med-PaLM 2, AlphaFold, and Flan-PaLM.

Medically fine-tuned LLMs can also provide high-quality long-form answers. Despite these promising results, there are considerable opportunities to improve performance. Additionally, their ability to handle complex multimodal medical data is currently limited.


Med-Gemini


Med_Gemini2


The paper stated that, "Med-Gemini inherits Gemini's foundational capabilities in language and conversations, multimodal understanding, and long-context reasoning."

The Gemini models, as detailed in the Gemini 1.0 and 1.5 technical reports, are a new generation of highly capable multimodal models with novel foundational capabilities that have the potential to address some of these key challenges for medical AI. The models are transformer decoder models enhanced with innovations in architecture, optimization, and training data, enabling them to exhibit strong capabilities across various modalities including images, audio, video, and text. The recent addition of the mixture-of-experts architecture allows the Gemini models to efficiently scale and reason over significantly longer and more complex data at inference time.


Web search through self-training

Med-Gemini inherits Gemini's foundational capabilities in language and conversations, multimodal understanding, and long-context reasoning. For language-based tasks, the team enhances the models' ability to use web search through self-training and introduces an inference-time uncertainty-guided search strategy within an agent framework. This combination enables the model to provide more factually accurate, reliable, and nuanced results for complex clinical reasoning tasks, leading to state-of-the-art (SoTA) performance of 91.1% accuracy on MedQA (USMLE) (Jin et al., 2021), surpassing prior Med-PaLM 2 models by 4.6%.
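The core idea of uncertainty-guided search can be illustrated with a minimal sketch: sample several candidate answers, and if they disagree (high entropy), retrieve web results and answer again with that evidence in context. The function names (`uncertainty_guided_answer`, `sample_fn`, `search_fn`), the sample count, and the entropy threshold below are illustrative assumptions, not the paper's actual implementation.

```python
import math
from collections import Counter

def answer_entropy(answers):
    """Shannon entropy (bits) of the distribution of sampled answers."""
    total = len(answers)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(answers).values())

def uncertainty_guided_answer(sample_fn, search_fn, question,
                              n_samples=5, entropy_threshold=0.5):
    """Hypothetical sketch of uncertainty-guided search:
    1. Sample several candidate answers from the model.
    2. If the samples disagree (entropy above threshold), retrieve
       web results and re-sample with the evidence in context.
    3. Return the majority answer."""
    answers = [sample_fn(question) for _ in range(n_samples)]
    if answer_entropy(answers) > entropy_threshold:
        evidence = search_fn(question)  # fetch up-to-date information
        answers = [sample_fn(question, evidence) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```

The design choice here mirrors the paper's description at a high level: search is only invoked when the model's own samples signal uncertainty, so easy questions avoid the cost of retrieval.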

Med_Gemini3


Med-Gemini Evaluation and Performance Benchmarks

Med-Gemini has been evaluated on 14 medical benchmarks spanning (1) text-based reasoning, (2) multimodal, and (3) long-context processing tasks, establishing new state-of-the-art (SoTA) performance on 10 of them and surpassing the GPT-4 model family on every benchmark. Together, these evaluations demonstrate Med-Gemini's performance across a wide range of capabilities in medicine.

Med_Gemini4


Evaluation of advanced reasoning on text-based tasks

Med-Gemini has been evaluated on 14 medical benchmarks, establishing new state-of-the-art (SoTA) performance on 10 of them. On the popular MedQA (USMLE) benchmark, the best-performing Med-Gemini model achieves SoTA performance of 91.1% accuracy, using a novel uncertainty-guided search strategy. On 7 multimodal benchmarks, including NEJM Image Challenges and MMMU (health & medicine), Med-Gemini improves over GPT-4V by an average relative margin of 44.5%. Med-Gemini also demonstrates effectiveness in long-context capabilities, surpassing prior methods in tasks such as needle-in-a-haystack retrieval from long de-identified health records and medical video question answering. It even surpasses human experts on tasks like medical text summarization, showing promising potential for multimodal medical dialogue, research, and education.


Beyond these benchmarks, Med-Gemini-M 1.0 has been further evaluated on three challenging use cases that require long-form text generation. To this end, the authors conducted an expert evaluation in which a panel of clinicians compared the model's responses.


• Medical summarization: Generate a structured report that patients receive at the end of a medical appointment to summarize and guide their care journeys.

• Referral letter generation: Generate a referral letter to another healthcare provider given a de-identified outpatient medical note that contains a recommendation for a referral.

• Medical simplification: Generate a plain language summary (PLS) of a medical systematic review.


Evaluation of long-context capabilities on video and EHR tasks

The paper considers three tasks to demonstrate Med-Gemini-M 1.5's ability to seamlessly understand and reason over long-context medical information:

• Long unstructured EHR notes understanding

• Medical instructional video QA

• Critical view of safety (CVS) assessment of surgical video


The paper demonstrates the effectiveness of Med-Gemini's long-context capabilities through SoTA performance on a needle-in-a-haystack retrieval task over long, de-identified health records and on medical video question answering, surpassing prior bespoke methods using only in-context learning. Finally, Med-Gemini's performance suggests real-world utility: it surpasses human experts on tasks such as medical text summarization and referral letter generation, alongside demonstrations of promising potential for multimodal medical dialogue, medical research, and education.
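The in-context-learning approach to needle-in-a-haystack EHR retrieval amounts to packing the full record into one long prompt and asking the model to locate a specific mention. A minimal sketch of such prompt construction is below; the function name `build_ehr_prompt` and the exact prompt wording are assumptions for illustration, not the paper's actual format.

```python
def build_ehr_prompt(notes, query_condition):
    """Hypothetical sketch: assemble many de-identified EHR notes into a
    single long-context prompt for needle-in-a-haystack retrieval."""
    # Number each note so the model can cite where the mention was found.
    context = "\n\n".join(f"[Note {i}]\n{text}"
                          for i, text in enumerate(notes, 1))
    return (f"{context}\n\n"
            f"Question: Does the record above contain any mention of "
            f"'{query_condition}'? Answer yes or no and cite the note number.")
```

With a long-context model, no bespoke retrieval pipeline is needed: the entire record fits in the prompt and the model reasons over it directly.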


Example of an input prompt, along with the retrieved search results 

Below we provide a MedQA-RS example of an input prompt, along with the retrieved search results and an example of a generated CoT, which is then used to further fine-tune Med-Gemini-L 1.0. For brevity, we only display one representative search result in the example below.
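The self-training loop described above pairs a question with retrieved search results and a generated chain of thought (CoT) to form a fine-tuning example. A minimal sketch of assembling such an example is shown here; the function name `make_cot_training_example` and the field layout are assumptions, since the paper's exact MedQA-RS format differs.

```python
def make_cot_training_example(question, search_results, cot, answer):
    """Hypothetical sketch: build one fine-tuning example that combines a
    question, retrieved search results, a chain of thought, and the answer."""
    results = "\n".join(f"- {r}" for r in search_results)
    prompt = (f"Question: {question}\n\n"
              f"Search results:\n{results}\n\n"
              "Think step by step, then give the final answer.")
    # The target teaches the model to reason before answering.
    target = f"{cot}\nFinal answer: {answer}"
    return {"prompt": prompt, "target": target}
```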


Med_Gemini5

                Example of an input prompt and response          



Med_gemini6



Video dialogue example


Med_gemini7

Feedback from surgeon

Med_gemini8

Med_gemini9

Conclusion


The advances of Med-Gemini hold great promise. The test results offer compelling evidence for its potential in many areas of medicine and underscore the role of AI systems as assistive tools for expert clinicians. It remains crucial, however, to carefully consider the nuances of the medical field and to conduct rigorous validation before real-world deployment at scale, since medicine is a safety-critical domain. "The unique nature of medical data and the critical need for safety demand specialized prompting, fine-tuning, or potentially both along with careful alignment of these models," the paper explained.




