Improve model card

#3
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +40 -34
README.md CHANGED
@@ -1,10 +1,12 @@
  ---
- license: apache-2.0
  base_model:
  - Qwen/Qwen2-VL-2B-Instruct
  language:
  - en
  ---
  <div align="center">
  <h1>
  MedVLM-R1
@@ -15,10 +17,17 @@ language:
  <a href="https://arxiv.org/abs/2502.19634" target="_blank">Paper</a>
  </div>

- # <span id="Start">Introduction</span>
  MedVLM-R1 is a medical Vision-Language Model built upon [Qwen2-VL-2B](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) and fine-tuned using the [GRPO](https://arxiv.org/abs/2402.03300) reinforcement learning framework. Trained on 600 MRI VQA samples from the [HuatuoGPT-Vision dataset](https://huggingface.co/datasets/FreedomIntelligence/Medical_Multimodal_Evaluation_Data), MedVLM-R1 excels in out-of-distribution performance on CT and X-ray VQA tasks. It also demonstrates explicit medical reasoning capabilities beyond merely providing final answers, ensuring greater interpretability and trustworthiness in clinical applications.

- # <span id="Start">Quick Start</span>

  ### 1. Load the model
  ```python
@@ -45,28 +54,8 @@ temp_generation_config = GenerationConfig(
  pad_token_id=151643,
  )
  ```
- ### 2. Load the VQA Data
- Pick one of the following examples. These are samples from [OmniMedVQA](https://huggingface.co/datasets/foreverbeliever/OmniMedVQA) data and are bundled by [HuatuoGPT-Vision](https://huggingface.co/datasets/FreedomIntelligence/Medical_Multimodal_Evaluation_Data).
-
- ```python
- question = {"image": ['images/successful_cases/mdb146.png'], "problem": "What content appears in this image?\nA) Cardiac tissue\nB) Breast tissue\nC) Liver tissue\nD) Skin tissue", "solution": "B", "answer": "Breast tissue"}
-
- question = {"image": ["images/successful_cases/person19_virus_50.jpeg"], "problem": "What content appears in this image?\nA) Lungs\nB) Bladder\nC) Brain\nD) Heart", "solution": "A", "answer": "Lungs"}
-
- question = {"image":["images/successful_cases/abd-normal023599.png"],"problem":"Is any abnormality evident in this image?\nA) No\nB) Yes.","solution":"A","answer":"No"}
-
- question = {"image":["images/successful_cases/foot089224.png"],"problem":"Which imaging technique was utilized for acquiring this image?\nA) MRI\nB) Electroencephalogram (EEG)\nC) Ultrasound\nD) Angiography","solution":"A","answer":"MRI"}
-
- question = {"image":["images/successful_cases/knee031316.png"],"problem":"What can be observed in this image?\nA) Chondral abnormality\nB) Bone density loss\nC) Synovial cyst formation\nD) Ligament tear","solution":"A","answer":"Chondral abnormality"}
-
- question = {"image":["images/successful_cases/shoulder045906.png"],"problem":"What can be visually detected in this picture?\nA) Bone fracture\nB) Soft tissue fluid\nC) Blood clot\nD) Tendon tear","solution":"B","answer":"Soft tissue fluid"}
-
- question = {"image":["images/successful_cases/brain003631.png"],"problem":"What attribute can be observed in this image?\nA) Focal flair hyperintensity\nB) Bone fracture\nC) Vascular malformation\nD) Ligament tear","solution":"A","answer":"Focal flair hyperintensity"}
-
- question = {"image":["images/successful_cases/mrabd005680.png"],"problem":"What can be observed in this image?\nA) Pulmonary embolism\nB) Pancreatic abscess\nC) Intraperitoneal mass\nD) Cardiac tamponade","solution":"C","answer":"Intraperitoneal mass"}
- ```
- ### 3. Run the inference

  ```python
  QUESTION_TEMPLATE = """
  {Question}
@@ -75,7 +64,30 @@ QUESTION_TEMPLATE = """
  2. Then provide the correct single-letter choice (A, B, C, D,...) inside <answer>...</answer> tags.
  3. No extra information or text outside of these tags.
  """

  message = [{
  "role": "user",
  "content": [{"type": "image", "image": f"file://{question['image'][0]}"}, {"type": "text","text": QUESTION_TEMPLATE.format(Question=question['problem'])}]
@@ -101,20 +113,14 @@ output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=
  print(f'model output: {output_text[0]}')

  ```
  ### Failure cases
  MedVLM-R1's reasoning fails on more difficult VQA examples. Although it outputs the correct choices in the examples below, its reasoning is either superficial or contradictory.
  ```python
- question = {"image":["images/failure_cases/mrabd021764.png"],"problem":"What is the observable finding in this image?\nA) Brain lesion\nB) Intestinal lesion\nC) Gallbladder lesion\nD) Pancreatic lesion","solution":"D","answer":"Pancreatic lesion"}
-
- question = {"image":["images/failure_cases/spine010017.png"],"problem":"What can be observed in this image?\nA) Cystic lesions\nB) Fractured bones\nC) Inflamed tissue\nD) Nerve damage","solution":"A","answer":"Cystic lesions"}
-
- question = {"image":["images/failure_cases/ankle056120.png"],"problem":"What attribute can be observed in this image?\nA) Bursitis\nB) Flexor pathology\nC) Tendonitis\nD) Joint inflammation","solution":"B","answer":"Flexor pathology"}
-
- question = {"image":["images/failure_cases/lung067009.png"],"problem":"What is the term for the anomaly depicted in the image?\nA) Pulmonary embolism\nB) Airspace opacity\nC) Lung consolidation\nD) Atelectasis","solution":"B","answer":"Airspace opacity"}
-
  ```

- # <span id="Start">Acknowledgement</span>
  We thank all machine learning / medical workers for making public codebases / datasets available to the community 🫶🫶🫶

  If you find our work helpful, feel free to cite us.
@@ -126,4 +132,4 @@ If you find our work helpful, feel free to give us a cite.
  journal={arXiv preprint arXiv:2502.19634},
  year={2025}
  }
- ```

  ---
  base_model:
  - Qwen/Qwen2-VL-2B-Instruct
  language:
  - en
+ license: apache-2.0
+ pipeline_tag: image-text-to-text
  ---
+
  <div align="center">
  <h1>
  MedVLM-R1
 
  <a href="https://arxiv.org/abs/2502.19634" target="_blank">Paper</a>
  </div>

+ # Introduction
  MedVLM-R1 is a medical Vision-Language Model built upon [Qwen2-VL-2B](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) and fine-tuned using the [GRPO](https://arxiv.org/abs/2402.03300) reinforcement learning framework. Trained on 600 MRI VQA samples from the [HuatuoGPT-Vision dataset](https://huggingface.co/datasets/FreedomIntelligence/Medical_Multimodal_Evaluation_Data), MedVLM-R1 excels in out-of-distribution performance on CT and X-ray VQA tasks. It also demonstrates explicit medical reasoning capabilities beyond merely providing final answers, ensuring greater interpretability and trustworthiness in clinical applications.

+ **Paper Abstract:**
+
+ Reasoning is a critical frontier for advancing medical image analysis, where transparency and trustworthiness play a central role in both clinician trust and regulatory approval. Although Medical Visual Language Models (VLMs) show promise for radiological tasks, most existing VLMs merely produce final answers without revealing the underlying reasoning. To address this gap, we introduce MedVLM-R1, a medical VLM that explicitly generates natural language reasoning to enhance transparency and trustworthiness. Instead of relying on supervised fine-tuning (SFT), which often suffers from overfitting to training distributions and fails to foster genuine reasoning, MedVLM-R1 employs a reinforcement learning framework that incentivizes the model to discover human-interpretable reasoning paths without using any reasoning references. Despite limited training data (600 visual question answering samples) and model parameters (2B), MedVLM-R1 boosts accuracy from 55.11% to 78.22% across MRI, CT, and X-ray benchmarks, outperforming larger models trained on over a million samples. It also demonstrates robust domain generalization under out-of-distribution tasks. By unifying medical image analysis with explicit reasoning, MedVLM-R1 marks a pivotal step toward trustworthy and interpretable AI in clinical practice.
+
+ GitHub repo: https://github.com/jzpan/MedVLM-R1
+
+ # Quick Start

  ### 1. Load the model
  ```python
 
  pad_token_id=151643,
  )
  ```
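+
+ If you only need a quick orientation, model loading follows the standard Qwen2-VL recipe in `transformers`. A minimal illustrative sketch only (the checkpoint path below is a placeholder, not a confirmed repo id, and the exact loading cell in this card may differ):
+
+ ```python
+ import torch
+ from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
+
+ MODEL_PATH = "<path-or-repo-id-of-MedVLM-R1>"  # placeholder: point this at the MedVLM-R1 checkpoint
+
+ # Load the vision-language model and its processor onto the available device(s).
+ model = Qwen2VLForConditionalGeneration.from_pretrained(
+     MODEL_PATH, torch_dtype=torch.bfloat16, device_map="auto"
+ )
+ processor = AutoProcessor.from_pretrained(MODEL_PATH)
+ ```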

+ ### 2. Question Template
  ```python
  QUESTION_TEMPLATE = """
  {Question}
 
  2. Then provide the correct single-letter choice (A, B, C, D,...) inside <answer>...</answer> tags.
  3. No extra information or text outside of these tags.
  """
+ ```
+
+ ### 3. Load the VQA Data
+ Pick one of the following examples. These are samples from [OmniMedVQA](https://huggingface.co/datasets/foreverbeliever/OmniMedVQA) data and are bundled by [HuatuoGPT-Vision](https://huggingface.co/datasets/FreedomIntelligence/Medical_Multimodal_Evaluation_Data).
+
+ ```python
+ question = {"image": ['images/successful_cases/mdb146.png'], "problem": "What content appears in this image?\nA) Cardiac tissue\nB) Breast tissue\nC) Liver tissue\nD) Skin tissue", "solution": "B", "answer": "Breast tissue"}
+
+ question = {"image": ["images/successful_cases/person19_virus_50.jpeg"], "problem": "What content appears in this image?\nA) Lungs\nB) Bladder\nC) Brain\nD) Heart", "solution": "A", "answer": "Lungs"}
+
+ # ... other example questions
+ ```
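+
+ Optionally, you can sanity-check the selected sample before running inference. A small illustrative snippet (it assumes the bundled `images/...` files are present in your working directory and that Pillow is installed):
+
+ ```python
+ from PIL import Image
+
+ # Print the question text and open the referenced image to confirm the path resolves.
+ print(question["problem"])
+ Image.open(question["image"][0]).show()
+ ```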

+ ### 4. Run the inference
+
+ ```python
  message = [{
  "role": "user",
  "content": [{"type": "image", "image": f"file://{question['image'][0]}"}, {"type": "text","text": QUESTION_TEMPLATE.format(Question=question['problem'])}]
 
  print(f'model output: {output_text[0]}')

  ```
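+
+ Because the prompt template asks for the final choice inside `<answer>...</answer>` tags, the predicted letter can be recovered with a simple regular expression. A minimal illustrative snippet (only the tag format comes from the template above):
+
+ ```python
+ import re
+
+ # Pull the single-letter choice out of the generated text, e.g. "<answer>B</answer>" -> "B".
+ match = re.search(r"<answer>\s*([A-Za-z])", output_text[0])
+ predicted = match.group(1).upper() if match else None
+ print(f"predicted: {predicted}, ground truth: {question['solution']}")
+ ```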
+
  ### Failure cases
  MedVLM-R1's reasoning fails on more difficult VQA examples. Although it outputs the correct choices in the examples below, its reasoning is either superficial or contradictory.
  ```python
+ # ... failure case examples
  ```

+ # Acknowledgement
  We thank all machine learning / medical workers for making public codebases / datasets available to the community 🫶🫶🫶

  If you find our work helpful, feel free to cite us.
 
  journal={arXiv preprint arXiv:2502.19634},
  year={2025}
  }
+ ```