# Qwen3-VL-8B-Thinking

Run **Qwen3-VL-8B-Thinking** optimized for **Apple Silicon** on MLX with [NexaSDK](https://sdk.nexa.ai).

## Quickstart

1. **Install NexaSDK** and create a free account at [sdk.nexa.ai](https://sdk.nexa.ai).
2. **Activate your device** with your access token:

   ```bash
   nexa config set license '<access_token>'
   ```

3. **Run the model locally** with one line of code:

   ```bash
   nexa infer NexaAI/qwen3vl-8B-Thinking-fp16-mlx
   ```
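
For programmatic use, a common pattern with local runtimes is an OpenAI-style chat request that mixes text and image content parts. The sketch below only builds such a request body; the payload schema, endpoint, and image URL are illustrative assumptions, not part of the NexaSDK documentation — check the official docs for the actual API.

```python
import json

# Build an OpenAI-style multimodal chat request for a vision-language model.
# NOTE: this schema and any endpoint you POST it to are illustrative
# assumptions; consult the NexaSDK docs for the actual local-server API.
payload = {
    "model": "NexaAI/qwen3vl-8B-Thinking-fp16-mlx",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What trend does this chart show? Think step by step."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/chart.png"}},  # placeholder image
            ],
        }
    ],
}

# Serialize the body, ready to send to a local inference server.
body = json.dumps(payload)
print(len(body) > 0)  # → True
```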

## Model Description

**Qwen3-VL-8B-Thinking** is an 8-billion-parameter multimodal large language model from Alibaba Cloud’s Qwen team. As part of the **Qwen3-VL** (vision-language) family, it is designed for deep multimodal reasoning, combining visual understanding, long-context comprehension, and structured chain-of-thought generation across text, images, and video.

The **Thinking** variant emphasizes reasoning transparency and analytical precision. Compared with the *Instruct* version, it produces richer intermediate reasoning steps, enabling detailed explanation, planning, and multi-hop analysis across visual and textual inputs.
+
## Features
|
| 25 |
+
- **Deep Visual Reasoning**: Interprets complex scenes, charts, and documents with multi-step logic.
|
| 26 |
+
- **Chain-of-Thought Generation**: Produces structured reasoning traces for improved interpretability and insight.
|
| 27 |
+
- **Extended Context Handling**: Maintains coherence across longer multimodal sequences.
|
| 28 |
+
- **Multilingual Competence**: Understands and generates in multiple languages for global applicability.
|
| 29 |
+
- **High Accuracy at 8B Scale**: Achieves strong benchmark performance in multimodal reasoning and analysis tasks.
|
| 30 |
+
|
| 31 |
+
## Use Cases
|
| 32 |
+
- Research and analysis requiring visual reasoning transparency
|
| 33 |
+
- Complex multimodal QA and scientific problem solving
|
| 34 |
+
- Visual analytics and explanation generation
|
| 35 |
+
- Advanced agent systems needing structured thought or planning steps
|
| 36 |
+
- Educational tools requiring detailed, interpretable reasoning
|
| 37 |
+
|

## Inputs and Outputs

**Input:**

- Text, image(s), or multimodal combinations (including sequential frames or documents)
- Optional context for multi-turn or multimodal reasoning

**Output:**

- Structured reasoning with intermediate steps
- Detailed answers, explanations, or JSON-formatted reasoning traces
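
Qwen "Thinking" variants typically wrap their intermediate reasoning in `<think>...</think>` tags ahead of the final answer. A minimal sketch for separating the trace from the answer, assuming that tag format (which may vary by runtime and template):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning_trace, final_answer).

    Assumes the reasoning is wrapped in a single <think>...</think> block,
    as typically emitted by Qwen "Thinking" variants; returns an empty
    trace when the tags are absent.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    trace = match.group(1).strip()
    answer = text[match.end():].strip()
    return trace, answer

trace, answer = split_reasoning(
    "<think>The chart rises steadily after 2020.</think>Revenue grew year over year."
)
print(answer)  # → Revenue grew year over year.
```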
+
## License
|
| 48 |
+
Refer to the [official Qwen license](https://huggingface.co/Qwen) for usage and redistribution details.
|