Qwen2.5-0.5B GGUF for Text-to-SQL (CPU Inference)

This is a GGUF-format version of the Qwen2.5-0.5B text-to-SQL model, optimized for CPU inference with llama.cpp.

Model Details

  • Format: GGUF f16 (float16 precision)
  • File Size: 949MB
  • Base Model: Qwen/Qwen2.5-0.5B
  • Optimized For: CPU inference with llama.cpp
  • Recommended RAM: 4GB+

Performance

Spider Benchmark (200 examples)

Metric               Score
-------------------  ------
Exact Match          0.00%
Normalized Match     0.00%
Component Accuracy   91.94%
Average Similarity   21.78%
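
For reference, exact match compares predicted and gold SQL strings verbatim, while normalized match compares them after light canonicalization. A minimal sketch of such scoring in Python; the normalize() rules here are an assumption, not necessarily the benchmark's exact logic:

# Sketch of exact-match vs. normalized-match scoring.
# The normalization rules below are an assumption.
import re

def normalize(sql: str) -> str:
    # Lowercase, collapse whitespace, drop a trailing semicolon
    sql = sql.strip().rstrip(";").lower()
    return re.sub(r"\s+", " ", sql)

def score(predictions, references):
    n = len(references)
    exact = sum(p == r for p, r in zip(predictions, references))
    norm = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return {"exact_match": exact / n, "normalized_match": norm / n}

print(score(["SELECT COUNT(*) FROM singer;"], ["select count(*) from singer"]))
# -> {'exact_match': 0.0, 'normalized_match': 1.0}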

Usage with llama.cpp

Installation

# Clone and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
# Note: recent llama.cpp versions build with CMake instead of make:
#   cmake -B build && cmake --build build --config Release

# Download the model from the Hugging Face Hub
huggingface-cli download vindows/qwen2.5-0.5b-text-to-sql-gguf qwen2.5-0.5b-text-to-sql-f16.gguf

Run Inference

./llama-cli \
  -m qwen2.5-0.5b-text-to-sql-f16.gguf \
  -p "Convert the following natural language question to SQL:\n\nDatabase: concert_singer\nQuestion: How many singers do we have?\n\nSQL:" \
  -n 128 \
  --temp 0.1
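
llama.cpp also ships an HTTP server if you prefer to query the model over a REST API. A minimal sketch; the port and context size are illustrative:

# Start the built-in server
./llama-server \
  -m qwen2.5-0.5b-text-to-sql-f16.gguf \
  -c 2048 \
  --port 8080

# Query the completion endpoint
curl http://localhost:8080/completion -d '{
  "prompt": "Convert the following natural language question to SQL:\n\nDatabase: concert_singer\nQuestion: How many singers do we have?\n\nSQL:",
  "n_predict": 128,
  "temperature": 0.1
}'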

Python Usage (llama-cpp-python)

pip install llama-cpp-python

from llama_cpp import Llama

# Load model
llm = Llama(
    model_path="qwen2.5-0.5b-text-to-sql-f16.gguf",
    n_ctx=2048,
    n_threads=8
)

# Generate SQL
prompt = """Convert the following natural language question to SQL:

Database: concert_singer
Question: How many singers do we have?

SQL:"""

output = llm(prompt, max_tokens=128, temperature=0.1, stop=["\n\n"])
sql = output['choices'][0]['text'].strip()
print(sql)
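
To translate several questions against the same database, reuse the loaded Llama instance rather than reloading the model each time. A short sketch; the second question is illustrative:

# Reuse the already-loaded model for multiple questions
questions = [
    "How many singers do we have?",
    "What is the average age of all singers?",  # illustrative question
]
template = ("Convert the following natural language question to SQL:\n\n"
            "Database: concert_singer\nQuestion: {q}\n\nSQL:")

for q in questions:
    out = llm(template.format(q=q), max_tokens=128, temperature=0.1, stop=["\n\n"])
    print(q, "->", out['choices'][0]['text'].strip())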

Quantization Options

This model is provided in f16 format. For smaller files at a slight quality trade-off, you can quantize it further with llama.cpp's llama-quantize tool:

# Quantize to Q4_K_M (recommended for most use cases)
./llama-quantize qwen2.5-0.5b-text-to-sql-f16.gguf qwen2.5-0.5b-text-to-sql-Q4_K_M.gguf Q4_K_M

# Quantize to Q8_0 (higher quality, larger size)
./llama-quantize qwen2.5-0.5b-text-to-sql-f16.gguf qwen2.5-0.5b-text-to-sql-Q8_0.gguf Q8_0
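
A quantized file loads the same way in Python; only the path changes:

from llama_cpp import Llama

# Load the Q4_K_M file produced by the command above
llm_q4 = Llama(
    model_path="qwen2.5-0.5b-text-to-sql-Q4_K_M.gguf",
    n_ctx=2048,
    n_threads=8
)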

Files

  • qwen2.5-0.5b-text-to-sql-f16.gguf - F16 (half-precision) model (949MB)

Limitations

See the main model card for limitations.

License

Apache 2.0
