Qwen2.5-0.5B GGUF for Text-to-SQL (CPU Inference)

This is a GGUF-format version of the Qwen2.5-0.5B text-to-SQL model, optimized for CPU inference with llama.cpp.

Model Details

  • Format: GGUF f16 (float16 precision)
  • File Size: 949MB
  • Base Model: Qwen/Qwen2.5-0.5B
  • Optimized For: CPU inference with llama.cpp
  • Recommended RAM: 4GB+

Performance

Spider Benchmark (200 examples)

Metric               Score
-------------------  ------
Exact Match          0.00%
Normalized Match     0.00%
Component Accuracy   91.94%
Average Similarity   21.78%
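
For reference, exact match compares predicted and gold SQL strings verbatim, while normalized match compares them after light canonicalization. A minimal sketch of such scoring in Python; the normalize() rules here are an assumption, not necessarily the benchmark's exact logic:

# Sketch of exact-match vs. normalized-match scoring.
# The normalization rules below are an assumption.
import re

def normalize(sql: str) -> str:
    # Lowercase, collapse whitespace, drop a trailing semicolon
    sql = sql.strip().rstrip(";").lower()
    return re.sub(r"\s+", " ", sql)

def score(predictions, references):
    n = len(references)
    exact = sum(p == r for p, r in zip(predictions, references))
    norm = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return {"exact_match": exact / n, "normalized_match": norm / n}

print(score(["SELECT COUNT(*) FROM singer;"], ["select count(*) from singer"]))
# -> {'exact_match': 0.0, 'normalized_match': 1.0}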

Usage with llama.cpp

Installation

# Clone and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
# Note: recent llama.cpp versions build with CMake instead of make:
#   cmake -B build && cmake --build build --config Release

# Download the model from the Hugging Face Hub
huggingface-cli download vindows/qwen2.5-0.5b-text-to-sql-gguf qwen2.5-0.5b-text-to-sql-f16.gguf

Run Inference

./llama-cli \
  -m qwen2.5-0.5b-text-to-sql-f16.gguf \
  -p "Convert the following natural language question to SQL:\n\nDatabase: concert_singer\nQuestion: How many singers do we have?\n\nSQL:" \
  -n 128 \
  --temp 0.1
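
llama.cpp also ships an HTTP server if you prefer to query the model over a REST API. A minimal sketch; the port and context size are illustrative:

# Start the built-in server
./llama-server \
  -m qwen2.5-0.5b-text-to-sql-f16.gguf \
  -c 2048 \
  --port 8080

# Query the completion endpoint
curl http://localhost:8080/completion -d '{
  "prompt": "Convert the following natural language question to SQL:\n\nDatabase: concert_singer\nQuestion: How many singers do we have?\n\nSQL:",
  "n_predict": 128,
  "temperature": 0.1
}'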

Python Usage (llama-cpp-python)

pip install llama-cpp-python

from llama_cpp import Llama

# Load model
llm = Llama(
    model_path="qwen2.5-0.5b-text-to-sql-f16.gguf",
    n_ctx=2048,
    n_threads=8
)

# Generate SQL
prompt = """Convert the following natural language question to SQL:

Database: concert_singer
Question: How many singers do we have?

SQL:"""

output = llm(prompt, max_tokens=128, temperature=0.1, stop=["\n\n"])
sql = output['choices'][0]['text'].strip()
print(sql)
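
To translate several questions against the same database, reuse the loaded Llama instance rather than reloading the model each time. A short sketch; the second question is illustrative:

# Reuse the already-loaded model for multiple questions
questions = [
    "How many singers do we have?",
    "What is the average age of all singers?",  # illustrative question
]
template = ("Convert the following natural language question to SQL:\n\n"
            "Database: concert_singer\nQuestion: {q}\n\nSQL:")

for q in questions:
    out = llm(template.format(q=q), max_tokens=128, temperature=0.1, stop=["\n\n"])
    print(q, "->", out['choices'][0]['text'].strip())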

Quantization Options

This model is provided in f16 format. For smaller files at a slight quality trade-off, you can quantize it further with llama.cpp's llama-quantize tool:

# Quantize to Q4_K_M (recommended for most use cases)
./llama-quantize qwen2.5-0.5b-text-to-sql-f16.gguf qwen2.5-0.5b-text-to-sql-Q4_K_M.gguf Q4_K_M

# Quantize to Q8_0 (higher quality, larger size)
./llama-quantize qwen2.5-0.5b-text-to-sql-f16.gguf qwen2.5-0.5b-text-to-sql-Q8_0.gguf Q8_0
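
A quantized file loads the same way in Python; only the path changes:

from llama_cpp import Llama

# Load the Q4_K_M file produced by the command above
llm_q4 = Llama(
    model_path="qwen2.5-0.5b-text-to-sql-Q4_K_M.gguf",
    n_ctx=2048,
    n_threads=8
)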

Files

  • qwen2.5-0.5b-text-to-sql-f16.gguf - F16 (half-precision) model (949MB)

Limitations

See the main model card for limitations.

License

Apache 2.0
