📊 SMITH: Static Malware Interpreter & Threat Heuristic

SMITH (Static Malware Interpreter & Threat Heuristic) is a transformer-based artificial intelligence model designed for structured and interpretable static malware analysis. Built through parameter-efficient fine-tuning (LoRA) on top of a Qwen2.5 backbone and augmented with retrieval-based threat context, SMITH generates JSON-formatted output that includes reasoning, identified indicators, confidence scores, actionable recommendations, and MITRE ATT&CK technique mappings. This makes SMITH a valuable tool for cybersecurity analysts, automation systems, and threat intelligence pipelines.


🧠 Model Description

SMITH is a domain-specialized large language model tailored for static analysis of malware artifacts and descriptions. It interprets features such as permissions, API calls, imported functions, and known Indicators of Compromise (IoCs), and produces structured, machine-readable assessments that can be integrated into analysis workflows or SOC automation.
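The card does not say how static features should be phrased in a prompt's "Input:" section. The sketch below shows one hypothetical way to serialize them into a compact, labelled text block. `StaticFeatures`, `to_input_text`, and the label wording are illustrative assumptions, not part of SMITH.

```python
from dataclasses import dataclass, field


@dataclass
class StaticFeatures:
    """Hypothetical container for static features extracted from one sample."""
    permissions: list[str] = field(default_factory=list)
    api_calls: list[str] = field(default_factory=list)
    imports: list[str] = field(default_factory=list)
    iocs: list[str] = field(default_factory=list)


def to_input_text(features: StaticFeatures) -> str:
    """Render each non-empty feature group as one labelled line (hypothetical format)."""
    groups = [
        ("Permissions", features.permissions),
        ("API calls", features.api_calls),
        ("Imported functions", features.imports),
        ("Known IoCs", features.iocs),
    ]
    return "\n".join(f"{label}: {', '.join(values)}" for label, values in groups if values)


sample = StaticFeatures(permissions=["READ_SMS"], iocs=["api.telegram.org"])
print(to_input_text(sample))
# Permissions: READ_SMS
# Known IoCs: api.telegram.org
```

Empty groups are skipped, so the text passed to the model stays short for sparse samples.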

The model combines:

  • Fine-tuned transformer intelligence for reasoning over static features
  • Retrieval-augmented context from a curated threat corpus
  • Structured output for easy integration with tools and scripts
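The card does not describe the retriever, the corpus, or how retrieved context is placed in the prompt. As a rough illustration of retrieval augmentation, the sketch below ranks passages by shared-token count and prepends the best matches as a "Context:" section. It deliberately swaps in simple keyword overlap for whatever retriever SMITH actually uses (embedding search is common). `CORPUS`, `retrieve`, and `build_prompt` are hypothetical.

```python
import re

# Hypothetical stand-in for SMITH's curated threat corpus, which is not published here.
CORPUS = [
    "READ_SMS lets an Android app read SMS messages; banking trojans abuse it to intercept one-time passwords.",
    "api.telegram.org is the Telegram Bot API endpoint; some malware uses Telegram bots for command and control or exfiltration.",
    "VirtualAllocEx and WriteProcessMemory imports are commonly associated with process injection on Windows.",
]

STOPWORDS = {"a", "an", "and", "the", "is", "it", "to", "of", "or", "on", "for", "with"}


def tokenize(text: str) -> set[str]:
    """Lowercase tokens; '.' and '_' are kept so 'api.telegram.org' and 'read_sms' stay whole."""
    raw = (t.strip(".") for t in re.findall(r"[a-z0-9_.]+", text.lower()))
    return {t for t in raw if t and t not in STOPWORDS}


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return up to k passages ranked by shared-token count (ties keep corpus order)."""
    q = tokenize(query)
    scored = [(len(q & tokenize(doc)), doc) for doc in corpus]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked[:k] if score > 0]


def build_prompt(query: str, passages: list[str]) -> str:
    """Prepend retrieved passages as a Context section ahead of the Input section."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "You are a cybersecurity malware analysis assistant.\n"
        "Respond ONLY in valid JSON.\n\n"
        f"Context:\n{context}\n\n"
        f"Input:\n{query}\n"
    )


query = "APK requests READ_SMS and communicates with api.telegram.org"
print(build_prompt(query, retrieve(query, CORPUS)))
```

Passages with no overlap are dropped rather than padded in, so unrelated context does not crowd the model's input.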

At roughly 0.5B parameters, SMITH is small enough to run on commodity GPUs and CPUs while still producing interpretable cybersecurity assessments.


🎯 Intended Use Cases

SMITH is intended for defensive cybersecurity applications, including:

  • Interpreting static malware artifacts (e.g., APK metadata, PE strings)
  • Producing structured JSON analysis suitable for automation
  • Mapping malware behavior to MITRE ATT&CK techniques
  • Assisting security analysts with context-aware reasoning
  • Generating YARA rules based on detected indicators
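The card does not show how YARA rules are produced from SMITH's output. One hypothetical post-processing step turns literal string indicators into a simple string-matching rule. `indicators_to_yara` and `yara_escape` are illustrative names. The sketch assumes each indicator is a literal string, such as a domain or permission name, and not a prose description. Generated rules should be reviewed and tested before deployment.

```python
import re


def yara_escape(value: str) -> str:
    """Escape backslashes, double quotes and newlines for a YARA text string."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def indicators_to_yara(rule_name: str, indicators: list[str]) -> str:
    """Build a simple string-matching YARA rule that fires on any indicator."""
    if not indicators:
        raise ValueError("a YARA rule needs at least one string")
    # YARA identifiers allow letters, digits and underscores, and cannot start with a digit.
    name = re.sub(r"[^A-Za-z0-9_]", "_", rule_name)
    if not name or name[0].isdigit():
        name = "r_" + name
    lines = [f"rule {name}", "{", "    strings:"]
    for i, indicator in enumerate(indicators):
        # "ascii wide" matches both 8-bit and UTF-16LE encodings of the string.
        lines.append(f'        $s{i} = "{yara_escape(indicator)}" ascii wide')
    lines += ["    condition:", "        any of them", "}"]
    return "\n".join(lines)


print(indicators_to_yara("smith apk sample", ["READ_SMS", "api.telegram.org"]))
```

An `any of them` condition is deliberately broad. In practice, analysts would usually tighten it, for example by requiring several strings or adding file-type checks.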

⚠️ Static analysis only: SMITH does NOT execute malware or perform dynamic analysis.


🧪 Example Usage (Python)

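```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "zeltera/SMITH"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",    # requires the `accelerate` package
    dtype=torch.float16,  # older transformers releases call this `torch_dtype`; float32 is safer on CPU
)

prompt = """
You are a cybersecurity malware analysis assistant.
Respond ONLY in valid JSON with these fields:
- reasoning
- indicators
- confidence
- recommendation
- mitre_attack

Input:
APK requests READ_SMS and communicates with api.telegram.org
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Greedy decoding keeps the JSON output reproducible between runs.
outputs = model.generate(**inputs, max_new_tokens=300, do_sample=False)

# Decode only the newly generated tokens so the prompt is not echoed back.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```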
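Small models do not always return clean JSON, so downstream automation should parse and check the output before trusting it. The sketch below extracts the first JSON object from the decoded text and checks it against the fields named in the prompt. The value types are assumptions, because the card does not specify them: `confidence` is treated as a number in [0, 1], and `mitre_attack` must contain at least one ATT&CK technique ID. `extract_json` and `validate` are hypothetical helpers, and the sample output is invented for illustration.

```python
import json
import re

# Field names come from the example prompt; their value types are assumptions.
REQUIRED_FIELDS = ("reasoning", "indicators", "confidence", "recommendation", "mitre_attack")
# ATT&CK technique IDs look like T1234 or, for sub-techniques, T1234.001.
TECHNIQUE_ID = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")


def extract_json(text: str) -> dict:
    """Return the first JSON object embedded in the generated text."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
        start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")


def validate(result: dict) -> list[str]:
    """List schema problems; an empty list means the output passed these checks."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in result]
    if "confidence" in result:
        conf = result["confidence"]
        # Assumption: confidence is a number in [0, 1]; the card does not fix its scale.
        if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0 <= conf <= 1:
            problems.append(f"confidence is not a number in [0, 1]: {conf!r}")
    if "mitre_attack" in result:
        # Serialize first so IDs are found whether items are strings or objects.
        if not TECHNIQUE_ID.findall(json.dumps(result["mitre_attack"])):
            problems.append("mitre_attack has no ATT&CK technique IDs")
    return problems


# Hypothetical model output, for illustration only.
raw = ('Analysis: {"reasoning": "SMS access plus a Telegram endpoint suggests exfiltration.", '
       '"indicators": ["READ_SMS", "api.telegram.org"], "confidence": 0.8, '
       '"recommendation": "Quarantine the APK and review it manually.", '
       '"mitre_attack": ["T1636.004"]}')
print(validate(extract_json(raw)))  # []
```

In a pipeline, the decoded text from the example above would replace `raw`. A non-empty problem list is a good signal to retry generation or route the sample to an analyst.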

🧠 Model Architecture and Training

Base Model: Qwen2.5 family (compact, efficient)

Fine-Tuning: LoRA with domain-specific malware analysis examples

Output Format: Structured JSON

Hardware: Designed for inference on commodity GPUs and CPU environments
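For reference, the standard LoRA formulation freezes each adapted pretrained weight $W_0$ and trains only a low-rank update. The card does not state which modules were adapted or which rank $r$ and scale $\alpha$ were used. The worked numbers below assume an illustrative square $896 \times 896$ projection (Qwen2.5-0.5B's hidden size) and $r = 8$.

```latex
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)

\text{trainable parameters per adapted matrix: } r(d + k) \text{ instead of } dk

d = k = 896,\; r = 8:\quad 8 \cdot (896 + 896) = 14{,}336 \text{ vs. } 896^2 = 802{,}816 \;(\approx 1.8\%)
```

$B$ is normally initialized to zero, so training starts from the unchanged base model. After training, $\frac{\alpha}{r} B A$ can be merged into $W_0$, which leaves inference cost the same as the base model.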

⚠️ Limitations

  • SMITH is strictly for static analysis of malware descriptions.
  • It does not execute or sandbox any executable or mobile binary.
  • Its analyses should be reviewed by qualified security staff.
  • Confidence scores are heuristic estimates, not guarantees.

📚 Ethical and Safe Use

SMITH is intended for defensive cybersecurity and threat intelligence purposes. It should not be used to generate or assist in creating malware, malicious code, or harmful artifacts. Users should comply with all relevant laws and organizational policies.

Author: Gunawan Adi Wijaya

Model size: 0.5B parameters (Safetensors, tensor type F16)

Base model: Qwen/Qwen2.5-0.5B (fine-tuned)