SMITH: Static Malware Interpreter & Threat Heuristic
SMITH (Static Malware Interpreter & Threat Heuristic) is a transformer-based language model designed for structured, interpretable static malware analysis. Built by parameter-efficient fine-tuning (LoRA) of a Qwen2.5 backbone and augmented with retrieval-based threat context, SMITH produces JSON output containing its reasoning, identified indicators, a confidence score, actionable recommendations, and MITRE ATT&CK technique mappings. This machine-readable format makes SMITH useful to cybersecurity analysts, automation systems, and threat intelligence pipelines.
Model Description
SMITH is a domain-specialized language model tailored for static analysis of malware artifacts and descriptions. It interprets features such as permissions, API calls, imported functions, and known Indicators of Compromise (IoCs), and produces structured, machine-readable assessments that can be integrated into analysis workflows or SOC automation.
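The card does not prescribe an input format for these features. As a minimal sketch, assuming the features arrive as a Python dict, a hypothetical helper such as `format_static_features` can flatten them into the plain-text description that goes after `Input:` in the prompt. The key names used here are illustrative assumptions, not a schema SMITH requires.

```python
# Hypothetical helper: the keys (package, permissions, api_calls, imports, iocs)
# are illustrative assumptions, not a schema defined by SMITH.
FEATURE_ORDER = ("package", "permissions", "api_calls", "imports", "iocs")


def format_static_features(features: dict) -> str:
    """Render a dict of static features as one 'key: value' line per feature."""
    lines = []
    for key in FEATURE_ORDER:
        value = features.get(key)
        if not value:
            continue  # skip missing or empty features
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(v) for v in value)
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


sample = {
    "package": "com.example.viewer",
    "permissions": ["READ_SMS", "RECEIVE_SMS", "INTERNET"],
    "iocs": ["api.telegram.org"],
}
print(format_static_features(sample))
```

Keeping a fixed key order means the same artifact always yields the same prompt text, which makes outputs easier to compare across runs.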
The model combines:
- Fine-tuned transformer intelligence for reasoning over static features
- Retrieval-augmented context from a curated threat corpus
- Structured output for easy integration with tools and scripts
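How SMITH retrieves threat context is not documented here. The sketch below only illustrates the general retrieval-augmented pattern: it ranks passages from a tiny hypothetical corpus by keyword overlap (a deliberately simple stand-in for whatever retriever SMITH actually uses) and prepends the best matches to the analysis input. `CORPUS`, `retrieve`, and `build_prompt` are illustrative names, not part of SMITH.

```python
import re

# Hypothetical stand-in for SMITH's curated threat corpus; these passages are
# illustrative placeholders, not the model's actual retrieval data.
CORPUS = [
    "SMS stealers often request READ_SMS and RECEIVE_SMS to capture one-time passcodes.",
    "Some Android malware uses the Telegram Bot API as a command-and-control channel.",
    "Ransomware on Windows frequently imports CryptEncrypt from advapi32.dll.",
]

# Very common words that would otherwise inflate overlap scores.
STOPWORDS = {"a", "an", "and", "as", "of", "on", "the", "to", "with"}


def tokenize(text: str) -> set:
    """Lowercase alphanumeric/underscore tokens, minus stopwords."""
    return set(re.findall(r"[a-z0-9_]+", text.lower())) - STOPWORDS


def retrieve(query: str, corpus: list, k: int = 2) -> list:
    """Return up to k passages ranked by how many tokens they share with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(doc)), doc) for doc in corpus]
    scored = [item for item in scored if item[0] > 0]  # drop passages with no overlap
    scored.sort(key=lambda item: item[0], reverse=True)  # stable sort: ties keep corpus order
    return [doc for _, doc in scored[:k]]


def build_prompt(query: str, corpus: list) -> str:
    """Prepend the retrieved passages as context ahead of the analysis input."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, corpus))
    return f"Context:\n{context}\n\nInput:\n{query}"


query = "APK requests READ_SMS and communicates with api.telegram.org"
print(build_prompt(query, CORPUS))
```

For this query the Telegram passage ranks first (it shares `api` and `telegram`), followed by the SMS-stealer passage (`read_sms`); the ransomware passage shares nothing and is dropped. A production retriever would more likely use embeddings, but the prompt-assembly step looks the same.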
Because it builds on the compact Qwen2.5-0.5B base model, SMITH can run on commodity GPUs and CPU-only machines while still producing interpretable output.
Intended Use Cases
SMITH is intended for defensive cybersecurity applications, including:
- Interpreting static malware artifacts (e.g., APK metadata, PE strings)
- Producing structured JSON analysis suitable for automation
- Mapping malware behavior to MITRE ATT&CK techniques
- Assisting security analysts with context-aware reasoning
- Generating YARA rules based on detected indicators
Static analysis only: SMITH does NOT execute malware or perform dynamic analysis.
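SMITH may emit YARA rules directly; the hypothetical helper below is not its generator, but it shows one way to turn a list of string indicators (for example, the `indicators` field of its JSON output) into a minimal YARA rule for an analyst to review. It assumes indicators are plain text strings and does not check rule names against YARA's reserved keywords.

```python
import re


def _escape(value: str) -> str:
    """Escape backslashes and double quotes so the value is a valid YARA text string."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_yara_rule(name: str, indicators: list, description: str = "") -> str:
    """Build a simple YARA rule that fires when any of the string indicators is present."""
    if not indicators:
        raise ValueError("a YARA rule needs at least one string indicator")
    # YARA identifiers may contain only letters, digits, and underscores,
    # and must not start with a digit. Reserved keywords are not checked here.
    ident = re.sub(r"[^A-Za-z0-9_]", "_", name)
    if not ident or ident[0].isdigit():
        ident = "_" + ident
    lines = [
        f"rule {ident}",
        "{",
        "    meta:",
        f'        description = "{_escape(description)}"',
        "    strings:",
    ]
    for i, indicator in enumerate(indicators):
        lines.append(f'        $s{i} = "{_escape(indicator)}"')
    lines += ["    condition:", "        any of them", "}"]
    return "\n".join(lines)


print(build_yara_rule("sms-stealer", ["api.telegram.org", "READ_SMS"], "SMS stealer with Telegram C2"))
```

`any of them` is the loosest possible condition; a stricter rule might use `all of them` or a count such as `2 of them` to reduce false positives on benign apps that share a single string.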
Example Usage (Python)
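```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "zeltera/SMITH"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" needs the `accelerate` package; it places weights on a GPU when one is available.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    # `dtype` is the argument name in recent transformers releases (older ones use `torch_dtype`).
    # float16 suits GPUs; on CPU-only machines torch.float32 is usually the safer choice.
    dtype=torch.float16,
)

prompt = """
You are a cybersecurity malware analysis assistant.
Respond ONLY in valid JSON with these fields:
- reasoning
- indicators
- confidence
- recommendation
- mitre_attack
Input:
APK requests READ_SMS and communicates with api.telegram.org
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Greedy decoding keeps the JSON output deterministic across runs.
outputs = model.generate(**inputs, max_new_tokens=300, do_sample=False)

# Decode only the newly generated tokens so the prompt is not echoed back.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```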
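Even when prompted for JSON, a causal language model can wrap its answer in extra text. A minimal sketch of defensive parsing, assuming the model returns a single JSON object with the fields requested in the prompt above: the hypothetical `parse_smith_output` pulls out the first object, checks that every requested field is present, and extracts ATT&CK technique IDs from `mitre_attack` whether that field holds plain IDs or nested objects.

```python
import json
import re

# Field names taken from the example prompt.
REQUIRED_FIELDS = ("reasoning", "indicators", "confidence", "recommendation", "mitre_attack")
# ATT&CK technique IDs: "T" plus four digits, optionally a three-digit sub-technique (e.g. T1636.004).
TECHNIQUE_ID = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")


def parse_smith_output(text: str):
    """Return (parsed JSON object, list of ATT&CK technique IDs) from raw generated text."""
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")
    # raw_decode parses one JSON value starting at index 0 and ignores whatever follows it.
    obj, _ = json.JSONDecoder().raw_decode(text[start:])
    missing = [field for field in REQUIRED_FIELDS if field not in obj]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # mitre_attack may be a list of IDs or of objects; search its serialized form either way.
    techniques = TECHNIQUE_ID.findall(json.dumps(obj["mitre_attack"]))
    return obj, techniques
```

Pass it only the generated portion of the output: if the prompt is echoed back and contains a brace, `text.find("{")` would start inside the prompt instead. Invalid JSON raises `json.JSONDecodeError`, a subclass of `ValueError`, so a single `except ValueError` covers every failure path.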
Model Architecture and Training
Base Model: Qwen2.5-0.5B (a compact, efficient member of the Qwen2.5 family)
Fine-Tuning: LoRA with domain-specific malware analysis examples
Output Format: Structured JSON
Hardware: Designed for inference on commodity GPUs and CPU environments
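For reference, the standard LoRA formulation keeps each pretrained weight matrix $W_0 \in \mathbb{R}^{d \times k}$ frozen and learns a low-rank update, scaled by $\alpha / r$. The card does not state SMITH's rank $r$, scaling $\alpha$, or which modules were adapted, so the worked numbers below use an illustrative $r = 16$ on a square $896 \times 896$ projection (896 is the hidden size of Qwen2.5-0.5B).

```latex
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)

\text{trainable: } r(d + k) = 16 \cdot (896 + 896) = 28{,}672
\quad\text{vs.}\quad \text{full: } dk = 896^2 = 802{,}816 \;(\approx 3.6\%)
```

Only $A$ and $B$ receive gradients, which is what lets a domain adapter like SMITH's be trained and distributed far more cheaply than a full fine-tune.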
Limitations
SMITH is strictly for static analysis of malware descriptions.
Does not execute or sandbox any executable or mobile binary.
Analysis should be reviewed by qualified security staff.
Confidence scores are heuristic and not absolute.
Ethical and Safe Use
SMITH is intended for defensive cybersecurity and threat intelligence purposes. It should not be used to generate or assist in creating malware, malicious code, or harmful artifacts. Users should comply with all relevant laws and organizational policies.
Author: Gunawan Adi Wijaya
Model tree for zeltera/SMITH: base model Qwen/Qwen2.5-0.5B