# jokegen2-1t-rl
LoRA adapter for Kimi-K2-Thinking. RL-tuned with comedy rubrics.
## what it is
Reinforcement learning on top of the SFT checkpoint. Training uses qualitative rubrics, similar in spirit to the rubric-based judging behind Kimi's high EQ-Bench scores. The rubrics score for specificity, turn, compression, and commitment. There is also one prescriptive rubric to prevent reward hacking: no laugh signaling, no explanation, no hedging, and no AI-isms. Without it, the model learns to append something like "lmao 😂" to every output.

For this experiment, I used Qwen3-8B to grade the jokes.
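The grading loop was roughly: sample a joke, have the judge score it against the rubrics, and use the combined score as the reward. Here's a minimal sketch; the endpoint, prompt wording, 0-10 scale, and penalty weight are my illustration, not the exact training setup:

```python
import json
from openai import OpenAI

# Minimal sketch of the rubric reward, assuming Qwen3-8B is served behind an
# OpenAI-compatible endpoint (e.g. vLLM). The prompt text and weighting are
# illustrative, not the exact training configuration.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

JUDGE_PROMPT = """Score this joke from 0-10 on each rubric:
- specificity: concrete, particular details over generic setups
- turn: a real swerve between setup and punchline
- compression: no wasted words
- commitment: fully inhabits the premise, no backing out

Penalty (1 if violated, else 0): laugh signaling ("lmao", emoji),
explaining the joke, hedging, or AI-isms.

Joke: {joke}

Reply with only JSON: {{"specificity": n, "turn": n, "compression": n, "commitment": n, "penalty": n}}"""


def reward(joke: str) -> float:
    resp = client.chat.completions.create(
        model="Qwen/Qwen3-8B",
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(joke=joke)}],
        temperature=0.0,
    )
    scores = json.loads(resp.choices[0].message.content)
    quality = sum(scores[k] for k in ("specificity", "turn", "compression", "commitment")) / 40.0
    # The prescriptive rubric is a flat penalty, so the policy can't buy
    # reward with "lmao"-style laugh signaling.
    return quality - scores["penalty"]
```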
## quickstart
```bash
pip install tinker transformers
export TINKER_API_KEY=your_key
```
```python
import tinker
from transformers import AutoTokenizer

# Tokenizer for the adapter's base model
tokenizer = AutoTokenizer.from_pretrained("moonshotai/Kimi-K2-Thinking", trust_remote_code=True)

# Sampling client pointed at the RL-tuned LoRA weights on Tinker
sampler = tinker.ServiceClient().create_sampling_client(
    model_path="tinker://4977bc84-470e-50e8-89fd-0bf9cd13c372:train:0/sampler_weights/000050"
)

prompt = (
    "<|im_start|>system\nYou write sharp, witty comedy.<|im_end|>\n"
    "<|im_start|>user\nwrite a joke about startups<|im_end|>\n"
    "<|im_start|>assistant\n"
)

response = sampler.sample(
    prompt=tinker.types.ModelInput.from_ints(tokenizer.encode(prompt)),
    sampling_params=tinker.types.SamplingParams(max_tokens=256, temperature=0.8, stop=["<|im_end|>"]),
).result()

# Strip the prompt tokens and decode only the completion
print(tokenizer.decode(response.sequences[0].tokens[len(tokenizer.encode(prompt)):]))
```
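If you'd rather not hand-write the chat markup, `transformers` can build the prompt from the model's bundled chat template. A sketch, assuming the Kimi-K2-Thinking repo ships one:

```python
# Same request, but building the prompt from the tokenizer's chat template
# instead of hardcoding special tokens (assumes the repo ships a template).
messages = [
    {"role": "system", "content": "You write sharp, witty comedy."},
    {"role": "user", "content": "write a joke about startups"},
]
prompt_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = sampler.sample(
    prompt=tinker.types.ModelInput.from_ints(prompt_ids),
    sampling_params=tinker.types.SamplingParams(max_tokens=256, temperature=0.8, stop=["<|im_end|>"]),
).result()
print(tokenizer.decode(response.sequences[0].tokens[len(prompt_ids):]))
```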
## local inference
tbd
## model tree for sdan/jokegen2-1t-rl
Base model: moonshotai/Kimi-K2-Thinking