jokegen2-1t-rl


LoRA adapter for Kimi-K2-Thinking. RL-tuned with comedy rubrics.

what it is

Reinforcement learning on top of the SFT adapter. Rewards come from qualitative rubrics, similar in spirit to the rubric-based judging behind Kimi's strong EQ-Bench scores. The rubrics score for specificity, turn, compression, and commitment. There is also one prescriptive rubric to prevent reward hacking: no laugh signaling, no explanation, no hedging, and no AI-isms -- otherwise the model learns to append something like "lmao 💀" to everything.

For this experiment, I used Qwen3 8B as the judge that grades the jokes against the rubrics.
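A minimal sketch of how such a reward could be wired up, assuming a generic judge() callable and made-up axis weights and prompt wording (this is illustrative, not the exact training code):

import re

QUALITATIVE = ["specificity", "turn", "compression", "commitment"]      # each scored 0-10 by the judge
PRESCRIPTIVE = ["laugh signaling", "explanation", "hedging", "AI-isms"]  # hard violations

# hypothetical judge prompt; the real grading prompt is not published
JUDGE_PROMPT = (
    "Score this joke from 0 to 10 on each axis: {axes}.\n"
    "Reply with one line per axis in the form axis: score, then a final line\n"
    "violations: <comma-separated subset of {violations}, or none>.\n\n"
    "Joke:\n{joke}"
)

def rubric_reward(joke: str, judge) -> float:
    """Collapse the judge's rubric scores into a single scalar reward in [0, 1]."""
    reply = judge(JUDGE_PROMPT.format(
        axes=", ".join(QUALITATIVE),
        violations=", ".join(PRESCRIPTIVE),
        joke=joke,
    ))
    scores = []
    for axis in QUALITATIVE:
        m = re.search(rf"{re.escape(axis)}\s*:\s*(\d+)", reply, re.IGNORECASE)
        scores.append(int(m.group(1)) if m else 0)
    # prescriptive rubric as a hard gate: any violation zeroes the reward,
    # which is what discourages laugh signaling, hedging, and AI-isms
    m = re.search(r"violations\s*:\s*(.*)", reply, re.IGNORECASE)
    violated = bool(m) and m.group(1).strip().lower() not in ("", "none")
    return 0.0 if violated else sum(scores) / (10.0 * len(QUALITATIVE))

Treating the prescriptive rubric as a hard gate rather than a soft penalty is one simple way to keep the policy from trading a small violation penalty for a big style bonus.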

quickstart

pip install tinker transformers
export TINKER_API_KEY=your_key

import tinker
from transformers import AutoTokenizer

# tokenizer for the base model; the adapter shares its vocabulary
tokenizer = AutoTokenizer.from_pretrained("moonshotai/Kimi-K2-Thinking", trust_remote_code=True)

# sampling client pointed at the RL-tuned adapter weights
sampler = tinker.ServiceClient().create_sampling_client(
    model_path="tinker://4977bc84-470e-50e8-89fd-0bf9cd13c372:train:0/sampler_weights/000050"
)

prompt = "<|im_start|>system\nYou write sharp, witty comedy.<|im_end|>\n<|im_start|>user\nwrite a joke about startups<|im_end|>\n<|im_start|>assistant\n"
response = sampler.sample(
    prompt=tinker.types.ModelInput.from_ints(tokenizer.encode(prompt)),
    sampling_params=tinker.types.SamplingParams(max_tokens=256, temperature=0.8, stop=["<|im_end|>"]),
).result()
# drop the prompt tokens and decode only the completion
print(tokenizer.decode(response.sequences[0].tokens[len(tokenizer.encode(prompt)):]))

local inference

tbd
