MoeReward/combined_rlhf_dataset_grpo_imdb_main_2K • 2k • 9
MoeReward/combined_rlhf_dataset_grpo_metamath_main_2K • 2k • 9
MoeReward/combined_rlhf_dataset_grpo_arc_main_2K • 2k • 9
MoeReward/combined_rlhf_dataset_grpo_nq_main_2K • 2k • 8
MoeReward/combined_rlhf_dataset_grpo_equal_dist_2K • 2k • 10
MoeReward/combined_rlhf_dataset_grpo_imdb_main • 4k • 8
MoeReward/combined_rlhf_dataset_grpo_metamath_main • 4k • 14
MoeReward/combined_rlhf_dataset_grpo_arc_main • 4k • 8
MoeReward/combined_rlhf_dataset_grpo_nq_main • 4k • 12
MoeReward/combined_rlhf_dataset_grpo_equal_dist • 4k • 6
MoeReward/preference_dataset_stepmath_ood • 10.8k • 9
MoeReward/combined_preference_dataset_ood
MoeReward/combined_rlhf_dataset_alpaca • 52k • 12
MoeReward/combined_rlhf_dataset_math • 40k • 11
MoeReward/combined_rlhf_dataset_code • 20k • 7
MoeReward/combined_preference_dataset_ood_alpaca_heavy • 3k • 3
MoeReward/combined_preference_dataset_ood_coding_heavy • 3k • 7
MoeReward/combined_preference_dataset_ood_math_heavy • 3k • 6
MoeReward/combined_preference_dataset_ood_equal_dist • 3k • 5
MoeReward/combined_preference_dataset_qwen2.5_1.5b_base • 47.4k • 5
MoeReward/combined_preference_dataset_qwen2.5_1.5b_base_alpaca_heavy • 10k • 6
MoeReward/combined_preference_dataset_qwen2.5_1.5b_base_coding_heavy • 10k • 14 • 1
MoeReward/combined_preference_dataset_qwen2.5_1.5b_base_math_heavy • 10k • 6
MoeReward/combined_preference_dataset_qwen2.5_1.5b_base_equal_dist • 10k • 6
MoeReward/combined_preference_dataset_qwen2.5_base • 57.6k • 5
MoeReward/combined_preference_dataset_qwen2.5_base_alpaca_heavy • 10k • 5
MoeReward/combined_preference_dataset_qwen2.5_base_coding_heavy • 10k • 14 • 1
MoeReward/combined_preference_dataset_qwen2.5_base_math_heavy • 10k • 5
MoeReward/combined_preference_dataset_qwen2.5_base_equal_dist • 10k • 5
MoeReward/combined_rlhf_dataset_balanced • 10k • 9