AymenELKani committed on
Commit 7f2acb5 · verified · 1 Parent(s): 6f717f4

End of DPO training

README.md CHANGED
@@ -1,53 +1,69 @@
  ---
  library_name: transformers
- base_model: AymenELKani/codeReasoningGPT
  tags:
  - generated_from_trainer
- model-index:
- - name: results
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # results

- This model is a fine-tuned version of [AymenELKani/codeReasoningGPT](https://huggingface.co/AymenELKani/codeReasoningGPT) on an unknown dataset.

- ## Model description

- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed

  ## Training procedure

- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 5e-05
- - train_batch_size: 2
- - eval_batch_size: 2
- - seed: 42
- - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: linear
- - num_epochs: 3
- - mixed_precision_training: Native AMP
-
- ### Training results



  ### Framework versions

- - Transformers 4.56.2
- - Pytorch 2.8.0+cu126
- - Datasets 4.0.0
- - Tokenizers 0.22.0
 
  ---
+ base_model: AymenELKani/codeReasoningGPT-v2
  library_name: transformers
+ model_name: codeReasoningGPT-v2
  tags:
  - generated_from_trainer
+ - dpo
+ - trl
+ licence: license
  ---

+ # Model Card for codeReasoningGPT-v2

+ This model is a fine-tuned version of [AymenELKani/codeReasoningGPT-v2](https://huggingface.co/AymenELKani/codeReasoningGPT-v2).
+ It has been trained using [TRL](https://github.com/huggingface/trl).

+ ## Quick start

+ ```python
+ from transformers import pipeline

+ question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+ generator = pipeline("text-generation", model="AymenELKani/codeReasoningGPT-v2", device="cuda")
+ output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+ print(output["generated_text"])
+ ```

  ## Training procedure

+


+ This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).
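
As a rough illustration of the objective named above, the per-pair DPO loss can be sketched in plain Python. This is a minimal sketch and not the TRL implementation: the function name `dpo_loss`, its argument names, the example log-probabilities, and the default `beta=0.1` are assumptions made for illustration; the commit does not state which `beta` was used.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_logratio - rejected_logratio)))."""
    # Log-ratio of policy to reference for each response in the preference pair.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(x)) equals softplus(-x); this form avoids overflow for large |x|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical sequence log-probabilities, for illustration only.
loss_equal = dpo_loss(-12.0, -12.0, -12.0, -12.0)   # policy identical to the reference
loss_better = dpo_loss(-10.0, -14.0, -12.0, -12.0)  # policy favors the chosen response
print(loss_equal, loss_better)
```

When the policy and the reference assign the same log-probabilities, the loss is log 2; it falls as the policy raises the chosen response relative to the rejected one.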

  ### Framework versions

+ - TRL: 0.23.0
+ - Transformers: 4.56.2
+ - Pytorch: 2.8.0+cu126
+ - Datasets: 4.0.0
+ - Tokenizers: 0.22.0
+
+ ## Citations
+
+ Cite DPO as:
+
+ ```bibtex
+ @inproceedings{rafailov2023direct,
+     title = {{Direct Preference Optimization: Your Language Model is Secretly a Reward Model}},
+     author = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Christopher D. Manning and Stefano Ermon and Chelsea Finn},
+     year = 2023,
+     booktitle = {Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023},
+     url = {http://papers.nips.cc/paper_files/paper/2023/hash/a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html},
+     editor = {Alice Oh and Tristan Naumann and Amir Globerson and Kate Saenko and Moritz Hardt and Sergey Levine},
+ }
+ ```
+
+ Cite TRL as:
+
+ ```bibtex
+ @misc{vonwerra2022trl,
+     title = {{TRL: Transformer Reinforcement Learning}},
+     author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
+     year = 2020,
+     journal = {GitHub repository},
+     publisher = {GitHub},
+     howpublished = {\url{https://github.com/huggingface/trl}}
+ }
+ ```
adapter_config.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "AymenELKani/codeReasoningGPT-v2",
+   "bias": "none",
+   "corda_config": null,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 8,
+   "lora_bias": false,
+   "lora_dropout": 0.1,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "qalora_group_size": 16,
+   "r": 4,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "c_attn",
+     "c_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
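
To make the LoRA settings above concrete, the sketch below derives the scaling factor applied to the low-rank update and the number of trainable parameters for one adapted layer. The helper names `lora_scaling` and `lora_param_count` are hypothetical, and the layer shapes (hidden size 768, fused QKV width 2304) are an assumption of a GPT-2-small-like base model; the commit does not state the base architecture.

```python
import json

# Fields copied from adapter_config.json in this commit.
config = json.loads(
    '{"r": 4, "lora_alpha": 8, "use_rslora": false, '
    '"target_modules": ["c_attn", "c_proj"]}'
)


def lora_scaling(cfg):
    """Scale on the low-rank update: alpha / r, or alpha / sqrt(r) with rsLoRA."""
    r = cfg["r"]
    return cfg["lora_alpha"] / (r ** 0.5 if cfg["use_rslora"] else r)


def lora_param_count(in_features, out_features, r):
    """Trainable parameters for one adapted layer: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


scaling = lora_scaling(config)

# ASSUMPTION: GPT-2-small-like shapes, not confirmed by the commit.
c_attn_params = lora_param_count(768, 2304, config["r"])
print(scaling, c_attn_params)
```

With `r = 4` and `lora_alpha = 8`, the update is scaled by 2.0; the per-layer count depends entirely on the assumed shapes.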
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:14397ddf3e7b48eb4222ac7358feaa0d2fe11bdc842b30143f924dd40e9dd074
+ size 1631040
tokenizer.json CHANGED
@@ -1,21 +1,7 @@
  {
    "version": "1.0",
-   "truncation": {
-     "direction": "Right",
-     "max_length": 512,
-     "strategy": "LongestFirst",
-     "stride": 0
-   },
-   "padding": {
-     "strategy": {
-       "Fixed": 512
-     },
-     "direction": "Right",
-     "pad_to_multiple_of": null,
-     "pad_id": 50256,
-     "pad_type_id": 0,
-     "pad_token": "<|endoftext|>"
-   },
    "added_tokens": [
      {
        "id": 50256,

  {
    "version": "1.0",
+   "truncation": null,
+   "padding": null,
    "added_tokens": [
      {
        "id": 50256,
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:6050a8527b407628d80fd8b3dc3d1acf32a6210b374385f453ba60ccdb3528d9
- size 5713

  version https://git-lfs.github.com/spec/v1
+ oid sha256:306ca8161138d50c8189c385e52a39d7c067213b7cae679320d4a371fa203f64
+ size 6737