---
license: mit
language:
- en
pipeline_tag: text-generation
datasets:
- haykgrigorian/TimeCapsuleLLM-London-1800-1875-v2-15GB
library_name: transformers
---

# haykgrigorian/v2mini-eval1: Llama-Architecture 318M Model

## Model Overview

**v2mini-eval1** is a model trained from scratch on 15 GB of 1800-1875 London texts using the modern Llama architecture. It was trained to evaluate v2's dataset.

| Detail | Value |
| :--- | :--- |
| **Model Architecture** | LlamaForCausalLM (Decoder-Only Transformer) |
| **Parameter Count** | **~318 Million (318M)** |
| **Training Type** | Trained **from scratch** (random initialization) |
| **Tokenizer** | Custom BPE, vocab size 32,000 |
| **Sequence Length** | 1024 tokens |
| **Attention Type** | Grouped Query Attention (GQA) |

## Configuration Details

This model uses a custom size and configuration based on Llama:

| Parameter | Value |
| :--- | :--- |
| **Number of Layers** | 20 |
| **Hidden Size (d)** | 1024 |
| **Intermediate Size ($\text{d}_{\text{ff}}$)** | 2752 |
| **Attention Heads** | 16 (Query) / 8 (Key/Value) |
| **Activation Function** | SiLU (`silu`) |
| **Normalization** | RMS Norm (`rms_norm_eps`: 1e-05) |
| **Position Embeddings** | Rotary Positional Embeddings (RoPE) |

## Model Issues

This is an evaluation model: it was trained from scratch for 10k steps on a 15 GB sample of a 90 GB dataset. There is a tokenization issue, so output comes out like this:

- default: "D oes that work more of h ise x cell ent st ir ring , in his pl ays"
- fixed: "Does that work more of his excellent stirring, in his plays"

This is purely a tokenizer issue: just fix the output yourself, or if you're lazy, feed it to an LLM and have it fixed.

### How to Load and Run the Model

Download all the files locally into a folder and run the test script.
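If you just want to sanity-check the setup before wiring up the repo script, here is a minimal sketch using Hugging Face `transformers`. The config values mirror the tables above, but note this builds a *randomly initialized* stand-in for illustration; the commented `from_pretrained` calls (with a hypothetical local path) show how you would load the real checkpoint instead.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Stand-in config mirroring this card's tables; the config.json that
# ships with the model is authoritative.
config = LlamaConfig(
    vocab_size=32000,
    hidden_size=1024,
    intermediate_size=2752,
    num_hidden_layers=20,
    num_attention_heads=16,
    num_key_value_heads=8,        # GQA: 16 query heads / 8 KV heads
    max_position_embeddings=1024,
    hidden_act="silu",
    rms_norm_eps=1e-5,
)

# Randomly initialized here for illustration; to run the real weights:
#   model = AutoModelForCausalLM.from_pretrained("path/to/v2mini-eval1")
#   tok   = AutoTokenizer.from_pretrained("path/to/v2mini-eval1")
model = LlamaForCausalLM(config).eval()

# Greedy decode from dummy token ids (a real run would encode a prompt
# with the model's custom BPE tokenizer instead).
input_ids = torch.tensor([[1, 42, 7, 99]])
with torch.no_grad():
    out = model.generate(input_ids, min_new_tokens=8, max_new_tokens=8,
                         do_sample=False)
print(out.shape)  # 4 prompt tokens + 8 generated -> torch.Size([1, 12])
```

With the real checkpoint, the only change is swapping the random initialization for the `from_pretrained` calls and decoding `out` with the tokenizer.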
You will have to make some adjustments in the run script, such as updating the config/file path and the test prompts.

### Test Script

A run file for testing and evaluating this model is available in the main project repository:

* **Test Script Link:** [test_v2mini_eval1.py on GitHub](https://github.com/haykgrigo3/TimeCapsuleLLM/blob/main/london_1800_1875_v2mini_eval1/test_v2mini_eval1.py)
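For the spacing issue shown under Model Issues, a mechanical cleanup is also possible. Below is a toy sketch, not the project's actual tooling: it strips all spaces, then re-segments each run of letters against a word list via dynamic programming. The `fix_spacing` helper and its tiny vocabulary are made up for illustration; a real fix would use a full dictionary or repair the tokenizer itself.

```python
import re

def fix_spacing(text: str, vocab: set) -> str:
    """Re-segment broken tokenizer output against a known word list.
    Heuristic sketch only: deletes all spaces, then splits each run of
    letters into dictionary words, reattaching punctuation afterwards."""
    def segment(chunk: str) -> str:
        n = len(chunk)
        best = [None] * (n + 1)   # best[i] = word list covering chunk[:i]
        best[0] = []
        for i in range(1, n + 1):
            for j in range(max(0, i - 20), i):  # cap word length at 20
                word = chunk[j:i]
                if best[j] is not None and word.lower() in vocab:
                    best[i] = best[j] + [word]
                    break  # greedy: prefer the longest word ending at i
        return " ".join(best[n]) if best[n] else chunk

    out = []
    for part in re.split(r"([,.;:!?])", text.replace(" ", "")):
        if not part:
            continue
        if re.fullmatch(r"[,.;:!?]", part) and out:
            out[-1] += part        # glue punctuation to the previous word
        else:
            out.append(segment(part))
    return " ".join(out)

# Tiny vocabulary covering only the example sentence above
words = {"does", "that", "work", "more", "of", "his", "excellent",
         "stirring", "in", "plays"}
broken = "D oes that work more of h ise x cell ent st ir ring , in his pl ays"
print(fix_spacing(broken, words))
# -> Does that work more of his excellent stirring, in his plays
```

Feeding the raw output to an LLM, as suggested above, remains the lower-effort option; this is just the deterministic alternative.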