SentenceTransformer based on Snowflake/snowflake-arctic-embed-l-v2.0

This is a sentence-transformers model finetuned from Snowflake/snowflake-arctic-embed-l-v2.0. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Snowflake/snowflake-arctic-embed-l-v2.0
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'XLMRobertaModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
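
In practice these two modules amount to running the XLM-RoBERTa encoder and mean-pooling its token embeddings over the attention mask. Below is a minimal sketch of that computation using the transformers library directly; it is illustrative only (loading through SentenceTransformer, as shown under Usage, is the intended path), and the repo id is the one this card belongs to.

import torch
from transformers import AutoModel, AutoTokenizer

model_id = "atx-labs/snowflake-custom-noise-tsdae-cfr-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)

batch = tokenizer(["example text"], padding=True, truncation=True,
                  max_length=512, return_tensors="pt")
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # (batch, seq_len, 1024)

# Mean pooling: average the token embeddings, ignoring padding positions
mask = batch["attention_mask"].unsqueeze(-1).float()
embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(embedding.shape)  # torch.Size([1, 1024])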

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    '[MASK] multiplied, in any case in which the service that was required has been performed in part, by the percentage which the length of service that was not performed is to the length of the service that was required to be performed. The amount will be determined under the following formula:',
    'the period of appointment begins, multiplied, in any case in which the service that was required has been performed in part, by the percentage which the length of service that was not performed is to the length of the service that was required to be performed. The amount will be determined under the following formula:',
    '[SUBSECTION c] Special enrollment periods.. A Part D eligible individual may enroll in a PDP or disenroll from a PDP and enroll in another PDP or MA-PD plan (as provided at § 422.62(b) of this chapter), as applicable, under any of the following circumstances: [CLAUSE 1] The individual involuntarily loses creditable prescription drug coverage or such coverage is involuntarily reduced so that it is no longer creditable coverage as defined under § 423.56(a). Loss of credible prescription drug coverage due to failure to pay any required premium is not considered involuntary loss of the coverage. [CLAUSE 2] The individual was not adequately informed, as required by standards established by CMS under § 423.56, that he or she has lost his or her creditable prescription drug coverage, that he or she never had credible prescription drug coverage, or the coverage is involuntarily reduced so that it is no longer creditable prescription drug coverage.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.9334, 0.2975],
#         [0.9334, 1.0000, 0.2973],
#         [0.2975, 0.2973, 1.0000]])
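
Since semantic search is among the intended uses listed above, here is a short sketch of a search over a toy corpus with util.semantic_search; the corpus and query strings are made up for illustration.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("atx-labs/snowflake-custom-noise-tsdae-cfr-finetuned")

corpus = [
    "A Part D eligible individual may enroll in a PDP during the annual election period.",
    "The Secretary shall take into consideration the extent of the applicant's financial hardship.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode("When can someone join a prescription drug plan?",
                               convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    # Each hit carries the corpus index and its cosine similarity score
    print(f"{hit['score']:.4f}  {corpus[hit['corpus_id']]}")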

Training Details

Training Dataset

Unnamed Dataset

  • Size: 24,880 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    • sentence_0: string; min: 43 tokens, mean: 123.38 tokens, max: 340 tokens
    • sentence_1: string; min: 43 tokens, mean: 126.43 tokens, max: 366 tokens
  • Samples:
    • Sample 1:
      • sentence_0: [SECTION HEADING] § [REF] Waiver of right of recovery. In determining whether there is good cause for waiver of any right of recovery which he may have against any applicant by reason of any payments made pursuant to a loan guarantee under this subpart, the Secretary shall take into consideration the extent to which:
      • sentence_1: [SECTION HEADING] § 57.1517 Waiver of right of recovery. In determining whether there is good cause for waiver of any right of recovery which he may have against any applicant by reason of any payments made pursuant to a loan guarantee under this subpart, the Secretary shall take into consideration the extent to which:
    • Sample 2:
      • sentence_0: [SUBSECTION A] Any unmet deductible applied to the charges related to the reasonable costs that the facility incurs in providing the covered services; [CLAUSE 7] Rural health clinic services that meet the requirements set forth in part 491 of this chapter. [CITATIONS]
      • sentence_1: [SUBSECTION A] Any unmet deductible applied to the charges related to the reasonable costs that the facility incurs in providing the covered services; [CLAUSE 7] Rural health clinic services that meet the requirements set forth in part 491 of this chapter. [CITATIONS]
    • Sample 3:
      • sentence_0: [SUBSECTION b] Waiver of the right to appear.. (1) An enrollee may submit to OMHA a written statement indicating that he or she does not wish to appear at the hearing. [ITEM i] For expedited hearings, an enrollee may indicate in writing or orally [MASK] ii] The OMHA hearing office must document all oral waivers in writing and maintain the documentation in the case files. [CLAUSE 2] The enrollee may subsequently withdraw his or her waiver in writing at any time before the notice of the hearing decision is issued; however, by withdrawing the waiver the enrollee agrees to an extension of the adjudication period as specified in § 423.2016, that may be necessary to schedule and hold the hearing.
      • sentence_1: [SUBSECTION b] Waiver of the right to appear.. (1) An enrollee may submit to OMHA a written statement indicating that he or she does not wish to appear at the hearing. [ITEM i] For expedited hearings, an enrollee may indicate in writing or orally that he or she does not wish to appear at the hearing. [ITEM ii] The OMHA hearing office must document all oral waivers in writing and maintain the documentation in the case files. [CLAUSE 2] The enrollee may subsequently withdraw his or her waiver in writing at any time before the notice of the hearing decision is issued; however, by withdrawing the waiver the enrollee agrees to an extension of the adjudication period as specified in § 423.2016, that may be necessary to schedule and hold the hearing.
  • Loss: DenoisingAutoEncoderLoss
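
Each sample pairs a noised input (sentence_0, note the [MASK] spans) with the original text (sentence_1), which is the input shape DenoisingAutoEncoderLoss consumes. The following is a hedged sketch of wiring such pairs into the loss; the in-memory toy dataset and the tie_encoder_decoder setting are assumptions for illustration, not details recorded on this card.

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import DenoisingAutoEncoderLoss

# Start from the base model named at the top of this card
model = SentenceTransformer("Snowflake/snowflake-arctic-embed-l-v2.0")

# Columns mirror the dataset above: (damaged text, original text) pairs
train_dataset = Dataset.from_dict({
    "sentence_0": ["[MASK] multiplied, in any case in which the service ..."],
    "sentence_1": ["the period of appointment begins, multiplied, in any case ..."],
})

# The decoder learns to reconstruct sentence_1 from the pooled embedding of
# sentence_0; tying decoder and encoder weights is a common TSDAE default.
loss = DenoisingAutoEncoderLoss(model, tie_encoder_decoder=True)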

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • multi_dataset_batch_sampler: round_robin
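
Taken together with the defaults listed below, a run with these values might look like the following sketch using the SentenceTransformerTrainer API; output_dir is an illustrative placeholder, and model, train_dataset, and loss are the objects from the sketch in the Training Dataset section.

from sentence_transformers import (
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)

args = SentenceTransformerTrainingArguments(
    output_dir="snowflake-tsdae-cfr",       # illustrative path
    num_train_epochs=3,                     # from "All Hyperparameters" below
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    multi_dataset_batch_sampler="round_robin",
    learning_rate=5e-5,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()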

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch   Step  Training Loss
0.3215   500  6.6865
0.6431  1000  5.7788
0.9646  1500  5.5835
1.2862  2000  5.3876
1.6077  2500  5.2766
1.9293  3000  5.2108
2.2508  3500  5.1317
2.5723  4000  5.0701
2.8939  4500  5.0288

Framework Versions

  • Python: 3.12.6
  • Sentence Transformers: 5.2.0
  • Transformers: 4.56.0
  • PyTorch: 2.8.0+cu129
  • Accelerate: 1.10.1
  • Datasets: 4.4.1
  • Tokenizers: 0.22.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

DenoisingAutoEncoderLoss

@inproceedings{wang-2021-TSDAE,
    title = "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoderfor Unsupervised Sentence Embedding Learning",
    author = "Wang, Kexin and Reimers, Nils and Gurevych, Iryna",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    pages = "671--688",
    url = "https://arxiv.org/abs/2104.06979",
}