---
license: mit
datasets:
- mteb/nfcorpus
language:
- en
pipeline_tag: text-retrieval
library_name: sentence-transformers
tags:
- mteb
- text
- transformers
- text-embeddings-inference
- sparse-encoder
- sparse
- csr
base_model:
- nvidia/NV-Embed-v2
model-index:
- name: NV-Embed-v2
  results:
  - task:
      type: Retrieval
    dataset:
      name: MTEB NFCorpus
      type: mteb/nfcorpus
      revision: ec0fa4fe99da2ff19ca1214b7966684033a58814
      config: default
      split: test
      languages:
      - eng-Latn
    metrics:
    - type: ndcg@1
      value: 0.43189
    - type: ndcg@3
      value: 0.41132
    - type: ndcg@5
      value: 0.40406
    - type: ndcg@10
      value: 0.39624
    - type: ndcg@20
      value: 0.38517
    - type: ndcg@100
      value: 0.40068
    - type: ndcg@1000
      value: 0.49126
    - type: map@10
      value: 0.14342
    - type: map@100
      value: 0.21866
    - type: map@1000
      value: 0.2427
    - type: recall@10
      value: 0.1968
    - type: recall@100
      value: 0.45592
    - type: recall@1000
      value: 0.78216
    - type: precision@1
      value: 0.45511
    - type: precision@10
      value: 0.32353
    - type: mrr@10
      value: 0.537792
    - type: main_score
      value: 0.39624
---
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our GitHub repository.
## Usage
> 📌 **Tip:** For NV-Embed-v2, using Transformers versions later than 4.47.0 may lead to performance degradation, since `model_type=bidir_mistral` in `config.json` is no longer supported. We recommend using Transformers 4.47.0.
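A minimal sketch of a version guard for this pin; the check itself is illustrative, only the 4.47.0 pin comes from the tip above:

```python
# Illustrative guard: warn if the installed Transformers version is newer than
# the recommended 4.47.0 pin (newer versions drop model_type=bidir_mistral).
from packaging.version import Version

import transformers

if Version(transformers.__version__) > Version("4.47.0"):
    print(
        f"Warning: transformers=={transformers.__version__} detected; "
        "NV-Embed-v2 is only validated with 4.47.0 and may degrade."
    )
```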
### Sentence Transformers Usage
You can evaluate this model, loaded via Sentence Transformers, with the following code snippet:
```python
import mteb
from sentence_transformers import SparseEncoder

# Load the CSR sparse encoder; trust_remote_code is required for the custom model code.
model = SparseEncoder("Y-Research-Group/CSR-NV_Embed_v2-Retrieval-NFcorpus", trust_remote_code=True)
model.prompts = {
    "NFCorpus-query": "Instruct: Given a question, retrieve relevant documents that answer the question\nQuery:"
}

tasks = mteb.get_tasks(tasks=["NFCorpus"])
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(
    model,
    eval_splits=["test"],
    output_folder="./results/NFCorpus",
    show_progress_bar=True,
    # MTEB doesn't support sparse tensors yet, so we convert to dense tensors.
    encode_kwargs={"convert_to_sparse_tensor": False, "batch_size": 8},
)
```
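Outside MTEB, the same loaded encoder can score query–document pairs directly. A minimal sketch, reusing the model and query prompt from the snippet above; the query and document strings are illustrative:

```python
# Illustrative strings; any NFCorpus-style medical query/documents work here.
queries = ["Do cruciferous vegetables lower cholesterol?"]
documents = [
    "Dietary intake of cruciferous vegetables was associated with reduced LDL cholesterol.",
    "The study evaluated sleep quality in shift workers.",
]

# Encode queries with the NFCorpus query prompt defined above; documents need no prompt.
query_embeddings = model.encode(queries, prompt_name="NFCorpus-query", convert_to_sparse_tensor=False)
document_embeddings = model.encode(documents, convert_to_sparse_tensor=False)

# Higher score = more relevant document for the query.
scores = model.similarity(query_embeddings, document_embeddings)
print(scores)
```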
## Citation
```bibtex
@misc{wen2025matryoshkarevisitingsparsecoding,
      title={Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation},
      author={Tiansheng Wen and Yifei Wang and Zequn Zeng and Zhong Peng and Yudi Su and Xinyang Liu and Bo Chen and Hongwei Liu and Stefanie Jegelka and Chenyu You},
      year={2025},
      eprint={2503.01776},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2503.01776},
}
```