
โš–๏ธ To use commercially, please obtain a license. ๐Ÿ™ Thank you for supporting my research! ๐Ÿค—

ClinicalEncoder25: The First Diagnosable ColBERT for Clinical Reasoning

ClinicalEncoder25 is a breakthrough in AI for healthcare: a non-generative, interpretable reasoning model that understands clinical text at millisecond speed, with token-level precision. Built on the new Diagnosable ColBERT architecture, it maps every word to a semantic clinical graph, enabling real-time reasoning, retrieval, and debugging.


Why ClinicalEncoder25?

Most AI models today focus on generation, but understanding comes first. ClinicalEncoder25 is designed for deep, interpretable reasoning in clinical and medical texts, with:

  • Millisecond-latency document encoding
  • Token-level semantic mapping to medical ontologies (UMLS, SNOMED CT, ICD-10, etc.)
  • Hallucination-free, non-generative reasoning
  • Live debugging and interpretability via the Diagnosable ColBERT architecture

It's the first model to combine late-interaction retrieval, clinical coding, and topic extraction in a single, unified representation.


Model Details

Model Description

  • Model Type: PyLate ColBERT
  • Base Model: ettin-encoder-400m
  • Document Length: 2048 tokens (supports up to 8192 tokens outside PyLate)
  • Query Length: 64 tokens
  • Output Dimensionality: 128 (after projection) / 1024 (before projection)
  • Similarity Function: MaxSim
  • Language: English
  • License: CC-BY-NC 4.0
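
The MaxSim similarity listed above is ColBERT's standard late-interaction score: every query token is matched against its most similar document token, and those per-token maxima are summed into a single relevance score. A minimal PyTorch illustration with random (model-free) embeddings at the dimensions above:

import torch
import torch.nn.functional as F

# Toy tensors matching the dimensions above: up to 64 query tokens and
# arbitrarily many document tokens, each projected to 128 dimensions and
# L2-normalized as in ColBERT.
query_embeddings = F.normalize(torch.randn(64, 128), dim=-1)
document_embeddings = F.normalize(torch.randn(300, 128), dim=-1)

# MaxSim: for every query token, keep its best-matching document token,
# then sum those maxima into one relevance score.
token_similarities = query_embeddings @ document_embeddings.T  # (64, 300)
score = token_similarities.max(dim=1).values.sum()
print(score.item())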

Key Features

  • Diagnosable ColBERT: Every token is interpretable and mapped to a clinical concept.
  • ClinicalMap25 Integration: Use the model without its dense projection layer to map tokens directly to medical concepts at the L2 level (see the sketch after this list).
  • Efficient Retrieval: Uses FastPLAID for fast, scalable similarity search.
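
A hedged sketch of the projection-free usage mentioned in the ClinicalMap25 bullet: PyLate's ColBERT extends Sentence Transformers, so its pipeline modules are indexable (module 0 is the ModernBERT transformer, module 1 the 1024 → 128 projection). The tokenize/forward calls below are an assumed workflow, so check the ClinicalMap25 documentation before relying on them:

import torch
from pylate import models

model = models.ColBERT(
    model_name_or_path="Parallia/ClinicalEncoder25-Diagnosable-Colbert-L2-for-medical-texts",
)

# Run only the transformer module to obtain the raw 1024-dim token
# embeddings, i.e. the representation before the dense projection.
features = model.tokenize(["the patient is cold"])
with torch.no_grad():
    output = model[0](features)

print(output["token_embeddings"].shape)  # (batch, num_tokens, 1024)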

Full Model Architecture

ColBERT(
  (0): Transformer({'max_seq_length': 2047, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Dense({'in_features': 1024, 'out_features': 128, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'use_residual': False})
)

Usage

First install the PyLate library:

pip install -U pylate

Retrieval

Use this model with PyLate to index and retrieve documents. The index uses FastPLAID for efficient similarity search.

Indexing documents

Load the ColBERT model and initialize the PLAID index, then encode and index your documents:

from pylate import indexes, models, retrieve

# Step 1: Load the ColBERT model
model = models.ColBERT(
    model_name_or_path="Parallia/ClinicalEncoder25-Diagnosable-Colbert-L2-for-medical-texts",
)

# Step 2: Initialize the PLAID index
index = indexes.PLAID(
    index_folder="pylate-index",
    index_name="index",
    override=True,  # This overwrites the existing index if any
)

# Step 3: Encode the documents
documents_ids = ["1", "2", "3"]
documents = ["the patient is cold", "the weather is cold", "hypothermia"]

documents_embeddings = model.encode(
    documents,
    batch_size=32,
    is_query=False,  # Ensure that it is set to False to indicate that these are documents, not queries
    show_progress_bar=True,
)

# Step 4: Add document embeddings to the index by providing embeddings and corresponding ids
index.add_documents(
    documents_ids=documents_ids,
    documents_embeddings=documents_embeddings,
)

Note that you do not have to recreate the index and encode the documents every time. Once you have created an index and added the documents, you can re-use the index later by loading it:

# To load an index, simply instantiate it with the correct folder/name and without overriding it
index = indexes.PLAID(
    index_folder="pylate-index",
    index_name="index",
)

Retrieving top-k documents for queries

Once the documents are indexed, you can retrieve the top-k most relevant documents for a given set of queries. To do so, initialize the ColBERT retriever with the index you want to search in, encode the queries, and retrieve the top-k documents to get the ids and relevance scores of the best matches:

# Step 1: Initialize the ColBERT retriever
retriever = retrieve.ColBERT(index=index)

# Step 2: Encode the queries
queries_embeddings = model.encode(
    ["low body temperature", "it is snowing"],
    batch_size=32,
    is_query=True,  # Ensure that it is set to True to indicate that these are queries
    show_progress_bar=True,
)

# Step 3: Retrieve top-k documents
scores = retriever.retrieve(
    queries_embeddings=queries_embeddings,
    k=10,  # Retrieve the top 10 matches for each query
)
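
The retriever returns one ranked list per query. Assuming PyLate's {"id", "score"} result format, a quick way to inspect the matches:

# Hypothetical inspection of the results: one list of matches per query,
# sorted by decreasing MaxSim score.
for query, matches in zip(["low body temperature", "it is snowing"], scores):
    print(query)
    for match in matches:
        print(f"  id={match['id']}  score={match['score']:.2f}")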

Reranking

If you only want to use the ColBERT model to rerank on top of your first-stage retrieval pipeline without building an index, you can simply use the rank.rerank function and pass it the queries and documents to rerank:

from pylate import rank, models

queries = [
    "low body temperature",
    "it is snowing",
]

documents = [
    ["document A", "document B"],
    ["document 1", "document C", "document B"],
]

documents_ids = [
    [1, 2],
    [1, 3, 2],
]

model = models.ColBERT(
    model_name_or_path="pylate_model_id",
)

queries_embeddings = model.encode(
    queries,
    is_query=True,
)

documents_embeddings = model.encode(
    documents,
    is_query=False,
)

reranked_documents = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)
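
The call returns each query's candidates reordered by their late-interaction (MaxSim) score, keyed by the ids you passed in, so it can slot directly behind a BM25 or dense first-stage retriever without building an index.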

Finetuning

This model is not a typical ColBERT model: it should also serve well as a finetuning base for many different tasks, including medical NER and entity linking (NEL).

To do so, load the model with the transformers library's AutoModel.from_pretrained method and finetune it on your task of choice.
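
As a minimal sketch (the task head and training loop for NER/NEL are left to you):

import torch
from transformers import AutoModel, AutoTokenizer

model_id = "Parallia/ClinicalEncoder25-Diagnosable-Colbert-L2-for-medical-texts"
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)

# Sanity check: one forward pass over a clinical snippet yields
# 1024-dim hidden states to attach a task head to.
inputs = tokenizer("the patient is cold", return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state
print(hidden.shape)  # (1, num_tokens, 1024)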

It should perform particularly well on tasks that require a deep understanding of medical concepts.
