diff --git a/README.md b/README.md
index 66fee20..2fa0c9a 100644
--- a/README.md
+++ b/README.md
@@ -9,17 +9,16 @@
-
+
@@ -32,6 +31,9 @@
+
+
+
@@ -39,7 +41,7 @@
-Model2Vec is a technique to turn any sentence transformer into a really small static model, reducing model size by a factor up to 50 and making the models up to 500 times faster, with a small drop in performance. Our [best model](https://huggingface.co/minishlab/potion-base-8M) is the most performant static embedding model in the world. See our results [here](results/README.md), or dive in to see how it works.
+Model2Vec is a technique to turn any sentence transformer into a small, fast static embedding model. Model2Vec reduces model size by a factor of up to 50 and makes models up to 500 times faster, with a small drop in performance. Our [best model](https://huggingface.co/minishlab/potion-base-8M) is the most performant static embedding model in the world. See our [results](results/README.md), read our [docs](https://minish.ai/packages/model2vec/introduction), or dive in to see how it works.
@@ -69,15 +71,14 @@ embeddings = model.encode(["It's dangerous to go alone!", "It's a secret to ever
# Make sequences of token embeddings
token_embeddings = model.encode_as_sequence(["It's dangerous to go alone!", "It's a secret to everybody."])
```
-
-Instead of using one of our models, you can also distill your own Model2Vec model from a Sentence Transformer model. First, install the `distillation` extras with:
+For advanced usage, see our [inference docs](https://minish.ai/packages/model2vec/inference). Instead of using one of our models, you can also distill your own Model2Vec model from a Sentence Transformer model. First, install the `distill` extras with:
```bash
pip install model2vec[distill]
```
- Then, you can distill a model in ~30 seconds on a CPU with the following code snippet:
+Then, you can distill a model in ~30 seconds on a CPU with the following code snippet:
```python
from model2vec.distill import distill
@@ -89,7 +90,7 @@ m2v_model = distill(model_name="BAAI/bge-base-en-v1.5", pca_dims=256)
m2v_model.save_pretrained("m2v_model")
```
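+You can also pass your own vocabulary to `distill` to get static word embeddings with a tokenizer for exactly that vocabulary. A minimal sketch (the word list is assumed to be sorted by frequency, most frequent first):
+```python
+from model2vec.distill import distill
+
+# Load a vocabulary as a list of strings, sorted from most to least frequent
+vocabulary = ["word1", "word2", "word3"]
+
+# Distill a Sentence Transformer model with the custom vocabulary
+m2v_model = distill(model_name="BAAI/bge-base-en-v1.5", vocabulary=vocabulary)
+m2v_model.save_pretrained("m2v_model")
+```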
-After distillation, you can also fine-tune your own classification models on top of the distilled model, or on a pre-trained model. First, make sure you install the `training` extras with:
+For advanced usage, see our [distillation docs](https://minish.ai/packages/model2vec/distillation), which include some [distillation best practices](https://minish.ai/packages/model2vec/distillation#distillation-best-practices). After distillation, you can also fine-tune your own classification models on top of the distilled model, or on a pre-trained model. First, make sure you install the `train` extras with:
```bash
pip install model2vec[train]
@@ -115,13 +116,13 @@ classifier.fit(ds["train"]["text"], ds["train"]["label"])
classification_report = classifier.evaluate(ds["test"]["text"], ds["test"]["label"])
```
-For advanced usage, please refer to our [usage documentation](https://github.com/MinishLab/model2vec/blob/main/docs/usage.md).
+For advanced usage, see our [training docs](https://minish.ai/packages/model2vec/training).
## Updates & Announcements
- **23/05/2025**: We released [potion-multilingual-128M](https://huggingface.co/minishlab/potion-multilingual-128M), a multilingual model trained on 101 languages. It is the best performing static embedding model for multilingual tasks, and is capable of generating embeddings for any text in any language. The results can be found in our [results](results/README.md#mmteb-results-multilingual) section.
-- **01/05/2025**: We released backend support for `BPE` and `Unigram` tokenizers, along with quantization and dimensionality reduction. New Model2Vec models are now 50% of the original models, and can be quantized to int8 to be 25% of the size, without loss of performance.
+- **01/05/2025**: We released backend support for `BPE` and `Unigram` tokenizers, along with quantization and dimensionality reduction. New Model2Vec models are now 50% of the size of the original models, and can be quantized to int8 to be 25% of the size, without loss of performance.
- **12/02/2025**: We released **Model2Vec training**, allowing you to fine-tune your own classification models on top of Model2Vec models. Find out more in our [training documentation](https://github.com/MinishLab/model2vec/blob/main/model2vec/train/README.md) and [results](results/README.md#training-results).
@@ -137,24 +138,18 @@ For advanced usage, please refer to our [usage documentation](https://github.com
- **Lightning-fast Inference**: up to 500 times faster on CPU than the original model.
- **Fast, Dataset-free Distillation**: distill your own model in 30 seconds on a CPU, without a dataset.
- **Fine-tuning**: fine-tune your own classification models on top of Model2Vec models.
-- **Integrated in many popular libraries**: Model2Vec is integrated direclty into popular libraries such as [Sentence Transformers](https://github.com/UKPLab/sentence-transformers) and [LangChain](https://github.com/langchain-ai/langchain). For more information, see our [integrations documentation](https://github.com/MinishLab/model2vec/blob/main/docs/integrations.md).
+- **Integrated in many popular libraries**: Model2Vec is integrated directly into popular libraries such as [Sentence Transformers](https://github.com/UKPLab/sentence-transformers) and [LangChain](https://github.com/langchain-ai/langchain). For more information, see our [integrations documentation](https://minish.ai/packages/model2vec/integrations).
- **Tightly integrated with HuggingFace hub**: easily share and load models from the HuggingFace hub, using the familiar `from_pretrained` and `push_to_hub`. Our own models can be found [here](https://huggingface.co/minishlab).
## What is Model2Vec?
Model2vec creates a small, fast, and powerful model that outperforms other static embedding models by a large margin on all tasks we could find, while being much faster to create than traditional static embedding models such as GloVe. Like BPEmb, it can create subword embeddings, but with much better performance. Distillation doesn't need _any_ data, just a vocabulary and a model.
-The core idea is to forward pass a vocabulary through a sentence transformer model, creating static embeddings for the indiviudal tokens. After this, there are a number of post-processing steps we do that results in our best models. For a more extensive deepdive, please refer to the following resources:
-- Our initial [Model2Vec blog post](https://huggingface.co/blog/Pringled/model2vec). Note that, while this post gives a good overview of the core idea, we've made a number of substantial improvements since then.
-- Our [Tokenlearn blog post](https://minishlab.github.io/tokenlearn_blogpost/). This post describes the Tokenlearn method we used to train our [potion models](https://huggingface.co/collections/minishlab/potion-6721e0abd4ea41881417f062).
-- Our official [documentation](https://github.com/MinishLab/model2vec/blob/main/docs/what_is_model2vec.md). This document provides a high-level overview of how Model2Vec works.
+The core idea is to forward pass a vocabulary through a sentence transformer model, creating static embeddings for the individual tokens. After this, we apply a number of post-processing steps that result in our best models, as well as an optional pre-training step to further boost performance. For a more extensive deep dive, please refer to our [official documentation on how Model2Vec works](https://minish.ai/packages/model2vec/introduction#how-mode2vec-works).
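+
+As a rough conceptual sketch (not the library's actual implementation, using a toy vocabulary and placeholder token probabilities), the base technique looks like this:
+
+```python
+import numpy as np
+from sklearn.decomposition import PCA
+from sentence_transformers import SentenceTransformer
+
+st = SentenceTransformer("BAAI/bge-base-en-v1.5")
+vocabulary = ["hello", "world", "goodbye"]  # in practice: the tokenizer's full vocabulary
+
+# 1. Forward pass every token through the sentence transformer
+embeddings = st.encode(vocabulary)
+
+# 2. Reduce dimensionality with PCA (real models use e.g. 256 dims; 2 here for the toy vocab)
+embeddings = PCA(n_components=2).fit_transform(embeddings)
+
+# 3. Apply SIF weighting: w = 1e-3 / (1e-3 + proba), where proba is the token's corpus probability
+proba = np.full(len(vocabulary), 1.0 / len(vocabulary))  # placeholder probabilities
+embeddings *= (1e-3 / (1e-3 + proba))[:, None]
+
+# At inference time, a sentence embedding is simply the mean of its token embeddings
+```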
## Documentation
-Our official documentation can be found [here](https://github.com/MinishLab/model2vec/blob/main/docs/README.md). This includes:
-- [Usage documentation](https://github.com/MinishLab/model2vec/blob/main/docs/usage.md): provides a technical overview of how to use Model2Vec.
-- [Integrations documentation](https://github.com/MinishLab/model2vec/blob/main/docs/integrations.md): provides examples of how to use Model2Vec in various downstream libraries.
-- [Model2Vec technical documentation](https://github.com/MinishLab/model2vec/blob/main/docs/what_is_model2vec.md): provides a high-level overview of how Model2Vec works.
+Our official documentation can be found [here](https://minish.ai/packages/model2vec/introduction). This includes in-depth documentation on [inference](https://minish.ai/packages/model2vec/inference), [distillation](https://minish.ai/packages/model2vec/distillation), [training](https://minish.ai/packages/model2vec/training), and [integrations](https://minish.ai/packages/model2vec/integrations).
## Model List
diff --git a/assets/images/logo.png b/assets/images/logo.png
deleted file mode 100644
index 6ffa6e1..0000000
Binary files a/assets/images/logo.png and /dev/null differ
diff --git a/assets/images/logo_v2.png b/assets/images/logo_v2.png
deleted file mode 100644
index 3a11ec9..0000000
Binary files a/assets/images/logo_v2.png and /dev/null differ
diff --git a/assets/images/model2vec_model_diagram.png b/assets/images/model2vec_model_diagram.png
deleted file mode 100644
index 4df89cb..0000000
Binary files a/assets/images/model2vec_model_diagram.png and /dev/null differ
diff --git a/assets/images/model2vec_model_diagram_transparant_dark.png b/assets/images/model2vec_model_diagram_transparant_dark.png
deleted file mode 100644
index 6b94884..0000000
Binary files a/assets/images/model2vec_model_diagram_transparant_dark.png and /dev/null differ
diff --git a/assets/images/model2vec_model_diagram_transparant_light.png b/assets/images/model2vec_model_diagram_transparant_light.png
deleted file mode 100644
index 6adb7fa..0000000
Binary files a/assets/images/model2vec_model_diagram_transparant_light.png and /dev/null differ
diff --git a/assets/images/sentences_per_second_vs_average_score.png b/assets/images/sentences_per_second_vs_average_score.png
deleted file mode 100644
index 210f462..0000000
Binary files a/assets/images/sentences_per_second_vs_average_score.png and /dev/null differ
diff --git a/assets/images/speed_vs_accuracy.png b/assets/images/speed_vs_accuracy.png
deleted file mode 100644
index 210f462..0000000
Binary files a/assets/images/speed_vs_accuracy.png and /dev/null differ
diff --git a/assets/images/speed_vs_accuracy_v2.png b/assets/images/speed_vs_accuracy_v2.png
deleted file mode 100644
index 448c0a7..0000000
Binary files a/assets/images/speed_vs_accuracy_v2.png and /dev/null differ
diff --git a/assets/images/speed_vs_accuracy_v3.png b/assets/images/speed_vs_accuracy_v3.png
deleted file mode 100644
index 192bb2f..0000000
Binary files a/assets/images/speed_vs_accuracy_v3.png and /dev/null differ
diff --git a/assets/images/speed_vs_accuracy_v4.png b/assets/images/speed_vs_accuracy_v4.png
deleted file mode 100644
index e63461d..0000000
Binary files a/assets/images/speed_vs_accuracy_v4.png and /dev/null differ
diff --git a/assets/images/speed_vs_mteb_score.png b/assets/images/speed_vs_mteb_score.png
deleted file mode 100644
index f9d9c01..0000000
Binary files a/assets/images/speed_vs_mteb_score.png and /dev/null differ
diff --git a/assets/images/speed_vs_mteb_score_v2.png b/assets/images/speed_vs_mteb_score_v2.png
deleted file mode 100644
index 4c4d11b..0000000
Binary files a/assets/images/speed_vs_mteb_score_v2.png and /dev/null differ
diff --git a/docs/README.md b/docs/README.md
index 7392176..fdd61a5 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,6 +1,8 @@
# Documentation
-This directory contains the documentation for Model2Vec. The documentation is formatted in Markdown. The documentation is organized as follows:
-- [usage.md](https://github.com/MinishLab/model2vec/blob/main/docs/usage.md): This document provides a technical overview of how to use Model2Vec.
-- [integrations.md](https://github.com/MinishLab/model2vec/blob/main/docs/integrations.md): This document provides examples of how to use Model2Vec in various downstream libraries.
-- [what_is_model2vec.md](https://github.com/MinishLab/model2vec/blob/main/docs/what_is_model2vec.md): This document provides a high-level overview of how Model2Vec works.
+Model2Vec's extensive documentation can be found on [our documentation website](https://minish.ai/packages/model2vec/introduction). This includes in-depth documentation on:
+- [Inference](https://minish.ai/packages/model2vec/inference)
+- [Distillation](https://minish.ai/packages/model2vec/distillation)
+- [Training](https://minish.ai/packages/model2vec/training)
+- [Integrations](https://minish.ai/packages/model2vec/integrations)
+- [How Model2Vec works](https://minish.ai/packages/model2vec/introduction#how-mode2vec-works)
diff --git a/docs/integrations.md b/docs/integrations.md
deleted file mode 100644
index efaec87..0000000
--- a/docs/integrations.md
+++ /dev/null
@@ -1,155 +0,0 @@
-
-# Integrations
-
-Model2Vec can be used in a variety of downstream libraries. This document provides examples of how to use Model2Vec in some of these libraries.
-
-## Table of Contents
-- [Sentence Transformers](#sentence-transformers)
-- [LangChain](#langchain)
-- [Txtai](#txtai)
-- [Chonkie](#chonkie)
-- [Transformers.js](#transformersjs)
-
-## Sentence Transformers
-
-Model2Vec can be used directly in [Sentence Transformers](https://github.com/UKPLab/sentence-transformers):
-
-The following code snippet shows how to load a Model2Vec model into a Sentence Transformer model:
-```python
-from sentence_transformers import SentenceTransformer
-
-# Load a Model2Vec model from the Hub
-model = SentenceTransformer("minishlab/potion-base-8M")
-# Make embeddings
-embeddings = model.encode(["It's dangerous to go alone!", "It's a secret to everybody."])
-```
-
-The following code snippet shows how to distill a model directly into a Sentence Transformer model:
-
-```python
-from sentence_transformers import SentenceTransformer
-from sentence_transformers.models import StaticEmbedding
-
-static_embedding = StaticEmbedding.from_distillation("BAAI/bge-base-en-v1.5", device="cpu", pca_dims=256)
-model = SentenceTransformer(modules=[static_embedding])
-embeddings = model.encode(["It's dangerous to go alone!", "It's a secret to everybody."])
-```
-
-For more documentation, please refer to the [Sentence Transformers documentation](https://sbert.net/docs/package_reference/sentence_transformer/models.html#sentence_transformers.models.StaticEmbedding).
-
-
-## LangChain
-
-Model2Vec can be used in [LangChain](https://github.com/langchain-ai/langchain) using the `langchain-community` package. For more information, see the [LangChain Model2Vec docs](https://python.langchain.com/docs/integrations/text_embedding/model2vec/). The following code snippet shows how to use Model2Vec in LangChain after installing the `langchain-community` package with `pip install langchain-community`:
-
-```python
-from langchain_community.embeddings import Model2vecEmbeddings
-from langchain_community.vectorstores import FAISS
-from langchain.schema import Document
-
-# Initialize a Model2Vec embedder
-embedder = Model2vecEmbeddings("minishlab/potion-base-8M")
-
-# Create some example texts
-texts = [
- "Enduring Stew",
- "Hearty Elixir",
- "Mighty Mushroom Risotto",
- "Spicy Meat Skewer",
- "Fruit Salad",
-]
-
-# Embed the texts
-embeddings = embedder.embed_documents(texts)
-
-# Or, create a vector store and query it
-documents = [Document(page_content=text) for text in texts]
-vector_store = FAISS.from_documents(documents, embedder)
-query = "Risotto"
-query_vector = embedder.embed_query(query)
-retrieved_docs = vector_store.similarity_search_by_vector(query_vector, k=1)
-```
-
-## Txtai
-
-Model2Vec can be used in [txtai](https://github.com/neuml/txtai) for text embeddings, nearest-neighbors search, and any of the other functionalities that txtai offers. The following code snippet shows how to use Model2Vec in txtai after installing the `txtai` package (including the `vectors` dependency) with `pip install txtai[vectors]`:
-
-```python
-from txtai import Embeddings
-
-# Load a model2vec model
-embeddings = Embeddings(path="minishlab/potion-base-8M", method="model2vec", backend="numpy")
-
-# Create some example texts
-texts = ["Enduring Stew", "Hearty Elixir", "Mighty Mushroom Risotto", "Spicy Meat Skewer", "Chilly Fruit Salad"]
-
-# Create embeddings for downstream tasks
-vectors = embeddings.batchtransform(texts)
-
-# Or create a nearest-neighbors index and search it
-embeddings.index(texts)
-result = embeddings.search("Risotto", 1)
-```
-
-## Chonkie
-
-Model2Vec is the default model for semantic chunking in [Chonkie](https://github.com/bhavnicksm/chonkie). To use Model2Vec for semantic chunking in Chonkie, simply install Chonkie with `pip install chonkie[semantic]` and use one of the `potion` models in the `SemanticChunker` class. The following code snippet shows how to use Model2Vec in Chonkie:
-
-```python
-from chonkie import SDPMChunker
-
-# Create some example text to chunk
-text = "It's dangerous to go alone! Take this."
-
-# Initialize the SemanticChunker with a potion model
-chunker = SDPMChunker(
- embedding_model="minishlab/potion-base-8M",
- similarity_threshold=0.3
-)
-
-# Chunk the text
-chunks = chunker.chunk(text)
-```
-
-## Transformers.js
-
-To use a Model2Vec model in [transformers.js](https://github.com/huggingface/transformers.js), the following code snippet can be used as a starting point:
-
-```javascript
-import { AutoModel, AutoTokenizer, Tensor } from '@huggingface/transformers';
-
-const modelName = 'minishlab/potion-base-8M';
-
-const modelConfig = {
- config: { model_type: 'model2vec' },
- dtype: 'fp32',
- revision: 'refs/pr/1'
-};
-const tokenizerConfig = {
- revision: 'refs/pr/2'
-};
-
-const model = await AutoModel.from_pretrained(modelName, modelConfig);
-const tokenizer = await AutoTokenizer.from_pretrained(modelName, tokenizerConfig);
-
-const texts = ['hello', 'hello world'];
-const { input_ids } = await tokenizer(texts, { add_special_tokens: false, return_tensor: false });
-
-const cumsum = arr => arr.reduce((acc, num, i) => [...acc, num + (acc[i - 1] || 0)], []);
-const offsets = [0, ...cumsum(input_ids.slice(0, -1).map(x => x.length))];
-
-const flattened_input_ids = input_ids.flat();
-const modelInputs = {
- input_ids: new Tensor('int64', flattened_input_ids, [flattened_input_ids.length]),
- offsets: new Tensor('int64', offsets, [offsets.length])
-};
-
-const { embeddings } = await model(modelInputs);
-console.log(embeddings.tolist()); // output matches python version
-```
-
-Note that this requires that the Model2Vec has a `model.onnx` file and several required tokenizers file. To generate these for a model that does not have them yet, the following code snippet can be used:
-
-```bash
-python scripts/export_to_onnx.py --model_path --save_path ""
-```
diff --git a/docs/usage.md b/docs/usage.md
deleted file mode 100644
index 931987b..0000000
--- a/docs/usage.md
+++ /dev/null
@@ -1,250 +0,0 @@
-
-# Usage
-
-This document provides an overview of how to use Model2Vec for inference, distillation, training, and evaluation.
-
-## Table of Contents
-- [Inference](#inference)
- - [Inference with a pretrained model](#inference-with-a-pretrained-model)
- - [Inference with the Sentence Transformers library](#inference-with-the-sentence-transformers-library)
-- [Distillation](#distillation)
- - [Distilling from a Sentence Transformer](#distilling-from-a-sentence-transformer)
- - [Distilling from a loaded model](#distilling-from-a-loaded-model)
- - [Distilling with the Sentence Transformers library](#distilling-with-the-sentence-transformers-library)
- - [Distilling with a custom vocabulary](#distilling-with-a-custom-vocabulary)
-- [Training](#training)
- - [Training a classifier](#training-a-classifier)
-- [Evaluation](#evaluation)
- - [Installation](#installation)
- - [Evaluation Code](#evaluation-code)
-
-## Inference
-
-### Inference with a pretrained model
-
-Inference works as follows. The example shows one of our own models, but you can also just load a local one, or another one from the hub.
-```python
-from model2vec import StaticModel
-
-# Load a model from the Hub. You can optionally pass a token when loading a private model
-model = StaticModel.from_pretrained(model_name="minishlab/potion-base-8M", token=None)
-
-# Make embeddings
-embeddings = model.encode(["It's dangerous to go alone!", "It's a secret to everybody."])
-
-# Make sequences of token embeddings
-token_embeddings = model.encode_as_sequence(["It's dangerous to go alone!", "It's a secret to everybody."])
-```
-
-### Inference with the Sentence Transformers library
-
-The following code snippet shows how to use a Model2Vec model in the [Sentence Transformers](https://github.com/UKPLab/sentence-transformers) library. This is useful if you want to use the model in a Sentence Transformers pipeline.
-
-```python
-from sentence_transformers import SentenceTransformer
-
-# Load a Model2Vec model from the Hub
-model = SentenceTransformer("minishlab/potion-base-8M")
-
-# Make embeddings
-embeddings = model.encode(["It's dangerous to go alone!", "It's a secret to everybody."])
-```
-
-## Distillation
-
-### Distilling from a Sentence Transformer
-
-The following code can be used to distill a model from a Sentence Transformer. As mentioned above, this leads to really small model that might be less performant.
-```python
-from model2vec.distill import distill
-
-# Distill a Sentence Transformer model
-m2v_model = distill(model_name="BAAI/bge-base-en-v1.5", pca_dims=256)
-
-# Save the model
-m2v_model.save_pretrained("m2v_model")
-
-```
-
-### Distilling from a loaded model
-
-If you already have a model loaded, or need to load a model in some special way, we also offer an interface to distill models in memory.
-
-```python
-from transformers import AutoModel, AutoTokenizer
-
-from model2vec.distill import distill_from_model
-
-# Assuming a loaded model and tokenizer
-model_name = "baai/bge-base-en-v1.5"
-model = AutoModel.from_pretrained(model_name)
-tokenizer = AutoTokenizer.from_pretrained(model_name)
-
-m2v_model = distill_from_model(model=model, tokenizer=tokenizer, pca_dims=256)
-
-m2v_model.save_pretrained("m2v_model")
-
-```
-
-### Distilling with the Sentence Transformers library
-
-The following code snippet shows how to distill a model using the [Sentence Transformers](https://github.com/UKPLab/sentence-transformers) library. This is useful if you want to use the model in a Sentence Transformers pipeline.
-
-```python
-from sentence_transformers import SentenceTransformer
-from sentence_transformers.models import StaticEmbedding
-
-static_embedding = StaticEmbedding.from_distillation("BAAI/bge-base-en-v1.5", device="cpu", pca_dims=256)
-model = SentenceTransformer(modules=[static_embedding])
-embeddings = model.encode(["It's dangerous to go alone!", "It's a secret to everybody."])
-```
-
-### Distilling with a custom vocabulary
-
-If you pass a vocabulary, you get a set of static word embeddings, together with a custom tokenizer for exactly that vocabulary. This is comparable to how you would use GLoVe or traditional word2vec, but doesn't actually require a corpus or data.
-```python
-from model2vec.distill import distill
-
-# Load a vocabulary as a list of strings
-vocabulary = ["word1", "word2", "word3"]
-
-# Distill a Sentence Transformer model with the custom vocabulary
-m2v_model = distill(model_name="BAAI/bge-base-en-v1.5", vocabulary=vocabulary)
-
-# Save the model
-m2v_model.save_pretrained("m2v_model")
-
-# Or push it to the hub
-m2v_model.push_to_hub("my_organization/my_model", token="")
-```
-
-By default, this will distill a model with a subword tokenizer, combining the models (subword) vocab with the new vocabulary. If you want to get a word-level tokenizer instead (with only the passed vocabulary), the `use_subword` parameter can be set to `False`, e.g.:
-
-```python
-m2v_model = distill(model_name=model_name, vocabulary=vocabulary, use_subword=False)
-```
-
-**Important note:** we assume the passed vocabulary is sorted in rank frequency. i.e., we don't care about the actual word frequencies, but do assume that the most frequent word is first, and the least frequent word is last. If you're not sure whether this is case, set `apply_zipf` to `False`. This disables the weighting, but will also make performance a little bit worse.
-
-### Quantization
-
-Models can be quantized to `float16` (default) or `int8` during distillation, or when loading from disk.
-
-```python
-from model2vec.distill import distill
-
-# Distill a Sentence Transformer model and quantize is to int8
-m2v_model = distill(model_name="BAAI/bge-base-en-v1.5", quantize_to="int8")
-
-# Save the model. This model is now 25% of the size of a normal model.
-m2v_model.save_pretrained("m2v_model")
-```
-
-You can also quantize during loading.
-
-```python
-from model2vec import StaticModel
-
-model = StaticModel.from_pretrained("minishlab/potion-base-8m", quantize_to="int8")
-```
-
-### Dimensionality reduction
-
-Because almost all Model2Vec models have been distilled using PCA, and because PCA explicitly orders dimensions from most informative to least informative, we can perform dimensionality reduction during loading. This is very similar to how matryoshka embeddings work.
-
-```python
-from model2vec import StaticModel
-
-model = StaticModel.from_pretrained("minishlab/potion-base-8m", dimensionality=32)
-
-print(model.embedding.shape)
-# (29528, 32)
-```
-
-### Combining quantization and dimensionality reduction
-
-Combining these tricks can lead to extremely small models. For example, using this, we can reduce the size of `potion-base-8m`, which is now 30MB, to only 1MB:
-
-```python
-model = StaticModel.from_pretrained("minishlab/potion-base-8m",
- dimensionality=32,
- quantize_to="int8")
-print(model.embedding.nbytes)
-# 944896 bytes = 944kb
-```
-
-This should be enough to satisfy even the strongest hardware constraints.
-
-## Training
-
-### Training a classifier
-
-Model2Vec can be used to train a classifier on top of a distilled model. The following code snippet shows how to train a classifier on top of a distilled model. For more advanced usage, as well as results, please refer to the [training documentation](https://github.com/MinishLab/model2vec/blob/main/model2vec/train/README.md).
-
-```python
-import numpy as np
-from datasets import load_dataset
-from model2vec.train import StaticModelForClassification
-
-# Initialize a classifier from a pre-trained model
-classifer = StaticModelForClassification.from_pretrained("minishlab/potion-base-8M")
-
-# Load a dataset
-ds = load_dataset("setfit/subj")
-train = ds["train"]
-test = ds["test"]
-
-X_train, y_train = train["text"], train["label"]
-X_test, y_test = test["text"], test["label"]
-
-# Train the classifier
-classifier.fit(X_train, y_train)
-
-# Evaluate the classifier
-y_hat = classifier.predict(X_test)
-accuracy = np.mean(np.array(y_hat) == np.array(y_test)) * 100
-```
-
-## Evaluation
-
-### Installation
-
-Our models can be evaluated using our [evaluation package](https://github.com/MinishLab/evaluation). Install the evaluation package with:
-
-```bash
-pip install git+https://github.com/MinishLab/evaluation.git@main
-```
-
-### Evaluation Code
-
-The following code snippet shows how to evaluate a Model2Vec model:
-```python
-from model2vec import StaticModel
-
-from evaluation import CustomMTEB, get_tasks, parse_mteb_results, make_leaderboard, summarize_results
-from mteb import ModelMeta
-
-# Get all available tasks
-tasks = get_tasks()
-# Define the CustomMTEB object with the specified tasks
-evaluation = CustomMTEB(tasks=tasks)
-
-# Load the model
-model_name = "m2v_model"
-model = StaticModel.from_pretrained(model_name)
-
-# Optionally, add model metadata in MTEB format
-model.mteb_model_meta = ModelMeta(
- name=model_name, revision="no_revision_available", release_date=None, languages=None
- )
-
-# Run the evaluation
-results = evaluation.run(model, eval_splits=["test"], output_folder=f"results")
-
-# Parse the results and summarize them
-parsed_results = parse_mteb_results(mteb_results=results, model_name=model_name)
-task_scores = summarize_results(parsed_results)
-
-# Print the results in a leaderboard format
-print(make_leaderboard(task_scores))
-```
diff --git a/docs/what_is_model2vec.md b/docs/what_is_model2vec.md
deleted file mode 100644
index 3413fd3..0000000
--- a/docs/what_is_model2vec.md
+++ /dev/null
@@ -1,11 +0,0 @@
-# What is Model2Vec?
-
-This document provides a high-level overview of how Model2Vec works.
-
-The base model2vec technique works by passing a vocabulary through a sentence transformer model, then reducing the dimensionality of the resulting embeddings using PCA, and finally weighting the embeddings using SIF weighting (previously zipf weighting). During inference, we simply take the mean of all token embeddings occurring in a sentence.
-
-Our [potion models](https://huggingface.co/collections/minishlab/potion-6721e0abd4ea41881417f062) are pre-trained using [tokenlearn](https://github.com/MinishLab/tokenlearn), a technique to pre-train model2vec distillation models. These models are created with the following steps:
-- **Distillation**: We distill a Model2Vec model from a Sentence Transformer model, using the method described above.
-- **Sentence Transformer inference**: We use the Sentence Transformer model to create mean embeddings for a large number of texts from a corpus.
-- **Training**: We train a model to minimize the cosine distance between the mean embeddings generated by the Sentence Transformer model and the mean embeddings generated by the Model2Vec model.
-- **Post-training re-regularization**: We re-regularize the trained embeddings by first performing PCA, and then weighting the embeddings using `smooth inverse frequency (SIF)` weighting using the following formula: `w = 1e-3 / (1e-3 + proba)`. Here, `proba` is the probability of the token in the corpus we used for training.
diff --git a/model2vec/distill/distillation.py b/model2vec/distill/distillation.py
index 2f514ae..4320272 100644
--- a/model2vec/distill/distillation.py
+++ b/model2vec/distill/distillation.py
@@ -31,7 +31,7 @@ def distill_from_model(
token_remove_pattern: str | None = r"\[unused\d+\]",
quantize_to: DType | str = DType.Float16,
vocabulary_quantization: int | None = None,
- pooling: PoolingMode = PoolingMode.MEAN,
+ pooling: PoolingMode | str = PoolingMode.MEAN,
) -> StaticModel:
"""
Distill a staticmodel from a sentence transformer.
@@ -209,7 +209,7 @@ def distill(
trust_remote_code: bool = False,
quantize_to: DType | str = DType.Float16,
vocabulary_quantization: int | None = None,
- pooling: PoolingMode = PoolingMode.MEAN,
+ pooling: PoolingMode | str = PoolingMode.MEAN,
) -> StaticModel:
"""
Distill a staticmodel from a sentence transformer.
diff --git a/model2vec/distill/inference.py b/model2vec/distill/inference.py
index e9f9d15..0738035 100644
--- a/model2vec/distill/inference.py
+++ b/model2vec/distill/inference.py
@@ -47,7 +47,7 @@ def create_embeddings(
tokenized: list[list[int]],
device: str,
pad_token_id: int,
- pooling: PoolingMode = PoolingMode.MEAN,
+ pooling: PoolingMode | str = PoolingMode.MEAN,
) -> np.ndarray:
"""
Create output embeddings for a bunch of tokens using a pretrained model.