O3SLM

A lightweight sketch-language model built on top of the LLaVA codebase.

Environment Setup

Create and activate the conda environment:

conda create -n o3slm python=3.10 -y
conda activate o3slm
pip install --upgrade pip  # Enable PEP 660 support
pip install -e .

Install training dependencies:

pip install -e ".[train]"
pip install flash-attn --no-build-isolation
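
Optionally, sanity-check the install before moving on; this assumes the editable install pulled in a CUDA-enabled PyTorch build:

# optional sanity check (assumes a CUDA build of PyTorch was installed)
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import flash_attn; print(flash_attn.__version__)"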

Checkpoints

Download the pretrained O3SLM model checkpoints from: <link>

Place the downloaded checkpoints in a directory accessible to your training/evaluation scripts.
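
For example (the directory name below is hypothetical; any readable path works):

# hypothetical layout; pick any path your scripts can read
mkdir -p checkpoints/o3slm-7b
# after downloading, the directory should contain the usual Hugging Face files,
# e.g. config.json, model weights (*.safetensors or *.bin), and tokenizer files
ls checkpoints/o3slm-7b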

Data Preparation

Train

LLaVA Checkpoints

  • LLaVA: https://huggingface.co/liuhaotian/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5
  • MM_Projector (13B): https://huggingface.co/liuhaotian/llava-v1.5-mlp2x-336px-pretrain-vicuna-13b-v1.5
  • MM_Projector (7B): https://huggingface.co/liuhaotian/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5
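
These can be fetched with the Hugging Face CLI; the local directory below is arbitrary:

# example download of the 7B projector checkpoint (target path is arbitrary)
huggingface-cli download liuhaotian/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5 \
  --local-dir checkpoints/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5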

Training Data

Place your conversation JSON files for training in the data_jsons/ directory.
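
The exact record schema is not documented here; as a minimal sketch, assuming the standard LLaVA conversation format (the file name and contents below are purely illustrative):

# write an illustrative record, assuming the LLaVA conversation schema
cat > data_jsons/example.json <<'EOF'
[
  {
    "id": "000001",
    "image": "pretrain_data/images/OI/000001.jpg",
    "conversations": [
      {"from": "human", "value": "<image>\nWhat object does this sketch depict?"},
      {"from": "gpt", "value": "A cat."}
    ]
  }
]
EOF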

Dataset Structure

Organize your datasets under a single data_root directory with the following structure:

data_root/
├── pretrain_data/
│   ├── images/
│   │   ├── O365/
│   │   └── OI/
│   └── sketches/
│       ├── SketchVCL-OI/
│       │   ├── 1/
│       │   ├── ...
│       │   └── 601/
│       └── SketchVCL-O365/
│           ├── 0/
│           ├── ...
│           └── 364/
├── finetune_data/
│   ├── images/
│   │   ├── coco/
│   │   ├── pixmo_count/
│   │   └── sketchy/
│   └── sketches/
│       └── SketchMIX/
│           ├── 0/
│           ├── ...
│           └── 364/
└── eval_data/
    ├── images/
    │   ├── coco/
    │   ├── pixmo_count/
    │   └── sketchy/
    └── sketches/
        ├── SketchVCL-C/
        ├── QuickDraw/
        ├── Sketchy/
        └── TU_Berlin/

Ensure your training and evaluation scripts point to the correct data_root path and that the machine has read access to these directories.
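
A quick way to confirm the layout (replace the data_root value with your actual path):

# sanity-check the expected top-level directories under data_root
data_root=/path/to/data_root
for d in pretrain_data/images pretrain_data/sketches \
         finetune_data/images finetune_data/sketches \
         eval_data/images eval_data/sketches; do
  [ -d "$data_root/$d" ] && echo "OK       $d" || echo "MISSING  $d"
done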

Training

Prerequisites

  1. Download pretrained model checkpoints (see Checkpoints section)
  2. Prepare your data (see Data Preparation section)
  3. Ensure conversation JSONs are in data_jsons/

Running Training

conda activate o3slm
# Add your training command here
# Example: python train.py --config configs/train.yaml
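
The actual command is left as a placeholder above. Purely as a sketch, assuming O3SLM keeps LLaVA-1.5's training entry point and flags (all paths and values below are placeholders, not the project's verified command):

# NOTE: illustrative only; O3SLM's actual training script and flags may differ
deepspeed llava/train/train_mem.py \
  --deepspeed ./scripts/zero3.json \
  --model_name_or_path lmsys/vicuna-7b-v1.5 \
  --version v1 \
  --data_path data_jsons/your_train.json \
  --image_folder /path/to/data_root \
  --vision_tower openai/clip-vit-large-patch14-336 \
  --pretrain_mm_mlp_adapter /path/to/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5/mm_projector.bin \
  --mm_projector_type mlp2x_gelu \
  --bf16 True \
  --output_dir ./checkpoints/o3slm-7b-finetune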

Evaluation

Evaluation is performed using Evaluation/run_eval.sh. The script supports both local execution and Slurm cluster submission.

Configuration

Before running evaluation, fill in the following placeholders in run_eval.sh; a filled-in sketch follows the two lists below.

Slurm Parameters:

  • <JOB_NAME>: Job name for Slurm
  • <SLURM_OUTPUT_PATH>: Output log path (e.g., run_output/qd_Det.out)
  • <SLURM_PARTITION>: Cluster partition (e.g., ada)
  • <NTASKS>: Number of tasks (e.g., 1)
  • <CPUS_PER_TASK>: CPUs per task (e.g., 8)
  • <MEMORY>: Memory allocation (e.g., 32G)
  • <GRES>: GPU resources (e.g., gpu:1)
  • <TIME_LIMIT>: Time limit (e.g., 24:00:00)

Evaluation Parameters:

  • <ENV_NAME>: Conda environment name (e.g., o3slm)
  • <WORKDIR>: Project working directory
  • <RUN_NAME>: Experiment run name (e.g., Molmo_qd_detect)
  • <SKETCH_PATH>: Path to sketch dataset
  • <DATASET_PATH>: Path to image dataset
  • <MODEL_NAME>: Model to evaluate
  • <DATASET_NAME>: Dataset identifier
  • <TASK1>, <TASK2>: Tasks to run (typically count and detection)
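
A minimal sketch of a filled-in header, using the example values above (illustrative only; the real run_eval.sh may wire these differently):

#!/bin/bash
#SBATCH --job-name=qd_det                 # <JOB_NAME>
#SBATCH --output=run_output/qd_Det.out    # <SLURM_OUTPUT_PATH>
#SBATCH --partition=ada                   # <SLURM_PARTITION>
#SBATCH --ntasks=1                        # <NTASKS>
#SBATCH --cpus-per-task=8                 # <CPUS_PER_TASK>
#SBATCH --mem=32G                         # <MEMORY>
#SBATCH --gres=gpu:1                      # <GRES>
#SBATCH --time=24:00:00                   # <TIME_LIMIT>

conda activate o3slm    # <ENV_NAME>; batch shells may need conda init sourced first
cd /path/to/O3SLM_code  # <WORKDIR>
# remaining placeholders (<RUN_NAME>, <SKETCH_PATH>, <DATASET_PATH>, ...) are set the same way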

Supported Options

Models:

  • Molmo
  • LLaVA
  • Onevision
  • Pixtral
  • Qwen
  • O3SLM
  • GPT
  • Gemini

Datasets:

  • qd (QuickDraw)
  • sketchy (Sketchy)
  • tub (TU-Berlin)
  • coco (COCO)

Tasks:

  • detection
  • count

Sketch Paths:

  • eval_data/sketches/Sketchy/tx_000100000000/
  • eval_data/sketches/QuickDraw/
  • eval_data/sketches/TU_Berlin/
  • eval_data/sketches/coco_sketches/

Local Execution

Run evaluation locally without Slurm:

conda activate o3slm
cd Evaluation

python count.py \
  --name Molmo_qd_count \
  --sketch_path /path/to/eval_data/sketches/QuickDraw/ \
  --dataset /path/to/eval_data/images/pixmo_count \
  --model Molmo \
  --dataset_name qd \
  --task count

python detections.py \
  --name Molmo_qd_detect \
  --sketch_path /path/to/eval_data/sketches/QuickDraw/ \
  --dataset /path/to/eval_data/images/pixmo_count \
  --model Molmo \
  --dataset_name qd \
  --task detection

Slurm Submission

To submit the evaluation job to a Slurm cluster:

sbatch Evaluation/run_eval.sh
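
After submission, progress can be followed with standard Slurm tools (the log path is whatever <SLURM_OUTPUT_PATH> was set to):

squeue -u $USER                 # check job status
tail -f run_output/qd_Det.out   # follow the log (example output path from above)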

Citation

@inproceedings{O3SLM2025,
  title={O3SLM: Sketch-Language Modeling},
  author={...},
  booktitle={...},
  year={2025}
}

Acknowledgements

  • Built on the LLaVA codebase
  • Additional acknowledgements to be added from the final paper
