Real estate listing photo analysis. Upload photos of a property, get back room classifications, detected amenities and features, and photo quality scores — all powered by OpenCLIP zero-shot inference.
- Room classification — Kitchen, Bathroom, Bedroom, Living Room, Dining Room, Exterior (with confidence scores)
- Amenity detection — Stainless steel appliances, fireplace, pool, scenic view, natural light, updated kitchen, and more
- Feature detection — 50+ features across categories like flooring, countertops, fixtures, and outdoor elements
- Photo quality scoring — Sharpness (blur detection), brightness, resolution, and an overall quality score per photo
- Batch processing — Upload 1–20 photos at once; results come back per-photo with an aggregated summary
- Optional fine-tuned adapter — Train a linear adapter on labeled data for domain-specific predictions
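The blur and brightness checks behind the quality score can be approximated in a few lines. This is a hedged NumPy sketch of the general technique (Laplacian-variance sharpness plus mean brightness), not the service's actual code; the threshold values are illustrative assumptions:

```python
import numpy as np

# Illustrative thresholds -- assumptions, not the service's actual cutoffs.
BLUR_THRESHOLD = 100.0   # Laplacian variance below this => likely blurry
DARK_THRESHOLD = 60.0    # mean brightness below this (0-255 scale) => likely dark

def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the Laplacian response -- a standard sharpness proxy."""
    k = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float64)
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2))
    # Valid-mode 3x3 correlation with the Laplacian kernel
    for i in range(3):
        for j in range(3):
            out += k[i, j] * gray[i:i + h - 2, j:j + w - 2]
    return float(out.var())

def quality_report(gray: np.ndarray) -> dict:
    """Per-photo quality fields in the same spirit as the API response."""
    gray = gray.astype(np.float64)
    blur_var = laplacian_variance(gray)
    brightness = float(gray.mean())
    return {
        "blurVar": blur_var,
        "brightness": brightness,
        "isBlurry": blur_var < BLUR_THRESHOLD,
        "isDark": brightness < DARK_THRESHOLD,
    }
```

A perfectly flat image has zero Laplacian variance (maximally "blurry"), while textured or noisy images score higher, which is why the variance works as a sharpness proxy.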
| Layer | Stack |
|---|---|
| Frontend | Next.js 14, React 18, TypeScript |
| Backend | NestJS 10, TypeORM, Postgres 16 |
| Inference | FastAPI, PyTorch, OpenCLIP (ViT-B-32), OpenCV |
| Infra | Docker Compose, multi-stage Dockerfiles |
```
HomeVision/
├── apps/
│   ├── api/                      # NestJS backend
│   │   └── src/
│   │       ├── jobs/             # Job creation, polling, inference orchestration
│   │       └── labels/           # CRUD for training labels
│   ├── inference/                # FastAPI + OpenCLIP inference
│   │   ├── main.py               # /analyze/batch endpoint
│   │   ├── train_adapter.py
│   │   └── evaluate.py
│   └── web/                      # Next.js frontend
│       └── src/
│           ├── app/              # Pages (home, labeling)
│           └── components/
├── docker-compose.yml            # Base services
├── docker-compose.dev.yml        # Dev overrides (hot-reload, volume mounts)
└── docker-compose.adapter.yml    # Optional: mount trained adapter files
```
- Web — Upload UI, async job polling, photo grid with results, click-to-expand detail modal
- API — Accepts uploads, creates async jobs in Postgres, forwards images to the inference service, stores results as JSON
- Inference — Loads OpenCLIP ViT-B-32 at startup, caches text embeddings for all prompts, runs zero-shot classification and quality analysis per image
Postgres stores job status and result JSON. Images are processed in-memory only — nothing is written to disk or cloud storage.
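The zero-shot step boils down to cosine similarity between one image embedding and the cached prompt embeddings, followed by a softmax. Here is a minimal NumPy sketch of that recipe using stand-in vectors rather than real CLIP embeddings; the 100.0 logit scale is CLIP's conventional temperature, assumed here for illustration:

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    """L2-normalize along the last axis so dot products become cosines."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def zero_shot_scores(image_emb: np.ndarray, text_embs: np.ndarray) -> np.ndarray:
    """Softmax over cosine similarities -- the standard CLIP zero-shot recipe.

    image_emb: (d,) embedding for one image.
    text_embs: (k, d) cached embeddings, one per prompt.
    Returns a (k,) probability vector over the prompts.
    """
    sims = normalize(text_embs) @ normalize(image_emb)
    logits = 100.0 * sims                      # temperature scaling
    e = np.exp(logits - logits.max())          # numerically stable softmax
    return e / e.sum()

# In the real service, one prompt per class (e.g. "a photo of a kitchen")
# is embedded once at startup and reused for every uploaded image.
```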
- Docker Desktop (includes Docker Compose)
- Ports 3000, 3001, 5432, and 8000 available
```shell
git clone https://github.com/LinHAO-1/HomeVision.git && cd HomeVision
docker compose -f docker-compose.yml -f docker-compose.dev.yml up --build
```

That starts all four services (Postgres, inference, API, and web) with hot-reload enabled for development. No additional setup or dependencies required.
Open http://localhost:3000 in your browser.
The project uses layered Docker Compose files:
| File | Purpose |
|---|---|
| docker-compose.yml | Base services (Postgres, inference, API, web) |
| docker-compose.dev.yml | Dev overrides — hot-reload, source volume mounts |
| docker-compose.adapter.yml | Optional — mounts trained adapter files for enhanced predictions |
For development, use the first two. Add the third after training an adapter (see Labeling & Training).
Optionally create a .env file in the project root to use an external Postgres instance:
```shell
DATABASE_URL=postgresql://user:pass@host:5432/dbname
```
If omitted, the stack uses the local Postgres container with default credentials.
```shell
docker compose -f docker-compose.yml -f docker-compose.dev.yml down
```

Add `-v` to also remove the Postgres data volume.
1. Open http://localhost:3000
2. Click Choose files and select 1–20 property photos (JPEG, PNG, GIF, or WebP — max 5 MB each)
3. Click Analyze
4. Results appear once processing completes — usually a few seconds per photo
The results page shows:
- Summary — overall quality score and most common amenities across the batch
- Photo grid — each photo card displays room type, amenities, detected features, and quality indicators
- Detail modal — click any photo for a larger view with full breakdown
Swagger docs are available at http://localhost:3001/api/docs when the API is running.
```shell
curl -X POST http://localhost:3001/api/v1/jobs \
  -F "files=@kitchen.jpg" \
  -F "files=@bedroom.jpg"
```

Response:

```json
{ "jobId": "c0a80164-...", "status": "processing" }
```

Poll the job by ID:

```shell
curl http://localhost:3001/api/v1/jobs/<jobId>
```

Response (when completed):

```json
{
  "id": "c0a80164-...",
  "status": "completed",
  "results": {
    "summary": {
      "topAmenities": [
        { "label": "Natural Light", "count": 3, "avgScore": 0.35 }
      ],
      "topFeatures": [
        { "label": "Hardwood Floors", "category": "Flooring", "count": 2, "avgScore": 0.31 }
      ],
      "overallQualityScore": 0.85
    },
    "photos": [
      {
        "filename": "kitchen.jpg",
        "roomType": { "label": "Kitchen", "score": 0.33, "topPrompt": "a photo of a kitchen" },
        "amenities": [
          { "label": "Stainless Steel Appliances", "score": 0.41, "prompt": "stainless steel appliances" }
        ],
        "features": [
          { "label": "Granite Countertops", "score": 0.29, "category": "Kitchen", "prompt": "..." }
        ],
        "quality": {
          "blurVar": 120.5,
          "brightness": 150.0,
          "width": 1920,
          "height": 1080,
          "isBlurry": false,
          "isDark": false,
          "overallScore": 0.90
        }
      }
    ]
  },
  "errorMessage": null
}
```

You can also call the inference service directly:

```shell
curl -X POST http://localhost:8000/analyze/batch \
  -F "files=@photo.jpg"
```

This returns the same per-photo array without the job wrapper.
HomeVision includes a labeling tool and adapter training pipeline for improving predictions on your own data. The adapter is completely optional. Without it, the app uses zero-shot OpenCLIP inference out of the box.
The full workflow has four steps, done in order:
Open http://localhost:3000/label (labeling is already enabled in the dev compose). Upload a photo and the model shows its predictions. Correct the room type, amenities, and features as needed, then save. Each saved label is stored in your database.
The training script needs access to the actual image files on disk. Create an images/ folder at the repo root and place the same photos you labeled there (e.g. images/kitchen/photo1.jpg). The filenames must match what you used when labeling.
The dev compose mounts ./images to /data/images inside the inference container, so anything you put in images/ is available to the training script.
First, export your labels from the database into a JSON file that the training script can read:
```shell
curl -s http://localhost:3001/api/v1/labels/export -o apps/inference/labels.json
```

This pulls every label you saved and writes it to apps/inference/labels.json.
Then run training inside the inference container. No local Python setup needed:
```shell
docker compose -f docker-compose.yml -f docker-compose.dev.yml exec inference \
  python train_adapter.py --labels labels.json --images-dir /data/images/kitchen \
  --output-weights adapter.pt --output-meta adapter_meta.json --epochs 60
```

This trains a lightweight linear adapter on frozen CLIP embeddings. The output files (adapter.pt and adapter_meta.json) land in apps/inference/ on your host, since the dev compose mounts that directory to /app in the container.
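Conceptually, a linear adapter on frozen embeddings is just softmax regression: the CLIP backbone stays fixed and only a small linear head is fit. A hedged NumPy sketch of that idea follows; the actual `train_adapter.py` optimizer, loss, and output format may differ:

```python
import numpy as np

def train_linear_adapter(X: np.ndarray, y: np.ndarray, num_classes: int,
                         epochs: int = 60, lr: float = 0.1):
    """Softmax regression by gradient descent on frozen embeddings.

    X: (n, d) precomputed CLIP image embeddings (the frozen features).
    y: (n,) integer class labels from the labeling tool.
    Returns the learned weights W (d, num_classes) and bias b (num_classes,).
    """
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[y]
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - onehot) / n                       # d(cross-entropy)/d(logits)
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b
```

Because only `W` and `b` are trained, a few dozen labeled photos can already shift predictions noticeably, which is why the adapter stays cheap to retrain.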
Restart the stack with the adapter override so inference picks up the trained weights:
```shell
docker compose -f docker-compose.yml -f docker-compose.dev.yml -f docker-compose.adapter.yml up --build
```

The adapter override mounts apps/inference/adapter.pt and apps/inference/adapter_meta.json into the inference container, so there is no copying files around: training output and inference input are the same location.
You can verify the adapter loaded by hitting the health endpoint:
```shell
curl http://localhost:8000/health
# {"status":"ok","adapter_loaded":true}
```

After training, you can check how well the model performs against your labeled data:

```shell
docker compose -f docker-compose.yml -f docker-compose.dev.yml exec inference \
  python evaluate.py --labels labels.json --images-dir /data/images/kitchen
```

This reports per-label accuracy and shows which images the model struggles with most — useful for deciding whether to add more labels or adjust your training data before retraining.
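The kind of report this produces can be sketched in a few lines: group predictions by true label, compute accuracy per label, and collect the misses. This is an illustration of the idea, not `evaluate.py`'s actual output format:

```python
from collections import defaultdict

def per_label_accuracy(records):
    """records: iterable of (filename, true_label, predicted_label) tuples.

    Returns (accuracy_by_label, misclassified) where misclassified lists
    the images the model got wrong, useful for spotting weak labels.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    misses = []
    for filename, truth, pred in records:
        totals[truth] += 1
        if pred == truth:
            correct[truth] += 1
        else:
            misses.append((filename, truth, pred))
    accuracy = {label: correct[label] / totals[label] for label in totals}
    return accuracy, misses
```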