How to train model using custom data? #36

@charlescwwang

Issue Description
I tried to train a model on my own data, which has 12 labels (COCO dataset format).
Training starts, but the run fails with the error below.

Additional Context
This is my command:

python yolo/lazy.py task=train task.epoch=10 task.data.batch_size=8 model=v9-m dataset=data device=cuda name=test-2

This is the log:

[06/28 18:19:05]   INFO  | 📄 Created log folder: runs/train/test-2
[06/28 18:19:05]   INFO  | 📦 Loaded train cache
[06/28 18:19:05]   INFO  | 🚜 Building YOLO
[06/28 18:19:05]   INFO  |   🏗️  Building backbone
[06/28 18:19:05]   INFO  |   🏗️  Building neck
[06/28 18:19:05]   INFO  |   🏗️  Building head
[06/28 18:19:05]   INFO  |   🏗️  Building detection
[06/28 18:19:05]   INFO  |   🏗️  Building auxiliary
[06/28 18:19:05]   INFO  | ✅ Success load model & weight
[06/28 18:19:06]   INFO  | 🧸 Found no stride of model, performed a dummy test for auto-anchor size
[06/28 18:19:08]   INFO  | ✅ Success load loss function
                             Model Layers                             
┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ Index ┃     Layer Type     ┃ Tags ┃    Params ┃ Channels (IN->OUT) ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
│   1   │        Conv        │      │       928 │       3 ->   32    │
│   2   │        Conv        │      │    18,560 │      32 ->   64    │
│   3   │    RepNCSPELAN     │      │   171,648 │      64 ->  128    │
│   4   │       AConv        │      │   276,960 │     128 ->  240    │
│   5   │    RepNCSPELAN     │  B3  │   629,520 │     240 ->  240    │
│   6   │       AConv        │      │   778,320 │     240 ->  360    │
│   7   │    RepNCSPELAN     │  B4  │ 1,414,080 │     360 ->  360    │
│   8   │       AConv        │      │ 1,556,160 │     360 ->  480    │
│   9   │    RepNCSPELAN     │  B5  │ 2,511,840 │     480 ->  480    │
│  10   │      SPPELAN       │  N3  │   577,440 │     480 ->  480    │
│  11   │      UpSample      │      │         0 │         -          │
│  12   │       Concat       │      │         0 │         -          │
│  13   │    RepNCSPELAN     │  N4  │ 1,586,880 │     840 ->  360    │
│  14   │      UpSample      │      │         0 │         -          │
│  15   │       Concat       │      │         0 │         -          │
│  16   │    RepNCSPELAN     │  P3  │   715,920 │     600 ->  240    │
│  17   │       AConv        │      │   397,808 │     240 ->  184    │
│  18   │       Concat       │      │         0 │         -          │
│  19   │    RepNCSPELAN     │  P4  │ 1,480,320 │     544 ->  360    │
│  20   │       AConv        │      │   778,080 │     360 ->  240    │
│  21   │       Concat       │      │         0 │         -          │
│  22   │    RepNCSPELAN     │  P5  │ 2,627,040 │     720 ->  480    │
│  23   │ MultiheadDetection │ Main │ 4,602,528 │       M -> 1080    │
│  24   │      CBLinear      │  R3  │    57,840 │     240 ->    M    │
│  25   │      CBLinear      │  R4  │   216,600 │     360 ->    M    │
│  26   │      CBLinear      │  R5  │   519,480 │     480 ->    M    │
│  27   │        Conv        │      │       928 │       3 ->   32    │
│  28   │        Conv        │      │    18,560 │      32 ->   64    │
│  29   │    RepNCSPELAN     │      │   171,648 │      64 ->  128    │
│  30   │       AConv        │      │   276,960 │     128 ->  240    │
│  31   │       CBFuse       │      │         0 │         -          │
│  32   │    RepNCSPELAN     │  A3  │   629,520 │     240 ->  240    │
│  33   │       AConv        │      │   778,320 │     240 ->  360    │
│  34   │       CBFuse       │      │         0 │         -          │
│  35   │    RepNCSPELAN     │  A4  │ 1,414,080 │     360 ->  360    │
│  36   │       AConv        │      │ 1,556,160 │     360 ->  480    │
│  37   │       CBFuse       │      │         0 │         -          │
│  38   │    RepNCSPELAN     │  A5  │ 2,511,840 │     480 ->  480    │
│  39   │ MultiheadDetection │ AUX  │ 4,602,528 │       M -> 1080    │
└───────┴────────────────────┴──────┴───────────┴────────────────────┘
[06/28 18:19:08] WARNING | ⚠️ Could not find graphviz backend, continue without drawing the model architecture
[06/28 18:19:08]   INFO  | 📦 Loaded validation cache
[06/28 18:19:08]   INFO  | 🚄 Start Training!
/home/localadmin/anaconda3/envs/yolo-MIT/lib/python3.9/site-packages/torch/optim/lr_scheduler.py:143: UserWarning: Detected call of 
`lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before 
`lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at 
https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
⠧ Validate |  mAP.5  |mAP.5:.95| ━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/10 0:01:02
⠧ Run pycocotools                ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/1  -:--:--
💾 success save at runs/train/test-2/weights/E000.pt
/home/localadmin/anaconda3/envs/yolo-MIT/lib/python3.9/site-packages/torch/optim/lr_scheduler.py:156: UserWarning: The epoch parameter in 
`scheduler.step()` was not necessary and is being deprecated where possible. Please use `scheduler.step()` to step the scheduler. During the 
deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available. Please open an issue if you 
are unable to replicate your use case: https://github.com/pytorch/pytorch/issues/new/choose.
  warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning)
⠸ Validate |  mAP.5  |mAP.5:.95| ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2/10 0:01:05
⠸ Run pycocotools                ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/1  -:--:--
┏━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Epoch ┃ Avg. Precision ┃       ┃ Avg. Recall    ┃       ┃
💾 success save at runs/train/test-2/weights/E001.pt
⠙ Validate |  mAP.5  |mAP.5:.95| ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3/10 0:01:00
⠙ Run pycocotools                ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/1  -:--:--
┏━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Epoch ┃ Avg. Precision ┃       ┃ Avg. Recall    ┃       ┃
Error executing job with overrides: ['task=train', 'task.epoch=10', 'task.data.batch_size=8', 'model=v9-m', 'dataset=data', 'device=cuda', 
'name=test-2']
Traceback (most recent call last):
  File "/home/localadmin/YOLO/yolo/lazy.py", line 42, in <module>
    main()
  File "/home/localadmin/anaconda3/envs/yolo-MIT/lib/python3.9/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/home/localadmin/anaconda3/envs/yolo-MIT/lib/python3.9/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/home/localadmin/anaconda3/envs/yolo-MIT/lib/python3.9/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/home/localadmin/anaconda3/envs/yolo-MIT/lib/python3.9/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/home/localadmin/anaconda3/envs/yolo-MIT/lib/python3.9/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
  File "/home/localadmin/anaconda3/envs/yolo-MIT/lib/python3.9/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
  File "/home/localadmin/anaconda3/envs/yolo-MIT/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/home/localadmin/anaconda3/envs/yolo-MIT/lib/python3.9/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/home/localadmin/anaconda3/envs/yolo-MIT/lib/python3.9/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/home/localadmin/YOLO/yolo/lazy.py", line 38, in main
    solver.solve(dataloader)
  File "/home/localadmin/YOLO/yolo/tools/solver.py", line 145, in solve
    mAPs = self.validator.solve(self.validation_dataloader, epoch_idx=epoch_idx)
  File "/home/localadmin/YOLO/yolo/tools/solver.py", line 256, in solve
    result = calculate_ap(self.coco_gt, predict_json)
  File "/home/localadmin/YOLO/yolo/utils/solver_utils.py", line 12, in calculate_ap
    coco_dt = coco_gt.loadRes(pd_path)
  File "/home/localadmin/anaconda3/envs/yolo-MIT/lib/python3.9/site-packages/pycocotools/coco.py", line 332, in loadRes
    assert set(annsImgIds) == (set(annsImgIds) & set(self.getImgIds())), \
AssertionError: Results do not correspond to current coco set
⠙ Validate |  mAP.5  |mAP.5:.95| ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3/10 0:01:00
⠙ Run pycocotools                ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/1  -:--:--
┏━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Epoch ┃ Avg. Precision ┃       ┃ Avg. Recall    ┃       ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━┩
│    0  │ AP @ .5:.95    │  0.00 │ AP @        .5 │  0.00 │
│       │                │       │                │       │
│    1  │ AP @ .5:.95    │  0.00 │ AR maxDets   1 │  0.00 │
│    1  │ AP @     .5    │  0.00 │ AR maxDets  10 │  0.00 │
│    1  │ AP @    .75    │  0.00 │ AR maxDets 100 │  0.00 │
│    1  │ AP  (small)    │  0.00 │ AR     (small) │  0.00 │
│    1  │ AP (medium)    │  0.00 │ AR    (medium) │  0.00 │
│    1  │ AP  (large)    │  0.00 │ AR     (large) │  0.00 │
└───────┴────────────────┴───────┴────────────────┴───────┘
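
The assertion in the traceback (`set(annsImgIds) == (set(annsImgIds) & set(self.getImgIds()))`) fails when the prediction file contains an `image_id` that is not among the image `id` values of the ground-truth annotation file. Below is a minimal sketch for finding the offending IDs without pycocotools. The helper names (`unmatched_image_ids`, `check_files`) and the sample data are hypothetical and not part of this repository, and the sketch assumes the standard COCO layout (an `images` list with `id` fields, and a results list with `image_id` fields). It only reports the mismatch; it does not explain why the IDs differ in this dataset.

```python
import json


def unmatched_image_ids(gt_images, predictions):
    """Return prediction image_ids that are absent from the ground-truth images.

    Mirrors the check that pycocotools' loadRes asserts on: every image_id in
    the results must also be an image id in the ground-truth set.
    """
    gt_ids = {img["id"] for img in gt_images}
    pred_ids = {pred["image_id"] for pred in predictions}
    # key=str keeps sorting stable even if ints and strings are mixed
    return sorted(pred_ids - gt_ids, key=str)


def check_files(gt_path, pred_path):
    """Load a COCO annotation file and a results file, then compare IDs.

    Both paths are supplied by the caller; where the results JSON is written
    depends on the training setup and is not assumed here.
    """
    with open(gt_path) as f:
        gt = json.load(f)
    with open(pred_path) as f:
        predictions = json.load(f)
    return unmatched_image_ids(gt["images"], predictions)


# Tiny in-memory example (made-up data): the second prediction uses a string
# ID that does not exist in the ground truth, which would trip the assertion.
gt_images = [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}]
predictions = [
    {"image_id": 1, "category_id": 3, "bbox": [0, 0, 10, 10], "score": 0.9},
    {"image_id": "b", "category_id": 5, "bbox": [5, 5, 20, 20], "score": 0.4},
]
missing = unmatched_image_ids(gt_images, predictions)
print(missing)
```

An empty result means every predicted `image_id` exists in the ground truth. A non-empty result lists the IDs to look at first, for example a string ID where the annotation file uses integers.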
