[FIX] Fixes image_id calculation when using COCO dataset and images contain non-int convertible file names. #79
Description
When using a COCO dataset (.json files), the `image_id` is incorrectly derived from the image file name in multiple parts of the code base. The code assumes that once image file names are converted to `int`, they will equal the `image_id` stored in the .json files. This is incorrect. As a result, in practical datasets where image names contain non-int characters, the `calculate_ap` function fails, because the derived string `image_id` and the actual int `image_id` defined in the .json files no longer match. This PR addresses the issue by making the following changes.

Closes: #67, #36
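To illustrate the mismatch, here is a minimal sketch, not the code from this PR. The annotation excerpt, the file name `street_scene_a.jpg`, and the helper names `naive_image_id` and `build_name_to_id` are all hypothetical. It shows the int conversion breaking on a non-numeric file name, and a lookup keyed on the file name recovering the id recorded in the .json file.

```python
from pathlib import Path

# Hypothetical excerpt of a COCO annotation file; only the "images" field is shown.
coco = {
    "images": [
        {"id": 151480, "file_name": "000000151480.jpg"},
        {"id": 7, "file_name": "street_scene_a.jpg"},
    ]
}


def naive_image_id(image_path: str) -> int:
    """Old assumption: the file stem converts directly to the COCO image id."""
    return int(Path(image_path).stem)


def build_name_to_id(coco_dict: dict) -> dict:
    """Map each file stem to the id recorded in the annotation file."""
    return {Path(img["file_name"]).stem: img["id"] for img in coco_dict["images"]}


name_to_id = build_name_to_id(coco)

# Works only because this particular file happens to be named after its id.
print(naive_image_id("images/000000151480.jpg"))

# Fails as soon as the stem is not int-convertible.
try:
    naive_image_id("images/street_scene_a.jpg")
except ValueError as err:
    print(f"naive conversion failed: {err}")

# The lookup returns the id from the annotation file regardless of the file name.
print(name_to_id["street_scene_a"])
```

Keying the lookup on the file stem is only one possible choice; the full file name would serve equally well as long as both sides of the comparison use the same key.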
Changes:
- Use the `image_id` (as defined in the COCO .json file) throughout the code base.
- `YoloDataset.__getitem__()` now returns `image_id` instead of `image_path`. Context: only the actual `int` id is needed when using a COCO formatted dataset and a call to `calculate_ap` is made.
- `YoloDataset.data` now contains the `int` `image_id` along with the image path and labels data. Previously `self.data` didn't include `image_id` information.
- For `.txt` datasets, `YoloDataset.data` contains a string `image_id`. In this case `image_id` is the image file name without extension, as before.
- `dataset_utils.create_image_metadata()` now returns three dicts instead of two: image id to list of annotations, image id to image information, and image name to image id. Image id is the `int` `id` assigned to an image in the COCO formatted .json file. Context: this enables `filter_data` to handle the `image_id` of COCO datasets accurately.

Type of Change
Checklist:
Licensing:
By submitting this pull request, I confirm that:
Additional Information
@henrytsui000 please help me with the tests. In `tests/data`, changing just one image file name to include a "_" should be enough to cover this MR, e.g. `000000151480.jpg` -> `000000151480_.jpg` in both the image file name and `instances_val.json`.

I intend to work on this repo quite frequently and would really appreciate your help for quick collaboration.
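The fixture change suggested above could be scripted roughly as follows. This is a sketch under assumptions: the helper `rename_coco_image` is hypothetical and not part of the repo, and the directory layout in the usage comment is a guess that would need to match the real `tests/data` tree.

```python
import json
from pathlib import Path


def rename_coco_image(image_dir: Path, ann_file: Path, old_name: str, new_name: str) -> int:
    """Rename one image on disk and in the COCO JSON; return its unchanged image id."""
    coco = json.loads(ann_file.read_text())
    matches = [img for img in coco["images"] if img["file_name"] == old_name]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one entry for {old_name}, found {len(matches)}")

    # Rename the file first so a missing image fails before the JSON is rewritten.
    (image_dir / old_name).rename(image_dir / new_name)

    # Only "file_name" changes; "id" stays as assigned in the annotation file.
    matches[0]["file_name"] = new_name
    ann_file.write_text(json.dumps(coco))
    return matches[0]["id"]


# Hypothetical usage; adjust both paths to the actual layout under tests/data:
# rename_coco_image(
#     Path("tests/data/images/val"),
#     Path("tests/data/annotations/instances_val.json"),
#     "000000151480.jpg",
#     "000000151480_.jpg",
# )
```

After the rename, the file stem `000000151480_` is no longer int-convertible while the `id` in the .json file is untouched, so a test that passes on this fixture exercises the id lookup instead of the file name conversion.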
cc: @WongKinYiu