Describe the bug
When using the MVTecAD dataset, I load it, call prepare_data and setup, and then call train_dataloader, at which point the data appears to be empty. This is caused by a bug in the interaction between the Split enum and pandas.
Error statement
Zero subset length encountered during splitting. This means one of your subsets
might be empty or devoid of either normal or anomalous images.
Traceback (most recent call last):
  File "/Users/amirgheser/SSA-IAD/bug.py", line 9, in <module>
    for i, data in enumerate(datamodule.train_dataloader()):
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/amirgheser/SSA-IAD/.venv/lib/python3.12/site-packages/anomalib/data/datamodules/base/image.py", line 373, in train_dataloader
    return DataLoader(
           ^^^^^^^^^^^
  File "/Users/amirgheser/SSA-IAD/.venv/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 394, in __init__
    sampler = RandomSampler(dataset, generator=generator) # type: ignore[arg-type]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/amirgheser/SSA-IAD/.venv/lib/python3.12/site-packages/torch/utils/data/sampler.py", line 149, in __init__
    raise ValueError(
Proposed fix
In the make_mvtec_ad_dataset function, add this line, or anything with similar logic:
split = split.value if isinstance(split, Split) else split
The dataframe is built correctly up to the final if split: statement. There it ends up empty, because the equality between the enum and the string split values is always False.
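A minimal sketch of the normalization idea, not anomalib's actual code. Assumptions: Split below is a hypothetical plain Enum stand-in (the real class may be defined differently), normalize_split and filter_rows are illustrative names, and a list of dicts replaces the pandas dataframe so the snippet needs no dependencies.

```python
from enum import Enum


class Split(Enum):
    """Hypothetical stand-in for anomalib's Split enum (assumed plain Enum)."""

    TRAIN = "train"
    TEST = "test"


def normalize_split(split):
    """Return the raw string for an enum member; pass strings and None through."""
    return split.value if isinstance(split, Split) else split


def filter_rows(rows, split=None):
    """Keep only rows whose "split" field matches, mimicking the final filter."""
    split = normalize_split(split)
    if split:
        rows = [row for row in rows if row["split"] == split]
    return rows


rows = [
    {"image_path": "a.png", "split": "train"},
    {"image_path": "b.png", "split": "test"},
]

# Without normalization, a plain Enum member never equals its string value,
# so the filter drops every row.
unnormalized = [row for row in rows if row["split"] == Split.TRAIN]

# With normalization, enum members and plain strings select the same rows.
from_enum = filter_rows(rows, Split.TRAIN)
from_str = filter_rows(rows, "train")
```

Normalizing once at the top of the function keeps the later comparison string-to-string, whichever form the caller passes in.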
Dataset
MVTecAD
Model
N/A
Steps to reproduce the behavior
Running this minimal script, which only checks that the dataloader works, fails:
if __name__ == "__main__":
    from anomalib.data import MVTecAD

    datamodule = MVTecAD(category="bottle")
    datamodule.prepare_data()
    datamodule.setup()
    for i, data in enumerate(datamodule.train_dataloader()):
        print(data.keys())
        print(data["image"].shape)
        break
OS information
OS information:
- OS: [e.g. Ubuntu 20.04]
- Python version: [e.g. 3.10.0]
- Anomalib version: [e.g. 0.3.6]
- PyTorch version: [e.g. 1.9.0]
- CUDA/cuDNN version: [e.g. 11.1]
- GPU models and configuration: [e.g. 2x GeForce RTX 3090]
- Any other relevant information: [e.g. I'm using a custom dataset]
Expected behavior
No error message :)
Screenshots
No response
Pip/GitHub
pip
What version/branch did you use?
2.4.0
Configuration YAML
Logs