The full development set is approximately 6.5 GB.
The dataset can be accessed through platforms like Zenodo.
Each audio clip has five unique human-annotated descriptions.