You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Conceptual 12M (CC12M) is a dataset with 12 million image-text pairs specifically meant to be used for visionand-language pre-training. Its data collection pipeline is a relaxed version of the one used in Conceptual Captions 3M (CC3M). Indo_MultiModal_CC_12M is the Indonesian language version.
License
The dataset may be freely used for any purpose, although acknowledgement of Google LLC ("Google") as the data source would be appreciated. The dataset is provided "AS IS" without any warranty, express or implied. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
The text was updated successfully, but these errors were encountered:
NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?id_mm_cc_12m
The text was updated successfully, but these errors were encountered: