Learning with Noisy Data
The objective of learning with noisy data is to improve model performance when the training dataset contains noise, which can arise from out-of-distribution (OOD) samples or from noisy labels. Existing methods have two limitations: (1) they often assume that the distribution of synthetic noise transfers directly to real-world datasets, which is not always realistic or effective; and (2) they typically treat OOD data and noisy labels as separate problems rather than addressing both with a single approach. To overcome these limitations, we propose adapting CLIP, a large-scale pretrained vision-language model, to the task of learning with noisy data, allowing us to address these challenges in a more unified manner.
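As a concrete illustration of how a pretrained CLIP model can help with noisy data, the sketch below uses CLIP's zero-shot predictions to flag training samples whose given label disagrees with CLIP, or whose maximum score is uniformly low, as candidates for label noise or OOD data. It assumes the Hugging Face `transformers` library, the `openai/clip-vit-base-patch32` checkpoint, and hypothetical class names; it is an illustrative sketch, not the project's final method.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Hypothetical class names for illustration; substitute your own label set.
CLASS_NAMES = ["no corrosion", "mild corrosion", "severe corrosion"]

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

def zero_shot_scores(image: Image.Image) -> torch.Tensor:
    """Score an image against one text prompt per class; returns softmax probs."""
    prompts = [f"a photo of {c}" for c in CLASS_NAMES]
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image has shape (num_images, num_classes).
    return outputs.logits_per_image.softmax(dim=-1).squeeze(0)

def flag_sample(image: Image.Image, given_label: int, threshold: float = 0.3) -> bool:
    """Flag a sample as possibly mislabeled or OOD when CLIP disagrees
    with the given label or assigns low confidence to every class."""
    probs = zero_shot_scores(image)
    return probs.argmax().item() != given_label or probs.max().item() < threshold
```

Flagged samples could then be down-weighted, relabeled, or excluded during training, depending on the noise-handling strategy.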
We frame this problem as a novel task of universal domain adaptation [r1]: by fine-tuning CLIP on noisy source data, we aim to improve the model's generalization to target data, tackling the challenges of noisy data within the broader context of domain adaptation.
[r1] Deng, B., & Jia, K. (2023). Universal Domain Adaptation from Foundation Models. arXiv preprint arXiv:2305.11092.
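A minimal sketch of such a fine-tuning step is shown below. It updates only CLIP's image encoder against frozen text embeddings of the class prompts, and pairs this with a generalized cross-entropy loss (a common noise-robust objective); the checkpoint, class names, and choice of loss are illustrative assumptions, not necessarily what the project uses.

```python
import torch
import torch.nn.functional as F
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical class names; the frozen text embeddings act as the classifier head.
CLASS_NAMES = ["no corrosion", "mild corrosion", "severe corrosion"]
text_inputs = processor(text=[f"a photo of {c}" for c in CLASS_NAMES],
                        return_tensors="pt", padding=True)
with torch.no_grad():
    text_feats = F.normalize(model.get_text_features(**text_inputs), dim=-1)

# Fine-tune only the vision encoder on the noisy source data.
optimizer = torch.optim.AdamW(model.vision_model.parameters(), lr=1e-6)

def gce_loss(probs: torch.Tensor, labels: torch.Tensor, q: float = 0.7) -> torch.Tensor:
    """Generalized cross-entropy, (1 - p_y^q) / q: more robust to label
    noise than standard cross-entropy (recovers it as q -> 0)."""
    p_y = probs.gather(1, labels.unsqueeze(1)).squeeze(1)
    return ((1.0 - p_y.pow(q)) / q).mean()

def train_step(pixel_values: torch.Tensor, labels: torch.Tensor) -> float:
    """One update on a batch of preprocessed images with (noisy) labels."""
    model.train()
    img_feats = F.normalize(model.get_image_features(pixel_values=pixel_values), dim=-1)
    logits = model.logit_scale.exp() * img_feats @ text_feats.t()
    loss = gce_loss(logits.softmax(dim=-1), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Keeping the text embeddings frozen preserves CLIP's zero-shot structure, which is one plausible way to retain generalization to the target domain while adapting to the noisy source labels.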
Research: We are currently finalizing the detailed methods and results.
Application: This research is conducted in collaboration with Docomo, and the model is used to improve a classifier trained on metallic-corrosion images with noisy labels. Noisy labels are inherent in corrosion data because human annotators have difficulty recognizing corrosion patches accurately.