Retraining with Predicted Hard Labels Provably Increases Model Accuracy

R Das, IS Dhillon, A Epasto, A Javanmard, J Mao, V Mirrokni, S Sanghavi, P Zhong

arXiv preprint arXiv:2406.11206, 2024 (arxiv.org)

The performance of a model trained with noisy labels is often improved by simply retraining the model with its own predicted hard labels (i.e., 1/0 labels). Yet a detailed theoretical characterization of this phenomenon is lacking. In this paper, we theoretically analyze retraining in a linearly separable setting with randomly corrupted labels and prove that retraining can improve the population accuracy obtained by initially training with the given (noisy) labels. To the best of our knowledge, this is the first such theoretical result. Retraining finds application in improving training with label differential privacy (DP), which involves training with noisy labels. We empirically show that retraining selectively on the samples for which the predicted label matches the given label significantly improves label DP training at no extra privacy cost; we call this consensus-based retraining. For example, when training ResNet-18 on CIFAR-100 with label DP, we obtain an improvement in accuracy with consensus-based retraining.
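
The consensus filter described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the paper's experimental setup: the toy Gaussian data, the 30% random label flips (standing in for a label DP mechanism, which is not implemented), the least-squares linear classifier, and the function names `fit_linear`, `predict`, and `consensus_retrain`.

```python
import numpy as np

def fit_linear(X, y):
    """Fit a least-squares linear classifier; labels in {0, 1} are mapped to -1/+1."""
    w, *_ = np.linalg.lstsq(X, 2.0 * y - 1.0, rcond=None)
    return w

def predict(w, X):
    """Return hard 1/0 predictions from the sign of the linear score."""
    return (X @ w > 0).astype(int)

def consensus_retrain(X, noisy_y):
    """Train on the given labels, then retrain only on the consensus samples."""
    w0 = fit_linear(X, noisy_y)            # initial model trained on the noisy labels
    pred = predict(w0, X)                  # the model's own predicted hard labels
    keep = pred == noisy_y                 # consensus: prediction matches given label
    w1 = fit_linear(X[keep], pred[keep])   # retrain on the consensus subset
    return w0, w1, keep

# Toy linearly separable data with randomly flipped labels (illustrative values).
rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.normal(size=(n, d))
clean_y = (X @ np.ones(d) > 0).astype(int)
flip = rng.random(n) < 0.3
noisy_y = np.where(flip, 1 - clean_y, clean_y)

w0, w1, keep = consensus_retrain(X, noisy_y)
acc_initial = (predict(w0, X) == clean_y).mean()
acc_retrained = (predict(w1, X) == clean_y).mean()
print(f"kept {keep.sum()} of {n} samples")
print(f"accuracy vs clean labels: initial={acc_initial:.3f}, retrained={acc_retrained:.3f}")
```

The filter uses only the given noisy labels and the model's predictions, never the clean labels, which is consistent with the abstract's claim of no extra privacy cost. Whether accuracy improves in this toy setting depends on the data and noise level; the sketch shows the procedure rather than reproducing the paper's results.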
