Pangram Labs reposted this
Pangram AI detection is now multilingual! In a push to improve our detection of non-English AI-generated text, we retrained our model to be natively multilingual. Now, Pangram officially supports AI detection for the top twenty languages spoken on the internet. Arabic, Czech, German, Greek, Spanish, Persian, French, Hindi, Hungarian, Italian, Japanese, Dutch, Polish, Portuguese, Romanian, Russian, Swedish, Turkish, Ukrainian, Urdu, Vietnamese, Chinese, Korean, and Hindi. Unofficially, even more. How did we do it? We changed the tokenizer to better support non-English languages. We slightly increased the size of the model to better fit the new distribution. We ran an active learning data campaign against multilingual web-scale data. We applied data augmentation to machine translate a small fraction of our dataset before training. For training details and benchmark results, please check out our latest blog post!