pleias

Technology, Information and Internet

About us

Website: pleias.fr
Industry: Technology, Information and Internet
Company size: 2-10 employees
Type: Self-Owned

Updates

pleias

962 followers
1h Edited
We are delighted to announce that Cassandre, an HR assistant developed by pleias for the Académie de Lyon, has been shortlisted for the Grand Prix IA & RH of Hub France IA. Many thanks to Jerome Blondon, who led this project, as well as to pleias' Applied AI team (Carlos Rosas Hinostroza, Irène Girard, Pierre-Carl Langlais).
Hub France IA

13,192 followers
5h Edited

Grand Prix IA & RH 🎉 The verdict is in for the Hub France IA Grand Prix IA et RH (AI and HR Grand Prix)! 🎊
🚀 We received more than 50 high-quality applications, and the selection was tough! 🤯 Many thanks to all the participants for their commitment and their innovative ideas. 🙏
🥁 Drum roll... 🥁 After intense deliberation, we are pleased to announce the 8 finalists, in alphabetical order:
- Groupe Keolis with EURODECISION
- La Poste Groupe with Probayes
- agap2 with LE MUST Employer
- Safran with 360Learning
- Rectorat de l'académie de Lyon with pleias
- Roquette
- Semantikmatch
- TOP™
🔥 We look forward to seeing them defend their ideas before our jury of experts. Join us on 10 December to discover the winner of this first edition! 🤩
👏 We would also like to congratulate the following projects, which stood out and made the TOP 15: ✨ Alstom; Crédit Agricole d'Ile-de-France with Leihia; Manpower France with Sanofi; Omind; Omogen; People360; Skillup.co with DominoRH
#IA #RH #Innovation #HubFranceIA #GrandPrix #Finalistes
➡️ Stay tuned to follow the Grand Prix IA et RH!
Our partners: Parlons RH; Le Lab RH; ActuIA; Alan Jean-Roch Houllier, Claire LARSONNEUR, Emmanuel Teboul, Gérald PETITJEAN, Michel ROMANET-CHANCRIN, Pierre Guenoun, Jerome Blondon, Anastasia Stasenko, Fanny Girerd, Maxime Cariou, Pierre-Louis Bescond, Stephane Barbot, Stéphane Ureña
pleias

962 followers
3d
pleias Joins The AI Alliance to Co-lead Open Trusted Data Initiative and Releases the Largest Open Dataset for LLM training - Common Corpus

The Open Trusted Data Initiative (OTDI) is a joint effort of The AI Alliance members to ensure the AI community has access to open and trusted data with the appropriate “provenance”. This work is fully aligned with pleias’ mission and values, so today pleias joins The AI Alliance to lead the OTDI efforts in the Alliance and releases Common Corpus.

Common Corpus represents a significant advancement in open-source AI development, comprising over 2 trillion tokens of high-quality, multilingual text data. This unprecedented collection includes diverse content ranging from books and scientific articles to government documents and computer code, with substantial coverage across English, French, and other European languages. By making this dataset publicly available, we are enabling the development of powerful, efficient language models that comply with EU AI Act requirements while democratising the pretraining of Large Language Models.

The dataset also features extensive documentation of data provenance and procedures, making it fully transparent and auditable. Through careful content filtering, the collection maintains strong educational value while eliminating harmful content. The dataset draws from trustworthy sources including open scientific literature, cultural heritage materials, open-source code, and government and legal documents. This focus on high-quality, knowledge-grounded content ensures that models trained on Common Corpus will benefit from reliable, accurate information.

The development of Common Corpus has been made possible through strategic partnerships with leading organizations in the field. Wikimedia Foundation Enterprise, EleutherAI, and Ai2 have provided valuable technical expertise and resources. Additionally, support from the Ministère de la Culture and the Direction interministérielle du numérique (DINUM) has been instrumental in accessing and curating high-quality content.

In addition to the dataset release, pleias is publishing comprehensive documentation of its data preparation methodologies and introducing new models and libraries for pretraining data preparation and filtering. This commitment to transparency exemplifies the organization's dedication to open science principles.

pleias

962 followers
3mo
pleias releases its first foundation model out of its future suite of specialised ultra-fast and green pretrains for document processing at scale

At pleias we are successfully experimenting with a new category of models: specialized foundation SLMs. These models are designed from the ground up for specific tasks and trained exclusively on a large custom instruction dataset (at least 5-10B tokens). So far, they yield performance comparable to much larger generalist models.

Today, we release the first example of specialized pre-training, OCRonos-Vintage. It is a 124 million parameter model trained on 18 billion tokens from cultural heritage archives to perform OCR correction. Despite its extremely small size and lack of generalist capacities, OCRonos-Vintage is currently one of the best available models for this task. We are currently deploying it at scale to pre-process and correct the badly OCRised parts of a dataset of more than 700 billion tokens from Gallica (BNF), Chronicling America and other cultural heritage corpora. https://lnkd.in/e2N3XFjk

The case for specialized pre-training: ultra-fast foundation models for dedicated tasks

huggingface.co

pleias reposted this

SCIN360

258 followers
6mo
✨📚 What if we talked about copyright? On this World Intellectual Property Day, SCIN360 would like to highlight a bold young French start-up: pleias. Announced a month ago, it makes available a corpus of 500 billion words in more than 70 languages, intended for training the large language models used by AI systems. What makes it distinctive? It uses only rights-free texts from public libraries, offering open-source resources while respecting copyright and promoting a transparent and ethical approach. 📖 After complaints filed by several authors against OpenAI and Meta in July 2023, and the lawsuit between the "New York Times" and OpenAI, Pleias sets the record straight. 🕰️ With a first funding round on the horizon and the support of several collaborative research projects on open-source generative AI, Pleias is positioning itself as a major player in building AI that respects copyright and linguistic diversity. And you, what do you think of open source for the future of AI?💡 #OpenSource #IA #CommonCorpus #Innovation #intelligenceartificielle #IAEthique

Employees at pleias

Prof. Dr. Ivan Yamshchikov

radical techno-optimist

Anastasia Stasenko

Co-founder pleias | ENS-Ulm | Associate Senior Lecturer at Sorbonne-Nouvelle

Carlos Rosas Hinostroza

Data scientist - Sociologist

Matthieu Delsart

X-HEC Data Science

Similar pages

opsci

Fairly Trained

Hugging Face

Direction interministérielle du numérique (DINUM)

Nomic AI

Scaleway

GENCI

OpenLLM 🇫🇷 🇪🇺

EleutherAI

STATION F