Nebius’ Post

Nebius reposted this

DVC.ai

7,384 followers

The DVC team is excited to release DataChain today! DataChain is an open-source Python library for processing and curating unstructured data at scale.

🤖 AI-Driven Data Curation: Use local ML models and LLM API calls to enrich your data.
🚀 GenAI Dataset scale: Handle tens of millions of files or file snippets.
🐍 Python-friendly: Use Python objects instead of JSON to represent annotations.

DataChain enables the parallel processing of multiple data files or samples. It can chain different operations such as filtering, aggregation, and merging datasets. The resulting datasets can be saved, versioned, and extracted as files or converted to a PyTorch data loader.

DataChain can serialize Python objects (via Pydantic) to an embedded SQLite database. It efficiently deserializes Python objects or runs vectorized analytical queries in the DB without deserialization.

The typical use cases are:
◆ LLMs judging LLM dialogues (see code in image).
◆ Auto-deserializing LLM responses to Pydantic.
◆ Vectorized analytics over Python objects.
◆ Annotating cloud images with a local model.
◆ Dataset curation using AI annotations.

DataChain excels at optimizing batch operations, such as parallelizing synchronous API calls or running heavy batch processing tasks.

We believe that DataChain will serve as a solid foundation for new and upcoming unstructured data-wrangling libraries, as well as custom AI-driven curation solutions.

⭐️ Give DataChain a try for your Generative AI data management, give it a star, and as always, your feedback is welcome! Link to the repo to get started is in the comments!

#Generativeai #AI #computervision #LLM
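To make the idea of storing Python objects in an embedded SQLite database and querying them without deserialization more concrete, here is a minimal sketch using only the Python standard library. It is not DataChain's API: the `DialogRating` class, the `ratings` table, and the sample values are hypothetical, and a plain dataclass stands in for a Pydantic model.

```python
import sqlite3
from dataclasses import dataclass, astuple


# Hypothetical annotation object; a dataclass stands in for a Pydantic model.
@dataclass
class DialogRating:
    file: str
    verdict: str
    score: float


# Made-up sample annotations, e.g. what an LLM judge might return per dialogue.
ratings = [
    DialogRating("chat_001.json", "success", 1.0),
    DialogRating("chat_002.json", "failure", 0.25),
    DialogRating("chat_003.json", "success", 0.5),
]

# Serialize: flatten each object's fields into columns of an embedded database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (file TEXT, verdict TEXT, score REAL)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [astuple(r) for r in ratings],
)

# Analytical query that runs inside the database; no Python objects are rebuilt.
avg_by_verdict = dict(
    conn.execute("SELECT verdict, AVG(score) FROM ratings GROUP BY verdict")
)

# Deserialize only the rows that pass a filter back into Python objects.
good = [
    DialogRating(*row)
    for row in conn.execute(
        "SELECT file, verdict, score FROM ratings WHERE score > 0.4 ORDER BY file"
    )
]
```

Because each field is stored as its own column, aggregates and filters can run in SQL, and objects are rebuilt only for the rows that are actually needed.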

Jenifer De Figueiredo

Community Manager, Master plate spinner, Connector of people and ideas

2mo
Elena Samuylova

Co-Founder Evidently AI (YC S21) | Building open-source tools to evaluate and monitor AI-powered products.

2mo

Congrats on the launch Dmitry Petrov, Ivan Shcheklein and the team!

Konstantin Savenkov

CEO @ Intento - AI agents for enterprise localization.

2mo

Awesome news! What's the primary use-case - is it RAG? Or managing data for fine-tuning?

Ash Vardanian

Building Unum | Investing @ Aloniq & AAL

2mo

Congrats on the release!

Aleksandr Patrushev

Senior Product Manager ML/AI @ Nebius | ex-Amazon, ex-VMware, ex-IBM

2mo

Amazing features!

Gaurav Jain

Managing Partner at Afore Capital | Founding Product Manager for Android Nexus

2mo

Amazing stuff, congrats!!

Viktor Andriichuk

AI Solutions Architect | Driving Business Growth Through Custom AI Implementations

3w

Sorry, I don’t get the idea… (((

Miguel Hollander

Vice President at Main Street Advisors

2mo

Awesome, congrats!

Stepan Ilyin

Advanced API Security for Modern Enterprises - I am hiring

2mo

Nice one!
