As AI models grow, so does the demand for high-quality data. 🚀 Rights holders, including publishers, content creators, and organisations with extensive content libraries, have a unique opportunity to unlock new revenue streams. Join our upcoming webinar, "Understanding Today’s AI Data Licensing Market," on Tuesday, November 19, from 14:30 to 15:30 GMT to learn how to monetise your content. 🔎 This session will cover: ✅ Rising demand for curated, high-quality data. ✅ Copyright protection and understanding bot traffic. ✅ Structuring fair licensing deals, pricing, and distribution strategies. 📆 Register now to secure your spot! https://lu.ma/u2r4l0un #TrainingData #DataMonetisation #AIDataLicensing #Webinar
Valyu
Technology, Information and Internet
London, England 385 followers
High Quality Licensed Data for AI Models and Apps (Training & Context Enrichment)
About us
Generative AI has increased the demand for high-quality, diverse datasets for model training, performance and personalisation. This growing demand raises questions about the copyright of training data, provenance, attribution, and compensation for content owners and platforms, which in turn create challenges for model scaling, LLM application development, legal use and revenue allocation. Data licensing is crucial to addressing these issues: it ensures that data usage complies with legal standards, respects rights, and provides appropriate credit and compensation to content platforms and creators.

Valyu is a smart-contract-based platform that connects data providers with AI companies seeking diverse, high-quality training data. We bridge the gap between content platforms and AI companies, facilitating the licensing, discovery, packaging and distribution of high-quality datasets. Our platform also offers data valuation and tooling to simplify dataset licensing, provenance and distribution.

Founded by leading academics and engineers from University College London (UCL), our team has extensive experience in enterprise data/ML companies and large-scale data infrastructure for AI. We use advances in ML, cryptography and smart contracts to enable responsible data commercialisation for AI. Our mission is to accelerate AI with the responsible use and monetisation of data. We love building products that people enjoy and pushing the boundaries of engineering and research! :) #WeBuild 🛠️ Learn more at valyu.network
- Website
- https://www.valyu.network/
- Industry
- Technology, Information and Internet
- Company size
- 2-10 employees
- Headquarters
- London, England
- Type
- Privately Held
- Founded
- 2022
- Specialties
- Data Governance, Data Monetisation, Machine Learning, Data Licensing, Data Valuation, Copyright, LLMs, RAG, Data Provenance, and Attribution
Locations
- Primary: 18 Soho Square, London, England W1D 3QL, GB
Updates
-
Last Friday, we hosted an event with Common Crawl Foundation and UCL discussing the role of open data and web crawling in research and innovation in today’s AI landscape. The event highlighted the wide-ranging uses of open datasets, explored recent trends in data restrictions, and emphasised the need for fairer approaches to content access. A big thanks to Thom Vaughan, Pedro Ortiz Suarez from Common Crawl, and Philip Treleaven from UCL for presenting and taking part in the event! We’ve written a recap of the event here: https://lnkd.in/dMam_uQS Stay tuned for our upcoming event! #OpenData #WebCrawling #TrainingData #GenerativeAI
Event Recap: Open Data, Research and Web Archiving in the Age of AI and LLMs • Valyu Blog
valyu.network
-
Great talks at our "Open Data, Research and Web Archiving in the Age of AI and LLMs" event today, and thanks to everyone who attended, especially Thom Vaughan and Pedro Ortiz Suarez from Common Crawl Foundation for presenting. Stay tuned for our next event! #OpenData #WebCrawling Hirsh Pithadia
-
OpenAI just released SearchGPT. As AI models and agents rely on real-time data access to enrich their contexts, the way content is accessed and consumed is changing. In response, some publishers and rights holders are blocking crawlers from companies like OpenAI, raising questions about the impact on referral traffic and digital content ecosystems. This afternoon's timely event discusses the 500% rise in bot blocking, its implications for open data and rights holder consent, and how we can rethink bot and agent consent mechanisms for a more balanced digital future. https://lu.ma/r8ms7es2
Open Data, Research and Web Archiving in the Age of AI and LLMs · Luma
lu.ma
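To make the crawler blocking mentioned in the post above concrete, here is a minimal sketch using Python's standard `urllib.robotparser` to check whether a robots.txt file lets a given crawler fetch a page. The robots.txt content, the example.com URL, the `OtherBot` agent and the `can_crawl` helper are all made up for illustration; `GPTBot` is the user-agent token OpenAI publishes for its crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve:
# block GPTBot everywhere, allow every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"  # illustrative URL
    for agent in ("GPTBot", "OtherBot"):
        print(agent, can_crawl(ROBOTS_TXT, agent, page))
```

Note that robots.txt is a voluntary convention: it signals a rights holder's preference but only works if the crawler chooses to honour it.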
-
What does the future hold for rights holders in the age of generative AI? Our newest article unpacks the latest lawsuits and licensing developments that are forming new pathways between AI companies and content creators. Read more here: https://lnkd.in/duFE6hYh #AITrainingData #GenerativeAI #AILicensing #AILawsuits #ContentMonetisation
Rights Holders vs. Generative AI: Latest Lawsuits and Licensing Developments • Valyu Blog
valyu.network
-
Valyu reposted this
Co-founder / CTO @ Valyu: Trusted Data for your AI apps and models. MEng Mathematics & Computer Science @ UCL
Back in London from the Frankfurter Buchmesse 2024 📖 It was great to see firsthand how the publishing industry is navigating AI. While it is a powerful tool, key issues remain around copyright, fair use, and the distribution of data for training and providing content as context for these models in real time. Main takeaway? The publishing industry is rapidly taking action; although further alignment among authors, contributors, and publishers is needed, progress is being made.
Valyu’s team Harvey Yorke & Hendrik van der Sande just wrapped up at the Frankfurter Buchmesse 2024 exploring the implications of AI on publishing, attribution and data licensing of content! 📚 #fbm24 #FrankfurtBookFair #AIandPublishing #AIDataLicensing #TrainingData
-
Valyu reposted this
High-performing RAG (Retrieval-Augmented Generation) infrastructure is essential for data owners: it lets their content enrich AI-generated responses with accurate and relevant context while ensuring that the content is attributed and quoted properly. Astute RAG tackles challenges in current RAG solutions, such as imperfect retrieval and knowledge conflicts in large language models (LLMs). The approach generates internal knowledge from the LLM, iteratively consolidates information from both internal and external sources, and resolves conflicts based on reliability. Key points: 1️⃣ Mitigating imperfect retrieval: Astute RAG combines information stored in the model with external data, filtering out irrelevant or misleading information. 2️⃣ Resolving knowledge conflicts: It identifies and compares conflicting sources to generate reliable, context-rich answers. 3️⃣ Enhancing inference quality: By synthesising responses from multiple sources, it aims for more accurate answers and richer context. 4️⃣ Versatility across domains: Effective in fields like healthcare, finance, and education. Tested on frontier models like Gemini and Claude, Astute RAG outperforms traditional methods, especially in complex real-world scenarios. Its ability to improve robustness in RAG systems makes it a valuable tool for AI applications requiring nuanced and reliable information synthesis. 📑 Read the paper here: https://lnkd.in/d5rsSiAd
-
Valyu reposted this
Is high-quality data essential for maximising model efficiency and generalisation across tasks? 🤔 When scaling Diffusion Transformers (DiT), the compute budget and dataset characteristics play a crucial role in determining model performance. A recent analysis found that, as with text-based models, a power-law relationship between pre-training loss and compute holds across varying budgets, and that the quality and size of the dataset significantly affect the scaling laws. Key insights include: 1️⃣ Better data quality allows better performance at the same compute budget. 2️⃣ Dataset size follows a power law with compute, but larger models are required for generalising across complex datasets. 3️⃣ Out-of-domain data introduces a performance drop, even when scaling laws are applied. 4️⃣ Complexity and the mix of data types can influence model parameter efficiency. While data scaling laws generally still apply, understanding dataset-specific nuances is essential for maximising model performance and generalisation across tasks. 📑 Read the paper here: https://lnkd.in/enjvn9dG If you’re running into any data-related issues, feel free to DM me or Valyu! #AI #DiffusionTransformers #MachineLearning #ScalingLaws #ComputeOptimization #AIResearch
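For readers unfamiliar with what a power-law relationship between loss and compute looks like in practice, the sketch below fits the simple form loss ≈ a · compute^(−b) by ordinary least squares in log-log space. This functional form, the `fit_power_law` helper and the synthetic numbers are assumptions for illustration only; they are not taken from the paper, which may use a different parameterisation.

```python
import math


def fit_power_law(compute, loss):
    """Fit loss ≈ a * compute**(-b) via least squares on log-log values.

    Returns the pair (a, b). Assumes all inputs are positive and that
    at least two distinct compute values are given.
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # slope of the line in log-log space
    intercept = mean_y - slope * mean_x  # log(a)
    return math.exp(intercept), -slope


if __name__ == "__main__":
    # Synthetic data generated from a known power law (illustrative only).
    compute = [1e17, 1e18, 1e19, 1e20]
    loss = [2.0 * c ** -0.05 for c in compute]
    a, b = fit_power_law(compute, loss)
    print(f"a = {a:.3f}, b = {b:.3f}")
```

Under this form, a straight line on a log-log plot of loss against compute is the signature of a power law; a change in data quality would show up as a shift in the fitted a or b.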
-
🚀 Generative AI is being used across industries, from media and entertainment to product development. But did you know that high-quality content is at the heart of it all? 🌟 Our latest article breaks down how generative AI works, why content is so valuable for AI training, and how rights holders can license their data to tap into new revenue streams. If you’re a rights holder or content creator, this is a must-read for understanding your role in shaping the future of AI. Read the article here: https://lnkd.in/eHs4biqF #GenerativeAI #DataLicensing #TrainingData #ContentMonetisation Harvey Yorke
The Beginner’s Guide to Generative AI: With Use Cases & Examples • Valyu Blog
valyu.network