GenAI sorely needs governed, accurate inputs. So we should expect data observability companies to start monitoring and optimizing vectorized text that supports language models.
Monte Carlo just announced plans to do this. Thank you, Eric Avidon of TechTarget, for including my perspective on their announcement. Here is his article, which includes insights from co-founder Lior Gavish, with excerpts below.
https://lnkd.in/gcffQbKq
"Data observability specialist Monte Carlo on Wednesday unveiled new features designed to ensure data quality, including integrations with vector databases and Apache Kafka.
"Data observability is the process of monitoring data as it progresses through the pipeline from ingestion through analysis to make sure the data used to inform decisions is accurate and up to date...
"Vector databases have gained importance in the year since OpenAI released ChatGPT, which marked a significant improvement in generative AI and large language model (LLM) technology...
"Eckerson Group research shows under a quarter of data experts rate their data governance and data quality controls sufficient for AI and machine learning initiatives, including generative AI.
"Monte Carlo is taking a significant step to help them ensure accuracy by observing the quality of text files as they are transformed into vectors and embedded into vector databases," said analyst Kevin Petrie of Eckerson Group. "They also ensure the quality of database records that will complement generative AI in many of these initiatives."
"Vector databases are earlier in the adoption curve with customers," Gavish said. "They're starting to look at them. We're hearing that customers are building pipelines for generative AI and need to add observability into that. But the landscape there is early. It's not like everyone is using vector databases yet."
"Beyond vendors such as Pinecone and Chroma that specialize in vector databases, many database and broader data management vendors offer vector databases within their larger offerings. That could be an opportunity for Monte Carlo.
"For example, both Neo4j and SingleStore now offer vector search capabilities.
"Now that they've taken this first step with vector databases, Monte Carlo should round out its support by integrating with the full range of vector databases, including both dedicated platforms and broader platforms that include vector capabilities," Petrie said.
Wayne Eckerson Eckerson Group Jay Piscioneri #dataobservability #generativeai #genai #vectordatabase