We just launched a Kaggle competition for code retrieval focused on Hugging Face's Transformers library 🎉 If you are looking for your next job or internship, this is a great way to get your hands dirty! 👩‍💻 You can easily bootstrap a solution with our OSS repository Sage. Happy coding!
About us
Building the future of coding
- Website: https://storia.ai
- Industry: IT Services and IT Consulting
- Company size: 2-10 employees
- Headquarters: San Francisco
- Type: Privately Held
Locations
- Primary: San Francisco, US
Updates
-
Tired of building AI coding agents based on vibes? Together with our friends from Morph Labs, we made a real-world dataset that gives us a ladder to climb: 1,000 questions about the Transformers library. Here are our initial learnings about proprietary APIs:
1. For embeddings, OpenAI's text-embedding-3-small performed best.
2. For reranking, NVIDIA takes the lead.
3. Sparse retrieval (BM25) actively hurts performance when retrieving *code*. The reason: it prioritizes documents in natural language, so Markdown files tend to undeservedly bubble to the top.
4. Chunk sizes: 200-800 tokens per chunk works well, as suggested by the CodeRag paper. We recommend 800, since fewer, larger chunks make codebase indexing faster.
5. Top-K for retrieval: We see diminishing returns beyond 25 documents (but don't forget to rerank them!).
Read the full report here: https://lnkd.in/e2EQ77nr
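To make the chunk-size trade-off concrete, here is a minimal sketch of fixed-size chunking. It is an illustration, not Sage's implementation: it assumes whitespace-separated words as a stand-in for model tokens (a real pipeline would count tokens with the embedding model's tokenizer), and `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, max_tokens: int = 800) -> list[str]:
    """Split text into consecutive chunks of at most max_tokens tokens.

    Assumption: whitespace-separated words stand in for real tokenizer tokens.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    tokens = text.split()
    # Walk the token list in strides of max_tokens; the last chunk may be shorter.
    return [
        " ".join(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]


if __name__ == "__main__":
    doc = "token " * 2000
    # Fewer, larger chunks mean fewer embedding calls at indexing time.
    print(len(chunk_text(doc, max_tokens=800)))  # 3 chunks
    print(len(chunk_text(doc, max_tokens=200)))  # 10 chunks
```

With a 2,000-token file, 800-token chunks need 3 embedding calls versus 10 at 200 tokens, which is the indexing-speed argument for the larger setting.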
-
We just added rerankers to the Sage repo. Use your favorite open-source rerankers from Hugging Face or plug in your Cohere, NVIDIA or Jina AI API key. On our real-world code retrieval dataset (more on this soon!), the benefit of rerankers is indisputable. Try them out at https://lnkd.in/eVw2GA7k
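A minimal sketch of the retrieve-then-rerank pattern, assuming precomputed embedding vectors and a reranker exposed as a plain callable `score(query, doc) -> float`. In practice that callable would wrap a Hugging Face cross-encoder or a Cohere, NVIDIA or Jina AI API call; here `overlap_score` is a toy stand-in, and all function names are hypothetical rather than Sage's API.

```python
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query: str,
    query_vec: Sequence[float],
    docs: list[str],
    doc_vecs: list[Sequence[float]],
    rerank_score: Callable[[str, str], float],
    top_k: int = 25,
    top_n: int = 5,
) -> list[str]:
    # Stage 1: cheap vector search keeps the top_k most similar chunks.
    candidates = sorted(
        range(len(docs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )[:top_k]
    # Stage 2: the (more expensive) reranker rescores only those candidates.
    reranked = sorted(
        candidates, key=lambda i: rerank_score(query, docs[i]), reverse=True
    )
    return [docs[i] for i in reranked[:top_n]]


def overlap_score(query: str, doc: str) -> float:
    """Toy stand-in for a real reranker: fraction of query words found in doc."""
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q) if q else 0.0
```

The split is the point of the design: the reranker only ever sees `top_k` candidates, so an expensive cross-encoder or paid API call stays affordable, while it can still reorder results that vector similarity ranked poorly.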
-
ChatGPT knows about most things in the world, but does it know about *you*? Our friends at Khoj AI (YC S23) are making LLMs more personal by giving them access to your notes. Before you worry about privacy, know that they're fully open source and you can run models locally! See https://lnkd.in/eyfzdUGr to learn more about Khoj's OSS repo!
-
Tired of chasing those GPUs? Our friends at Felafax AI (YC S24) are building AI infra for non-NVIDIA GPUs that is 2x more cost-efficient and performant. They're being good citizens of the world and open-sourcing their code. Check them out! Felafax repo: https://lnkd.in/ecXJUWbA Chat with Felafax repo: https://lnkd.in/erKffy5N
-
Need to give your LLM access to crucial business info? Our friends at Panora (YC S24) have open-sourced a unified API with a gazillion integrations, including CRMs (e.g. HubSpot), file storage (e.g. Google Drive) and more. Chat with their open-source codebase now at https://lnkd.in/eKPh5AZE.
-
Storia AI reposted this
When generating code that isn't just in-line autocomplete, context is everything. All the information about a codebase's history, the architecture diagrams, the technical design docs, the Slack conversations, etc., informs how to effectively generate code. Today we're taking an important step in contextual code understanding by indexing GitHub issues in our open-source repo2vec library. That means you can now chat with your codebase, and repo2vec will pull in relevant conversations from issues that can guide more accurate answers. Not gonna lie, it's pretty dope. https://lnkd.in/d9z8jRXA
-
Storia AI reposted this
By popular demand, we're adding the ability to index GitHub Issues in our open-sourced **repo2vec**. Coming soon to our hosted version too! GitHub Issues now surface as context in the conversation when relevant. For instance, here I'm talking to Hugging Face's `transformers` library, asking for an unsupported feature. Normally an LLM would hallucinate and try hard to make it happen. But having access to the issues, it can more confidently push back. And if you're in GitHub star-giving mode, you can scratch your itch at https://lnkd.in/eECQHCvf
-
How can you use state-of-the-art generative vision models? ComfyUI from Comfy Org is a powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface. It's one of the leading tools in image-based generative AI. https://lnkd.in/gE_yNP4J Now you can get help using ComfyUI by chatting with the repo on Storia Sage. https://lnkd.in/eewqNqbz
-
Open-source software makes the world go round. It's amazing how much software runs on the hard work of volunteers scattered around the globe. We want to highlight some open-source projects that are empowering individuals to build awesome applications. Stanza is the Stanford NLP Group's Python library for tokenization, sentence segmentation, named entity recognition (NER), and parsing of many human languages. It's a very powerful NLP framework. https://lnkd.in/dbu_RE8 Now you can get help using Stanza by chatting with the repo on Storia Sage. https://lnkd.in/eMhX_3A7