Weights & Biases’ Post

74,865 followers

2mo

Join us at the W&B Hackathon to build and improve LLM Judges. Whether you’re refining an existing model or creating a new annotation UI, this event is for AI Engineers who are ready to push the boundaries. Cash prizes and LLM API credits available. Register here: https://lnkd.in/g8VXptg7

2 Comments

Koyelia Ghosh Roy

Senior AVP - Transformational BI & Generative AI Leader @ EXL | 2024 3AI Pinnacle Award for Inspiring Women Leader | 2024 EmpowHer access award finalist by Women in Cloud | 2023 Role Model by Women in Cloud | Speaker

2mo

Is this an in-person hackathon?

1 Reaction

More Relevant Posts

Mohammed Arsalan

Posts on Generative AI | learner | Winner of Huggingface / Cohere / Machine Hack / Adobe global hackathons🏅 | Prompt engineer🦜 | Creator of Shaheen 🦅, Baith-al-suroor, meme world 🤗.
7mo
Octopus v2: On-device 📱 language model for super agent👮, a new method that empowers an on-device 2B model to outperform GPT-4 in both accuracy and latency, and decrease the context length by 95%. I was working on a digital assistant for visually impaired people for the upcoming Deutsche Telekom hackathon, and I think combining this with the GPT-4V model would be perfect for that use case. If anybody wants to team up, feel free to ping me. Paper 📄 - https://lnkd.in/gGVDvaPY
Brett Zane-Ulman
1mo
🔥 Join Gabe Monroy from Google, Sanjeev Hasiza of Sensormatic, and Jyoti Bansal at {unscripted}'s closing keynote for an insightful session. Be there when they explore how to measure developer productivity in an AI-driven landscape, the impact of generative AI on code generation and maintenance, and the broader role of AI in automating and enhancing the Software Development Life Cycle (SDLC). 🗓 9/25 📍 https://lnkd.in/gGiF93mq
Max Maio

Co-Founder at Midship YC S24 - Extract data from any document
2w
OpenAI's new model o1 is crazy powerful. We built an AI financial analyst that can make its own financial models depending on the input files.

Kieran Taylor

Co-founder at Midship | Extract data from any document | YC S24
2w

We built an AI financial analyst embedded in Excel for the OpenAI o1 hackathon. It takes a company's annual report, extracts the financial data and builds its own projections. Scary stuff.

1 Comment
Himanshu Srivastava

Building GSI Partner Business @ Harness | Modernizing Enterprise Software Delivery
1mo
🔥 Join Gabe Monroy from Google, Sanjeev Hasiza of Sensormatic, and Jyoti Bansal at {unscripted}'s closing keynote for an insightful session. Be there when they explore how to measure developer productivity in an AI-driven landscape, the impact of generative AI on code generation and maintenance, and the broader role of AI in automating and enhancing the Software Development Life Cycle (SDLC). 🗓 9/25 📍 https://lnkd.in/gcHZWEi4
George Boone

Enterprise Sales at Harness.io
1mo
🔥 Join Gabe Monroy from Google, Sanjeev Hasiza of Sensormatic, and Jyoti Bansal at {unscripted}'s closing keynote for an insightful session. Be there when they explore how to measure developer productivity in an AI-driven landscape, the impact of generative AI on code generation and maintenance, and the broader role of AI in automating and enhancing the Software Development Life Cycle (SDLC). 🗓 9/25 📍 https://lnkd.in/e_iwprMg
Doug May

SVP, Productivity at Harness | GTM Executive | Investor | Advisor
1mo
🔥 Join Gabe Monroy from Google, Sanjeev Hasiza of Sensormatic, and Jyoti Bansal at {unscripted}'s closing keynote for an insightful session. Be there when they explore how to measure developer productivity in an AI-driven landscape, the impact of generative AI on code generation and maintenance, and the broader role of AI in automating and enhancing the Software Development Life Cycle (SDLC). 🗓 9/25 📍 https://lnkd.in/ekUgskcS
Nicholas B. Garcia

Harness - AI/ML Leader in DevSecOps
1mo
🔥 Join Gabe Monroy from Google, Sanjeev Hasiza of Sensormatic, and Jyoti Bansal at {unscripted}'s closing keynote for an insightful session. Be there when they explore how to measure developer productivity in an AI-driven landscape, the impact of generative AI on code generation and maintenance, and the broader role of AI in automating and enhancing the Software Development Life Cycle (SDLC). 🗓 9/25 📍 https://lnkd.in/eBuqN-ih
Sam Schumacher

Data Scientist & ML Engineer
7mo
For data teams interested in hackathons, here is a cool side effect of multimodal LLMs: if the project is an internal tool, you never again need a dashboard as the artefact at the end of the day! With the graphing capabilities of an LLM, end users can work out the tool's usefulness for themselves, given a CSV and a subscription to GPT-4!
Ryan Crowe

Accelerate Software Delivery | Improve Developer Efficiency | Optimize your Cloud Investment
1mo
🔥 Join Gabe Monroy from Google, Sanjeev Hasiza of Sensormatic, and Jyoti Bansal at {unscripted}'s closing keynote for an insightful session. Be there when they explore how to measure developer productivity in an AI-driven landscape, the impact of generative AI on code generation and maintenance, and the broader role of AI in automating and enhancing the Software Development Life Cycle (SDLC). 🗓 9/25 📍 https://lnkd.in/g72D3nej
Ayşegül Güzel

HumaneIntelligence Fellow | Responsible & Sustainable AI | AI for Social Impact | Certified in Ethics of AI at LSE
3mo
Hugging Face has announced the new Open LLM Leaderboard, with changes to benchmark selection, score normalization in the evaluation, the interface, and the voting process for model selection, to name a few. Here are the reasons for the change, in their words:

Over the past year, the benchmarks we were using got overused/saturated:
- They became too easy for models. For instance, models are now reaching baseline human performance on HellaSwag, MMLU, and ARC, a phenomenon called saturation.
- Some newer models also showed signs of contamination. By this, we mean that models were possibly trained on benchmark data or on data very similar to benchmark data. As such, some scores stopped reflecting the general performance of the model and started to overfit on some evaluation datasets instead of reflecting the more general performance of the task being tested. This was, in particular, the case for GSM8K and TruthfulQA, which were included in some instruction fine-tuning sets.
- Some benchmarks contained errors. MMLU was recently investigated in depth by several groups (see MMLU-Redux and MMLU-Pro), which surfaced mistakes in its responses and proposed new versions. Another example was that GSM8K used a specific end-of-generation token (:), which unfairly pushed down the performance of many verbose models.

We thus chose to completely change the evaluations we are running for the Open LLM Leaderboard v2!

Check the full article here: https://lnkd.in/dvVtBViK

Open-LLM performances are plateauing, let’s make the leaderboard steep again - a Hugging Face Space by open-llm-leaderboard

huggingface.co
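
The post mentions new normalization techniques without describing them. As a rough sketch only, assuming normalization here means rescaling a raw accuracy so that the random-guess baseline maps to 0 and a perfect score maps to 100, it could look like the snippet below. The function name `normalize_score` and its parameters are hypothetical and are not taken from the leaderboard's code; see the linked article for the actual method.

```python
def normalize_score(raw: float, random_baseline: float, max_score: float = 1.0) -> float:
    """Rescale a raw accuracy to a 0-100 range relative to a random baseline.

    Hypothetical helper for illustration; an assumption, not the leaderboard's code.
    """
    if max_score <= random_baseline:
        raise ValueError("max_score must be greater than random_baseline")
    # Scores at or below random guessing carry no signal, so clip them to 0.
    if raw <= random_baseline:
        return 0.0
    # Linear rescaling: baseline maps to 0, max_score maps to 100.
    return 100.0 * (raw - random_baseline) / (max_score - random_baseline)


# A four-choice multiple-choice benchmark has a random baseline of 0.25.
print(normalize_score(0.625, 0.25))  # 50.0
print(normalize_score(0.20, 0.25))   # 0.0
```

Under this assumption, a model at 62.5% raw accuracy on a four-choice task sits only halfway between guessing and perfect, which makes scores on tasks with different numbers of answer choices easier to compare.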