GPT-4o struggles with debugging its own code; it often repeats the same incorrect solution without improvement. For coding tasks, consider other LLMs such as Anthropic's Claude 3 Opus or Claude 3.5 Sonnet, which perform substantially better. Despite being trained for real-time voice conversations, GPT-4o also seems less effective at long-form, multi-turn conversations than older GPT models, and its function calling is actually worse than GPT-4 Turbo's.
-
Founder at The Burgeon Group | Operations Consultant for SMEs | Driving Efficiency & Innovation for Growth | Open to Leadership Roles in Operational Excellence
What an insightful course. My brain is rushing with operational applications for custom GPTs. I just finished the course “Build Your Own GPTs” by Alina Zhang! Check it out: https://lnkd.in/dE4HVsCx #chatbotdevelopment
-
We created a benchmark called ProcBench, where the task is simply to follow instructed procedures. The tasks are relatively simple for humans, but LLMs become increasingly error-prone as the number of steps grows: even top-tier models like o1-preview show significant performance drops as procedure complexity increases. Since these tasks illuminate a critical weakness of current LLMs, it will be fascinating to tackle them, and solving them might be an important step toward AGI. It is intriguing to see whether simply scaling LLMs up can eventually get there, or whether we need a new paradigm and approach. "ProcBench: A Benchmark for Procedural Reasoning in Large Language Models" by Fujisawa et al. https://lnkd.in/gu7Ezx-f
ProcBench: Benchmark for Multi-Step Reasoning and Following Procedure (arxiv.org)
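To make the failure mode concrete, here is a hypothetical ProcBench-style task in Python: the model must apply a short list of explicit editing steps in order, and scoring is exact match against the deterministically computed result. The task format and helper names are illustrative, not the paper's actual schema.

```python
# Hypothetical sketch of a ProcBench-style procedural task: apply explicit
# string-editing steps in order. More steps means more chances to drift.
def apply_procedure(s: str, steps: list[tuple[str, str]]) -> str:
    """Ground truth: apply each (op, arg) step in order."""
    for op, arg in steps:
        if op == "append":
            s = s + arg
        elif op == "delete":      # remove every occurrence of arg
            s = s.replace(arg, "")
        elif op == "reverse":
            s = s[::-1]
    return s

def exact_match(model_answer: str, start: str, steps) -> bool:
    """Score a model's final answer against the deterministic target."""
    return model_answer == apply_procedure(start, steps)

# Three steps a human follows easily, each a chance for an LLM to slip:
steps = [("append", "xyz"), ("reverse", ""), ("delete", "y")]
print(apply_procedure("abc", steps))  # -> "zxcba"
```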
-
Spend Analysis. AI in Procurement. Digital Procurement. I specialize in creating advanced Procurement Centre of Excellence setups.
48 hours of using GPT-4o, a few things I was not expecting (caution: I am an average user and do not have access to the desktop app or the early-beta features you saw in the demo):

1) 4o is bad at coding. It fails to remember the constraints you provide; I got better debugging assistance from GPT-4. But don't be naive in your assumptions and expect a fully functional solution from these models. They are good at small code blocks in small batches. You are the engineer and the brain! I am back to GPT-4 for this.

2) Since the introduction of 4o, both 4 and 4o have become quite slow. Output arrives in chunks and often lags. This could be because of heavy traffic and the new multimodal input capabilities, but it is slow at the moment.

I haven't noticed significantly better output yet. Maybe it is just my use cases (largely coding-related), but the incremental improvement isn't noticeable so far. I'll keep stress-testing this for procurement use cases and share my feedback! Supernegotiate
-
Professor, Founder & CEO of Orditus, an AI startup. Developer of Chatlize.ai, RTutor.ai, iDEP & ShinyGO. Topics: AI, Data Science, Bioinformatics
For coding, the real gem is perhaps o1-mini. It outperforms GPT-4o at a lower cost. The downsides are speed and a slightly smaller context window. o1-preview is far too slow and costly.
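As a quick way to try it yourself, here is a minimal sketch using the OpenAI Python SDK. It assumes OPENAI_API_KEY is set in the environment; note that at launch the o1-series models rejected system messages and sampling parameters such as temperature, so the request below sticks to a single user message (check the current docs for model-specific restrictions).

```python
# Minimal sketch: sending a coding task to o1-mini via the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-mini",
    messages=[
        {
            "role": "user",  # o1-series models initially rejected system messages
            "content": "Write a Python function that merges two sorted lists "
                       "in O(n) time, with a short docstring.",
        }
    ],
)

print(response.choices[0].message.content)
```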
-
I recently started using the DALL·E integration in GPT for storyboarding our projects, and it's proving to be extremely useful. A single-sentence prompt with a few keywords produced something more than just usable for our proof of concept. Our DP for the job referenced in the image, Jack Leahy, also remarked, 'it's annoying how good that looks'. The biggest struggle I had generating 'realistic' images even a few months ago was learning how to structure my prompts in a way the software would understand; that obstacle seems to be getting smaller and smaller. The really cool thing is that the generated image follows the basic framework for capturing something interesting: backlit, reflections of light, leading lines, atmosphere, etc. The ability to visualize your ideas so early in the creative process is very powerful.
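For anyone who wants the same workflow outside ChatGPT, here is a sketch of generating a single storyboard frame through the DALL·E 3 API via the OpenAI Python SDK. The prompt text is illustrative, echoing the cinematography cues mentioned above; it assumes OPENAI_API_KEY is set.

```python
# Sketch: generating one storyboard frame with the DALL-E 3 images API.
from openai import OpenAI

client = OpenAI()

result = client.images.generate(
    model="dall-e-3",
    prompt=(
        "Storyboard frame: backlit figure on a rain-soaked street at dusk, "
        "reflections of neon light, strong leading lines, atmospheric haze, "
        "cinematic widescreen composition"
    ),
    size="1792x1024",  # DALL-E 3 supports 1024x1024, 1792x1024, 1024x1792
    n=1,               # DALL-E 3 generates one image per request
)

print(result.data[0].url)  # hosted URL for the generated frame
```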
-
Claude 3.5 Sonnet destroys GPT-4 at writing code, and it's not even close. Faster. Fewer errors. Less verbose. Higher-quality code. To compare multiple models yourself: https://bit.ly/3WwtFo6. You get GPT-4, GPT-4o, GPT-4o mini, Sonnet, and Gemini in the same place.
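If you'd rather run the comparison from code than through a shared UI, here is a sketch that sends the same coding prompt to Claude 3.5 Sonnet and GPT-4o through their official Python SDKs. It assumes ANTHROPIC_API_KEY and OPENAI_API_KEY are set, and the model IDs may have newer snapshots by the time you run it.

```python
# Sketch: a do-it-yourself side-by-side of the same coding prompt.
import anthropic
from openai import OpenAI

PROMPT = "Implement an LRU cache in Python with O(1) get and put."

claude = anthropic.Anthropic().messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    messages=[{"role": "user", "content": PROMPT}],
)

gpt = OpenAI().chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": PROMPT}],
)

print("--- Claude 3.5 Sonnet ---")
print(claude.content[0].text)
print("--- GPT-4o ---")
print(gpt.choices[0].message.content)
```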
-
Technologist & Human. Trusted Implementer, Advisor, Consultant, and Coach for Profit, People, and Planet. Interested in sustainability, responsible use of technology in generative ai and renewables.
In terms of which models to use for code conversion, Claude 3 is becoming a favorite. I still use Copilot in context for individual debugging and questions, but for the simple act of converting code, Claude is the most amenable to following instructions. For this task, I'm not noticing any serious hit using Sonnet (medium) vs. Opus (large). GPT-4 is more opinionated about how it chooses to convert the code, and in this particular use case that is not what I want.
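Here is a sketch of what a constrained conversion request might look like through the Anthropic Python SDK. The system prompt pins down the "follow instructions, don't editorialize" behavior described above; the source snippet and wording are illustrative, and ANTHROPIC_API_KEY is assumed to be set.

```python
# Sketch: a tightly constrained code-conversion request to Claude 3 Sonnet.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

source = """
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}
"""

response = client.messages.create(
    model="claude-3-sonnet-20240229",
    max_tokens=1024,
    system=(
        "Convert the given JavaScript to Python. Preserve names, structure, "
        "and behavior exactly. Do not refactor, add features, or comment on "
        "the code. Output only the converted code."
    ),
    messages=[{"role": "user", "content": source}],
)

print(response.content[0].text)
```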
-
Finally, some public OCR datasets! This is very helpful for training document models and making character recognition a problem of the past. We've seen the speed that open-source LLMs brought to generative writing and code fixing; it won't be long before we can parse documents even from dead languages! https://lnkd.in/dFnfevYR
Pablo Montalvo (@m_olbap) on X (twitter.com)
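If the datasets land on the Hugging Face Hub, pulling them into a training pipeline is a one-liner with the `datasets` library. The dataset ID below is a placeholder, not one named in the linked post; substitute a real OCR corpus.

```python
# Hypothetical sketch: loading a public OCR dataset from the Hugging Face Hub.
from datasets import load_dataset

ds = load_dataset("some-org/public-ocr-dataset", split="train")  # placeholder ID
sample = ds[0]
print(sample.keys())  # typically an image plus its ground-truth transcription
```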
-
I know: Elon bad. But look at this: "An early version of Grok-2 has been tested on the LMSYS leaderboard under the name 'sus-column-r.' At the time of this blog post, it is outperforming both Claude 3.5 Sonnet and GPT-4-Turbo."
Grok-2 Beta Release (x.ai)
-
👨‍💻 Tech Whisperer | 🚀 Exploring the digital frontier, one line of code at a time | 💡 Innovator at heart | 🤖 AI aficionado | #TechLife
Large Language Models in Code Generation: Overcoming Common Bugs and Improving Accuracy *** One effective method to enhance code accuracy in large language models (LLMs) involves introducing a self-critique mechanism. This iterative process allows LLMs to analyze their generated code, identify errors based on a detailed bug taxonomy, and correct them using compiler feedback. Implementing this approach can significantly reduce bugs and increase the passing rate of generated code. *** https://lnkd.in/eVbG7KS7
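Here is a minimal sketch of the self-critique loop described above: generate code, compile it, and feed any error back to the model for revision. The `generate(prompt)` callable is a stand-in for whatever LLM call you use, and `compile()` checks syntax only, so a real pipeline would also run unit tests and apply the bug taxonomy when classifying failures.

```python
# Minimal self-critique loop: compiler feedback drives iterative repair.
import textwrap

def self_critique_loop(task: str, generate, max_rounds: int = 3) -> str:
    prompt = f"Write a Python function for this task:\n{task}"
    code = generate(prompt)
    for _ in range(max_rounds):
        try:
            compile(code, "<generated>", "exec")  # compiler feedback signal
            return code                            # passes the syntax check
        except SyntaxError as err:
            # Hand the compiler's complaint back to the model and retry.
            prompt = textwrap.dedent(f"""
                Your previous code failed to compile.
                Error: {err}
                Code:
                {code}
                Return a corrected version, code only.
            """)
            code = generate(prompt)
    return code  # best effort after max_rounds revisions

if __name__ == "__main__":
    # Dummy model that fixes its mistake on the second try, for illustration.
    answers = iter(["def f(:\n    pass", "def f():\n    pass"])
    print(self_critique_loop("no-op function", lambda _p: next(answers)))
```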