Iris.ai reposted this
Recently I was impressed by a paper from Apple that tests how well LLMs handle formal logical reasoning and how they perform on mathematical tasks. https://lnkd.in/dh56qnyn 💡 The paper introduces a new benchmark, GSM-Symbolic, and reveals that even minor variations in mathematical questions can reduce accuracy by up to 65%. 🧠 This finding is interesting because it highlights some pitfalls in how LLMs are trained and tested, especially on reasoning tasks, and it shows that even the most powerful models still have issues with formal languages. We at Iris.ai have a clear focus on properly evaluating LLMs for business cases, and we definitely see a lot of limitations in current benchmarks. Hopefully we will soon have much more comprehensive measurement mechanisms that provide an environment for further development of LLMs. #AI #MachineLearning #LLMs #Mathematics #Innovation #Technology