Logic Contrastive Reasoning with Lightweight Large Language Model for Math Word Problems
D. Kai, M. Zhenguo, Y. Xiaoran. arXiv preprint arXiv:2409.00131, 2024. arxiv.org
This study focuses on improving the performance of lightweight Large Language Models (LLMs) in mathematical reasoning tasks. We introduce a novel method for measuring mathematical logic similarity and design an automatic screening mechanism to construct a set of reference problems that integrate both semantic and logical similarity. By employing carefully crafted positive and negative example prompts, we guide the model towards adopting sound reasoning logic. To the best of our knowledge, this is the first attempt to utilize retrieval-enhanced generation for mathematical problem-solving. Experimental results demonstrate that our method achieves a 15.8% improvement over the Chain of Thought approach on the SVAMP dataset and a 21.5% improvement on the GSM8K dataset. Further application of this method to a large-scale model with 175 billion parameters yields performance comparable to the best results on both aforementioned datasets. Finally, we conduct an analysis of errors during the reasoning process, providing valuable insights and directions for future research on reasoning tasks using large language models.
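To make the abstract's pipeline concrete, the following is a minimal, hypothetical sketch of the screening idea it describes: rank candidate reference problems by a weighted blend of semantic and logical similarity, then place the best match as a positive example and the worst as a negative (contrastive) example in the prompt. Everything here (the Jaccard similarity measures, the blending weight `alpha`, and the toy corpus) is an illustrative assumption, not the authors' actual implementation.

```python
# Hypothetical sketch of retrieval-based contrastive prompt construction.
# All names, weights, and data below are illustrative assumptions.

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def score(query_text, query_ops, cand_text, cand_ops, alpha=0.5):
    """Blend semantic (word-overlap) and logical (operator-overlap) similarity.

    The paper presumably uses stronger measures (e.g. embeddings for
    semantics); plain Jaccard keeps this sketch self-contained.
    """
    sem = jaccard(set(query_text.lower().split()), set(cand_text.lower().split()))
    logic = jaccard(set(query_ops), set(cand_ops))
    return alpha * sem + (1 - alpha) * logic

# Tiny illustrative corpus: (problem text, operators used in its solution).
corpus = [
    ("Tom has 3 apples and buys 4 more. How many apples now?", ["+"]),
    ("A shirt costs $20 and is discounted 25%. What is the price?", ["*", "-"]),
    ("Sara has 10 candies and gives away 4. How many left?", ["-"]),
]

query = ("Amy has 5 apples and buys 2 more. How many apples now?", ["+"])

# Automatic screening: rank reference problems by the blended score.
ranked = sorted(corpus, key=lambda c: score(query[0], query[1], c[0], c[1]),
                reverse=True)
positive, negative = ranked[0], ranked[-1]

# Contrastive prompt: show sound reasoning logic to imitate and unsound
# (logically dissimilar) reasoning to avoid.
prompt = (
    f"Positive example (similar logic):\n{positive[0]}\n\n"
    f"Negative example (different logic):\n{negative[0]}\n\n"
    f"Now solve:\n{query[0]}"
)
print(prompt)
```

In this toy run the addition problem is retrieved as the positive example because it matches the query on both wording and operator set, while the discount problem, sharing neither, falls to the bottom as the negative example.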