OpenAI Unveils New ChatGPT Model Significantly Better at Math and Reasoning
New AI Model “o1” Scores 83% on International Mathematical Olympiad Qualifying Exam, Showcasing Major Advances in Problem Solving and Logical Reasoning.
Experts caution against overreliance on AI, urging users to fact-check despite improvements in reasoning and math capabilities.
Seattle, WA — On September 12, OpenAI introduced a new version of ChatGPT, called “o1,” that the company says is far better at handling mathematical and scientific problems. Previous iterations of ChatGPT, though known for their linguistic prowess, often struggled with logic and reasoning tasks. According to OpenAI, the new model scored 83% on the International Mathematical Olympiad (IMO) qualifying exam, up from the 13% scored by earlier models.
The release of this model addresses a key criticism of previous AI models: their inability to reason effectively through problems, particularly in math and logic-heavy contexts. OpenAI’s chief scientist, Jakub Pachocki, explained to The New York Times that the new model takes an approach that could be described as “thinking through the problem,” a shift in how AI tackles complex questions.
Niloofar Mireshghallah, a postdoctoral scholar at the University of Washington’s Paul G. Allen School of Computer Science & Engineering, spoke with UW News to share insights into why previous models struggled with reasoning tasks and what makes this new iteration different.
“There are two main reasons AI models had trouble with tasks like math,” Mireshghallah explained. “First, next-word prediction, the foundation of these models, isn’t great for figuring out complex rules or principles like those used in math. Second, commonsense reasoning—something we take for granted, like knowing a fridge door shouldn’t be left open—isn’t something these models can easily learn from existing text.”
These limitations were evident in prior versions of ChatGPT, where even simple word problems posed a challenge. With the introduction of “chain-of-thought reasoning,” however, the new model takes a step-by-step approach, akin to a student showing their work on a math problem. Instead of producing an immediate answer, the AI now breaks the problem down into parts, which allows it to identify potential errors in its reasoning process.
Pachocki pointed to this feature as a game-changer for AI, saying, “This model can take its time, think through the problem, and look for angles to provide the best answer.”
Mireshghallah elaborated, explaining that this approach, known as “test-time computation,” allows the AI to perform more internal reasoning. “Big companies used to improve models by scaling them up in size and training data. But there’s only so much pre-training data, and making models bigger isn’t always the answer,” she said. “Now, the focus is on test-time reasoning, where the model breaks problems into smaller steps, examines each one, and refines its answers.”
Consider how the model might tackle a simple word problem: “If Sally has three apples and gives two away, how many does she have left?” While a basic AI might provide only the final answer of “1 apple,” the new model would outline its reasoning as follows:
- Sally starts with 3 apples
- She gives away 2 apples
- Subtract: 3 - 2 = 1
- Therefore, Sally has 1 apple left.
This transparent reasoning process makes it easier to spot mistakes and refine answers—a crucial improvement for solving multi-step problems, analyzing scenarios, and handling tasks that require logical deductions.
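For readers who think in code, the step-by-step pattern in the Sally example can be mimicked with a short Python sketch. This is only a toy illustration of writing down and checking intermediate steps. It is not how o1 works internally, and the function name `solve_with_steps` and its parameters are hypothetical, chosen for this example.

```python
def solve_with_steps(name, start, given_away):
    """Solve a 'gives away' word problem, recording each reasoning step.

    Hypothetical helper for illustration only; it mirrors the bullet
    list in the article and is not an OpenAI API.
    """
    steps = [
        f"{name} starts with {start} apples",
        f"{name} gives away {given_away} apples",
    ]

    remaining = start - given_away
    steps.append(f"Subtract: {start} - {given_away} = {remaining}")

    # Check the intermediate result before committing to an answer:
    # adding the apples back must recover the starting count, and the
    # remaining count cannot be negative.
    if remaining < 0 or remaining + given_away != start:
        raise ValueError("reasoning check failed; the steps are inconsistent")

    unit = "apple" if remaining == 1 else "apples"
    steps.append(f"Therefore, {name} has {remaining} {unit} left.")
    return steps, remaining


if __name__ == "__main__":
    steps, answer = solve_with_steps("Sally", 3, 2)
    for step in steps:
        print("-", step)
```

Because every step is recorded, a reader (or a checking routine like the one above) can see exactly where a wrong answer would have gone astray, rather than only seeing the final number.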
Despite these advancements, Mireshghallah urged caution, noting that while the new model is more sophisticated, it is not infallible. “Yes, the responses are better, but there are still failure modes,” she said. “People should continue to fact-check the outputs and not be fooled by the model’s ability to ‘think’ and take its time.”
As AI technology continues to evolve, the new ChatGPT model marks a significant step forward in enhancing reasoning capabilities. Yet, experts like Mireshghallah remind us that while AI is improving, it still requires careful oversight to avoid errors.
For now, the tech world awaits further developments in this rapidly advancing field, with AI companies like OpenAI pushing the boundaries of what machines can achieve in human-like reasoning and problem-solving.