Research shows that ChatGPT struggles with newer coding problems and cannot yet replace programmers

Since ChatGPT was introduced in 2022, there has been a lot of talk about AI chatbots having the potential to replace humans in some jobs. Some tech experts believe that AI chatbots will take over certain human jobs, such as coding, while others argue that the technology will never be as intelligent as humans and will simply help them do their jobs better. Some tech industry leaders have even predicted that human programmers will not be needed at all within a few years, as AI takes over coding. But is that really true? A recent study suggests it is not necessarily the case.

The study, published in the June issue of IEEE Transactions on Software Engineering, compared code produced by ChatGPT with code written by human programmers, focusing on functionality, complexity, and security. The study found that ChatGPT’s success rate in producing functional code varied widely. Depending on the difficulty of the task, the programming language used, and other factors, the AI’s success ranged from as little as 0.66 percent to as much as 89 percent. This wide range suggests that while ChatGPT can sometimes match or even outperform human programmers, it also has significant limitations.

Yutian Tang, a lecturer at the University of Glasgow involved in the study, noted that AI-based code generation can increase productivity and automate some software development tasks. However, it is crucial to understand both the strengths and weaknesses of these AI models. Tang emphasized the need for comprehensive analysis to identify potential problems and improve AI code generation techniques.

To delve into these limitations, the research team tested GPT-3.5’s ability to solve 728 coding problems from the LeetCode platform in five programming languages: C, C++, Java, JavaScript, and Python. The study found that ChatGPT was quite adept at solving pre-2021 coding problems on LeetCode, achieving success rates of around 89 percent for easy problems, 71 percent for medium problems, and 40 percent for hard problems.

However, ChatGPT's performance dropped significantly on coding problems introduced after 2021. For example, its success rate on easy problems fell from 89 percent to 52 percent, and on hard problems it fell from 40 percent to just 0.66 percent. This suggests that ChatGPT struggles with newer coding problems, potentially because its training data does not cover these newer challenges.
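The paper's own evaluation harness is not reproduced here, but the bookkeeping behind figures like these is simple to illustrate. The Python sketch below shows one hypothetical way to tally per-difficulty success rates for problems before and after a training-data cutoff; the record format, field names, and sample data are illustrative assumptions, not taken from the study.

```python
from collections import defaultdict

# Hypothetical records: each entry notes a problem's difficulty, the year it
# appeared, and whether the AI-generated solution passed all test cases.
# The data is made up purely to show how rates like "89 percent on easy
# problems" are tallied.
results = [
    {"difficulty": "easy", "year": 2019, "passed": True},
    {"difficulty": "easy", "year": 2023, "passed": False},
    {"difficulty": "medium", "year": 2020, "passed": True},
    {"difficulty": "hard", "year": 2022, "passed": False},
]

def success_rates(records, cutoff_year=2021):
    """Group results by (era, difficulty) and return the share that passed."""
    totals = defaultdict(lambda: [0, 0])  # (era, difficulty) -> [passed, attempted]
    for r in records:
        era = f"before {cutoff_year}" if r["year"] < cutoff_year else f"{cutoff_year} and later"
        bucket = totals[(era, r["difficulty"])]
        bucket[0] += r["passed"]
        bucket[1] += 1
    return {key: passed / attempted for key, (passed, attempted) in totals.items()}

print(success_rates(results))
# e.g. {('before 2021', 'easy'): 1.0, ('2021 and later', 'easy'): 0.0, ...}
```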

Tang proposed a reasonable hypothesis for ChatGPT's variable performance. He suggested that the AI performs better on algorithm problems from before 2021 because those problems are more likely to be included in its training dataset. As coding has evolved, ChatGPT has not been exposed to newer problems and their solutions, and it lacks the critical thinking skills a programmer would use to work them out. This limitation means that while ChatGPT can effectively solve problems it has encountered before, it struggles with newer, unfamiliar ones.

The study’s findings suggest that while AI models like ChatGPT promise to boost productivity and automate some coding tasks, they are not yet a substitute for human programmers. AI’s difficulty with newer coding problems underscores the need for ongoing development and training to keep up with the ever-evolving field of software engineering.

Posted by: Divyanshi Sharma

Published: July 8, 2024