
How to optimize ChatGPT and other LLMs for software engineering

[Image: robot coding assistant, generated with Bing Image Creator]

This article is part of our review of the latest artificial intelligence research.

Since the release of ChatGPT, software engineers and organizations have been looking for ways to leverage large language models (LLMs) to increase productivity. There are many examples of LLMs generating code for complex problems, but not enough information on how they are integrated into the software development process.

In a recent study, a team of researchers from Chalmers University of Technology, the University of Gothenburg, and the Swedish research institute RISE observed 24 professional software engineers from 10 different companies who used ChatGPT in their daily tasks for a week. Their findings provide important insights into the types of tasks software engineers use ChatGPT for and the factors that influence their experience.

The findings have important implications for companies looking to integrate LLMs into their workflows.

Software engineering tasks for ChatGPT

The study, which includes an overview of chat sessions with ChatGPT 3.5 and a survey, shows that software engineers use LLMs for three main categories of tasks:

Code generation and modification: This category, which researchers call “artifact manipulation,” includes tasks such as code generation, refactoring, and repair. Artifact manipulation accounts for approximately one third of ChatGPT interactions. These interactions are typically short because users either get the results they want quickly or give up trying. However, longer dialogues occur when users persistently try to obtain sources from ChatGPT or correct errors in the generated solutions.

Expert consultations: Engineers often ask ChatGPT for resources, instructions, advice, or detailed information to help them with work tasks. The purpose of these interactions is not to obtain a specific solution, but rather to receive a nudge in the right direction. In these interactions, ChatGPT serves as a virtual collaborator or a more productive alternative to searching the Internet. Consultations accounted for 62% of the interactions of software engineers who participated in the study.

Tips and learning: Software engineers sometimes use ChatGPT to gain broader theoretical or practical knowledge related to their professional tasks. These dialogues constitute a small part of the interaction, but often include many follow-up questions to clarify previous answers.

Advantages and disadvantages

ChatGPT’s greatest strength was helping software engineers learn new concepts. Querying the model’s internalized knowledge through conversation is much easier than searching for resources on the internet.

Study participants also used ChatGPT to assist them in their brainstorming sessions. LLMs can help generate many alternative solutions and ideas that can be valuable in the planning and design stages of software development.

On the other hand, some participants stated that they did not trust the generated artifacts, especially for complex and company-specific tasks. This lack of trust often led to careful double-checking of any suggestions provided by ChatGPT, which could be counterproductive.

Another significant problem is the lack of context. LLMs do not know company-specific information and must be provided with this context. Compiling and including this context in the prompt adds friction that degrades the user experience. In some cases, privacy concerns and company policies prevent engineers from sharing detailed information, which can lead to frustration and incomplete interactions.

Another important trade-off of using ChatGPT that is less frequently mentioned in other studies is reduced team communication and concentration. Participants sometimes used the chatbot to answer questions that could be better directed to a colleague. ChatGPT can also reduce focus because engineers may spend too much time refining prompts to generate perfectly working code, rather than fixing slightly flawed results themselves.

What does this mean for businesses?

If you’re employing software engineers, improving their productivity with an LLM depends on strengthening the benefits and minimizing the trade-offs. This study was performed on ChatGPT 3.5, which falls far short of current frontier models such as GPT-4o and Claude 3 Opus. Current models have broader knowledge and are much better at avoiding hallucinations and false information.

However, when it comes to enterprise applications, there are several problems that even frontier models cannot solve. One of them is context: no matter how much training data a model sees, it will know nothing about your company’s proprietary information.

Having chat interfaces that automatically provide contextual information to the model as engineers interact with them will play a key role in taking the user experience to the next level. This can be done in several ways, including retrieval-augmented generation (RAG), where relevant contextual information is automatically added to the user’s prompt before it is sent to the model. Alternatively, the LLM can be integrated into the user’s IDE, where it automatically uses the code and project files as context when answering questions.
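
The RAG flow described above can be sketched in a few lines. This is a minimal illustration, not a production implementation: the keyword-overlap retrieval is a stand-in for the vector search real RAG systems use, and the document snippets are invented examples.

```python
# Minimal RAG sketch: retrieve the most relevant internal documents for a
# query, then prepend them to the prompt before sending it to the model.
# Real systems replace this keyword-overlap scoring with embedding search.
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    query_words = set(query.lower().split())
    # Rank documents by how many query words they share.
    ranked = sorted(
        documents,
        key=lambda d: len(query_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
```

The key point is that the engineer never assembles the context by hand; the interface injects it automatically before the model is called.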

Another problem that needs to be solved is restrictions on privacy and data sharing. Many companies have strict policies regarding the type of information that can be shared with third parties. This may limit the types of interactions engineers can have with LLMs such as ChatGPT. One solution is to use open models like Llama 3, as the open-source community has made impressive progress alongside private models. You can run them on your own servers, integrate them with your infrastructure, and be sure that data never leaves your organization.
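
As a rough sketch of what self-hosting looks like in practice, the snippet below builds a request to a locally hosted Llama 3 through an Ollama-style HTTP endpoint, so prompts stay inside the organization’s network. The URL and model name are deployment-specific assumptions, not values from the study.

```python
import json
import urllib.request

# Assumed local inference endpoint (Ollama-style); adjust to your deployment.
LOCAL_LLM_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build an HTTP request to a self-hosted model; nothing leaves the network."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        LOCAL_LLM_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# Sending is left to the caller, e.g.:
# with urllib.request.urlopen(build_request("Summarize this diff: ...")) as resp:
#     print(json.loads(resp.read())["response"])
```

Because the endpoint lives on your own infrastructure, the same strict data policies that block ChatGPT do not apply.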

Another issue raised in the study is the effort engineers put into prompt engineering. The way you phrase your request and arrange your instructions has a significant impact on LLM performance. Reducing the friction of prompt engineering can improve the user experience and save the time engineers spend interacting with LLMs. An impressive move in this regard is Anthropic’s prompt generator, which automatically creates an optimized prompt for the task you want to perform. Another example is OPRO, a technique developed by DeepMind that automatically optimizes prompts.
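
The idea behind automatic prompt optimization can be shown with a toy loop: score candidate instructions against some quality signal and keep the best. This is only a schematic in the spirit of OPRO; the hand-written candidates and keyword-based scorer below stand in for the LLM that OPRO actually uses both to propose new prompts and to evaluate them on a task set.

```python
# Toy sketch of automatic prompt optimization: generate candidate prompts,
# score each one, keep the best. In OPRO the candidates come from an LLM and
# the score is task accuracy; here both are mocked for illustration.
def score(prompt: str) -> int:
    # Stand-in metric: reward instructions known to help in practice.
    helpful_phrases = ("step by step", "concise", "cite")
    return sum(phrase in prompt.lower() for phrase in helpful_phrases)

def optimize(candidates: list[str]) -> str:
    """Return the highest-scoring candidate prompt."""
    return max(candidates, key=score)
```

A real optimizer would iterate: feed the best prompts and their scores back to the proposer model to generate better candidates for the next round.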

Finally, the study mentioned decreased concentration due to the use of ChatGPT. This challenge can be alleviated to some extent by incorporating the LLM into teamwork. An interesting example is Glue, a new enterprise chat app that adds LLM as an agent to discussion threads. Moving from an isolated LLM experience to including an agent in group conversations can yield some really interesting results.

There is no doubt that LLMs will be great tools – but not replacements – for software engineers. Creating the right scaffolding to use them will ensure you get the most out of them.