
How to improve productivity with GenAI and multi-agent workflows

In 2024, enterprise software producers are focusing on generative artificial intelligence to increase productivity. OpenAI recently released GPT-4o, which includes voice and image interpretation and generation. Ravi Sawhney discusses how to incorporate this technology into end-user workplace tools and introduces the concept of multi-agent workflows, an approach that allows organizations to emulate entire knowledge teams.


In 2021, I wrote an article for LSE Business Review in which I demonstrated the power of OpenAI’s GPT-3 to interpret human language by converting it to code. At the time, the technology was in its infancy and didn’t create the spark that ChatGPT did when it was released to the public in November 2022. That moment truly marked the beginning of the generative artificial intelligence (GenAI) boom. Below are some personal thoughts on the importance of GenAI and the challenges of using it at work. I also introduce the concept of multi-agent workflows as a way to expand what this technology can do.

It’s all about productivity

In 2024, almost all enterprise software vendors are betting on GenAI, which has perhaps taken some focus away from existing machine learning approaches such as supervised and unsupervised learning, even though these remain key components of any complete AI environment. The reason organizations are doing this relates to what drew me to the technology in the first place: productivity.

In its most basic form, GenAI can be considered the most powerful autocomplete technology we have ever seen. The ability of large language models (LLMs) to predict the next word is so strong that they can step in and perform knowledge-worker tasks such as classification, editing, summarization, and question answering, as well as content creation.

Additionally, variations of this technology can operate across different modalities, much like the human senses, including interpretation and generation of voice and images. Indeed, in 2024 the nomenclature is shifting from LLMs to Large Multimodal Models (LMMs), as evidenced by OpenAI’s recent release of GPT-4o. Whether deployed in an advisory, human-in-the-loop, or fully automated decision-making mode, it is easy to see how GenAI can deliver transformational productivity gains in knowledge work. A recent article on this topic estimated that GenAI, when used to automate tasks, could increase labor productivity by 3.3 percentage points per year, adding $4.4 trillion to global GDP.

These productivity gains perhaps bring us closer to Keynes’s aspirations in “Economic Possibilities for our Grandchildren” (1930), in which he predicted that within a hundred years, thanks to technological advances that improve living standards, we would all be able to work 15-hour weeks. This view was echoed by Nobel laureate in economics Sir Christopher Pissarides, who stated that ChatGPT could herald a four-day working week.

So if the potential for a significant shift in the way we work is right in front of us and growing at a breakneck pace, how do we bridge the gap to make that opportunity a reality?

Trust and tools

When incorporating this technology into end-user workplace tools, there are two common challenges to consider. Probably the biggest is managing the issue of trust. By default, LLMs do not have access to your private information, so asking about a very specific support issue with your product will usually yield a confident but inaccurate answer, commonly referred to as a “hallucination.” Fine-tuning an LLM on your own data is one option, albeit an expensive one given the hardware requirements. A much more affordable method that has become common in the community is retrieval-augmented generation (RAG). Here, relevant passages of your private data are injected into the prompt at query time, using the power of embeddings to search your data for content related to the question. The answer is then synthesized from this retrieved context along with the LLM’s existing knowledge, producing something genuinely useful, albeit with appropriate guidance for the user.
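As a rough illustration of the retrieval step, here is a minimal sketch in Python. The documents, the bag-of-words “embedding,” and the prompt template are all toy assumptions for the sake of a self-contained example; a production RAG system would use a learned embedding model and a vector store.

```python
from collections import Counter
from math import sqrt

# Hypothetical private documents an LLM would not know about.
DOCS = [
    "To reset your password, open Settings and choose Security.",
    "Invoices are emailed on the first business day of each month.",
    "The API rate limit is 100 requests per minute per key.",
]

def embed(text):
    """Toy bag-of-words vector; real systems use learned embedding models."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    """Rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query):
    """Inject the retrieved context into the prompt sent to the LLM."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("What is the API rate limit?")
```

The key point is that the model never needs to be retrained: the private data travels inside the prompt, and the LLM synthesizes its answer from that context.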

The second challenge is math. Although an LLM, with a little careful prompting, can create a unique, compelling, and (importantly) persuasive story from scratch, depending on the underlying model it can struggle with even elementary- to intermediate-level math. This is where the community introduced the concept of tools, sometimes called agents. In this paradigm, the LLM categorizes the question being asked and, rather than trying to answer it directly, invokes the appropriate “tool” for the task at hand. For example, if asked about the weather outside, it can call a weather API. If asked to perform a calculation, it will redirect the query to a calculator API. And if it needs to retrieve information from a database, it can convert the request to SQL or Pandas, execute the resulting code in a sandbox environment, and return the result to a user who may be none the wiser about what is going on under the hood.
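The routing idea can be sketched in a few lines of Python. Here a crude keyword classifier stands in for the LLM’s tool-selection step, and the calculator and weather tools are stubs of my own invention rather than real APIs:

```python
import re

# Hypothetical tool implementations; in production these would call real services.
def calculator(query):
    # Keep only arithmetic characters, then evaluate the expression.
    expr = re.sub(r"[^0-9+\-*/(). ]", "", query)
    return str(eval(expr))  # a real system would use a proper sandbox, not eval

def weather(query):
    return "22C and sunny"  # stub standing in for a weather API call

def route(query):
    """Crude classifier standing in for the LLM's tool-selection step."""
    if re.search(r"\d\s*[+\-*/]\s*\d", query):
        return ("calculator", calculator(query))
    if "weather" in query.lower():
        return ("weather", weather(query))
    return ("llm", "answered directly by the model")

tool, answer = route("What is 17 * 3?")  # routed to the calculator tool
```

In a real deployment the classification itself is done by the LLM (via function-calling or a similar mechanism), but the shape is the same: decide which tool fits, invoke it, and hand the result back to the user.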

The potential of multi-agent workflows

Agent-based structures and tools expand the possibilities of using LLMs to solve real problems. However, they are still largely unable to perform complex knowledge-based tasks because of limitations such as a lack of memory and limited planning and reasoning abilities. Multi-agent structures provide an opportunity to address some of these challenges. A good way to understand how they might work is to contrast System 1 and System 2 thinking, popularized by Daniel Kahneman.

Think of System 1 as your instincts: fast, automatic, and intuitive. In the LLM world, this is similar to a model’s ability to generate human-like responses based on massive training data. In contrast, System 2 thinking is slower, more deliberate, and logical, representing the model’s ability for structured, step-by-step problem solving and reasoning.

To unlock the full potential of LLM, we need to develop techniques that leverage the capabilities of both System 1 and System 2. By breaking down complex tasks into smaller, manageable steps, we can guide LLM toward more structured and reliable problem solving, much like humans solve challenges.

Consider a team of agents, each assigned a specific role through prompt engineering, working together toward a single goal. This is essentially what agentic workflows, sometimes called multi-agent workflows, do with LLMs. Each agent is responsible for a specific subtask and communicates with the others, passing information and results back and forth until the entire task is completed. By designing prompts that encourage logical reasoning, step-by-step problem solving, and collaboration between agents, we can create a system that mimics the deliberate, rational thinking associated with System 2.
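The message-passing structure described above can be sketched as follows. The agent roles and their behavior are illustrative stubs (plain Python functions standing in for LLM calls with role prompts), so the coordination pattern is visible without any model in the loop:

```python
# Minimal sketch of an agentic workflow: each "agent" is a role plus a step
# function. In a real system each function would be an LLM call with a
# role-specific prompt; the names and tasks here are hypothetical.

def drafter(task, _memory):
    """First agent: produce an initial draft for the task."""
    return f"DRAFT: {task}"

def reviewer(draft, _memory):
    """Second agent: critique and sign off on the draft."""
    return draft + " [reviewed: ok]"

def run_workflow(task):
    memory = []                      # shared conversation log (long-term memory)
    message = task
    for agent in (drafter, reviewer):
        message = agent(message, memory)   # pass the result to the next agent
        memory.append((agent.__name__, message))
    return message, memory

result, log = run_workflow("summarise Q3 support tickets")
```

Each agent consumes the previous agent’s output and appends to a shared log, which is exactly the mechanism that later allows conversation results to be saved and retrieved as long-term memory.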

Here’s where it gets exciting: agentic workflows could allow us to emulate entire knowledge teams. Imagine a virtual team of AI agents, each with its own specialization within the workflow, collaborating to solve problems and make decisions just as a human team would. This could revolutionize the way we work, enabling us to tackle more complex challenges with zero or minimal human supervision. It also opens up the possibility of simulating how teams respond to events in a sandbox environment, with each team member modeled as an agent in the workflow. Conversation results can even be saved for later retrieval, serving as long-term memory.

By combining the fast, intuitive power of System 1 with the structured reasoning of System 2, we can create artificial intelligence systems that not only generate human-like responses but can also handle more complex tasks and reason their way toward solutions. The future of work is here, powered by the symbiosis of human ingenuity and artificial intelligence.


  • Author’s disclaimer: All views expressed are my own.
  • This blog post represents the views of the author and not the position of the LSE Business Review or the London School of Economics and Political Science.