AI vs financial analysts: Who is better at predicting earnings of public companies?

The research, which is preliminary and requires validation, raises questions about “whether financial analysts will continue to be the backbone of informed decision-making in financial markets,” says Valeri V. Nikolaev, professor of accounting at the University of Chicago, who co-authored the study.

“It’s very important to have a human in the loop, of course,” Nikolaev says, but AI has the potential to be more than a “support tool” aiding financial analysts.

Instead, he says, large language models, or LLMs, could be “more centrally located in the decision-making process, taking the driver’s seat, and the human is looking over the shoulder of the (AI) model.”

How do they compare?

To compare LLMs with human analysts, researchers created standardized forms for companies’ balance sheets and income statements and had OpenAI’s GPT-4 Turbo analyze them. The LLM was tasked with mimicking a financial analyst by computing many of the ratios used to evaluate stocks, including operating efficiency, liquidity and leverage ratios.
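As a rough illustration of the kind of ratio work described above, here is a minimal Python sketch that computes one common ratio from each of the three families the article names. The function name, the field names and the choice of ratios are assumptions made for illustration; the article does not say which ratios the model computed or how.

```python
def basic_ratios(
    current_assets: float,
    current_liabilities: float,
    total_liabilities: float,
    shareholders_equity: float,
    revenue: float,
    total_assets: float,
) -> dict[str, float]:
    """Compute three textbook ratios from balance-sheet and income-statement items.

    All parameter names are hypothetical; they are not taken from the study.
    """
    return {
        # Liquidity: ability to cover short-term obligations.
        "current_ratio": current_assets / current_liabilities,
        # Leverage: how much of the firm is financed by debt.
        "debt_to_equity": total_liabilities / shareholders_equity,
        # Operating efficiency: sales generated per dollar of assets.
        # Simplification: uses end-of-period assets, not the period average.
        "asset_turnover": revenue / total_assets,
    }


ratios = basic_ratios(
    current_assets=200.0,
    current_liabilities=100.0,
    total_liabilities=300.0,
    shareholders_equity=150.0,
    revenue=500.0,
    total_assets=1000.0,
)
```

The sketch does no input checking, so a zero denominator (for example, a firm with no current liabilities) would raise an error.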

One concern was that the pretrained LLM could determine a company’s identity based on publicly available documents, allowing it to know whether earnings were higher or lower the following year. To guard against that, researchers removed company names and identifying information from the statements and replaced the specific years mentioned with nonnumeric characters and symbols.
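The masking step can be pictured with a short Python sketch. It is only one plausible way to do it: the function name, the "the Company" placeholder and the relative labels "t" and "t-1" are assumptions, not details confirmed by the article.

```python
import re

# Matches four-digit years from 1900 to 2099 that stand alone as a token.
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def anonymize_statement(text: str, company_names: list[str]) -> str:
    """Mask company names and swap calendar years for relative labels.

    Hypothetical helper: the placeholder and label scheme are illustrative.
    Simplification: any standalone number that looks like a year (say, a
    line item of exactly 2021) would also be relabeled.
    """
    # Replace every known company name with a neutral placeholder.
    for name in company_names:
        text = re.sub(re.escape(name), "the Company", text, flags=re.IGNORECASE)

    # Rank the distinct years found, most recent first: t, t-1, t-2, ...
    years = sorted({int(y) for y in YEAR_PATTERN.findall(text)}, reverse=True)
    labels = {str(y): ("t" if i == 0 else f"t-{i}") for i, y in enumerate(years)}

    # Swap each year for its relative label.
    return YEAR_PATTERN.sub(lambda m: labels[m.group(0)], text)


sample = "Acme Corp balance sheet 2021 vs 2020. Acme Corp revenue: 500"
masked = anonymize_statement(sample, ["Acme Corp"])
```

Relative labels keep the ordering of the reporting periods, which the analysis needs, while hiding the calendar dates that could tie a statement to a known company.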

Based solely on analysis of balance sheets and income statements, the LLM was asked to predict whether individual companies would post higher or lower earnings in the subsequent year. In addition, the LLM was asked to characterize the magnitude of any change in earnings as small, moderate or large. Last, the LLM was told to rate how confident it was in each of its predictions.

In total, the LLM analyzed the financial statements of 15,401 companies from 1968 to 2021. Researchers then compared the LLM’s predictions with those of human financial analysts covering 3,152 companies from 1983 to 2021. The team calculated the median earnings forecast from multiple analysts for each stock and compared that figure with the LLM’s predictions.

Analysis revealed that the LLM correctly predicted whether a company’s earnings would grow or shrink in the subsequent year 60.35% of the time. By comparison, when human analysts made their predictions within a month of a company posting its annual financial documents, they had an accuracy rate of just 52.71%, the paper said.
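The comparison above comes down to two small calculations: turning several analyst forecasts into one consensus call, and counting how often a set of calls matched what happened. The Python sketch below shows both under stated assumptions. The function names are hypothetical, and treating a forecast equal to current earnings as a "decrease" is an arbitrary choice for illustration.

```python
from statistics import median


def consensus_direction(forecasts: list[float], current_earnings: float) -> str:
    """Direction implied by the median analyst forecast.

    Assumption: a median forecast equal to current earnings counts as a decrease.
    """
    return "increase" if median(forecasts) > current_earnings else "decrease"


def directional_accuracy(predicted: list[str], actual: list[str]) -> float:
    """Share of predictions whose direction matched the realized direction."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equal in length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(predicted)


# Three analysts forecast next year's earnings per share; this year's was 1.10.
call = consensus_direction([1.0, 1.2, 1.5], current_earnings=1.1)

# Four made-up predictions scored against four made-up outcomes.
accuracy = directional_accuracy(
    ["increase", "decrease", "increase", "increase"],
    ["increase", "increase", "increase", "decrease"],
)
```

The same accuracy function can score either the model's calls or the analysts' consensus calls, which is what makes the two figures directly comparable.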

Human analysts’ accuracy rates improve as they update their predictions over the course of the year, taking into account the latest information, but they still fail to match those of the LLM, researchers found. For example, three months after a company releases its annual income statement and balance sheet, human analysts have an accuracy rate of 55.95% when predicting whether earnings will grow or shrink in the coming year.

The LLM’s predictions weren’t updated throughout the year, but they still proved more accurate. That performance is noteworthy since its predictions are based on financial data alone, whereas analysts can incorporate into their predictions “specific narrative contexts,” such as comments from management, researchers said.

For now, that broader context enables human analysts to outperform the LLM when evaluating smaller, money-losing firms, the paper said. Relevant factors unavailable to the LLM include knowledge of the industry and the regulatory, political and macroeconomic environments.

When times are tough

The LLM’s accuracy rate dips during times of economic shock, such as the 1974 oil shortage, the 2008 financial crisis and the Covid-19 pandemic, the paper said. Since the LLM’s predictions are based solely on financial statements, human analysts are more accurate when external factors loom large, the researchers said.

Researchers used the LLM to build theoretical stock portfolios that returned an average of 12% a year in some models, outperforming the overall market. Based on the LLM’s confidence in its predictions, researchers bought the top 10% of stocks that were projected to see moderate or large earnings growth. They also sold short the bottom 10% of stocks expected to see moderate or large decreases in earnings.
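One way to read the selection rule described above is sketched below in Python. The sketch ranks stocks by the model's stated confidence and takes the most confident slice of each group. The class and function names are hypothetical, and taking 10% of each qualifying pool (rather than of all stocks) is an assumption, since the article does not spell out the base.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    ticker: str
    direction: str     # "increase" or "decrease"
    magnitude: str     # "small", "moderate" or "large"
    confidence: float  # model's self-rated confidence, 0.0 to 1.0


def pick_legs(preds: list[Prediction], fraction: float = 0.10) -> tuple[list[str], list[str]]:
    """Return (long_tickers, short_tickers) for a hypothetical long-short portfolio."""

    def most_confident(direction: str) -> list[str]:
        # Keep only moderate or large predicted changes in the given direction.
        pool = [
            p for p in preds
            if p.direction == direction and p.magnitude in ("moderate", "large")
        ]
        pool.sort(key=lambda p: p.confidence, reverse=True)
        # Take the top slice by confidence, at least one stock if any qualify.
        n = max(1, int(len(pool) * fraction)) if pool else 0
        return [p.ticker for p in pool[:n]]

    return most_confident("increase"), most_confident("decrease")


preds = [
    Prediction("A", "increase", "large", 0.90),
    Prediction("B", "increase", "small", 0.95),  # excluded: small change
    Prediction("C", "increase", "moderate", 0.60),
    Prediction("D", "decrease", "moderate", 0.80),
    Prediction("E", "decrease", "large", 0.70),
]
long_leg, short_leg = pick_legs(preds, fraction=0.5)
```

The sketch covers stock selection only. Weighting, rebalancing and the return calculation behind the reported figures are not described in the article and are left out.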

“Taken together, our results suggest that GPT can outperform human analysts by performing financial-statement analysis even without any specific narrative contexts,” researchers said in the paper. “Given that GPT outperforms human analysts in predicting future earnings, this finding raises the question of whether an LLM can largely replace a median human analyst.”

Nick Fortuna is a writer in Ocala, Fla.