A Statistically Significant Test Suggests That OpenAI’s GPT-4 Turbo Is Particularly Lazy Over the Winter Break

This is not investment advice. The author has no position in any of the stocks mentioned. Wccftech.com has a disclosure and ethics policy.

Don’t ask OpenAI’s most cutting-edge Large Language Model (LLM), GPT-4 Turbo, to perform exhaustive tasks over the winter holidays. That is the conclusion one can comfortably draw from a recent statistically significant test conducted by an LLM enthusiast.

OpenAI claims that GPT-4 Turbo can handle highly complicated tasks within a single prompt, courtesy of its more extensive training. The model can also process 128,000 tokens at a time thanks to its expanded context window, a measure of how much input and output a particular LLM can handle. As a refresher, 1,000 tokens are roughly equivalent to 750 words. This means that OpenAI’s latest offering can process an input of around 96,000 words.
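Under the rough 1,000-tokens-per-750-words rule of thumb cited above, the 128,000-token window works out to about 96,000 words. A quick sketch of that arithmetic (the function name is illustrative, not part of any official API):

```python
# Rough rule of thumb from the article: 1,000 tokens ~ 750 words.
WORDS_PER_TOKEN = 750 / 1_000  # 0.75

def tokens_to_words(tokens: int) -> int:
    """Approximate word capacity for a given token budget."""
    return int(tokens * WORDS_PER_TOKEN)

print(tokens_to_words(128_000))  # GPT-4 Turbo's context window -> 96000
```

In practice the tokens-per-word ratio varies by language and content, so this is only a ballpark estimate.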

Recently, Rob Lynch, an LLM enthusiast, put GPT-4 Turbo through its proverbial paces. To his surprise, the LLM produced shorter responses when it was led to believe that the current month was December than when it was prompted to believe it was May.
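Lynch’s exact test harness isn’t reproduced here, but the general shape of such an experiment is to send the same task twice, varying only the date claimed in the system prompt. A minimal sketch, in which the system-prompt wording and the sample task are illustrative assumptions rather than Lynch’s actual prompts:

```python
def build_messages(month_name: str, task: str) -> list[dict]:
    """Build a chat payload whose system prompt asserts the current month.

    The date framing and task wording here are hypothetical; they only
    illustrate the kind of A/B prompt pair such a test would compare.
    """
    return [
        {"role": "system",
         "content": f"You are a helpful assistant. The current month is {month_name}."},
        {"role": "user", "content": task},
    ]

# One identical task, two claimed months: only the date framing differs.
task = "Write detailed unit tests for this module."
may_run = build_messages("May", task)
dec_run = build_messages("December", task)
```

Each payload would then be sent to the model many times, with the lengths of the completions compared across the two conditions.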

Specifically, when GPT-4 Turbo was prompted to believe the current month was May, Lynch obtained an average output of 4,298 tokens over 477 test runs. When it was told the month was December, the LLM gave a significantly shorter mean output of 4,086 tokens, a decrease of around 5 percent.
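The roughly 5 percent figure follows directly from the two reported means:

```python
may_mean_tokens = 4_298       # reported mean output over 477 "May" runs
december_mean_tokens = 4_086  # reported mean output over 477 "December" runs

# Relative shortfall of the December output versus the May baseline.
drop_pct = (may_mean_tokens - december_mean_tokens) / may_mean_tokens * 100
print(f"{drop_pct:.1f}% shorter in December")  # prints "4.9% shorter in December"
```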

Commenting on the likely cause of this discrepancy, Ethan Mollick, a professor at Wharton, believes that GPT-4 Turbo learned from the human tendency to do less work in holiday-heavy December. This suggests that, despite exhaustive efforts to keep harmful human biases out of these LLMs, they remain susceptible to inheriting some of the quirkier human shortcomings through their training data.

This development comes on the heels of reports that OpenAI’s GPT model has been growing progressively lazier, resorting to shortcuts instead of giving complete answers to queries. Some users have reportedly even pretended to have a disability to coax complete answers out of the LLM. The situation is apparently dire enough that OpenAI is trying to come up with a hotfix.
