Alibaba’s Latest A.I. Beats GPT-3.5, Claude In Multiple Benchmark Tests

This is not investment advice. The author has no position in any of the stocks mentioned. The publisher has a disclosure and ethics policy.

With 2024 marking a strong start to the global artificial intelligence race, Chinese technology giant Alibaba Group has also announced the latest iteration of its Qwen artificial intelligence model. Apart from OpenAI’s ChatGPT, which is the most well known A.I. chatbot in the world, other models, such as Meta’s Llama and Amazon-backed Anthropic’s Claude, give consumers and businesses several options when choosing an A.I. platform for their needs.

Alibaba’s latest Qwen iteration is Qwen 1.5, and according to benchmarks shared on the social media platform X, the model beats both ChatGPT and Claude in some benchmark scores.

Alibaba’s Qwen 1.5 Beats Claude and ChatGPT On Multiple Benchmarks Testing Instructional Fluidity

Just like operating systems that run on computers or smartphones, an artificial intelligence model is also a piece of software. This allows software engineers and analysts to evaluate its performance, and when it comes to Alibaba’s latest Qwen 1.5, some scores show that it outperforms Anthropic’s Claude and OpenAI’s ChatGPT.

Benchmarks that test operating systems evaluate their ability to process instructions and run applications, while those for artificial intelligence models typically test the models’ ability to generate high quality outputs.

Two such benchmarks are MT-Bench and AlpacaEval, and scores shared on X show that a variant of Alibaba’s Qwen 1.5 has surpassed ChatGPT and Claude in them. MT-Bench tests a model’s ability to answer a set of predefined questions that not only seek to differentiate it from a simple chatbot but also try to determine if the model can ‘hold its ground’ in a tough conversational setting that involves two parties rapidly engaging with each other.

The benchmark scores show that Qwen was the fourth-highest scorer on MT-Bench, lagging behind only GPT-4 Turbo and the first two GPT-4 releases, namely versions 0613 and 0314.
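The multi-turn scoring described above can be sketched in a few lines. This is a toy illustration, not the official MT-Bench harness: it assumes a judge model has already graded each of a question’s two turns on a 1–10 scale, and that the overall score is a plain average across all turns and questions.

```python
# Toy sketch of MT-Bench-style scoring (illustrative, not the official harness).
# Assumption: a judge model grades each turn of a two-turn answer from 1 to 10,
# and the final score is the average over all turn grades across all questions.

def mt_bench_score(judged_turns):
    """judged_turns: list of (turn1_score, turn2_score) pairs, one per question."""
    all_scores = [score for pair in judged_turns for score in pair]
    return sum(all_scores) / len(all_scores)

# Hypothetical judge grades for three two-turn questions:
judgments = [(9, 8), (7, 6), (10, 9)]
print(round(mt_bench_score(judgments), 2))  # prints 8.17
```

The second-turn grades are what capture the ‘hold its ground’ aspect: they measure whether the model stays coherent when the conversation builds on its first answer.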

AlpacaEval is a benchmark that uses a reference model to emulate human interactions and determine the extent to which an A.I. model being tested delivers results in line with the baseline. It also provides users with a leaderboard to track their tests, and today’s benchmarks show that Qwen 1.5’s AlpacaEval performance lags behind only GPT-4 Turbo and Chinese startup 01.AI’s Yi-34B.
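The comparison against a baseline boils down to a win rate. The sketch below is a toy version of that idea, not the official AlpacaEval tool: it assumes an automatic judge has already decided, per instruction, whether the tested model’s answer beat the reference model’s answer, and it counts ties as half a win (an assumption for illustration).

```python
# Toy sketch of an AlpacaEval-style win rate (illustrative, not the official tool).
# Assumption: a judge has labeled each instruction "win", "loss", or "tie" for the
# tested model versus the reference model; ties count as half a win here.

def win_rate(preferences):
    """preferences: list of 'win', 'loss', or 'tie' judgments, one per instruction."""
    score = sum(1.0 if p == "win" else 0.5 if p == "tie" else 0.0 for p in preferences)
    return score / len(preferences)

verdicts = ["win", "win", "tie", "loss", "win"]  # hypothetical judgments
print(win_rate(verdicts))  # prints 0.7
```

A leaderboard like AlpacaEval’s then simply ranks models by this fraction, which is why a single number lets Qwen 1.5 be placed directly behind GPT-4 Turbo and Yi-34B.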

Qwen 1.5 is one of the largest open source models of its kind, and it’s backed by Alibaba’s massive computing resources. An open source A.I. model, like open source software, makes its code and weights available to users and developers so that they can understand the model and build their own variants. Meta’s Llama, also present in today’s scores, is an open source model as well.
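In practice, open-weight availability means a developer can pull such a model with a few lines of the widely used Hugging Face `transformers` library. The checkpoint name and the ChatML prompt format below are assumptions about how Alibaba publishes Qwen 1.5 chat models; any open model id on the Hub follows the same pattern, and in real use `tokenizer.apply_chat_template` would build the prompt for you.

```python
# Hedged sketch: loading an open-weight chat model from the Hugging Face Hub.
# The checkpoint id and ChatML format are assumptions about Qwen 1.5's release.

def build_chatml_prompt(user_message: str) -> str:
    """Builds a single-turn prompt in the ChatML format (assumed here for Qwen)."""
    return (
        "<|im_start|>user\n" + user_message + "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

if __name__ == "__main__":
    from transformers import AutoModelForCausalLM, AutoTokenizer  # pip install transformers

    model_id = "Qwen/Qwen1.5-7B-Chat"  # assumed checkpoint name on the Hub
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(build_chatml_prompt("What is MT-Bench?"), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

This ease of access is the practical difference from closed models like GPT-4 or Claude, which can only be reached through their vendors’ hosted APIs.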

The start of 2024 has seen renewed focus on A.I. from Wall Street and companies alike. Earnings reports of mega cap technology giants such as Meta, Microsoft and Alphabet have all focused on A.I. Meta’s chief Mark Zuckerberg aims to buy hundreds of thousands of GPUs this year to power Llama, and on the firm’s earnings call the executive explained that his decision to beef up Meta’s computing capacity follows earlier oversights that left the firm under capacity.

Similarly, earnings from chip maker TSMC and chip designer AMD have also seen their managements express optimism for the future of A.I. TSMC’s management is confident that the firm is on stable footing to capture any A.I. demand, while AMD believes the A.I. chip market could end up being worth hundreds of billions of dollars by the end of the decade.
