Artificial Analysis


Independent analysis of AI models and hosting providers: https://artificialanalysis.ai/

About us

Leading provider of independent analysis of AI models and providers. Understand the AI landscape to choose the best AI technologies for your use case.

Website
https://artificialanalysis.ai/
Industry
Technology, Information and Internet
Company size
11-50 employees
Type
Privately Held

Updates

  • Artificial Analysis · 6,542 followers

    Thanks for the support, Andrew Ng! We completely agree: faster token generation will become increasingly important as a greater proportion of output tokens is consumed by models, such as in multi-step agentic workflows, rather than being read by people.

    Andrew Ng (Founder of DeepLearning.AI; Managing General Partner of AI Fund; Exec Chairman of Landing AI):

    Shoutout to the team that built https://lnkd.in/g3Y-Zj3W . Really neat site that benchmarks the speed of different LLM API providers to help developers pick which models to use. This nicely complements the LMSYS Chatbot Arena, Hugging Face open LLM leaderboards and Stanford's HELM that focus more on the quality of the outputs. I hope benchmarks like this encourage more providers to work on fast token generation, which is critical for agentic workflows!

    Model & API Providers Analysis | Artificial Analysis
    artificialanalysis.ai

  • Anthropic’s Claude 3.5 Haiku release is a significant jump in intelligence from Claude 3 Haiku, but its higher price makes it a tricky choice for developers.

    Claude 3.5 Haiku now achieves an Artificial Analysis Quality Index of 69, substantially above Claude 3 Haiku’s 54 and just below Claude 3 Opus’ 70. However, Anthropic has quadrupled per-token pricing from Claude 3 Haiku, leaving Claude 3.5 Haiku nearly 10x more expensive than Google’s latest Gemini 1.5 Flash and OpenAI’s GPT-4o mini. Our initial speed measurements also show Anthropic's Claude 3.5 Haiku API delivering ~2x slower output speeds than Claude 3 Haiku. While a range of factors can drive both speed and pricing, we would speculate that these changes indicate Claude 3.5 Haiku is a larger model than Claude 3 Haiku.

    Seeing Haiku achieve near Opus-level intelligence a mere 8 months since the original launch of the Claude 3 family is incredible, but it enters a highly competitive market. See below for pricing comparisons, initial speed results, our full benchmark breakdown and a link to our analysis 👇

  • Revealing red_panda as Recraft V3, the new frontier Text to Image model!

    Recraft V3 is the latest model from London-based AI graphic design start-up Recraft. Since launching Recraft V3 under the pseudonym ‘red_panda’, we have received >100k votes. With an ELO of 1172, users are preferring Recraft V3 to every other model on the Artificial Analysis leaderboard. The Artificial Analysis Image Arena now has over one million votes, and its ELO score is the leading independent metric for comparing image generation models. See below for example images comparing Recraft V3 to other leading image models 👇

  • Initial results from our AI video generation model arena are in! With almost 20k votes, we now have an initial ranking of video generation models:

    🥇 MiniMax's Hailuo is the clear leader with an ELO of 1092 and a win rate of 67%
    🥈 Genmo's Mochi 1, released last week, takes the silver and is the leading open-source video generation model
    🥉 Runway, a long-time leader in the video generation space, takes bronze with Runway Gen 3 Alpha, which has an ELO of 1051 and a win rate of 61%

    The Video Arena compares video generation models across a wide variety of prompts. Each model has unique strengths, so we encourage you to test them based on your specific use case. Link below to contribute to the Artificial Analysis Video Arena 👇 After 30 votes you will also be able to see your own personalized ranking of the video generation models; feel free to share yours in the comments.
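For readers curious how arena-style ELO scores relate to win rates like those above, here is a minimal sketch of Elo updating from pairwise votes. This is illustrative only: the K-factor, the 1000-point starting rating and the model names are our assumptions, not Artificial Analysis's published methodology.

```python
# Toy Elo updater for a crowdsourced arena: each head-to-head vote
# nudges the winner's rating up and the loser's down, with the size of
# the nudge depending on how surprising the result was.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, a_won: bool, k: float = 16.0):
    """Return updated (r_a, r_b) after one vote. Zero-sum by design."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    delta = k * (s_a - e_a)
    return r_a + delta, r_b - delta

# Everyone starts equal; simulate votes where model_x wins 2 of every 3
# matchups. Its rating should settle roughly 120 points above model_y's
# (the gap at which the Elo model predicts a ~67% win rate).
ratings = {"model_x": 1000.0, "model_y": 1000.0}
for i in range(3000):
    a_won = (i % 3) != 0
    ratings["model_x"], ratings["model_y"] = update(
        ratings["model_x"], ratings["model_y"], a_won
    )
```

A 67% win rate corresponding to a gap of roughly 120 Elo points follows directly from the expected-score formula: 400 * log10(2) ≈ 120.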

  • Inference optimization techniques can cause different prompts to run at different speeds.

    For example, speculative decoding uses a smaller draft model to generate speculative tokens for an LLM to verify. One implication is that ‘simple’ prompts can achieve even faster speeds than normal or harder prompts! This occurs when a higher proportion of the draft model’s output tokens are accepted as correct by the target model. Below, you can see that a prompt with simpler output tokens (repeating the Gettysburg Address) achieves a much higher output speed than a more complex prompt.

  • Cerebras has launched a major upgrade and is now achieving >2,000 output tokens/s on Llama 3.1 70B, >3x their prior speeds.

    This is a dramatic new world record for language model inference. Cerebras Systems' language model inference offering runs on their custom "wafer scale" AI accelerator chips. Cerebras had previously achieved speeds in this range for Llama 3.1 8B and is now delivering them with a much larger model. We have independently benchmarked Cerebras’ updated offering and can confirm that we have observed no quality degradation in the latest version of the API. We understand that Cerebras is achieving these speeds through a range of optimizations throughout their inference stack, including speculative decoding, an inference optimization technique that uses a smaller draft model to generate speculative tokens for an LLM to verify. Speculative decoding does not impact quality when implemented correctly.

  • Tesla discloses in their Q3 earnings deck that they will have a 50k H100 cluster at Gigafactory Texas by the end of October.

    Putting this in context, Tesla’s new H100 cluster will be larger than the rumoured sizes of the clusters used to train current frontier language models. Tesla’s cluster would likely be able to complete the original GPT-4 training run (~3 months on ~25k A100s) in less than three weeks.
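The "less than three weeks" figure can be reproduced with a back-of-envelope calculation. Only the A100 figures come from the rumoured GPT-4 training run cited above; the ~2.5x effective H100-over-A100 training speedup is our assumption, and real-world utilization would shift the answer.

```python
# Back-of-envelope: how long would the rumoured GPT-4 training run take
# on a 50k H100 cluster? All inputs are rough, publicly rumoured or
# assumed figures, not disclosed numbers.

a100_count = 25_000
a100_days = 90                                  # ~3 months
total_a100_days = a100_count * a100_days        # 2,250,000 A100-days of compute

h100_count = 50_000
h100_speedup = 2.5                              # assumed A100-equivalents per H100
a100_equivalents = h100_count * h100_speedup    # 125,000 effective A100s

days_needed = total_a100_days / a100_equivalents
print(f"~{days_needed:.0f} days")               # ~18 days, under three weeks
```

Even with a more conservative 2x speedup assumption, the run would finish in about 22.5 days, so the sub-three-week claim is sensitive to the assumed H100 advantage.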

  • Stability AI released Stable Diffusion 3.5 yesterday. Below are comparisons showing how Stable Diffusion has improved in the year since SDXL's release in July 2023.

    We have also added Stability AI's Stable Diffusion 3.5 and its Turbo variant to our Image Arena. Our Image Arena crowdsources preferences to understand and compare the quality of image models; currently we have >800k preferences submitted. See Stable Diffusion 3.5 in our Image Arena, link below 👇

  • Anthropic’s Claude 3.5 Sonnet leapfrogs GPT-4o, takes back the frontier and extends its lead in coding.

    Our independent quality evals of yesterday's Claude 3.5 Sonnet (Oct 2024) upgrade confirm a 3-point improvement in Artificial Analysis Quality Index vs. the original June release. The improvement is reflected across evals, particularly in coding and math capabilities. This makes Claude 3.5 Sonnet (Oct 2024) the top-scoring model that does not require generating reasoning tokens before beginning to produce useful output (i.e. excluding OpenAI’s o1 models). With no apparent regressions and no changes to pricing or speed, we generally recommend an immediate upgrade from the earlier version of Claude 3.5 Sonnet. Maybe next time Claude 3.5 Sonnet (Oct 2024) can suggest incrementing the version number - 3.6? See below for further analysis 👇

  • Announcing Artificial Analysis Video Arena - the first crowdsourced comparison for Text to Video models.

    Text to Video models are accelerating rapidly and crossing quality thresholds every month. We created Video Arena to compare them using the only source of truth for visual media - human preference! Video Arena includes hundreds of videos from the leading video models, including:
    - Runway Gen 3 Alpha
    - Pika 1.5
    - Luma AI's Dream Machine
    - MiniMax / Hailuo AI
    - KLING AI Video's Kling 1.0
    - Zhipu AI's CogVideoX-5B

    Voting is open now and we’ll be announcing the first leaderboard results within 24 hours. Any predictions? In the meantime, after 30 votes you can see your own ‘personal leaderboard’ of how you’ve ranked the video models. Contribute to the Video Arena! 🔗 https://lnkd.in/gXbjAjFE
