Artificial Analysis


Independent analysis of AI models and hosting providers: https://artificialanalysis.ai/

About us

Leading provider of independent analysis of AI models and providers. Understand the AI landscape to choose the best AI technologies for your use case.

Website
https://artificialanalysis.ai/
Industry
Technology, Information and Internet
Company size
11-50 employees
Type
Privately Held

Updates

  • Artificial Analysis · 6,542 followers

    Thanks for the support, Andrew Ng! We completely agree: faster token generation will become increasingly important as a greater proportion of output tokens is consumed by models, such as in multi-step agentic workflows, rather than being read by people.

    Andrew Ng (Founder of DeepLearning.AI; Managing General Partner of AI Fund; Exec Chairman of Landing AI):

    Shoutout to the team that built https://lnkd.in/g3Y-Zj3W . Really neat site that benchmarks the speed of different LLM API providers to help developers pick which models to use. This nicely complements the LMSYS Chatbot Arena, Hugging Face open LLM leaderboards and Stanford's HELM that focus more on the quality of the outputs. I hope benchmarks like this encourage more providers to work on fast token generation, which is critical for agentic workflows!

    Model & API Providers Analysis | Artificial Analysis
    artificialanalysis.ai

  • Anthropic’s Claude 3.5 Haiku release is a significant jump in intelligence from Claude 3 Haiku, but its higher price makes it a tricky choice for developers.

    Claude 3.5 Haiku now achieves an Artificial Analysis Quality Index of 69, substantially above Claude 3 Haiku’s 54 and just below Claude 3 Opus’ 70. However, Anthropic has quadrupled per-token pricing from Claude 3 Haiku, leaving Claude 3.5 Haiku nearly 10x more expensive than Google’s latest Gemini 1.5 Flash and OpenAI’s GPT-4o mini. Our initial speed measurements also show Anthropic's Claude 3.5 Haiku API delivering ~2x slower output speeds than Claude 3 Haiku. While a range of factors can drive both speed and pricing, we would speculate that these changes indicate Claude 3.5 Haiku is a larger model than Claude 3 Haiku.

    Seeing Haiku achieve near Opus-level intelligence a mere 8 months since the original launch of the Claude 3 family is incredible, but it enters a highly competitive market. See below for pricing comparisons, initial speed results, our full benchmark breakdown and a link to our analysis 👇

  • Revealing red_panda as Recraft V3, the new frontier Text to Image model!

    Recraft V3 is the latest model from London-based AI graphic design start-up Recraft. Since launching Recraft V3 under the pseudonym ‘red_panda’, we have received >100k votes. With an ELO of 1172, users are preferring Recraft V3 to every other model on the Artificial Analysis leaderboard. The Artificial Analysis Image Arena now has over one million votes, and its ELO score is the leading independent metric for comparing image generation models. See below for example images comparing Recraft V3 to other leading image models 👇

  • Initial results from our AI video generation model arena are in! With almost 20k votes, we now have an initial ranking of video generation models:

    🥇 MiniMax's Hailuo is the clear leader with an ELO of 1092 and a win rate of 67%
    🥈 Genmo's Mochi 1, released last week, takes the silver and is the leading open-source video generation model
    🥉 Runway, a long-time leader in the video generation space, takes bronze with Runway Gen 3 Alpha, which has an ELO of 1051 and a win rate of 61%

    The Video Arena compares video generation models across a wide variety of prompts. Each model has unique strengths, so we encourage you to test them based on your specific use case. Link below to contribute to the Artificial Analysis Video Arena 👇 After 30 votes you will also be able to see your own personalized ranking of the video generation models; feel free to share yours in the comments.
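For readers curious how arena-style ELO scores relate to win rates like those above, here is a minimal sketch of Elo updating from pairwise votes. This is illustrative only: the K-factor, the 1000-point starting rating and the model names are our assumptions, not Artificial Analysis's published methodology.

```python
# Toy Elo updater for a crowdsourced arena: each head-to-head vote
# nudges the winner's rating up and the loser's down, with the size of
# the nudge depending on how surprising the result was.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, a_won: bool, k: float = 16.0):
    """Return updated (r_a, r_b) after one vote. Zero-sum by design."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    delta = k * (s_a - e_a)
    return r_a + delta, r_b - delta

# Everyone starts equal; simulate votes where model_x wins 2 of every 3
# matchups. Its rating should settle roughly 120 points above model_y's
# (the gap at which the Elo model predicts a ~67% win rate).
ratings = {"model_x": 1000.0, "model_y": 1000.0}
for i in range(3000):
    a_won = (i % 3) != 0
    ratings["model_x"], ratings["model_y"] = update(
        ratings["model_x"], ratings["model_y"], a_won
    )
```

A 67% win rate corresponding to a gap of roughly 120 Elo points follows directly from the expected-score formula: 400 * log10(2) ≈ 120.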

  • Inference optimization techniques can cause different prompts to run at different speeds.

    For example, speculative decoding uses a smaller draft model to generate speculative tokens for an LLM to verify. One implication is that ‘simple’ prompts can achieve even faster speeds than normal or harder prompts! This occurs when a higher proportion of the draft model’s output tokens are accepted as correct by the target model. Below, you can see that a prompt with simpler output tokens (repeating the Gettysburg Address) achieves a much higher output speed than a more complex prompt.

  • Cerebras has launched a major upgrade and is now achieving >2,000 output tokens/s on Llama 3.1 70B, >3x their prior speeds.

    This is a dramatic new world record for language model inference. Cerebras Systems' language model inference offering runs on their custom "wafer scale" AI accelerator chips. Cerebras had previously achieved speeds in this range for Llama 3.1 8B and is now delivering them with a much larger model. We have independently benchmarked Cerebras’ updated offering and can confirm that we have observed no quality degradation in the latest version of the API. We understand that Cerebras is achieving these speeds through a range of optimizations throughout their inference stack, including speculative decoding, an inference optimization technique that uses a smaller draft model to generate speculative tokens for an LLM to verify. Speculative decoding does not impact quality when implemented correctly.

  • Tesla discloses in their Q3 earnings deck that they will have a 50k H100 cluster at Gigafactory Texas by the end of October.

    Putting this in context, Tesla’s new H100 cluster will be larger than the rumoured sizes of the clusters used to train current frontier language models. Tesla’s cluster would likely be able to complete the original GPT-4 training run (~3 months on ~25k A100s) in less than three weeks.
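The "less than three weeks" figure can be reproduced with a back-of-envelope calculation. Only the A100 figures come from the rumoured GPT-4 training run cited above; the ~2.5x effective H100-over-A100 training speedup is our assumption, and real-world utilization would shift the answer.

```python
# Back-of-envelope: how long would the rumoured GPT-4 training run take
# on a 50k H100 cluster? All inputs are rough, publicly rumoured or
# assumed figures, not disclosed numbers.

a100_count = 25_000
a100_days = 90                                  # ~3 months
total_a100_days = a100_count * a100_days        # 2,250,000 A100-days of compute

h100_count = 50_000
h100_speedup = 2.5                              # assumed A100-equivalents per H100
a100_equivalents = h100_count * h100_speedup    # 125,000 effective A100s

days_needed = total_a100_days / a100_equivalents
print(f"~{days_needed:.0f} days")               # ~18 days, under three weeks
```

Even with a more conservative 2x speedup assumption, the run would finish in about 22.5 days, so the sub-three-week claim is sensitive to the assumed H100 advantage.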

  • Stability AI released Stable Diffusion 3.5 yesterday. Below are comparisons showing how Stable Diffusion has improved in the year since SDXL's release in July 2023.

    We have also added Stability AI's Stable Diffusion 3.5 and its Turbo variant to our Image Arena. Our Image Arena crowdsources preferences to understand and compare the quality of image models; currently we have >800k preferences submitted. See Stable Diffusion 3.5 in our Image Arena, link below 👇

  • Anthropic’s Claude 3.5 Sonnet leapfrogs GPT-4o, takes back the frontier and extends its lead in coding.

    Our independent quality evals of yesterday's Claude 3.5 Sonnet (Oct 2024) upgrade confirm a 3-point improvement in Artificial Analysis Quality Index vs. the original June release. The improvement is reflected across evals, particularly in coding and math capabilities. This makes Claude 3.5 Sonnet (Oct 2024) the top-scoring model that does not require generating reasoning tokens before beginning to produce useful output (i.e. excluding OpenAI’s o1 models). With no apparent regressions and no changes to pricing or speed, we generally recommend an immediate upgrade from the earlier version of Claude 3.5 Sonnet. Maybe next time Claude 3.5 Sonnet (Oct 2024) can suggest incrementing the version number - 3.6? See below for further analysis 👇

  • Announcing Artificial Analysis Video Arena - the first crowdsourced comparison for Text to Video models.

    Text to Video models are accelerating rapidly and crossing quality thresholds every month. We created Video Arena to compare them using the only source of truth for visual media - human preference! Video Arena includes hundreds of videos from the leading video models, including:
    - Runway Gen 3 Alpha
    - Pika 1.5
    - Luma AI's Dream Machine
    - MiniMax / Hailuo AI
    - KLING AI Video's Kling 1.0
    - Zhipu AI's CogVideoX-5B

    Voting is open now and we’ll be announcing the first leaderboard results within 24 hours. Any predictions? In the meantime, after 30 votes you can see your own ‘personal leaderboard’ of how you’ve ranked the video models. Contribute to the Video Arena! 🔗 https://lnkd.in/gXbjAjFE
