
Why are some of their models open, and others closed? What is their strategy?



My personal speculation is that their closed models are based on other companies' models.

For example, on EQ-Bench [0], Miqu [1], a leaked continued pretrain of Llama 2, performs extremely similarly to the Mistral Medium model their API offers.

Maybe they're thinking it'd be bad PR for them to release models they didn't create from scratch, or there is some contractual obligation preventing the release.

[0] https://eqbench.com/index.html

[1] https://huggingface.co/miqudev/miqu-1-70b


That's quite likely. Some have also speculated that Mistral 7B received EU grant funding that stipulated it had to be openly released, and since Mixtral is based on Mistral 7B it would likely be subject to the same terms. I haven't found any source to substantiate that, though.


Mistral have stated they want to chase the fine-tune dollar to support le research. We should get thrown a bone of hard-to-tune mid-range stuff occasionally, especially when big announcements about small models are expected later in the week (Llama 3) or when Haiku is stealing the thunder from Mixtral 8x7B.


It's gotta be either perceived value or training data/licensing restrictions.


I am not sure why some are open and some are closed. If I had to speculate, perhaps the commercial models help fund the team: they come with safety features built in as well as API-based access (instead of needing to self-host). They word their mission (https://mistral.ai/company/#missions) as follows:

> Our mission is to make frontier AI ubiquitous, and to provide tailor-made AI to all the builders. This requires fierce independence, strong commitment to open, portable and customisable solutions, and an extreme focus on shipping the most advanced technology in limited time.



