
I’ve been testing a lot of LLMs on my MacBook, and I would say that none of them come close to GPT-4. Many are as good as GPT-3, though. There are also a lot of models that are fine-tuned for specific tasks.

Language support is one big thing that is missing from open models. I’ve only found one model that can do anything useful with Norwegian, which has never been an issue with GPT-4.




Which ones have you tested? There were some huge ones released recently.


Samantha, llama 2 pubmed, marcoroni, openchat, fashiongpt, falcon 180B, deepseek llm chat, orca 2, orca 2 alpaca uncensored, meditron, tigerbot, mixtral instruct, wizardcoder, gemma, nous hermes 2 solar, yarn solar 64k, nous hermes 2 yi, nous hermes 2 mixtral, nous hermes llama 2, starcoder2, hermes 2 pro mistral, norskgpt mistral and norskgpt llama.

Nous Hermes 2 Solar is the best model for Norwegian that I've tried so far. It's much better than NorskGPT Mistral/Llama. I actually got it to make fairly decent summaries of news articles, though it wouldn't follow stricter instructions like producing 5 keywords in a JSON list. It kept producing more than 5 keywords, and if I doubled down on the restriction on the number of keywords, it would start messing up the JSON.
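
For context, this is roughly the kind of request I mean. A minimal sketch, assuming llama-cpp-python and the ChatML prompt template the Nous Hermes 2 models use; the GGUF filename and the exact prompt wording are placeholders, not the ones I actually used:

    # Sketch only: llama-cpp-python, ChatML template (Nous Hermes 2), hypothetical filename.
    from llama_cpp import Llama

    llm = Llama(model_path="nous-hermes-2-solar-10.7b.Q4_K_M.gguf", n_ctx=4096)

    article = open("article.txt", encoding="utf-8").read()
    prompt = (
        "<|im_start|>user\n"
        "Summarize the article below in Norwegian, then give exactly 5 keywords "
        "as a JSON list of strings.\n\n"
        f"{article}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

    out = llm(prompt, max_tokens=512, temperature=0.2)
    # This is where it falls over: the list often has more than 5 items,
    # and pushing back on the count tends to break the JSON entirely.
    print(out["choices"][0]["text"])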

The best competitor to GPT-4 was Falcon 180B, but it's still terrible compared to GPT-4. Mixtral is my new favourite though; it's faster than Falcon and generally as good or better. Still, I would pick GPT-4 over Mixtral any day of the week, since it's leagues ahead.

Tigerbot has a very interesting trait. It tends to disagree when you try to convince it that it's wrong.

I haven't been able to test out the new Mixtral 8x22B or Command R+. These are the next ones on my list!


Just tested out Command R+ with some niche SHACL constraint questions and it performs considerably worse than GPT-4. Might be a bit better than GPT-3.5 though, which is actually pretty amazing.


You need to use their beginning- and end-of-turn token scheme and set the repetition penalty to 1 to get good quality out of Command R+.
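
A minimal sketch of what that looks like, assuming llama-cpp-python and Cohere's turn tokens for Command R+; the GGUF filename is a placeholder, and special-token handling can differ between backends:

    # Sketch only: llama-cpp-python, Cohere turn-token template, hypothetical filename.
    from llama_cpp import Llama

    llm = Llama(model_path="command-r-plus.Q4_K_M.gguf", n_ctx=8192)

    # Command R+ expects its turn tokens around the user message
    # (the BOS token is normally added by the backend itself).
    prompt = (
        "<|START_OF_TURN_TOKEN|><|USER_TOKEN|>"
        "Write a SHACL NodeShape that requires ex:Person to have exactly one ex:name."
        "<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"
    )

    # A repetition penalty of 1.0 disables it, which is what seems to help here.
    out = llm(prompt, max_tokens=300, repeat_penalty=1.0)
    print(out["choices"][0]["text"])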





