4-bit should take up less than this, since there are quite a few shared parameters between experts.
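Rough sketch of why shared parameters matter for the footprint. The function names and every number below are hypothetical placeholders for illustration, not measurements of any real model; plug in the actual counts for the model in question.

```python
GIB = 2**30

def moe_weight_gib(shared_params, per_expert_params, n_experts, bits):
    """Weight memory when the shared parameters are stored only once."""
    total_params = shared_params + n_experts * per_expert_params
    return total_params * bits / 8 / GIB

def naive_weight_gib(shared_params, per_expert_params, n_experts, bits):
    """Weight memory if each expert were counted as a full separate model."""
    total_params = n_experts * (shared_params + per_expert_params)
    return total_params * bits / 8 / GIB

# Hypothetical split: 2B shared parameters, 5B per expert, 8 experts, 4-bit weights.
shared, per_expert, experts, bits = 2_000_000_000, 5_000_000_000, 8, 4
print(f"shared stored once: {moe_weight_gib(shared, per_expert, experts, bits):.1f} GiB")
print(f"naive n x model:    {naive_weight_gib(shared, per_expert, experts, bits):.1f} GiB")
```

This counts weights only; KV cache, activations, and quantization overhead (scales, zero points) come on top.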

But unless you’re running at batch size 1 (bs=1), it will be painful compared with 8x GPU, as a batch is almost certain to activate most or all of the experts.
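Back-of-envelope for the batching point, as a minimal sketch. It assumes each token independently picks `top_k` of `n_experts` uniformly at random in a given layer; real routers are not uniform, and the defaults of 8 experts with top-2 routing are my assumption, not something stated above.

```python
def expected_active_fraction(batch_tokens, n_experts=8, top_k=2):
    """Expected fraction of experts hit by at least one token in one layer.

    Assumes uniform, independent routing (a simplification): a given expert
    is skipped by one token with probability 1 - top_k / n_experts.
    """
    p_idle = (1 - top_k / n_experts) ** batch_tokens
    return 1 - p_idle

# Tokens processed together in one forward pass (hypothetical batch sizes).
for bs in (1, 4, 16, 64):
    print(f"bs={bs:3d}  expected active experts: {expected_active_fraction(bs):.1%}")
```

Under these assumptions a single token touches a quarter of the experts per layer, while a few dozen tokens touch essentially all of them, so nearly all expert weights have to be read on every step.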



