Unless something has changed, it needs to keep all 8 experts loaded at the same time. During inference it only activates 2 experts per token, so compute-wise it performs like a 2x base model, but memory-wise you pay for all 8.
Mixtral 8x7B @ 5 bit takes up over 30 GB on my M3 Max. That's over 90 GB for this one at the same quantization. Realistically you probably need a 128 GB machine to run it with good results.
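Rough math behind those numbers, in case anyone wants to sanity-check it. The parameter counts and the ~5.5 effective bits/weight figure are my own ballpark assumptions (real 5-bit quants mix bit widths), not exact specs:

```python
# Back-of-envelope estimate of a quantized model's memory footprint.
# Assumption: a "5 bit" quant averages roughly 5.5 bits per weight.

def quantized_size_gb(total_params_billion: float, bits_per_weight: float = 5.5) -> float:
    """Approximate in-memory size of a quantized model in GB."""
    total_bytes = total_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

# Mixtral 8x7B has roughly 47B total params; all experts must stay
# resident even though only 2 are active per token.
print(f"Mixtral 8x7B @ ~5 bit: {quantized_size_gb(47):.0f} GB")      # ~32 GB

# A MoE roughly 3x that size (assumed ~140B total params) lands
# near 95+ GB at the same quant, hence the 128 GB machine.
print(f"~3x larger MoE @ ~5 bit: {quantized_size_gb(140):.0f} GB")   # ~96 GB
```

The takeaway is that total parameter count, not active parameter count, is what sets the memory floor.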