Unless something has changed, it needs to keep all 8 experts loaded at the same time. During inference it only activates 2 experts per token, so compute-wise it performs like a 2x base model, but memory-wise you pay for all 8.
Mixtral 8x7B @ 5 bit takes up over 30 GB on my M3 Max. That's over 90 GB for this one at the same quantization. Realistically you probably need a 128 GB machine to run it with good results.
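Rough math behind those numbers, in case anyone wants to sanity-check it. The parameter counts and the ~5.5 effective bits/weight figure are my own ballpark assumptions (real 5-bit quants mix bit widths), not exact specs:

```python
# Back-of-envelope estimate of a quantized model's memory footprint.
# Assumption: a "5 bit" quant averages roughly 5.5 bits per weight.

def quantized_size_gb(total_params_billion: float, bits_per_weight: float = 5.5) -> float:
    """Approximate in-memory size of a quantized model in GB."""
    total_bytes = total_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

# Mixtral 8x7B has roughly 47B total params; all experts must stay
# resident even though only 2 are active per token.
print(f"Mixtral 8x7B @ ~5 bit: {quantized_size_gb(47):.0f} GB")      # ~32 GB

# A MoE roughly 3x that size (assumed ~140B total params) lands
# near 95+ GB at the same quant, hence the 128 GB machine.
print(f"~3x larger MoE @ ~5 bit: {quantized_size_gb(140):.0f} GB")   # ~96 GB
```

The takeaway is that total parameter count, not active parameter count, is what sets the memory floor.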