Mamba + Tensor Parallel Support #1184
Conversation
LGTM, no comments. @haileyschoelkopf -- As a final check, I'd like to verify that TP gives the expected memory benefits. Can you link the wandb here so I can take a look?
Yep! https://wandb.ai/eleutherai/mamba-neox-tp-memsavings/workspace?nw=nwuserschoelkopf -- this should be public and a clean comparison; lmk if more is needed or if for some reason it's not visible (MBS=16 for both TP=1 and TP=2, corrected for w/ grad accum). Seeing 29-30GB for TP=1 and 19GB for TP=2. Here's the full wandb of trial runs, including initial tests plus additions like Mamba's GPT-2-style init, Mamba-160m, and some Pythia-160m baseline curves: https://wandb.ai/eleutherai/mamba-neox?nw=nwuserschoelkopf
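For context on the "corrected for w/ grad accum" note, here is a minimal sketch of the usual Megatron-style batch-size bookkeeping (not code from this PR; the 8-GPU count and the formula are illustrative assumptions): on a fixed GPU count, raising TP shrinks the data-parallel world size, so gradient accumulation is raised to hold the global batch size constant while per-GPU memory drops.

```python
def global_batch(micro_batch: int, grad_accum: int, world_size: int, tp: int, pp: int = 1) -> int:
    # Data-parallel replicas shrink as TP (and PP) grow on a fixed GPU count.
    dp = world_size // (tp * pp)
    return micro_batch * grad_accum * dp

# Hypothetical 8-GPU example: doubling grad accum at TP=2 keeps the global batch fixed.
assert global_batch(16, grad_accum=1, world_size=8, tp=1) == 128
assert global_batch(16, grad_accum=2, world_size=8, tp=2) == 128
```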
This PR adds Mamba + TP support.
Loss curves comparing TP=2 to TP=1, with and without mamba_inner_func_fusion: [loss curve plots]
Versus when the allreduce was missing with inner func fusion turned on: [loss curve plot]
Also tested that PP (pipeline parallelism) seems to work.
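For readers skimming the diff, a hedged, self-contained sketch of the Megatron-style column/row-parallel split this kind of Mamba + TP support typically uses (plain PyTorch, not the PR's actual code; the class, dimension names, and the SiLU stand-in for the fused SSM scan are all illustrative assumptions). It also shows the output all-reduce whose absence produces the diverging curves above.

```python
import torch
import torch.nn as nn
import torch.distributed as dist


class TPMambaProjSketch(nn.Module):
    """Illustrative column/row-parallel projections around an SSM core.

    Assumes torch.distributed is already initialized (e.g. via torchrun).
    Each rank holds 1/tp_size of the inner dimension; the output projection
    is row-parallel, so its partial results must be summed with an all-reduce.
    """

    def __init__(self, d_model: int, d_inner: int, tp_size: int):
        super().__init__()
        assert d_inner % tp_size == 0
        self.tp_size = tp_size
        # Column-parallel: each rank owns a slice of the inner dimension.
        self.in_proj = nn.Linear(d_model, d_inner // tp_size, bias=False)
        # Row-parallel: each rank projects its slice back to d_model.
        self.out_proj = nn.Linear(d_inner // tp_size, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.in_proj(x)                  # (batch, seq, d_inner / tp_size)
        h = torch.nn.functional.silu(h)      # stand-in for the fused SSM scan
        y = self.out_proj(h)                 # partial sum on each rank
        if self.tp_size > 1:
            # Without this all-reduce each rank only sees a partial output,
            # which is the kind of mismatch the "missing allreduce" curves show.
            dist.all_reduce(y)
        return y
```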