New upload taking more VRAM with MTP disabled?

#4
by grumd0 - opened

Hey AesSedai, love your models

Maybe you can help me out. I was running this model at IQ3_S with 167k context yesterday. Today new GGUFs were uploaded, and now only 110k fits; with even a bit more context I go OOM.

Do MTP layers take extra VRAM even if MTP is not enabled? I don't have any extra VRAM/RAM to enable MTP anyway so I'd prefer running the model without it, but it's unfortunate that the new GGUFs cause me to have less context. Or is it a bug in llama.cpp which allocates extra memory even if MTP is not enabled?

Thanks!

Hi, I don't know if it requires more VRAM with MTP disabled. I would think not, but it sounds like it might be worth opening an issue on the llama.cpp GitHub if you can provide some more detail.

I'd investigate a bit more, but I deleted the previous GGUFs of your model. Do you know if I can get the IQ3_S from before the latest upload? Am I correct that the latest reupload added MTP?
If I had the previous GGUF, I'd at least be able to investigate why llama.cpp eats more VRAM with the newer model when MTP is disabled.

I've been squashing the repository after uploads to save space, so the previous GGUFs are gone :(

Ah-ha-ha-ha! =D
I also deleted the file, downloaded the new one, and… =D
No vision today, kekeke.
