Any llama.cpp parameters to work around the looping?

#12
by tarruda - opened

This model seems prone to entering infinite loops. There are some reports that even Q8 displays this behavior: https://huggingface.co/unsloth/MiMo-V2.5-GGUF/discussions/2

I'm curious whether you have experienced this and whether you found any good values for the repeat and presence penalties.
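For anyone who wants to experiment, here is a minimal sketch of how the repeat and presence penalties could be passed per request to a local llama.cpp server through its `/completion` endpoint. The helper names (`build_completion_payload`, `post_completion`), the `http://localhost:8080` address, and every numeric value are assumptions for illustration only. They are placeholders to start a sweep from, not settings known to stop the looping on this model.

```python
import json
import urllib.request


def build_completion_payload(prompt, repeat_penalty=1.1, repeat_last_n=256,
                             presence_penalty=0.5, n_predict=512):
    """Build a JSON-serializable body for llama-server's /completion endpoint.

    All default values are untested placeholders, not recommendations.
    """
    return {
        "prompt": prompt,
        "n_predict": n_predict,                # cap output so a loop cannot run forever
        "repeat_penalty": repeat_penalty,      # multiplicative penalty on recent tokens
        "repeat_last_n": repeat_last_n,        # how many recent tokens the penalty covers
        "presence_penalty": presence_penalty,  # flat penalty on tokens already present
    }


def post_completion(payload, base_url="http://localhost:8080"):
    """Send the payload to a running llama-server (assumed address) and return the text."""
    request = urllib.request.Request(
        base_url + "/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["content"]
```

With a server running, `post_completion(build_completion_payload("..."))` returns the generated text, so it is easy to sweep a few penalty values and compare. The same knobs can be set as server-wide defaults with the `--repeat-penalty`, `--repeat-last-n`, and `--presence-penalty` command-line flags.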

I've mostly stuck to the Pro version and haven't seen looping there, probably because I don't really use the model for agentic or technical workloads. I've primarily used it for creative writing and conversation, and I haven't needed any repeat or presence penalties for that.

Many members on the b6k Discord have been seeing looping in sglang and vllm as well, so it's not isolated to llama.cpp. It also loops on MiMo's API.

The IQ3 quant of 2.5 only loops in agentic stuff, even with the latest llama.cpp server! Hope they fix it in the model or something; it's a smart model, TBH.
