https://huggingface.co/zai-org/GLM-4.6V-Flash
This is a tiny member of the GLM family, weighing in at about 9B parameters.
We need to wait for https://github.com/ggml-org/llama.cpp/pull/16600 to be merged first. Unfortunately that PR seems somewhat stale, as the last commit was a month ago.
I could actually try this one thanks to https://github.com/ggml-org/llama.cpp/pull/14823, but it will lack any vision capabilities.
I am not sure if my "text only" support is valid for this one, but you can try.
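In case it helps anyone who wants to try: a minimal sketch, assuming llama-cpp-python is installed and a text-only GGUF (e.g. the Q4_K_S quant mentioned below) has already been downloaded. The filename and prompt are placeholders, not something from this thread.

```python
# Minimal sketch: load a text-only GGUF of GLM-4.6V-Flash and run a short
# completion to sanity-check that the text path works.
# Assumes: pip install llama-cpp-python; the model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="GLM-4.6V-Flash.Q4_K_S.gguf",  # placeholder filename
    n_ctx=4096,        # context window; adjust to taste
    n_gpu_layers=-1,   # offload all layers if built with GPU support
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Briefly explain what GGUF is."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```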
@jacek2024
It is. You are amazing! Thank you so much for adding text-only support for this one.
Maybe something similar could be done for Glm4vMoeForConditionalGeneration, but assuming https://github.com/ggml-org/llama.cpp/pull/16600 is not abandoned forever, waiting for it makes more sense.
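For reference, a quick way to check which architecture class a repo declares is to read its config.json. A minimal sketch using huggingface_hub; the printed value is whatever the repo actually declares, not something asserted here:

```python
# Minimal sketch: read a repo's config.json to see which architecture class
# it declares (e.g. Glm4vForConditionalGeneration vs.
# Glm4vMoeForConditionalGeneration). Assumes: pip install huggingface_hub.
import json
from huggingface_hub import hf_hub_download

path = hf_hub_download(repo_id="zai-org/GLM-4.6V-Flash", filename="config.json")
with open(path) as f:
    config = json.load(f)

# Prints the declared architecture list, whatever it happens to be.
print(config.get("architectures"))
```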
-2000 19 GLM-4.6V-Flash run/imatrix (GPU-2d) 46/40 0.91s/c 0.3/4.8m(?-8.1) [11/315] 5.3983
-2000 19 si GLM-4.6V-Flash run/static 2/12,Q4_K_S [27/523] (hfu f16)
It's queued!
You can check for progress at http://hf.tst.eu/status.html or regularly check the model
summary page at https://hf.tst.eu/model#GLM-4.6V-Flash-GGUF for text-only quants to appear.
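If you prefer to poll programmatically instead of refreshing the page, here is a minimal sketch using huggingface_hub; the repo id is an assumption (replace OWNER with the account that actually publishes the quants):

```python
# Minimal sketch: list the .gguf files currently present in the quant repo.
# The repo id is an assumption based on the summary-page name above; replace
# OWNER with the account that publishes the quants.
from huggingface_hub import list_repo_files

REPO_ID = "OWNER/GLM-4.6V-Flash-GGUF"  # assumed repo id

files = list_repo_files(REPO_ID)
for name in sorted(f for f in files if f.endswith(".gguf")):
    print(name)
```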
Great! If Air isn't released, we may try doing the same for 4.6V.