https://huggingface.co/zai-org/GLM-4.6V-Flash

#1587
by SabinStargem - opened

This is a tiny member of the GLM family, weighing in at about 9B parameters.

https://huggingface.co/zai-org/GLM-4.6V-Flash

We need to wait for https://github.com/ggml-org/llama.cpp/pull/16600 to be merged first. That PR unfortunately seems somewhat stale, as its last commit was a month ago.

I could actually try this one thanks to https://github.com/ggml-org/llama.cpp/pull/14823 but it will lack any vision capabilities.

I am not sure if my "text only" support is valid for this one, but you can try.

@jacek2024 It is. You are amazing! Thank you so much for adding text-only support for this one.
Maybe something similar could be done for Glm4vMoeForConditionalGeneration, but assuming https://github.com/ggml-org/llama.cpp/pull/16600 is not abandoned forever, waiting for it makes more sense.

-2000   19 GLM-4.6V-Flash                                run/imatrix (GPU-2d) 46/40 0.91s/c 0.3/4.8m(?-8.1) [11/315] 5.3983
-2000   19 si GLM-4.6V-Flash                               run/static 2/12,Q4_K_S [27/523] (hfu f16)

It's queued!

You can check for progress at http://hf.tst.eu/status.html or regularly check the model
summary page at https://hf.tst.eu/model#GLM-4.6V-Flash-GGUF for text-only quants to appear.
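Once text-only quants appear, one way to try them with a recent llama.cpp build is the `-hf` flag, which pulls a GGUF straight from the Hub. This is only a sketch: the repo and quant names below are assumptions based on the usual naming pattern, so check the model summary page for what actually gets published.

```shell
# Download a quant from Hugging Face and start an interactive chat.
# Repo name and :Q4_K_S tag are assumptions -- verify on the summary page.
llama-cli -hf mradermacher/GLM-4.6V-Flash-GGUF:Q4_K_S -p "Hello" -n 64

# Note: image input will not work until vision support for this
# architecture lands in llama.cpp; only the text path is wired up.
```

The `:QUANT` suffix selects a specific quantization from the repo; omitting it lets llama.cpp pick a default.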

Great! If Air isn't released, we may try doing the same for 4.6V.