The original AutoGLM-Phone-9B model supports multimodality, but this GGUF model does not.

Opened by sean2342

The AutoGLM-Phone-9B model needs image processing support so that Open-AutoGLM can call it to automate mobile phone operations. Currently, calling this GGUF model returns a message that image input is not supported. I am looking forward to a multimodal GGUF version that can be conveniently loaded in Ollama or LM Studio for local use (see the sketch below for the kind of call this would enable).
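A minimal sketch of what such a local call would look like once a vision-capable GGUF is available, using Ollama's chat API with a base64-encoded screenshot. The model tag `autoglm-phone-9b` is a placeholder, not a published name.

```python
# Sketch: sending a phone screenshot to a local multimodal GGUF via Ollama.
# Assumes Ollama is running on the default port and that a vision-capable
# build of the model has been pulled under the placeholder tag below.
import base64
import json
import urllib.request


def chat_with_image(prompt: str, image_path: str, model: str = "autoglm-phone-9b") -> str:
    # Ollama expects images as base64-encoded strings attached to the message.
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt, "images": [image_b64]}],
        "stream": False,
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]


# Example: describe a screenshot as one step of a phone-automation loop.
# print(chat_with_image("Describe the UI elements on this screen.", "screenshot.png"))
```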

Unfortunately, llama.cpp does not currently support vision for the Glm4vForConditionalGeneration architecture. We will add vision support once it is implemented in llama.cpp.
