Audio Flamingo

Zhifeng Kong, Arushi Goel, Rohan Badlani, Wei Ping, Rafael Valle, Bryan Catanzaro

This repo contains the model checkpoints for Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities (ICML 2024). Audio Flamingo is a novel audio-understanding language model with:

strong audio understanding abilities,
the ability to quickly adapt to unseen tasks via in-context learning and retrieval, and
strong multi-turn dialogue abilities.

We introduce a series of training techniques, architecture designs, and data strategies to give the model these abilities. Extensive evaluations across a range of audio understanding tasks confirm the efficacy of our method, setting new state-of-the-art results. Sound demos are available on the project's demo website.

Code

Our code is at https://github.com/NVIDIA/audio-flamingo
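
For readers who only want the checkpoint files from this page, the following is a minimal sketch of one way to fetch them, not an official procedure. It assumes the `huggingface_hub` package is installed and that the repo id is `nvidia/audio-flamingo`, as shown on this model page. The helper name `download_checkpoints` and the default folder name are hypothetical, and no checkpoint file names are assumed. Loading and running the model is handled by the code in the GitHub repository above.

```python
from pathlib import Path

# Repo id as displayed on this model page (treated here as an assumption).
REPO_ID = "nvidia/audio-flamingo"


def download_checkpoints(local_dir: str = "audio_flamingo_ckpt") -> Path:
    """Download every file in the model repo and return the local folder.

    Hypothetical helper for illustration; the folder name is arbitrary.
    """
    # Imported lazily so this module can be imported without the dependency.
    # Requires: pip install huggingface_hub
    from huggingface_hub import snapshot_download

    # snapshot_download fetches the whole repo and returns the local path.
    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling `download_checkpoints()` returns the local directory, which can then be listed to see which files were fetched. Use of the downloaded checkpoints remains subject to the license terms stated on this page.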

License

The checkpoints are for non-commercial use only. They are subject to the OPT-IML license, the Terms of Use of the data generated by OpenAI, and the original licenses accompanying each training dataset.

Citation

@article{kong2024audio,
  title={Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities},
  author={Kong, Zhifeng and Goel, Arushi and Badlani, Rohan and Ping, Wei and Valle, Rafael and Catanzaro, Bryan},
  journal={arXiv preprint arXiv:2402.01831},
  year={2024}
}


Paper

Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities (arXiv:2402.01831, published Feb 2, 2024)