Given this context, what is LiteLLM?

About
Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs).

LiteLLM manages:
- Translating inputs to the provider's completion and embedding endpoints
- Guaranteeing consistent output: text responses are always available at ['choices'][0]['message']['content']
- Exception mapping: common exceptions across providers are mapped to the OpenAI exception types (see the sketch after this list)
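A minimal sketch of what that unified format means in practice. It assumes openai>=1.0.0, where exception classes such as openai.AuthenticationError and openai.RateLimitError live at the top level of the openai package; the Anthropic model name and key below are illustrative, not required.

import os
import openai
from litellm import completion

os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"  # illustrative provider/key

messages = [{"role": "user", "content": "Hello, how are you?"}]

try:
    response = completion(model="claude-2", messages=messages)
    # Whatever the provider, the text sits at the same path:
    print(response['choices'][0]['message']['content'])
except openai.AuthenticationError as err:
    # Provider auth failures surface as the mapped OpenAI exception type
    print(f"Authentication failed: {err}")
except openai.RateLimitError as err:
    print(f"Rate limited: {err}")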
10/05/2023: LiteLLM is adopting Semantic Versioning for all commits. Learn more
10/16/2023: Self-hosted OpenAI-proxy server. Learn more
Usage (Docs)

Important: LiteLLM v1.0.0 is being launched to require openai>=1.0.0. Track this here.
pip install litellm

from litellm import completion
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["COHERE_API_KEY"] = "your-cohere-key"

messages = [{"role": "user", "content": "Hello, how are you?"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages=messages)
print(response)
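Since LiteLLM also translates inputs to the provider's embedding endpoints (as noted above), here is a minimal sketch of the equivalent embedding call; the model name is illustrative and litellm's embedding() helper is assumed to mirror the shape of completion().

from litellm import embedding
import os

os.environ["OPENAI_API_KEY"] = "your-openai-key"

# openai embedding call through the same unified interface
response = embedding(model="text-embedding-ada-002", input=["Hello, how are you?"])
print(response['data'][0]['embedding'][:5])  # first few dimensions of the vector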
Streaming (Docs)

LiteLLM supports streaming the model response back; pass stream=True to get a streaming iterator in the response.
Streaming is supported for all models (Bedrock, Huggingface, TogetherAI, Azure, OpenAI, etc.).
from litellm import completion

response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
for chunk in response:
    print(chunk['choices'][0]['delta'])

# claude 2
result = completion(model="claude-2", messages=messages, stream=True)
for chunk in result:
    print(chunk['choices'][0]['delta'])
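A minimal follow-up sketch (not from the docs) for assembling the full streamed text, assuming each delta behaves like a dict, as in the snippets above, and that its 'content' field may be missing or empty on some chunks.

from litellm import completion

messages = [{"role": "user", "content": "Hello, how are you?"}]

full_reply = ""
for chunk in completion(model="gpt-3.5-turbo", messages=messages, stream=True):
    piece = chunk['choices'][0]['delta'].get('content')  # may be None on some chunks
    if piece:
        full_reply += piece
print(full_reply)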