Spaces:
Paused
Paused
File size: 1,691 Bytes
447ebeb |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 |
Given this context, what is litellm? LiteLLM about: About
Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs). LiteLLM manages
Translating inputs to the provider's completion and embedding endpoints
Guarantees consistent output, text responses will always be available at ['choices'][0]['message']['content']
Exception mapping - common exceptions across providers are mapped to the OpenAI exception types.
10/05/2023: LiteLLM is adopting Semantic Versioning for all commits. Learn more
10/16/2023: Self-hosted OpenAI-proxy server Learn more
Usage (Docs)
Important
LiteLLM v1.0.0 is being launched to require openai>=1.0.0. Track this here
Open In Colab
pip install litellm
from litellm import completion
import os
## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["COHERE_API_KEY"] = "your-cohere-key"
messages = [{ "content": "Hello, how are you?","role": "user"}]
# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)
# cohere call
response = completion(model="command-nightly", messages=messages)
print(response)
Streaming (Docs)
liteLLM supports streaming the model response back, pass stream=True to get a streaming iterator in response.
Streaming is supported for all models (Bedrock, Huggingface, TogetherAI, Azure, OpenAI, etc.)
from litellm import completion
response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
for chunk in response:
print(chunk['choices'][0]['delta'])
# claude 2
result = completion('claude-2', messages, stream=True)
for chunk in result:
print(chunk['choices'][0]['delta']) |