How to use from
vLLM
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "philschmid/t5-11b-sharded"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "philschmid/t5-11b-sharded",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
Use Docker
docker model run hf.co/philschmid/t5-11b-sharded
Quick Links

Fork of t5-11b

This is fork of t5-11b implementing a custom handler.py as an example for how to use t5-11b with inference-endpoints on a single NVIDIA T4.


Model Card for T5 11B - fp16

model image

Use with Inference Endpoints

Hugging Face Inference endpoints can be used with an HTTP client in any language. We will use Python and the requests library to send our requests. (make your you have it installed pip install requests)

result

Send requests with Pyton

import json
import requests as r

ENDPOINT_URL=""# url of your endpoint
HF_TOKEN=""

# payload samples
regular_payload = { "inputs": "translate English to German: The weather is nice today." }
parameter_payload = {
    "inputs": "translate English to German: Hello my name is Philipp and I am a Technical Leader at Hugging Face",
  "parameters" : {
    "max_length": 40,
  }
}

# HTTP headers for authorization
headers= {
    "Authorization": f"Bearer {HF_TOKEN}",
    "Content-Type": "application/json"
}

# send request
response = r.post(ENDPOINT_URL, headers=headers, json=paramter_payload)
generated_text = response.json()

print(generated_text)
Downloads last month
12
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Dataset used to train philschmid/t5-11b-sharded