Instructions for using SillyTilly/google-gemma-2-27b-it with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use SillyTilly/google-gemma-2-27b-it with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="SillyTilly/google-gemma-2-27b-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("SillyTilly/google-gemma-2-27b-it")
model = AutoModelForCausalLM.from_pretrained("SillyTilly/google-gemma-2-27b-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use SillyTilly/google-gemma-2-27b-it with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "SillyTilly/google-gemma-2-27b-it"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "SillyTilly/google-gemma-2-27b-it",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker
```bash
docker model run hf.co/SillyTilly/google-gemma-2-27b-it
```
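The curl call above can also be made from Python. The following is a minimal sketch using only the standard library, assuming the server is reachable at `http://localhost:8000` as in the commands above. The helper names `build_chat_request` and `chat`, the `max_tokens` value, and the timeout are illustrative choices, not part of the original instructions.

```python
import json
import urllib.request

MODEL_ID = "SillyTilly/google-gemma-2-27b-it"


def build_chat_request(prompt, base_url="http://localhost:8000",
                       model=MODEL_ID, max_tokens=64):
    """Build a POST request for the OpenAI-compatible chat completions endpoint.

    base_url and max_tokens are assumptions; adjust them to your deployment.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    # OpenAI-style responses carry the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]


# Example call (requires a running server):
# print(chat("What is the capital of France?"))
```

Passing `base_url="http://localhost:30000"` should target a server listening on port 30000 instead, such as the SGLang setup described on this page.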
- SGLang
How to use SillyTilly/google-gemma-2-27b-it with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "SillyTilly/google-gemma-2-27b-it" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "SillyTilly/google-gemma-2-27b-it",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "SillyTilly/google-gemma-2-27b-it" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "SillyTilly/google-gemma-2-27b-it",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
- Docker Model Runner
How to use SillyTilly/google-gemma-2-27b-it with Docker Model Runner:
```bash
docker model run hf.co/SillyTilly/google-gemma-2-27b-it
```
- url: https://huggingface.co/google/gemma-2-27b-it
- branch: main
- download date: 2024-06-27 17:16:05
- sha256sum:

| SHA-256 | File |
| --- | --- |
| 74ca36bee640853cbdcae85119eabeb73cc0c63ab852136de8a330f2f9c33697 | model-00001-of-00012.safetensors |
| 13354bc14e0eddeb1e50bc7cce9f5219e2d1492bf8addfb8efabfa5d45de768d | model-00002-of-00012.safetensors |
| 3a8c6ed92ec81bce871a1eb5538143c5d336a12ca592887ca37865ba2614d32c | model-00003-of-00012.safetensors |
| f06d4211eda8e2546c7fdb38c655a96bd146b28b382ee007be29416faa8c5e46 | model-00004-of-00012.safetensors |
| 065680559b7b99fcbf3359fbe3676eadb8cfd1999fc6670789c8d01de716ec27 | model-00005-of-00012.safetensors |
| a7f3b81dd009049c07206c2d4479a2ab55d12ec54743f686bad36d09707affc7 | model-00006-of-00012.safetensors |
| 679fcd6c8ad0ecbac864141244edfeed7fb594887ab7216a19e53c2f0500f350 | model-00007-of-00012.safetensors |
| 2ae127df61c37cbcf1e2bed5fd3fc6eea3ec37c15f6dededa279d7bba7d83184 | model-00008-of-00012.safetensors |
| 4950140e96b75421ac5064966659e9f23a7741632ead93e4df0319b9fd6140fa | model-00009-of-00012.safetensors |
| 5e1d04f78fc6e518c1dab8fb00691af3226868fdde6046a548f9bc93eb308322 | model-00010-of-00012.safetensors |
| c1448b676de8e8c21a7585ed60073ecf54f448082adedc125ca33a35246b198e | model-00011-of-00012.safetensors |
| 37632f98b340e76c1c345d1681ea05bc3c8c9267e44ab5074dd5ebccb274081f | model-00012-of-00012.safetensors |
| 7da53ca29fb16f6b2489482fc0bc6a394162cdab14d12764a1755ebc583fea79 | tokenizer.json |
| 61a7b147390c64585d6c3543dd6fc636906c9af3865a5548f27f31aee1d4c8e2 | tokenizer.model |
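The checksums above can be used to confirm that a local copy of the files is intact. Below is a minimal sketch using Python's standard `hashlib`. The function names `sha256_of_file` and `verify_checksums`, the chunk size, and the local directory in the example call are hypothetical. Only two entries from the list are shown in `EXPECTED`, and the remaining files can be added in the same way.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(model_dir, expected):
    """Map each filename to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for filename, expected_hash in expected.items():
        path = Path(model_dir) / filename
        if not path.is_file():
            results[filename] = "missing"
        elif sha256_of_file(path) == expected_hash.lower():
            results[filename] = "ok"
        else:
            results[filename] = "mismatch"
    return results


# Two entries copied from the checksum list above; extend with the shard files as needed.
EXPECTED = {
    "tokenizer.json": "7da53ca29fb16f6b2489482fc0bc6a394162cdab14d12764a1755ebc583fea79",
    "tokenizer.model": "61a7b147390c64585d6c3543dd6fc636906c9af3865a5548f27f31aee1d4c8e2",
}

# Example call (the directory name is a placeholder for your download location):
# print(verify_checksums("./google-gemma-2-27b-it", EXPECTED))
```

Reading in fixed-size chunks keeps memory use flat even for the multi-gigabyte safetensors shards.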