DukunLM V1.0 - Indonesian Language Model 🧙‍♂️
🚀 Welcome to the DukunLM V1.0 repository! DukunLM V1.0 is an open-source language model trained to generate Indonesian text. DukunLM, meaning "WizardLM" in Indonesian, is here to revolutionize language generation 🌟. This is an updated version of azale-ai/DukunLM-Uncensored-7B, now released as a full model rather than only an adapter as before 👽.
Model Details
| Model Name | Parameters | Google Colab | Base Model | Dataset | Prompt Format | Fine-Tuning Method | Sharded Version |
|---|---|---|---|---|---|---|---|
| DukunLM-7B-V1.0-Uncensored | 7B | Link | ehartford/WizardLM-7B-V1.0-Uncensored | MBZUAI/Bactrian-X (Indonesian subset) | Alpaca | QLoRA | Link |
| DukunLM-13B-V1.0-Uncensored | 13B | Link | ehartford/WizardLM-13B-V1.0-Uncensored | MBZUAI/Bactrian-X (Indonesian subset) | Alpaca | QLoRA | Link |
⚠️ Warning: DukunLM is an uncensored model without filters or alignment. Please use it responsibly as it may contain errors, cultural biases, and potentially offensive content. ⚠️
Installation
To use DukunLM, make sure PyTorch is installed and that you have an NVIDIA GPU (or use Google Colab). Then install the required dependencies:
pip3 install -U git+https://github.com/huggingface/transformers.git
pip3 install -U git+https://github.com/huggingface/peft.git
pip3 install -U git+https://github.com/huggingface/accelerate.git
pip3 install -U bitsandbytes==0.39.0 einops==0.6.1 sentencepiece
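Before downloading any model weights, it can help to confirm that the packages are importable. The following is a minimal sketch and not part of the original instructions: the helper name `missing_packages` is hypothetical, and it assumes each import name matches the pip package name listed above.

```python
import importlib.util

# Import names assumed to match the pip packages in the installation step.
REQUIRED = [
    "torch",
    "transformers",
    "peft",
    "accelerate",
    "bitsandbytes",
    "einops",
    "sentencepiece",
]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

This only checks that the packages can be found. It does not verify versions or that a CUDA-capable GPU is visible to PyTorch.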
How to Use
Normal Model
Stream Output
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Load the full model in half precision and move it to the GPU.
model = AutoModelForCausalLM.from_pretrained("azale-ai/DukunLM-7B-V1.0-Uncensored", torch_dtype=torch.float16).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("azale-ai/DukunLM-7B-V1.0-Uncensored")
# Print tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer)

# "Explain why water is important for human life."
instruction_prompt = "Jelaskan mengapa air penting bagi kehidupan manusia."
input_prompt = ""

# Build the Alpaca-style prompt, with or without the optional input section.
if not input_prompt:
    prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:
"""
    prompt = prompt.format(instruction=instruction_prompt)
else:
    prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{input}
### Response:
"""
    prompt = prompt.format(instruction=instruction_prompt, input=input_prompt)

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
_ = model.generate(
    inputs=inputs.input_ids,
    streamer=streamer,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_length=2048, temperature=0.7,
    do_sample=True, top_k=4, top_p=0.95
)
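The prompt construction in these examples can be factored into a small reusable function. This is a sketch only: `build_prompt` and the two template constants are hypothetical names, and the line breaks mirror the templates as they appear in this README, so adjust them if your copy of the prompt format differs.

```python
# Alpaca-style templates, mirroring the prompts used in this README.
NO_INPUT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\n"
    "{instruction}\n"
    "### Response:\n"
)

WITH_INPUT_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\n"
    "{instruction}\n"
    "### Input:\n"
    "{input}\n"
    "### Response:\n"
)


def build_prompt(instruction: str, input_text: str = "") -> str:
    """Pick the template based on whether extra input context is given."""
    if not input_text:
        return NO_INPUT_TEMPLATE.format(instruction=instruction)
    return WITH_INPUT_TEMPLATE.format(instruction=instruction, input=input_text)


print(build_prompt("Jelaskan mengapa air penting bagi kehidupan manusia."))
```

The returned string can be passed to the tokenizer in place of the `prompt` variable.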
No Stream Output
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the full model in half precision and move it to the GPU.
model = AutoModelForCausalLM.from_pretrained("azale-ai/DukunLM-7B-V1.0-Uncensored", torch_dtype=torch.float16).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("azale-ai/DukunLM-7B-V1.0-Uncensored")

# "Explain why water is important for human life."
instruction_prompt = "Jelaskan mengapa air penting bagi kehidupan manusia."
input_prompt = ""

# Build the Alpaca-style prompt, with or without the optional input section.
if not input_prompt:
    prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:
"""
    prompt = prompt.format(instruction=instruction_prompt)
else:
    prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{input}
### Response:
"""
    prompt = prompt.format(instruction=instruction_prompt, input=input_prompt)

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(
    inputs=inputs.input_ids,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_length=2048, temperature=0.7,
    do_sample=True, top_k=4, top_p=0.95
)
# The decoded text contains the prompt followed by the generated response.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
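For a decoder-only model, `generate` returns the prompt tokens followed by the new tokens, so the decoded string includes the prompt. The sketch below shows one way to keep only the answer. The helper name `extract_response` is hypothetical, and it assumes the first `### Response:` marker in the text is the one from the prompt.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded: str, marker: str = RESPONSE_MARKER) -> str:
    """Return the text after the first response marker, or the whole text if absent."""
    _, found, tail = decoded.partition(marker)
    return tail.strip() if found else decoded.strip()


decoded = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\nJelaskan mengapa air penting.\n"
    "### Response:\nAir penting bagi tubuh."
)
print(extract_response(decoded))
```

An alternative is to slice the output tensor by the prompt length before decoding, which avoids relying on a text marker.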
Quantized Model
Stream Output
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextStreamer

# 4-bit loading is configured only through quantization_config; recent
# transformers releases reject a duplicate load_in_4bit keyword argument.
model = AutoModelForCausalLM.from_pretrained(
    "azale-ai/DukunLM-7B-V1.0-Uncensored-sharded",
    torch_dtype=torch.float32,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        llm_int8_threshold=6.0,
        llm_int8_has_fp16_weight=False,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
    )
)
tokenizer = AutoTokenizer.from_pretrained("azale-ai/DukunLM-7B-V1.0-Uncensored-sharded")
# Print tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer)

# "Explain why water is important for human life."
instruction_prompt = "Jelaskan mengapa air penting bagi kehidupan manusia."
input_prompt = ""

# Build the Alpaca-style prompt, with or without the optional input section.
if not input_prompt:
    prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:
"""
    prompt = prompt.format(instruction=instruction_prompt)
else:
    prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{input}
### Response:
"""
    prompt = prompt.format(instruction=instruction_prompt, input=input_prompt)

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
_ = model.generate(
    inputs=inputs.input_ids,
    streamer=streamer,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_length=2048, temperature=0.7,
    do_sample=True, top_k=4, top_p=0.95
)
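The sampling settings used in these examples (`temperature=0.7`, `top_k=4`, `top_p=0.95`) narrow the set of tokens the model may pick at each step. The toy sketch below illustrates the idea in plain Python on a made-up list of logits. It is a simplified illustration, not the library's implementation, and `filter_candidates` is a hypothetical name.

```python
import math


def filter_candidates(logits, temperature=0.7, top_k=4, top_p=0.95):
    """Return {token_id: probability} for tokens that survive the filters."""
    # Temperature below 1 sharpens the distribution.
    scaled = [x / temperature for x in logits]
    # Top-k: keep only the k highest-scoring token ids.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token_id, p in zip(ranked, probs):
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the remaining probabilities sum to 1.
    norm = sum(p for _, p in kept)
    return {token_id: p / norm for token_id, p in kept}


toy_logits = [5.0, 4.0, 3.0, 2.0, 1.0, 0.0]
print(filter_candidates(toy_logits))
```

With `top_k=4`, at most four tokens are ever candidates, so the output stays fairly focused even though `do_sample=True` makes generation non-deterministic.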
No Stream Output
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit loading is configured only through quantization_config; recent
# transformers releases reject a duplicate load_in_4bit keyword argument.
model = AutoModelForCausalLM.from_pretrained(
    "azale-ai/DukunLM-7B-V1.0-Uncensored-sharded",
    torch_dtype=torch.float32,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        llm_int8_threshold=6.0,
        llm_int8_has_fp16_weight=False,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
    )
)
tokenizer = AutoTokenizer.from_pretrained("azale-ai/DukunLM-7B-V1.0-Uncensored-sharded")

# "Explain why water is important for human life."
instruction_prompt = "Jelaskan mengapa air penting bagi kehidupan manusia."
input_prompt = ""

# Build the Alpaca-style prompt, with or without the optional input section.
if not input_prompt:
    prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:
"""
    prompt = prompt.format(instruction=instruction_prompt)
else:
    prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{input}
### Response:
"""
    prompt = prompt.format(instruction=instruction_prompt, input=input_prompt)

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(
    inputs=inputs.input_ids,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_length=2048, temperature=0.7,
    do_sample=True, top_k=4, top_p=0.95
)
# The decoded text contains the prompt followed by the generated response.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
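To get a feel for why the 4-bit path is useful on smaller GPUs, the weight storage can be estimated with simple arithmetic. This is a rough sketch under stated assumptions: it uses the nominal 7B parameter count, the helper name `weight_memory_gib` is hypothetical, and it ignores quantization constants, activations, and the KV cache, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate storage for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


NOMINAL_PARAMS = 7e9  # nominal size; the exact parameter count may differ

for label, bits in [("float16", 16), ("4-bit NF4", 4)]:
    print(f"{label}: ~{weight_memory_gib(NOMINAL_PARAMS, bits):.1f} GiB")
```

Under these assumptions, the weights take roughly 13 GiB in float16 and about 3.3 GiB in 4-bit form.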
Benchmark
Coming soon, stay tuned 🙂🙂.
Limitations
- The base model is primarily an English-language model that was then fine-tuned on Indonesian data
- Cultural and contextual biases
License
DukunLM V1.0 is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
Contributing
We welcome contributions to enhance and improve DukunLM V1.0. If you have any suggestions or find any issues, please feel free to open an issue or submit a pull request. We are also open to sponsorship for compute power.
Contact Us