Instructions for using azale-ai/DukunLM-7B-V1.0-Uncensored with libraries and local apps.

Libraries

How to use azale-ai/DukunLM-7B-V1.0-Uncensored with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="azale-ai/DukunLM-7B-V1.0-Uncensored")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("azale-ai/DukunLM-7B-V1.0-Uncensored")
model = AutoModelForCausalLM.from_pretrained("azale-ai/DukunLM-7B-V1.0-Uncensored")

Local Apps

vLLM

How to use azale-ai/DukunLM-7B-V1.0-Uncensored with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "azale-ai/DukunLM-7B-V1.0-Uncensored"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "azale-ai/DukunLM-7B-V1.0-Uncensored",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
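
The same request can be issued from Python. The sketch below is a minimal example using only the standard library; the helper names `build_completion_request` and `complete` are illustrative, and the response layout (`choices[0].text`) is assumed to follow the OpenAI completions format that the server advertises.

```python
import json
import urllib.request

MODEL_ID = "azale-ai/DukunLM-7B-V1.0-Uncensored"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first choice.

    Assumes an OpenAI-style response body: {"choices": [{"text": ...}, ...]}.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` should return the generated continuation. Pass `base_url="http://localhost:30000"` to target a server listening on a different port.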

SGLang

How to use azale-ai/DukunLM-7B-V1.0-Uncensored with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "azale-ai/DukunLM-7B-V1.0-Uncensored" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "azale-ai/DukunLM-7B-V1.0-Uncensored",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "azale-ai/DukunLM-7B-V1.0-Uncensored" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "azale-ai/DukunLM-7B-V1.0-Uncensored",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use azale-ai/DukunLM-7B-V1.0-Uncensored with Docker Model Runner:
```
docker model run hf.co/azale-ai/DukunLM-7B-V1.0-Uncensored
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

DukunLM V1.0 - Indonesian Language Model 🧙‍♂️

🚀 Welcome to the DukunLM V1.0 repository! DukunLM V1.0 is an open-source language model trained to generate Indonesian text. DukunLM, meaning "WizardLM" in Indonesian, is here to revolutionize language generation 🌟. This is an updated version of azale-ai/DukunLM-Uncensored-7B, now released as a full model rather than only an adapter as before 👽.

Model Details

| Model Name | Parameters | Google Colab | Base Model | Dataset | Prompt Format | Fine-Tuning Method | Sharded Version |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DukunLM-7B-V1.0-Uncensored | 7B | Link | ehartford/WizardLM-7B-V1.0-Uncensored | MBZUAI/Bactrian-X (Indonesian subset) | Alpaca | QLoRA | Link |
| DukunLM-13B-V1.0-Uncensored | 13B | Link | ehartford/WizardLM-13B-V1.0-Uncensored | MBZUAI/Bactrian-X (Indonesian subset) | Alpaca | QLoRA | Link |

⚠️ Warning: DukunLM is an uncensored model without filters or alignment. Please use it responsibly as it may contain errors, cultural biases, and potentially offensive content. ⚠️

Installation

To use DukunLM, make sure PyTorch is installed and that you have an NVIDIA GPU (or use Google Colab). Then install the required dependencies:

pip3 install -U git+https://github.com/huggingface/transformers.git
pip3 install -U git+https://github.com/huggingface/peft.git
pip3 install -U git+https://github.com/huggingface/accelerate.git
pip3 install -U bitsandbytes==0.39.0 einops==0.6.1 sentencepiece
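
After installing, it can help to confirm that the packages are importable before loading a multi-gigabyte checkpoint. The following is a minimal sketch using only the standard library; the helper name `missing_packages` and the `REQUIRED` list are illustrative, and the list assumes the import names match the package names used in the commands above (plus `torch` for PyTorch).

```python
import importlib.util

# Import names assumed to correspond to the packages installed above.
REQUIRED = [
    "torch",
    "transformers",
    "peft",
    "accelerate",
    "bitsandbytes",
    "einops",
    "sentencepiece",
]


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

This only checks that the modules can be located; it does not verify versions or that a GPU is visible to PyTorch.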

How to Use

Normal Model

Stream Output

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model = AutoModelForCausalLM.from_pretrained("azale-ai/DukunLM-7B-V1.0-Uncensored", torch_dtype=torch.float16).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("azale-ai/DukunLM-7B-V1.0-Uncensored")
streamer = TextStreamer(tokenizer)

instruction_prompt = "Jelaskan mengapa air penting bagi kehidupan manusia."
input_prompt = ""

if not input_prompt:
  prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
"""
  prompt = prompt.format(instruction=instruction_prompt)

else:
  prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
"""
  prompt = prompt.format(instruction=instruction_prompt, input=input_prompt)

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
_ = model.generate(
    inputs=inputs.input_ids,
    streamer=streamer,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_length=2048, temperature=0.7,
    do_sample=True, top_k=4, top_p=0.95
)
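
The Alpaca-style template in the example above can be wrapped in a small helper so the branching on an optional input lives in one place. This is a sketch: the function name `build_prompt` and the constant names are illustrative, while the template text is copied from the example.

```python
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)


def build_prompt(instruction, input_text=""):
    """Format an instruction (and optional input) with the Alpaca template."""
    if input_text:
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=input_text)
    return PROMPT_NO_INPUT.format(instruction=instruction)


prompt = build_prompt("Jelaskan mengapa air penting bagi kehidupan manusia.")
```

The resulting `prompt` string can be passed to the tokenizer exactly as in the example.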

No Stream Output

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("azale-ai/DukunLM-7B-V1.0-Uncensored", torch_dtype=torch.float16).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("azale-ai/DukunLM-7B-V1.0-Uncensored")

instruction_prompt = "Jelaskan mengapa air penting bagi kehidupan manusia."
input_prompt = ""

if not input_prompt:
  prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
"""
  prompt = prompt.format(instruction=instruction_prompt)

else:
  prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
"""
  prompt = prompt.format(instruction=instruction_prompt, input=input_prompt)

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(
    inputs=inputs.input_ids,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_length=2048, temperature=0.7,
    do_sample=True, top_k=4, top_p=0.95
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
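
Because `generate` returns the prompt tokens followed by the new tokens, the decoded string printed above still contains the full template. A minimal sketch for keeping only the answer follows; the helper name `extract_response` is illustrative, and it assumes the decoded text contains the `### Response:` marker from the prompt.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded_text):
    """Return the text after the first '### Response:' marker.

    If the marker is absent, the whole (stripped) text is returned unchanged.
    """
    _, marker, answer = decoded_text.partition(RESPONSE_MARKER)
    return answer.strip() if marker else decoded_text.strip()


example = (
    "Below is an instruction that describes a task.\n\n"
    "### Instruction:\nHalo\n\n"
    "### Response:\nHalo juga!"
)
print(extract_response(example))
```

In the example above this would be applied as `extract_response(tokenizer.decode(outputs[0], skip_special_tokens=True))`.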

Quantize Model

Stream Output

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextStreamer

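# Describe the 4-bit quantization once, in the config object; recent transformers
# releases raise a ValueError if load_in_4bit is also passed as a separate keyword.
model = AutoModelForCausalLM.from_pretrained(
    "azale-ai/DukunLM-7B-V1.0-Uncensored-sharded",
    torch_dtype=torch.float32,
    device_map="auto",  # let accelerate place the quantized weights on the GPU
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        llm_int8_threshold=6.0,
        llm_int8_has_fp16_weight=False,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
    )
)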
tokenizer = AutoTokenizer.from_pretrained("azale-ai/DukunLM-7B-V1.0-Uncensored-sharded")
streamer = TextStreamer(tokenizer)

instruction_prompt = "Jelaskan mengapa air penting bagi kehidupan manusia."
input_prompt = ""

if not input_prompt:
  prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
"""
  prompt = prompt.format(instruction=instruction_prompt)

else:
  prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
"""
  prompt = prompt.format(instruction=instruction_prompt, input=input_prompt)

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
_ = model.generate(
    inputs=inputs.input_ids,
    streamer=streamer,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_length=2048, temperature=0.7,
    do_sample=True, top_k=4, top_p=0.95
)

No Stream Output

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

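# Describe the 4-bit quantization once, in the config object; recent transformers
# releases raise a ValueError if load_in_4bit is also passed as a separate keyword.
model = AutoModelForCausalLM.from_pretrained(
    "azale-ai/DukunLM-7B-V1.0-Uncensored-sharded",
    torch_dtype=torch.float32,
    device_map="auto",  # let accelerate place the quantized weights on the GPU
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        llm_int8_threshold=6.0,
        llm_int8_has_fp16_weight=False,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
    )
)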
tokenizer = AutoTokenizer.from_pretrained("azale-ai/DukunLM-7B-V1.0-Uncensored-sharded")

instruction_prompt = "Jelaskan mengapa air penting bagi kehidupan manusia."
input_prompt = ""

if not input_prompt:
  prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
"""
  prompt = prompt.format(instruction=instruction_prompt)

else:
  prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
"""
  prompt = prompt.format(instruction=instruction_prompt, input=input_prompt)

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(
    inputs=inputs.input_ids,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_length=2048, temperature=0.7,
    do_sample=True, top_k=4, top_p=0.95
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
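
To get a feel for why the quantized variant is useful, the weight memory can be approximated from the parameter count and the bits stored per parameter. This is a back-of-the-envelope sketch, not a measurement: it treats "7B" as exactly 7 billion parameters and ignores quantization metadata, activations, and the KV cache, all of which add to real usage. The function name `estimate_weight_memory_gb` is illustrative.

```python
def estimate_weight_memory_gb(num_parameters, bits_per_parameter):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * bits_per_parameter / 8 / 1e9


# Nominal parameter count; the real checkpoint differs slightly from 7e9.
NUM_PARAMETERS = 7e9

for label, bits in [("float32", 32), ("float16", 16), ("4-bit (nf4)", 4)]:
    gigabytes = estimate_weight_memory_gb(NUM_PARAMETERS, bits)
    print(f"{label:>12}: ~{gigabytes:.1f} GB")
```

Under these assumptions the weights take roughly 14 GB in float16 and roughly 3.5 GB in 4-bit, which is why the quantized examples fit on smaller GPUs.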

Benchmark

Coming soon, stay tuned 🙂🙂.

Limitations

The base model is primarily English and was only fine-tuned on Indonesian data
The model may exhibit cultural and contextual biases

License

DukunLM V1.0 is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.

Contributing

We welcome contributions to enhance and improve DukunLM V1.0. If you have suggestions or find any issues, please feel free to open an issue or submit a pull request. We are also open to sponsorship for compute power.