Anerysrynz committed on
Commit 8062fdc · verified · 1 Parent(s): ae8a81e

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +203 -0
README.md ADDED
@@ -0,0 +1,203 @@
+ ---
+ language:
+ - en
+ - id
+ tags:
+ - anerysai
+ - transformer
+ - causal-lm
+ - indonesian
+ - english
+ - pytorch
+ - custom-architecture
+ license: mit
+ datasets:
+ - custom
+ ---
+
+ # AnerysAI LLM v0.1
+
+ An Indonesian-English bilingual language model built on a custom PyTorch transformer architecture.
+
+ ## Model Description
+
+ AnerysAI LLM is a bilingual language model supporting English and Indonesian. It features:
+ - **Chain-of-Thought Reasoning**: Step-by-step reasoning capabilities
+ - **Search-Augmented Generation**: Web search integration for enhanced responses
+ - **Gradient Accumulation Training**: Large effective batch sizes for better convergence
+ - **Custom Architecture**: Optimized transformer implementation
+
+ ## Model Details
+
+ - **Model Type**: Custom Transformer-based Causal Language Model
+ - **Architecture**: Multi-head attention with feed-forward networks
+ - **Languages**: English and Indonesian (bilingual support)
+ - **Training Data**: High-quality structured data covering AI, technology, and general knowledge
+
+ ## Installation
+
+ ```bash
+ pip install torch transformers huggingface-hub
+ git clone https://github.com/AneryzRynzStudios/AnerysAI-LLM-V0.1.git
+ cd AnerysAI-LLM-V0.1
+ pip install -r requirements.txt
+ ```
+
+ ## Usage
+
+ ### Load and Use the Model
+
+ ```python
+ import json
+
+ import torch
+ from huggingface_hub import hf_hub_download
+
+ # The `src` package comes from the cloned repository; run this from the repo root.
+ from src.model import TransformerModel
+ from src.config import ModelConfig
+ from src.tokenizer import BaseTokenizer
+
+ # Download model files
+ model_path = hf_hub_download("Anerysrynz/anerysai-llm-v0.1", "final_model.pt")
+ tokenizer_path = hf_hub_download("Anerysrynz/anerysai-llm-v0.1", "tokenizer.json")
+ config_path = hf_hub_download("Anerysrynz/anerysai-llm-v0.1", "config.json")
+
+ # Load configuration
+ with open(config_path, 'r') as f:
+     config_data = json.load(f)
+
+ model_config = config_data["model_config"]
+
+ # Create model
+ config = ModelConfig(
+     vocab_size=model_config["vocab_size"],
+     max_sequence_length=model_config["max_sequence_length"],
+     embedding_dim=model_config["embedding_dim"],
+     num_heads=model_config["num_heads"],
+     num_layers=model_config["num_layers"],
+     ffn_dim=model_config["ffn_dim"],
+     dropout_rate=model_config["dropout_rate"],
+ )
+
+ model = TransformerModel(config)
+ model.load_state_dict(torch.load(model_path, map_location="cpu"))
+ model.eval()  # disable dropout for inference
+
+ # Load tokenizer
+ tokenizer = BaseTokenizer(vocab_size=model_config["vocab_size"])
+ tokenizer.load(tokenizer_path)
+
+ # Generate text
+ prompt = "What is artificial intelligence?"
+ tokens = tokenizer.encode(prompt, language="en")
+ # Add generation logic here
+ ```
+
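The loading example stops at a `# Add generation logic here` placeholder. As a minimal sketch of what that step could look like, the following shows plain greedy decoding in pure Python. Everything in it is an assumption for illustration: `greedy_generate`, `next_token_logits`, `toy_logits`, and `eos_id` are hypothetical names, not part of the AnerysAI repository, and the toy function stands in for a real forward pass.

```python
from typing import Callable, List, Optional, Sequence


def greedy_generate(
    next_token_logits: Callable[[Sequence[int]], Sequence[float]],
    prompt_ids: Sequence[int],
    max_new_tokens: int = 32,
    max_sequence_length: int = 256,
    eos_id: Optional[int] = None,
) -> List[int]:
    """Append the highest-scoring token until a stop condition is met."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        # Keep only the most recent tokens that fit in the context window.
        context = ids[-max_sequence_length:]
        logits = next_token_logits(context)
        # Greedy choice: index of the largest logit.
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return ids


# Toy stand-in for the model (hypothetical): always favours (last token + 1) mod vocab size.
def toy_logits(context: Sequence[int], vocab_size: int = 10) -> List[float]:
    target = (context[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


generated = greedy_generate(toy_logits, [3], max_new_tokens=4)
print(generated)  # [3, 4, 5, 6, 7]
```

With the real model, `next_token_logits` would wrap a forward pass and return the logits for the last position. How `TransformerModel` exposes its output is not documented in this README, so that wrapper has to be written against the repository's actual code.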
+ ### Interactive Usage
+
+ ```bash
+ # Clone the repository
+ git clone https://github.com/AneryzRynzStudios/AnerysAI-LLM-V0.1.git
+ cd AnerysAI-LLM-V0.1
+
+ # Download model from Hugging Face
+ python -c "
+ from huggingface_hub import snapshot_download
+ snapshot_download(repo_id='Anerysrynz/anerysai-llm-v0.1', local_dir='checkpoints')
+ "
+
+ # Run inference
+ python inference.py --device cpu --interactive
+ ```
+
+ ## Model Configuration
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Vocabulary Size | 1,081 |
+ | Max Sequence Length | 256 |
+ | Embedding Dimension | 512 |
+ | Number of Heads | 8 |
+ | Number of Layers | 12 |
+ | Feed-forward Dimension | 3,072 |
+ | Dropout Rate | 0.1 |
+
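The table values can be sanity-checked against the "~50M parameters" figure given under Performance. The sketch below is a rough estimate only. It assumes a conventional transformer block (four `d × d` attention projections plus a two-layer feed-forward network), learned positional embeddings, and an output head tied to the token embedding, and it ignores biases and layer norms. None of these layout details are confirmed by the README; the dictionary keys mirror those read in the loading code, and `estimate_parameters` is a hypothetical helper.

```python
# Values copied from the configuration table; key names follow the loading code.
model_config = {
    "vocab_size": 1081,
    "max_sequence_length": 256,
    "embedding_dim": 512,
    "num_heads": 8,
    "num_layers": 12,
    "ffn_dim": 3072,
    "dropout_rate": 0.1,
}


def estimate_parameters(cfg: dict) -> dict:
    """Rough weight count under the assumptions stated above."""
    d, ffn = cfg["embedding_dim"], cfg["ffn_dim"]
    attention = 4 * d * d       # Q, K, V and output projections (assumed layout)
    feed_forward = 2 * d * ffn  # two linear layers (assumed layout)
    blocks = cfg["num_layers"] * (attention + feed_forward)
    token_embedding = cfg["vocab_size"] * d
    position_embedding = cfg["max_sequence_length"] * d  # assumed learned
    return {
        "blocks": blocks,
        "embeddings": token_embedding + position_embedding,
        "total": blocks + token_embedding + position_embedding,
    }


counts = estimate_parameters(model_config)
head_dim = model_config["embedding_dim"] // model_config["num_heads"]
print(f"per-head dimension: {head_dim}")                       # 64
print(f"estimated parameters: {counts['total'] / 1e6:.1f}M")   # 51.0M
```

Under these assumptions almost all of the weights sit in the 12 transformer blocks, and the small vocabulary contributes well under a million parameters.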
+ ## Training Details
+
+ - **Physical Batch Size**: 8
+ - **Effective Batch Size**: 9,000 (8 × 1,125, via gradient accumulation)
+ - **Gradient Accumulation Steps**: 1,125
+ - **Learning Rate**: 0.0018 (auto-scaled)
+ - **Training Approach**: Custom implementation with label smoothing
+ - **Data**: Bilingual structured dataset
+
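The batch figures above fit together as physical batch size × accumulation steps = effective batch size. The following pure-Python sketch illustrates the idea on a toy one-parameter regression; it is not the project's training loop, and `grad_mse`, `accumulated_grad`, and the synthetic data are hypothetical. It shows that averaging the gradients of 1,125 micro-batches of 8 samples gives the same gradient as one batch of 9,000 samples.

```python
import math

PHYSICAL_BATCH = 8
ACCUM_STEPS = 1125
EFFECTIVE_BATCH = PHYSICAL_BATCH * ACCUM_STEPS  # 9000


def grad_mse(w: float, batch: list) -> float:
    """Gradient of the mean squared error of y = w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, data: list, micro_batch_size: int):
    """Accumulate micro-batch gradients, scaling each by 1 / number of steps."""
    steps = len(data) // micro_batch_size
    total = 0.0
    for i in range(steps):
        micro = data[i * micro_batch_size:(i + 1) * micro_batch_size]
        total += grad_mse(w, micro) / steps
    return total, steps


# Synthetic data for y = 3x (illustrative only).
data = [(float(i % 7), 3.0 * (i % 7)) for i in range(EFFECTIVE_BATCH)]

full = grad_mse(1.0, data)
accumulated, steps = accumulated_grad(1.0, data, PHYSICAL_BATCH)
print(steps)                                            # 1125
print(math.isclose(full, accumulated, rel_tol=1e-9))    # True
```

In a framework such as PyTorch the same effect is usually obtained by dividing each micro-batch loss by the number of accumulation steps and calling the optimizer only once per accumulated group.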
+ ## Features
+
+ ### Chain-of-Thought Reasoning
+ ```python
+ from src.inference import TextGenerator
+
+ # `model` and `tokenizer` are the objects loaded in the Usage section
+ generator = TextGenerator(model, tokenizer)
+ response = generator.think_and_generate(
+     "Why does the sky appear blue?",
+     num_steps=3
+ )
+ ```
+
+ ### Search-Augmented Generation
+ ```python
+ # Reuses the `generator` created above
+ response = generator.search_and_generate(
+     "What is the capital of France?"
+ )
+ ```
+
+ ## Performance
+
+ - **Model Size**: ~50M parameters
+ - **Training Time**: Variable (depends on hardware)
+ - **Inference Speed**: Fast on CPU/GPU
+ - **Memory Usage**: Moderate (about 200 MB of weights if stored as float32; fits in most GPUs)
+
+ ## Limitations
+
+ - Requires custom loading code (not the standard Hugging Face format)
+ - Small model (~50M parameters, 256-token context), so capacity is limited compared to larger models
+ - May not generalize as well as models trained on massive datasets
+ - Requires specific dependencies and setup
+
+ ## Ethical Considerations
+
+ This model is intended for research and educational purposes. Users should:
+ - Be aware of potential biases in the training data
+ - Use the model responsibly
+ - Not deploy it in critical applications without thorough testing
+ - Respect copyright and intellectual property laws
+
+ ## Training Data
+
+ The model was trained on a curated dataset including:
+ - AI and machine learning concepts
+ - Computer science fundamentals
+ - Technology trends and developments
+ - Bilingual Q&A pairs
+ - Conversational scenarios
+
+ ## Citation
+
+ ```bibtex
+ @misc{anerysai-llm-2026,
+   title={AnerysAI LLM v0.1: Bilingual Language Model with Reasoning Capabilities},
+   author={AneryzRynz Studios},
+   year={2026},
+   url={https://huggingface.co/Anerysrynz/anerysai-llm-v0.1}
+ }
+ ```
+
+ ## License
+
+ MIT License - see the LICENSE file in the original repository for details.
+
+ ## Contact
+
+ For questions or issues, please visit the [GitHub repository](https://github.com/AneryzRynzStudios/AnerysAI-LLM-V0.1).
+
+ ---
+
+ *This model was created for educational and research purposes. Use responsibly.*