---
license: other
license_name: falcon-llm-license
license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
language:
- en
- fr
- es
- pt
base_model:
- tiiuae/Falcon3-1B-Instruct
pipeline_tag: text-generation
tags:
- Falcon
- Falcon3
- tiiuae
- 1b
- Instruct
- Heretic
- Uncensored
- Abliterated
---

## Falcon3-1B-Instruct-Heretic

A decensored version of [Falcon3-1B-Instruct](https://huggingface.co/tiiuae/Falcon3-1B-Instruct), made using [Heretic](https://github.com/p-e-w/heretic) v1.0.1.

| | Falcon3-1B-Instruct-Heretic | Original model ([Falcon3-1B-Instruct](https://huggingface.co/tiiuae/Falcon3-1B-Instruct)) |
| --- | --- | --- |
| **Refusals** | 3/100 | 97/100 |
| **KL divergence** | 0.03 | 0 *(by definition)* |
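The KL divergence row quantifies how far the decensored model's output distributions drift from the original model's; a value near 0 means the model still behaves like the original on ordinary prompts. A toy sketch of the underlying quantity (the probe prompts and averaging Heretic actually uses are not specified here, so the distributions below are purely illustrative):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) between two discrete next-token distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Toy next-token distributions over a 4-token vocabulary.
original = [0.70, 0.20, 0.05, 0.05]
modified = [0.68, 0.22, 0.05, 0.05]

print(round(kl_divergence(original, original), 6))  # 0.0 by definition
print(round(kl_divergence(original, modified), 4))  # small positive drift
```

A small divergence such as the 0.03 reported above indicates the ablation changed refusal behavior without substantially perturbing the model's overall output distribution.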
33
+ ## Heretic Abliteration Parameters
34
+
35
+ | Parameter | Value |
36
+ | :-------- | :---: |
37
+ | **direction_index** | 11.70 |
38
+ | **attn.o_proj.max_weight** | 1.45 |
39
+ | **attn.o_proj.max_weight_position** | 10.31 |
40
+ | **attn.o_proj.min_weight** | 0.82 |
41
+ | **attn.o_proj.min_weight_distance** | 6.09 |
42
+ | **mlp.down_proj.max_weight** | 1.35 |
43
+ | **mlp.down_proj.max_weight_position** | 11.22 |
44
+ | **mlp.down_proj.min_weight** | 0.44 |
45
+ | **mlp.down_proj.min_weight_distance** | 3.73 |
46
+
47
+
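Abliteration works by projecting an estimated "refusal direction" out of selected weight matrices; the per-layer min/max weights above control how strongly that projection is applied across layers. A minimal sketch of directional ablation on a single matrix (illustrative only; the exact transformation and parameterization Heretic applies may differ):

```python
import numpy as np

def ablate_direction(W, v, weight=1.0):
    """Remove the component of W's outputs along direction v.

    W: (d_out, d_in) weight matrix applied as W @ x.
    v: (d_out,) refusal direction (normalized internally).
    weight: ablation strength (1.0 removes the direction entirely).
    """
    v = v / np.linalg.norm(v)
    return W - weight * np.outer(v, v) @ W

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
v = rng.standard_normal(8)

W_ablated = ablate_direction(W, v, weight=1.0)
x = rng.standard_normal(4)

# After full ablation, the layer's output has no component along v.
proj = np.dot(v / np.linalg.norm(v), W_ablated @ x)
print(abs(proj) < 1e-9)  # True
```

With `weight=1.0` the direction is removed entirely; fractional weights, like those in the table above, only attenuate it.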

## GGUF Versions

Quantized GGUF versions are available at [ChiKoi7/Falcon3-1B-Instruct-Heretic-GGUF](https://huggingface.co/ChiKoi7/Falcon3-1B-Instruct-Heretic-GGUF).

---

<div align="center">
<img src="https://huggingface.co/datasets/tiiuae/documentation-images/resolve/main/general/falco3-logo.png" alt="drawing" width="500"/>
</div>

# Falcon3-1B-Instruct

The **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.

This repository contains **Falcon3-1B-Instruct**. It achieves strong results on reasoning, language understanding, instruction following, code, and mathematics tasks.
Falcon3-1B-Instruct supports four languages (English, French, Spanish, Portuguese) and a context length of up to 8K tokens.

## Model Details
- Architecture
  - Transformer-based causal decoder-only architecture
  - 18 decoder blocks
  - Grouped-Query Attention (GQA) for faster inference: 8 query heads and 4 key-value heads
  - Wider head dimension: 256
  - High RoPE value to support long-context understanding: 1000042
  - Uses SwiGLU and RMSNorm
  - 8K context length
  - 131K vocab size
- Pruned and healed using larger Falcon models (3B and 7B respectively) on only 80 gigatokens of data comprising web, code, STEM, high-quality, and multilingual sources, using 256 H100 GPUs
- Post-trained on 1.2 million samples of STEM, conversational, code, safety, and function-call data
- Supports EN, FR, ES, PT
- Developed by [Technology Innovation Institute](https://www.tii.ae)
- License: TII Falcon-LLM License 2.0
- Model Release Date: December 2024

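To make the attention geometry above concrete: with 8 query heads and 4 key-value heads at head dimension 256, each key (and value) projection is half the width of the query projection, and every pair of query heads shares one KV head. A quick sanity check of the projection shapes implied by those numbers (assuming, as is standard, that the hidden size equals query heads times head dimension):

```python
# Attention shapes implied by the Falcon3-1B spec above.
n_q_heads = 8
n_kv_heads = 4
head_dim = 256

hidden_size = n_q_heads * head_dim        # model width seen by attention
q_proj_out = n_q_heads * head_dim         # query projection output width
kv_proj_out = n_kv_heads * head_dim       # key (and value) projection output width
q_heads_per_kv = n_q_heads // n_kv_heads  # query heads sharing each KV head

print(hidden_size)     # 2048
print(kv_proj_out)     # 1024
print(q_heads_per_kv)  # 2
```

Halving the KV heads relative to the query heads halves the KV-cache size per token, which is where GQA's inference speedup comes from.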

## Getting started

<details>
<summary> Click to expand </summary>

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "tiiuae/Falcon3-1B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "How many hours in one day?"
messages = [
    {"role": "system", "content": "You are a helpful friendly assistant Falcon3 from TII, try to follow instructions as much as possible."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024
)
# Strip the prompt tokens so only the newly generated text is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

</details>

<br>

130
+
131
+ <br>
132
+
133
+ ## Benchmarks
134
+ We report in the following table our internal pipeline benchmarks.
135
+ - We use [lm-evaluation harness](https://github.com/EleutherAI/lm-evaluation-harness).
136
+ - We report **raw scores** obtained by applying chat template and fewshot_as_multiturn.
137
+ - We use same batch-size across all models.
138
+
<table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
  <colgroup>
    <col style="width: 10%;">
    <col style="width: 10%;">
    <col style="width: 7%;">
    <col style="width: 7%;">
    <col style="width: 7%;">
    <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
  </colgroup>
  <thead>
    <tr>
      <th>Category</th>
      <th>Benchmark</th>
      <th>Llama-3.2-1B</th>
      <th>Qwen2.5-1.5B</th>
      <th>SmolLM2-1.7B</th>
      <th>Falcon3-1B-Instruct</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td rowspan="3">General</td>
      <td>MMLU (5-shot)</td>
      <td><b>68.2</b></td>
      <td>59.8</td>
      <td>49.2</td>
      <td>46.1</td>
    </tr>
    <tr>
      <td>MMLU-PRO (5-shot)</td>
      <td>16</td>
      <td><b>28.2</b></td>
      <td>20</td>
      <td>18.6</td>
    </tr>
    <tr>
      <td>IFEval</td>
      <td><b>55.3</b></td>
      <td>44.2</td>
      <td>53</td>
      <td>54.4</td>
    </tr>
    <tr>
      <td rowspan="3">Math</td>
      <td>GSM8K (5-shot)</td>
      <td><b>82.6</b></td>
      <td>57.8</td>
      <td>47.6</td>
      <td>43.9</td>
    </tr>
    <tr>
      <td>GSM8K (8-shot, COT)</td>
      <td>46.6</td>
      <td><b>58.8</b></td>
      <td>46.3</td>
      <td>45.8</td>
    </tr>
    <tr>
      <td>MATH Lvl-5 (4-shot)</td>
      <td><b>5.2</b></td>
      <td>1.1</td>
      <td>3.1</td>
      <td>1</td>
    </tr>
    <tr>
      <td rowspan="5">Reasoning</td>
      <td>Arc Challenge (25-shot)</td>
      <td><b>58.6</b></td>
      <td>50.7</td>
      <td>49.7</td>
      <td>47.7</td>
    </tr>
    <tr>
      <td>GPQA (0-shot)</td>
      <td>24.4</td>
      <td><b>29.6</b></td>
      <td>28.6</td>
      <td>26.5</td>
    </tr>
    <tr>
      <td>GPQA (0-shot, COT)</td>
      <td>13.2</td>
      <td>9.2</td>
      <td>16</td>
      <td><b>21.3</b></td>
    </tr>
    <tr>
      <td>MUSR (0-shot)</td>
      <td>32</td>
      <td>36.5</td>
      <td>32.9</td>
      <td><b>40.7</b></td>
    </tr>
    <tr>
      <td>BBH (3-shot)</td>
      <td>33.8</td>
      <td><b>39.2</b></td>
      <td>34</td>
      <td>35.1</td>
    </tr>
    <tr>
      <td rowspan="5">CommonSense Understanding</td>
      <td>PIQA (0-shot)</td>
      <td>72.1</td>
      <td>73.2</td>
      <td><b>74.4</b></td>
      <td>72</td>
    </tr>
    <tr>
      <td>SciQ (0-shot)</td>
      <td>61.8</td>
      <td>69.5</td>
      <td>71.4</td>
      <td><b>86.8</b></td>
    </tr>
    <tr>
      <td>Winogrande (0-shot)</td>
      <td>-</td>
      <td>-</td>
      <td>-</td>
      <td><b>60.2</b></td>
    </tr>
    <tr>
      <td>OpenbookQA (0-shot)</td>
      <td>40.2</td>
      <td>40.4</td>
      <td><b>42.8</b></td>
      <td>40</td>
    </tr>
    <tr>
      <td>MT-Bench (avg)</td>
      <td>5.4</td>
      <td><b>7.1</b></td>
      <td>6.1</td>
      <td>5.5</td>
    </tr>
    <tr>
      <td rowspan="1">Instruction following</td>
      <td>Alpaca (WC)</td>
      <td><b>8.6</b></td>
      <td><b>8.6</b></td>
      <td>5.4</td>
      <td>6.1</td>
    </tr>
  </tbody>
</table>
285
+
286
+ ## Useful links
287
+ - View our [release blogpost](https://huggingface.co/blog/falcon3).
288
+ - Feel free to join [our discord server](https://discord.gg/fwXpMyGc) if you have any questions or to interact with our researchers and developers.
289
+
290
+ ## Technical Report
291
+ Coming soon....
292
+
293
+ ## Citation
294
+ If the Falcon3 family of models were helpful to your work, feel free to give us a cite.
295
+
296
+ ```
297
+ @misc{Falcon3,
298
+ title = {The Falcon 3 Family of Open Models},
299
+ url = {https://huggingface.co/blog/falcon3},
300
+ author = {Falcon-LLM Team},
301
+ month = {December},
302
+ year = {2024}
303
+ }
304
+ ```