Update Keyboard LM docs
Authored by Aleksandras Kostarevas
@@ -25,7 +25,7 @@ To mitigate some of these weaknesses, the dictionary algorithm is run in paralle
While we use the Llama architecture, we don't use any of the public models like Llama 2, mainly because they're all too big (billions of parameters). Our current model has 36 million parameters. We also use a custom, non-conventional tokenizer with space suffixes instead of prefixes to make word-based inference more efficient. For autocorrect, we use a specific prompt format. You can read more about all of this in the Model section.
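To make the space-suffix idea concrete, here is a minimal, purely illustrative Python sketch (not the keyboard's actual tokenizer; the `_` space marker and the toy token list are made up). With space-suffix tokens, a word is known to be complete the moment a token ending in the space marker is emitted, whereas with conventional space-prefix tokens the boundary is only confirmed once the *next* token arrives.

```python
# Illustrative only: "_" stands in for the space marker.
# Space-prefix style (e.g. SentencePiece): ["_hello", "_wor", "ld"]
# Space-suffix style (this keyboard's approach): ["hello_", "wor", "ld_"]

def words_from_suffix_tokens(tokens):
    """Rebuild words from space-suffix tokens; a word is complete as soon as
    a token ending in the space marker is seen, with no lookahead needed."""
    words, current = [], ""
    for tok in tokens:
        if tok.endswith("_"):
            words.append(current + tok[:-1])
            current = ""
        else:
            current += tok
    if current:
        words.append(current)  # trailing partial word (still being typed)
    return words

print(words_from_suffix_tokens(["hello_", "wor", "ld_"]))  # ['hello', 'world']
```

The practical upshot is that word-level generation can stop as soon as a suffix-space token appears, rather than waiting for the next token to reveal the boundary.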
(Note: Finetuning is currently disabled by default because its effectiveness has not been properly evaluated, and it may not be stable. The following applies if you enable it.) As you type with the keyboard, your typed data is saved locally to temporary storage for later finetuning of the transformer LM. Finetuning is scheduled to run at least once every ~20 hours, when your device is idle, plugged in, and there's enough data. Under the hood, finetuning trains a LoRA adapter locally on your device, merges it with the original model, and saves the result. While the original data is deleted after finetuning, the finetuned model's weights may still contain the data in some form, so we recommend avoiding sharing the finetuned model.
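The merge step can be illustrated with the standard LoRA formula. The sketch below is a hedged, toy example using NumPy (the dimensions and values are assumptions, and this is not the keyboard's actual finetuning code): a LoRA adapter stores two small matrices per adapted weight, and merging folds them back into the base weight so no separate adapter needs to be kept.

```python
import numpy as np

d_out, d_in, rank, alpha = 256, 256, 8, 16  # toy sizes, assumed values

W_base = np.random.randn(d_out, d_in).astype(np.float32)  # original weight
A = np.random.randn(rank, d_in).astype(np.float32)        # LoRA "down" matrix
B = np.zeros((d_out, rank), dtype=np.float32)              # LoRA "up" matrix (trained during finetuning)

# Standard LoRA merge: W' = W + (alpha / rank) * B @ A
W_merged = W_base + (alpha / rank) * (B @ A)

# With an untrained (zero) B the merge is a no-op, which is why LoRA
# training conventionally initializes B to zero.
assert np.allclose(W_merged, W_base)
```

After the merge, the adapter's contribution lives inside the saved weights themselves, which is why the finetuned model file should be treated as potentially containing traces of the training data.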
You can import and export model files for backup, for transferring finetuned models between devices, or for importing custom/third-party models. If you want to make your own models, check out the Model Creation section. The files are in .gguf format, but with extra metadata defined in the GGUF Metadata section.
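Because the model files are regular GGUF containers, they can be inspected with any GGUF-aware tool. Below is a small, hedged Python sketch that reads only the fixed GGUF header; it assumes GGUF version 2 or later (64-bit counts) and knows nothing about the keyboard's extra metadata keys, which are defined in the GGUF Metadata section.

```python
import struct

def read_gguf_header(path: str):
    """Read the fixed-size GGUF header: magic, version, tensor count,
    and metadata key/value count. Generic GGUF check, not specific to
    this keyboard's extra metadata."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        (version,) = struct.unpack("<I", f.read(4))
        (tensor_count,) = struct.unpack("<Q", f.read(8))
        (metadata_kv_count,) = struct.unpack("<Q", f.read(8))
    return version, tensor_count, metadata_kv_count

# Example usage (path is hypothetical):
# print(read_gguf_header("model.gguf"))
```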