The Distributed Framework

The underlying structure is a plug-and-play architecture built on a large language model with multi-billion parameters, trained on a dataset of 6.35 million articles.

Model Layer of BitMind (some of the modules are interchangeable, subject to real needs)

Low-Rank Adaptation (LoRA) learns a low-rank decomposition of the dense layers in LLMs. For a pre-trained weight matrix $W_0 \in \mathbb{R}^{d_p \times d_m}$ and input activation $x \in \mathbb{R}^{d_m}$, LoRA decomposes the weight update $\Delta W$ into two low-rank matrices:

$$h = (W_0 + \Delta W)x = W_0 x + BAx$$

where $B \in \mathbb{R}^{d_p \times r}$, $A \in \mathbb{R}^{r \times d_m}$, and the rank $r \ll \min(d_m, d_p)$.
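To make the decomposition concrete, here is a minimal PyTorch sketch of a LoRA-adapted linear layer. The class name, dimensions, and initialization scale are illustrative assumptions, not BitMind's actual implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Sketch of h = W_0 x + B A x with a frozen pre-trained weight W_0."""
    def __init__(self, d_m: int, d_p: int, r: int = 8):
        super().__init__()
        # Frozen pre-trained weight W_0 (shape d_p x d_m).
        self.W0 = nn.Linear(d_m, d_p, bias=False)
        self.W0.weight.requires_grad = False
        # Low-rank factors: A (r x d_m) small random init, B (d_p x r) zero init,
        # so the adapter starts as a no-op (Delta W = BA = 0).
        self.A = nn.Parameter(torch.randn(r, d_m) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_p, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # h = W_0 x + B A x
        return self.W0(x) + (x @ self.A.T) @ self.B.T

# Example: adapt a 1024 -> 4096 projection with rank 8.
layer = LoRALinear(d_m=1024, d_p=4096, r=8)
h = layer(torch.randn(2, 1024))
```

Only `A` and `B` receive gradients, so the number of trainable parameters grows with the rank $r$ rather than with the full layer size.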

The Mixture of Experts (MoE) is also introduced. MoE is a family of neural network architectures that enables conditional computation through multiple experts, activated by a gating mechanism (router). In our architecture, we replace each expert with a lightweight adapter: LoRA. During fine-tuning, the pre-trained weights of the dense layers remain frozen, while the experts and router layers are trained from scratch. The lightweight experts learn to adapt the pre-trained layers during fine-tuning. In this way, the BitMind framework requires only a limited number of parameter updates and does not significantly increase the total model size.
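The sketch below shows one way such a layer could look in PyTorch: a frozen dense layer, a trainable router, and LoRA factors acting as the experts. For simplicity it uses a soft (softmax-weighted) mixture over all experts, whereas production MoE routers typically use sparse top-k routing; all names, shapes, and the number of experts are assumptions, not BitMind's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoLoRALayer(nn.Module):
    """Sketch of an MoE layer whose experts are LoRA adapters over a frozen dense layer."""
    def __init__(self, d_m: int, d_p: int, n_experts: int = 4, r: int = 8):
        super().__init__()
        # Frozen pre-trained dense layer (weights never updated).
        self.dense = nn.Linear(d_m, d_p, bias=False)
        self.dense.weight.requires_grad = False
        # Router and lightweight LoRA experts are trained from scratch.
        self.router = nn.Linear(d_m, n_experts)
        self.A = nn.Parameter(torch.randn(n_experts, r, d_m) * 0.01)  # per-expert A
        self.B = nn.Parameter(torch.zeros(n_experts, d_p, r))         # per-expert B

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = F.softmax(self.router(x), dim=-1)                       # (batch, experts)
        # Each expert e contributes a low-rank update B_e A_e x.
        low_rank = torch.einsum("erm,bm->ber", self.A, x)              # (batch, experts, r)
        updates = torch.einsum("epr,ber->bep", self.B, low_rank)       # (batch, experts, d_p)
        delta = (gate.unsqueeze(-1) * updates).sum(dim=1)              # gated mixture
        return self.dense(x) + delta

layer = MoLoRALayer(d_m=1024, d_p=4096, n_experts=4, r=8)
h = layer(torch.randn(2, 1024))
```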

This MoLoRA methodology is a three-layer MoE architecture that enables rapid training and distributed deployment for many users, reducing the burden on central computing power and allowing each node to collaborate using its own computational resources. This technology has three main advantages:

  • With the flexibility of using multiple expert models in a plug-and-play fashion, it significantly reduces the complexity of training for end users;

  • With support from industry expert models, the amount of training data required from an ordinary user is reduced to less than 1/100 of the original amount. This means users only need to provide data relevant to their own work or learning for the AI to understand industry information and the task at hand;

  • MoLoRA reduces the computational power required for the same level of performance by 90%; the sketch after this list illustrates how small the trainable share of parameters becomes.
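As a rough illustration of where these savings come from (using the hypothetical MoLoRALayer sketched above, not official BitMind figures), only the router and the LoRA factors carry gradients, so the trainable share of parameters stays small:

```python
# Count trainable vs. total parameters for the illustrative MoLoRALayer above.
def count_params(module):
    trainable = sum(p.numel() for p in module.parameters() if p.requires_grad)
    total = sum(p.numel() for p in module.parameters())
    return trainable, total

trainable, total = count_params(MoLoRALayer(d_m=1024, d_p=4096, n_experts=4, r=8))
print(f"trainable: {trainable:,} / total: {total:,} ({100 * trainable / total:.1f}%)")
# With these toy dimensions, the frozen dense weight dominates the total, and the
# trainable share (router + LoRA factors) lands in the low single-digit percent range.
```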

The MoLoRA Methodology

In addition, distributed deployment ensures the privacy and confidentiality of personal data and models.
