TLDR:
- Salesforce debuts XGen 7B, a new large language model boasting an extended sequence length of up to 8K tokens, going beyond the 2K limit of many previous models.
- Unlike other models, XGen 7B can be freely used for commercial purposes, thanks to its Apache 2.0 license.
Introduction:
Salesforce, a company renowned for its robust AI models and open-source contributions, recently launched XGen 7B, an advanced large language model. XGen 7B moves beyond the conventional sequence length of 2K, extending it to an impressive 8K tokens. This shift is expected to bring significant improvements in tasks such as text summarization, protein sequence prediction, and more.
Key Notes:
- XGen 7B is a 7-billion-parameter model trained on 1.5 trillion tokens, with a focus on long-sequence modeling.
- The model's permissive license sets it apart from the 7-billion-parameter LLaMA model, whose restrictions on commercial usage have raised concerns in the AI community.
- XGen 7B is available in both 4K and 8K sequence-length versions.
- The 8K sequence length was not achieved through tricks that stretch the context window after the fact; it comes from training with a standard dense attention mechanism (a loading sketch follows this list).
- Pre-training of XGen 7B used the RedPajama dataset, now a de facto standard in the community.
- The XGen 7B model supports 22 languages, making it multilingual, unlike the English-only MPT.
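As a concrete starting point, here is a minimal sketch of loading the 8K base model via Hugging Face transformers. The model ID (Salesforce/xgen-7b-8k-base) and the trust_remote_code requirement follow the public model card; treat the details as illustrative rather than official.

```python
# Minimal sketch: loading the XGen 7B 8K base model from Hugging Face.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "Salesforce/xgen-7b-8k-base"  # a 4K variant (xgen-7b-4k-base) also exists

# XGen ships a custom tiktoken-based tokenizer, so trust_remote_code=True
# is required when loading it.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

prompt = "The main advantage of an 8K context window is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```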
Implications:
- Salesforce's new model holds promise for various applications, especially those requiring text summarization and long-sequence prediction.
- The 8K sequence length could be a game-changer, handling inputs that models capped at the traditional 2K simply cannot fit.
- The commercial usability of XGen 7B under the Apache 2.0 license offers unprecedented opportunities for businesses.
- This expansive context window makes the model especially useful for tasks like summarizing long documents and predicting protein sequences (a token-budget sketch follows this list).
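To make the 2K-versus-8K difference tangible, the rough sketch below counts a document's tokens with the XGen tokenizer and checks it against both budgets. The input file name is hypothetical; the model ID is from the public release.

```python
# Rough sketch: does a document fit in a 2K vs. 8K context window?
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "Salesforce/xgen-7b-8k-base", trust_remote_code=True
)

with open("long_article.txt") as f:  # hypothetical input document
    text = f.read()

n_tokens = len(tokenizer(text).input_ids)
for budget in (2048, 8192):
    verdict = "fits" if n_tokens <= budget else "must be truncated or chunked"
    print(f"{n_tokens} tokens vs. {budget}-token window: {verdict}")
```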
Performance Benchmarks:
Comparing XGen's performance with other open-source models paints a promising picture.
- XGen does not perform as well on the HellaSwag dataset, but on the MMLU benchmark it outperforms many other open-source models.
- When it comes to code generation, XGen fares better than LLaMA but falls behind MPT 7B.
The XGen Instruct Model:
The XGen Instruct model is a variant of XGen that has undergone supervised fine-tuning on public-domain instruction data. Its outputs are significantly shaped by the data it was fine-tuned on, which was not distilled from GPT-4.
- The model performs well on tasks like text summarization (a usage sketch follows this list).
- However, it stumbles on tasks that require reasoning.
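Below is a hedged sketch of prompting the instruct variant for summarization. The model ID (Salesforce/xgen-7b-8k-inst) comes from the public release, but the exact prompt template is our assumption; check the model card for the recommended format.

```python
# Hedged sketch: summarizing an article with the XGen instruct variant.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "Salesforce/xgen-7b-8k-inst"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

article = "..."  # long article text, up to the 8K token window

# Assumed instruction-style prompt; the official template may differ.
prompt = (
    "### Human: Summarize the following article in bullet points.\n\n"
    f"{article}\n###"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```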
Our Take:
- While XGen 7B shows promising advancements in the world of large language models, certain aspects need addressing. Some of the model's responses are irregular: they end abruptly or are not generated at all. Additionally, its performance on reasoning tasks appears inconsistent, which raises some concerns.
- Nevertheless, the model shines in specific areas, notably text summarization. Its ability to summarize articles in both standard and bullet-point formats is noteworthy. The inconsistencies we observed may be attributable to the datasets used for fine-tuning and could be resolved with better data in the future.
Looking Ahead:
The release of Salesforce's XGen 7B is an exciting development in the realm of large language models. Despite some areas needing improvement, the model's unique offerings, such as an extended sequence length of 8K tokens and its usability for commercial purposes, make it a strong contender in the field. As the technology advances, we can expect the XGen model to evolve further, potentially bringing transformative changes to AI-driven language models.
Moreover, the model's performance and capabilities could be significantly enhanced by fine-tuning on more sophisticated datasets, potentially leading to variants such as an XGen Wizard. It will be interesting to see whether Salesforce builds and releases larger models in the future. Their contribution undoubtedly enriches the landscape of machine learning and large language models.