Summary: Magicoder, Open-source Large Language Models for Code (arxiv.org)
8,169 words - PDF document
One Line
Magicoder models are trained on coding challenges generated from open-source code, a method that produces diverse, low-bias data and lets them outperform larger models on code generation tasks.
Key Points
- Magicoder is a series of fully open-source large language models (LLMs) for code generation that significantly close the gap with top code models while having no more than 7B parameters
- OSS-INSTRUCT is a novel approach to generating high-quality instruction-response pairs for training Magicoder models by prompting LLMs to create coding problems and solutions based on open-source code snippets
- The Magicoder models, including Magicoder-CL and MagicoderS-CL, are trained using the OSS-INSTRUCT data and achieve state-of-the-art performance on a wide range of code generation benchmarks, including HumanEval, MBPP, MultiPL-E, and DS-1000
- Ablation studies show that instruction tuning on diverse programming languages can boost the overall coding ability, and the OSS-INSTRUCT approach is superior to direct finetuning on comment-function pairs
- The authors fully open-source the Magicoder model weights, training data, and source code to facilitate future research in LLMs for code
Summaries
25 word summary
OSS-INSTRUCT generates coding challenges from open-source code to train large language models, producing diverse, low-bias data. The resulting Magicoder models outperform larger models on code generation.
46 word summary
Magicoder models are trained on high-quality coding challenges generated from open-source code. The OSS-INSTRUCT method produces diverse, low-bias data. The resulting models, including Magicoder-CL and MagicoderS-CL, outperform larger models on code generation tasks. MagicoderS-CL achieves comparable performance to a 34B model using only 7B parameters.
116 word summary
Magicoder is a series of open-source large language models (LLMs) for code, instruction-tuned on high-quality coding challenges generated from open-source code snippets. The key contributions are the OSS-INSTRUCT method, which produces diverse, low-bias, and high-quality training data, and the Magicoder models, including Magicoder-CL and MagicoderS-CL, which outperform larger models on code generation tasks. The Magicoder models are evaluated on various benchmarks, demonstrating superior performance compared to state-of-the-art models. MagicoderS-CL, which combines OSS-INSTRUCT with the Evol-Instruct technique, achieves comparable performance to the 34B WizardCoder model while using only 7B parameters. The authors' open-sourcing of the model weights, training data, and source code aims to facilitate further research and development in the field of LLMs for code.
310 word summary
Magicoder: Open-source Large Language Models for Code
Magicoder is a series of open-source large language models (LLMs) for code, instruction-tuned on high-quality coding challenges generated from open-source code snippets. The data generation method, called OSS-INSTRUCT, enables Magicoder to outperform existing LLMs on various code generation benchmarks.
The key contributions of this work are:
1. OSS-INSTRUCT: Code snippets extracted from open-source repositories are used to prompt an LLM, which generates coding problems together with their solutions. This approach produces diverse, low-bias, and high-quality training data for instruction tuning.
2. Magicoder Models: The Magicoder models, including Magicoder-CL and MagicoderS-CL, are instruction-tuned on the OSS-INSTRUCT data. Despite having no more than 7B parameters, they outperform larger models like the 15B WizardCoder on code generation tasks.
The Magicoder models are evaluated on a wide range of benchmarks, including multilingual code generation (MultiPL-E), data science coding (DS-1000), and program synthesis (HumanEval, MBPP). The results demonstrate the superior performance of Magicoder compared to state-of-the-art models.
On the MultiPL-E benchmark, Magicoder-CL outperforms the base CodeLLAMA-Python-7B model across all studied programming languages. MagicoderS-CL, which combines OSS-INSTRUCT with the Evol-Instruct technique, achieves comparable performance to the 34B WizardCoder model while using only 7B parameters.
On the DS-1000 dataset, Magicoder-CL-7B outperforms all baselines, including the state-of-the-art WizardCoder models. MagicoderS-CL-7B further improves upon this, achieving an 8.3 percentage point absolute improvement over the 15B WizardCoder-SC.
The authors also compare Magicoder with the recently released DeepSeek-Coder models. Despite DeepSeek-Coder's impressive performance, the Magicoder variants, particularly MagicoderS-DS, are able to surpass DeepSeek-Coder-Instruct-6.7B on the HumanEval and MBPP benchmarks while using only a fraction of the training tokens.
In conclusion, the Magicoder models, enabled by the novel OSS-INSTRUCT data generation method, demonstrate state-of-the-art performance on a wide range of code generation tasks. The authors' open-sourcing of the model weights, training data, and source code aims to facilitate further research and development in the field of LLMs for code.
438 word summary
Magicoder: Open-source Large Language Models for Code
Magicoder is a series of open-source large language models (LLMs) for code, instruction-tuned on high-quality coding challenges generated from open-source code snippets. The data generation method, called OSS-INSTRUCT, enables Magicoder to significantly outperform existing LLMs on a range of code generation benchmarks.
The key contributions of this work are:
1. OSS-INSTRUCT: Code snippets extracted from open-source repositories are used to prompt an LLM, which generates coding problems together with their solutions. This approach produces diverse, low-bias, and high-quality training data for instruction tuning.
2. Magicoder Models: The Magicoder models, including Magicoder-CL and MagicoderS-CL, are instruction-tuned on the OSS-INSTRUCT data. Despite having no more than 7B parameters, they outperform larger models like the 15B WizardCoder on various code generation tasks.
3. Comprehensive Evaluation: Magicoder is evaluated on a wide range of benchmarks, including multilingual code generation (MultiPL-E), data science coding (DS-1000), and program synthesis (HumanEval, MBPP). The results demonstrate the superior performance of Magicoder compared to state-of-the-art models.
The Magicoder models are trained using the OSS-INSTRUCT approach, which generates instruction-response pairs from open-source code snippets. This method allows the models to learn from real-world coding examples, leading to significant performance improvements compared to existing LLMs.
On the MultiPL-E benchmark, Magicoder-CL outperforms the base CodeLLAMA-Python-7B model by a large margin across all studied programming languages. Moreover, MagicoderS-CL, which combines OSS-INSTRUCT with the Evol-Instruct technique, achieves comparable performance to the 34B WizardCoder model while using only 7B parameters.
The authors also evaluate Magicoder on the DS-1000 dataset, which assesses code generation for data science tasks. The results show that Magicoder-CL-7B outperforms all the baselines, including the state-of-the-art WizardCoder models. MagicoderS-CL-7B further improves upon this, achieving an 8.3 percentage point absolute improvement over the 15B WizardCoder-SC.
The authors conduct ablation studies to understand the impact of the training data distribution on the model's performance. They find that instruction tuning on different programming languages can boost the overall coding ability, even for out-of-distribution languages. Additionally, the authors compare OSS-INSTRUCT with direct finetuning on comment-function pairs, demonstrating the superiority of the OSS-INSTRUCT approach in terms of data quality and model performance.
The authors also compare Magicoder with the recently released DeepSeek-Coder models. Despite DeepSeek-Coder's impressive performance, the Magicoder variants, particularly MagicoderS-DS, are able to surpass DeepSeek-Coder-Instruct-6.7B on the HumanEval and MBPP benchmarks while using only a fraction of the training tokens.
In conclusion, the Magicoder models, enabled by the novel OSS-INSTRUCT data generation method, demonstrate state-of-the-art performance on a wide range of code generation tasks. The authors' open-sourcing of the model weights, training data, and source code aims to facilitate further research and development in the field of LLMs for code.
1166 word summary
Magicoder: Open-source Large Language Models for Code
Introduction
Code generation is a long-standing challenge in computer science. Recently, Large Language Models (LLMs) trained on code have shown outstanding breakthroughs in generating code that accurately satisfies user intents. However, these models are often closed-source, limiting their accessibility and potential for further research and development.
To address this, we introduce Magicoder, a series of fully open-source (code, weights, and data) LLMs for code that significantly close the gap with top code models while having no more than 7B parameters. The key innovation is OSS-INSTRUCT, a novel approach to enlightening LLMs with open-source code snippets to generate high-quality instruction data for code.
OSS-INSTRUCT: Instruction Tuning from Open Source
OSS-INSTRUCT works by prompting an LLM (e.g., ChatGPT) to generate a coding problem and its solution from a seed code snippet collected from open-source repositories. The seed snippet offers controllability over the generation and encourages the LLM to create diverse coding problems that reflect real-world programming scenarios.
We collect 80K initial seed snippets from various programming languages and insert them into a prompt template; the LLM takes this prompt as input and outputs both a coding problem and its solution. We perform data cleaning and decontamination to ensure the quality of the generated data.
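For illustration, the core generation step can be sketched in a few lines of Python; the prompt wording, model name, and sampling settings below are illustrative assumptions rather than the paper's exact pipeline.

```python
# Illustrative sketch of an OSS-INSTRUCT-style generation loop; the prompt
# template, model choice, and filtering are placeholders, not the authors'
# exact implementation.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT_TEMPLATE = """\
Please gain inspiration from the following code snippet to create a
high-quality programming problem, then write a correct solution.

Code snippet for inspiration:
{seed_snippet}

Present a [Problem Description] section and a [Solution] section.
"""

def oss_instruct(seed_snippets: list[str], model: str = "gpt-3.5-turbo") -> list[str]:
    """Turn raw open-source seed snippets into problem+solution pairs."""
    samples = []
    for seed in seed_snippets:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user",
                       "content": PROMPT_TEMPLATE.format(seed_snippet=seed)}],
            temperature=1.0,  # higher temperature encourages diverse problems
        )
        samples.append(response.choices[0].message.content)
    # A real pipeline would also deduplicate and decontaminate the outputs
    # against benchmarks (HumanEval, MBPP, ...) before training.
    return samples
```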
Qualitative examples demonstrate how OSS-INSTRUCT can inspire an LLM to create diverse coding tasks, including algorithmic challenges, realistic issues, single-function code generation, library-based program completion, whole-program development, and even whole-application construction. Analysis of the generated data shows that it exhibits diversity and balance across different categories.
Compared to other data generation methods like Self-Instruct and Evol-Instruct, OSS-INSTRUCT exhibits the lowest average similarity with HumanEval, indicating that the improvements from OSS-INSTRUCT are not merely due to including data from the same distribution.
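As a rough sketch, dataset-to-benchmark similarity of this kind can be estimated with TF-IDF cosine similarity; the exact embedding and aggregation used in the paper may differ, so treat the following as one plausible protocol rather than the reported one.

```python
# Hedged sketch: estimate how close a generated dataset sits to a benchmark.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def avg_max_similarity(generated: list[str], benchmark: list[str]) -> float:
    """For each generated sample, find its most similar benchmark task,
    then average those maxima over the whole dataset."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(generated + benchmark)
    gen_vecs = matrix[: len(generated)]
    bench_vecs = matrix[len(generated):]
    sims = cosine_similarity(gen_vecs, bench_vecs)  # shape: (gen, bench)
    return float(sims.max(axis=1).mean())
```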
Magicoder and MagicoderS
We build the Magicoder series by finetuning the base models (CodeLLAMA-Python-7B and DeepSeek-Coder-Base-6.7B) on the 75K synthetic data generated through OSS-INSTRUCT. To further enhance the coding abilities, we continue to finetune the Magicoder models with the open-source Evol-Instruct dataset, resulting in the MagicoderS series.
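Instruction tuning of this kind can be set up with standard libraries; the sketch below is a minimal, hedged example, where the dataset file name, field names, sequence length, and hyperparameters are placeholders rather than the paper's exact configuration.

```python
# Minimal instruction-tuning sketch with Hugging Face Transformers.
# Dataset path, field names, and hyperparameters are placeholder assumptions;
# see the authors' repository for the actual training setup.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "codellama/CodeLlama-7b-Python-hf"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # CodeLlama ships without a pad token
model = AutoModelForCausalLM.from_pretrained(base)

def tokenize(example):
    # Concatenate each problem with its solution into one training sequence.
    text = example["problem"] + "\n" + example["solution"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=1024)

train_set = load_dataset("json", data_files="oss_instruct_75k.jsonl")["train"].map(tokenize)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="magicoder-cl", num_train_epochs=2,
                           learning_rate=5e-5, per_device_train_batch_size=2),
    train_dataset=train_set,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```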
Evaluation
We evaluate the Magicoder and MagicoderS models on a wide range of coding tasks, including HumanEval and MBPP for Python text-to-code generation, MultiPL-E for multilingual code completion, and DS-1000 for solving data science problems. We also use the more rigorous EvalPlus framework, which includes the augmented HumanEval+ and MBPP+ datasets.
The results show that both Magicoder-CL and MagicoderS-CL substantially outperform the base CodeLLAMA-Python-7B. Notably, Magicoder-CL even outperforms WizardCoder-CL-7B, WizardCoder-SC-15B, and all studied SOTA LLMs with less than or equal to 16B parameters on all the benchmarks we tested.
Furthermore, the pass@1 result of the enhanced MagicoderS-CL is on par with ChatGPT on HumanEval (70.7 vs. 72.6) and surpasses it on the more rigorous HumanEval+ (66.5 vs. 65.9), indicating that MagicoderS-CL can generate more robust code. It also achieves SOTA results among all code models at the same scale.
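For reference, pass@k metrics like these are commonly computed with the unbiased estimator of Chen et al. (2021): generate n samples per task, count the c correct ones, and estimate the chance that at least one of k drawn samples passes. A minimal sketch:

```python
# Standard unbiased pass@k estimator (Chen et al., 2021) for one task.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """pass@k estimate from n generated samples of which c pass the tests."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 1 of 20 samples passes -> pass@1 = 1 - 19/20 = 0.05
assert abs(pass_at_k(20, 1, 1) - 0.05) < 1e-9
```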
We also applied OSS-INSTRUCT to DeepSeek-Coder-Base-6.7B, yielding Magicoder-DS and MagicoderS-DS. MagicoderS-DS achieves a remarkable 76.8 pass@1 on HumanEval and outperforms DeepSeek-Coder-Instruct-6.7B on HumanEval, HumanEval+, MBPP, and MBPP+ with 8x fewer finetuning tokens.
Contributions
In summary, we make the following contributions:
1. We introduce OSS-INSTRUCT, a pioneering approach to enlightening LLMs with open-source code snippets to generate more diverse, realistic, and controllable coding instruction data, which can be leveraged to substantially boost the performance of various LLMs via instruction tuning.
2. We build the Magicoder series trained with OSS-INSTRUCT and MagicoderS series trained on a combination of OSS-INSTRUCT and Evol-Instruct. Our evaluation across 6 benchmarks shows that all Magicoders significantly improve the base LLMs, with MagicoderS-CL and MagicoderS-DS outperforming ChatGPT on HumanEval+ with only 7B parameters.
3. We fully open-source the model weights, training data, and source code at https://github.com/ise-uiuc/magicoder to facilitate future research.
Overall, OSS-INSTRUCT opens a new direction for creating low-bias and high-quality instruction-tuning data from the abundance of open-source references, enabling the development of powerful open-source code generation models.