OpenCodeP
An open-source framework for large-scale code pre-training and evaluation using curated multi-language datasets.
OpenCodeP streamlines the development of code-centric LLMs by providing a unified pipeline for data cleaning, tokenization, and distributed training. It leverages the 1.2TB Stack dataset and specialized benchmarks such as HumanEval to ensure high-fidelity performance across 80+ programming languages. The toolkit includes optimized scripts for Megatron-LM and DeepSpeed, enabling developers to scale models from 1B to 33B parameters efficiently.
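To make the data-cleaning stage of such a pipeline concrete, here is a minimal, illustrative sketch of two common pre-training filters: dropping files with pathologically long lines (typically minified or generated code) and exact deduplication via content hashing. The function name and thresholds are hypothetical and not part of the OpenCodeP API.

```python
import hashlib

def clean_samples(samples, max_line_len=1000):
    """Filter and deduplicate raw code samples (illustrative sketch,
    not the actual OpenCodeP implementation)."""
    seen = set()
    cleaned = []
    for text in samples:
        # Drop files containing very long lines, a common heuristic
        # for excluding minified or machine-generated code.
        if any(len(line) > max_line_len for line in text.splitlines()):
            continue
        # Exact deduplication: skip samples whose content hash was seen.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        cleaned.append(text)
    return cleaned

samples = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b):\n    return a + b\n",  # exact duplicate, dropped
    "x = 1;" * 500,                        # single 3500-char line, dropped
]
print(len(clean_samples(samples)))  # → 1
```

Real pipelines typically add near-deduplication (e.g., MinHash), license filtering, and language identification on top of these basic passes.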