Summary [2210.02414] GLM-130B: An Open Bilingual Pre-trained Model arxiv.org
686 words - html page
One Line
GLM-130B is an open-source bilingual (English and Chinese) pre-trained language model with 130 billion parameters that significantly outperforms GPT-3 175B on a wide range of popular English and Chinese benchmarks.
Key Points
- GLM-130B is an open-source bilingual pre-trained language model with 130 billion parameters
- GLM-130B significantly outperforms GPT-3 175B on a wide range of popular English and Chinese benchmarks
- GLM-130B can be effectively run for inference on 4 RTX 3090 (24G) or 8 RTX 2080 Ti (11G) GPUs
Summary
The paper introduces GLM-130B, an open-source bilingual (English and Chinese) pre-trained language model with 130 billion parameters, as an attempt to surpass GPT-3. GLM-130B significantly outperforms GPT-3 175B on a wide range of popular English and Chinese benchmarks. The model weights are publicly available, and the model can be effectively run for inference on 4 RTX 3090 (24G) or 8 RTX 2080 Ti (11G) GPUs. The article also provides a DOI link to the code, training logs, a related toolkit, and lessons learned for using 100B-scale models.
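The paper attributes these modest hardware requirements to INT4 weight quantization. As a rough, illustrative sanity check (a back-of-envelope sketch, not code from the paper), the Python snippet below estimates the weight memory of a 130-billion-parameter model at a few precisions and compares it against the aggregate memory of the two GPU setups quoted above; it ignores activations, the KV cache, and framework overhead, so real requirements are somewhat higher.

```python
# Back-of-envelope check of the GPU requirement above: estimate the weight
# memory of a 130B-parameter model at several precisions and compare it with
# the aggregate memory of the two GPU setups named in the paper.
# Illustrative only; activations, KV cache, and framework overhead are ignored.

N_PARAMS = 130e9  # 130 billion parameters

BYTES_PER_PARAM = {
    "FP16": 2.0,
    "INT8": 1.0,
    "INT4": 0.5,  # the paper reports INT4 weight quantization for GLM-130B
}

GPU_SETUPS_GB = {
    "4 x RTX 3090 (24 GB)": 4 * 24,
    "8 x RTX 2080 Ti (11 GB)": 8 * 11,
}

for precision, bytes_per_param in BYTES_PER_PARAM.items():
    weights_gb = N_PARAMS * bytes_per_param / 1e9
    print(f"{precision}: ~{weights_gb:.0f} GB of weights")
    for setup, total_gb in GPU_SETUPS_GB.items():
        verdict = "fits" if weights_gb < total_gb else "does not fit"
        print(f"  {setup}: {total_gb} GB total -> {verdict}")
```

At FP16 the weights alone (~260 GB) exceed both setups, and at INT8 (~130 GB) they still do not fit; only at INT4 (~65 GB) do they fit within the 96 GB of four RTX 3090s or the 88 GB of eight RTX 2080 Tis, which is consistent with the configurations reported in the paper.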