DINet

A deformation inpainting network for high-fidelity, audio-driven talking head synthesis.

DINet (Deformation Inpainting Network) addresses the synchronization gap in lip-syncing with a feature-adaptive transformation module. Rather than generating pixels directly, it spatially deforms feature maps extracted from reference images so that the mouth shape follows the temporal cues in the audio sequence, then inpaints the mouth region of the source frame. It handles high-resolution video (up to 512x512). Training uses a perception loss and a multi-scale discriminator so that facial movements stay fluid at 25 frames per second. Because it models regional facial dynamics rather than warping the whole image, DINet keeps identity consistent across diverse head poses and lighting conditions.
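To make the 25 frames-per-second figure concrete, here is a minimal sketch of how audio samples can be lined up with video frames before lip-sync processing. It is not taken from the DINet code base: the 16 kHz sample rate, the `context` parameter, and all function names are assumptions chosen for illustration.

```python
from typing import List, Tuple

FPS = 25               # frame rate stated in the description above
SAMPLE_RATE = 16_000   # assumed audio sample rate (not stated in the source)


def samples_per_frame(sample_rate: int = SAMPLE_RATE, fps: int = FPS) -> int:
    """Number of audio samples that span exactly one video frame."""
    if sample_rate % fps != 0:
        raise ValueError("sample_rate must be divisible by fps for exact alignment")
    return sample_rate // fps


def audio_window(frame_idx: int, context: int = 2,
                 sample_rate: int = SAMPLE_RATE, fps: int = FPS) -> Tuple[int, int]:
    """Return the [start, end) sample range centred on one frame.

    `context` is a hypothetical parameter: the number of neighbouring
    frames on each side whose audio is included in the window.
    The end index is not clamped to the clip length; callers should pad
    or truncate the audio as needed.
    """
    if frame_idx < 0 or context < 0:
        raise ValueError("frame_idx and context must be non-negative")
    spf = samples_per_frame(sample_rate, fps)
    start = max(0, (frame_idx - context) * spf)  # clamp at the clip start
    end = (frame_idx + context + 1) * spf
    return start, end


def windows_for_clip(num_frames: int, context: int = 2) -> List[Tuple[int, int]]:
    """Audio windows for every frame of a clip, in frame order."""
    return [audio_window(i, context) for i in range(num_frames)]
```

Under these assumptions each frame covers 640 samples, so `audio_window(10)` spans samples 5120 to 8320 (frames 8 through 12). A wider `context` gives the model more temporal cues per frame at the cost of more audio to process.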

https://github.com/MRYingG/DINet
