ByteDance AI Research proposes a new self-supervised learning framework to create high-quality stylized 3D avatars with a mix of continuous and discrete parameters

A visually appealing, animatable 3D avatar is a key entry point into the digital world, which has become ever more present in modern life for socializing, shopping, gaming, and other activities. A good avatar should be attractive and personalized to match the user's appearance. Many popular avatar systems, such as Zepeto and ReadyPlayer, adopt cartoonish, stylized looks because they are fun and user-friendly. However, choosing and manually adjusting an avatar typically involves painstaking editing of many graphical assets, which is both time-consuming and difficult for inexperienced users. In this research, the authors study the automated creation of stylized 3D avatars from a single front-facing selfie.

Specifically, given a selfie image, their algorithm predicts an avatar vector: the complete configuration a graphics engine needs to generate a 3D avatar and render avatar images from predefined 3D assets. The avatar vector consists of parameters specific to those predefined assets, which can be continuous (for example, head length) or discrete (for example, hair type). A naive solution is to annotate a set of selfie images and train a model to predict the avatar vector via supervised learning. However, handling a large range of assets (usually in the hundreds) would require large-scale annotation. To reduce annotation cost, the authors instead propose a self-supervised approach: they train a differentiable imitator that replicates the renderings of the graphics engine, so that the produced avatar image can be automatically matched to the selfie image using identity losses and semantic segmentation.
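To make this concrete, below is a minimal, hypothetical PyTorch sketch of what such a self-supervised setup could look like: a differentiable "mimic" network stands in for the graphics engine so that an image-matching loss can backpropagate into the avatar predictor. All module names, parameter counts, and the toy matching loss are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn

class AvatarPredictor(nn.Module):
    """Maps a selfie to an avatar vector: continuous params + relaxed discrete ones.
    The parameter counts (32 continuous, 50 hair types) are made up for illustration."""
    def __init__(self, num_continuous=32, num_hair_types=50):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.cont_head = nn.Linear(64, num_continuous)   # e.g. head length
        self.hair_head = nn.Linear(64, num_hair_types)   # scores over hair assets

    def forward(self, selfie):
        feat = self.backbone(selfie)
        cont = torch.sigmoid(self.cont_head(feat))          # continuous params in [0, 1]
        hair = torch.softmax(self.hair_head(feat), dim=-1)  # relaxed (soft) one-hot
        return torch.cat([cont, hair], dim=-1)

# The mimic is any differentiable decoder trained beforehand to reproduce the
# engine's renderings; a single linear layer stands in for it here.
mimic = nn.Sequential(nn.Linear(32 + 50, 3 * 64 * 64), nn.Unflatten(1, (3, 64, 64)))

predictor = AvatarPredictor()
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def matching_loss(rendered, target):
    # Stand-in for the paper's identity + semantic-segmentation losses.
    return (rendered - target).pow(2).mean()

stylized_selfie = torch.rand(4, 3, 64, 64)        # a batch of stylized portraits
rendered = mimic(predictor(stylized_selfie))      # differentiable "rendering"
loss = matching_loss(rendered, stylized_selfie)   # no labels required
loss.backward()
optimizer.step()
```

Because the mimic is differentiable end to end, gradients flow from the image-matching loss back to the predictor without any annotated avatar vectors.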

Their framework comprises three stages: Portrait Stylization, Self-Supervised Avatar Parameterization, and Avatar Vector Conversion. As shown in Figure 1, identity-relevant information (hairstyle, skin tone, glasses, etc.) is preserved throughout the pipeline while the domain gap is gradually closed across the three stages. The Portrait Stylization stage handles the crossing of the 2D appearance domain from real to stylized: it stays in image space and turns the input selfie into a stylized portrait. Naively applying existing stylization methods to this translation would retain unwanted attributes such as expression, which would complicate the later stages of the pipeline.

Figure 1

To that end, they developed a modified version of AgileGAN that ensures uniform expression while preserving user identity. The Self-Supervised Avatar Parameterization stage then handles the transition from the pixel-based image to the vector-based avatar. The authors found that strictly enforcing parameter discreteness prevents the optimization from converging. To overcome this, they adopt a lenient formulation called the relaxed avatar vector, encoding discrete parameters as continuous one-hot vectors; to make training differentiable, they train an imitator to mimic the behavior of the non-differentiable engine. In the Avatar Vector Conversion stage, all discrete parameters are converted back to hard one-hot vectors, crossing from the relaxed avatar vector space to the strict avatar vector space. The graphics engine can then build the final avatars from the strict avatar vector and render them. Here, they use a novel search technique that produces better results than direct quantization. To evaluate how well the method preserves personal identity, they ran a human-preference study comparing their results against baseline approaches such as F2P and against manually created avatars. Their results score substantially higher than the baselines and come close to those of hand-crafted avatars.
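The full pipeline can be summarized in a few lines of stub code. Every function below is a hypothetical placeholder standing in for a stage described above, under the same made-up parameter counts as the earlier sketch; nothing here is ByteDance's actual implementation.

```python
import torch

def portrait_stylization(selfie):
    """Stage 1: real selfie -> stylized portrait (a modified AgileGAN in the paper)."""
    return selfie  # stub: an identity mapping stands in for the stylization GAN

def parameterize(stylized):
    """Stage 2: stylized portrait -> relaxed avatar vector, fitted by matching the
    mimic's rendering to the portrait (see the training sketch above)."""
    return torch.rand(32 + 50)  # stub: 32 continuous params + a 50-way soft one-hot

def convert_to_strict(relaxed):
    """Stage 3: relaxed -> strict avatar vector. Shown here as plain argmax
    quantization; the paper's candidate search does better (sketched further below)."""
    cont, hair = relaxed[:32], relaxed[32:]
    hard = torch.zeros_like(hair)
    hard[hair.argmax()] = 1.0
    return torch.cat([cont, hard])

selfie = torch.rand(3, 256, 256)
strict_vector = convert_to_strict(parameterize(portrait_stylization(selfie)))
# strict_vector is what the (non-differentiable) graphics engine would consume.
```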

They also provide an ablation study to support their pipeline design decisions. In brief, their technical contributions are the following:

A new self-supervised learning framework for producing high-quality stylized 3D avatars with a combination of continuous and discrete parameters.

A novel method to bridge the substantial style domain gap in creating stylized 3D avatars using portrait stylization.

A cascaded relaxation-and-search pipeline that addresses the convergence problem in optimizing discrete avatar parameters (see the sketch below).
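As a rough illustration of that relaxation-and-search idea, the sketch below hard-assigns one discrete parameter by trying its top-k candidates through the mimic and keeping whichever rendering best matches the target portrait. This is a plausible reconstruction under stated assumptions, not the paper's exact search procedure.

```python
import torch

def search_discrete(relaxed_scores, render_fn, target, k=3):
    """Pick a hard one-hot for one discrete parameter by trial rendering.

    relaxed_scores: (num_choices,) soft scores from the relaxed avatar vector.
    render_fn: the differentiable mimic, mapping a hard one-hot to an image.
    target: the stylized portrait the rendering should match.
    """
    best_one_hot, best_err = None, float("inf")
    for idx in relaxed_scores.topk(k).indices:
        candidate = torch.zeros_like(relaxed_scores)
        candidate[idx] = 1.0
        err = (render_fn(candidate) - target).pow(2).mean().item()
        if err < best_err:                            # keep the closest rendering,
            best_one_hot, best_err = candidate, err   # not just the highest score
    return best_one_hot
```

Unlike direct argmax quantization, this keeps several plausible assets in play and lets the rendered result, rather than the raw score, make the final call.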

You can find a video demonstration of the work on their project site.


Check out the Paper and Project. All credit for this research goes to the researchers of this project. Also, don't forget to join our Reddit page, Discord channel, and email newsletter, where we share the latest news on AI research, cool AI projects, and more.


Aneesh Tickoo is a Consulting Intern at MarktechPost. She is currently pursuing her BA in Data Science and Artificial Intelligence from Indian Institute of Technology (IIT), Bhilai. She spends most of her time working on projects that harness the power of machine learning. Her research interest is image processing and she is passionate about building solutions around it. She loves connecting with people and collaborating on interesting projects.



Image Source: www.marktechpost.com
