Hand mesh reconstruction from poses using spiral convolution in Unreal Engine

This master's thesis is about constructing a 3D mesh of a hand from a pose using a neural network. The trained network is integrated into Unreal Engine to be used in combination with real-time hand tracking.

Description

The virtual reality field is evolving with breakthroughs in entertainment, medicine, academia, and business. Avatar appearance is now more important than ever, especially hand visualization, since hands are often in view, used for interaction, and vital for non-verbal communication. However, hands are complex and hard to simulate realistically, creating a demand for quick, anatomically realistic hand creation methods.

The goal of this thesis is to create 3D hand models that are more realistic than those in common use today. The creation of a 3D hand from a pose, defined by joint angles, should be fast enough to enable its use in real-time applications.

The approach I have taken is to implement a deep learning model that uses spiral convolutions to process 3D data. I generate training data using the physics-based NIMBLE hand model, applied to poses I recorded with the Leap Motion Controller. I use Unreal Engine to execute the neural network model and create a hand-tracking application.

Network Setup

I first create a dataset of poses by recording my hand with the Leap Motion Controller. I utilize NIMBLE to create a realistic mesh of a hand for each recorded pose.
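Conceptually, the data generation looks like the following sketch; `load_leap_poses` and `nimble_layer` are hypothetical stand-ins for the Leap Motion recording pipeline and the NIMBLE model wrapper, not the actual thesis code:

```python
import numpy as np
import torch

# Sketch: recorded Leap Motion poses (joint angles) are fed through
# NIMBLE to obtain realistic ground-truth hand meshes.
poses = load_leap_poses("recordings/")            # (N, num_joint_angles)
samples = []
with torch.no_grad():
    for pose in poses:
        verts = nimble_layer(torch.as_tensor(pose).unsqueeze(0))  # (1, V, 3)
        samples.append((pose, verts.squeeze(0).numpy()))
np.save("hand_dataset.npy", np.array(samples, dtype=object))
```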



Next, I train an autoencoder that implements the spiral convolution approach discussed in the thesis. This way, the network learns a latent representation that captures the key characteristics of a hand and its pose.
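The core operation can be sketched as follows; `spiral_indices` is assumed to be a precomputed (num_vertices × spiral_length) index tensor, as in SpiralNet++:

```python
import torch
import torch.nn as nn

class SpiralConv(nn.Module):
    """Sketch of a spiral convolution layer: gather each vertex's
    neighborhood along a precomputed spiral sequence and apply a
    shared linear layer to the concatenated features."""
    def __init__(self, in_channels, out_channels, spiral_indices):
        super().__init__()
        self.register_buffer("spiral", spiral_indices)  # (V, L), long tensor
        self.fc = nn.Linear(in_channels * spiral_indices.size(1), out_channels)

    def forward(self, x):                          # x: (B, V, C) vertex features
        B, V, _ = x.shape
        gathered = x[:, self.spiral.reshape(-1)]   # (B, V*L, C)
        return self.fc(gathered.view(B, V, -1))    # (B, V, out_channels)
```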

To train a model that outputs a 3D mesh of a hand based on a pose, I remove the encoder of the autoencoder and replace it with a small, fully connected encoder. It receives the pose as input and learns to translate it into the latent representation of the hand. The decoder functions as a generator from the latent space to the 3D mesh. As it is already trained, the decoder's weights are frozen.
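A minimal sketch of this setup, assuming a pretrained `decoder`, a latent size `latent_dim`, and pose vectors of length `num_joint_angles` (all placeholder names):

```python
import torch
import torch.nn as nn

# Small fully connected encoder: pose (joint angles) -> latent code.
pose_encoder = nn.Sequential(
    nn.Linear(num_joint_angles, 256), nn.ReLU(),
    nn.Linear(256, latent_dim),
)

# Freeze the pretrained spiral-convolution decoder.
for p in decoder.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(pose_encoder.parameters(), lr=1e-3)
for pose_batch, gt_mesh in loader:                 # placeholder DataLoader
    pred_mesh = decoder(pose_encoder(pose_batch))  # pose -> latent -> mesh
    loss = nn.functional.l1_loss(pred_mesh, gt_mesh)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```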

Unreal Integration

I first serialize the trained model using TorchScript. The PyTorch C++ library (LibTorch) is used to integrate the serialized neural network into Unreal Engine. A custom DLL provides a simple wrapper that can load and execute the serialized model. By adapting NeuralVFX's basic LibTorch plugin, the project can communicate with the DLL and load and execute the neural network model directly within Unreal Engine. The output is passed to a procedural mesh, which visualizes the normalized mesh.
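The serialization step might look like the following Python sketch; `model` and `num_joint_angles` are placeholders for the trained pose-to-mesh network and its input size:

```python
import torch

# Sketch: trace the trained pose-to-mesh model so the LibTorch-based
# DLL can load and execute it inside Unreal Engine.
model.eval()
example_pose = torch.zeros(1, num_joint_angles)   # dummy input for tracing
scripted = torch.jit.trace(model, example_pose)
scripted.save("hand_model.pt")
# On the C++ side, the DLL loads it via torch::jit::load("hand_model.pt").
```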

Results

The comparison to SpiralNet++, an earlier variant of the spiral convolution, shows very similar results, with slightly faster execution times. Execution times are generally very fast, with most inferences taking less than 1 ms for the TorchScript model.



The qualitative analysis of the neural network shows very good results regarding the difference between the ground truth data (NIMBLE) and the model output. This difference is visualized in a comparison of eight poses from the test split of the LEAP dataset. Each example consists of two viewpoints showing the ground truth mesh and the generated mesh: a top view and a front view. A color gradient on the generated mesh encodes the per-vertex distance to the ground truth. All poses are very close to the ground truth mesh, showing a good reconstruction from the poses.
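For reference, the per-vertex distance behind such a gradient can be computed as a simple Euclidean norm; this is an illustrative sketch, not the thesis code:

```python
import torch

# Sketch: per-vertex Euclidean distance between ground truth and
# generated meshes, used to color the generated mesh.
# gt_mesh, pred_mesh: (V, 3) vertex positions
dist = torch.linalg.norm(pred_mesh - gt_mesh, dim=-1)  # (V,)
normalized = dist / dist.max()   # map to [0, 1] for a color gradient
```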


The hand tracking with the Leap Motion Controller works well and feels fluid. However, some issues can be observed in the mapping between the Leap Motion hand model and the NIMBLE hand model.

Files

Full version of the master's thesis

License

This original work is copyrighted by the University of Bremen.
Any software in this work is covered by the European Union Public Licence v1.2. To view a copy of this license, visit eur-lex.europa.eu.
The thesis provided above (as a PDF file) is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.
Any other assets (3D models, movies, documents, etc.) are covered by the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit creativecommons.org.
If you use any of the assets or software to produce a publication, then you must give credit and put a reference in your publication.
If you would like to use our software in proprietary software, you can obtain an exception from the above license (a.k.a. dual licensing). Please contact zach at cs.uni-bremen dot de.