Real-Time Controllable Motion Transition for Characters
I implemented this paper from scratch during my internship at Apple. The core idea is to use a generative model to synthesize the in-between frames that connect two animation keyframes, rather than interpolating them naively.
The input data are .bvh files, from which we extract the joint positions and rotations for each frame. The per-frame pose is fed to the encoder of a conditional VAE, which maps it to a latent distribution. We then sample from this latent space and pass the sample through a decoder composed of multiple expert networks; a gating network weights the experts so the model can pick the right one for each pose, as sketched below.
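Here is a minimal sketch of that architecture, assuming PyTorch. All names (`PoseCVAE`), layer sizes, and dimensions are illustrative placeholders, not the paper's exact configuration, and for simplicity the gate blends the experts' outputs; the paper-style alternative blends the experts' network parameters instead.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseCVAE(nn.Module):
    # Hypothetical conditional VAE with a mixture-of-experts decoder.
    def __init__(self, pose_dim=132, cond_dim=132, latent_dim=32,
                 n_experts=6, hidden=256):
        super().__init__()
        # Encoder: current pose + condition -> Gaussian latent parameters
        self.encoder = nn.Sequential(
            nn.Linear(pose_dim + cond_dim, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
        )
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        # Gating network: produces a soft blend over experts from (z, condition)
        self.gate = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, hidden), nn.ELU(),
            nn.Linear(hidden, n_experts),
        )
        # Expert decoders: each maps (z, condition) -> reconstructed pose
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(latent_dim + cond_dim, hidden), nn.ELU(),
                nn.Linear(hidden, pose_dim),
            ) for _ in range(n_experts)
        ])

    def forward(self, pose, cond):
        h = self.encoder(torch.cat([pose, cond], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample z while keeping gradients
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        dec_in = torch.cat([z, cond], dim=-1)
        weights = F.softmax(self.gate(dec_in), dim=-1)            # (B, n_experts)
        outs = torch.stack([e(dec_in) for e in self.experts], 1)  # (B, n_experts, pose_dim)
        recon = (weights.unsqueeze(-1) * outs).sum(dim=1)         # gated blend
        return recon, mu, logvar
```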
The VAE is trained with several loss terms: a reconstruction loss, KL divergence, a bone-length loss, a foot-skating loss, and an L1 rotation loss. Together these constrain the generated output to plausible human poses.
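The sketch below shows how such a combined objective might be assembled. The loss weights `w_*`, the `parents`/`foot_idx` skeleton inputs, and the foot-skating proxy (penalizing foot motion between consecutive frames, rather than using explicit contact labels) are all assumptions; the paper's exact formulations may differ.

```python
import torch
import torch.nn.functional as F

def bone_lengths(positions, parents):
    """positions: (T, J, 3); parents[j] is the parent joint of j (-1 for root)."""
    bones = [positions[:, j] - positions[:, p]
             for j, p in enumerate(parents) if p >= 0]
    return torch.stack(bones, dim=1).norm(dim=-1)  # (T, J-1)

def vae_loss(recon_pos, target_pos, recon_rot, target_rot, mu, logvar,
             parents, foot_idx, w_kl=1e-3, w_bone=1.0, w_foot=1.0, w_rot=1.0):
    # Reconstruction: decoded joint positions should match ground truth
    rec = F.mse_loss(recon_pos, target_pos)
    # KL divergence: keep the latent posterior close to a standard normal
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # Bone length: generated skeleton should preserve ground-truth bone lengths
    bone = F.mse_loss(bone_lengths(recon_pos, parents),
                      bone_lengths(target_pos, parents))
    # Foot skating: penalize foot displacement across consecutive frames
    # (a crude proxy; a real implementation would gate this by contact labels)
    foot = (recon_pos[1:, foot_idx] - recon_pos[:-1, foot_idx]).norm(dim=-1).mean()
    # L1 rotation: sparse penalty on joint-rotation error
    rot = F.l1_loss(recon_rot, target_rot)
    return rec + w_kl * kl + w_bone * bone + w_foot * foot + w_rot * rot
```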
Next, we train a transition sampler that takes the current frame and the target frame and predicts the next frame in the sequence; applied autoregressively, it fills in the whole transition. The network architecture follows the one described in the paper.
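A minimal sketch of that autoregressive rollout at inference time, assuming the trained sampler is a `torch.nn.Module` taking `(current, target)` pose tensors; the function name and shapes are illustrative.

```python
import torch

@torch.no_grad()
def generate_transition(sampler, start_pose, target_pose, n_frames):
    """Roll the transition sampler forward from start_pose toward target_pose."""
    frames = [start_pose]
    current = start_pose
    for _ in range(n_frames):
        # Each step predicts the next pose from the current and target frames
        current = sampler(current, target_pose)
        frames.append(current)
    return torch.stack(frames, dim=0)  # (n_frames + 1, pose_dim)
```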