DejaView: A Differentiable Framework for Inverse Physical Inference

Adrian Danzglock, Mustafa Imad Abdullah Alkendi, Luca Gee, Metasit Getrak, Nick Stelke, Hermann Meißenhelter, René Weller, Gabriel Zachmann

*Hermann Meißenhelter, Supervisor of the Masterproject DejaView; René Weller and Gabriel Zachmann, Leaders of the Workgroup CGVR

Abstract Physical reasoning is a key component in simulation-based robotics and optimization. Automatic differentiation enables efficient analysis and optimization within physics-based simulations. At the same time, virtual reality (VR) offers an immersive and intuitive way for users to engage with virtual environments.

The goal of this project is to integrate differentiable physics simulators into our custom pipeline, which is connected to a VR environment—enabling an interactive, immersive experience for users and supporting advanced robotic physical reasoning.

In this project, we used and evaluated two physics simulators: Uncertain Physics, developed by the Computer Graphics and VR Lab at the University of Bremen, and Warp, developed by NVIDIA. Both simulators were tested using three optimization algorithms: gradient descent, BFGS, and Adam. The results showed that, for the given problem, Uncertain Physics achieved significantly lower computation times per iteration across all optimizers when using forward-mode automatic differentiation.

Index Terms Physics, Simulation, WARP, automatic differentiation, Unreal Engine

I. Introduction


 Video 1: Project Introduction

In this paper, we present our progress in the development and implementation of a differentiable physics simulation that is linked to the Unreal Engine for Virtual Reality (VR) to enable physical reasoning. Our focus is on the integration of automatic differentiation, which allows one to efficiently perform sensitivity analyses and optimizations within physical simulations.

We also investigate how this optimization capability can be used for inverse physical inference. Optimizing a scene’s initial state to reach a specified target state after some time can help explain unobserved causes of an observed state. In a robotics context, inverse physical reasoning could, for instance, allow a robotic agent to reason about the original location of a dropped item or analyze the causes of an unintended effect like knocking an object over.

A central part of this work is the detailed investigation of the theoretical and practical background of automatic differentiation, which is a key technology for differentiable simulations. We also address the challenges and solutions for seamlessly integrating differentiable physics into the user interface (UI) and user experience (UX) within a VR environment.

In addition, we compare our own implementation within Uncertain Physics with the established differentiable physics engine WARP from NVIDIA. This comparison includes both qualitative and quantitative evaluations to identify the advantages and disadvantages of both approaches in terms of efficiency, accuracy, and application flexibility. The goal is to provide an analysis of how our method can contribute to the advancement of interactive VR-based physics simulations.

II. Related Work

Physical Simulation plays an important role in many areas, and many types of bodies can be treated in simulation. The topic has been studied and improved over the years: Bender et al. [Ben+14] reviewed position-based simulation methods, and Manteaux et al. [Man+17] explored how to trade off simulation accuracy against speed. In addition, many physics simulators and simulation methods have been developed on this topic [Hei+21; Mac22]. This body of work shows that physics simulation remains an active and relevant field; it is also central to this project, which is mainly concerned with the simulation of rigid bodies.

Physical Inference Physical inference is crucial for understanding our intuitive grasp of the universe's physical structure, the dynamic interplay of its properties, and the prediction of how physical events will unfold over time [Fis+16]. In their research, Battaglia et al. [BHT13] suggest that physical inferences function as a probabilistic, quantifiable simulation of rigid and soft bodies, along with fluid dynamics, that are initially processed in the brain by an "Intuitive Physics Engine," akin to the simulations of computer physics engines in graphics and video games. Sanchez-Gonzalez et al. [San+20] presented a graph network-based simulator that was able to infer complex particle-based materials and their physical interactions with each other in different physical domains over time while remaining generalizable to tasks beyond its training. There has also been research on the physical stability of block construction towers and predicting their fall, with the goal of simulating physical inference akin to human perception by visually inferring stability. Li et al. [Li+16] focus on shifting scene parameters and how inference handles varied factors, whereas Lerer et al. [LGF16] aim to estimate the trajectory of falling blocks in addition to predicting whether they would fall.

Differentiable Simulation is a tool for many areas, e.g. computational physics, robotics, and machine learning. Differentiable simulators are continuously differentiable, i.e. they are able to calculate gradients of a given loss function over the entire simulation. Differentiable simulations consist of several components, including the calculation of gradients, the dynamics model, the contact model, and the integrator [New+24]. Numerous differentiable simulators have been developed in recent years.

Tiny Differentiable Simulator (TDS)[Hei+21] is a physics library based on C++ and CUDA that enables fast and parallel calculations. The simulator only supports rigid body contacts, which means that the simulation of contacts with deformable objects does not work. A limitation of this simulator is that only primitive shapes are supported.

Differentiable Projective Dynamics (DiffPD)[Du+21] is a differentiable soft body simulator that uses an implicit time step instead of an explicit one, which eliminates the problem when the simulation step size is not small enough. The idea in the development was to use the Cholesky decomposition in standard Projective Dynamics (PD) to speed up the backpropagation, resulting in a 4 to 19 times faster simulation time compared to the standard Newton method and a 4 to 6 times faster time in locomotion tasks. The main limitation of this idea was that it was not fully differentiable in some cases, especially when both contact and friction were present.

Differentiable Cloth Simulation with Dry Frictional Contact (DiffCloth)[Li+22] is a differentiable material simulator. The simulator focuses on the projective dynamics (PD) domain with dry frictional contact. In addition, the technique also exploits the contact gradient by using an iterative solver to speed up the process of backpropagation, which makes the simulation faster than using a direct solver. One limitation of this work that needs to be pointed out is that self-collisions between vertex-face and edge-edge have not been considered and need to be implemented for a more realistic simulation.

III. Technical Background: Automatic Differentiation

Automatic Differentiation (AD) is a method where the partial derivatives of each individual operation within a calculation flow are systematically stored and passed on. This enables an efficient and accurate determination of gradients without the numerical inaccuracies of classical difference methods.

Depending on the application and problem, two different variants of automatic differentiation can be used: forward differentiation (forward mode) and reverse differentiation (reverse mode). While forward differentiation is particularly suitable for functions with few input variables, reverse differentiation is often used in applications with many parameters, such as the training of neural networks.


Fig. 1.  The forward computation is performed in both modes: the forward mode calculates the gradient during the forward computation, while the reverse mode calculates it afterwards [DLS24]


A. Forward Mode

The forward mode of automatic differentiation is particularly suitable for optimization problems with only a few optimized parameters. The gradient is calculated and passed on together with the actual function value. This method is efficient if the number of input variables is small, because the number of calculation steps required remains proportional to the number of parameters to be differentiated.
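The forward mode can be illustrated with dual numbers, which carry a value together with its derivative through every operation; the derivative of the chosen input is seeded with 1. The following is a minimal, self-contained sketch for intuition, not the implementation used in the engines:

```python
# Minimal forward-mode AD via dual numbers (illustrative sketch only).
class Dual:
    def __init__(self, value, deriv=0.0):
        self.value = value   # primal value
        self.deriv = deriv   # derivative w.r.t. the chosen input

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

# Differentiate f(x) = x*x + x at x = 2; f'(x) = 2x + 1
x = Dual(2.0, 1.0)   # seed the input derivative with 1
y = x * x + x
print(y.value, y.deriv)  # 6.0 5.0
```

Because value and derivative travel together, one forward pass yields the gradient with respect to one input, which is why the cost grows with the number of input parameters.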

B. Reverse Mode

If many different parameters are to be optimized or the parameters are initialized at different times, the reverse mode of automatic differentiation is particularly suitable. In this method, a directed graph is constructed during the calculation, in which each edge represents the partial derivative of the corresponding operation.

Instead of determining the gradients directly during the forward calculation, the reverse mode saves all intermediate steps of the derivative calculation in this calculation graph. The gradient is then determined by backpropagation by applying the chain rule of differentiation from the end nodes of the graph back to the input values.

The method is particularly efficient when the number of parameters to be optimized is large, since the computational effort remains independent of the number of input variables and instead depends on the number of output variables. Therefore, the reverse mode is often used in machine learning, especially in the training of neural networks, where numerous weights and parameters have to be optimized simultaneously.
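The reverse pass described above, ordering the nodes of the graph and then applying the chain rule from the output back to the inputs, can be sketched in a few lines. This toy version is purely illustrative and not our C++ library:

```python
# Toy reverse-mode AD: each node stores its parents and local partials.
class Var:
    def __init__(self, value):
        self.value = value
        self.grad = 0.0
        self.parents = []  # list of (parent Var, local partial derivative)

    def __add__(self, other):
        out = Var(self.value + other.value)
        out.parents = [(self, 1.0), (other, 1.0)]
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value)
        out.parents = [(self, other.value), (other, self.value)]
        return out

def backward(out):
    # topologically order nodes so each node is processed only after
    # all of its children have contributed to its gradient
    order, seen = [], set()
    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)
    visit(out)
    out.grad = 1.0
    for node in reversed(order):
        for parent, partial in node.parents:
            parent.grad += node.grad * partial  # chain rule

x = Var(2.0); y = Var(3.0)
z = x * y + x          # z = xy + x
backward(z)
print(x.grad, y.grad)  # dz/dx = y + 1 = 4.0, dz/dy = x = 2.0
```

One backward pass yields the gradient with respect to all inputs at once, which is why the cost depends on the number of outputs rather than inputs.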

IV. Overview

Because several different software components are used in the pipeline, we give a broad overview of the general components in use and how they interact.


Fig. 2.  A typical run through the pipeline

The main point of interaction with the user is the VR frontend in Unreal Engine. Here, the user sets up the world state and defines the problem they want to solve. This setup is then processed into a custom format.

Then, either of the two physics simulators can interpret the problem and world state, apply optimization techniques to obtain a solution, and return the answers.

The answers are then displayed for the user in the VR frontend.

As can be seen in Figure 2, it is also possible to use the custom format and view the results directly as well, without using the Unreal Engine frontend.

V. Optimization

Here, we provide a comprehensive overview of the methods and approaches we use to optimize physical problems. In particular, we consider the determination of the gradients required for optimization by means of automatic differentiation.

A central aspect of our methodology is the calculation of errors in n-dimensional space. We analyze various metrics, such as the position error in meters, the velocity error in meters per second, and the rotation error, in order to enable a precise and differentiated evaluation of the optimization results.

In many physical optimization problems, feedback from the environment—such as information about collisions or obstacle positions—can be sparse or indirect. This may result in poorly connected computation graphs and unstable gradient propagation. To address this, we incorporate an additional penalty term into the loss function, referred to as Penalty, which is specifically designed to guide the gradient toward inducing collisions between targeted bodies. This not only improves the stability of the optimization process but also ensures that relevant physical interactions are more effectively captured. Additionally, we evaluate the performance of different optimization algorithms (optimizers) to investigate their influence on convergence speed and the overall effectiveness of the optimization.

Specifically, the optimizers Adam [KB14], BFGS [Bro70; Fle70; Gol70; Sha70], and Gradient Descent (GD) are considered. Of these, GD is the simplest, optimizing parameters by following the direction of the gradient with a fixed step size for each iteration. Adam refines this concept by keeping track of a moving average of past gradients and thereby accounting for a notion of momentum when choosing its step. BFGS additionally considers an approximation of the Hessian matrix of second-order partial derivatives of the parameters, updated after each iteration, when deciding a search direction for the optimization step. An appropriate step size is then chosen via a line search.
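The update rules of GD and Adam differ mainly in how they scale the raw gradient; a sketch for a scalar parameter, with Adam shown using its standard default hyperparameters (BFGS is omitted since it additionally requires a line search):

```python
import math

def gd_step(theta, grad, lr=0.1):
    # plain gradient descent: fixed step along the negative gradient
    return theta - lr * grad

def adam_step(theta, grad, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: bias-corrected moving averages of the gradient and its square
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad * grad
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps)

# Minimize f(theta) = (theta - 3)^2 with GD; gradient is 2*(theta - 3)
theta = 0.0
for _ in range(200):
    theta = gd_step(theta, 2.0 * (theta - 3.0))
print(round(theta, 6))  # 3.0
```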

In addition, we provide a brief overview of the epsilon-greedy exploration strategy, which is employed to escape local minima and ensure broader exploration of the solution space. Epsilon-greedy balances exploitation and exploration by selecting the best-known option most of the time while occasionally making random choices. This behavior is controlled by a parameter 𝜖 (epsilon), which represents the probability of choosing a random action. Over time, 𝜖 is typically decreased, reducing exploration in favor of exploitation as the optimization progresses. In our use case, the epsilon-greedy strategy is particularly useful when the optimization process becomes trapped in a local minimum, where the parameter vector 𝜃 has converged. This targeted random exploration helps prevent premature convergence to suboptimal solutions—an advantage that becomes especially important in high-dimensional or complex physical environments.
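In our setting, the epsilon-greedy strategy amounts to an occasional random restart of the parameter vector with a decaying probability. A simplified sketch with illustrative names:

```python
import random

def epsilon_greedy(best_theta, epsilon, sample_random):
    """With probability epsilon, restart from a random point in the
    search space (exploration); otherwise continue refining the best
    known parameters (exploitation)."""
    if random.random() < epsilon:
        return sample_random()
    return best_theta

# Decay epsilon over the run so late iterations mostly exploit.
epsilon, decay = 0.3, 0.97
for step in range(50):
    theta = epsilon_greedy(1.0, epsilon, lambda: random.uniform(-5.0, 5.0))
    # ... one optimization step on theta would go here ...
    epsilon *= decay
```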

A. Loss Calculation

To calculate the loss of a simulation run, we consider not only the end position but the best position reached during the run. We do this to avoid excessive error values and, consequently, excessively high gradients.

In our problem, we can optimize several physical quantities simultaneously: the position, the orientation, as well as the linear and angular velocity. However, the resulting loss terms are physically difficult to convert into a uniform scale. While the position difference is described by a Cartesian error, the velocity error results as a Cartesian error in relation to time, and for the orientation we use a geodesic error. Due to these different units of measurement, a direct comparison or a simple aggregation of the loss values is not trivial.

Nevertheless, the differentiable calculation allows us to combine all loss functions and optimize them together. One problem, however, is the unequal scaling of the various error terms, which can lead to individual loss components dominating. The geodesic error in particular is often difficult to set into a meaningful relationship with the Cartesian error, which makes balancing the terms within the optimization more difficult.

To address this problem, in the Uncertain Physics backend, we have additionally implemented the average distance of model points [Hin+13] as an alternative loss formula. To obtain this loss, the mesh of the optimized object is considered in its pose during the simulation compared to its intended target pose. For each vertex of the object mesh, the distance between its position in the actual and the target pose is calculated, and the average over all vertices is taken as the final loss.

This captures both the translational and the rotational components in terms of a Cartesian error, preventing either factor from dominating the loss term due to differences in scale. Using the Average Distance of Model Points furthermore means that greater importance is placed on correct rotation for objects with larger or less compact shapes.
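The average distance of model points can be computed directly from the two poses; a sketch assuming each pose is given as a rotation matrix plus a translation vector:

```python
import math

def transform(R, t, v):
    # apply pose (R, t) to vertex v
    return [sum(R[i][j] * v[j] for j in range(3)) + t[i] for i in range(3)]

def avg_model_point_distance(vertices, pose_actual, pose_target):
    """Average distance of model points [Hin+13]: mean distance between
    corresponding mesh vertices under the actual and the target pose."""
    Ra, ta = pose_actual
    Rb, tb = pose_target
    total = 0.0
    for v in vertices:
        total += math.dist(transform(Ra, ta, v), transform(Rb, tb, v))
    return total / len(vertices)

# Identical poses give zero loss; a pure translation gives the offset length.
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
verts = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
print(avg_model_point_distance(verts, (I, [0, 0, 0]), (I, [0, 0, 2])))  # 2.0
```

Because every vertex contributes a Cartesian distance, rotation errors are automatically weighted by how far they move the object's surface, which matches the behaviour described above.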

B. Penalty Calculation

Since the optimization takes place in an unknown environment and only limited feedback is available per simulation run, we use one or more penalty functions. These have two main purposes: On the one hand, they help to specifically establish certain contacts within the trajectory in order to generate gradient information, especially in cases where the calculation graph is not fully connected. On the other hand, they specifically promote collisions between certain objects in order to force desired interactions.

The use of these penalty functions leads to faster solution finding, as an unnecessary exploration of areas of the action space is avoided in which either no meaningful gradient exists or the optimization only leads to suboptimal results.
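A typical penalty of this kind is a differentiable attraction term that is positive whenever two target bodies are separated, so its gradient pulls the trajectory toward contact. A simplified sketch using spherical bounds; this is illustrative and not the exact term used in the engines:

```python
import math

def contact_penalty(center_a, radius_a, center_b, radius_b):
    """Quadratic penalty on the gap between two bounding spheres.
    It is zero once the bodies touch or overlap, so it only 'pulls'
    while they are apart and vanishes after contact is established."""
    gap = math.dist(center_a, center_b) - (radius_a + radius_b)
    return max(gap, 0.0) ** 2

print(contact_penalty([0, 0, 0], 1.0, [4, 0, 0], 1.0))    # gap 2 -> 4.0
print(contact_penalty([0, 0, 0], 1.0, [1.5, 0, 0], 1.0))  # overlap -> 0.0
```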

VI. Physics Simulators

Physics simulators play a crucial role in this project. They provide the environment for optimization. Two physics simulators were used in the development of this project. One is our in-house simulator called Uncertain Physics from the Computer Graphics and Virtual Reality (CGVR) group at the University of Bremen. The other is Warp from NVIDIA. In this chapter, we give an overview of the physics simulators we used for the development of this project. This chapter covers the basic idea, the collision model, and other specific information for each simulator.

A. Uncertain Physics / Colldet

Uncertain Physics [MWZ24] [WZ09] provides simple physics simulation on the CPU for rigid bodies in C++. Object shapes are approximated by filling them with spheres. These sphere packings are generated in advance by an additional program with GPU support, although a less efficient method for generating sphere packings at runtime on the CPU is also available for prototyping. Collisions are detected at the level of the individual spheres comprising the colliding objects and resolved by applying appropriate forces to the parent objects. A particular drawback of this approach is the high number of spheres required for good approximations of flat rectangular objects. Both a penalty-based approach, considering the overlap of spheres, and an impulse-based approach, considering velocities directly, are available for resolving collisions. Differentiability of the simulation had to be implemented during the project.

B. Warp

NVIDIA Warp [Mac22] is a framework for writing simulation and graphics code. The framework is based on Python and uses a Python-to-C++/CUDA compilation model, which enables kernel computations on the GPU to be expressed as Python functions. Since the framework supports parallel computation, this aspect was used in this project for fast simulation. Warp provides a simple and efficient path to accelerated programs for AI, robotics, and machine learning simulation.


Fig. 3.  Overview of the kernel compilation pipeline in Warp [Cor25]


The framework uses Signed Distance Fields (SDFs) on the GPU as the collision model in the simulation. A fast and robust calculation of Signed Distance Fields on a CPU is often a performance bottleneck, or nearly impossible for large grids and complex mesh types, so performing the collision calculation on graphics hardware represents a trade-off between performance and collision quality. Computing collisions with the Signed Distance Field collision model on a GPU resulted in an average speed-up factor of 6 compared to a CPU [ED08].
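For intuition, a signed distance field maps a query point to its distance from a surface, negative inside and positive outside, so collision can be detected with a simple sign test. A minimal sphere SDF as a sketch (not Warp's internal representation):

```python
import math

def sdf_sphere(p, center, radius):
    # negative inside the sphere, zero on the surface, positive outside
    return math.dist(p, center) - radius

def in_collision(p, center, radius):
    # a point penetrates the shape exactly when its signed distance is negative
    return sdf_sphere(p, center, radius) < 0.0

print(sdf_sphere([0, 0, 3], [0, 0, 0], 1.0))     # 2.0 (outside)
print(in_collision([0.5, 0, 0], [0, 0, 0], 1.0))  # True (inside)
```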

Differentiable programming is built into the framework. From a kernel's forward logic, Warp can construct the gradient and pass the result back to the framework without any further implementation. For this to work in this project, the forward calculation had to be written inside kernels; otherwise, the backward pass would not capture the computation graph, and the gradient would not be constructed.

Warp provides a built-in type for managing mesh data that supports mesh-related queries, such as overlap checks, closest-point, and ray cast. Since the framework enables mesh data management, Warp is able to construct meshes from various file formats, such as .obj files. In addition, several primitive meshes, i.e. cube, sphere, and cylinder, are initially implemented in Warp for quick implementation and testing.

Warp contains many implementations related to the capabilities of the framework, i.e. parallel computations for simulation and optimization problems. Various examples, such as simulation of fabrics, optimization of soft bodies, and simulation of robot joints, have been provided. These resources provide a starting point for developers interested in developing with Warp.

To summarize, NVIDIA Warp is a Python-based simulation framework that enables the use of GPU kernels. The framework uses signed distance fields on a GPU as a collision model, which enables fast collision calculation in the simulation. Differentiable simulation is implemented in the framework, and everything related to it is calculated within the GPU kernels. Warp provides mesh data management that allows the import of meshes from external files. It also provides examples for those interested in learning it. In this project, Warp was used to simulate the scene so that the simulation and optimization could be performed in parallel.

VII. Unreal Engine

UE programming is done in C++ and includes classes, functions, interfaces, and variables. The Unreal Engine Reflection System encapsulates classes with a variety of macros that enable engine and editor functionality. The UClass macro is used to tag classes that are derived from UObject, the base class for objects in UE. The UFunction and UProperty macros can support specifiers in indicating how they should be used by UE while also providing optional access to them by Unreal Engine’s Blueprints. The Unreal Smart Pointer Library is a custom implementation of C++11 smart pointers that contains various smart pointer types that, depending on the type, prevent memory leaks, break reference cycles and dangling pointers, include optional thread safety code, and ensure runtime safety.

On top of the C++ scripts, UE provides the blueprint. This Visual Scripting system in Unreal Editor offers a node-based interface for visually creating blueprints. Event graphs are node graphs that use events as entry points and then connect function calls, variables with values or references to an object or actor, and flow control nodes like Branch, FlipFlop, or Switch Nodes to perform custom gameplay actions [Inc25a].


Fig. 4.  Overview of OpenXR’s cross-platform pipeline [Inc25b]


The OpenXR runtime plugin in Unreal Engine implements the OpenXR APIs developed by the Khronos group. The OpenXR APIs, as seen in Figure 4, allow a solution to be easily ported and optimized to different platforms by standardizing common operations, such as accessing controller/peripheral states and obtaining tracking positions. It also controls platform operation and application lifecycle. These platforms include Oculus, Windows Mixed Reality, and SteamVR, each with its own set of plugins for managing platforms [Inc25b].

The SteamVR platform, Valve’s virtual reality headset system, is supported by Unreal Engine through the OpenXR runtime plugin. Like OpenXR, the SteamVR plugin works with various headsets, including Vive, Oculus, and Windows Mixed Reality. Naturally, OpenXR plugin applications work on devices that support OpenXR APIs, while SteamVR plugin applications work on devices that support SteamVR [Inc25a].

VIII. System Implementation

The goal of our implementation is to provide a robust VR application to interact with physical reasoning systems. To achieve this, our implementation leverages several applications, including the different modified physics simulators, Unreal Engine 5, and additional software such as Protosphere and a diverse set of libraries.

The high-level pipeline consists of the VR application (client) and the chosen physics backend (server). The staged scene is then encoded from VR in our custom scene format to be interpreted by the physics simulators, which simulate the scene and reply with their results.

In this section the different components and their implementation are explored in more detail.

A. Reverse Mode

We developed a custom reverse-mode automatic differentiation library for our specific use case of physics simulation, capable of handling operations involving Scalars, Vectors, Matrices, and Quaternions.

The library is built around a base class, from which all other classes inherit. This base class maintains references to a node’s parent nodes and tracks whether the node is part of an observed branch of the computational tree. During the forward pass, each operation not only computes the result but also stores the derivative of that operation. When one of the inputs to an operation is observed, the parent nodes are recorded in the child node. Since the nodes are implemented as shared pointers, any unused (unobserved) parent nodes are automatically deleted when no other observed objects reference them.

After completing the forward pass, we initiate the reverse pass to propagate the gradients. Before applying the chain rule across the tree, we first order the nodes to ensure that we apply the partial derivatives only to parent nodes for which all child nodes have been processed.

Given the varying dimensions of Scalars, Vectors, Matrices, and Quaternions, we ensure that the shape of the partial derivatives aligns with the shape of the corresponding values (for instance, a Vector3 has a Vector with size 3 as its gradient). Additionally, since we represent Quaternions internally as rotation matrices, the gradients of Quaternions are represented as 3x3 matrices, which are converted back to Quaternions at the end of the process.

B. Colldet / Uncertain Physics

Initially, Uncertain Physics had to be modified to support automatic differentiation. This was achieved by replacing the basic numerical data types used in the simulation with data types that support automatic differentiation through overloaded operators.

To allow flexible usage of different implementations of differentiable data types as well as usage without any overhead for differentiation where no gradient is required, the C++ template mechanism was used, similar to the Tiny Differentiable Simulator [Hei+21], to support various data types. In particular, the forward mode implementation of the Eigen Library, our custom reverse mode implementation, and data types without differentiation previously used by Uncertain Physics were considered during the project.

The information needed for an optimization run is read from an XML file. The XML file lists all objects and object poses in the scene and indicates the locations of their corresponding sphere packing files. This information is used to register these objects with Uncertain Physics for the simulation.

Also from the XML file, options for the optimization process are set. This includes specifying the object whose trajectory is to be optimized, the target pose as well as the number of iterations, the optimizer, and the loss function to use.

The loss value can either be calculated once for the final state of the simulation or once for every step, considering the state with the best value as the final loss. Additionally, the pose error can be calculated either by combining separate loss terms for position and orientation or by the average distance of model points. Because it is more expensive to calculate for complex meshes, the latter option is only practical for optimizing the final state.

During the optimization process for each step, the simulation is run for the specified time, either with forward or reverse mode differentiable data types to receive the gradient of the loss. Using this gradient, the input parameters are optimized using either Adam, BFGS, or gradient descent before resetting the simulation instance for the next step. An epsilon-greedy exploration strategy is applied to consider new initial guesses for the optimized parameters when there is little improvement over several steps, indicating that the optimization is approaching a local minimum.
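The per-step loop described above can be summarized as follows; the function names are illustrative placeholders (the actual backend is C++), with `simulate` standing in for the differentiable simulation run:

```python
def optimize(theta, steps, simulate, update, patience=10, epsilon_restart=None):
    """Sketch of the optimization loop: simulate(theta) -> (loss, grad)
    runs the differentiable simulation, update(theta, grad) applies one
    Adam/BFGS/GD step, and epsilon_restart() proposes a fresh initial
    guess when progress stalls near a local minimum."""
    best_theta, best_loss, stalled = theta, float("inf"), 0
    for _ in range(steps):
        loss, grad = simulate(theta)
        if loss < best_loss:
            best_theta, best_loss, stalled = theta, loss, 0
        else:
            stalled += 1
        if epsilon_restart is not None and stalled >= patience:
            theta, stalled = epsilon_restart(), 0  # escape the local minimum
        else:
            theta = update(theta, grad)
    return best_theta, best_loss

# Toy check: a quadratic 'simulation' with a gradient-descent update.
sim = lambda th: ((th - 3.0) ** 2, 2.0 * (th - 3.0))
best, loss = optimize(0.0, 100, sim, lambda th, g: th - 0.1 * g)
print(round(best, 3))  # 3.0
```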

In addition to optimizing for a single trajectory to a given target pose, a strategy to explore a given starting area was implemented to estimate the probabilities of starting positions in that area, given the end pose. These probabilities can then be visualized as a heatmap. The approach is sampling-based, dividing the starting area into a number of grid cells and beginning several optimization runs with different initial guesses for the velocity. If several optimization runs in one cell result in similar solutions for the initial velocities, only one is kept to avoid counting the same solutions several times. It is assumed that there is some simulation error, and therefore, solutions with loss values greater than zero could still represent an accurate solution in the real world. Therefore, solutions are weighted by the likelihood of their loss value being emitted by a normal distribution around zero. To further refine the weighting of solutions, slight variations of the initial velocity are explored for each solution to determine a radius within which it still results in low loss values.
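The weighting of surviving solutions can be sketched as evaluating each solution's loss under a zero-mean normal distribution modelling the simulation error; the unnormalized likelihood suffices since only relative weights matter, and the standard deviation below is an illustrative tuning assumption:

```python
import math

def solution_weight(loss, sigma=0.05):
    """Weight of a candidate solution: unnormalized likelihood of its
    loss under a normal distribution centred at zero, which models the
    simulation error. A zero-loss solution receives the maximum weight."""
    return math.exp(-0.5 * (loss / sigma) ** 2)

weights = [solution_weight(l) for l in (0.0, 0.05, 0.2)]
print([round(w, 3) for w in weights])  # [1.0, 0.607, 0.0]
```

Solutions whose loss lies within about one standard deviation keep substantial weight, while clearly inconsistent solutions contribute almost nothing to the heatmap.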

C. Warp

Several components were implemented in NVIDIA Warp for this project. In this section, we present these implementations and explain the process from receiving a message from the Unreal Engine to sending the trajectories back to the Unreal Engine for visualization.

First, the scene had to be imported from the Unreal Engine into Warp via the WebSocket. The message from the WebSocket contained the path to the XML file containing the scene description. The custom XML script loaded the objects of the scene and the simulation objective into the Warp model. This step also assigned the initial attributes besides position and rotation for each object, i.e. the initial velocity and whether it should be fixed in the scene.

Warp then created the model from the scene. The XPBD integrator [MMC16] was used for the simulation. The loss was calculated in the forward computation, and after the forward computation was performed, the backward graph was captured. The forward computation ran in GPU kernels, and the best loss across all steps was tracked. Since almost every calculation in Warp needs to be done in kernels, retrieving the best loss during graph generation required ensuring that the loss value was copied from the kernel back to the CPU before the best loss was determined. For this reason, we calculated the loss for each step in a kernel, determined the index of the best loss in a kernel, waited until that index was transferred back to the CPU, and finally accessed the best loss via the retrieved index. This was necessary because Warp does not ensure synchronization between the CPU and the GPU during graph construction in the forward computation.

Then, the optimization began. Three optimization methods were implemented. These methods are gradient descent, Broyden–Fletcher–Goldfarb–Shanno (BFGS) [Bro70; Fle70; Gol70; Sha70], and Adam[KB14]. In addition to these optimization methods, the epsilon-greedy method was also implemented in Warp. During this optimization step, the trajectories of every step were constructed. In addition, the results from each optimization step, i.e. loss, velocity, and simulation duration, were also saved for later analysis.

After completing the optimization steps, the best trajectory with the lowest loss and the results, which were constructed during the optimization, were saved to a CSV file, and the best trajectory and the heatmap for the best starting position were sent back to the Unreal Engine via the WebSocket for visualization.

In short, to complete the simulation process in Warp, the scene and target were first imported into Warp, followed by the loss calculation and optimization. Once the optimization was complete, the best trajectory was sent back to Unreal in a CSV format for visualization.

D. Communication between Frontend and Backend

For communication, we use a modular approach based on a custom data format and WebSockets.

We need to transfer an optimization problem from the front-end, from here on called a “scene”, to the back-end. To improve the modularity of our process, we encode every scene in a custom XML format. This way, the origin of a scene and the physics backend can be chosen independently, and the encoding remains human-readable.

The XML stores information about the scene, such as the position, orientation, and velocity of every object, as well as parameters for the optimization, including the goal pose. Optional parameters include a custom loss function and penalty terms. Since the XML format is plain text, proprietary software is not required to edit a scene, although it is recommended to use a front-end for efficiency.

Every object in the XML also provides a filename or, optionally, a file path. These indicate where the meshes (.obj files) and sphere packings (.spheres files) are located. Their content is not embedded inside the XML, as different physics back-ends require different files. Additionally, this enables the meshes and sphere packings to be reused across multiple XML files, saving disk space.
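A minimal scene file in this spirit might look as follows. The element and attribute names here are illustrative placeholders, not the project's exact schema (Fig. 5 shows the actual format):

```xml
<scene>
  <object name="mug" file="mug.obj" spheres="mug.spheres"
          movable="true" tag="source"
          position="0.0 0.2 1.0" orientation="0 0 0 1" velocity="0 0 0"/>
  <object name="table" file="table.obj" movable="false"/>
  <optimization iterations="1000" optimizer="BFGS" heatmap="true">
    <goal position="0.5 0.2 1.0" orientation="0 0 0 1"/>
  </optimization>
</scene>
```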

To export a scene from the Unreal Engine application to our XML format, several processing steps are required.

First, the scene is analyzed to ensure that all necessary components are present. This includes a “source” object, for which we want to determine the most probable trajectory. A “target” object is also needed, as it defines the goal pose we aim to achieve.

Then, all objects are exported based on their tags, which are applied using Unreal Engine’s tag system. The tags convey crucial information about each object in a way that can be easily changed. We use tags as an object-based source of information rather than an external list; this is especially helpful because it reflects changes made in the editor as well as at runtime. The tags can be hard-coded in the Editor, selected in VR, or determined heuristically. For automatic detection, a basic Euclidean distance check is used: objects sufficiently close to the area around the target and source are selected. Additionally, an object can be marked as “movable”, meaning it will not be fixed in the physics simulation and can be seen later in the trajectory replay. Each object also receives a tag defining its required level of geometrical detail.

By default, every selected object receives the lowest level of detail (LOD-5), which is the collision shape from Unreal Engine. All movable objects and those close enough to the important area are tagged as high-detail. However, a visibility check from the source object to each high-detail object is performed, converting occluded objects to medium-detail. This reduces the overall number of triangles by removing potentially redundant geometry. All LODs are precalculated by Unreal Engine using predetermined settings to maintain a comparable polygon count across LODs. The highest level of detail (LOD-0) is never used because, based on our observations, it does not improve results but dramatically slows down the physics simulation on a mesh-based backend such as Warp.
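The LOD heuristic above can be summarized in a few lines. This is an illustrative sketch, with simple dictionaries and an `occluded` flag standing in for Unreal Engine's actors and visibility check; the names are not the project's actual API.

```python
import math

def assign_lod(objects, source, target, radius):
    """Assign a level-of-detail tag per object (illustrative sketch).

    objects: dicts with 'position' (x, y, z), 'movable' (bool) and an
    optional 'occluded' flag standing in for the visibility check from
    the source object.
    """
    for obj in objects:
        near = (math.dist(obj["position"], source) < radius or
                math.dist(obj["position"], target) < radius)
        if obj["movable"] or near:
            obj["lod"] = "high"
            if obj.get("occluded"):       # visibility check failed
                obj["lod"] = "medium"     # downgrade occluded geometry
        else:
            obj["lod"] = "low"            # LOD-5: collision shape only
    return objects
```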

All objects are then added to the XML and saved to the project folder. The meshes are saved in .obj format by iterating over all vertices and faces of the object; for objects with LOD-5, the mesh first needs to be built from the collision shape. Each mesh file receives a name prefix that estimates how many spheres will be needed for the sphere packing. This estimate is computed by a cubic function of the object's volume in the world. Additionally, the LOD index is added to the name.
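The sphere-count prefix could be computed as follows. The cubic's coefficients below are placeholders for illustration only; the project's actual polynomial is not reproduced here.

```python
def sphere_count_prefix(volume, coeffs=(400.0, 0.0, 0.0, 50.0)):
    """Estimate the number of spheres for a packing from an object's
    world volume via a cubic polynomial, as used for the file-name
    prefix. The coefficients are illustrative placeholders.
    """
    a, b, c, d = coeffs
    n = a * volume ** 3 + b * volume ** 2 + c * volume + d
    return max(1, int(round(n)))
```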

To use the CollDet-based backend, we need to provide sphere packings. These can be created either on the CPU as an approximation at runtime or through preprocessing with Proto Sphere. For this, we created a script that uses the information from the file prefix to create each sphere packing in the project folder.


PIC Fig. 5.  Example representation of the scene XML


For communication between the front-end and back-end applications, we utilize WebSockets. Every backend has its own port number to which the Unreal Engine can connect. After exporting the scene, the backend receives the file path to the scene XML. This interface was chosen to also support manually loading an XML scene with each backend without needing to run Unreal Engine.

After the simulation, the backend returns its solution, i.e. the trajectories for all movable objects in the scene or the probability heatmap. These can then be visualized in the Unreal Engine front-end.
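Because the same connection carries different kinds of payloads (scene paths, trajectory CSVs, heatmap images), the receiving end must tell them apart. A hypothetical content-sniffing dispatcher is sketched below; the project's actual framing convention is not specified here, so this is only one plausible approach.

```python
def classify_message(payload: str) -> str:
    """Classify an incoming WebSocket payload (illustrative sketch).

    Distinguishes a PPM P3 heatmap, a CSV trajectory log, and a plain
    file path to a scene XML by inspecting the content.
    """
    text = payload.lstrip()
    if text.startswith("P3"):
        return "heatmap"        # PPM P3 images begin with the magic "P3"
    first_line = text.splitlines()[0] if text else ""
    if "," in first_line:
        return "trajectory"     # CSV rows of logged poses
    return "scene_path"         # bare file path to a scene XML
```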

E. Trajectories and Heatmaps

To visualize the results of the simulation, heatmaps and trajectories are used. These were chosen to have an easily understandable representation that is also suited for VR use.

While both backends simulate hundreds of iterations for each request, we only select the iteration with the lowest loss for visualization. Showing the n-best trajectories instead proved to produce too much clutter for VR usage.

A trajectory is recorded by logging all objects’ positions and rotations in combination with their object ID for every time step in the simulation. To save performance, this is not done for every iteration. When the best iteration is known, it gets simulated again with trajectory logging enabled. The resulting log is in CSV format and can be passed over the WebSocket.
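The logging step can be sketched as a small CSV writer. The column names are illustrative, not the project's exact schema; the essential content per row is the time step, the object ID, and the object's pose.

```python
import csv
import io

def log_trajectory(frames):
    """Serialize a trajectory log to CSV (illustrative column names).

    frames: iterable of (step, object_id, position, rotation) tuples,
    with position as (x, y, z) and rotation as a quaternion (x, y, z, w).
    Returns the CSV text that would be passed over the WebSocket.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["step", "object_id",
                     "px", "py", "pz", "rx", "ry", "rz", "rw"])
    for step, obj_id, pos, rot in frames:
        writer.writerow([step, obj_id, *pos, *rot])
    return buf.getvalue()
```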

The Unreal Engine backend then detects it as a trajectory and constructs a spline-mesh from the CSV data. To show exactly what happened in the iteration, the source object is then animated along the trajectory. The animation interpolates between all time steps in the CSV for all objects that were logged. This means that not only is the main object moving, but also the physics objects that it impacted. The speed of this animation depends on which backend was chosen, as Warp and Uncertain Physics use differently sized time steps for their simulation, and the resulting animation should play in real time.


PIC Fig. 6.  The trajectory with the best loss, visualized


Not only do we want to find a probable trajectory, but also where the main object was positioned before being moved. For many of the tested problems, there are several viable solutions. To better visualize this, we created a heatmap based on the loss of all iterations. The heatmap shows the average loss of each position in an area. Through this, probable positions can easily be understood by the user, with larger areas of the same color showing more robustness in the solution.
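A simplified version of this heatmap construction is to bin all tried starting positions into a grid and average the losses per cell. The sketch below illustrates that idea only; as described next, the two backends actually use different approaches (interpolation vs. a multithreaded grid search with statistics).

```python
import numpy as np

def loss_heatmap(positions, losses, extent, resolution):
    """Average the loss of all tried starting positions into a 2D grid
    (simplified illustration of the heatmap idea).

    positions:  (N, 2) array of x/y starting positions
    losses:     (N,) array of corresponding loss values
    extent:     (xmin, xmax, ymin, ymax) of the heatmap area
    resolution: number of cells per axis
    """
    xmin, xmax, ymin, ymax = extent
    grid_sum = np.zeros((resolution, resolution))
    grid_cnt = np.zeros((resolution, resolution))
    xi = np.clip(((positions[:, 0] - xmin) / (xmax - xmin)
                  * resolution).astype(int), 0, resolution - 1)
    yi = np.clip(((positions[:, 1] - ymin) / (ymax - ymin)
                  * resolution).astype(int), 0, resolution - 1)
    np.add.at(grid_sum, (yi, xi), losses)  # unbuffered scatter-add
    np.add.at(grid_cnt, (yi, xi), 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return grid_sum / grid_cnt         # NaN where no sample fell
```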

The heatmaps from both backends differ in their representation. The heatmap generation in Warp runs all iterations normally and interpolates the results. The heatmap generation in Uncertain Physics uses a multithreaded grid-search approach: the heatmap area is subdivided into regions of a certain size, for which different solutions are tried in parallel, and the resulting heatmap is constructed using statistical methods instead of Warp's direct interpolation approach.

A heatmap is generated by the backend that performs the simulation. It is then encoded in a PPM P3 format, which can be sent over a WebSocket. When receiving the message, the Unreal Engine backend detects it as a heatmap. Because the heatmap is created at runtime, it is not handled like a usual texture.
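PPM P3 is a plain-text image format: a `P3` magic number, the image dimensions, the maximum channel value, and then the RGB triplets. Encoding a heatmap this way is straightforward, which is what makes it convenient to ship over a WebSocket:

```python
def encode_ppm_p3(pixels):
    """Encode an H x W grid of (r, g, b) tuples as a plain-text PPM (P3)
    string, the format used to send heatmaps over the WebSocket.
    """
    height = len(pixels)
    width = len(pixels[0])
    lines = ["P3", f"{width} {height}", "255"]  # magic, size, max value
    for row in pixels:
        lines.append(" ".join(f"{r} {g} {b}" for r, g, b in row))
    return "\n".join(lines) + "\n"
```

On the Unreal side, the inverse of this parsing yields the pixel array used for the dynamic decal material.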

The heatmap uses a flat decal that is snapped onto the nearest surface below it. This approach avoids accidentally painting on objects that move above it as well. The heatmap gets parsed into an array of pixel values, which are then used to create a new dynamic material instance for the decal.


PIC Fig. 7.  A heatmap showing the most probable starting position based on lowest loss (blue)


F. User Experience

The previous parts are integrated through VR, focusing on a smooth and straightforward user experience. We approach this by allowing users to interact in three ways: with grabbable actors, with notification UI, and with XML settings UI.


PIC Fig. 8.  Overview of the available action mapping (shown in green, blue, and red) with their implemented actions and assigned buttons (shown in purple and yellow)

Given the limited number of inputs on the HTC Vive controllers, we assigned multiple actions to each button. To determine which action a button triggers, we implemented a boolean flag system that works similarly to mapping contexts, in addition to default controls that remain active, as shown in Figure 8. With the default controls, the user can teleport, turn, grab grabbable actors, switch the mapping context, and toggle an interactive menu for grabbable actors. With the VR Pointer context, the user can toggle the VR pointer, grab grabbable actors at a distance, change the location-modification scale, and reposition a grabbed actor at a distance. In the WebSocket Communication context, the user can toggle the connection to the server, send data to the server, and switch the backend server.

Actors are entities instantiated within a scene that possess data regarding their transformation, including position, rotation, and scale. Grabbable actors are defined as those that include a grab component, enabling users to grab them, modify their position and rotation, and subsequently release them.

The user can interact with a grabbable actor in three ways: transforming its position and rotation, viewing information about its name, position, rotation, and associated tag (none, source, or target), and assigning the desired target tag, which is important for the WebSocket communication with the backend, as previously mentioned.

For this purpose, we developed the VR Grab at a Distance tool and the interactive menu.

VR Grab at a Distance In addition to the existing VR grabbing functionality in Unreal’s VR projects, we wanted a tool that lets users easily place objects in hard-to-reach areas, such as the top of a kitchen cupboard. Consequently, we developed the VR Grab at a Distance tool. It starts with the VR pointer beam, built using the Niagara system’s dynamic beam, for which we modify each particle’s spawn lifetime, spawn count, and emitter state loop duration. We additionally adjust the beam’s width and use a variable that dynamically controls its length to ensure that it projects as a straight beam.

A flip-flop switch spawns and despawns the VR pointer beam from the motion controller’s left-hand aim point. We use a ray cast to obtain the first hit result that contains a grabbable actor and store it as a reference. From the location of the targeted grabbable actor, we set the length of the VR pointer beam to either a predetermined maximum length or the distance from the beam’s spawn point to the pointed grabbable actor.

The pointed grabbable actor can be grabbed and ungrabbed at a distance via the VR pointer beam. When grabbed, the actor is aligned with the end of the VR pointer beam, and the distance at which the actor is positioned can be manually adjusted along the trajectory from the spawn point to the maximum length of the VR pointer beam. The scale of this distance modification can be altered, allowing for both minute adjustments for precision and substantial adjustments to speed up the placement of the grabbed actor.

Interactive menu The interactive menu enables the user to view information about the pointed grabbable actor and to assign the Target tag. The menu is created via Unreal’s Widget Blueprint. Using the widget blueprint editor, we developed user interface elements to display text and the target assignment button. The text display references the pointed grabbable actor, which is passed to the menu every half second and verified as a valid input (excluding the selector class). If valid, we extract the actor’s name, position, rotation, and tag, storing them as variables within the menu’s widget blueprint and passing them to a format text node that produces the displayed text. The interactive menu is toggled using the default controls and displayed above the user’s right hand in VR.

Custom events are nodes created by developers that perform a series of actions in response to events occurring within the scene. We needed them to notify the user about events caused by their own actions, such as modifying the location-modification scale, or about responses received over the WebSocket communication pipeline.

We implemented a notification system using Unreal’s widget blueprints. With the widget blueprint editor, we created UI elements with custom animations to display these notifications as pop-ups in the screen’s top left corner. Each animation is called by a custom event in the widget blueprint graph. These custom events are triggered by a Switch on String node that checks every half second whether an animation flag has been set. The setter and getter functionality for the animation flags is located in the WebSocketActor class, where the majority of the animation flags are set.

As previously mentioned, the XML file defines the optimization process parameters. As shown in Figure 9, we aimed to allow users to provide specific parameters within the XML format. The user can specify the number of iterations to be simulated by the backend, the optimizer used, whether the heatmap is requested, and the specification of the heatmap’s decal dimensions.


PIC Fig. 9.  The XML settings screen with options for the optimization process


The XML settings were implemented using Unreal’s Widget Blueprint. The widget blueprint editor was used to develop UI texts with an interactive checkbox and sliders, which trigger an event in the widget blueprint graph when a value is modified. The triggered events retrieve the parameters from the XML using the GameInstance; these parameters are then modified and updated via the XML settings. When the optimizer is changed, a Switch on Int node adjusts the iteration slider settings, including the slider’s minimum and maximum values and its step size. The readjustment of the slider depends on the optimizer and the physics simulator used.


 Video 2: Interaction within the VR environment

IX. Results

A. Colldet / Uncertain Physics


PIC Fig. 10.  Comparison of loss and duration of iterations with Adam, BFGS, and GD in Uncertain Physics. Additionally, durations for Adam with reverse mode differentiation are shown for comparison.



PIC Fig. 11.  Comparison of development of iteration durations during optimization runs for different optimizers for Uncertain Physics.

In terms of average runtime per iteration, as seen in Figure 10, all three optimizers produced similar results, with BFGS iterations taking slightly longer. Since the computational complexity of the optimization step should be negligible compared to the simulation run, the differences between optimizers and the variance between iterations can be explained by the exploration of different situations in the simulations, where complexity chiefly stems from collisions. Specifically, this effect is demonstrated by the high duration of the initial iteration for all three optimizers, as seen in Figure 11, where the object begins directly on top of a surface with no initial velocity to quickly resolve the collision. Compared to forward mode automatic differentiation, iterations using the reverse mode take significantly longer on average. Given the relatively small number of optimized parameters, this result is not unexpected, as the reverse mode implementation requires additional memory management for each operation.


PIC Fig. 12.  Comparison of development of loss values during optimization runs for different optimizers for Uncertain Physics.

After 1000 iterations, the lowest observed loss values differ markedly between optimizers. Both BFGS and Adam find relatively close solutions, with BFGS reaching the lowest loss, while gradient descent fails to find a satisfactory solution. BFGS also finds the best loss values after 100 and 500 iterations. Looking at the progress of loss values over the iterations, for all optimizers, the common pattern of optimization is hitting a plateau, followed by a sharp increase in loss value due to re-initializing with a different initial guess according to the employed epsilon-greedy exploration strategy. For both Adam and gradient descent, outliers and stretches with oscillating loss values emerge, which occur where the step size of the optimizer is too large to properly follow the gradient. In contrast, the optimization with BFGS is much smoother because a line search is employed to find the proper step size. This also allows BFGS to take very large steps if the gradient does not change rapidly.

Overall, the best results are achieved with the BFGS optimizer. While its average time per iteration is slightly higher because the optimization stays close to the initial state, with many collisions and small velocities, it also yields good results early in the optimization run. Plain gradient descent seems unsuitable for this optimization, as the fixed step size cannot effectively deal with changing magnitudes of the gradient. Reverse mode automatic differentiation is likewise unsuitable for this small number of optimized parameters.
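The cost asymmetry between the two AD modes can be made concrete with a minimal dual-number implementation of forward mode. This is an illustrative sketch, not the project's actual AD implementation: each value carries its derivative with respect to one chosen input, so no tape of operations needs to be stored, which is precisely the per-operation memory management that makes reverse mode slower here. Conversely, forward mode needs one such sweep per input parameter, so it only pays off when the number of optimized parameters is small, as in our setting.

```python
class Dual:
    """Forward-mode AD via dual numbers: (value, derivative) pairs."""

    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


# d/dx of f(x) = x*x + 3*x at x = 2 is 2*2 + 3 = 7
x = Dual(2.0, 1.0)   # seed the input's derivative with 1
y = x * x + 3 * x
```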

B. Warp


PIC Fig. 13.  Comparison of the loss and the duration of all episodes with the optimizers Adam, BFGS, and GD with different numbers of total iterations


Figure 13 shows the distributions of the runtime of individual iterations as well as the observed loss values of the optimizers Adam, BFGS, and gradient descent (GD) for 100, 500, and 1000 iterations, respectively. Both diagrams use logarithmic scaling in order to better visualize the sometimes large differences between the values.

In terms of runtime, it can be seen that the Adam optimizer requires the most time on average per iteration. The GD optimizer is the fastest, especially with a low number of iterations. However, as the number of iterations increases, its average runtime increases and eventually exceeds that of BFGS. The BFGS optimizer remains comparatively constant in runtime across all iteration numbers, while Adam is consistently slower but shows no significant increase with the number of iterations. The high scatter (including outliers) for Adam is also striking, which indicates a certain instability in the execution time.

Figure 14 shows the duration of each iteration over a 1000-iteration run. Noticeably, the first few iterations take substantially longer, which is likely due to initialization and memory allocation on the GPU.


PIC Fig. 14.  Comparison of the duration of single iterations over a 1000-iteration run

The loss values after a complete run differ, sometimes significantly, depending on the optimizer and the number of iterations. After only 100 iterations, Adam and GD did not achieve convincing results: the loss values lie in a relatively narrow range, which indicates that the optimization is not yet sufficiently advanced. BFGS, on the other hand, already shows a high variance of loss values at 100 iterations, which could be due to its sensitivity to the selected step size. Nevertheless, BFGS also achieves the best (lowest) loss value in this group.

After 500 iterations, the variance of the loss values is similar for all optimizers. However, the Adam optimizer stands out positively here due to a larger number of lower loss values. At 1000 iterations, Adam achieves the best results overall. This is probably due to the continuous adaptation of the learning rate in combination with the ability to deal efficiently with different initial values.

Figure 15 shows the loss per iteration for the optimizers Adam, BFGS, and GD. These plots reveal differences in the course of the optimization. Due to the epsilon-greedy exploration strategy and interaction with the environment, discontinuities are visible as hard edges throughout the iterations.

In the Adam plot, the optimization shows clear downward trends, but oscillation is also visible at some points during the run. This is probably an inertia effect: the optimizer's momentum causes an overshoot, so subsequent steps are taken against a gradient that now points in the opposite direction.

In the BFGS plot, some spikes are visible during the run. These stem from the algorithm's line search exploring different step sizes.

In the GD plot, the loss decreases in a stair-like pattern. Because the algorithm has no momentum, no compensation is applied between steps.


PIC Fig. 15.  Comparison of the loss of single iterations over a 1000-iteration run

Adam proves to be the most effective optimizer in terms of loss minimization for longer optimization runs (500 iterations or more), although it has the longest runtime per iteration. GD stands out for its speed in short runs but scales less well with an increasing number of iterations. BFGS delivers robust results with a stable runtime, but sometimes shows strongly fluctuating loss quality, especially with a low number of iterations.

C. Comparison


TABLE I
Overview of loss values and average iteration duration after 1000 iterations for Warp and Uncertain Physics (forward mode) with different optimizers. Loss values are not fully comparable due to different scaling factors.

Simulator            |  Uncertain Physics   |         Warp
Optimizer            | BFGS | Adam |   GD   | BFGS | Adam |   GD
Best Loss            | 0.95 | 2.46 | 10.80  | 5.99 | 4.14 | 14.53
Average Duration (s) | 0.10 | 0.08 |  0.08  | 0.54 | 0.63 |  0.54

Table I shows the best loss values and the average duration of iterations for the implemented optimizers in Uncertain Physics and Warp. The loss values are not directly comparable between Uncertain Physics and Warp due to different scaling factors of object sizes and positions, as well as different handling of the orientation component.

Overall, optimization with either BFGS or Adam can achieve useful results both in Uncertain Physics and in Warp, while GD proves comparatively unsuitable with both simulators, likely due to similarly unstable gradients where collisions are involved.

For the considered optimization problem, Uncertain Physics needs significantly less time per iteration for all optimizers when using forward mode automatic differentiation. Warp, however, only supports reverse mode differentiation, where it has a clear advantage over the reverse mode implementation in Uncertain Physics.

X. Conclusion

In conclusion, we implemented a framework for differentiable physics simulation linked to a VR interface. For this, we investigated the theoretical background of automatic differentiation and created our own implementation of it. We compared our implementation with state-of-the-art physics simulators with respect to efficiency, accuracy, and flexibility. With this project, we created an interactive VR-based software that enables users to experiment with physical reasoning while also acting as a benchmark to compare different physics simulators and physical reasoning problems.

A. Limitations

The comparison of runtime for forward and reverse mode automatic differentiation is not comprehensive, as only one implementation of each approach was tested. Furthermore, collisions were not always properly resolved: objects passed through each other when timesteps were too large or the resulting forces were too small. To address this, proper parameters and scaling factors for the simulation had to be determined experimentally for the considered scenes.

During the development of this project, several limitations were noticed in the implementation of Warp.

One limitation was the very large VRAM requirement in this particular setting. Because the framework mainly used the GPU for automatic differentiation, and each sub-step of the operation had to be stored in GPU memory to construct the backward graph, a large amount of VRAM was required to simulate the scene. Since VRAM consumption scales with the number of sub-steps and the number of objects in the scene, we were not able to simulate large scenes containing many objects with a higher number of sub-steps on the hardware we used.

Another limitation is that most of the implementation was still sequential. Since kernels could only receive certain types of variables as parameters, parallelization, the main advantage of using GPU kernels, could not be exploited efficiently in many parts. As a result, execution of the simulation was largely sequential rather than parallel.

Finally, graph acquisition during the forward computation incurred additional time due to GPU and CPU synchronization. According to our observations, the framework had a synchronization problem during the forward computation; to construct the loss function correctly, we had to wait explicitly until the variables were synchronized, which added time overhead.

The utilized communication format is TCP over WebSocket. This has the benefit of being widely supported, but is limited in its functionality without adding additional complexity. This becomes evident when different kinds of messages are exchanged over the connection: the receiving end needs to know what kind of message it is and how to proceed with it. To implement this on a larger scale, certain communication rules need to be established to denote a message type and how to handle its contents. Additionally, WebSocket messages are limited in size to around 125 kB in our case, so transferring the file path is the only option for files exceeding this size. This could be mitigated by splitting the data into multiple messages, which again would require a shared set of communication rules.

Another limitation of Unreal Engine for our use case is that the levels of detail (LODs) do not satisfy our full requirements. The LODs give percentages and a limit of triangles for each level of detail. However, having a density of triangles would be a more fitting heuristic, which is not natively supported by Unreal Engine. Therefore, in the current implementation, objects with varying sizes might have too few or too many triangles after processing.

B. Future Work

To properly assess the suitability of forward and reverse mode automatic differentiation, running further experiments with other automatic differentiation libraries and scenes with more parameters needing to be optimized could be of interest. Since the high runtime for reverse-mode is related to memory management, experiments on different hardware might also yield different results. To ensure that the developed framework works consistently with different scenes, methods for dynamically choosing proper parameters for the simulation, like the stiffness or scaling of objects, should be investigated, either through machine learning or manually determining good heuristics for these values.

When implementing Warp for the optimization problem, there are a few aspects to consider.

Since Warp captures everything in VRAM by default, which leads to high VRAM consumption, and the devices used in this project had a limited amount of usable VRAM, we recommend running the simulation on hardware with more VRAM when testing more complex scenes.

In addition to testing on more powerful machines, another interesting direction would be to run everything on the CPU instead of the GPU. Since GPU parallelization was not used effectively in this project due to the limitations of gradient capture and the restrictive kernel parameter types, comparing CPU and GPU execution under each setting could be worthwhile. Such tests would also expose the time overhead of synchronization between the GPU and the CPU.

To improve the system to be more adaptable, shifting away from using Actors in the scene, i.e., the WebSocketActor, to using a so-called GameInstance should be considered. Because the GameInstance does not need to be prepared for a certain environment, it enables portable use of its functions. By moving the functionality away from placed actors in the scene, arbitrary environments could be used without requiring them to be prepared beforehand. This also extends to the blueprint scripts, which in the current implementation are in part attached to the environment as well.

Furthermore, integrating additional interactions to alter the environment through VR would improve the versatility of the software for different scenarios. This could include spawning objects or selecting which objects are fixed/movable in VR.

The current project requires manual execution of Warp or Uncertain Physics prior to launching the VR application, which may add a layer of difficulty for users. Future work on the WebSocket communication could significantly automate this process and ease the execution of the software.

References

[Bro70]

C. G. Broyden. “The Convergence of a Class of Double-rank Minimization Algorithms 1. General Considerations”. In: IMA Journal of Applied Mathematics 6.1 (Mar. 1970), pp. 76–90. issn: 0272-4960. doi: 10.1093/imamat/6.1.76. eprint: https://academic.oup.com/imamat/article-pdf/6/1/76/2233756/6-1-76.pdf. url: https://doi.org/10.1093/imamat/6.1.76.

[Fle70]

R. Fletcher. “A new approach to variable metric algorithms”. In: The Computer Journal 13.3 (Jan. 1970), pp. 317–322. issn: 0010-4620. doi: 10.1093/comjnl/13.3.317. eprint: https://academic.oup.com/comjnl/article-pdf/13/3/317/988678/130317.pdf. url: https://doi.org/10.1093/comjnl/13.3.317.

[Gol70]

Donald Goldfarb. “A family of variable-metric methods derived by variational means”. In: Mathematics of computation 24.109 (1970), pp. 23–26.

[Sha70]

David F Shanno. “Conditioning of quasi-Newton methods for function minimization”. In: Mathematics of computation 24.111 (1970), pp. 647–656.

[ED08]

Kenny Erleben and Henrik Dohlmann. “Signed distance fields using single-pass gpu scan conversion of tetrahedra”. In: Gpu Gems 3 (2008), pp. 741–763.

[WZ09]

Rene Weller and Gabriel Zachmann. “A Unified Approach for Physically-Based Simulations and Haptic Rendering”. In: Sandbox 2009: ACM SIGGRAPH Video Game Proceedings. New Orleans, LA, USA: ACM Press, Aug. 2009. url: http://cg.in.tu-clausthal.de/research/ist.

[BHT13]

Peter W. Battaglia, Jessica B. Hamrick, and Joshua B. Tenenbaum. “Simulation as an engine of physical scene understanding”. In: Proceedings of the National Academy of Sciences 110.45 (2013), pp. 18327–18332. doi: 10.1073/pnas.1306572110. eprint: https://www.pnas.org/doi/pdf/10.1073/pnas.1306572110. url: https://www.pnas.org/doi/abs/10.1073/pnas.1306572110.

[Hin+13]

Stefan Hinterstoisser et al. “Model Based Training, Detection and Pose Estimation of Texture-Less 3D Objects in Heavily Cluttered Scenes”. In: Computer Vision – ACCV 2012. 2013, 548–562.

[Ben+14]

Jan Bender et al. “A survey on position-based simulation methods in computer graphics”. In: Computer graphics forum. Vol. 33. 6. Wiley Online Library. 2014, pp. 228–251.

[KB14]

Diederik P Kingma and Jimmy Ba. “Adam: A method for stochastic optimization”. In: arXiv preprint arXiv:1412.6980 (2014).

[Fis+16]

Jason Fischer et al. “Functional neuroanatomy of intuitive physical inference”. In: Proceedings of the National Academy of Sciences 113.34 (2016), E5072–E5081. doi: 10.1073/pnas.1610344113. eprint: https://www.pnas.org/doi/pdf/10.1073/pnas.1610344113. url: https://www.pnas.org/doi/abs/10.1073/pnas.1610344113.

[LGF16]

Adam Lerer, Sam Gross, and Rob Fergus. “Learning Physical Intuition of Block Towers by Example”. In: Proceedings of The 33rd International Conference on Machine Learning. Ed. by Maria Florina Balcan and Kilian Q. Weinberger. Vol. 48. Proceedings of Machine Learning Research. New York, New York, USA: PMLR, 20–22 Jun 2016, pp. 430–438. url: https://proceedings.mlr.press/v48/lerer16.html.

[Li+16]

Wenbin Li et al. To Fall Or Not To Fall: A Visual Approach to Physical Stability Prediction. 2016. arXiv: 1604.00066 [cs.CV]. url: https://arxiv.org/abs/1604.00066.

[MMC16]

Miles Macklin, Matthias Müller, and Nuttapong Chentanez. “XPBD: position-based simulation of compliant constrained dynamics”. In: Proceedings of the 9th International Conference on Motion in Games. 2016, pp. 49–54.

[Man+17]

P-L Manteaux et al. “Adaptive physically based models in computer graphics”. In: Computer Graphics Forum. Vol. 36. 6. Wiley Online Library. 2017, pp. 312–337.

[San+20]

Alvaro Sanchez-Gonzalez et al. “Learning to Simulate Complex Physics with Graph Networks”. In: Proceedings of the 37th International Conference on Machine Learning. Ed. by Hal Daumé III and Aarti Singh. Vol. 119. Proceedings of Machine Learning Research. PMLR, 13–18 Jul 2020, pp. 8459–8468. url: https://proceedings.mlr.press/v119/sanchez-gonzalez20a.html.

[Du+21]

Tao Du et al. “DiffPD: Differentiable Projective Dynamics”. In: ACM Trans. Graph. 41.2 (Nov. 2021). issn: 0730-0301. doi: 10.1145/3490168. url: https://doi.org/10.1145/3490168.

[Hei+21]

Eric Heiden et al. “NeuralSim: Augmenting Differentiable Simulators with Neural Networks”. In: 2021 IEEE International Conference on Robotics and Automation (ICRA). 2021, pp. 9474–9481. doi: 10.1109/ICRA48506.2021.9560935.

[Li+22]

Yifei Li et al. “DiffCloth: Differentiable Cloth Simulation with Dry Frictional Contact”. In: ACM Trans. Graph. 42.1 (Oct. 2022). issn: 0730-0301. doi: 10.1145/3527660. url: https://doi.org/10.1145/3527660.

[Mac22]

Miles Macklin. Warp: A High-performance Python Framework for GPU Simulation and Graphics. https://github.com/nvidia/warp. NVIDIA GPU Technology Conference (GTC). Mar. 2022.

[DLS24]

Ruhuan Deng, Wenzhe Liu, and Lei Shi. In: Nanophotonics 13.8 (2024), pp. 1219–1237. doi: 10.1515/nanoph-2023-0750. url: https://doi.org/10.1515/nanoph-2023-0750.

[MWZ24]

Hermann Meißenhelter, René Weller, and Gabriel Zachmann. Uncertain Physics for Robot Simulation in a Game Engine. 40th Anniversary of the IEEE Conference on Robotics and Automation (ICRA@40). 2024. url: https://icra40.ieee.org/.

[New+24]

Rhys Newbury et al. “A Review of Differentiable Simulators”. In: IEEE Access 12 (2024), pp. 97581–97604. doi: 10.1109/ACCESS.2024.3425448.

[Cor25]

NVIDIA Corporation. NVIDIA Warp Kernel Compiler Pipeline. 2025. url: https://nvidia.github.io/warp/basics.html.

[Inc25a]

Epic Games Inc. Unreal Engine 5.4 Documentation. 2025. url: https://dev.epicgames.com/documentation/en-us/unreal-engine/unreal-engine-5-4-documentation?application_version=5.4.

[Inc25b]

The Khronos Group Inc. OpenXR – High-performance access to AR and VR (collectively known as XR) platforms and devices. 2025. url: https://www.khronos.org/openxr/.