Nerfing along

NeRFs provide many benefits for 3D content: the rendering looks natural and the implementation is flexible. So I wanted to get hands-on and build myself a NeRF, to understand what’s possible to reproduce in 3D from just a spontaneous video capture. I chose a handheld holiday video, shot on an old iPhone X while cycling on beautiful Maria Island.

Video taken while cycling on Maria Island

The camera moves along a fairly straight path, pointing a little to the right of the direction of travel. This contrasts with NeRFs or scans of objects, where the camera may do one or more full orbits of the object to capture every perspective and thus produce seamless renders and clean models. I expect the 3D scene generated from the video above to be missing some detail.

My aim was to build a NeRF from the video, render alternative camera paths, explore the generated geometry, and understand the applications and limitations of the results. Here’s the view from one alternative camera path, which follows the original path at first and then swings out to the side.

Alternative camera path rendered from NeRF

Workflow overview

I used nerfstudio via its Colab notebook, running on Colab Pro with a GPU, to produce the final and intermediate products. The table below lists the major stages, tools and products.

| Stage | Tool | Product |
|---|---|---|
| Process video data | ns-process-data video (via COLMAP) | Images of each frame (png) with inferred camera poses (json) |
| Train NeRF | ns-train nerfacto | NeRF configuration data including final model checkpoint (ckpt) |
| Define camera paths | nerfstudio viewer | Camera path definition based on keyframes (json) |
| Render videos | ns-render | Novel video of the NeRF scene (mp4) |
| Export geometry | ns-export pointcloud | Point cloud with surface colour and estimated normals (ply) |
| Consume geometry | Meshlab | Visualised point cloud |

nerfstudio workflow overview

For reference, a single end-to-end train and render (6 s, 480p, 60 fps video) consumed about 3 Colab Pro “compute units”. Including the install steps (needed on transient runtimes) and multiple renders along different paths, each NeRF has cost closer to 6 “compute units”.

Workflow details

Here’s a more detailed walkthrough. There are lots of opportunities to improve.

Process video data

This stage extracts an image for each requested frame of the video and uses COLMAP to infer the camera pose of each image. My video was 480p and 6 s long at 60 fps. The processed data is suitable for training a NeRF; the result is visualised below in the nerfstudio viewer.

Posed video frames

I used the `sequential` matching option for video but haven’t evaluated whether it gives any speedup. I also didn’t have much luck specifying the number of frames via the command-line parameter. The resultant files can be zipped and stored outside the Colab instance (locally or on Drive) for direct input to the training stage.
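For reference, the processing command looks roughly like the sketch below. The paths are placeholders from my Colab layout, and the flag names are as I recall them; `ns-process-data video --help` lists the current options for your version.

```bash
# Extract frames from the video and estimate camera poses with COLMAP.
# --num-frames-target and --matching-method are the options referred to above;
# flag names may differ between nerfstudio versions.
ns-process-data video \
  --data maria-island.mp4 \
  --output-dir data/nerfstudio/maria-island \
  --num-frames-target 300 \
  --matching-method sequential
```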

Train NeRF

The magic happens here. The nerfstudio viewer provides live exploration of the radiance field as it is progressively refined through training. The landscape was recognisable very early on in the training process and it was hard to discern improvements in the later stages (at least when using the viewer interactively).
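Training itself is a single command pointed at the processed data. The sketch below uses my placeholder paths rather than the exact invocation from the notebook:

```bash
# Train the nerfacto model on the processed frames and poses.
# Checkpoints and the config.yml used by later stages are written under outputs/.
ns-train nerfacto --data data/nerfstudio/maria-island
```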

The trained model can also be zipped and stored outside the Colab instance for direct input into later stages.
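In Colab that amounts to something like the sketch below, assuming Drive is already mounted at /content/drive (the destination folder is just my choice, not anything nerfstudio requires):

```bash
# Archive the training outputs (config.yml plus checkpoints) and copy them to Drive
# so a fresh runtime can skip straight to rendering and export.
zip -r maria-island-outputs.zip outputs/
cp maria-island-outputs.zip /content/drive/MyDrive/nerf/
```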

Define camera paths

I defined one camera path to initially follow the camera’s original trajectory and then deviate significantly to show alternative perspectives and test the limits of scene reconstruction. This path is shown below.

Deviating camera path

I also defined a second path that reversed the original camera trajectory. I downloaded these camera paths for reuse.

Render videos

In the render of the deviating path (video above), the originally visible details are recreated quite convincingly. Noise is visible where originally hidden details are exposed, and also generally around the edges of the frame. I would like to try videos from cameras with a wider field of view to see how much more of the scene they capture.

The second, reversed path (below) also reconstructs the visible objects faithfully, though with some loss of fidelity due to the reversed camera position, and shows more noise outside the known scene.

Reversed camera path rendered from NeRF
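For reference, each render points ns-render at the trained model’s config and a camera path json. The sketch below uses placeholder paths, and the exact subcommand and flags depend on the nerfstudio version (recent versions use the camera-path subcommand shown here); the viewer’s render panel can also generate the command for you.

```bash
# Render a novel video along a camera path defined in the viewer.
# The timestamped directory is created by ns-train; substitute your own.
ns-render camera-path \
  --load-config outputs/maria-island/nerfacto/2023-01-01_000000/config.yml \
  --camera-path-filename camera_paths/deviating-path.json \
  --output-path renders/deviating-path.mp4
```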

Export geometry

I ran ns-export pointcloud and chose to add estimated normals to the export. I downloaded the ply file to work with it locally.
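The export command looked roughly like the following sketch (placeholder paths again; I believe --normal-method is the flag behind the estimated normals, but check `ns-export pointcloud --help` for your version):

```bash
# Export a coloured point cloud from the trained model.
# --normal-method open3d asks for normals estimated from the points
# (my reading of the "estimated normals" option; flag names may vary).
ns-export pointcloud \
  --load-config outputs/maria-island/nerfacto/2023-01-01_000000/config.yml \
  --output-dir exports/pointcloud \
  --num-points 1000000 \
  --normal-method open3d
```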

Consume geometry

Meshlab provides a nice visualisation of the point cloud out of the box, including the colour of each point and shading by estimated normal, as below.

Meshlab visualisation of exported point cloud

Meshlab provides a wide range of further processing tools, such as surface reconstruction. I also tried FreeCAD and Blender. Both imported and displayed the point cloud, but I couldn’t easily tune the visualisation to look as good as above.

Next steps

I’d like to try some more videos, and explore how to better avoid noise artefacts in renders.

