NeRFs offer a lot for 3D content: renders look natural and the implementation is flexible. So I wanted to get hands-on and build a NeRF myself, to understand what can be reproduced in 3D from just a spontaneous video capture. I chose a handheld holiday video, shot on an old iPhone X while cycling on beautiful Maria Island.
The camera moves along a fairly straight path, pointing a little right of the direction of travel. This contrasts with NeRFs or scans of objects, where the camera may do one or more full orbits of the object to capture every perspective and thus produce seamless renders and clean models. I expected the 3D reconstruction from the video above to be missing some detail.
My aim was to build a NeRF from the video, render alternative camera paths, explore the generated geometry, and understand the applications and limitations of the results. Here’s the view from one alternative camera path, which follows the original path at first and then swings out to the side.
Workflow overview
I used nerfstudio via their Colab notebook running on Colab Pro with GPU to render the final and intermediate products. The table below lists the major stages, tools and products.
Stage | Tool | Product |
Process video data | ns-process-data video via COLMAP | Images of each frame (png) with inferred camera poses (json) |
Train NeRF | ns-train nerfacto | NeRF configuration data including final model checkpoint (ckpt) |
Define camera paths | nerfstudio viewer | Camera path definition based on keyframes (json) |
Render videos | ns-render | Novel video of the NeRF scene (mp4) |
Export geometry | ns-export pointcloud | Point cloud with surface colour and estimated normals (ply) |
Consume geometry | Meshlab | Visualised pointcloud |
For reference, one end-to-end train and render (6 s, 480p, 60 fps video) consumed about 3 Colab Pro “compute units”. Including the install steps (needed on transient runtimes) and multiple renders on different paths, each NeRF consumed about 6 “compute units”.
Workflow details
Here’s a more detailed walkthrough. There are lots of opportunities to improve.
Process video data
This stage extracts an image for each requested frame of the video and uses COLMAP to infer the camera pose of each image. The video was 480p, 6 s long, at 60 fps. The processed data is suitable for training a NeRF; the result is visualised below in the nerfstudio viewer.
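To check that the inferred poses match the fairly straight camera path described earlier, the pose file can be inspected directly. The following is a minimal sketch, assuming the json holds a `frames` list in which each entry has a `file_path` and a 4x4 camera-to-world `transform_matrix` with the translation in the last column. The function names are hypothetical, and distances are in the reconstruction's arbitrary scale rather than metres.

```python
import math

def camera_positions(transforms):
    """Return the (x, y, z) camera position of every frame, ordered by file name.

    Assumes each frame carries a 4x4 camera-to-world "transform_matrix"
    and that zero-padded file names sort in capture order.
    """
    frames = sorted(transforms["frames"], key=lambda f: f["file_path"])
    return [
        (f["transform_matrix"][0][3],
         f["transform_matrix"][1][3],
         f["transform_matrix"][2][3])
        for f in frames
    ]

def path_length(positions):
    """Sum of straight-line distances between consecutive camera positions."""
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))

def straightness(positions):
    """End-to-end distance divided by path length (1.0 means perfectly straight)."""
    total = path_length(positions)
    if total == 0:
        return 1.0
    return math.dist(positions[0], positions[-1]) / total
```

Loading the file with `json.load` and passing the result to `camera_positions` gives the input for the other two functions. A straightness value close to 1.0 would be consistent with a camera that mostly travels forward rather than orbiting the scene.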
I used the `sequential` option for video but haven’t evaluated any speedup. I haven’t had much luck specifying the number of frames via the command-line parameter either. The resulting files could be zipped and stored outside the Colab instance (locally or on Drive) for direct input to the training stage.
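For context on frame counts: a 6 s video at 60 fps contains 6 × 60 = 360 frames. One possible workaround for controlling how many are used is to choose the frames before processing. The sketch below picks evenly spaced frame indices for a target count. It is a hypothetical helper, not how nerfstudio selects frames internally.

```python
def evenly_spaced_indices(total_frames, target_frames):
    """Pick up to target_frames indices spread evenly over range(total_frames)."""
    if total_frames <= 0 or target_frames <= 0:
        raise ValueError("frame counts must be positive")
    if target_frames >= total_frames:
        # Nothing to drop: keep every frame.
        return list(range(total_frames))
    # Integer arithmetic avoids floating-point rounding surprises.
    return [i * total_frames // target_frames for i in range(target_frames)]

# 6 s at 60 fps, thinned to a quarter of the frames.
total = 6 * 60
indices = evenly_spaced_indices(total, 90)
print(len(indices), indices[:3], indices[-1])
```

Fewer frames should shorten pose estimation, at the cost of less overlap between neighbouring images, so the target count is a trade-off to experiment with.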
Train NeRF
The magic happens here. The nerfstudio viewer provides live exploration of the radiance field as it is progressively refined through training. The landscape was recognisable very early on in the training process and it was hard to discern improvements in the later stages (at least when using the viewer interactively).
The trained model can also be zipped and stored outside the Colab instance for direct input into later stages.
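Zipping a directory from a notebook cell can be done with the Python standard library. This is a minimal sketch; the function name and the directory names in the usage comment are hypothetical and should be replaced with wherever the run actually wrote its files.

```python
import shutil
from pathlib import Path

def archive_directory(source_dir, archive_stem):
    """Zip source_dir (keeping its top-level folder name) and return the zip path.

    archive_stem is the output path without the ".zip" suffix.
    """
    source = Path(source_dir)
    if not source.is_dir():
        raise NotADirectoryError(f"{source} is not a directory")
    return shutil.make_archive(
        str(archive_stem),
        "zip",
        root_dir=source.parent,  # archive paths are relative to the parent...
        base_dir=source.name,    # ...so the folder itself appears inside the zip
    )

# Hypothetical usage: archive_directory("outputs", "trained_nerf")
```

The resulting zip can then be downloaded or copied to Drive, and unpacked in a fresh runtime before running the later stages.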
Define camera paths
I defined one camera path to initially follow the camera’s original trajectory and then deviate significantly to show alternative perspectives and test the limits of scene reconstruction. This path is shown below.
I also defined a second path that reversed the original camera trajectory. I downloaded these camera paths for reuse.
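A reversed path could also be derived offline from an existing sequence of poses rather than defined by hand. The sketch below assumes a path is simply a list of 4x4 camera-to-world matrices (nested lists), which is a simplification and not the exact layout of the downloaded camera path json. It can also turn each camera 180° about its own up axis, since "reversed" can mean either retracing the path while looking the same way or looking back along it.

```python
def turn_around(pose):
    """Rotate a camera-to-world pose 180 degrees about the camera's own y (up) axis.

    This negates the camera's x and z axes (columns 0 and 2 of the rotation)
    and leaves the position (last column) unchanged.
    """
    flipped = [row[:] for row in pose]
    for r in range(3):
        flipped[r][0] = -flipped[r][0]
        flipped[r][2] = -flipped[r][2]
    return flipped

def reverse_path(poses, face_backwards=False):
    """Return a copy of the poses in reverse order, optionally turning each camera around."""
    reversed_poses = [[row[:] for row in pose] for pose in reversed(poses)]
    if face_backwards:
        reversed_poses = [turn_around(p) for p in reversed_poses]
    return reversed_poses
```

Turning the cameras around exposes regions the original video never saw, so more noise in those renders would be expected.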
Render videos
In the render of the deviating path (video above), the originally visible details are recreated quite convincingly. Noise is visible when originally hidden details are exposed, and also generally around the edges of the frame. I would like to try videos from cameras with a wider field of view to see how much more of the scene they capture.
The second, reversed path (below) also faithfully reconstructs visible objects, but with some loss of fidelity due to the reversed camera position, and it displays more noise outside the known scene.
Export geometry
I ran ns-export pointcloud and chose to add estimated normals to the export. I downloaded the ply file to work with it locally.
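Before opening the file in a viewer, a quick look at the ply header confirms how many points were exported and whether colour and normals are present. This is a minimal sketch that reads only the header, which is plain text even in binary ply files. The property names `nx`, `ny`, `nz` and `red`, `green`, `blue` are the conventional ones and are an assumption about this particular export.

```python
def read_ply_header(path):
    """Return (format, vertex_count, vertex_property_names) from a PLY header."""
    fmt = None
    vertex_count = 0
    properties = []
    current_element = None
    with open(path, "rb") as f:
        if f.readline().strip() != b"ply":
            raise ValueError("not a PLY file")
        for raw in f:
            tokens = raw.decode("ascii", errors="replace").split()
            if not tokens:
                continue
            if tokens[0] == "end_header":
                break  # everything after this line is point data
            if tokens[0] == "format":
                fmt = tokens[1]
            elif tokens[0] == "element":
                current_element = tokens[1]
                if current_element == "vertex":
                    vertex_count = int(tokens[2])
            elif tokens[0] == "property" and current_element == "vertex":
                properties.append(tokens[-1])  # the property name comes last
    return fmt, vertex_count, properties

def has_normals(properties):
    """True if the conventional normal properties are all present."""
    return {"nx", "ny", "nz"} <= set(properties)

def has_colour(properties):
    """True if the conventional colour properties are all present."""
    return {"red", "green", "blue"} <= set(properties)
```

If `has_normals` returns False, shading by normal in a viewer will not be available without re-estimating normals there.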
Consume geometry
Meshlab provides a nice visualisation of the point cloud out of the box, including the colour of each point and shading by estimated normal, as below.
Meshlab provides a wide range of further processing tools, such as surface reconstruction. I also tried FreeCAD and Blender. Both imported and displayed the point cloud but I couldn’t easily tune the visualisation to look as good as above.
Next steps
I’d like to try some more videos, and explore how to better avoid noise artefacts in renders.