Graphics design is important for various applications, including movie production and game design. To create a high-quality scene, designers usually need to spend hours in software like Blender, in which they might need to interleave and repeat operations, such as connecting material nodes, hundreds of times. Moreover, slightly different design goals may require completely different sequences, making automation difficult. In this paper, we propose a system that leverages Vision-Language Models (VLMs), like GPT-4V, to intelligently search the design action space to arrive at a design that satisfies a user's intent. Specifically, we design a vision-based edit generator and state evaluator that work together to find the correct sequence of actions to achieve the goal. Inspired by the role of visual imagination in the human design process, we supplement the visual reasoning capabilities of VLMs with imagined reference images from image-generation models, providing visual grounding for abstract language descriptions. We provide empirical evidence suggesting our system can produce simple but tedious Blender editing sequences for tasks such as editing procedural materials from text and/or reference images, as well as adjusting lighting configurations for product renderings in complex scenes.

Editing 3D Graphics as Visual Program Refinement

To perform edits within the Blender 3D design environment, BlenderAlchemy iteratively refines a program that defines a sequence of edits within Blender. This is done using our visual program refinement procedure, composed of an edit generator G, which proposes different edit hypotheses, and a state evaluator V, which selects among them. Both the edit generator and the state evaluator are guided by an input user intention, specified as a combination of language and reference images, either provided by the user or hallucinated by a text-to-image generator within the Visual Imagination module. At each step, the system is allowed to revert to an edit hypothesis from a previous iteration.
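The control flow described above can be sketched as a short Python loop. This is a minimal illustration, not the paper's implementation: `edit_generator` and `state_evaluator` are stand-ins for G and V (which in the real system prompt a VLM with renders of each candidate), and the scoring here is a trivial stub so the loop runs standalone.

```python
def edit_generator(program, intent, k=3):
    """Stub for G: propose k edit hypotheses (variants of the program).
    The real system asks a VLM to rewrite the edit script."""
    return [program + [f"edit_{len(program)}_{i}"] for i in range(k)]

def state_evaluator(candidates, intent):
    """Stub for V: pick the candidate whose render best matches the intent.
    The real system queries a VLM on renders; here we score by edit count."""
    return max(range(len(candidates)), key=lambda i: len(candidates[i]))

def refine(program, intent, num_iters=4):
    history = [program]  # keep past hypotheses so the system can revert
    best = program
    for _ in range(num_iters):
        candidates = edit_generator(best, intent)
        # Reversion: hypotheses from previous iterations compete with new edits.
        candidates = candidates + history
        best = candidates[state_evaluator(candidates, intent)]
        history.append(best)
    return best

final = refine([], "wood -> marbled granite")
```

Keeping the history inside the candidate pool is what lets a bad iteration be undone: if every new hypothesis scores worse than an earlier state, the evaluator simply re-selects that earlier state.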

In the context of material editing, consider transforming a wooden procedural material into marbled granite. The following is an illustrative sequence of edit generation and state selection steps.


Using this system, we can edit procedural materials from language descriptions alone. We show a few samples below, each edited from the wooden material on the left.

Below, we show materials synthesized by BlenderAlchemy applied to a diverse set of scenes based on assets created by 3D artists. BlenderAlchemy is capable of producing usable materials guided by language descriptions and of generating variations of the same kind of material.

Old metal (Original)
Ice slats
Surface of the sun

The code edits synthesized by BlenderAlchemy represent changes to the procedural material graph of the input material, ranging from changes in continuous values to changes in node connectivity and node types. Take the following example of editing a procedural wood material (top) into marbled granite (bottom), using the language description shown below:

node graph
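The three kinds of graph edits can be illustrated with a small, Blender-free sketch. The node names, socket names, and target values below are hypothetical examples (inside Blender, the synthesized code would instead operate on `bpy` node trees); the point is only to show a continuous-value change, a connectivity change, and a node-type swap expressed as program edits.

```python
# Toy representation of a procedural material graph.
graph = {
    "nodes": {
        "tex":  {"type": "NoiseTexture",   "params": {"scale": 5.0}},
        "ramp": {"type": "ColorRamp",      "params": {}},
        "bsdf": {"type": "PrincipledBSDF", "params": {"roughness": 0.4}},
    },
    # Links as (from_node, from_socket) -> (to_node, to_socket).
    "links": [(("tex", "Fac"), ("ramp", "Fac"))],
}

def set_param(graph, node, key, value):
    """Edit a continuous value on a node."""
    graph["nodes"][node]["params"][key] = value

def rewire(graph, src, dst):
    """Edit node connectivity by adding a link."""
    graph["links"].append((src, dst))

def swap_type(graph, node, new_type):
    """Edit a node's type in place."""
    graph["nodes"][node]["type"] = new_type

# Hypothetical edits in the spirit of "wood -> marbled granite":
set_param(graph, "tex", "scale", 12.0)
rewire(graph, ("ramp", "Color"), ("bsdf", "Base Color"))
swap_type(graph, "tex", "VoronoiTexture")
```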


Since BlenderAlchemy works by editing programs, it can also change lighting configurations within scenes, because the parameters of each light source can be programmatically represented. Using the same method as for material editing, we can iteratively synthesize lighting setups that match a given language description by cycling between automatically generating lighting candidates and selecting among them.
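As a rough sketch, a lighting configuration can be treated as a small parameter dictionary, with candidates generated by perturbation and a stub evaluator standing in for the VLM's judgment of the renders. All parameter names and the scoring rule here are assumptions for illustration, not values from the paper.

```python
import random

random.seed(0)  # deterministic for illustration

# Hypothetical lighting state: per-light parameters, programmatically editable.
lighting = {"key":  {"energy": 1000.0, "angle": 45.0},
            "fill": {"energy":  200.0, "angle": -30.0}}

def propose(config, k=4):
    """Sample k perturbed lighting configurations around the current one."""
    return [{name: {p: v * random.uniform(0.5, 1.5) for p, v in params.items()}
             for name, params in config.items()}
            for _ in range(k)]

def score(config):
    """Stub evaluator: prefers a dimmer fill light (stand-in for a VLM
    scoring rendered images against the language description)."""
    return -config["fill"]["energy"]

for _ in range(3):  # a few generate-and-select cycles
    candidates = propose(lighting) + [lighting]  # keep current state as fallback
    lighting = max(candidates, key=score)
```

Including the current configuration among the candidates guarantees the selected state never scores worse than where the cycle started.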


Employing BlenderAlchemy to alternate between optimizing lighting and materials allows a user to tweak both in the input scene to satisfy their desired intent.
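The alternating schedule is simple coordinate-descent-style control flow: hold materials fixed while refining lighting, then hold lighting fixed while refining materials. The sketch below uses hypothetical stub passes in place of full BlenderAlchemy refinement runs.

```python
def refine_lighting(scene):
    """Stub for a full BlenderAlchemy lighting-refinement pass."""
    scene["lighting_passes"] = scene.get("lighting_passes", 0) + 1
    return scene

def refine_material(scene):
    """Stub for a full BlenderAlchemy material-refinement pass."""
    scene["material_passes"] = scene.get("material_passes", 0) + 1
    return scene

def alternate(scene, rounds=2):
    # Each round refines lighting with materials frozen, then vice versa.
    for _ in range(rounds):
        scene = refine_lighting(scene)
        scene = refine_material(scene)
    return scene

scene = alternate({"intent": "moody product shot"})
```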



If you found the paper or code useful, please cite: