Immersive scene generation, notably panorama creation, benefits significantly from adapting large pre-trained text-to-image (T2I) models to multi-view image generation. Because acquiring multi-view image data is costly, tuning-free generation is preferred. However, existing methods are either limited to simple correspondences or require extensive fine-tuning to capture complex ones. We present PanoFree, a novel method for tuning-free multi-view image generation that supports a broad range of correspondences. PanoFree sequentially generates multi-view images using iterative warping and inpainting, addressing the key issues of inconsistency and artifacts that arise from error accumulation, without the need for fine-tuning.
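For concreteness, the sequential warp-and-inpaint loop can be sketched as below. This is a minimal illustration, not PanoFree's actual interface: the helper callables `warp` and `inpaint`, and the pose representation, are hypothetical placeholders standing in for a reprojection step and a frozen pre-trained T2I inpainting model.

```python
from typing import Callable, List, Tuple
import numpy as np

Image = np.ndarray  # H x W x 3 array; a stand-in type for this sketch
Mask = np.ndarray   # H x W boolean array marking pixels to be filled

def generate_sequence(
    first_view: Image,
    poses: List[np.ndarray],  # one camera pose per target view (placeholder)
    warp: Callable[[Image, np.ndarray, np.ndarray], Tuple[Image, Mask]],
    inpaint: Callable[[Image, Mask, str], Image],
    prompt: str,
) -> List[Image]:
    """Sequential multi-view generation by iterative warping and inpainting.

    Each new view is initialized by warping the previous view into the next
    camera pose; the pixels the warp cannot cover (the mask) are then filled
    by a pre-trained T2I inpainting model, so no fine-tuning is required.
    """
    views = [first_view]
    for prev_pose, next_pose in zip(poses, poses[1:]):
        warped, missing = warp(views[-1], prev_pose, next_pose)
        views.append(inpaint(warped, missing, prompt))
    return views
```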
Overview of our PanoFree method, taking 360° Panorama Generation as an example. (a) At the framework level, PanoFree adopts two generation paths with opposite viewpoint translations or rotations. It enhances consistency by symmetrically selecting views from the opposite path as guidance when generating each new view. Loop closure is ensured by merging the two paths. (b) In each warping and inpainting step, PanoFree reduces accumulated error by guiding the inpainting process with cross-view images and by estimating and erasing risky areas.
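The bidirectional scheme described above can be sketched as follows. All names here (`guided_inpaint`, `estimate_risky`, `merge`, the pose lists) are hypothetical placeholders chosen for illustration under the assumptions of the caption, not PanoFree's actual API.

```python
from typing import Callable, List, Tuple
import numpy as np

Image = np.ndarray  # H x W x 3 array
Mask = np.ndarray   # H x W boolean array

def generate_bidirectional(
    first_view: Image,
    poses_fwd: List[np.ndarray],   # poses along one rotation direction
    poses_bwd: List[np.ndarray],   # poses along the opposite direction
    warp: Callable[[Image, np.ndarray, np.ndarray], Tuple[Image, Mask]],
    guided_inpaint: Callable[[Image, Mask, Image, str], Image],
    estimate_risky: Callable[[Image], Mask],
    merge: Callable[[List[Image], List[Image]], List[Image]],
    prompt: str,
) -> List[Image]:
    """Two generation paths with opposite rotations, as in panel (a).

    At each step, the latest view from the opposite path is fed to the
    inpainting model as cross-view guidance; regions flagged as risky
    (likely accumulated error) are erased and re-synthesized, as in
    panel (b). Merging the two paths at the end closes the loop.
    """
    path_a, path_b = [first_view], [first_view]
    for i in range(1, len(poses_fwd)):
        # Snapshot guidance before updating either path, keeping the two
        # paths symmetric within the step.
        guide_a, guide_b = path_b[-1], path_a[-1]
        for path, poses, guide in ((path_a, poses_fwd, guide_a),
                                   (path_b, poses_bwd, guide_b)):
            warped, missing = warp(path[-1], poses[i - 1], poses[i])
            risky = estimate_risky(warped)  # erase suspect regions too
            path.append(guided_inpaint(warped, missing | risky, guide, prompt))
    return merge(path_a, path_b)  # loop closure by merging the two paths
```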
PanoFree can generate multi-view images under different types of correspondences without fine-tuning, so a natural application is tuning-free generation of different panorama formats. We demonstrate this by generating three commonly used panoramas. Top: Planar Panorama; Middle: 360° Panorama; Bottom: Full Panorama.