EXPLORING REPRESENTATIONS FOR 3D RECONSTRUCTION FROM IMPAIRED REAL-WORLD DATA

Miller, Erik-Learned
Selvaraju, Pratheba
2025-04-04
2025-04-04
2025-02
10.7275/56002
https://hdl.handle.net/20.500.14394/56002

3D reconstruction from real-world data is essential in applications such as augmented reality, robotics, medical imaging, and autonomous navigation. However, this data is often noisy, incomplete, occluded, or corrupted. Despite these imperfections, using such data is necessary to develop reconstruction methods that work in real-world scenarios. Each application comes with unique requirements and constraints, so it is important to select representations tailored to its specific characteristics. Recognizing that our world is primarily composed of two types of objects, static (rigid) and dynamic (non-rigid) body structures, this thesis approaches reconstruction by exploring the representations best suited to each type. This keeps the methods adaptable to applications with similar characteristics, rather than reinventing the wheel for each case. For static structures, we focus on reconstruction in fabrication industries that produce real-world products. These industries often work with non-malleable materials, which have the zero-Gaussian-curvature property. To address reconstruction under this constraint, we introduce Developability Approximation for Neural Implicits through Rank Minimization, a neural network model that represents surfaces as piecewise patches of zero Gaussian curvature. The model encodes data implicitly, offering an advantage over prior explicit methods that struggle with high tessellation and shape fidelity. Applying this method to large-scale urban planning requires understanding building structures, which are made of several components built from different non-malleable materials. Automatically identifying these components therefore becomes essential. To this end, we created BuildingNet, a large-scale dataset of 2,000 diverse building exteriors (e.g., residential, commercial, stadium).
Using this dataset, we developed a Graph Neural Network (GNN) model to automatically label building components. Next, we explore dynamic object reconstruction, focusing on human faces, by introducing OFER: Occluded Face Expression Reconstruction. OFER reconstructs expressive human faces from occluded images. Occlusion introduces new sources of ambiguity in hidden regions, requiring a multi-hypothesis solution. To this end, OFER employs a parametric face model and trains hybrid UNet-Attention diffusion models to generate diverse expression coefficients. This representation yields smooth, plausible reconstructions that remain faithful to the visible parts and are easy to animate through simple parameter adjustments. In facial animation, real-time performance is crucial for applications like gaming and augmented reality, which require computational efficiency while preserving high quality. Traditional UNet-based diffusion models often struggle with temporal coherence and long-range sequences, while attention computation adds computational overhead and slows inference. To tackle this, we explore efficient computational representations and introduce FORA: Fast-Forward Caching for Diffusion Transformer Acceleration. FORA employs a caching mechanism that reuses intermediate outputs, thereby minimizing computational overhead without requiring model retraining and enabling faster processing with minimal trade-offs in quality.

en-US
Attribution-NonCommercial-ShareAlike 4.0 International
http://creativecommons.org/licenses/by-nc-sa/4.0/
3D reconstruction
representations
Dissertation (Open Access)
https://orcid.org/0009-0008-7311-5470