Celebrating Shubham Goel (Research Scientist, Avataar) and his esteemed co-authors' innovative work on the SAP3D project as they present their paper at CVPR 2024!
The paper, "The More You See in 2D, the More You Perceive in 3D," presents a system for unposed 3D reconstruction and novel view synthesis. This collaborative research with UC Berkeley adapts a pre-trained model at test time to build a specific 3D understanding of the object, enhancing its 3D perception with each additional image—akin to how human vision works.
Congratulations again to Shubham and the team!
#CVPR2024 #computervision
Engineer at AIMonk Labs || Crafting Stable AI Products and Enhancing Software Aesthetics || Enthusiastic about Robotics and Cutting-Edge AI Developments || Sharing the Hottest Trends in Artificial Intelligence.
🚨 CVPR 2024 (Oral) Paper Alert 🚨
➡️Paper Title: MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild
🌟Few pointers from the paper
🎯In this paper, the authors present MultiPly, a novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos.
🎯Reconstructing multiple individuals moving and interacting naturally from monocular in-the-wild videos poses a challenging task. Addressing it necessitates precise pixel-level disentanglement of individuals without any prior knowledge about the subjects.
🎯Moreover, it requires recovering intricate and complete 3D human shapes from short video sequences, intensifying the level of difficulty.
🎯 To tackle these challenges, the authors first define a layered neural representation for the entire scene, composited from individual human and background models. They learn this layered neural representation from videos via layer-wise differentiable volume rendering.
🎯This learning process is further enhanced by their hybrid instance segmentation approach which combines the self-supervised 3D segmentation and the promptable 2D segmentation module, yielding reliable instance segmentation supervision even under close human interaction.
🎯A confidence-guided optimization formulation is introduced to optimize the human poses and shape/appearance alternately.
🎯They incorporated effective objectives to refine human poses via photometric information and impose physically plausible constraints on human dynamics, leading to temporally consistent 3D reconstructions with high fidelity.
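The layered compositing behind these points can be illustrated with a toy front-to-back alpha-compositing routine. This is a simplified stand-in for the paper's layer-wise differentiable volume rendering, not the authors' code; `composite_layers` and its scalar colors are illustrative only.

```python
def composite_layers(layer_alphas, layer_colors):
    """Front-to-back alpha compositing of per-layer samples along a ray.

    Each layer contributes its color weighted by its alpha and by the
    transmittance accumulated through the layers in front of it, the same
    compositing rule that layer-wise volume rendering relies on.
    """
    color = 0.0
    transmittance = 1.0
    for alpha, c in zip(layer_alphas, layer_colors):
        color += transmittance * alpha * c
        transmittance *= 1.0 - alpha
    return color, transmittance
```

An opaque front layer (alpha 1.0) fully occludes everything behind it, which is what lets per-person layers be disentangled and supervised separately.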
🏢Organization: ETH Zürich, Microsoft
🧙Paper Authors: Zeren J., Chen Guo, Manuel Kaufmann, Tianjian Jiang, Julien Valentin, Otmar Hilliges, Jie Song
1️⃣Read the Full Paper here: https://lnkd.in/gDUzF63a
2️⃣Project Page: https://lnkd.in/g_UVYcqN
3️⃣Code: Coming 🔜
🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊
🎵 Music by raspberrymusic from Pixabay
Find this Valuable 💎 ?
♻️REPOST and teach your network something new
Follow me 👣, Naveen Manwani, for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.
#CVPR2024 #3D
🚨CVPR 2024 Paper Alert 🚨
➡️Paper Title: WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion
🌟Few pointers from the paper
🎯The estimation of 3D human motion from video has progressed rapidly but current methods still have several key limitations.
🎯First, most methods estimate the human in camera coordinates. Second, prior work on estimating humans in global coordinates often assumes a flat ground plane and produces foot sliding. Third, the most accurate methods rely on computationally expensive optimization pipelines, limiting their use to offline applications. Finally, existing video-based methods are surprisingly less accurate than single-frame methods.
🎯The authors address these limitations with WHAM (World-grounded Humans with Accurate Motion), which accurately and efficiently reconstructs 3D human motion in a global coordinate system from video.
🎯 WHAM learns to lift 2D keypoint sequences to 3D using motion capture data and fuses this with video features, integrating motion context and visual information.
🎯WHAM exploits camera angular velocity estimated from a SLAM method together with human motion to estimate the body’s global trajectory.
🎯They combined this with a contact-aware trajectory refinement method that lets WHAM capture human motion in diverse conditions, such as climbing stairs.
🎯WHAM outperforms all existing 3D human motion recovery methods across multiple in-the-wild benchmarks.
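The trajectory idea in these points can be sketched as plain dead-reckoning: accumulate heading from angular velocity, rotate body-frame velocities into the world frame, and integrate. This is a minimal 2D toy, not WHAM's learned trajectory decoder; `integrate_trajectory` and its inputs are assumptions for illustration.

```python
import math

def integrate_trajectory(local_vels, yaw_rates, dt=1.0):
    """Accumulate heading from angular velocity, rotate each body-frame
    velocity into the world frame, and integrate to a global 2D path."""
    x, y, yaw = 0.0, 0.0, 0.0
    path = [(x, y)]
    for (vx, vy), w in zip(local_vels, yaw_rates):
        yaw += w * dt                      # integrate angular velocity
        wx = math.cos(yaw) * vx - math.sin(yaw) * vy
        wy = math.sin(yaw) * vx + math.cos(yaw) * vy
        x, y = x + wx * dt, y + wy * dt    # integrate world-frame velocity
        path.append((x, y))
    return path
```

Two straight steps land the subject two units ahead; a quarter-turn redirects the same step sideways, which is why camera angular velocity matters for a drift-free global path.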
🏢Organization: Carnegie Mellon University, Max Planck Institute for Intelligent Systems
🧙Paper Authors: Soyong Shin, Juyong Kim, Eni Halilaj, Michael J. Black
1️⃣Read the Full Paper here: https://lnkd.in/gn7jUUw2
2️⃣Project Page: https://lnkd.in/gyD6abRD
3️⃣Code: https://lnkd.in/gy9dpvCr
🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊
🎵 Music by Denys Kyshchuk from Pixabay
Find this Valuable 💎 ?
♻️REPOST and teach your network something new
Follow me 👣, Naveen Manwani, for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.
#CVPR2024
🚨Paper Alert 🚨
➡️Paper Title: POCO: 3D Pose and Shape Estimation using Confidence
🌟Few pointers from the paper
🐚Currently, most 3D Human Pose and Shape (HPS) regressors do not report the confidence of their outputs, meaning that downstream tasks (e.g., human action recognition or 3D graphics) cannot differentiate accurate estimates from inaccurate ones.
🐚To address this, the authors developed "POCO", a novel framework for training HPS regressors to estimate not only a 3D human body but also its confidence, in a single feed-forward pass.
🐚Specifically, POCO estimates both the 3D body pose and a per-sample variance. The key idea is to introduce a Dual Conditioning Strategy (DCS) for regressing uncertainty that is highly correlated to pose reconstruction quality.
🐚The POCO framework can be applied to any HPS regressor and here they evaluated it by modifying HMR, PARE, and CLIFF.
🐚In all cases, training the network to reason about uncertainty helps it learn to more accurately estimate 3D pose.
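The variance-aware training idea above can be illustrated with a standard Gaussian negative log-likelihood, where the network predicts a log-variance alongside the pose. This is a generic uncertainty loss as a sketch, not POCO's Dual Conditioning Strategy; `gaussian_nll` and the shared scalar log-variance are simplifications.

```python
import math

def gaussian_nll(pred_pose, target_pose, log_var):
    """Mean Gaussian negative log-likelihood with a predicted log-variance:
    squared errors are down-weighted where predicted variance is high,
    while the +log_var term penalizes blanket overestimated uncertainty."""
    total = 0.0
    for p, t in zip(pred_pose, target_pose):
        total += 0.5 * (math.exp(-log_var) * (p - t) ** 2 + log_var)
    return total / len(pred_pose)
```

The loss is minimized when the predicted variance matches the actual error scale, which is why the learned confidence correlates with reconstruction quality.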
🏢Organization: Max Planck Institute for Intelligent Systems, Tübingen, Germany; Inria, École normale supérieure, CNRS, PSL Research University, France; University of Amsterdam, the Netherlands
🧙Paper Authors: Sai Kumar Dwivedi, Cordelia Schmid, Hongwei Yi, Michael J. Black, Dimitris Tzionas
1️⃣Read the Full Paper here: https://lnkd.in/gnG3kSgQ
2️⃣Project Page: https://lnkd.in/gQb2SwuN
3️⃣Code: https://lnkd.in/gPNdYN_G
4️⃣Video: https://lnkd.in/gcNCf9gJ
🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊
Music by Umasha Pros from Pixabay
Find this Valuable 💎 ?
♻️REPOST and teach your network something new
Follow me 👣, Naveen Manwani, for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.
🚨CVPR 2024 Best Paper Runners-Up Alert 🚨
➡️Paper Title: pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction
🌟Few pointers from the paper
🎯In this paper, the authors introduce "pixelSplat", a feed-forward model that learns to reconstruct 3D radiance fields parameterized by 3D Gaussian primitives from pairs of images.
🎯Their model features real-time and memory-efficient rendering for scalable training as well as fast 3D reconstruction at inference time.
🎯To overcome local minima inherent to sparse and locally supported representations, authors predict a dense probability distribution over 3D and sample Gaussian means from that probability distribution.
🎯They make this sampling operation differentiable via a reparameterization trick, allowing them to back-propagate gradients through the Gaussian splatting representation.
🎯They benchmark their method on wide-baseline novel view synthesis on the real-world RealEstate10k and ACID datasets, where they outperform state-of-the-art light field transformers and accelerate rendering by 2.5 orders of magnitude while reconstructing an interpretable and editable 3D radiance field.
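The sampling step described above can be illustrated with a plain inverse-CDF draw over discrete depth buckets. This shows only the sampling; pixelSplat's contribution is making a choice like this differentiable via a reparameterization trick, which this toy omits. `sample_depth` and its bucket inputs are illustrative assumptions.

```python
def sample_depth(bucket_probs, bucket_depths, u):
    """Draw a depth by inverting the discrete CDF of per-bucket
    probabilities at u in [0, 1): the first bucket whose cumulative
    mass exceeds u is selected."""
    cdf = 0.0
    for prob, depth in zip(bucket_probs, bucket_depths):
        cdf += prob
        if u < cdf:
            return depth
    return bucket_depths[-1]  # guard against floating-point round-off
```

Sampling from the full distribution, rather than taking an argmax depth, is what lets the model escape the local minima that sparse, locally supported representations fall into.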
🏢Organization: Massachusetts Institute of Technology, Simon Fraser University, University of Toronto
🧙Paper Authors: David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, Vincent Sitzmann
1️⃣Read the Full Paper here: https://lnkd.in/gXRimFbt
2️⃣Project Page: https://lnkd.in/gnHnT3rN
3️⃣Code: https://lnkd.in/g2M-vM34
4️⃣Pre-trained Models: https://lnkd.in/g9xAXEXn
🥳Heartfelt congratulations to all the talented authors! 🥳
🎥 Be sure to watch the attached Demo Video
Find this Valuable 💎 ?
♻️REPOST and teach your network something new
Follow me 👣, Naveen Manwani, for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.
#CVPR2024
🚨 CVPR 2024 Alert 🚨
➡️Paper Title: SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering
🌟Few pointers from the paper
🎯In this paper, the authors propose a new 4D motion modeling paradigm, "SurMo", that jointly models temporal dynamics and human appearance in a unified framework with three key designs:
🚀1) Surface-based motion encoding that models 4D human motions with an efficient compact surface-based triplane. It encodes both spatial and temporal motion relations on the dense surface manifold of a statistical body template, which inherits body topology priors for generalizable novel view synthesis with sparse training observations.
🚀2) Physical motion decoding that is designed to encourage physical motion learning by decoding the motion triplane features at timestep t to predict both spatial derivatives and temporal derivatives at the next timestep t+1 in the training stage.
🚀3) 4D appearance decoding that renders the motion triplanes into images by an efficient volumetric surface-conditioned renderer that focuses on the rendering of body surfaces with motion learning conditioning.
🎯Authors have achieved state-of-the-art results and showed that their new paradigm is capable of learning high-fidelity appearances from fast motion sequences (e.g., AIST dance videos) or synthesizing motion-dependent shadows in challenging scenarios.
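The triplane lookup underlying these designs can be sketched as projecting a 3D query point onto three orthogonal feature planes and summing the samples. This toy uses nearest-neighbor sampling on tiny scalar grids; SurMo's surface-based triplane lives on a body-template manifold and uses learned feature vectors, so `query_triplane` is only an illustration of the data structure.

```python
def query_triplane(planes, u, v, w):
    """Query a tiny triplane with nearest-neighbor sampling: project the
    normalized 3D point (u, v, w) onto the XY, XZ, and YZ feature planes
    and sum the three sampled scalars."""
    xy, xz, yz = planes

    def sample(grid, a, b):
        i = min(int(a * len(grid)), len(grid) - 1)
        j = min(int(b * len(grid[0])), len(grid[0]) - 1)
        return grid[i][j]

    return sample(xy, u, v) + sample(xz, u, w) + sample(yz, v, w)
```

Storing features on three 2D planes instead of a dense 3D grid is what makes triplane representations compact and efficient to query.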
🏢Organization: S-Lab, Nanyang Technological University Singapore
🧙Paper Authors: Tao Hu, Fangzhou Hong, Ziwei Liu
1️⃣Read the Full Paper here: https://lnkd.in/gF4RFg7H
2️⃣Project Page: https://lnkd.in/gDbSrEGs
3️⃣Code: https://lnkd.in/g7Qh8GSt
4️⃣Video: https://lnkd.in/g-xzhkeY
🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊
Music by Sergio Prosvirini from Pixabay
Find this Valuable 💎 ?
♻️REPOST and teach your network something new
Follow me, Naveen Manwani, for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.
#cvpr2024
🚨CVPR 2024 Alert 🚨
➡️Paper Title: Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
🌟Few pointers from the paper
🎯In this paper, the authors present a novel approach that combines the controllability of dynamic 3D meshes with the expressivity and editability of emerging diffusion models.
🎯Their approach takes an animated, low-fidelity rendered mesh as input and injects the ground-truth correspondence information obtained from the dynamic mesh into various stages of a pre-trained text-to-image generation model to output high-quality and temporally consistent frames.
🏢Organization: Stanford University, Adobe Research
🧙Paper Authors: Shengqu Cai, Duygu Ceylan, Matheus Gadelha, Chun-Hao Huang, Tuanfeng Yang Wang, Gordon Wetzstein
1️⃣Read the Full Paper here: https://lnkd.in/gdnGca9G
2️⃣Project Page: https://lnkd.in/gYSJAkZM
🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊
Find this Valuable 💎 ?
♻️REPOST and teach your network something new
Follow me, Naveen Manwani, for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.
#cvpr2024 #diffusionmodels
📃Scientific paper: Self-supervised Pre-training with Masked Shape Prediction for 3D Scene Understanding
Abstract:
Masked signal modeling has greatly advanced self-supervised pre-training for language and 2D images. However, it is still not fully explored in 3D scene understanding. Thus, this paper introduces Masked Shape Prediction (MSP), a new framework to conduct masked signal modeling in 3D scenes. MSP uses the essential 3D semantic cue, i.e., geometric shape, as the prediction target for masked points. The context-enhanced shape target, consisting of explicit shape context and implicit deep shape features, is proposed to facilitate exploiting contextual cues in shape prediction. Meanwhile, the pre-training architecture in MSP is carefully designed to alleviate masked-shape leakage from point coordinates. Experiments on multiple 3D understanding tasks on both indoor and outdoor datasets demonstrate the effectiveness of MSP in learning good feature representations that consistently boost downstream performance.
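The masking step at the heart of this setup can be sketched as a random split of a point set into visible and masked subsets. This is a generic masking utility, not the paper's pipeline; `mask_points` and its arguments are hypothetical names for illustration.

```python
import random

def mask_points(points, ratio, seed=0):
    """Randomly split a point set into visible and masked subsets; in an
    MSP-style setup the masked points' local geometry becomes the
    prediction target while only the visible points feed the encoder."""
    rng = random.Random(seed)
    idx = list(range(len(points)))
    rng.shuffle(idx)
    k = int(len(points) * ratio)
    visible = [points[i] for i in sorted(idx[k:])]
    masked = [points[i] for i in sorted(idx[:k])]
    return visible, masked
```

A fixed seed makes the split reproducible across runs, which helps when comparing pre-training configurations.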
Comment: CVPR 2023
Discover the rest of the scientific article on es/iode ➡️https://etcse.fr/OJ6K0
Geometric Deep Learning has become a transformative force in Computer-Aided Design (CAD), and has the potential to revolutionize the way designers and engineers approach and enhance the design process.
In a research project supported by the Thomas B. Thriges Foundation and the Industriens Foundation, we have started working towards this direction. The first challenge to address was to homogenize the terminology, and provide a go-to (tutorial-like) description of this emerging field to everyone who wants to learn, apply, and research such methods for CAD.
Survey: https://lnkd.in/dQGutJT6
#deeplearning #cad #caddesign #artificialintelligence
Dive into the cutting-edge realm of material science! 🌐 Explore precision with advanced techniques like Contour scanning, 2D/3D mapping, and the revolutionary Rotating Knoop Indenter. 🔄In our 45-minute webinar, uncover the perks of Contour scanning, the nuances of 2D/3D Area pattern mapping, and the ingenious applications of a Rotating Knoop Indenter in material analysis. 🧪 Elevate your understanding of material characterization. Register at https://ow.ly/79bk50QnOnt
Can't Make the Webinar? Register and receive a downloadable copy of the webinar
#MaterialScience #ContourScanning #MappingInnovation #verderscientificUSA #laboratoryequipment #science #webinar #freewebinar #learning