CV_V unit notes

Shape from X

• The study of how shape can be inferred from such cues is often called shape from X, because the individual techniques are known as shape from shading, shape from texture, shape from focus, and so on.
• In many images you can clearly see the shape of an object from the shading variation alone: as the surface normal changes across the object, the apparent brightness changes as a function of the angle between the local surface orientation and the incident illumination.
• The problem of recovering the shape of a surface from this intensity variation is
known as shape from shading.
• Most shape from shading algorithms assume that the surface under consideration has uniform albedo and reflectance, and that the light source directions are either known or can be calibrated using a reference object. Under the assumptions of a distant light source and observer, the variation in intensity (the irradiance equation) becomes purely a function of the local surface orientation.
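A compact way to state this (a standard formulation, using the gradient-space notation p = ∂z/∂x, q = ∂z/∂y, which the photometric stereo notes below also use) is

I(x, y) = R\big(p(x, y),\, q(x, y)\big)

where R(p, q) is the reflectance map. For example, for a Lambertian surface with albedo ρ and a distant light source whose direction corresponds to (p_s, q_s) in gradient space, one common form of the reflectance map is

R(p, q) = \rho\,\frac{1 + p\,p_s + q\,q_s}{\sqrt{1 + p^2 + q^2}\,\sqrt{1 + p_s^2 + q_s^2}}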

Photometric stereo
• Another way to make shape from shading more reliable is to use multiple light
sources that can be selectively turned on and off. This technique is called
photometric stereo, as the light sources play a role analogous to the cameras located
at different locations in traditional stereo.
• For each light source, we have a different reflectance map, R1(p, q), R2(p, q), etc.
Given the corresponding intensities I1, I2, etc. at a pixel, we can in principle recover
both an unknown albedo ρ and a surface orientation estimate (p, q).
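A minimal sketch of this per-pixel recovery step, assuming a Lambertian surface, at least three known unit light directions stacked as the rows of a matrix L, and the measured intensities I at a single pixel (all numeric values and variable names below are illustrative):

import numpy as np

# Known unit light-source directions, one per row (illustrative values).
L = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.866],
              [0.0, 0.5, 0.866]])

# Intensities observed at one pixel under each light (illustrative values).
I = np.array([0.80, 0.55, 0.65])

# Lambertian model: I_k = rho * dot(L_k, n), i.e. I = L @ g with g = rho * n.
g, *_ = np.linalg.lstsq(L, I, rcond=None)

rho = np.linalg.norm(g)              # albedo
n = g / rho                          # unit surface normal
p, q = -n[0] / n[2], -n[1] / n[2]    # orientation in gradient space (p, q)

In practice this is solved at every pixel, and measurements that are shadowed or saturated are discarded or down-weighted.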
Shape from texture
• The variation in foreshortening observed in regular textures can also provide useful
information about local surface orientation.
• Shape from texture algorithms require a number of processing steps, including the
extraction of repeated patterns or the measurement of local frequencies to compute
local affine deformations, and a subsequent stage to infer local surface orientation.
Shape from focus
• A strong cue for object depth is the amount of blur, which increases as the object’s
surface moves away from the camera’s focusing distance.
• A number of techniques have been developed to estimate depth from the amount of
defocus.
• The amount of blur increases in both directions as you move away from the focus plane. Therefore, it is necessary to use two or more images captured with different focus distance settings.
• The magnification of the object can vary as the focus distance is changed or the
object is moved. This can be modeled either explicitly (making correspondence more
difficult) or using telecentric optics.
• The amount of defocus must be reliably estimated. A simple approach is to average
the squared gradient in a region, but this suffers from several problems, including the
image magnification problem mentioned above.
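A minimal sketch of the simple defocus measure mentioned above (the average squared gradient over a small window), assuming a grayscale image stored as a NumPy array; function and variable names are illustrative:

import numpy as np

def focus_measure(gray, y, x, half=7):
    # Average squared intensity gradient in a (2*half+1)^2 window around (y, x):
    # a crude local sharpness estimate (sharper = more in focus).
    patch = gray[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    gy, gx = np.gradient(patch)
    return np.mean(gx ** 2 + gy ** 2)

# Shape from focus: given a focus stack (images taken at known focus settings),
# pick, for each pixel, the setting whose image is locally sharpest.
# best_index = np.argmax([focus_measure(img, y, x) for img in focus_stack])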

Surface representations
• Surface representations are techniques used to model and analyze the shape and appearance of objects in three-dimensional space. These representations help in understanding and interpreting visual data.
Surface interpolation
• One of the most common operations on surfaces is their reconstruction from a set of
sparse data constraints, i.e., scattered data interpolation.
• When formulating such problems, surfaces may be parameterized as height fields z = f(x, y), as 3D parametric surfaces f(u, v), or as non-parametric models such as collections of triangles.
• Surfaces can be parameterized as height fields, often represented as z = f(x, y), where z is the height (or depth) value at each point in the (x, y) plane. In this representation:
• The surface is defined by a function that gives a single height z for each point (x, y). This allows the representation of any surface that can be described by its height relative to a base plane, but not of overhanging or self-occluding surfaces.
• When rendering, height fields can be visualized using techniques such as shading, which uses the surface normal derived from the gradient of the height function (see the sketch after this list).
• Parametric surfaces f(u, v), by contrast, can be rendered by evaluating the parametric equations over a grid of u and v values, generating a mesh of points that represent the surface.
• Point Clouds: Collections of data points in 3D space, often obtained from depth
sensors or 3D scanning. Each point represents a location on the object's surface.
• Mesh Representations: Composed of vertices, edges, and faces, meshes provide a
more structured way to represent surfaces. They can capture detailed geometric
features. A triangle-based mesh model is a common representation used in 3D
graphics and modeling. It consists of vertices, edges, and faces, where each face is
typically a triangle.
• NURBS (Non-Uniform Rational B-Splines): A mathematical representation that can
define complex curves and surfaces smoothly. NURBS are widely used in CAD and
modeling applications.
• Implicit Surfaces: Defined by a scalar function, where points on the surface satisfy a
specific equation (e.g., f(x,y,z)=0). These can create smooth surfaces without needing
explicit connectivity.
• Voxel Grids: 3D grids where each voxel (volume element) contains information
about the space it occupies. Voxel representations are useful in volumetric rendering
and medical imaging.
• Depth Maps: 2D images where each pixel value represents the distance from the
camera to the nearest surface point.
• Level Sets: A numerical technique used to track shapes and interfaces. Level sets can
represent dynamic surfaces that change over time.
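A minimal sketch of rendering a height field by shading it from the gradient-derived surface normal, as described in the height-field bullets above; it assumes a NumPy array z of heights and a unit light direction, and the names are illustrative:

import numpy as np

def shade_height_field(z, light=(0.0, 0.0, 1.0)):
    # Lambertian shading of a height field z = f(x, y); the surface normal
    # is derived from the gradient of the height function.
    gy, gx = np.gradient(z.astype(float))
    # The (un-normalized) normal of the surface (x, y, f(x, y)) is (-df/dx, -df/dy, 1).
    n = np.dstack([-gx, -gy, np.ones_like(z, dtype=float)])
    n /= np.linalg.norm(n, axis=2, keepdims=True)
    s = np.asarray(light, dtype=float)
    s /= np.linalg.norm(s)
    return np.clip(n @ s, 0.0, 1.0)   # one shaded intensity per grid point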

Point-based representations


• Point-based representations in computer vision are methods that utilize discrete
points to capture and represent the structure of objects or scenes. These
representations are particularly useful in scenarios where traditional grid-based
methods (like pixel-based images) might be inefficient or inadequate.
1. Point Clouds
• Definition: A point cloud is a collection of data points in a 3D space, typically
obtained from 3D scanners, LiDAR, or stereo vision.
• Applications: Used in 3D modeling, object recognition, and scene reconstruction.
Point-based Features
• Keypoint Detection: Identifying distinct points that can be reliably matched across
different views (e.g., SIFT, SURF).
• Descriptors: Each keypoint can be described with a feature vector that captures local
appearance or geometry.
Deep Learning with Point Clouds
• Neural Networks: Specialized architectures like PointNet and PointNet++ are
designed to process point clouds directly, learning features for tasks like classification
and segmentation.
• Permutation Invariance: point-based methods must handle the unordered nature of point clouds, so these architectures are designed to be invariant to the order of the input points (see the sketch after this list).
Challenges
• Data Sparsity: Point clouds can be sparse, making it difficult to capture fine details.
• Noise: Real-world data is often noisy, requiring robust methods for processing and
analysis.
• Complexity in Processing: Unlike images, operations on point clouds (like
convolution) require specialized techniques.
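A minimal sketch of the permutation-invariance idea used by PointNet-style architectures: apply the same transform to every point and aggregate with a symmetric function such as max pooling. This is an illustrative NumPy version under those assumptions, not the actual PointNet implementation:

import numpy as np

def global_point_feature(points, W, b):
    # points: (N, 3) xyz coordinates; W, b: weights of a shared per-point layer.
    per_point = np.maximum(points @ W + b, 0.0)   # shared linear layer + ReLU, (N, D)
    return per_point.max(axis=0)                  # symmetric max pool -> (D,) feature

rng = np.random.default_rng(0)
pts = rng.normal(size=(128, 3))
W, b = rng.normal(size=(3, 16)), np.zeros(16)

f1 = global_point_feature(pts, W, b)
f2 = global_point_feature(pts[rng.permutation(128)], W, b)
assert np.allclose(f1, f2)   # same global feature for any ordering of the points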

Volumetric representations
• Volumetric representation in 3D reconstruction involves creating a three-dimensional
model by capturing the volume of an object rather than just its surface. This
approach is particularly useful in various fields, including computer vision, medical
imaging, and virtual reality.
• Voxel Grids: A volumetric representation typically uses voxels (3D pixels) to divide
space into a grid. Each voxel contains information about the material or density at
that location, allowing for detailed internal structures.
• Methods of Reconstruction:
• CT and MRI Scanning: Common in medical imaging, these techniques generate
volumetric data by slicing through an object and capturing internal structures.
• 3D Scanning: Techniques like laser scanning or structured light capture surface
points, which can then be converted into a volumetric model.
• Rendering Techniques:
• Ray Marching: Used for rendering volumetric data; rays are cast through the voxel grid, sampling density and color at each step (see the sketch at the end of this section).
• Direct Volume Rendering: Displays 3D data without converting it to a surface, useful
for visualizing complex structures like those in medical scans.
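A minimal sketch of the ray-marching idea mentioned above: step along a ray through a grid of densities and accumulate opacity front to back using an emission/absorption model. It assumes a NumPy density volume and nearest-neighbour sampling; names and parameters are illustrative:

import numpy as np

def march_ray(density, origin, direction, step=0.5, n_steps=256):
    # density: (D, H, W) array of per-voxel densities.
    # origin/direction: ray start and direction in voxel coordinates.
    d = np.asarray(direction, dtype=float)
    d /= np.linalg.norm(d)
    p = np.asarray(origin, dtype=float)
    transmittance, opacity = 1.0, 0.0
    for _ in range(n_steps):
        p = p + step * d
        idx = np.round(p).astype(int)                      # nearest-neighbour sample
        if np.any(idx < 0) or np.any(idx >= density.shape):
            break                                          # ray has left the volume
        alpha = 1.0 - np.exp(-density[tuple(idx)] * step)  # absorption at this sample
        opacity += transmittance * alpha                   # front-to-back compositing
        transmittance *= 1.0 - alpha
    return opacity   # color could be accumulated the same way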
Image-based Rendering: View Interpolation
• Image-based rendering (IBR) is a technique used in computer graphics to generate
new views of a scene from a set of input images. One of the key methods within IBR
is view interpolation, which creates intermediate views between existing images.
Here's an overview of how it works and the techniques involved:
• View Interpolation in Image-Based Rendering
• Definition: View interpolation refers to the process of generating new images from a given set of images taken from different viewpoints. It is particularly useful in scenarios where capturing every possible viewpoint is impractical.
Techniques for View Interpolation
• Linear Interpolation:
• For two images taken from slightly different viewpoints, linear interpolation can be used. By blending the pixel values based on a parameter t (ranging from 0 to 1), you can create a new view that appears to lie between the two original images.
• This method is simple but may not handle occlusions or depth variations well.
• Depth-Image Based Rendering (DIBR):
• This approach uses depth information along with the color images. By
reconstructing the scene geometry based on depth maps, you can project the
images from new viewpoints.
• DIBR can handle occlusions more effectively, allowing for more realistic
interpolated views.
• Multi-View Stereo (MVS):
• MVS techniques estimate depth and 3D structure from multiple images. Once
the depth is known, it can be used to render new views by projecting the 3D
points onto the desired image plane.
• This method often involves complex algorithms to ensure accuracy in depth
estimation.
• Image Warping:
• This technique involves transforming the images to align them based on their perspective. By using homographies or projective transformations, images can be warped to create a seamless transition between views (see the sketch after this list).
• Warping can be combined with depth information for improved results.
• Texture Mapping and View Synthesis:
• Using texture mapping techniques, one can synthesize new views by sampling
texture from existing images based on the projected geometry. This method
can leverage both color and depth information.
• Algorithms like the Lumigraph allow for more complex light interactions,
creating convincing views.
• Machine Learning Approaches:
• Recent advances in deep learning have led to neural networks that can learn
to generate new views from a set of images. Techniques like generative
adversarial networks (GANs) and convolutional neural networks (CNNs) can
produce high-quality interpolations.
• These models learn the underlying patterns of the data, allowing for more
sophisticated rendering that can handle complex scenes and lighting.
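A minimal sketch of the warping-plus-blending idea from the Image Warping item above, assuming OpenCV, two overlapping images, and a known 3x3 homography H that maps img1 into img2's frame (in practice H would be estimated from matched features); the names and the simple cross-dissolve are illustrative:

import cv2

def interpolate_view(img1, img2, H, t):
    # t in [0, 1]: 0 keeps the warped img1, 1 keeps img2.
    h, w = img2.shape[:2]
    warped = cv2.warpPerspective(img1, H, (w, h))           # align img1 to img2's view
    return cv2.addWeighted(warped, 1.0 - t, img2, t, 0.0)   # simple cross-dissolve

# Typical usage, with H estimated from matched keypoints:
# H, _ = cv2.findHomography(pts1, pts2, cv2.RANSAC)
# mid_view = interpolate_view(img1, img2, H, 0.5)

As noted in the Linear Interpolation item, a plain blend like this does not handle occlusions or depth variations; combining the warp with depth information (DIBR) gives better results.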
Applications
 Virtual Reality and Augmented Reality: Providing immersive experiences by
generating real-time views from limited input images.
 Film and Animation: Creating seamless transitions and panoramic views without the
need for extensive re-shooting.
 Video Games: Enhancing graphics and enabling smooth camera movements in 3D
environments.
 Architectural Visualization: Allowing clients to see potential designs from various
angles without the need for full renders.

Layered Depth Images


Layered Depth Images (LDIs) are a representation used in computer graphics and computer vision to encode depth information in a way that efficiently handles the complexity of real-world scenes with multiple layers of occlusion. An LDI allows accurate, photorealistic rendering of scenes with objects in front of and behind one another, enabling more effective image composition, depth handling, and occlusion management.
Here’s a breakdown of the key concepts:
1. Concept of Layered Depth Images (LDI):
 Depth Information: In a regular image, pixels are typically represented by color
values (RGB). In an LDI, each pixel stores not just a color value, but multiple depth
values corresponding to different layers of the scene.
 Multiple Layers: Each pixel in an LDI contains information about the nearest object
(frontmost layer) as well as other objects that may be occluded behind it. This
creates multiple "layers" at various depths for each pixel, rather than a single depth
value.
2. Structure:
 Pixel Data: each pixel contains a list (or stack) of tuples, where each tuple stores:
o the depth value (how far the object is from the camera), and
o the color or texture associated with that object.
 Layered Representation: the layers are ordered by increasing depth, so the first layer is the frontmost object and subsequent layers lie further back in the scene.
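A minimal sketch of this per-pixel structure, assuming a fixed image resolution and storing, for each pixel, a depth-sorted list of (depth, color) samples; the class and method names are illustrative:

class LayeredDepthImage:
    # For every pixel, a depth-ordered list of samples; index 0 is the frontmost surface.
    def __init__(self, width, height):
        self.width, self.height = width, height
        self.layers = [[[] for _ in range(width)] for _ in range(height)]

    def add_sample(self, x, y, depth, color):
        # Insert a (depth, color) sample and keep the per-pixel list sorted by depth.
        samples = self.layers[y][x]
        samples.append((depth, color))
        samples.sort(key=lambda s: s[0])

    def front(self, x, y):
        # Frontmost (visible) sample at a pixel, or None if the pixel is empty.
        samples = self.layers[y][x]
        return samples[0] if samples else None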
Advantages of LDIs:
 Handling Occlusion: Traditional 2D depth images or depth maps often fail in scenes
with complex occlusions (e.g., objects partially obscuring others). LDI allows multiple
depth values per pixel, which naturally handles such scenarios.
 Improved Rendering: LDIs are particularly useful in image-based rendering and view
synthesis, as they provide all the necessary depth and color information to create
new viewpoints of a scene.
 Efficiency in Rendering: For scenes with significant transparency, reflections, or
complex geometries, LDI can represent these phenomena more efficiently than
standard single-depth maps.
Applications:
 Image-based Rendering: LDIs are commonly used in applications like 3D model
reconstruction and view interpolation, where you need to generate new views of a
scene from a limited number of images.
 Virtual Reality (VR) and Augmented Reality (AR): LDIs can be employed to enhance
depth perception and reduce artifacts in 3D rendering.
 Scene Reconstruction: LDIs are useful in 3D reconstruction from images, where
multiple depth layers help reconstruct a more accurate scene.

Light Fields and Lumigraphs


Light Fields and Lumigraphs are advanced concepts in computer graphics and computer
vision that deal with the representation of 3D scenes, focusing on capturing and synthesizing
realistic light and view information. Both are closely related to concepts of image-based
rendering (IBR), allowing for more immersive visual experiences, such as photorealistic
rendering, virtual reality (VR), and augmented reality (AR). Here’s an explanation of each:
Light Fields
1. Concept:
A light field is a multi-dimensional function that describes the amount of light traveling in
every direction through every point in space. Instead of representing a scene as a single
image or a collection of images from different viewpoints, a light field encapsulates all the
information needed to reconstruct a scene from any viewpoint and perspective.
 The light field stores radiance as a 4D function L(x, y, θ, ϕ), where:
o (x, y) is the position at which a light ray crosses a reference 2D plane (e.g., a sensor or the image plane).
o (θ, ϕ) are the angles describing the direction of the incoming light at each point.
This means a light field contains both spatial and directional information, allowing you to
simulate how light behaves in a scene.
2. Key Components:
 Spatial Information: Where light is coming from in 3D space.
 Angular Information: The direction in which light is traveling, across different viewing
angles.
3. Applications:
 View Synthesis: Light fields allow for generating new views of a scene from arbitrary
angles. If you have a set of light field data captured from multiple angles, you can
synthesize novel views, making it ideal for applications like virtual tours or immersive
video.
 Refocusing: Light field cameras (like the Lytro camera) can refocus after a photo is taken because the depth information is encoded within the light field: the full directional data of the incoming light rays is captured (see the refocusing sketch at the end of this Light Fields section).
 Depth Perception: By capturing the light field, it's possible to infer depth information
from the variations in light ray directions, which can be used in 3D reconstruction
and 3D scene rendering.
4. Capture:
 Light Field Cameras: These specialized cameras, like the Lytro camera, capture light fields directly by using micro-lenses placed in front of a traditional image sensor. Each micro-lens records light arriving from different directions, capturing not just the spatial intensity of the light but also its angular direction.
 Arrays of Cameras: Another common method to capture light fields is to use an array
of cameras positioned at different angles, capturing the scene from a grid of
viewpoints.
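A minimal sketch of the digital refocusing mentioned under Applications, using the classic shift-and-add approach: each angular sub-aperture view is shifted in proportion to its angular offset and the results are averaged. It assumes the light field is stored as a 4D NumPy array of grayscale sub-aperture images indexed [u, v, y, x], and uses integer shifts for simplicity; names are illustrative:

import numpy as np

def refocus(light_field, alpha):
    # light_field: (U, V, H, W) array of sub-aperture images.
    # alpha: refocus parameter selecting the synthetic focal plane.
    U, V, H, W = light_field.shape
    cu, cv = (U - 1) / 2.0, (V - 1) / 2.0
    out = np.zeros((H, W), dtype=float)
    for u in range(U):
        for v in range(V):
            dy = int(round(alpha * (u - cu)))   # shift proportional to angular offset
            dx = int(round(alpha * (v - cv)))
            out += np.roll(light_field[u, v], shift=(dy, dx), axis=(0, 1))
    return out / (U * V)   # objects at the chosen depth add up sharply; others blur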
Lumigraphs
1. Concept:
Lumigraphs are a specific type of light field representation that stores the appearance of a scene as a function of both position and viewing angle. Like a light field, a lumigraph stores radiance in terms of spatial and angular data; the representation was introduced primarily to render complex, highly realistic images from captured (often irregularly sampled) radiance data.
 A lumigraph is, like the light field, a 4D function, most commonly written in the two-plane parameterization L(u, v, s, t), where:
o (u, v) are the coordinates at which a light ray crosses the first reference plane (roughly, the camera plane).
o (s, t) are the coordinates at which the same ray crosses a second, parallel reference plane (roughly, the scene or focal plane).
o In addition to this ray database, the lumigraph stores an approximate geometric model of the scene, which is used to improve reconstruction.
2. Difference from Light Fields:
 While a pure light field assumes a fairly regular sampling of rays and no knowledge of the scene geometry, the lumigraph augments the captured radiance data with approximate scene geometry, which is used to depth-correct the interpolation between captured rays.
 This makes the lumigraph better suited to high-fidelity rendering from irregularly captured (e.g., hand-held) images, since the geometric proxy helps preserve how light interacts with surfaces, including specular reflections, diffuse lighting, and other material-dependent appearance.
3. Applications:
 Image-based Rendering (IBR): Lumigraphs enable high-quality image-based
rendering. This is important for synthesizing photorealistic images, where the stored
data can be used to generate images from novel viewpoints or with different lighting
conditions.
 Virtual Reality (VR) and Augmented Reality (AR): Lumigraphs provide an immersive
experience by allowing for realistic interactions with the scene, as they encode more
comprehensive lighting and depth information than traditional images or simple light
fields.
 Synthetic View Generation: As with light fields, lumigraphs allow for generating new
views from an existing set of images or captured data, but they offer better handling
of complex lighting effects, such as specular highlights, shadows, and reflections.
Environment Mattes
Environment Mattes (or Environmental Mattes) are a technique used in visual effects (VFX)
and computer graphics to help integrate and composite elements into a digital scene,
typically by defining areas where specific effects or elements can be applied or manipulated.
The concept of an environment matte usually refers to a mask or a stencil that isolates
regions of the scene for various processing tasks, such as background replacement, color
correction, or shadow adjustments.
Environment mattes are particularly useful in scenarios where you want to isolate or
manipulate certain parts of the environment without affecting the rest of the scene. This
process is frequently used in compositing for film and video production, CGI integration, and
3D rendering workflows.
Here’s a deeper breakdown of environment mattes and their applications:
Definition and Purpose:
An environment matte is essentially a mask that defines areas of a scene in relation to the
background, foreground, or other environmental elements. It is typically a grayscale image
(or sometimes a more complex mask in vector or alpha channel format) that indicates which
parts of a scene are part of the "environment" and which parts are foreground elements
(like a character, object, or specific effect).
Purpose:
 Background Isolation: Environment mattes help isolate the background
environment, making it possible to replace or alter the background without affecting
other parts of the scene.
 Effect Application: They allow certain environmental effects, like lighting, shadows,
or fog, to be applied selectively to specific areas.
 Compositing: In compositing workflows, environment mattes are used to separate
elements in the scene for easier manipulation, blending, or extraction of certain
visual features (such as sky, terrain, buildings, etc.).
 Depth and Layering: They assist in creating the illusion of depth by defining which
parts of the environment should appear in front or behind other elements.

2. Types of Environment Mattes:


Environment mattes can be used in a variety of ways depending on the scene's needs:
a) Alpha Mattes:
 The most common form of environment matte, where a grayscale image is used to define transparent and opaque areas. In the context of compositing, the alpha channel provides transparency (white = fully opaque, black = fully transparent, with shades of gray representing semi-transparency); see the compositing sketch after this list.
b) Z-Mattes:
 These are derived from depth information (or Z-buffer data) and represent the
distance of objects from the camera. Z-mattes are typically used for 3D rendering or
when compositing 3D objects into a scene. They help define the depth boundaries of
objects, which can be important for things like depth of field or depth-based effects
(e.g., fog, lighting).
c) RGB Mattes:
 Sometimes, an environment matte can be created using RGB channels to isolate
specific parts of a scene based on color information. For instance, you might use a
specific color in the environment (like green or blue) to create a chromakey matte for
background replacement.
d) Object Mattes:
 These define specific areas of interest (like objects or characters) within the
environment, useful when you need to isolate a certain object from its surroundings
for further manipulation. Object mattes are sometimes used in combination with
environment mattes to separate the foreground (e.g., a character) from the
background.
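A minimal sketch of two matte operations described above: compositing with a grayscale (alpha) matte, and building a very simple chroma-key matte from a green background. It assumes RGB images stored as NumPy arrays; the threshold and function names are illustrative:

import numpy as np

def composite(fg, bg, matte):
    # matte: float array in [0, 1]; 1 = fully opaque foreground, 0 = background only.
    a = matte[..., None]                         # broadcast over the color channels
    return (a * fg + (1.0 - a) * bg).astype(fg.dtype)

def green_screen_matte(img, margin=20.0):
    # Very crude chroma key: pixels where green clearly dominates red and blue
    # are treated as background (matte value 0); everything else is foreground.
    r, g, b = (img[..., i].astype(float) for i in range(3))
    background = g > np.maximum(r, b) + margin   # illustrative threshold rule
    return 1.0 - background.astype(float)

Production keyers (as in Nuke or After Effects) also deal with soft edges, semi-transparency, and green spill, which this sketch ignores.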

3. Applications of Environment Mattes:


a) Green Screen and Background Replacement:
 One of the most common uses of environment mattes is in green screen or blue
screen compositing. The matte helps isolate the subject (typically filmed against a
green or blue background) and replace the background with a new environment
(e.g., a digital set, a natural landscape, or a 3D environment).
 For instance, if you have a scene filmed in front of a green screen, an environment
matte can help separate the green background (often with chroma-keying) from the
subject, allowing you to insert the desired virtual background.
b) Lighting and Shadow Control:
 In 3D rendering, environment mattes are used to control how light interacts with a
scene. For example, you might have an environment matte that defines a skybox or
background in a 3D scene. By using this matte, you can adjust the lighting specifically
on the environment without affecting the foreground elements (like characters or
objects).
 Shadows cast on the environment can also be manipulated separately using
environment mattes.
c) Reflections and Refraction Effects:
 When compositing elements with reflections or refractions (like in water or glass), an
environment matte can define which parts of the environment contribute to the
reflection or refraction. This allows for the integration of realistic environmental
effects with minimal distortion.
d) Atmospheric Effects:
 Fog, rain, smoke, and other atmospheric effects can be applied selectively to parts of
the scene based on the environment matte. For example, you could apply a fog effect
in the background but leave the foreground clear.
e) Layered Rendering and Post-Processing:
 Environment mattes are also used in post-production workflows to isolate specific
layers of a scene, enabling specific effects like color grading or motion blur to be
applied selectively to different parts of the scene. For example, you could apply a
vignette effect only to the environment while leaving the foreground elements
untouched.

4. Creating Environment Mattes:


There are several ways to generate environment mattes:
a) Manual Creation:
 Artists can hand-paint or draw mattes, often using tools like Photoshop or Nuke. This
is time-consuming but provides a high degree of control, especially in complex
scenes.
b) Depth Information:
 Z-buffer data (depth maps) generated during 3D rendering can be used to create
environment mattes, as the distance of objects from the camera can help define
which elements are part of the environment.
c) Chroma Keying:
 If you're working with footage shot against a solid color background (such as green or
blue), the background color can be automatically keyed out to create a matte. This
process isolates the subject and allows for easy background replacement.
d) Software-Based Generation:
 Compositing Software: Tools like Nuke, After Effects, and Fusion have powerful features to automatically generate environment mattes, often by using built-in tools like roto-masking or edge detection. In these tools, you can refine the matte with techniques like feathering and spill suppression.
Video-based Rendering
Video-based rendering (VBR) is a technique in computer graphics and visual effects that
utilizes pre-recorded video or image data to generate new, realistic visual content, often
involving novel views, perspectives, or dynamic scenes. Unlike traditional rendering
methods, which create images or videos from 3D models and scene descriptions, video-
based rendering directly leverages real-world video data to synthesize novel views and
interactions. This technique is commonly used in image-based rendering (IBR), view
synthesis, virtual reality (VR), and augmented reality (AR) applications.
Here's a detailed overview of video-based rendering, its techniques, applications, and key
concepts:
Key Concepts in Video-Based Rendering (VBR)
1. Image-Based Rendering (IBR):
o IBR refers to a broader class of rendering techniques that aim to generate
new images or videos by using a set of input images or videos as the base,
rather than using traditional 3D models.
o The goal of IBR is to synthesize realistic visuals without having to model the
entire 3D scene explicitly, saving time and resources in cases where the
environment is complex or captured from the real world.
2. Novel View Synthesis:
o Novel view synthesis is one of the key objectives of video-based rendering. It
involves creating new viewpoints of a scene or object from the original set of
captured images or video. This is particularly useful when you don't have
access to 3D models of the scene but still want to generate views from
arbitrary camera positions.
o For example, from a set of images or video clips captured from multiple
angles, a VBR system can generate smooth transitions between viewpoints or
simulate a camera moving through the scene.
3. Depth Information:
o Depth is critical in video-based rendering as it allows the system to
understand the 3D structure of the scene. Depth maps or stereo vision can
provide this information, which helps in reconstructing or warping images to
create new viewpoints.
o Depth maps can be generated from stereo pairs (two cameras), multi-view video, or depth sensors such as LiDAR or structured-light sensors (see the stereo sketch after this list).
4. Motion Compensation:
o Video-based rendering often involves motion compensation, where the
system tracks the motion of objects in a scene or the camera. This helps in
handling temporal coherence and ensures that the synthesized views
maintain consistent movement and realism.
5. View-Dependent Effects:
o Video-based rendering is highly effective in capturing and reproducing view-
dependent effects, like reflections, refractions, and shadows, which may vary
based on the observer's viewpoint. These effects are often difficult to
reproduce accurately with traditional rendering methods but can be more
easily handled when working with real video data.
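A minimal sketch of estimating a depth (disparity) map from a rectified stereo pair with OpenCV's block matcher, as mentioned under Depth Information above; the file names and parameter values are illustrative, and the images are assumed to be already rectified:

import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # illustrative file names
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(float) / 16.0   # fixed-point -> pixels

# Depth is inversely proportional to disparity (valid where disparity > 0):
#   depth = focal_length * baseline / disparity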

Techniques in Video-Based Rendering


1. Multi-View Video (MVV) Rendering:
o Multi-view video involves capturing a scene from multiple cameras arranged
in different positions. By using these multiple views, VBR can interpolate and
generate new views between the existing ones.
o Depth Image-Based Rendering (DIBR) is a common method here. It uses depth images or depth maps along with the corresponding color images to generate novel views (see the warping sketch at the end of this Techniques list).
2. Light Field Rendering:
o Light fields are a powerful tool for video-based rendering. A light field
captures light in multiple directions and at various positions in space. With a
light field, a system can simulate the movement of the camera in 3D space
and generate novel views from captured data.
o Plenoptic cameras or micro-lens arrays are used to capture light fields, and
the data can be processed to produce video-based rendering for VR/AR,
where users can look around a scene from different angles.
3. Free-Viewpoint Video (FVV):
o Free-viewpoint video enables users to change their viewpoint freely within a
captured scene, typically using a combination of multiple video streams and
depth information. This technique is often used in sports broadcasting or
immersive entertainment where viewers can choose their preferred camera
angle.
o View interpolation and free-viewpoint synthesis allow the creation of
continuous views between multiple captured video sources, making it seem
like the scene is being captured from a dynamic, flexible camera.
4. Image Warping and Interpolation:
o Image warping involves distorting images to fit new viewpoints, often using
the scene's depth information. When synthesizing new views, the system will
warp and blend images to fill in the gaps or adjust the perspective to match
the desired viewpoint.
o For instance, if there’s a scene with a character standing in front of a
background, the system might use the depth of the character to warp the
image and generate a view as if the user were moving around the scene.
5. Video Texturing:
o Video texturing involves applying a video as a texture to a 3D model. For
example, a dynamic texture (like a person walking) could be mapped onto a
3D object, and as the viewpoint changes, the video texture will adjust
accordingly to match the new perspective.
o This can be useful for simulating real-world movements and events in 3D
environments, particularly for VR, AR, and gaming.
6. 3D Reconstruction from Video:
o 3D reconstruction is a technique that creates a 3D model from a series of 2D
video frames. Once a 3D model is reconstructed, VBR techniques can be used
to generate novel views of the scene from any angle.
o Common methods include using structure-from-motion (SfM) and multi-view
stereo to extract depth and geometric information from the video.
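A minimal sketch of the depth-image-based rendering step referenced in the Multi-View Video item above: every pixel of a color-plus-depth image is back-projected to 3D with the camera intrinsics, transformed into a new camera pose, and re-projected, with a z-buffer resolving occlusions. The intrinsics K and the new pose (R, tvec) are assumed known, disocclusion holes are left unfilled, and all names are illustrative:

import numpy as np

def render_novel_view(color, depth, K, R, tvec):
    # color: (H, W, 3) image, depth: (H, W) metric depth, K: 3x3 intrinsics.
    # R, tvec: rotation and translation of the new camera relative to the original.
    H, W = depth.shape
    out = np.zeros_like(color)
    zbuf = np.full((H, W), np.inf)

    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T   # 3 x N homogeneous pixels
    pts = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)                  # back-project to 3D
    pts = R @ pts + tvec.reshape(3, 1)                                     # move into the new view
    proj = K @ pts                                                         # re-project
    u = np.round(proj[0] / proj[2]).astype(int)
    v = np.round(proj[1] / proj[2]).astype(int)
    z = proj[2]

    src = color.reshape(-1, 3)
    for i in range(u.size):                      # forward splat with a z-buffer
        if 0 <= u[i] < W and 0 <= v[i] < H and 0 < z[i] < zbuf[v[i], u[i]]:
            zbuf[v[i], u[i]] = z[i]
            out[v[i], u[i]] = src[i]
    return out   # holes from disocclusions would normally be inpainted or filled from other views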
