HDlogix, Inc.  26 Mayfield Ave.  Edison, NJ 08837  (732) 623-2067  www.hdlogix.com
ImageIQ3D – 3D Video from 2D Video in Real-Time
Utilizing many of the same patented and patent-pending technologies as HDlogix’s ImageIQ®, ImageIQ3D
literally creates a new dimension for video. By drawing on HDlogix’s expertise with superresolution,
which converts SD video to full-bandwidth HD video, and on ImageIQ’s GPU-based, real-time optical flow
and image structure analysis capabilities, ImageIQ3D makes it possible to convert 2D video into
3D video in real-time with no intervention. Unlike previous “pseudo-3D” gimmicks, ImageIQ3D
reconstructs the geometry of the video scene from regular video in order to create true 3D stereo
video for any 3D video display – even “autostereoscopic” 3D displays that do not require any glasses.
What is ImageIQ3D?
ImageIQ3D constructs a geometric representation and model of objects in a video scene, in real-time,
with no user or operator intervention. This information is used to convert regular video into left/right
stereo views and/or color plus depthmap views suitable for display on any existing 3D display.
Additionally, ImageIQ3D can generate full-color 3D stereo content from anaglyph (e.g. red/cyan), also
in real-time.
• Any 2D, live broadcast can be generated as 3D, on-the-fly, without having to shoot with stereoscopic cameras and equipment
• Any and all user-generated content can be converted to 3D
• Anaglyph/colored-glasses programs can be converted to full-color 3D stereo, without the original full-color version
• All of the 10 different flavors of existing 3D content can be converted to and from each other – a first in the industry – in real-time
• Any 2D and 3D content can be converted to play back on any 3D stereoscopic display – another industry first
• Uncalibrated, unaligned camera pairs can be used for stereo video and image capture
• Inexpensively transform telepresence solutions from HD to 3D-HD
• Transform a webcam video chat into 3D telepresence without a calibrated stereo camera
• Problems that cause disorientation, such as rapid depth changes, are automatically modeled and eliminated
3D Has Come a Long Way From the 1980s – or has it?
Almost everyone has had some experience with color anaglyph 3D. Examples include the gimmicky
horror films of the ’80s and, more recently, the blue/yellow glasses many people wore for the 2009
Super Bowl.
As we all know, this method of 3D is prone to inducing headaches, and the experience is less than
ideal because neither eye receives a full-color image. There are, of course, more elegant,
full-color 3D solutions – one is to use shutter-glasses, with a “3D-Ready” display. Another is to use
polarized glasses, with a polarized projection or active display. Other recent developments
include autostereoscopic video displays that use clever optics to allow 3D viewing without any
glasses at all. The hardware required for these has either been cheap and clunky (and headache
inducing), or if executed well, relegated to expensive niche markets like signage, CAD/CAM
visualization for mechanical modeling and medical imaging, or to venue-based cinema like
IMAX® 3D. (Despite best intentions, some of these have also induced headaches, even though they are
expensive and well-executed.) Recently, full-color stereo-3D capable displays have reached the mass
market – in fact, if you have bought an HDTV recently, it is entirely possible that it is capable of
displaying 3D video with an inexpensive add-on and shutter glasses – without headaches and without
compromises in color.
Welcome to the 3D Zoo
Millions of customers worldwide already have “3D-Ready” displays, but they don’t even know it.
There are thousands of hours of high-quality 3D content in movie libraries, plus almost every one of
the major movie studios has committed billions of dollars to 3D movie production for 2009 and ’10 –
and not just animated productions. Why doesn’t everyone know about 3D? The answer is that there
is no ready content for these displays. If there’s so much 3D display technology in end-users’ homes
and so much 3D content, how is this possible?
The reason is that there are many animals in the 3D Zoo, and they don’t communicate: there are no
fewer than four different technologies for shooting and producing 3D films and videos, yet more ways to
store and transmit them, and literally dozens of different technologies and products to display them as
3D video – and none of these will “talk” to each other. If you have a red-cyan anaglyph source, you
can’t display it in full color on a shutter-glasses stereoscopic display. Likewise, if you have a full-color
stereo source, you can’t display it on a multichannel autostereoscopic display. If the content has
been converted for a multichannel autostereoscopic display (at $25,000 per minute of footage), it no
longer can be displayed with red/cyan glasses OR on a shutter-glasses display without a new
transmission medium – and this barely scratches the surface of the problem.
Figure 1. Lots of existing 3D video technologies are talking, but not a lot of them are listening to each other.
Until ImageIQ3D, there was no way to make all of these formats, technologies, and display
technologies compatible with each other without expending tens of thousands of dollars per minute of
footage – offline, and with a slow turnaround. Most importantly, there has been no way to make 3D
actually work for the end-user without a lot of excuses, apologies, and headaches – until now.
Figure 2. ImageIQ3D is the Universal Translator for 3D and 2D video. With ImageIQ3D, everyone’s talking 3D
video to each other – even if the original video is only 2D, and regardless of origination methods, archive
formats, and distribution standards/transmission methods.
Further, for 2D content that was created with only one camera, the only way to convert to 3D was
manually intensive, requiring expensive intervention by a veritable army of 3D modelers, stereoscopic
specialists, 3D rendering artists, and video engineers to identify precise scene cuts, and painstakingly
edit geometry and depthmaps, and correct their errors and problems – until now.
Today’s Video Architecture: The GPU/3D Accelerator
Some of the algorithms that ImageIQ3D uses have existed for years – but have only been possible to
perform in real-time recently with the advent of programmable graphics hardware: GPUs. GPUs
allow for highly-parallel, memory intensive processes – much more so than equivalent CPUs that are
ten times more expensive.
Additionally, ImageIQ3D’s approach is very similar to the 2D
superresolution technology in ImageIQ®, which is particularly well-suited for GPU implementation. As
a result, ImageIQ3D can run in real-time on very modest GPU hardware. Top-of-the-line video card
hardware is not required, and in fact laptops that have several-year-old video chipset hardware can
run ImageIQ3D without breaking a sweat.
Overview of the ImageIQ3D Process
Like ImageIQ®, ImageIQ3D performs sophisticated motion analysis called optical flow. First, much
information about the 3D scene geometry can be gleaned from the relative motion of objects in the
video and how they occlude (reveal and hide) pixels in other objects as they move – as long as the
motion estimation is precise and accurate. Second, straight lines of buildings, the horizon, and other
objects give clues about the vanishing points, which also help to solve the puzzle. A Generalised
Hough/Radon Transform helps identify these lines and other useful features. Finally, in most
photography, objects nearer to or farther from the camera than its focal plane tend to be blurred
by an amount that grows with their distance from that plane. A Blind Point-Spread-Function
Estimator is used to estimate the out-of-focus character for each pixel, to complete the
information needed to estimate the depth of the video. Some of this information is always available,
but not all of it is available all of the time (for example, motion cues vanish when nothing is
moving in the video). ImageIQ3D uses a superresolution-based statistical approach to achieve robust
and consistent results even when there is very little or partial information available.
Figure 3. The ImageIQ3D Process – a simplified view.
Ultimately, the goal is to produce an accurate depth map for each video frame – a representation of
the distance of each pixel in the video from the camera. Once an accurate depth map is calculated, it
is possible to easily convert to and from any 2D or 3D format!
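As an illustration of why a depth map is enough to produce a stereo view, the following is a minimal sketch of depth-based view synthesis on a single scanline. It is not HDlogix's implementation: the function names, the 0-to-1 depth convention (0 is nearest), the max_disparity parameter, and the simple background hole filling are all assumptions made for the example.

```python
def render_view(row, depth, max_disparity, direction):
    """Shift one scanline sideways to synthesize a left (+1) or right (-1) view.

    depth holds one value per pixel in [0, 1]: 0 is nearest, 1 is farthest.
    Near pixels shift by up to max_disparity; the farthest pixels stay put.
    (Hypothetical convention chosen for this sketch.)
    """
    n = len(row)
    out = [None] * n
    out_depth = [float("inf")] * n
    for x in range(n):
        shift = round(direction * max_disparity * (1.0 - depth[x]))
        target = x + shift
        # When two pixels land on the same spot, the nearer one wins.
        if 0 <= target < n and depth[x] < out_depth[target]:
            out[target] = row[x]
            out_depth[target] = depth[x]
    return fill_holes(out, out_depth)


def fill_holes(out, out_depth):
    """Fill revealed gaps from the farther (background) neighbour."""
    n = len(out)
    filled = list(out)
    for x in range(n):
        if out[x] is not None:
            continue
        neighbours = []
        for step in (-1, 1):
            i = x + step
            while 0 <= i < n and out[i] is None:
                i += step
            if 0 <= i < n:
                neighbours.append((out_depth[i], out[i]))
        if neighbours:
            filled[x] = max(neighbours, key=lambda pair: pair[0])[1]
    return filled


row = [10, 20, 30, 40, 50]          # pixel values on one scanline
depth = [1.0, 1.0, 0.0, 1.0, 1.0]   # the middle pixel is close to the camera
left_view = render_view(row, depth, max_disparity=1, direction=+1)
right_view = render_view(row, depth, max_disparity=1, direction=-1)
```

In this toy case the near pixel (value 30) moves one position to the right in the left view and one position to the left in the right view, and the gap it leaves behind is filled from the background. A real renderer would work on full frames with sub-pixel shifts and better in-painting.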
ImageIQ3D: Depth-from-Motion via Optical Flow
Critical information about the objects and background making up a scene, and their relative distances
to the camera, can be calculated if these objects ever move, and if there is an accurate and precise
estimation of the true motion. ImageIQ3D uses the same optical flow engine as ImageIQ®’s
superresolution process.
The ImageIQ3D optical flow computation system achieves real-time, per-pixel dense motion
estimation with a wide and precise spatial dynamic range – 0.01 to 500.00 pixels. A motion vector is
calculated for every pixel, in every image – the motion vector tells how much the pixel has moved.
One way to view a motion-vector field is to let hue represent the direction and brightness represent
the magnitude, as shown in Figure 4.
Figure 4. Original frame (left), Hue-Saturation-Value representation of the optical flow field (right).
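The hue/brightness mapping used for this kind of visualization can be sketched in a few lines. This is only an illustration of the color coding, not of the optical flow engine itself; the function name flow_to_rgb and the max_magnitude normalization parameter are assumptions made for the example.

```python
import colorsys
import math


def flow_to_rgb(dx, dy, max_magnitude):
    """Color-code one motion vector: hue = direction, brightness = magnitude."""
    angle = math.atan2(dy, dx)                        # -pi .. pi
    hue = (angle % (2 * math.pi)) / (2 * math.pi)     # 0 .. 1 around the color wheel
    value = min(math.hypot(dx, dy) / max_magnitude, 1.0)
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, value)
    return round(r * 255), round(g * 255), round(b * 255)


still = flow_to_rgb(0.0, 0.0, max_magnitude=10.0)          # no motion: black
moving_right = flow_to_rgb(10.0, 0.0, max_magnitude=10.0)  # full-speed motion along +x
moving_left = flow_to_rgb(-10.0, 0.0, max_magnitude=10.0)  # opposite direction, opposite hue
```

Applying this to every pixel's motion vector yields an image in which stationary regions are dark and regions moving in different directions take on different colors.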
The motion itself is not the only useful signal: how objects hide and reveal pixels of other objects
and of the background behind them gives significant depth information. ImageIQ3D computes occlusions in
addition to optical flow. Figure 5 demonstrates the ImageIQ3D Depth from Motion process in action:
Figure 5. Using occlusion and motion to generate a depth map. Top Left image – reveal and hide occlusions
are marked in red and yellow. Top right image – optical flow. Bottom image – generated depth map from
motion and occlusions.
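This paper does not say how ImageIQ3D detects occlusions. One common technique, shown here purely as an assumed illustration, is a forward-backward consistency check: a pixel that is still visible in the next frame should be carried back to where it started by the reverse flow. The function name, the one-dimensional flow layout, and the tolerance value are hypothetical.

```python
def occlusion_mask(forward, backward, tolerance=0.5):
    """Flag pixels whose forward and backward motion do not cancel out.

    forward[x]  : motion of pixel x from frame t to frame t+1 (in pixels)
    backward[x] : motion of pixel x from frame t+1 back to frame t
    A flagged pixel was either hidden by another object or left the frame.
    """
    n = len(forward)
    mask = []
    for x in range(n):
        target = round(x + forward[x])
        if not 0 <= target < n:
            mask.append(True)            # moved off the edge of the frame
            continue
        round_trip = forward[x] + backward[target]
        mask.append(abs(round_trip) > tolerance)
    return mask


# A uniform one-pixel pan: only the last pixel leaves the frame.
pan_mask = occlusion_mask([1, 1, 1, 1], [-1, -1, -1, -1])
# The middle pixel's reverse flow disagrees, so it is flagged.
hidden_mask = occlusion_mask([0, 0, 0], [0, 3, 0])
```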
Figure 6. The generated depthmap has been used to generate a synthetic left/right image pair. (One can get
the 3D effect by crossing one’s eyes to fuse the left and right sides).
Figure 7. The depthmap was used to generate a synthetic red/cyan anaglyph (if you have a cheap pair of
red/cyan glasses, you can view the effect).
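Once a left/right pair exists, composing a red/cyan anaglyph is a simple channel merge: the red channel comes from the left-eye image and the green and blue channels come from the right-eye image. The sketch below assumes images stored as lists of rows of (r, g, b) tuples; the function name is hypothetical.

```python
def make_red_cyan(left, right):
    """Merge two RGB images into one red/cyan anaglyph of the same size."""
    return [
        [(lp[0], rp[1], rp[2]) for lp, rp in zip(left_row, right_row)]
        for left_row, right_row in zip(left, right)
    ]


left = [[(200, 10, 20), (100, 110, 120)]]    # one row, two pixels
right = [[(30, 40, 50), (60, 70, 80)]]
anaglyph = make_red_cyan(left, right)
```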
ImageIQ3D: Scene Change Detection
What happens if the camera is panning, and then suddenly stops? Previous algorithms would lose all
depth information and the 3D video would “flatten out”. The solution to this problem is
conceptually simple: accumulate depth for each pixel as you go along. Of course, pixels
move, so doing this properly requires very accurate motion compensation. Another problem is that
accumulating depth values as a history can significantly corrupt the depth map for the current frame
unless the system knows when the shot has cut to a new scene, or even when all of the relevant pixels
have “panned off the screen”. Carrying over depth from a previous shot can result in serious
distortions and, in some cases, a violent motion sickness response in viewers.
Clearly, a “shot change detection” method is required, and this is a well traversed area of study and
practice – but for the 2D to 3D case, it’s not enough to know if the editor cut away to an entirely
different scene. One must know, reliably, when each individual pixel has moved offscreen and no
longer has any history – and when new pixels appear, one has to know that too. Of course, if the
current shot cuts away, all depth assumptions have to be reset as well.
ImageIQ3D has a very robust “scene-change detection” engine that provides exactly this capability
– for every pixel, individually. When everything has panned or zoomed offscreen, or the current
scene has cut or faded away, ImageIQ3D knows how to reset its assumptions – a very important part
of making the 2D to 3D process seamless and requiring no user intervention. This is also very
important to ensure that these changes don’t cause viewers’ eyes to cross (or cause them to throw
up) when errors due to scene changes cause left/right disparity issues.
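The bookkeeping described in this section can be sketched as a per-pixel running average that is discarded whenever a pixel is new or the scene has cut. This is an assumed illustration, not the ImageIQ3D engine: it presumes the history has already been motion compensated into the current frame, and the names, the use of None for "no fresh depth evidence", and the simple averaging rule are all hypothetical.

```python
def update_depth_history(history, counts, current, is_new_pixel, scene_cut=False):
    """Blend this frame's depth into a per-pixel history, resetting where needed.

    history[i] : accumulated depth for pixel i (None if there is none yet)
    counts[i]  : number of frames that contributed to history[i]
    current[i] : depth measured in this frame, or None if no cue was available
    """
    new_history, new_counts = [], []
    for h, c, d, is_new in zip(history, counts, current, is_new_pixel):
        if scene_cut or is_new:
            h, c = None, 0                 # stale history must not leak through
        if d is None:                      # e.g. the camera stopped moving
            new_history.append(h)
            new_counts.append(c)
        elif c == 0:
            new_history.append(d)
            new_counts.append(1)
        else:
            new_history.append((h * c + d) / (c + 1))
            new_counts.append(c + 1)
    return new_history, new_counts


# Pixel 0 keeps accumulating, pixel 1 has just appeared, pixel 2 has no fresh cue.
depths, frames = update_depth_history(
    [2.0, 4.0, 5.0], [1, 3, 2], [4.0, 8.0, None], [False, True, False]
)
# After a cut, every pixel starts over from the current frame.
cut_depths, cut_frames = update_depth_history([2.0], [1], [6.0], [False], scene_cut=True)
```

The third pixel shows the case raised at the start of this section: when motion stops and no new depth can be measured, the accumulated value is kept instead of letting the picture flatten out.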
ImageIQ3D: Depth-from-Vanishing Points via Radon Transform
Video does not always include motion. Sometimes, other cues are necessary to obtain depth. One
solution is to use geometric clues in the images themselves to assist – if one knows where the
predominant straight edges are, and has some information about the faces of objects in a scene,
some information about the depth of foreground objects and the background can be obtained. Like
CT scanners, ImageIQ3D uses a Hough/Radon transform to correlate image edges and
structure – except ImageIQ3D does it in real-time:
Figure 8. Not all images have good geometric depth cues. Original frame, marked up with predominant
straight edges in red (left), Generalised Hough/Radon on right. Crossings of the curves and bright yellow/white
dots indicate position and slope of significant straight lines. This frame is ambiguous, so other information (like
motion/occlusion) is needed to infer depth.
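To make the transform concrete, here is a minimal sketch of the classic straight-line Hough transform, in which every edge point votes for all (rho, theta) lines that could pass through it and peaks in the accumulator correspond to dominant straight edges. It is a textbook version, not the generalised real-time transform used by ImageIQ3D; the function names, the one-degree angle step, and the one-pixel rho bins are assumptions.

```python
import math
from collections import Counter


def hough_lines(points, n_theta=180):
    """Accumulate votes for lines x*cos(theta) + y*sin(theta) = rho."""
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(round(rho), t)] += 1      # one curve per point in (rho, theta) space
    return votes


def strongest_line(points, n_theta=180):
    """Return (rho, angle in degrees, votes) for the accumulator peak."""
    (rho, t), count = hough_lines(points, n_theta).most_common(1)[0]
    return rho, 180.0 * t / n_theta, count


# Twenty edge points along the horizontal line y = 5.
edge_points = [(x, 5) for x in range(0, 100, 5)]
rho, angle, count = strongest_line(edge_points)
```

All twenty curves cross in a single accumulator cell, which is the "bright dot" that marks a significant straight line; a frame with no dominant edges produces no such clear peak.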
Depth from 2D is a specialized case of superresolution – using multiple pieces of information to fill in
an incomplete (or sometimes, overcomplete) estimation. In the classic superresolution case, one is
trying to enlarge an image and fill in missing pixels with information from previous frames, with motion
giving the critical “clues”. In this case, ImageIQ3D fills in depth information from previous frame
motion, plus multiple other sources – like geometric cues.
Figure 9. Other images have excellent geometric depth cues. Original frame marked up with straight edges in
red (left), Generalised Hough/Radon on right. This frame has several clearly distinguishable straight edges
indicated by convergence of crossing curves in the transform on the right. Vanishing points can be clearly
detected, and are used to constrain the depth map estimation.
Figure 10. Depth map obtained from geometric depth cues plus motion.
ImageIQ3D: Depth-from-Focus via PSF Estimation
Another way to increase the robustness of depth estimation is to include information about how much
different objects in the scene are blurred, relative to each other. In combination with motion,
occlusions, and geometric cues, a robust depth estimation can be obtained by performing Point
Spread Function (PSF) estimation. This process estimates the focus and motion blur for each pixel
in a scene.
The information from the Radon/Hough transform is not only used to estimate geometric features, but
also to find relevant edges which can be used to estimate the blur of objects in the scene. In
combination with the structure analysis performed by ImageIQ®’s optical flow analysis, the blur of
each pixel (if it is near an edge feature) can be obtained. A robust regularization function is used to
propagate values to adjacent non-edge pixels. These per-pixel focus cues, like the geometric cues,
are incorporated into the overall model that builds the final depth map.
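As a rough illustration of measuring blur at an edge, the sketch below estimates how wide an edge is from a one-dimensional intensity profile taken across it. It relies on a standard observation: if a sharp step edge is blurred by a Gaussian, the gradient across the edge is itself a Gaussian whose standard deviation equals the blur width. This is not the blind PSF estimator described above; the function name and the moment-based estimate are assumptions for the example.

```python
import math


def edge_blur_sigma(profile):
    """Estimate blur width (in pixels) from intensities sampled across one edge."""
    gradient = [abs(b - a) for a, b in zip(profile, profile[1:])]
    total = sum(gradient)
    if total == 0:
        return None                       # flat region: no edge, so no estimate
    centre = sum(i * g for i, g in enumerate(gradient)) / total
    variance = sum(g * (i - centre) ** 2 for i, g in enumerate(gradient)) / total
    return math.sqrt(variance)


sharp = edge_blur_sigma([0, 0, 0, 1, 1, 1])           # a crisp step edge
soft = edge_blur_sigma([0, 0, 0.25, 0.75, 1, 1])      # the same edge, out of focus
flat = edge_blur_sigma([3, 3, 3])                     # no edge at all
```

The flat case returns no estimate, which is why values measured at edges have to be propagated to neighbouring non-edge pixels.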
ImageIQ3D: Putting it all Together
A great deal of the magic of ImageIQ3D is not just performing optical flow, Hough/Radon, and
intelligent, adaptive operations – but intelligently applying brute force. Like ImageIQ®, ImageIQ3D
treats 2D to 3D as a superresolution problem – instead of creating pixels in the X and Y directions by
using X, Y and Time information, ImageIQ3D creates new pixels in the Z direction using X, Y, and
Time information. Most of the intelligence is embedded in how all of this information is combined, and
how it constrains the final “solution” – that solution being a consistent, reliable depth map that can be
used to translate any 2D or 3D video into any other suitable 3D video format.
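The paper does not disclose how the cues are actually combined. As a deliberately simple stand-in, the sketch below fuses several per-pixel depth estimates with a confidence-weighted average, so that a cue with no information at a pixel (confidence 0) does not influence the result there. The function name, the data layout, and the averaging rule are assumptions, and a real system would add spatial and temporal constraints.

```python
def fuse_depth(cues):
    """Fuse depth cues given as (depth_values, confidences) pairs of equal length."""
    n = len(cues[0][0])
    fused = []
    for i in range(n):
        weight = sum(conf[i] for _, conf in cues)
        if weight == 0:
            fused.append(None)            # no cue has anything to say here
        else:
            fused.append(sum(d[i] * c[i] for d, c in cues) / weight)
    return fused


motion_cue = ([2.0, 0.0, 0.0], [1.0, 0.0, 0.0])      # only pixel 0 is moving
geometry_cue = ([4.0, 6.0, 0.0], [1.0, 1.0, 0.0])    # vanishing lines cover pixels 0 and 1
fused = fuse_depth([motion_cue, geometry_cue])
```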
ImageIQ3D: There’s One More Animal in the “3D Zoo” to Tame
A different set of problems is presented when converting anaglyph (colored-glasses) video to
full-color stereo – but the toolset that ImageIQ3D uses lends itself extremely well to this
circumstance too. Consider a green/magenta anaglyph video:
Figure 11. A frame from a green/magenta anaglyph 3D film.
In this case, the left eye is coded into the green channel of the RGB image, and the right eye is coded
into the red and blue channels. The full-color stereo version can be reconstructed, as long as there is
a robust method of optical flow that knows about occlusions – this sounds familiar! Conceptually, it’s
simple: estimate the optical flow between the green and the red/blue, motion compensate
the green toward the red/blue, and add them together; this becomes the full-color right eye image.
Next, estimate the optical flow between the red/blue and the green, motion compensate the
red/blue toward the green, and add them; this becomes the full-color left eye image.
Figure 12. ImageIQ3D color anaglyph to full-color stereo conversion process.
More properly stated, this is actually a problem of disparity estimation (not optical flow) between the
right and left images, but either way, the right and left images use different colors. This makes
the problem very difficult, because the green for the left eye and the magenta for the
right eye cannot easily be compared: their colors and brightness (and pixel values) are
different. However, the optical flow engine in ImageIQ3D does not use block matching or colors;
it uses actual object structure, per-pixel, to determine motion and optical flow, so in this case it’s
perfectly suited to the problem at hand.
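The channel bookkeeping for the green/magenta case can be sketched on a single scanline. The hard part, estimating the per-pixel disparity between differently colored views, is assumed to be already solved here and is passed in as a list; the function names and the integer, clamped warp are simplifications made for the example.

```python
def split_anaglyph(row):
    """Green/magenta row -> left-eye green values and right-eye (red, blue) pairs."""
    left_green = [g for (_, g, _) in row]
    right_red_blue = [(r, b) for (r, _, b) in row]
    return left_green, right_red_blue


def warp(values, disparity):
    """Fetch, for each output pixel x, the source value at x + disparity[x]."""
    last = len(values) - 1
    return [values[min(max(x + disparity[x], 0), last)] for x in range(len(values))]


def reconstruct_right(row, disparity):
    """Move the green channel into the right view and merge it with red/blue."""
    left_green, right_red_blue = split_anaglyph(row)
    green_in_right = warp(left_green, disparity)
    return [(r, g, b) for (r, b), g in zip(right_red_blue, green_in_right)]


# Synthetic check: the right view is the left view shifted by one pixel.
left_true = [(1, 11, 21), (2, 12, 22), (3, 13, 23), (4, 14, 24)]
right_true = left_true[1:] + [left_true[-1]]
anaglyph_row = [(r[0], l[1], r[2]) for l, r in zip(left_true, right_true)]
right_rebuilt = reconstruct_right(anaglyph_row, disparity=[1, 1, 1, 1])
```

The left-eye image would be rebuilt the same way in the opposite direction, by warping the red and blue channels toward the green.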
Figure 13. The same green/magenta anaglyph 3D film, reconstructed as a full-color 3D stereo pair. The
original full-color movie was NOT used to construct this stereo pair.
In short, this means that a player incorporating ImageIQ3D can not only convert from 2D to 3D, but
also take any legacy 3D format (including color anaglyph) and convert it to full-color, full-stereo 3D,
in real-time, with no operator or user intervention or tuning – on commodity, off-the-shelf,
inexpensive GPU hardware.
ImageIQ3D: Many Deployment Options
ImageIQ3D has been developed as a consumer DVD player application for Windows and MacOS,
and as a batch-mode processor running on Linux, and is ready for low-BOM and parts-count-sensitive
consumer electronics applications. To find out how you can leverage ImageIQ3D in your
network, workflow, or consumer electronics solutions, contact HDlogix at sales@hdlogix.com.
ImageIQ® is a registered trademark of HDlogix, Inc. ImageIQ3D is a trademark of HDlogix, Inc.
©2009 HDlogix, Inc.