Stereo Vision



INTRODUCTION

Stereo vision is the method of using multiple cameras to perceive targets, most often for the purpose of calculating the distance to those targets, obstacles, or the surrounding environment. In stereo vision, the cameras' fields of view must overlap at least slightly. Stereo vision is simplest when all cameras lie on the same vertical (y) plane and depth (z) plane, differing only in horizontal position. In the case of the AstroBot project, the robot employs a binocular vision system (two cameras only) that meets those specifications. The hardware used is the Point Grey Bumblebee2 device.

Distance/depth (Z) is calculated using the following equation, whose variables are defined below:

    Z = (b * r) / (a + c)

The depth is expressed in real-world linear units (e.g. meters).

(b) is the inter-camera distance or baseline; this measurement is predetermined and constant. The greater the baseline, the more accurate the system, so long as the cameras' fields of view still overlap. The baseline is expressed in real-world linear units (e.g. centimeters).

(r) is the focal length. The focal length can be calibrated by placing a reference object of known width at a known distance from the cameras and measuring the width of the object as it appears in the image (in pixels). The resulting ratio between pixels and real-world distance is used later to convert the units of (a) and (c). The distance at which this ratio is calibrated is taken as the focal length; therefore the focal length is expressed in real-world linear units (e.g. centimeters).
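As a minimal sketch of this calibration step (in Python, with hypothetical placeholder measurements rather than AstroBot's actual values):

    # Hypothetical reference measurements; substitute the real ones.
    REFERENCE_WIDTH_CM = 10.0    # known real-world width of the reference object
    CALIBRATION_DIST_CM = 100.0  # known distance from the cameras to the object

    def calibrate(measured_width_px):
        """Return (focal_length_cm, cm_per_pixel) from one reference measurement."""
        cm_per_pixel = REFERENCE_WIDTH_CM / measured_width_px  # pixel-to-cm ratio
        focal_length_cm = CALIBRATION_DIST_CM  # per the convention described above
        return focal_length_cm, cm_per_pixel

    r, ratio = calibrate(measured_width_px=250.0)  # e.g. object spans 250 px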

(a) and (c) are the optical displacements of the target; these are measured in pixels initially. To find (a), measure the distance between the center of camera one's view and the center of the target; (c) is found the same way for camera two. Convert these distances from pixels into centimeters using the calibration ratio above so that the distance (Z) comes out in the correct units.
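Putting the pieces together, a sketch of the depth computation (reusing the hypothetical calibration values r and ratio from the sketch above):

    def depth_cm(b_cm, r_cm, a_px, c_px, cm_per_pixel):
        """Compute Z = (b * r) / (a + c), with (a) and (c) converted to cm."""
        a_cm = a_px * cm_per_pixel  # offset from camera one's image center
        c_cm = c_px * cm_per_pixel  # offset from camera two's image center
        return (b_cm * r_cm) / (a_cm + c_cm)

    # Hypothetical values: 12 cm baseline, target 40 px and 25 px off-center.
    Z = depth_cm(b_cm=12.0, r_cm=r, a_px=40.0, c_px=25.0, cm_per_pixel=ratio)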

This whole idea works on the principle of optical displacement (parallax). The closer an object is, the more it appears to move when the viewer changes perspective; conversely, the farther away an object is, the less it appears to move under the same change in perspective. The reason the sun stays visible for so long each day despite the earth's motion is that the sun is far enough away that its optical displacement is small. Cars on a highway move much more slowly relative to a stationary viewer than the earth does relative to the sun, yet those cars pass out of view far faster than the sun does, because the distance between the viewer and the cars is so much smaller than the distance between the earth and the sun. Distance calculation via optical displacement does not require the target's dimensions to be known, making it an ideal method for any kind of exploratory mission.

TEMPLATE MATCHING

AstroBot employs a template matching algorithm (OpenCV) that allows the robot to locate its charging station so that it may navigate to the station and dock. This algorithm has not yet been integrated into the main machine but is currently in development. Template matching requires two images: a reference image and a template image. The template image is always dimensionally smaller than the reference image. Often (and in this case) the template image is cropped directly from an ideal example reference image. The template image is systematically compared against every possible position in the reference image, starting from the top-left corner and ending at the bottom-right. If an area of the reference image is similar enough to the template image, that area qualifies as a match and a region of interest (ROI) is drawn around it. The x and y coordinates of the target are returned, and the formula above is then used to calculate and return the distance to the target.
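A minimal sketch of this matching step with OpenCV in Python follows; the file names and the similarity threshold are hypothetical, and the exact matching method and criterion used on AstroBot may differ:

    import cv2

    scene = cv2.imread("camera_frame.png")         # reference image (hypothetical file)
    template = cv2.imread("charging_station.png")  # cropped template (hypothetical file)
    th, tw = template.shape[:2]

    # Slide the template over every position of the scene and score each one.
    result = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)

    THRESHOLD = 0.8  # hypothetical similarity criterion
    if max_val >= THRESHOLD:
        x, y = max_loc  # top-left corner of the best match
        cv2.rectangle(scene, (x, y), (x + tw, y + th), (0, 255, 0), 2)  # draw the ROI
        print("target center:", (x + tw // 2, y + th // 2))
    else:
        print("no target identified")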

Image template matching requires good contrast within the images, and usually works best with color rather than grayscale. The algorithm under construction will not identify a target unless the predetermined similarity criterion is met. Several introductory tests have been performed to evaluate the current version of the script. When the charging station was in the camera's view, the algorithm identified it correctly the majority of the time, if not flawlessly; likewise, when the algorithm was presented with images without the charging station in view, no target was identified.



DISPARITY MAPS

Disparity maps are visual outputs of how AstroBot perceives its environment in terms of distance. The maps are created by taking a pixel (or small block of pixels) from one camera's view and searching for a match anywhere along the same y-plane (row) of the other camera's view. The fact that the search for this match is restricted to the same row is very important. A match between the two cameras' views means that the same area of the real world is being observed from two different perspectives. Because the inter-camera distance (baseline) is a known constant, the optical displacement, that is, the difference in the x-coordinates of the matched area between the two views, recognizable as the values (a) and (c) in the formula above, can be used in that formula to calculate the distance to that area of the environment. When this calculation is performed for every pixel of the images, the result is two gray-scale images, the disparity maps, that give a visual representation of the perceived distance.
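As a sketch of how such a map can be computed with OpenCV's block matcher in Python (assuming rectified grayscale frames and hypothetical file names; the parameters shown are illustrative, not AstroBot's):

    import cv2

    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    # Block matching along corresponding rows, as described above.
    stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = stereo.compute(left, right)  # fixed-point result, scaled by 16

    # Normalize to 0-255 so the result renders as a gray-scale disparity map.
    disp_map = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)
    cv2.imwrite("disparity_left.png", disp_map)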

For this project, two disparity maps are output, one per camera. The color-gradient scales are defined as follows: camera one displays distant objects as white and close objects as black, while camera two displays distant objects as black and close objects as white. For both cameras, everything in between is rendered as a shade of gray. The gradient scales are opposite because the displacement is a signed subtraction: when one camera's x-coordinates are subtracted from the other's, the sign is positive, and when the subtraction is done the other way around, the sign flips. With opposite signs, the mapping to colors is reversed.