
Optimal Visual Sensor Planning

Jian Zhao and Sen-ching S. Cheung


Center for Visualization and Virtual Environments, University of Kentucky, Lexington, KY 40507. Email: [email protected], [email protected]

Abstract: Visual sensor networks are becoming increasingly common. They have a wide range of commercial and military applications, from video surveillance to smart homes and from traffic monitoring to anti-terrorism. The design of such a visual sensor network is a challenging problem due to the complexity of the environment, self and mutual occlusion of moving objects, diverse sensor properties and a myriad of performance metrics for different applications. As such, there is a need for a flexible sensor-planning framework that can incorporate all the aforementioned modeling details and derive the sensor configuration that simultaneously optimizes the target performance and minimizes the cost. In this paper, we tackle this optimal sensor placement problem by developing a general visibility model for visual sensor networks and solving the optimization problem via Binary Integer Programming (BIP). Our proposed visibility model supports arbitrary-shaped 3D environments and incorporates realistic camera models, occupant traffic models, self occlusion and mutual occlusion. Using this visibility model, a novel BIP algorithm is proposed to find the optimal camera placement for tracking visual tags in multiple cameras. Experimental performance analysis is performed using Monte Carlo simulations.

I. INTRODUCTION

In recent years we have seen widespread deployment of smart camera networks for a variety of applications. Proper placement of cameras in such a distributed environment is an important design problem, because the placement has a direct impact on the appearance of objects in the cameras, which in turn dictates the performance of all subsequent computer vision tasks. For instance, one of the most important tasks in a distributed camera network is to visually identify and track common objects across disparate camera views. This is a difficult problem because the proper identification of semantically rich features, or visual tags, such as faces or gaits depends highly on the pose of these features relative to the camera view. Using multiple cameras can alleviate this visual tagging problem, but the actual number of cameras and their placement become a non-trivial design problem. To properly design a camera network that can accurately identify and understand visual tags, one needs a visual sensor planning tool: a tool that analyzes the physical environment and determines the optimal configuration for the visual sensors so as to achieve specific objectives under a given set of resource constraints.

Determining the optimal sensor configuration for a large-scale visual sensor network is technically a very challenging problem. First, visual line-of-sight sensors are susceptible to occlusion by both static and dynamic objects. This is particularly problematic as these networks are typically deployed in urban or indoor environments characterized by complicated topologies, stringent placement constraints and a constant flux of occupant or vehicular traffic. Second, from infra-red to range sensing, from static to pan-tilt-zoom or even robotic cameras, there is a myriad of visual sensors, and many of them have overlapping capabilities. Given a fixed budget with limited power and network connectivity, the choice and placement of sensors become critical to the continuous operation of the visual sensor network. Third, the performance of the network depends highly on the nature of the specific tasks in the application. For example, biometric and object recognition require the objects to be captured at a specific pose; triangulation requires visibility of the same object from multiple sensors; object tracking can tolerate a certain degree of occlusion using a probabilistic tracker.

The earliest investigation in this area can be traced back to the art gallery problem in computational geometry. Although an upper bound exists [1], finding the minimum number of cameras needed to cover a given area is an NP-complete problem [2]. Heuristic solutions over 3D environments have recently been proposed in [3], [4], but their sophisticated visibility models can solve only small-scale problems. Alternatively, the optimization can be tackled in the discrete domain [5], [6], where the optimal camera configuration is formulated as a Binary Integer Programming (BIP) problem over discrete lattice points. These works, however, assume a less sophisticated model in a 2-D space rather than a true 3-D environment, and the loss in precision due to discretization has not been properly analyzed.

In this paper, we continue our earlier work in [7] on developing a binary integer programming based framework for determining the optimal visual sensor configuration. Our primary focus is on optimizing the performance of the network for visual tagging. Our proposed visibility model supports arbitrary-shaped 3D environments and incorporates realistic camera models, occupant traffic models, self occlusion and mutual occlusion. In Section II we develop the visibility model for the visual tagging problem based on the probability of observing a tag from multiple visual sensors. Using this metric, we formulate in Section III the search for the optimal sensor placement as a Binary Integer Programming (BIP) problem. Experimental results demonstrating this algorithm using simulations are presented in Section IV. We conclude the paper by discussing future work in Section V.

II. VISIBILITY MODEL

Given a camera network, we model the visibility of a tag based on three random parameters, the tag center $P$, the tag pose $v_P$ and the occlusion starting angle $\theta_s$, as well as two fixed environmental parameters $K$ and $w$. $P$ defines the 3D coordinates of the center of the tag and $v_P$ is the pose vector of the tag. We assume the tag is perpendicular to the ground plane and its center lies on a horizontal plane $\Pi$. Note that the dependency of the visibility on $v_P$ allows us to model self-occlusion, that is, the tag being occluded by the person who is wearing it: the tag will not be visible to a camera if the pose vector points away from the camera. We model the worst-case mutual occlusion by considering a fixed occlusion angle $\theta$ measured at the center of the tag on the plane $\Pi$. Mutual occlusion is said to occur if the projection of the line of sight on $\Pi$ falls within the range of the occlusion angle. In other words, we model the occlusion as a cylindrical wall of infinite height around the tag, partially blocking a fixed visibility angle $\theta$ at a random starting position $\theta_s$. $w$ is half of the edge length of the tag, which is a known parameter. The shape of the environment is encapsulated in the fixed parameter set $K$, which contains a list of oriented vertical planes that describe the boundary walls and obstacles.

Our visibility measurement is based on the projected size of a tag on the image plane of the camera. The projected size of the tag is very important, as the image of the tag has to be large enough to be automatically identified in each camera view. Due to the camera projection of the 3-D world onto the image plane, the image of the square tag can be an arbitrary quadrilateral. While it is possible to precisely calculate the area of this image, an approximation is sufficient for our visibility calculation: we measure the projected length of the line segment $l$ at the intersection between the tag and the horizontal plane $\Pi$. The actual 3-D length of $l$ is $2w$, and since the center of the tag always lies on $l$, the projected length of $l$ is representative of the overall projected size of the tag. Given a single camera with camera center $C$, we define the visibility function for one camera to be the projected length $\|l'\|$ on the image plane of the line segment $l$ across the tag if the visibility conditions described below are satisfied, and zero otherwise. Figure 1 shows the projection of $l$, delimited by $P_{l1}$ and $P_{l2}$, onto the image plane $\Lambda$. Based on the assumptions that all tag centers have the same elevation and all tag planes are vertical, we can analytically derive the formula for the projected endpoints $P'_{li}$ as

$$P'_{li} = C + \frac{\langle v_C, O - C \rangle}{\langle v_C, P_{li} - C \rangle} (P_{li} - C) \qquad (1)$$

Fig. 1. Projection of a single tag onto a camera (tag center $P$ with pose $v_P$, occlusion starting angle $\theta_s$, environment $K$, camera center $C$ with optical axis $v_C$, and the image plane with projected endpoints $P'_{l1}$ and $P'_{l2}$).

where $\langle \cdot, \cdot \rangle$ denotes the inner product, $v_C$ is the optical axis of the camera and $O$ is the center of the image plane $\Lambda$. The projected length $\|l'\|$ is simply $\|P'_{l1} - P'_{l2}\|$. After computing the projected length of the tag, we check four visibility conditions as follows:

Environmental Occlusion: We assume that environmental occlusion occurs if the line segment connecting the camera center $C$ and the tag center $P$ is blocked by the environment. Specifically, the intersection between the line of sight $\overline{PC}$ and each obstacle in $K$ is computed. If there is no intersection within the confined environment, or the points of intersection are higher than the height of the camera, no occlusion occurs due to the environment. We represent this requirement as the binary function chkObstacle$(P, C, K)$, which returns 1 if no environmental occlusion occurs and 0 otherwise.

Field of View: Similar to determining environmental occlusion, we declare the tag to be in the field of view if the image $P'$ of the tag center is within the finite image plane $\Lambda$. Using a derivation similar to (1), the image $P'$ can be computed as $P' = C + \frac{\langle v_C, O - C \rangle}{\langle v_C, P - C \rangle}(P - C)$. We then convert $P'$ to local image coordinates to determine whether $P'$ is indeed within $\Lambda$. We encapsulate this condition in the binary function chkFOV$(P, C, v_C, \Lambda, O)$, which takes the camera parameters, tag location and pose vector as input, and returns a binary value indicating whether the center of the tag is within the camera's field of view.

Self Occlusion: As illustrated in Figure 1, the tag is self-occluded if the angle $\varphi$ between the line of sight $C - P$ and the tag pose $v_P$ exceeds $\frac{\pi}{2}$. We represent this condition as the step function $U(\frac{\pi}{2} - |\varphi|)$.

Mutual Occlusion: As illustrated in Figure 1, mutual occlusion occurs when the tag center, or half of the line segment $l$, is occluded. The occlusion angle $\theta$ is subtended at $P$ on the plane $\Pi$. Thus, occlusion occurs if the projection of the line of sight $C - P$ onto $\Pi$ at $P$ falls within the range $[\theta_s, \theta_s + \theta)$. We represent this condition using the binary function chkOcclusion$(P, C, v_P, \theta_s)$, which returns one for no occlusion and zero otherwise.

Combining $\|l'\|$ and the four visibility conditions, we define the projected length of an oriented tag with respect to a camera as

$$I(P, v_P, \theta_s | w, K, \Gamma) = \|l'\| \cdot \text{chkOcclusion}(P, C, v_P, \theta_s) \cdot \text{chkObstacle}(P, C, K) \cdot \text{chkFOV}(P, C, v_C, \Lambda, O) \cdot U\!\left(\tfrac{\pi}{2} - |\varphi|\right) \qquad (2)$$

where $\Gamma$ includes all camera parameters. Most vision algorithms require the tag to be large enough for detection. Thus, a thresholded version is usually more convenient:

$$I_b(P, v_P, \theta_s | w, T, K, \Gamma) = \begin{cases} 1 & \text{if } I(P, v_P, \theta_s | w, K, \Gamma) > T \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$
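To make the geometry concrete, the following Python sketch implements the projection in (1) and a simplified version of the thresholded indicator in (3). This is a minimal illustration under stated assumptions: the environmental-occlusion, field-of-view and mutual-occlusion checks are omitted, the threshold T is taken in image-plane units rather than pixels, and all function and variable names are ours, not the paper's.

```python
# A minimal numpy sketch of the single-camera visibility measure in
# equations (1)-(3). It projects the endpoints of the tag segment l onto
# the image plane using equation (1) and thresholds the projected length
# as in (3).
import numpy as np

def project(p, C, v_C, O):
    # Equation (1): image of world point p on the plane through O with
    # unit normal v_C, as seen from the camera center C.
    return C + (np.dot(v_C, O - C) / np.dot(v_C, p - C)) * (p - C)

def visibility(P, v_P, C, v_C, O, w, T):
    # The tag is vertical and its center lies on the horizontal plane, so
    # the segment l runs perpendicular to both v_P and the up direction.
    up = np.array([0.0, 0.0, 1.0])
    d = np.cross(v_P, up)
    d /= np.linalg.norm(d)
    P_l1, P_l2 = P - w * d, P + w * d

    # Self occlusion, the step function U(pi/2 - |phi|): the angle between
    # the line of sight C - P and the pose v_P must stay below pi/2.
    if np.dot(C - P, v_P) <= 0:
        return 0
    # Points behind the camera cannot project onto the image plane.
    if np.dot(v_C, P_l1 - C) <= 0 or np.dot(v_C, P_l2 - C) <= 0:
        return 0

    # Projected length ||l'|| = ||P'_l1 - P'_l2||, thresholded as in (3).
    l_proj = np.linalg.norm(project(P_l1, C, v_C, O) - project(P_l2, C, v_C, O))
    return 1 if l_proj > T else 0
```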

To extend the single-camera case to multiple cameras, we note that the visibility of the tag from one camera does not affect the others; thus, each camera can be treated independently. Assume that the specific application requires a tag to be visible by at least $k$ cameras. The tag at a particular location and orientation is then visible if the sum of the $I_b(\cdot)$ values over all cameras is at least $k$.

III. OPTIMAL CAMERA PLACEMENT

In this section, we propose a binary integer program that finds the best placement given a target number of cameras. We first discretize the space of possible camera configurations, including possible locations, yaw and pitch angles, into a uniform lattice grid$_C$ of $N_c$ camera grid points, denoted $\{\gamma_i : i = 1, \ldots, N_c\}$. We also discretize the tag space, which includes possible tag positions $P$, orientations $v_P$ and occlusion angles $\theta_s$, into a uniform lattice grid$_P$ with $N_p$ tag grid points $\{\lambda_j : j = 1, \ldots, N_p\}$. The goal of the algorithm FIX_CAM is to maximize the average visibility for a given number of cameras. We first define a set of binary variables on the tag grid, $\{x_j : j = 1, \ldots, N_p\}$, indicating whether a tag on the $j$-th tag point in grid$_P$ is visible at $k$ or more cameras. We also assume a prior distribution $\{\rho_j : j = 1, \ldots, N_p, \sum_j \rho_j = 1\}$ that describes the probability of having a person at each tag grid point. We define binary variables on the camera grid, $\{b_i : i = 1, \ldots, N_c\}$, where $b_i = 1$ indicates the placement of a camera at $\gamma_i$. The cost function, defined as the average visibility over the discrete space, is given as follows:
$$\max_{b_i} \; \sum_{j=1}^{N_p} \rho_j x_j \qquad (4)$$

The relationship between the camera placement variables $b_i$ and the visibility performance variables $x_j$ can be described by the following constraints. For each tag grid point $\lambda_j$, we have

$$\sum_{i=1}^{N_c} b_i I_b(\lambda_j | w, T, K, \gamma_i) - (N_c + k)\, x_j < k \qquad (5)$$

$$\sum_{i=1}^{N_c} b_i I_b(\lambda_j | w, T, K, \gamma_i) - k\, x_j \geq 0 \qquad (6)$$

These two constraints effectively define the binary variable $x_j$. If $x_j = 1$, Inequality (6) becomes $\sum_{i=1}^{N_c} b_i I_b(\lambda_j | w, T, K, \gamma_i) \geq k$, which means that a feasible solution of the $b_i$'s must have the tag visible at $k$ or more cameras; Inequality (5) becomes $\sum_{i=1}^{N_c} b_i I_b(\lambda_j | w, T, K, \gamma_i) < N_c + k$, which is always satisfied since the largest possible value of the left-hand side is $N_c$. If $x_j = 0$, Inequality (5) becomes $\sum_{i=1}^{N_c} b_i I_b(\lambda_j | w, T, K, \gamma_i) < k$, which implies that the tag is not visible by $k$ or more cameras; Inequality (6) is always satisfied, as it becomes $\sum_{i=1}^{N_c} b_i I_b(\lambda_j | w, T, K, \gamma_i) \geq 0$.

Two additional constraints are needed to complete the formulation. As the cost function focuses only on visibility, we need to constrain the number of cameras to be at most a maximum number $m$, i.e. $\sum_{i=1}^{N_c} b_i \leq m$. In addition, for each camera location $(x, y)$, we keep the following constraint to ensure that only one camera is used at each spatial location: $\sum_{\text{all } i \text{ at } (x,y)} b_i \leq 1$.
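For concreteness, the sketch below assembles the FIX_CAM program, cost function (4) together with constraints (5), (6) and the two placement constraints, using the open-source PuLP modeling library; the paper itself relies on the solver in [8]. The visibility matrix, weights, budget and location grouping are assumed to be precomputed, and all names here are illustrative rather than the paper's.

```python
# A sketch of the FIX_CAM binary integer program with PuLP. Assumed
# inputs: vis[i][j] = I_b(lambda_j | w, T, K, gamma_i), weights rho,
# required coverage k, camera budget m, and location_groups listing the
# camera grid indices that share each spatial location (x, y).
import pulp

def fix_cam(vis, rho, k, m, location_groups):
    Nc, Np = len(vis), len(vis[0])
    prob = pulp.LpProblem("FIX_CAM", pulp.LpMaximize)
    b = [pulp.LpVariable(f"b_{i}", cat="Binary") for i in range(Nc)]
    x = [pulp.LpVariable(f"x_{j}", cat="Binary") for j in range(Np)]

    # Cost function (4): weighted average visibility over the tag grid.
    prob += pulp.lpSum(rho[j] * x[j] for j in range(Np))

    for j in range(Np):
        cover = pulp.lpSum(vis[i][j] * b[i] for i in range(Nc))
        # Constraint (5); since all quantities are integral, the strict
        # inequality "< k" is modeled as "<= k - 1".
        prob += cover - (Nc + k) * x[j] <= k - 1
        # Constraint (6): x_j = 1 forces coverage by at least k cameras.
        prob += cover - k * x[j] >= 0

    # At most m cameras in total.
    prob += pulp.lpSum(b) <= m
    # At most one camera per spatial location (x, y).
    for group in location_groups:
        prob += pulp.lpSum(b[i] for i in group) <= 1

    prob.solve()
    return [i for i in range(Nc) if b[i].value() == 1]
```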

IV. EXPERIMENTAL RESULTS

We first demonstrate the performance of FIX_CAM based on simulation results. All the simulations assume a room of dimension 10 m x 10 m with a single obstacle inside and a square tag with edge length 20 cm. For the camera and lens models, we assume a pixel width of 5.6 µm, a focal length of 8 mm and a field of view of 60 degrees. These parameters closely resemble the actual cameras that we use in real-life experiments. The threshold T for visibility is set to five pixels, which we find to be an adequate threshold for our color-tag detector. Visibility is defined as having k = 2 cameras looking at a tag. While we use a discrete space for the optimization, we compute the average visibility for a given camera configuration with Monte Carlo sampling, using three orders of magnitude more sample points than there are points in the discrete lattice.

Table I shows the average visibility for different numbers of cameras. The optimal average visibility over the discrete space is shown in the second column, the average visibility estimated by the Monte Carlo method is shown in the third column, and the last column shows the computation time on a 2.1 GHz Xeon machine with 4 GB of memory. The BIP solver is based on the software in [8]. The gap between the optimal discrete solution from FIX_CAM and the Monte Carlo estimate is due to discretization. Fixing the number of cameras at eight and varying the density of the grids, Figure 2 shows that the resulting camera planning improves and the gap between the continuous and discrete measurements dwindles. The drawback of using a denser grid is a significant increase in computational complexity: it takes hours to complete the simulation at the highest density. One solution is to use the approximate solution discussed in our earlier work [7].

TABLE I
PERFORMANCE OF FIX_CAM

No. cameras   Discrete   Continuous   Time (s)
Eleven        0.99       0.9205       2.00
Ten           0.98       0.9170       1.90
Nine          0.97       0.9029       10.01
Eight         0.96       0.8981       3.57
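The Monte Carlo check described above can be sketched as follows. This is our illustration, not the paper's code: it reuses the hypothetical visibility() indicator from the earlier sketch, assumes each camera is given as a dictionary of that function's parameters, and fixes an assumed tag-center elevation of 1.5 m.

```python
# A sketch of the Monte Carlo evaluation of a camera plan: sample random
# tag positions and orientations, and count how often at least k cameras
# see the tag.
import numpy as np

def average_visibility(cameras, k, n_samples, room=10.0, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    hits = 0
    for _ in range(n_samples):
        P = np.array([rng.uniform(0, room), rng.uniform(0, room), 1.5])
        phi = rng.uniform(0, 2 * np.pi)            # random tag orientation
        v_P = np.array([np.cos(phi), np.sin(phi), 0.0])
        seen = sum(visibility(P, v_P, **cam) for cam in cameras)
        hits += int(seen >= k)
    return hits / n_samples
```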

Next, we show how one can incorporate realistic occupant traffic patterns into the FIX_CAM algorithm. The previous experiments assume a uniform traffic distribution over the entire tag space: it is equally likely to find a person at each spatial location and at each orientation. This model does not reflect many real-life scenarios. For example, consider a hallway inside a shopping mall: while there are people browsing at the window displays, most of the traffic flows from one end of the hallway to the other. By incorporating an appropriate traffic model, the performance should improve under the same resource constraint.

Fig. 2. The convergence behavior of FIX_CAM as the tag grid density increases: average visibility plotted against grid density.

In the FIX_CAM framework, a traffic model can be incorporated into the optimization by using nonuniform weights $\rho_j$ in the cost function (4). In order to use a reasonable traffic distribution, we employ a simple random walk model to simulate a hallway environment. We imagine that there are openings on either side of the top portion of the environment. At each tag grid point, which is characterized by both the orientation and the position of a walker, we impose the following transition probabilities: a walker has a 50% chance of moving to the next spatial grid point along the current orientation, unless it is obstructed by an obstacle, and a 50% chance of changing orientation. In the case of changing orientation, there is a 99% chance of choosing the orientation facing the tag grid point closest to the nearest opening, while the remaining orientations share the other 1%. At the tag grid points closest to the openings, we create a virtual grid point to represent the event of a walker exiting the environment. The transition probabilities from the virtual grid point back to the real tag points near the openings are all equal. The stationary distribution $\rho_j$ is then computed by finding the eigenvector, with eigenvalue one, of the transition probability matrix of the entire environment, as sketched below.
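A short sketch of this stationary-distribution computation follows, assuming the transition matrix has already been assembled from the rules above as a row-stochastic matrix with the virtual exit state kept last; the names are ours, not the paper's.

```python
# Stationary distribution of the random-walk traffic model: the
# eigenvector of the transition matrix with eigenvalue one.
import numpy as np

def stationary_distribution(trans):
    # Left eigenvector with eigenvalue 1: pi @ trans = pi.
    vals, vecs = np.linalg.eig(trans.T)
    i = np.argmin(np.abs(vals - 1.0))
    pi = np.real(vecs[:, i])
    return pi / pi.sum()        # normalize to a probability distribution

def tag_weights(trans):
    # Drop the virtual exit state (assumed last) and renormalize over the
    # real tag grid points to obtain the weights rho_j used in (4).
    rho = stationary_distribution(trans)[:-1]
    return rho / rho.sum()
```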

Figure 3(a) shows this hallway environment. The four hollow circles indicate the tag grid points closest to the openings. The result of the optimization under the constraint of using four cameras is shown in Figure 3(b). Figures 3(a) and 3(c) show the floor plan, with blue arrows indicating the optimal camera placements. Figures 3(b) and 3(d) show the coverage of the environment, computed as the local average visibility at each spatial location. Clearly, the optimal configuration favors the heavy-traffic hallway area. If the uniform distribution is used instead, we obtain the configuration in Figure 3(c) and the visibility map in Figure 3(d). The average visibility drops from 0.8395 to 0.7538 because of the mismatch with the actual traffic pattern. The performance of FIX_CAM under other experimental conditions, such as mutual occlusion, camera elevations and tag elevations, as well as comparisons with other schemes, can be found in [7].

Fig. 3. Camera placements optimized for (a) the random-walk traffic distribution, achieving average visibility 0.8395 in (b), and for (c) a uniform distribution, achieving average visibility 0.7538 in (d). Optimizing with the actual traffic distribution obtains a higher average visibility than optimizing with a uniform one.

V. CONCLUSION

In this paper, we have described a binary integer programming framework for modeling, measuring and optimizing the placement of multiple cameras. There are many interesting issues in our proposed framework, and in visual tagging in general, that deserve further investigation. The incorporation of models for different visual sensors, such as omnidirectional and PTZ cameras, or even non-visual sensors and other output devices such as projectors, is certainly an interesting topic. The optimality of our greedy approach would benefit from detailed theoretical study. Last but not least, the use of visual tagging in other application domains, such as immersive environments and surveillance visualization, should be further explored.

REFERENCES
[1] V. Chvatal, "A combinatorial theorem in plane geometry," Journal of Combinatorial Theory, Series B, vol. 18, pp. 39-41, 1975.
[2] D. Lee and A. Lin, "Computational complexity of art gallery problems," IEEE Transactions on Information Theory, vol. 32, pp. 276-282, 1986.
[3] R. Bodor, A. Drenner, P. Schrater, and N. Papanikolopoulos, "Optimal camera placement for automated surveillance tasks," Journal of Intelligent and Robotic Systems, vol. 50, pp. 257-295, November 2007.
[4] A. Mittal and L. S. Davis, "A general method for sensor planning in multisensor systems: Extension to random occlusion," International Journal of Computer Vision, vol. 76, no. 1, pp. 31-52, 2008.
[5] M. Al Hasan, K. K. Ramachandran, and J. E. Mitchell, "Optimal placement of stereo sensors," Optimization Letters, vol. 2, pp. 99-111, 2008.
[6] E. Horster and R. Lienhart, "On the optimal placement of multiple visual sensors," in VSSN '06: Proceedings of the 4th ACM International Workshop on Video Surveillance and Sensor Networks, New York, NY, USA, 2006, pp. 111-120, ACM Press.
[7] J. Zhao, S.-C. Cheung, and T. Nguyen, "Optimal camera network configurations for visual tagging," IEEE Journal of Selected Topics in Signal Processing, vol. 2, no. 4, September 2008.
[8] T. Achterberg, "Constraint Integer Programming," Ph.D. thesis, Technische Universität Berlin, 2007, http://opus.kobv.de/tuberlin/volltexte/2007/1611/.
