

Researchers from the computer vision, computer graphics and image processing communities have been studying the problems associated with the analysis and synthesis of faces in motion for more than two decades.

As computers evolve into more human-oriented machines, human-computer interfaces, behavior-learning robots, and computer environments adapted for people with disabilities will use facial expression analysis in order to react to human actions.

Many computer applications require face animation parameters to be generated in real time and with little user effort, which makes solutions based on motion capture equipment too tedious for many practical purposes. Hence, real-time capability and low computing cost are required for both analysis and synthesis. Current approaches tend to use speech analysis, or speech synthesized from text, as a source of real-time animation data. However, these techniques cannot provide realistic data for face detection.

To obtain realistic and natural 3D face animation, we need to understand both the complete behavior of the human face and the image-based methods that offer cost-effective techniques for understanding human face movement. If we want to express motion in real space, we must relate motion measured in pixel co-ordinates to real or virtual world co-ordinates. That is, we need to relate the world reference frame to the image reference frame. Simply knowing the pixel separation between two points in an image does not allow us to determine the distance between those points in the real world. We must derive equations that link the world reference frame to the image reference frame in order to find the relationship between the co-ordinates of points in 3D space and the co-ordinates of the corresponding points in the image. Because there is no direct relationship between these two reference frames, we introduce the camera reference frame. We can then find one equation linking the camera reference frame with the image reference frame (LinkI) and another equation linking the world reference frame with the camera reference frame (LinkE). Identifying LinkI and LinkE is equivalent to finding the camera’s characteristics, also known as the camera’s intrinsic and extrinsic parameters, which are helpful for face motion analysis.
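The two links can be illustrated with a minimal sketch, assuming the standard pinhole camera model without lens distortion. The function names, the focal lengths (fx, fy), the principal point (cx, cy) and the sample points are hypothetical values chosen for illustration; they do not come from any particular camera or library.

```python
import math

def rotation_about_y(angle_rad):
    """Rotation matrix for a camera turned about the vertical axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def world_to_camera(point_w, rotation, translation):
    """LinkE (extrinsic parameters): X_cam = R * X_world + t."""
    return [
        sum(rotation[i][j] * point_w[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]

def camera_to_pixel(point_c, fx, fy, cx, cy):
    """LinkI (intrinsic parameters): perspective division, then scale and shift."""
    x, y, z = point_c
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)

def project(point_w, rotation, translation, fx, fy, cx, cy):
    """Chain LinkE and LinkI to map a 3D world point to pixel co-ordinates."""
    point_c = world_to_camera(point_w, rotation, translation)
    return camera_to_pixel(point_c, fx, fy, cx, cy)

# Assumed camera: aligned with the world frame, 800 px focal length,
# principal point at the centre of a 640x480 image.
R = rotation_about_y(0.0)
t = [0.0, 0.0, 0.0]
intrinsics = dict(fx=800.0, fy=800.0, cx=320.0, cy=240.0)

near = project([0.1, 0.0, 1.0], R, t, **intrinsics)  # 10 cm off-axis at 1 m
far = project([0.2, 0.0, 2.0], R, t, **intrinsics)   # 20 cm off-axis at 2 m
print(near, far)
```

Both points land on the same pixel even though they lie at different places in space, which is why pixel measurements alone cannot recover real world distances without the camera parameters and some knowledge of depth.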

If we intend to perform robust expression and facial emotion analysis, it is important to control the location of the face on the image plane. It is also important to know the orientation of the face with respect to the camera. The ‘find a face’ problem reduces to the detection of skin in the image. Although many solutions have been developed, which will be discussed later in the report, the most general methods of skin detection use a probabilistic approach that takes the colorimetric characteristics of human skin into account. First, a probability density function P(rgb|skin) is usually generated for the chosen color space (RGB, YUV, HSV, or others). It indicates how likely a given color is to belong to the skin surface. This function is difficult to create, and it is equally difficult to choose the threshold that decides whether the pixel under study belongs to the skin or not. In some approaches, researchers study the color models in detail and also provide a probability function for pixels that do not belong to the skin, which allows regions with non-homogeneous skin color characteristics to be found.
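One common way to realize this idea is a likelihood ratio test on color histograms. The following is a minimal sketch under stated assumptions: the densities are approximated by coarse RGB histograms, the bin count, the threshold and the floor value are arbitrary choices, and the training samples are made-up toy values rather than real skin data.

```python
from collections import Counter

BINS = 8  # assumed per-channel quantisation (8 x 8 x 8 colour cells)

def quantize(rgb, bins=BINS):
    """Map an (r, g, b) triple with 0-255 channels to a coarse colour cell."""
    step = 256 // bins
    return tuple(channel // step for channel in rgb)

def build_histogram(pixels, bins=BINS):
    """Estimate P(rgb | class) as a normalised histogram over colour cells."""
    counts = Counter(quantize(p, bins) for p in pixels)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}

def is_skin(rgb, p_skin, p_non_skin, threshold=1.0, floor=1e-6):
    """Label a pixel as skin when P(rgb|skin) / P(rgb|non-skin) > threshold.

    Cells never seen in training get the small probability `floor`.
    """
    cell = quantize(rgb)
    ratio = p_skin.get(cell, floor) / p_non_skin.get(cell, floor)
    return ratio > threshold

# Toy training samples (hypothetical values, for illustration only).
skin_samples = [(224, 172, 140), (210, 160, 130), (198, 150, 120), (230, 180, 150)]
background_samples = [(30, 60, 200), (40, 200, 50), (250, 250, 250), (10, 10, 10)]

p_skin = build_histogram(skin_samples)
p_non_skin = build_histogram(background_samples)

print(is_skin((225, 170, 135), p_skin, p_non_skin))
print(is_skin((20, 50, 210), p_skin, p_non_skin))
```

Raising the threshold makes the classifier stricter about what counts as skin, which is exactly the trade-off that makes the threshold hard to choose in practice.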

Determining the exact orientation of the head is a complicated task. In general, there are two ways to determine the head pose: static methods and dynamic approaches. Static methods search for specific features of the face (eyes, lip corners, nostrils, etc.) on a frame-by-frame basis and determine the user’s head orientation by finding the correspondences between the projected co-ordinates of these features and their real world co-ordinates. They may use template matching techniques to find the specific features. This approach works well, although it requires very accurate localization of the relevant features. Unfortunately, the search has to be repeated for every frame, which is somewhat tedious. Another possibility is to use 3D data, for instance from a generic 3D head model, to accurately determine the pose of the head in the image.
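Template matching for a single feature can be sketched as an exhaustive search that minimizes the sum of squared differences (SSD) between a small template and every image patch. This is only one of several possible matching scores, chosen here for brevity; the tiny grayscale image and the template are invented values.

```python
def ssd(image, template, top, left):
    """Sum of squared differences between the template and one image patch."""
    return sum(
        (image[top + r][left + c] - template[r][c]) ** 2
        for r in range(len(template))
        for c in range(len(template[0]))
    )

def match_template(image, template):
    """Return (score, top, left) of the best matching patch (lowest SSD)."""
    t_rows, t_cols = len(template), len(template[0])
    i_rows, i_cols = len(image), len(image[0])
    best = None
    for top in range(i_rows - t_rows + 1):
        for left in range(i_cols - t_cols + 1):
            score = ssd(image, template, top, left)
            if best is None or score < best[0]:
                best = (score, top, left)
    return best

# Toy grayscale frame with one distinctive 2x2 pattern (hypothetical data).
image = [
    [10, 10, 10, 10, 10],
    [10, 10, 200, 50, 10],
    [10, 10, 50, 200, 10],
    [10, 10, 10, 10, 10],
]
template = [[200, 50], [50, 200]]

score, top, left = match_template(image, template)
print(score, top, left)  # prints: 0 1 2
```

The nested loops visit every candidate position, which illustrates why repeating the search for each feature on every frame becomes costly.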

Dynamic methods have been developed to introduce time considerations by taking advantage of previous results. These methods perform face tracking by treating a video as a more or less smooth sequence of frames. They use the pose information retrieved from one frame to analyze and derive the pose information for the next one. One of the most widespread techniques involves the use of Kalman filters to predict the analysis data, as well as the pose parameters themselves.
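The predict-then-correct cycle of such a tracker can be sketched with a small Kalman filter for a single pose angle. This is a simplified illustration, assuming a constant-velocity motion model, a diagonal process noise, and hand-picked noise variances; the class name, the parameters and the synthetic measurements are all hypothetical.

```python
class PoseKalman1D:
    """Constant-velocity Kalman filter for one head-pose angle (degrees)."""

    def __init__(self, angle=0.0, rate=0.0, process_var=1e-3, meas_var=4.0):
        self.angle, self.rate = angle, rate      # state: angle and angular rate
        self.p = [[10.0, 0.0], [0.0, 10.0]]      # state covariance (assumed prior)
        self.q, self.r = process_var, meas_var   # assumed noise variances

    def predict(self, dt=1.0):
        """Propagate the state and covariance to the next frame."""
        p = self.p
        self.angle += dt * self.rate
        p00 = p[0][0] + dt * (p[0][1] + p[1][0]) + dt * dt * p[1][1] + self.q
        p01 = p[0][1] + dt * p[1][1]
        p10 = p[1][0] + dt * p[1][1]
        p11 = p[1][1] + self.q
        self.p = [[p00, p01], [p10, p11]]
        return self.angle

    def update(self, measured_angle):
        """Correct the prediction with the angle measured in the new frame."""
        p = self.p
        innovation = measured_angle - self.angle
        s = p[0][0] + self.r                     # innovation variance
        k0, k1 = p[0][0] / s, p[1][0] / s        # Kalman gain
        self.angle += k0 * innovation
        self.rate += k1 * innovation
        self.p = [
            [(1 - k0) * p[0][0], (1 - k0) * p[0][1]],
            [p[1][0] - k1 * p[0][0], p[1][1] - k1 * p[0][1]],
        ]
        return self.angle

# Synthetic yaw measurements: the head turns 2 degrees per frame and each
# measurement is off by +1 or -1 degree (deterministic stand-in for noise).
true_angles = [2.0 * k for k in range(30)]
measurements = [a + (1.0 if k % 2 == 0 else -1.0) for k, a in enumerate(true_angles)]

tracker = PoseKalman1D()
estimates = []
for z in measurements:
    tracker.predict(dt=1.0)   # use the previous frame's result as a prior
    estimates.append(tracker.update(z))

print(round(tracker.angle, 2), round(tracker.rate, 2))
```

The value returned by predict is what a tracker would use to decide where to look in the next frame before any new measurement is available.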

Most face detection methods require a training data set of face images, and the databases originally developed for face recognition experiments can be used as training sets for face detection. Since these databases were constructed to empirically evaluate recognition algorithms in certain domains, we first review the characteristics of these databases and their applicability to face detection. Although numerous face detection algorithms have been developed, most have not been tested on data sets with a large number of images. Moreover, most experimental results are reported on different types of test sets. In order to compare methods fairly, a few benchmark test sets have recently been compiled. There are still a few issues that need to be carefully considered in performance evaluation, even when the methods use the same test set. One issue is that researchers have different interpretations of what a ‘successful detection’ is. Another issue is that different training sets are used, particularly for appearance-based methods, and this needs to be addressed for proper face detection.
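The effect of the ‘successful detection’ criterion can be made concrete with a small sketch. It assumes that a detection counts as successful when its bounding box overlaps an annotated face by at least a chosen intersection-over-union (IoU) ratio; this criterion, the greedy matching and the box co-ordinates are illustrative assumptions, not a definition taken from any specific benchmark.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    inter_w = max(0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union else 0.0

def detection_rate(ground_truth, detections, min_iou):
    """Fraction of annotated faces matched by a distinct detection."""
    if not ground_truth:
        return 0.0
    unused = list(detections)
    hits = 0
    for face in ground_truth:
        # Greedy matching: take the remaining detection with the best overlap.
        best = max(unused, key=lambda d: iou(face, d), default=None)
        if best is not None and iou(face, best) >= min_iou:
            hits += 1
            unused.remove(best)
    return hits / len(ground_truth)

# Hypothetical annotations and detector output for one image.
faces = [(10, 10, 60, 60), (100, 20, 150, 70)]
found = [(12, 12, 62, 62), (115, 20, 165, 70)]

print(detection_rate(faces, found, min_iou=0.5))  # prints: 1.0
print(detection_rate(faces, found, min_iou=0.7))  # prints: 0.5
```

The same detector output scores 100% or 50% depending only on the overlap threshold, which is why results reported under different success criteria are not directly comparable.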

Finally, it is concluded that although faces have tremendous variability, face detection remains a two-class problem (face versus non-face), and a robust face detection system should be effective under full variation in lighting conditions, orientation, pose and partial occlusion.