Image-based facial animation system for interactive services



Eye Animation:


    1. Basics

        Eye animation systems consist of an eye control unit (ECU) and a rendering engine that synthesizes the animations. The ECU contains models controlling gaze patterns, blinks and the dynamics of human eye movements, and it sends the generated eye control parameters to the rendering engine. Optionally, an eye animation system has a unit that extracts audio features from the spoken output and sends them to the ECU.

        We distinguish between two gaze states: mutual gaze (MG) and gaze away (GA). We define MG as the state in which the direction of the gaze lies within the facial area of the interlocutor, consisting of the mouth and eye area. If the gaze is not in this area, the system is in GA. When the speaker switches from MG to GA, a gaze shift (GS) is performed.
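
        The MG/GA distinction amounts to a point-in-region test. The following is a minimal sketch, assuming the facial area of the interlocutor can be approximated by an axis-aligned rectangle in image coordinates. The names `FacialArea`, `gaze_state` and `is_gaze_shift` are hypothetical and not part of the described system.

```python
from dataclasses import dataclass

MG = "MG"  # mutual gaze
GA = "GA"  # gaze away


@dataclass(frozen=True)
class FacialArea:
    """Hypothetical rectangle covering the interlocutor's mouth and eye area."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def gaze_state(x: float, y: float, area: FacialArea) -> str:
    """Return MG if the gaze point lies inside the facial area, otherwise GA."""
    return MG if area.contains(x, y) else GA


def is_gaze_shift(previous: str, current: str) -> bool:
    """A gaze shift (GS) is the transition from MG to GA."""
    return previous == MG and current == GA
```

        A real system would more likely test against a tracked face region than a fixed rectangle; the rectangle only keeps the example short.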



    1. Eye Control Unit

        First the characteristics of eye globe rotations and models to generate the different types of eye movements are briefly explored. The eye control unit (ECU) contains models controlling eye blinks and movements.

        This unit also selects the appropriate model for listening and talking mode.

        Overview of the ECU: In listening mode, two independent models control the eyes, whereas in talking mode a single model is used. Both modes share the mutual gaze model and the models of eye movements. The eyelid position is controlled by the eye blink models as well as the models of eye movements, whereas the eye globe is steered only by the latter. Head motion and audio features are input parameters (dashed arrows).
        1. Eye Movement Physiology

            Donders' law states that each time the eye looks in a particular direction, it assumes only one 3D orientation. If the eye looks straight ahead with the head upright and fixed, which is denoted as the primary position, no torsion is induced. All other gaze positions have their own unique torsion component. Listing's law (LL), which is considered one of the most important principles in eye movement physiology, states exactly what those torsion values are: when the head is upright and stationary and the eyes fixate a distant object, all rotation axes lie within the same plane, denoted as Listing's plane (LP). Although LL is only fulfilled under these conditions, violations such as binocular convergence induce only small changes in the orientation of LP. Binocular convergence refers to the angle that results when the viewer turns both eyes to fixate a target. Eye movements that neither start nor end in the primary position fulfill LL only by a rule called the half-angle rule, which we take into account: the plane of rotation is tilted by half the angle between the momentary and the primary position.

            In our system two types of eye movements are executed: saccades and the vestibulo-ocular reflex (VOR). In order to keep an object fixated on the retina during head rotation, the VOR executes eye movements that compensate head motion. For perfect compensation, the rotation axes of the eye and the head need to be parallel. It has been shown, however, that the axis of eye rotation during VOR meets neither the needs for perfect 3D gaze stabilization nor LL; it is a compromise between both constraints. Since the deviations are negligibly small, we assume perfect compensation. Furthermore, the latency of the VOR is less than 14 ms and therefore negligible in our system.

            The generation of saccades is based on the work of Lee et al. We improve their model by including Listing's law and head tilts, as described above, as well as eyelid movements. Typically, the duration of a saccade is proportional to its magnitude, and the velocity is given by a measured velocity function. Vertical saccades and eyelid movements are coupled. For this, we define multiple saccadic magnitude thresholds in the vertical direction (up and down). If the saccadic magnitude is larger than an empirically selected threshold, the appropriate eyelid image is selected from the database.
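
            The proportional duration rule and the threshold-based coupling of vertical saccades and eyelids could look roughly as follows. This is only a sketch: the proportionality constant, the threshold values and the eyelid labels are illustrative placeholders, not the values measured for the system, and the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Placeholder values for illustration only; not measured values from the system.
MS_PER_DEGREE = 2.4
UP_THRESHOLDS: Sequence[Tuple[float, str]] = (
    (5.0, "lid_up_small"),
    (15.0, "lid_up_large"),
)
DOWN_THRESHOLDS: Sequence[Tuple[float, str]] = (
    (5.0, "lid_down_small"),
    (15.0, "lid_down_large"),
)


def saccade_duration_ms(magnitude_deg: float,
                        ms_per_degree: float = MS_PER_DEGREE) -> float:
    """Duration proportional to the absolute saccadic magnitude."""
    return ms_per_degree * abs(magnitude_deg)


def select_eyelid(vertical_deg: float) -> Optional[str]:
    """Pick an eyelid label for a vertical saccade.

    Positive angles denote upward saccades, negative angles downward ones.
    Returns the label of the largest threshold that is exceeded, or None
    if the saccade is too small to require a different eyelid.
    """
    thresholds = UP_THRESHOLDS if vertical_deg >= 0 else DOWN_THRESHOLDS
    selected = None
    for limit, label in thresholds:  # thresholds are sorted in ascending order
        if abs(vertical_deg) > limit:
            selected = label
    return selected
```

            Keeping separate threshold tables for upward and downward saccades mirrors the statement that thresholds are defined in both vertical directions.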
        1. Talking Mode

            In talking mode, two independent models controlling gaze shifts and blinks cannot be designed, since eye movements and blinks are coupled. Hence, we propose an algorithm that iteratively determines an animation path, which contains information for eye movements and blinks for each frame of the animation.

            Firstly, each frame of the animation path is labeled with its corresponding observation o, which is extracted from the spoken output. The following observations are automatically detected in our system: 'word boundary or pause' (WB), 'slow speech rate' (SSR), 'word prominence' (WP), 'filling word' (FW), and 'other' (OT). Filling words are words such as 'ehm', which some speakers use while talking.

            Secondly, the gaze patterns MG and GA are determined for the entire animation and stored in the animation path. For this, a random number drawn from a uniform distribution between zero and one is assigned to each frame, so that each frame has an observation o and a random number. The animation system starts in the default state MG. A gaze shift is executed if the random number is smaller than the conditional probability of performing a GS given o; the state then switches from MG to GA. Since the duration of remaining in GA is independent of o, the duration is drawn from a lognormal distribution whose parameters were determined in an experiment. A second consecutive gaze shift is executed with a probability of 34%. After these one or two gaze shifts, the model returns to MG. At the end of the utterance, the talking head is looking at the interlocutor (MG). Note that gaze shifts due to head motion are added. While the system remains in MG, the point of regard (POR) is varied by a finite state machine (FSM).

            Thirdly, saccades are generated. Preliminary studies indicate that there are no statistical dependencies between the magnitude of previously executed saccades, the GA duration and the current saccade. If necessary, the head motion in the background sequence is taken into account.

            Fourthly, eye blinks that are executed simultaneously with a gaze shift are added to the animation path. Since the magnitudes of large saccades are known, the probability of executing a blink can be calculated.

            Finally, additional eye blinks are added to the animation path. While the model synthesizing new gaze patterns uses the conditional probability of a GS given o, eye blinks cannot be generated by taking only the observation o into account. The temporal dependency of blinks must be considered as well: eye blinks serve the biological purpose of wetting the cornea and removing irritants from its surface, so humans blink regularly. In order to generate eye blinks, we design a sophisticated FSM with three states and calculate the corresponding transition probabilities.
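
            The second step of the algorithm and the idea of a probabilistic FSM can be sketched as follows. This is a minimal sketch under stated assumptions, not the system's implementation: the values in `P_GS_GIVEN_O` and the lognormal parameters are invented placeholders (only the 34% comes from the text), durations are counted in frames, and the function names are hypothetical. The three blink states and their transition probabilities are not given here, so `next_state` is written generically for any transition table.

```python
import random
from typing import Dict, List

# Placeholder conditional probabilities P(GS | o) per frame; not measured values.
P_GS_GIVEN_O: Dict[str, float] = {
    "WB": 0.05, "SSR": 0.03, "WP": 0.01, "FW": 0.08, "OT": 0.005,
}
P_SECOND_GS = 0.34          # probability of a second consecutive gaze shift
GA_MU, GA_SIGMA = 3.0, 0.5  # placeholder lognormal parameters (frames)


def gaze_path(observations: List[str], rng: random.Random) -> List[str]:
    """Label every frame of the animation path with MG or GA."""
    n = len(observations)
    randoms = [rng.random() for _ in range(n)]  # one uniform number per frame
    path = ["MG"] * n                           # default state is MG
    frame = 0
    while frame < n:
        if randoms[frame] < P_GS_GIVEN_O[observations[frame]]:
            # One gaze shift, or two consecutive ones with probability 34%.
            shifts = 2 if rng.random() < P_SECOND_GS else 1
            for _ in range(shifts):
                duration = max(1, round(rng.lognormvariate(GA_MU, GA_SIGMA)))
                end = min(frame + duration, n)
                for i in range(frame, end):
                    path[i] = "GA"
                frame = end
            # The state returns to MG; later frames may trigger a new shift.
        else:
            frame += 1
    if n:
        path[-1] = "MG"  # simplification: only the final frame is forced to MG
    return path


def next_state(state: str,
               transitions: Dict[str, Dict[str, float]],
               rng: random.Random) -> str:
    """Sample the successor of `state` from a table of transition probabilities."""
    r = rng.random()
    cumulative = 0.0
    target = state
    for target, probability in transitions[state].items():
        cumulative += probability
        if r < cumulative:
            return target
    return target  # guards against rounding when probabilities sum to about 1
```

            Passing a seeded `random.Random` instance makes a generated animation path reproducible, which is convenient when comparing parameter sets taken from different persons.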


    1. Rendering Engine

        Since modeling human eyes is a difficult task, a sophisticated rendering engine needs to be designed. The iris contains specular reflections that need to be modeled correctly to achieve lifelike eyes. In the image-based approach, however, the position of the specular reflections depends on the head's position in the recorded sequence. Hence, eye images cannot be normalized and rendered in a different position. Therefore, a rendering engine is developed that combines a 3D model with image-based rendering.

        In order to animate a talking head, the following data has to be prepared initially: The eye globe is modeled by a half sphere with an eye texture, which consists of a high-resolution image of the human eye without specular reflections. Moreover, textures with specular lights need to be generated. The eye socket and eyelid are modeled by a number of images stored in a database, in which the person executes a blink. The surface of the eye area including the eye socket is approximated by a 3D eye model, which is acquired by a 3D laser scan.

        The eye animation is rendered in two steps. Firstly, the eye globes, which are synthesized by texturing half spheres, are rotated according to the eye control parameters. Afterwards, specular lights are added at the appropriate positions on the eye globe by taking the eye pose and the virtual spotlight positions into account. The eye globes are combined with the eye socket image, which is retrieved from the database according to the eye control parameters. Different durations of eye blinks are generated by repeating or removing images from the recorded blink in the database. The rendered image is denoted as an eye sample.

        Secondly, image rendering overlays the eye sample onto a background video sequence by warping the sample into the correct pose. In order to conceal illumination differences between an image of the background video and the eye sample, the samples are blended into the background sequence using alpha blending.
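
        Two of the operations described above, varying the blink duration by repeating or removing images and alpha blending the eye sample into the background, can be sketched in a few lines. This is an illustrative sketch only: images are represented as nested lists of grayscale values, the alpha mask is assumed to be given, warping is omitted, and the function names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")
Image = List[List[float]]


def resample_blink(frames: Sequence[T], target_length: int) -> List[T]:
    """Stretch or shorten a recorded blink by repeating or dropping images."""
    if target_length <= 0 or not frames:
        return []
    if target_length == 1:
        return [frames[len(frames) // 2]]  # keep only the middle image
    last = len(frames) - 1
    # Spread the target frames evenly over the recording; the first and
    # last recorded images are always kept.
    return [frames[round(i * last / (target_length - 1))]
            for i in range(target_length)]


def alpha_blend(sample: Image, background: Image, alpha: Image) -> Image:
    """Per-pixel blend: alpha * sample + (1 - alpha) * background."""
    return [
        [a * s + (1.0 - a) * b for s, b, a in zip(s_row, b_row, a_row)]
        for s_row, b_row, a_row in zip(sample, background, alpha)
    ]
```

        An alpha mask that falls off smoothly toward the border of the eye sample is one plausible way to hide illumination differences at the seam.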


    1. Demos (if your Windows Media Player software requires a codec, please install the codec package)

        None of the following clips were used to train the models of the eye control unit. In the video clips, the speaker utters typical sentences used by a virtual operator in a dialog system and also describes his new apartment. Eye animations are generated by using the spoken output and the speaker's head movements as input parameters to the eye animation system. Note that we did not produce any videos from synthesized speech (TTS), since humans can still easily distinguish between real spoken output and synthesized speech. Our animation system, however, can also create animations for synthesized speech. Only eye movements and blinks are varied, because these are the focus of this work. We only test the talking mode, since we focused on this mode.

        First we present a sample of a recording: originalVideo

        An advantage of the designed ECU is that eye parameters can be transferred between persons. Since the measured eye movement distributions vary strongly between individuals, a talking head may appear more or less lively depending on the selected data. From the analysis of the eye movements of two human subjects, we created two animations. One of the subjects was more lively while talking, whereas the other was more serious and looked at the interlocutor. Note that both subjects discussed current affairs.

        Animations with the statistics of two different persons: less lively vs. more lively

        In order to create some unnatural animations we increased the measured probabilities. The next two clips show very unlikely human eye movements: extreme1 and extreme2



        Puzzle:


        Tell them apart - original or animation?

        Sequence 1: a1 vs. a2 vs. a3

        Sequence 2: b1 vs. b2

        Sequence 3: c1 vs. c2

        Sequence 4: d1 vs. d2

        Sequence 5: e1 vs. e2



        If you would like to know the answers, contact: aweissenATtnt.uni-hannover.de



    1. Contact

      Prof. Dr.-Ing. Jörn Ostermann

      M.Sc. Kang Liu

      Leibniz Universität Hannover

      Institut für Informationsverarbeitung

      Appelstr. 9A, D-30167 Hannover

      Tel. +49(0)511 7625316

      Fax +49(0)511 7625333

      Email: kang@tnt.uni-hannover.de



    Updated 18/04/2008
    Webpage design M.Sc. Kang Liu