Non Rigid Face Tracking


I rewrote the Non Rigid Face Tracking code by Jason Saragih, which is published in the book “Mastering OpenCV with Practical Computer Vision Projects” (copyright Packt Publishing 2012).

I did this mainly because I wanted to understand how it works, and partly because I wanted to clean up the code (which is arguably kind of messy).

Basically the algorithm goes like this:

1. Apply the Active Shape Model, which consists of:

  • Perform Generalized Procrustes Analysis to align all the landmark points obtained from the MUCT face database and find a canonical shape. With this we obtain the rigid basis transformation matrix.
  • Apply PCA to the aligned points to reduce their dimensionality, keeping only the top k eigenvectors.
  • Limit each shape parameter to within 3 standard deviations of its mean, which covers about 99.7% of the variation along each mode if the parameters are roughly Gaussian.
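
As a rough illustration of the alignment and clamping steps above, here is a minimal sketch that uses only the C++ standard library, not OpenCV. It is not the book's code: `Point2`, `Shape`, `centre`, `alignTo` and `clampParams` are hypothetical names. `alignTo` does a single least-squares similarity alignment of one shape onto a reference, which is the inner step that Generalized Procrustes Analysis repeats over all shapes. `clampParams` assumes that the variance of each mode is its PCA eigenvalue.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point2 { double x, y; };
using Shape = std::vector<Point2>;

// Subtract the centroid so the shape is centred on the origin.
Shape centre(const Shape& s) {
    assert(!s.empty());
    double mx = 0.0, my = 0.0;
    for (const Point2& p : s) { mx += p.x; my += p.y; }
    mx /= s.size();
    my /= s.size();
    Shape out;
    out.reserve(s.size());
    for (const Point2& p : s) out.push_back({p.x - mx, p.y - my});
    return out;
}

// Align `src` onto `ref` with the least-squares scaled rotation
// [a -b; b a], where a = s*cos(theta) and b = s*sin(theta).
// Both shapes are centred first, so translation is removed as well.
Shape alignTo(const Shape& src, const Shape& ref) {
    assert(src.size() == ref.size());
    const Shape s = centre(src);
    const Shape r = centre(ref);
    double dot = 0.0, cross = 0.0, norm = 0.0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        dot   += s[i].x * r[i].x + s[i].y * r[i].y;
        cross += s[i].x * r[i].y - s[i].y * r[i].x;
        norm  += s[i].x * s[i].x + s[i].y * s[i].y;
    }
    assert(norm > 0.0);
    const double a = dot / norm;
    const double b = cross / norm;
    Shape out;
    out.reserve(s.size());
    for (const Point2& p : s) out.push_back({a * p.x - b * p.y, b * p.x + a * p.y});
    return out;
}

// Clamp each non-rigid parameter to +/- c standard deviations
// (assumption: the variance of mode k is the k-th PCA eigenvalue).
void clampParams(std::vector<double>& params,
                 const std::vector<double>& eigenvalues,
                 double c = 3.0) {
    assert(params.size() == eigenvalues.size());
    for (std::size_t k = 0; k < params.size(); ++k) {
        const double limit = c * std::sqrt(eigenvalues[k]);
        params[k] = std::max(-limit, std::min(limit, params[k]));
    }
}
```

In a full Generalized Procrustes loop, every shape would be aligned to the current mean with something like `alignTo`, the mean would be recomputed, and the two steps repeated until the mean stops changing.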

2. Learn the correlation-based patch models:

  • For each landmark point, we learn a patch from the sample images by searching for the patch that gives the best response map.
  • This is done using stochastic gradient descent as the optimiser, so each patch is learned from n randomly chosen images.
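
To make the stochastic gradient descent idea concrete, here is a heavily simplified sketch, not the book's training code. It assumes each training sample is a flattened image window paired with the response we would like the patch to give there (near 1 on the landmark, near 0 away from it), and it learns a linear patch by minimising the squared error one random sample at a time. `Sample`, `respond`, `meanSquaredError` and `trainPatch` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

struct Sample {
    std::vector<double> pixels;  // flattened image window around a candidate position
    double target;               // desired response for that window
};

// Linear patch response: dot product of the patch weights and the window.
double respond(const std::vector<double>& w, const std::vector<double>& x) {
    assert(w.size() == x.size());
    double r = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i) r += w[i] * x[i];
    return r;
}

// Mean squared difference between actual and desired responses.
double meanSquaredError(const std::vector<double>& w,
                        const std::vector<Sample>& samples) {
    assert(!samples.empty());
    double sum = 0.0;
    for (const Sample& s : samples) {
        const double err = respond(w, s.pixels) - s.target;
        sum += err * err;
    }
    return sum / samples.size();
}

// Stochastic gradient descent: each step picks one random sample and moves
// the weights against the gradient of that sample's squared error.
std::vector<double> trainPatch(const std::vector<Sample>& samples,
                               std::size_t steps,
                               double learningRate,
                               unsigned seed) {
    assert(!samples.empty());
    std::vector<double> w(samples[0].pixels.size(), 0.0);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, samples.size() - 1);
    for (std::size_t step = 0; step < steps; ++step) {
        const Sample& s = samples[pick(rng)];
        const double err = respond(w, s.pixels) - s.target;
        for (std::size_t i = 0; i < w.size(); ++i) {
            w[i] -= learningRate * err * s.pixels[i];
        }
    }
    return w;
}
```

A real patch trainer would also normalise the windows and usually add a regularisation term; both are left out here to keep the update rule visible.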

Here’s a visualised sample of learned patch models, from about 1000 random samples:

[Image: visualisation of the learned patch models]

3. Use the Viola-Jones / Haar cascade classifier to detect the face position:

  • During training, we apply the cascade classifier to the sample images in order to obtain the average offset of the landmark points relative to the detected face.
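
A minimal sketch of that offset computation, assuming the detector has already returned one box per training image. The `Box`, `Offset`, `offsetFor` and `averageOffset` names are hypothetical, and measuring scale as the shape's width relative to the box width is a choice made for this example, not necessarily what the book does. A median could be used instead of the mean to be more robust to bad detections.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Point2 { double x, y; };
struct Box { double x, y, width, height; };  // top-left corner plus size
struct Offset { double dx, dy, scale; };     // all relative to the box width

// Offset of one annotated shape relative to the box the detector returned.
Offset offsetFor(const std::vector<Point2>& shape, const Box& box) {
    assert(!shape.empty() && box.width > 0.0);
    double cx = 0.0, cy = 0.0;
    double minX = shape[0].x, maxX = shape[0].x;
    for (const Point2& p : shape) {
        cx += p.x;
        cy += p.y;
        minX = std::min(minX, p.x);
        maxX = std::max(maxX, p.x);
    }
    cx /= shape.size();
    cy /= shape.size();
    return {(cx - (box.x + 0.5 * box.width)) / box.width,
            (cy - (box.y + 0.5 * box.height)) / box.width,
            (maxX - minX) / box.width};
}

// Mean offset over all training images that had a detection.
Offset averageOffset(const std::vector<std::vector<Point2>>& shapes,
                     const std::vector<Box>& boxes) {
    assert(shapes.size() == boxes.size() && !shapes.empty());
    Offset sum{0.0, 0.0, 0.0};
    for (std::size_t i = 0; i < shapes.size(); ++i) {
        const Offset o = offsetFor(shapes[i], boxes[i]);
        sum.dx += o.dx;
        sum.dy += o.dy;
        sum.scale += o.scale;
    }
    const double n = static_cast<double>(shapes.size());
    return {sum.dx / n, sum.dy / n, sum.scale / n};
}
```

At tracking time the stored offset is applied in reverse: given a new detection box, the mean shape is scaled by `scale * box.width` and placed at the box centre shifted by `dx` and `dy` times the box width.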

4. Track the face:

  • With the cascade classifier we obtain an approximate bounding box for each face, and we can roughly overlay the shape model on top of it.
  • Refine the location of the points using the patch models: calculate the response of each patch within a given search window, and use the position that gives the best response.
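
The refinement step for a single landmark can be sketched as an exhaustive search over a small window. This is an illustration using plain correlation on a row-major grey image, with hypothetical names (`Image`, `Peak`, `response`, `bestMatch`). It is not the book's implementation, and a normalised correlation measure would usually be preferred on real images.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Image {
    std::size_t width, height;
    std::vector<double> data;  // row-major grey levels
    double at(std::size_t x, std::size_t y) const { return data[y * width + x]; }
};

struct Peak { std::size_t x, y; double response; };

// Correlation of `patch` with the image region whose top-left corner is (x0, y0).
double response(const Image& img, const Image& patch,
                std::size_t x0, std::size_t y0) {
    double r = 0.0;
    for (std::size_t py = 0; py < patch.height; ++py) {
        for (std::size_t px = 0; px < patch.width; ++px) {
            r += img.at(x0 + px, y0 + py) * patch.at(px, py);
        }
    }
    return r;
}

// Try every top-left corner in [x0, x0 + range] x [y0, y0 + range]
// and return the position with the highest response.
Peak bestMatch(const Image& img, const Image& patch,
               std::size_t x0, std::size_t y0, std::size_t range) {
    assert(x0 + range + patch.width <= img.width);
    assert(y0 + range + patch.height <= img.height);
    Peak best{x0, y0, response(img, patch, x0, y0)};
    for (std::size_t dy = 0; dy <= range; ++dy) {
        for (std::size_t dx = 0; dx <= range; ++dx) {
            const double r = response(img, patch, x0 + dx, y0 + dy);
            if (r > best.response) best = Peak{x0 + dx, y0 + dy, r};
        }
    }
    return best;
}
```

After every landmark has been moved to its best position, the new points would be projected back onto the shape model so that the result stays a plausible face shape.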

And here’s a sample result of applying them onto the casts from


And here it is with predefined connections between the points:



And lastly, with triangle subdivisions:



You can get my example code here:

It requires OpenCV, Intel TBB and Boost libraries. The face database can be downloaded from


5 thoughts on “Non Rigid Face Tracking”

    • Thanks, I’m doing this because I often find that understanding the papers is not hard, but converting them into code takes a lot more than simple translation, especially in C++. I hope I can contribute back. But wait, if your comment actually meant (“You a great a**hole”), then I apologise; I try to be as selfless as I can, but I can also be grating at times. If I forgot to reference your work in my posts, do let me know. Have a nice day.

      • Anonymous

        Sorry, English isn’t my native language, so I made a mistake. “are great” would be more correct.
        Maybe you can explain whether there is a big difference between ASM/AAM and DPM? They look very similar.

      • Don’t worry, English is not my native language either. I might be wrong, but from what I recall, ASM / AAM basically defines a shape constraint based on landmarks: we put landmark points on the face to determine where the nose, eyes, etc. should go, and we learn the average shape / placement of those landmarks from our training images.

        With the deformable patch model, after learning the average shape / landmark placement, we do an additional step to find a more accurate landmark position in the given image. We learn the image patch around each landmark, in order to minimise the difference between the learned patch and what we see in the image.

        Then, in the final tracking / detection phase, we can move each average landmark position around (within a small search window) to find a new position that is more accurate. It works well with faces because, although faces vary, it’s rare for a human to, say, move their lips to extreme positions.

        Think about it this way: ASM / AAM is your basic drawing outline of where the mouth and eyes should be placed on a human face.

        But when you superimpose it on, say, my face, it might be off: my mouth might be smaller, or I might be squinting at that moment. With DPM, you learn the optimal / average patch (window) of, say, the edges of the lips, and you try to find a new location for the edges of the lips so that it fits the patch better.
