This project uses a model of a human body to track people in a
still image or video clip.
The basic structure of the system is shown below.
[Diagram: Frame from video -> (background subtraction) -> Silhouette. Body model -> (apply pose) -> Matched model. Silhouette matching connects the extracted silhouette and the model.]
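The background subtraction step in the diagram can be illustrated with a minimal sketch. It assumes a static background image and greyscale frames stored as nested lists; the function name `extract_silhouette` and the threshold value are illustrative assumptions, not details of the system described here.

```python
def extract_silhouette(frame, background, threshold=30):
    """Return a binary mask: 1 where the frame differs from the background.

    frame and background are greyscale images given as lists of rows.
    The threshold of 30 grey levels is an arbitrary illustrative choice.
    """
    return [
        [1 if abs(f - b) > threshold else 0 for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Tiny example: a bright "person" against a dark, uniform background.
background = [[10, 10, 10, 10],
              [10, 10, 10, 10],
              [10, 10, 10, 10]]
frame = [[10, 200, 210, 10],
         [12, 190, 10, 10],
         [10, 10, 10, 9]]

silhouette = extract_silhouette(frame, background)
for row in silhouette:
    print(row)
```

Small intensity changes (such as 10 to 12) fall under the threshold and are treated as background noise.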
The silhouette of the body model is matched to that extracted from the video by a process of simulated annealing. The matching occurs incrementally, starting with the root of the body model hierarchy (the rigid body position) and moving downwards to include movement of limbs and then movement of all flexible joints (including hands, feet and neck).
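The staged matching described above might be sketched roughly as follows. This is a toy illustration under stated assumptions, not the system's actual code: the pose has only four hypothetical parameters (`root_x`, `root_y`, `arm`, `hand`), the renderer draws a rectangle with a two-segment arm, the cost is the count of pixels on which the two silhouettes disagree, and the cooling schedule and step sizes are arbitrary.

```python
import math
import random

# Hypothetical toy pose parameters and the std-dev of their random proposals.
STEP = {"root_x": 1.0, "root_y": 1.0, "arm": 0.2, "hand": 0.2}

# Stages follow the hierarchy: root first, then the limb, then the extremity.
STAGES = [
    ["root_x", "root_y"],
    ["root_x", "root_y", "arm"],
    ["root_x", "root_y", "arm", "hand"],
]


def render(pose):
    """Toy renderer: return the set of pixels covered by the model."""
    rx, ry = round(pose["root_x"]), round(pose["root_y"])
    pixels = {(rx + dx, ry + dy) for dx in range(-3, 4) for dy in range(-5, 6)}
    # Upper arm: 8 pixels starting at the torso's corner ("shoulder").
    sx, sy = rx + 3, ry - 5
    for t in range(1, 9):
        pixels.add((sx + round(t * math.cos(pose["arm"])),
                    sy + round(t * math.sin(pose["arm"]))))
    # Hand: 3 pixels from the end of the arm, angle relative to the arm.
    ex = sx + 8 * math.cos(pose["arm"])
    ey = sy + 8 * math.sin(pose["arm"])
    angle = pose["arm"] + pose["hand"]
    for t in range(1, 4):
        pixels.add((round(ex + t * math.cos(angle)),
                    round(ey + t * math.sin(angle))))
    return pixels


def mismatch(pose, target):
    """Number of pixels present in exactly one of the two silhouettes."""
    return len(render(pose) ^ target)


def anneal(pose, target, active, rng, steps=1500, t_start=5.0, t_end=0.05):
    """Simulated annealing over the parameters named in `active`."""
    current, cost = dict(pose), mismatch(pose, target)
    best, best_cost = dict(current), cost
    for i in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (i / (steps - 1))
        candidate = dict(current)
        name = rng.choice(active)
        candidate[name] += rng.gauss(0.0, STEP[name])
        new_cost = mismatch(candidate, target)
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature falls.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            current, cost = candidate, new_cost
            if cost < best_cost:
                best, best_cost = dict(current), cost
    return best, best_cost


def match_pose(start, target, seed=0):
    """Run one annealing pass per stage, each starting from the last best pose."""
    rng = random.Random(seed)
    pose, cost = dict(start), mismatch(start, target)
    for active in STAGES:
        pose, cost = anneal(pose, target, active, rng)
    return pose, cost


true_pose = {"root_x": 20.0, "root_y": 22.0, "arm": 0.6, "hand": -0.4}
target = render(true_pose)
start = {"root_x": 14.0, "root_y": 17.0, "arm": 0.0, "hand": 0.0}
fitted, cost = match_pose(start, target)
print("mismatch before:", mismatch(start, target), "after:", cost)
```

Fitting the root parameters first gives the later stages a starting point where the torso already overlaps the target, so the limb angles are searched in a much smaller region.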
The matching process is not yet optimised for speed and takes about 30 seconds per frame. I believe it would be relatively easy to reduce this by at least a factor of ten, to around 3 seconds per frame or less, by using more efficient algorithms.
There is currently a weak prior model on the pose. This helps to move the body into more reasonable positions in the cases when occlusion or the use of silhouettes leads to ambiguity (such as when an arm passes behind the main body).
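One common way to express a weak pose prior is as a small penalty added to the silhouette mismatch. The sketch below assumes a Gaussian-style penalty around a rest pose; the joint names, rest angles, spreads and weight are all hypothetical, and the form of the prior actually used by the system is not stated here.

```python
# Hypothetical rest pose (radians) and how far each joint typically strays.
REST_POSE = {"elbow": 0.3, "knee": 0.2}
SIGMA = {"elbow": 1.0, "knee": 0.8}


def prior_penalty(pose, weight=0.5):
    """Quadratic penalty for joint angles far from the rest pose."""
    return weight * sum(
        ((pose[name] - REST_POSE[name]) / SIGMA[name]) ** 2 for name in REST_POSE
    )


def total_cost(silhouette_mismatch, pose, weight=0.5):
    """Silhouette mismatch plus the weak prior term."""
    return silhouette_mismatch + prior_penalty(pose, weight)


# Two poses that explain the silhouette equally well, for example because
# the arm is hidden behind the torso in both.
plausible = {"elbow": 0.5, "knee": 0.2}
implausible = {"elbow": 2.8, "knee": 0.2}

print(total_cost(12, plausible), total_cost(12, implausible))
```

Because the weight is small, the prior only decides between poses whose silhouettes match about equally well; a clear difference in silhouette mismatch still dominates.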
A silhouette sequence was extracted from this video. The tracking system was used to find a corresponding model position for each silhouette. The resulting animation is below or can be downloaded (666K).
The model silhouette is in the left panel with the corresponding video silhouette in the right panel. Note that, because a dynamic model is not used, there is no constraint on smooth motion. This leads to ambiguity errors, such as when the legs pass by each other.
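To make the missing constraint concrete, the sketch below shows one simple form a smoothness term could take: a penalty on the change in joint angles between consecutive frames. This is not part of the system described here, and the joint names, angles and weight are hypothetical.

```python
def smoothness_penalty(pose, previous_pose, weight=1.0):
    """Penalise large changes in joint angles relative to the previous frame."""
    return weight * sum((pose[name] - previous_pose[name]) ** 2 for name in pose)


# Previous frame: left leg forward, right leg back (angles in radians).
previous = {"left_hip": 0.4, "right_hip": -0.4}

# Two candidates for the current frame with near-identical silhouettes.
same_legs = {"left_hip": 0.3, "right_hip": -0.3}
swapped_legs = {"left_hip": -0.3, "right_hip": 0.3}

print(smoothness_penalty(same_legs, previous))
print(smoothness_penalty(swapped_legs, previous))
```

With such a term, the candidate that keeps each leg close to its previous angle would be preferred over the one that swaps the legs.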