The GIST (Gesture Interpretation using Spatio-Temporal analysis) project is
an attempt to recognize and interpret American Sign Language gestures
from a video sequence using an integrated method based on motion segmentation,
shape, size and color. A multi-scale motion segmentation based on Ahuja's
New Transform is applied to the video sequence to obtain motion regions and
their correspondences across frames. Regions of interest, such as the fingertip,
palm and elbow, are extracted from the motion-segmented images by formulating
and solving a constraint satisfaction problem. Pixel trajectories are then
extracted from these regions. A spatio-temporal analysis based on a time-delay
neural network is applied to classify these trajectory patterns. The ultimate
goals of GIST are to enable content-based video retrieval based on video clips
and to gain a better understanding of motion segmentation.
Summary:
Interpret Gestures of American Sign Language
Based on Motion Segmentation, Shape, Size and Color
Better Understanding of Motion Segmentation
Content-Based Video Retrieval
Testbed: 206 Video Sequences of American Sign Language Gestures
American Sign Language Demonstrations (QuickTime Movies):
Publications:
M.-H. Yang and N. Ahuja, "Recognizing Hand Gestures Using Motion Trajectories", in Proceedings of the 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 99), Fort Collins, June 1999.
M.-H. Yang and N. Ahuja, "Extraction and Classification of Motion Patterns for Hand Gesture Recognition", in Proceedings of the 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 98), Santa Barbara, June 1998.
M.-H. Yang and N. Ahuja, "Extracting Gestural Motion Trajectory", in Proceedings of the 1998 IEEE International Conference on Automatic Face and Gesture Recognition (FG98), Nara, Japan, April 1998.
Motion Segmentation
Ahuja's New Transform
Multi-Scale
Region Based
Feature Detection
A Constraint-Based Approach Based on
Motion
Spatial Relationship
Shape
Color
Size
Salient Features
Palm
Head
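The constraint-based labeling outlined above can be sketched as a small search over region assignments. The sketch below is illustrative only: the Region fields, the particular constraints (skin-like color, motion, relative size, maximum palm-to-head distance) and the max_reach threshold are assumptions, not the constraints used in GIST. It enumerates candidate assignments exhaustively rather than using a dedicated constraint solver.

```python
import math
from dataclasses import dataclass
from itertools import permutations
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Region:
    """A segmented region with hypothetical, precomputed attributes."""
    name: str
    area: int        # size in pixels
    cx: float        # centroid x
    cy: float        # centroid y
    skin_like: bool  # color test already applied (assumed)
    moving: bool     # motion test already applied (assumed)


def label_regions(regions: List[Region],
                  max_reach: float = 200.0) -> Optional[Dict[str, Region]]:
    """Return the first (palm, head) assignment satisfying all constraints."""
    for palm, head in permutations(regions, 2):
        # Color constraint: both features must look like skin.
        if not (palm.skin_like and head.skin_like):
            continue
        # Motion constraint: the palm must be moving.
        if not palm.moving:
            continue
        # Size constraint: the head is larger than the palm.
        if head.area <= palm.area:
            continue
        # Spatial constraint: the palm lies within arm's reach of the head.
        if math.hypot(palm.cx - head.cx, palm.cy - head.cy) > max_reach:
            continue
        return {"palm": palm, "head": head}
    return None


# Hypothetical regions from one motion-segmented frame.
candidates = [
    Region("shirt", 5000, 160.0, 200.0, skin_like=False, moving=False),
    Region("face", 900, 160.0, 60.0, skin_like=True, moving=False),
    Region("hand", 400, 200.0, 150.0, skin_like=True, moving=True),
    Region("blob", 300, 20.0, 230.0, skin_like=True, moving=False),
]
labels = label_regions(candidates)
```

Exhaustive enumeration is adequate for the handful of regions in a single frame; a larger label set would call for backtracking with constraint propagation.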
Motion Trajectory
Using Extracted Palms
Affine Transform
Example: Motion Trajectory of Fingertip Region (20 frames)
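One way to picture how affine transforms yield a trajectory: if each pair of consecutive frames has an affine transform mapping a region in one frame to its match in the next, a pixel can be traced by applying those transforms in sequence. The Python sketch below assumes the per-frame transforms are already known; the parameter layout (a, b, c, d, tx, ty) and the sample values are hypothetical and are not taken from GIST.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# Assumed parameter layout: x' = a*x + b*y + tx, y' = c*x + d*y + ty
Affine = Tuple[float, float, float, float, float, float]


def apply_affine(m: Affine, p: Point) -> Point:
    """Map point p through the affine transform m."""
    a, b, c, d, tx, ty = m
    x, y = p
    return (a * x + b * y + tx, c * x + d * y + ty)


def trace_trajectory(start: Point, transforms: List[Affine]) -> List[Point]:
    """Follow a pixel across frames by chaining frame-to-frame transforms."""
    trajectory = [start]
    for m in transforms:
        trajectory.append(apply_affine(m, trajectory[-1]))
    return trajectory


# Hypothetical transforms: a translation by (2, 1), then a 90-degree
# rotation about the origin.
frame_to_frame = [
    (1.0, 0.0, 0.0, 1.0, 2.0, 1.0),
    (0.0, -1.0, 1.0, 0.0, 0.0, 0.0),
]
path = trace_trajectory((10.0, 5.0), frame_to_frame)
```

With N transforms the trajectory has N + 1 points, one per frame.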
Spatio-Temporal Analysis
TDNN (Time-Delay Neural Network)
ART (Adaptive Resonance Theory)
HMM (Hidden Markov Model)
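The time-delay idea behind a TDNN can be illustrated with a single layer: each unit sees a short window of consecutive frames and applies the same weights at every time step, and a class score is obtained by summing a unit's activations over time. The Python sketch below assumes per-frame (dx, dy) displacement features and hand-set, untrained weights; the two classes and all numeric values are hypothetical and do not come from GIST.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def time_delay_layer(frames: Sequence[Frame],
                     kernels: Sequence[Sequence[Sequence[float]]],
                     biases: Sequence[float]) -> List[List[float]]:
    """kernels[u][k][f] is the weight of unit u for feature f at delay k.

    The same kernel is applied at every time step (weight sharing in time).
    """
    window = len(kernels[0])
    outputs = []
    for t in range(len(frames) - window + 1):
        row = []
        for kernel, bias in zip(kernels, biases):
            total = bias
            for k in range(window):
                for w, x in zip(kernel[k], frames[t + k]):
                    total += w * x
            row.append(math.tanh(total))
        outputs.append(row)
    return outputs


def classify(frames, kernels, biases) -> int:
    """Sum each unit's activation over time and return the best unit index."""
    activations = time_delay_layer(frames, kernels, biases)
    scores = [sum(column) for column in zip(*activations)]
    return scores.index(max(scores))


# Hand-set (untrained) kernels over a 2-frame window:
# unit 0 responds to rightward motion, unit 1 to motion along +y.
kernels = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
biases = [0.0, 0.0]

rightward = [(1.0, 0.0)] * 4   # hypothetical trajectory displacements
along_y = [(0.0, 1.0)] * 4
```

A trained network would stack several such layers and learn the kernels from labeled trajectories; only the forward pass is shown here.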
Content-Based Video Retrieval
Indexing Video Sequences Based on Spatio-Temporal Analysis
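As a rough illustration of indexing by analysis results, clips could be grouped under the gesture label that the spatio-temporal classifier assigns to them, so that a query clip retrieves other clips with the same label. The Python sketch below assumes labels are already available; the clip identifiers, labels and function names are hypothetical, and GIST may index sequences differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_index(labeled_clips: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group clip identifiers by their recognized gesture label."""
    index: Dict[str, List[str]] = defaultdict(list)
    for clip_id, label in labeled_clips:
        index[label].append(clip_id)
    return dict(index)


def retrieve(index: Dict[str, List[str]], query_label: str) -> List[str]:
    """Return clips whose label matches that of the query clip."""
    return index.get(query_label, [])


# Hypothetical classifier output: (clip id, recognized gesture).
index = build_index([
    ("clip_001", "hello"),
    ("clip_002", "thanks"),
    ("clip_003", "hello"),
])
matches = retrieve(index, "hello")
```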