Work Abstract
The recognition of activities performed by humans, in a non-intrusive and non-cooperative way, is a highly relevant task in the development of Ambient Intelligence applications. Such applications aim to improve the quality of life by realizing digital environments that are adaptive, sensitive and reactive to the presence (or absence) of users and to their behavior.
We propose an activity recognition approach based on the use of RGB-D cameras, and in particular the Kinect sensor, for data acquisition. Angle information is used to encode the human body posture, i.e. the relative position of its different parts; this information is extracted from skeleton data (joint orientations). Our approach is evaluated on a well-known dataset (CAD-60 – Cornell Activity Dataset) for comparison with the state of the art; moreover, due to the lack of datasets including skeleton orientations, a new benchmark named OAD (Office Activity Dataset) has been acquired internally. The tests confirm the efficacy of the proposed model and its feasibility for scenarios of varying complexity.
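As a purely illustrative sketch (not the exact posture encoding used in the paper), the snippet below shows one generic way to derive angle values from a joint's 3x3 orientation matrix, such as the ORI(i) matrices stored in the dataset described below; numpy and scipy are assumed.

    # Purely illustrative: one generic way to obtain angles from a 3x3 joint
    # orientation matrix (e.g. an ORI(i) matrix from the dataset below).
    # This is NOT necessarily the encoding used in the paper.
    import numpy as np
    from scipy.spatial.transform import Rotation

    ori = np.eye(3)  # placeholder for a 3x3 orientation matrix read from a frame
    angles = Rotation.from_matrix(ori).as_euler("xyz", degrees=True)
    print(angles)    # [0. 0. 0.] for the identity orientation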
Dataset
We release a new dataset of human activities that take place in a single environment (an office) and are captured from different viewpoints, depending on the action being performed.
This dataset contains 14 different activities: drinking, getting up, grabbing an object from a shelf, pouring a drink, scrolling book pages, sitting, stacking items, taking objects from a shelf, talking on the phone, throwing something in the bin, waving a hand, wearing a coat, working on a computer, writing on paper.
Data was collected from 10 different subjects (five males and five females) aged between 20 and 35, one of whom is left-handed. The volunteers received only basic instructions (e.g. “pour yourself a drink”) so that they would perform the actions as naturally as possible. Each subject performed each activity twice, for a total of 10 × 14 × 2 = 280 sequences.
The device used for data acquisition is the Microsoft Kinect V2, whose SDK allows tracking of 25 different joints (18 of which have their own orientation). For testing, we adopted leave-one-subject-out cross-validation, rotating the test subject across the 10 folds.
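A minimal sketch of this leave-one-subject-out protocol is given below; the fold construction is generic, while loading, training and scoring are application-specific and only hinted at.

    # Minimal sketch of leave-one-subject-out cross-validation over the 10 subjects.
    # Loading, training and scoring are application-specific and omitted here.
    subjects = list(range(1, 11))  # subject ids 1..10

    for test_subject in subjects:
        train_subjects = [s for s in subjects if s != test_subject]
        # train on the 9 remaining subjects, test on the held-out one
        print("fold: train on", train_subjects, "- test on", [test_subject])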
Data Format
Skeleton data consists of 25 joints per frame. The first 18 joints have both orientation and position; the last 7 joints have position only.
Each row has the following format:
Frame#,ORI(1),P(1),ORI(2),P(2),...,ORI(18),P(18),P(19),...,P(25)
Where:
- Frame# is an integer starting from 0.
- ORI(i) is the 3x3 orientation matrix of the i-th joint,

      0 1 2
      3 4 5
      6 7 8

  stored in row-major order as 0,1,2,3,4,5,6,7,8 and followed by CONF.
- P(i) is the position (x,y,z) of the i-th joint, followed by CONF.
- CONF is a confidence value (0, 1 or 2) according to the specifications defined in: https://msdn.microsoft.com/en-us/library/microsoft.kinect.jointtrackingstate.aspx
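A minimal parsing sketch based on the format above (assuming plain comma-separated values in exactly the order listed; the function and field names are ours):

    # Minimal sketch: parse one row of a skeleton file according to the format above.
    # Assumes plain comma-separated values in exactly the order described; adjust if
    # the actual files contain trailing separators or extra fields.
    def parse_row(line):
        values = [v for v in line.strip().split(",") if v != ""]
        frame = int(values[0])
        idx = 1
        joints = {}
        # Joints 1-18: 3x3 orientation (9 values) + CONF, then position (x,y,z) + CONF.
        for j in range(1, 19):
            ori = [float(v) for v in values[idx:idx + 9]]; idx += 9
            ori_conf = int(float(values[idx])); idx += 1
            pos = [float(v) for v in values[idx:idx + 3]]; idx += 3
            pos_conf = int(float(values[idx])); idx += 1
            joints[j] = {"ori": ori, "ori_conf": ori_conf,
                         "pos": pos, "pos_conf": pos_conf}
        # Joints 19-25: position (x,y,z) + CONF only.
        for j in range(19, 26):
            pos = [float(v) for v in values[idx:idx + 3]]; idx += 3
            pos_conf = int(float(values[idx])); idx += 1
            joints[j] = {"pos": pos, "pos_conf": pos_conf}
        return frame, joints

Applied to one line of a sequence file, parse_row returns the frame index and a dictionary indexed by joint number (see the joint table below).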
Joint number -> Joint Name
1  -> SpineBase
2  -> SpineMid
3  -> Neck
4  -> ShoulderLeft
5  -> ElbowLeft
6  -> WristLeft
7  -> HandLeft
8  -> ShoulderRight
9  -> ElbowRight
10 -> WristRight
11 -> HandRight
12 -> HipLeft
13 -> KneeLeft
14 -> AnkleLeft
15 -> HipRight
16 -> KneeRight
17 -> AnkleRight
18 -> SpineShoulder
19 -> Head
20 -> FootLeft
21 -> FootRight
22 -> HandTipLeft
23 -> ThumbLeft
24 -> HandTipRight
25 -> ThumbRight
If you use this dataset, please cite the following paper:
Franco A., Magnani A., Maio D. (2017) Joint Orientations from Skeleton Data for Human Activity Recognition. In: Battiato S., Gallo G., Schettini R., Stanco F. (eds) Image Analysis and Processing – ICIAP 2017. ICIAP 2017. Lecture Notes in Computer Science, vol 10484. Springer, Cham
Dataset download: CLICK HERE!
For questions, comments and suggestions, please contact
antonio.magnani@unibo.it