Activity Recognition

Work Abstract

The recognition of activities performed by humans, in a non-intrusive and non-cooperative way, is a very relevant task in the development of Ambient Intelligence applications aimed at improving the quality of life by realizing digital environments that are adaptive, sensitive and reactive to the presence (or absence) of the users and to their behavior.

We propose an activity recognition approach based on the use of RGB-D cameras, and in particular the Kinect sensor, for data acquisition. Angle information is used to encode the human body posture, i.e. the relative position of its different parts; such information is extracted from skeleton data (joint orientations). Our approach is evaluated on a well-known dataset (CAD-60 – Cornell Activity Dataset) for comparison with the state of the art; moreover, due to the lack of datasets including skeleton orientations, a new benchmark named OAD (Office Activity Dataset) has been internally acquired. The tests confirm the efficacy of the proposed model and its feasibility for scenarios of varying complexity.
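As a rough illustration of how angle information can be derived from joint orientations, the sketch below computes the rotation angle between the orientation matrices of two joints. It is a generic technique (relative rotation plus the trace formula) and is not necessarily the encoding used in the paper; the function names `relative_rotation` and `rotation_angle` are hypothetical.

```python
import math

def relative_rotation(r_a, r_b):
    """Return R_a^T * R_b for two 3x3 rotation matrices (nested lists)."""
    return [
        [sum(r_a[k][i] * r_b[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]

def rotation_angle(r):
    """Angle (radians) of a 3x3 rotation matrix, from trace(R) = 1 + 2*cos(theta)."""
    cos_theta = (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0
    # Clamp to [-1, 1] to guard against small numerical errors.
    return math.acos(max(-1.0, min(1.0, cos_theta)))

# Illustrative matrices only (not taken from the dataset):
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
quarter_turn_z = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

angle = rotation_angle(relative_rotation(identity, quarter_turn_z))
```

Here `angle` is the angle between the two orientations; a posture descriptor could collect such angles for selected joint pairs.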


We release a new database of human activities recorded in a single environment (an office) from several perspectives, chosen according to the action being performed.

This dataset contains 14 different activities: drinking, getting up, grabbing an object from a shelf, pouring a drink, scrolling book pages, sitting, stacking items, taking objects from a shelf, talking on the phone, throwing something in the bin, waving hand, wearing coat, working on computer, writing on paper.
Data was collected from 10 different subjects (five males and five females) aged between 20 and 35, one of whom is left-handed. The volunteers received only basic information (e.g. “pour yourself a drink”) so that they would perform the actions as naturally as possible. Each subject performs each activity twice, for a total of 280 sequences.

The device used for data acquisition is the Microsoft Kinect V2, whose SDK tracks 25 different joints (18 of which have their own orientation). For testing, we adopted a leave-one-subject-out cross-validation, rotating the test subject.
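The evaluation protocol described above can be sketched as follows. This is a minimal illustration and not the authors' code: the record layout (`subject`, `activity`, `repetition`) and the function name `leave_one_subject_out` are assumptions made for the example.

```python
from itertools import product

N_SUBJECTS = 10     # subjects in OAD
N_ACTIVITIES = 14   # activity classes in OAD
N_REPETITIONS = 2   # each subject performs each activity twice

def leave_one_subject_out(sequences):
    """Yield (held_out_subject, train, test), holding out one subject at a time."""
    subjects = sorted({seq["subject"] for seq in sequences})
    for held_out in subjects:
        train = [seq for seq in sequences if seq["subject"] != held_out]
        test = [seq for seq in sequences if seq["subject"] == held_out]
        yield held_out, train, test

# Hypothetical index of the dataset: one record per recorded sequence.
sequences = [
    {"subject": s, "activity": a, "repetition": r}
    for s, a, r in product(
        range(1, N_SUBJECTS + 1),
        range(1, N_ACTIVITIES + 1),
        range(1, N_REPETITIONS + 1),
    )
]

folds = list(leave_one_subject_out(sequences))
```

Each fold trains on the sequences of nine subjects and tests on all sequences of the remaining one, so no subject appears in both sets.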

Data Format

Skeleton data consists of 25 joints. The first 18 joints of each frame have both orientation and position. The last 7 joints have only position.

Each row has the following format:



  • Frame# is an integer starting from 0
  • ORI(i) is a 3x3 matrix that represents the orientation of ith joint:
    0 1 2
    3 4 5
    6 7 8
    ORI(i) is stored as 0,1,2,3,4,5,6,7,8 and is followed by CONF.
  • P(i) is the position (x,y,z) of ith joint followed by CONF.
  • CONF is a confidence value (0, 1 or 2) according to the specifications defined in:

Joint number -> Joint Name

1 -> SpineBase
2 -> SpineMid
3 -> Neck
4 -> ShoulderLeft
5 -> ElbowLeft
6 -> WristLeft
7 -> HandLeft
8 -> ShoulderRight
9 -> ElbowRight
10 -> WristRight
11 -> HandRight
12 -> HipLeft
13 -> KneeLeft
14 -> AnkleLeft
15 -> HipRight
16 -> KneeRight
17 -> AnkleRight
18 -> SpineShoulder
19 -> Head
20 -> FootLeft
21 -> FootRight
22 -> HandTipLeft
23 -> ThumbLeft
24 -> HandTipRight
25 -> ThumbRight
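
A row could be parsed along the following lines. This is a sketch under stated assumptions, not an official loader: it assumes comma-separated values ordered as Frame#, then ORI(i), CONF, P(i), CONF for joints 1 to 18, then P(i), CONF for joints 19 to 25. The exact ordering and delimiter should be checked against the actual files; the function name `parse_row` is hypothetical.

```python
JOINT_NAMES = [
    "SpineBase", "SpineMid", "Neck", "ShoulderLeft", "ElbowLeft",
    "WristLeft", "HandLeft", "ShoulderRight", "ElbowRight", "WristRight",
    "HandRight", "HipLeft", "KneeLeft", "AnkleLeft", "HipRight",
    "KneeRight", "AnkleRight", "SpineShoulder", "Head", "FootLeft",
    "FootRight", "HandTipLeft", "ThumbLeft", "HandTipRight", "ThumbRight",
]
N_ORIENTED = 18  # the first 18 joints carry an orientation matrix

def parse_row(line):
    """Parse one frame into (frame_number, {joint_name: joint_data}).

    ASSUMED layout: Frame#, then per joint [ORI (9 values), CONF] if the
    joint is oriented, followed by [P (x, y, z), CONF].
    """
    values = [float(v) for v in line.strip().rstrip(",").split(",")]
    expected = 1 + N_ORIENTED * 14 + (len(JOINT_NAMES) - N_ORIENTED) * 4
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")

    frame = int(values[0])
    pos = 1
    joints = {}
    for idx, name in enumerate(JOINT_NAMES):
        joint = {}
        if idx < N_ORIENTED:
            ori = values[pos:pos + 9]
            # Row-major 3x3 matrix: elements 0..8 as listed in the format.
            joint["orientation"] = [ori[0:3], ori[3:6], ori[6:9]]
            joint["orientation_conf"] = int(values[pos + 9])
            pos += 10
        joint["position"] = tuple(values[pos:pos + 3])
        joint["position_conf"] = int(values[pos + 3])
        pos += 4
        joints[name] = joint
    return frame, joints
```

Under this assumed layout a row holds 1 + 18 × 14 + 7 × 4 = 281 values, which gives a quick sanity check when reading a file.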

If you use this dataset, please cite the following paper:
Franco A., Magnani A., Maio D. (2017) Joint Orientations from Skeleton Data for Human Activity Recognition. In: Battiato S., Gallo G., Schettini R., Stanco F. (eds) Image Analysis and Processing – ICIAP 2017. ICIAP 2017. Lecture Notes in Computer Science, vol 10484. Springer, Cham


Dataset download: CLICK HERE!

For questions, comments and suggestions, please contact