
Research on Wearable, Ubiquitous, Entertainment Computing

   Last Updated 09/28/2015

Project: Constructing Mobile Phone-Based Context Recognition Systems for Real World Use

If you want PDF files, please contact me.



This project addresses the following problems that can arise when context recognition systems are used in the real world. Unless these problems are solved, context-aware applications will remain confined to laboratory environments and will not be widely adopted in real environments.

  1. In general, developers construct a recognition model from data and ground truth, and then write an instruction manual that describes the gestures in text, figures, or movies. An end-user reads the instructions, imagines the gesture, and then performs it. However, the end-user may imagine a gesture different from the one the developers assumed. We investigate how user gestures change depending on how the instructions are described.
  2. Many interactions with objects occur in daily life; however, how differences in gesture motion affect recognition has not been examined. We investigated the effects on recognition accuracy of changing the number and kinds of gestures.
  3. Mobile gesture recognition systems generally require several training samples before recognition can take place; however, recognition accuracy deteriorates over time because the trajectories of the gestures change due to fatigue or forgetfulness. We propose a method of finding appropriate data for training.


As a technical breakthrough for 3) above, we proposed two methods of finding, in real time, the appropriate point at which to stop collecting training data. One of the methods calculates the mean and variance of the distances between the current sample and the 10 adjacent samples. When the mean and variance have converged, the method determines that the gesture motion has converged and stops collecting training data. The table on the right shows the average distance between the extracted training data and the testing data collected over seven days; a smaller distance means that the extracted training data perform better. Comparison 1 uses the first five samples on the first day for training, as conventional methods do. Comparison 2 uses, for training, the data at the point where the gesture converges, determined by batch processing. Comparison 3 uses the first five samples and updates the training data every day, which would perform well but imposes a heavier burden on the user. The values in parentheses indicate the number of training samples obtained before each method stopped collecting data. As the table shows, the proposed method performed better than comparison 1, the conventional method. Compared with comparison 2, the distance is longer, but fewer training samples (the values in parentheses) are needed, and the distance is close to that of comparison 3. In summary, our method found better training data from the viewpoint of both recognition performance and the number of training samples collected.
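The stopping rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation from the paper: the distance between two gesture samples is assumed to be dynamic time warping (DTW) over multi-axis frames, the "10 adjacent samples" are taken to be the 10 samples collected immediately before the current one, and "converged" is approximated by a hypothetical rule in which the mean and the variance each change by less than a relative tolerance `tol` for `patience` consecutive samples. The helper `mean_train_test_distance` computes an average training/testing distance analogous to the one reported in the table. All function and parameter names are illustrative.

```python
import math
from statistics import mean, pvariance
from typing import List, Optional, Sequence

# One gesture sample: a sequence of frames, each frame a sequence of sensor
# axes (e.g. 3-axis acceleration). This representation is an assumption.
Sample = Sequence[Sequence[float]]


def dtw_distance(a: Sample, b: Sample) -> float:
    """Dynamic time warping distance with Euclidean cost between frames."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def stop_index(samples: List[Sample], window: int = 10,
               tol: float = 0.05, patience: int = 3) -> Optional[int]:
    """Return the number of samples collected when the mean and variance of
    the distances to the previous `window` samples stop changing, or None if
    they never converge. `tol` and `patience` are hypothetical parameters."""
    prev = None
    stable = 0
    for k in range(window, len(samples)):
        dists = [dtw_distance(samples[k], samples[j]) for j in range(k - window, k)]
        stats = (mean(dists), pvariance(dists))
        # Relative change of both statistics must be below tol.
        converged = prev is not None and all(
            abs(cur - old) <= tol * max(abs(old), 1e-9)
            for cur, old in zip(stats, prev)
        )
        stable = stable + 1 if converged else 0
        if stable >= patience:
            return k + 1  # samples 0..k have been collected
        prev = stats
    return None


def mean_train_test_distance(train: List[Sample], test: List[Sample]) -> float:
    """Average DTW distance over all training/testing pairs."""
    return mean(dtw_distance(tr, te) for tr in train for te in test)
```

`stop_index` scans a list for clarity; in a real-time setting the same check would run once per newly collected sample, costing `window` DTW computations each time.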

As an innovative application for 1), we have implemented a browser enhanced with gesture recognition technology: the gesture-based browser shown in the figure. This browser uses several of the gestures used in this project. For example, a tilt action invokes page-up/down, a swing action invokes go back/forward, a rap action invokes stop loading/reload, a shift action invokes scroll/zoom, and drawing a circle invokes add bookmark. We do not think all functions should be assigned to gestures. However, assigning many functions to a limited number of buttons forces users to select functions from a pop-up menu after pressing a “menu” button. The first level has six to eight choices at most, so users must select a “more” button to open the next pop-up and reach other functions. With a gesture, such functions can be invoked with a single action. Recent smartphones have displays of about 5 inches; for example, the GALAXY S III by SAMSUNG has a 4.8-inch display and the ONE X by HTC has a 4.7-inch display. Users sometimes need both hands to touch the display, since the thumb usually does not reach its far side. Even in this case, gesture-based interaction can be performed with one hand (except for tapping and knocking) and does not interrupt other activities, such as eating and writing.
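One way to picture the browser's gesture assignment is as a lookup table from recognized gestures to commands. The sketch below assumes a hypothetical recognizer that reports a gesture name plus a variant (such as a direction) distinguishing paired commands like page-up and page-down; the variant names, the command names, and which variant maps to which command are illustrative assumptions, not taken from the original browser.

```python
from typing import Dict, Optional, Tuple

# Hypothetical (gesture, variant) -> browser command table. The gesture names
# follow the text; the variant names and their pairing with commands are assumed.
GESTURE_ACTIONS: Dict[Tuple[str, str], str] = {
    ("tilt", "forward"): "page_up",
    ("tilt", "backward"): "page_down",
    ("swing", "left"): "go_back",
    ("swing", "right"): "go_forward",
    ("rap", "once"): "stop_loading",
    ("rap", "twice"): "reload",
    ("shift", "vertical"): "scroll",
    ("shift", "depth"): "zoom",
    ("circle", "any"): "add_bookmark",  # drawing a circle in any direction
}


def dispatch(gesture: str, variant: str) -> Optional[str]:
    """Return the browser command for a recognized gesture, or None if unmapped."""
    exact = GESTURE_ACTIONS.get((gesture, variant))
    if exact is not None:
        return exact
    # Fall back to a variant-independent binding such as the circle gesture.
    return GESTURE_ACTIONS.get((gesture, "any"))


# Example: a recognized leftward swing triggers "go back".
command = dispatch("swing", "left")
```

Unmapped gestures return `None`, so motions outside the table trigger nothing, which fits the view that not every function should be assigned to a gesture.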
From the experimental investigation for 2), we confirmed that communication between developers and end-users was worse with ambiguous instructions, as shown in the figure on the right; however, specific words that have several meanings depending on the situation also make instructions problematic. For example, when the throwing gesture was instructed with the word “ball”, subjects who originally threw overhand switched to throwing underhand. A similar situation occurred with the word “door”, since there are several types of doors, such as those with or without a doorknob or a door handle.


Data set

We have made the dataset gathered in this project publicly available. It includes 3-axis acceleration data and 3-axis gyroscope data for 27 types of gestures performed by 7 people. The gesture types are:


You can download the raw data here.


In addition, we provide the gesture data used in the paper entitled "Determining a Number of Training Data for Gesture Recognition Considering Decay in Gesture Movements." This data set consists of CSV files for (1) 5 participants x 2 types of gestures x 200 repetitions x 7 days and (2) 5 participants x 10 types of gestures x 100 repetitions.

You can download the raw data here.
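The CSV layout of the released files is not described on this page, so the following loader is only a sketch under an assumed layout: one file per gesture trial, one row per time step, numeric sensor columns, and an optional header row. The function name `load_trial` and the example column names are hypothetical. The block also spells out the number of trials implied by the two data set descriptions above.

```python
import csv
import io
from typing import List, TextIO

# Trial counts implied by the data set descriptions.
TRIALS_DECAY = 5 * 2 * 200 * 7   # participants x gestures x repetitions x days = 14,000
TRIALS_TYPES = 5 * 10 * 100      # participants x gestures x repetitions = 5,000


def load_trial(stream: TextIO) -> List[List[float]]:
    """Read one gesture trial from CSV (assumed layout: one row per time
    step, numeric columns only, optional header row)."""
    rows: List[List[float]] = []
    for record in csv.reader(stream):
        if not record:  # skip blank lines
            continue
        try:
            rows.append([float(value) for value in record])
        except ValueError:
            if rows:  # non-numeric data after numeric rows: not the assumed layout
                raise
            # A non-numeric first row is treated as a header and skipped.
    return rows


# Usage with an in-memory example (hypothetical column names).
example = io.StringIO("ax,ay,az\n0.1,0.2,9.8\n0.0,0.3,9.7\n")
trial = load_trial(example)
```

If the actual files store several trials per file or include label columns, the loader would need to split rows accordingly.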


Our Publications related to this project

  1. Kazuya MURAO, Tsutomu TERADA, Ai YANO, and Ryuichi MATSUKURA, "Evaluating Gesture Recognition by Multiple-Sensor-Containing Mobile Devices," Proc. of the 15th International Symposium on Wearable Computers (ISWC '11), pp. 55-58 (June 2011).
  2. Kazuya MURAO and Tsutomu TERADA, "Evaluating Effect of Concreteness in Instructions for Gesture Recognition," Proc. of the 15th International Symposium on Wearable Computers (ISWC '11), pp. 121-122 (June 2011).
  3. Tetsuya YAMAMOTO, Tsutomu TERADA, and Masahiko TSUKAMOTO, "Designing Gestures for Hands and Feet in Daily Life," Proc. of the International Symposium on Emerging Research Projects, Applications and Services (ERPAS 2011), pp. 285-288 (Dec. 2011).
  4. Nobuo KAWAGUCHI, Hodaka WATANABE, Tianhui YANG, Nobuhiro OGAWA, Yohei IWASAKI, Katsuhiko KAJI, Tsutomu TERADA, Kazuya MURAO, Hisakazu HADA, Sozo INOUE, Yasuyuki SUMI, Yoshihiro KAWAHARA, and Nobuhiko NISHIO, "HASC2012corpus: Large Scale Human Activity Corpus and Its Application," Proc. of the 2nd International Workshop of Mobile Sensing: From Smartphones and Wearables to Big Data, pp. 10-14 (Feb. 2012).
  5. Kazuya MURAO, Tsutomu TERADA, Ai YANO, and Ryuichi MATSUKURA, "Evaluation Study on Sensor Placement and Gesture Selection for Mobile Devices," Proc. of the 11th International Conference on Mobile and Ubiquitous Multimedia (MUM 2012), No. 7, pp. 1-8 (Dec. 2012).
  6. Gaku YOSHIDA, Kazuya MURAO, Tsutomu TERADA, and Masahiko TSUKAMOTO, "Method of Determining Training Data for Gesture Recognition considering Decay of Gesture Movements," Proc. of the 10th IEEE Workshop on Context Modeling and Reasoning 2013 (CoMoRea 2013), pp. 14-19 (Mar. 2013).


This project is supported by Microsoft Research Asia Windows Phone Academic Program (2012).