Silent speech recognition is a promising technology that decodes human speech without requiring audio signals, enabling private human-computer interactions. In this paper, we propose Watch Your Mouth, a novel method that leverages depth sensing to enable accurate silent speech recognition. Depth information provides unique resilience against environmental factors such as variations in lighting and device orientation, while also addressing privacy concerns by eliminating the need for sensitive RGB data. We first built a deep learning model that locates lips using depth data, and then designed a deep learning pipeline to efficiently learn from point clouds and translate lip movements into commands and sentences. We evaluated our technique and found it effective across diverse sensor locations: On-Head, On-Wrist, and In-Environment. Watch Your Mouth outperformed the state-of-the-art RGB-based method, demonstrating its potential as an accurate and reliable input technique.
Figure 1: Left: Watch Your Mouth performs silent speech recognition through depth sensing at three different sensor locations: On-Wrist, On-Head, and In-Environment. Right: The deep learning architecture of Watch Your Mouth, where N is the number of points in each point cloud frame, L is the length of the video, 𝑠 is the sampling rate, 𝑅𝑠 and 𝑅𝑡 are the spatial radius and temporal kernel size, respectively, and 𝐶 is the dimension of the transformer layer in this pipeline.
Figure 2: We recruited 10 participants (6 female) for our data collection. Additionally, 5 of the participants wore glasses, and 2 participants were native English speakers without any discernible accents. During data collection, we sequentially placed our depth sensing device at three distinct sensor locations: On-Wrist (left), On-Head (middle), and In-Environment (right).
You can find the point cloud sentence dataset here: Sentence Dataset.
Figure 3: CER and WER from the within-user performance evaluation with error bars indicating standard deviations across folds.
Figure 4: Comparison of sentence recognition performance across different sensing modalities/models, broken down by within- vs. cross-user train-test methods (left) and by sensor location (right). Error bars indicate standard deviations across participants.
Figure 5: Left: Confusion Matrix of 5-fold within-user command recognition across 30 distinct commands. Accuracy is over 95% unless indicated otherwise. Right: Correlations between viseme length and recognition accuracy.
conda create -n <YourEnvName> python=3.8
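Activate the environment before installing the dependencies:

conda activate <YourEnvName>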
pip install tqdm textblob editdistance einops
To ensure compatibility between PyTorch and CUDA, first determine the version of CUDA installed on your system by running the following command in your terminal:
nvcc --version
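If nvcc is not on your PATH, you can alternatively check the highest CUDA version your GPU driver supports (note that this may differ from the installed CUDA toolkit version):

nvidia-smi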
Once you have your CUDA version, visit the PyTorch official installation guide to find the PyTorch version that corresponds to your CUDA version, for example:
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
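After installation, you can sanity-check that PyTorch was built with CUDA support and can see your GPU; both calls below are standard PyTorch APIs:

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"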
git clone https://github.com/hilab-open-source/WatchYourMouth.git
Compile the Pointnet2 CUDA layers with the following commands:
cd WatchYourMouth/modules
pip install .
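To confirm that the CUDA extension compiled correctly, you can try importing it from Python. The module name below is only an assumption (PointNet++ codebases often expose a module such as pointnet2_utils); replace it with the name defined in modules/setup.py:

# the import name below is a placeholder -- use the module name from modules/setup.py
python -c "import pointnet2_utils"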
Please download the dataset HERE and store it in the directory.
Update the dataset path in this line to the location where your dataset is stored, then start training with the following commands:
cd WatchYourMouth/
python train_sentences.py
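Training runs can take a while; if you want to keep a record of the console output, plain shell redirection works (train.log is just an example filename, not a flag of the training script):

python train_sentences.py 2>&1 | tee train.log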
@inproceedings{wang2024watch,
title={Watch Your Mouth: Silent Speech Recognition with Depth Sensing},
author={Wang, Xue and Su, Zixiong and Rekimoto, Jun and Zhang, Yang},
booktitle={Proceedings of the CHI Conference on Human Factors in Computing Systems},
pages={1--15},
year={2024}
}