HoloAssist: An Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World (ICCV 2023)

The codebase provides guidelines for using the HoloAssist dataset and running the benchmarks.

[Project Website][paper]

Download the data and annotations

We release the dataset under the [CDLAv2] license, a permissive license.

Dataset Structure

Once the dataset is downloaded and decompressed, you will see the structure below. Each top-level folder contains the data for one recording session, and within each folder you will find the data for the different modalities. The text files with "_sync"/"_synced" in their names are synchronized to the RGB modality, since each modality runs at a different sensor rate; we use these synced modalities in the experiments. A short loading sketch follows the directory listing.

We collected our dataset using PSI Studio. More detailed information regarding the data format is available here.

  .
  ├── R007-7July-DSLR
  │   └── Export_py
  │       ├── AhatDepth
  │       │   ├── 000000.png
  │       │   ├── 000001.png
  │       │   ├── ...
  │       │   ├── AhatDepth_synced.txt
  │       │   ├── Instrinsics.txt
  │       │   ├── Pose_sync.txt
  │       │   └── Timing_sync.txt
  │       ├── Eyes
  │       │   └── Eyes_sync.txt 
  │       ├── Hands
  │       │   ├── Left_sync.txt
  │       │   └── Right_sync.txt 
  │       ├── Head
  │       │   └── Head_sync.txt 
  │       ├── IMU
  │       │   ├── Accelerometer_sync.txt
  │       │   ├── Gyroscope_sync.txt
  │       │   └── Magnetometer_sync.txt
  │       ├── Video
  │       │   ├── Pose_sync.txt
  │       │   ├── Instrinsics.txt
  │       │   └── VideoMp4Timing.txt
  │       ├── Video_pitchshift.mp4
  │       └── Video_compress.mp4
  ├── R012-7July-Nespresso/
  ├── R013-7July-Nespresso/
  ├── R014-7July-DSLR/
  └── ...
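
As a quick orientation, the sketch below lists the recording sessions and loads a couple of the synced modality files for one of them. It is a minimal example, not part of the release: the dataset root path is a placeholder, and we assume the *_sync.txt / *_synced.txt files are plain whitespace-delimited text with one row per synced RGB frame (see the PSI data-format page linked above for the exact column layout).

    import os
    import numpy as np

    DATASET_ROOT = "/path/to/holoassist"  # placeholder: point this to your download

    # Each top-level folder is one recording session, e.g. "R007-7July-DSLR".
    sessions = sorted(
        d for d in os.listdir(DATASET_ROOT)
        if os.path.isdir(os.path.join(DATASET_ROOT, d))
    )
    print(f"Found {len(sessions)} sessions")

    # Load synced modalities for the first session.
    # Assumption: the *_sync/_synced text files are whitespace-delimited,
    # one row per synced RGB frame; check the data-format docs for the columns.
    export_dir = os.path.join(DATASET_ROOT, sessions[0], "Export_py")
    depth_sync = np.loadtxt(os.path.join(export_dir, "AhatDepth", "AhatDepth_synced.txt"))
    head_sync = np.loadtxt(os.path.join(export_dir, "Head", "Head_sync.txt"))
    print("Depth rows:", depth_sync.shape, "Head rows:", head_sync.shape)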

Annotation Structure

We have released the annotations in both the raw format and a processed format, and we provide the train, validation, and test splits.

In the raw annotations, each entry follows this schema:

{
    "id": int, original label id,
    "label": "Narration", "Conversation", "Fine grained action",  or "Coarse grained action", 
    "start": start time in seconds, 
    "end": end time in seconds, 
    "type":"range",
    "attributes":{
        Differs depending on the label type; see below.
    },
},
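
A minimal sketch for reading such entries and grouping them by label type. The file name and top-level layout (one JSON list of entries per recording session) are assumptions; adapt the path and parsing to the raw-annotation files you downloaded.

    import json
    from collections import defaultdict

    # Hypothetical per-session annotation file; adjust to the actual file layout.
    with open("R007-7July-DSLR.json") as f:
        entries = json.load(f)  # assumed: a list of dicts following the schema above

    by_label = defaultdict(list)
    for entry in entries:
        by_label[entry["label"]].append(entry)

    for label, items in by_label.items():
        print(label, len(items))

    # Entries are time ranges in seconds; sort the fine-grained actions by start time.
    fine_actions = sorted(by_label["Fine grained action"], key=lambda e: e["start"])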

Attributes for Narration

    "id": int, original label id,
    "label": "Narration",  
    "start": start time in seconds, 
    "end": end time in seconds, 
    "type":"range",
    "attributes": {
        "Long-form description": Use multiple sentences and make this as long as is necessary to be exhaustive. There are a finite number of scenarios across all videos, so make sure to call out the distinctive changes between videos, in particular, mistakes that the task performer makes in the learning process that are either self-corrected or corrected by the instructor.
    }, 

Attributes for Conversation

    "id": int, original label id,
    "label": "Narration",  
    "start": start time in seconds, 
    "end": end time in seconds, 
    "type":"range",
    "attributes": {
        "Conversation Purpose":"instructor-start-conversation_other",
        "Transcription":"*unintelligible*",
        "Transcription Confidence":"low-confidence-transcription",
    }, 
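
For example, one might keep only the conversation segments with a usable transcription. This is a sketch under the same assumptions as above (entries is the parsed list of raw annotation dicts); the attribute values it filters on are the ones shown in the example entry.

    def usable_conversations(entries):
        """Yield (start, end, purpose, text) for intelligible, confident transcriptions."""
        for entry in entries:
            if entry["label"] != "Conversation":
                continue
            attrs = entry["attributes"]
            if attrs.get("Transcription") == "*unintelligible*":
                continue
            if attrs.get("Transcription Confidence") == "low-confidence-transcription":
                continue
            yield entry["start"], entry["end"], attrs["Conversation Purpose"], attrs["Transcription"]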

Attributes for Fine grained action

    "id": int, original label id,
    "label": "Fine grained action",  
    "start": start time in seconds, 
    "end": end time in seconds, 
    "type":"range",
    "attributes": {
        "Action Correctness":"Correct Action",
        "Incorrect Action Explanation":"none",
        "Incorrect Action Corrected by":"none",
        "Verb":"approach",
        "Adjective":"none",
        "Noun":"gopro",
        "adverbial":"none"
    }, 

Attributes for Coarse grained action

    "id": int, original label id,
    "label": "Coarse grained action",  
    "start": start time in seconds, 
    "end": end time in seconds, 
    "type":"range",
    "attributes": {
        "Action sentence":"The student changes the battery for the GoPro.",
        "Verb":"exchange",
        "Adjective":"none",
        "Noun":"battery"
    }, 
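
The fine-grained and coarse-grained action attributes are what the action-recognition and mistake-detection benchmarks build on. The sketch below collects (verb, noun) pairs from the fine-grained actions and flags segments whose Action Correctness is not "Correct Action"; it again assumes entries is the parsed list of raw annotation dicts.

    def fine_grained_labels(entries):
        """Collect (start, end, verb, noun, is_mistake) tuples from fine-grained actions."""
        labels = []
        for entry in entries:
            if entry["label"] != "Fine grained action":
                continue
            attrs = entry["attributes"]
            is_mistake = attrs.get("Action Correctness") != "Correct Action"
            labels.append((entry["start"], entry["end"], attrs["Verb"], attrs["Noun"], is_mistake))
        return labels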

Citation

If you find the code or data useful, please consider citing the paper:

@inproceedings{wang2023holoassist,
  title={Holoassist: an egocentric human interaction dataset for interactive ai assistants in the real world},
  author={Wang, Xin and Kwon, Taein and Rad, Mahdi and Pan, Bowen and Chakraborty, Ishani and Andrist, Sean and Bohus, Dan and Feniello, Ashley and Tekin, Bugra and Frujeri, Felipe Vieira and others},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={20270--20281},
  year={2023}
}