Geospatial Annotation with LabelMe and Segment Anything

In this episode I sat down with Kentaro Wada, a computer vision engineer at Mujin and creator of LabelMe, to explore the evolution of image annotation workflows. We discuss how his need to label data for a robotics challenge led to building one of the most widely used open-source annotation tools, and how it has evolved alongside the shift from traditional computer vision to deep learning. Kentaro explains the impact of foundation models like Segment Anything (SAM), and how annotation is rapidly moving toward a prompt-and-verify paradigm where models do the heavy lifting and humans focus on quality control. We also dive into his recent work integrating SAM into LabelMe, the challenges of applying these models to satellite imagery, and why approaches like bounding-box prompting outperform text in that domain. Finally, we cover new support for large, multi-channel geospatial data, practical deployment considerations, and what this means for scaling annotation in real-world machine learning systems.

* https://labelme.io/
* https://www.wkentaro.com/

Bio: Kentaro Wada was born in Japan in 1994. He received his B.Sc. (2016) and M.Sc. (2018) from the Mechanical Engineering and Computer Science Department at the University of Tokyo (UTokyo). In his research at UTokyo, he worked on learning-based scene understanding for robotic manipulation at the JSK Laboratory, supervised by Prof. Masayuki Inaba and Prof. Kei Okada. He received his PhD in 2022 from the Dyson Robotics Laboratory at Imperial College London, supervised by Prof. Andrew Davison. During his PhD, he worked on object-level semantic scene understanding, a general scene representation useful for robotic manipulation, and demonstrated several novel robot capabilities. He joined Mujin, Inc. in 2022 as a computer vision engineer and works on advancing robots' capabilities in real-world environments.

🚀 TIMELINE
0:00 – Kentaro is a CV engineer building vision systems for industrial automation that rely heavily on labeled data.
0:56 – He created LabelMe, an image annotation tool, originally out of necessity during a robotics challenge at university.
2:03 – This was during the shift from traditional CV to deep learning, increasing demand for annotation tools.
2:29 – LabelMe was open-sourced and remains actively developed and used.
3:24 – Annotation is shifting from manual work to AI-assisted methods using models like SAM.
4:25 – SAM enables click, box, and text-prompt-based segmentation; newer versions expand this significantly.
5:07 – Workflow shift: models generate annotations, humans verify and refine.
6:01 – AI-assisted annotation is becoming mainstream.
6:33 – SAM 3 allows one annotation to scale across many similar objects, enabling large-scale labeling.
7:21 – Geospatial support emerged after enabling large image handling.
7:32 – Satellite imagery posed challenges due to size and domain differences.
8:24 – Text prompts perform poorly on satellite data; bounding boxes work better.
8:50 – Large image support unlocked satellite annotation use cases.
9:34 – Added support for TIFF and multi-channel imagery.
10:12 – LabelMe is a Python-based desktop app with both CLI install and standalone executable.
10:50 – Open-source version is free; packaged app is paid with a trial.
11:18 – Ends with plans to demo geospatial features.
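
The large-image items in the timeline (7:21 to 8:50) point at a common pattern: a satellite scene is usually too big to hand to a segmentation model in one piece, so it is processed in overlapping tiles. Below is a minimal sketch of that idea, assuming fixed-size square tiles. It is not how LabelMe implements large-image support (the episode does not say), and `tile_windows`, `tile`, and `overlap` are hypothetical names chosen for illustration.

```python
def tile_windows(width, height, tile=1024, overlap=128):
    """Return (x0, y0, x1, y1) pixel windows covering a width x height image.

    Hypothetical helper for illustration only; not part of LabelMe or SAM.
    """
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    step = tile - overlap

    def starts(size):
        # An image smaller than one tile needs a single window.
        if size <= tile:
            return [0]
        # Slide by `step`, then add a final start so the last tile
        # ends exactly at the image edge.
        out = list(range(0, size - tile, step))
        out.append(size - tile)
        return out

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


if __name__ == "__main__":
    for window in tile_windows(2048, 1500):
        print(window)
```

With windows like these, a bounding-box prompt drawn in full-image coordinates could be shifted into a tile's frame by subtracting that tile's `x0` and `y0`, and the resulting mask shifted back the same way. The overlap is there so that objects straddling a tile boundary appear whole in at least one tile, provided they are smaller than the overlap.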

Video: Geospatial Annotation with LabelMe and Segment Anything, from the Robin Cole channel
