BodySLAM™ is a high-performance deep learning runtime engine for human pose estimation that analyzes ordinary 2D RGB video and returns 3D motion data in real time. It can process multiple 2D camera feeds and recognize human motion, gestures, and activity. Capable of tracking 63 distinct body parts, BodySLAM enables developers to create local solutions across industries including robotics, health care, retail, transportation, and entertainment.

Human-Machine Interaction Diagram

2D Video

Using existing consumer hardware such as smartphones, tablets, and cameras, users record video from one or more cameras for BodySLAM to process in real time. A series of convolutional neural networks (CNNs) analyzes the 2D video, tracking up to 63 body parts, including each finger and toe, and returns 3D human motion data for use in a variety of applications.
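To make the output concrete, here is a minimal sketch of what a per-frame 3D motion result might look like. The class and field names (`Joint3D`, `PoseFrame`, `confidence`, and so on) are illustrative assumptions, not BodySLAM's documented data format.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical per-frame 3D output; field names and units are
# assumptions for illustration only.
@dataclass
class Joint3D:
    name: str          # e.g. "left_wrist" or "right_index_tip"
    x: float           # position, camera-relative
    y: float
    z: float
    confidence: float  # detection confidence in [0.0, 1.0]

@dataclass
class PoseFrame:
    timestamp_ms: int
    joints: List[Joint3D]  # up to 63 tracked body parts

frame = PoseFrame(
    timestamp_ms=0,
    joints=[Joint3D("left_wrist", 0.12, 1.05, 2.30, 0.97)],
)
print(len(frame.joints))  # → 1
```

An application would receive one such frame per processed video frame and feed it into rendering, analytics, or control logic.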


BodySLAM automatically separates the people from the background, allowing for the implementation of synthetic environments without the use of a physical green screen.

2D Skeletons

Our powerful 2D CNN overlays a 2D skeleton on the people in the video, even under adverse lighting conditions and across varied clothing, body shapes, and camera viewpoints. It can track and recognize a single individual, a small group, or many individuals in a large crowd. The 2D skeleton serves as the landmark from which the 3D pose is extracted and filled out.
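A 2D skeleton of this kind is commonly represented as named keypoints in pixel coordinates plus the bone connections used to draw the overlay. The sketch below assumes that representation; the names, fields, and `visible` helper are hypothetical, not part of BodySLAM.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical 2D-skeleton representation: keypoints in pixel
# coordinates and bone pairs for drawing the overlay.
@dataclass
class Keypoint2D:
    name: str
    u: float      # pixel column
    v: float      # pixel row
    score: float  # detection confidence in [0.0, 1.0]

BONES = [("left_shoulder", "left_elbow"), ("left_elbow", "left_wrist")]

def visible(kps: List[Keypoint2D], threshold: float = 0.5) -> List[Keypoint2D]:
    """Keep only keypoints detected above a confidence threshold."""
    return [k for k in kps if k.score >= threshold]

kps = [
    Keypoint2D("left_shoulder", 320.0, 180.0, 0.92),
    Keypoint2D("left_elbow", 300.0, 260.0, 0.31),
]
print([k.name for k in visible(kps)])  # → ['left_shoulder']
```

Filtering by confidence like this is a typical first step before lifting the 2D keypoints into a 3D pose.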

3D Skeletons

After analysis by the 3D CNN, BodySLAM returns a 3D skeleton to the user's application for further processing.

Human Tracking

BodySLAM assigns individuals unique tracking IDs, enabling users to identify and track people in 3D space over time.
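One natural use of per-person tracking IDs is to accumulate each individual's positions over time into a trajectory. The sketch below assumes detections arrive as `(timestamp, tracking_id, x, y, z)` tuples; that shape and the sample values are illustrative, not BodySLAM's actual interface.

```python
from collections import defaultdict

# Hypothetical detection stream: (timestamp_ms, tracking_id, x, y, z).
detections = [
    (0, 7, 0.0, 0.0, 2.0),
    (33, 7, 0.1, 0.0, 2.0),
    (33, 9, 1.4, 0.0, 3.1),
]

# Group detections by tracking ID to build per-person trajectories.
trajectories = defaultdict(list)  # tracking_id -> [(t, x, y, z), ...]
for t, tid, x, y, z in detections:
    trajectories[tid].append((t, x, y, z))

print(sorted(trajectories))       # → [7, 9]  (IDs seen so far)
print(len(trajectories[7]))       # → 2  (frames with person 7)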

Activity Recognition

As a deep learning system, BodySLAM recognizes and understands human activity, such as walking, sitting, falling, or picking up an object, enabling applications to make decisions based on that information.
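Downstream decision logic based on recognized activities could be as simple as the sketch below. The activity labels and responses are hypothetical examples (e.g. a health-care deployment alerting on a fall), not a documented BodySLAM label set.

```python
# Hypothetical decision logic driven by recognized activity labels.
def respond(activity: str) -> str:
    """Map a recognized activity label to an application action."""
    if activity == "falling":
        return "alert_caregiver"   # e.g. notify staff in a care facility
    if activity == "picking_up_object":
        return "log_event"         # e.g. record a retail interaction
    return "no_action"

print(respond("falling"))  # → alert_caregiver
print(respond("walking"))  # → no_action
```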

Gesture Recognition

At present, smart assistants on phones and tablets recognize only voice commands. BodySLAM adds gesture recognition, so smart assistants can also follow non-verbal commands such as hand gestures and pointing.
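A gesture-driven assistant would typically map recognized gesture labels to commands. The gesture names and commands below are illustrative assumptions; BodySLAM's actual gesture vocabulary is not specified here.

```python
# Hypothetical mapping from recognized gestures to assistant commands.
GESTURE_COMMANDS = {
    "palm_raised": "pause_playback",
    "thumbs_up": "confirm",
    "point_left": "previous_track",
}

def handle_gesture(gesture: str) -> str:
    """Translate a recognized gesture into a command, or ignore it."""
    return GESTURE_COMMANDS.get(gesture, "ignore")

print(handle_gesture("thumbs_up"))  # → confirm
print(handle_gesture("wave"))       # → ignore
```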


For information about our Denoiser SDK, see our product brief.