Skill324 repo starsupdated 6d ago

robot-perception

The robot-perception skill provides configuration, calibration, and processing pipelines for robot sensors including cameras, LiDAR, and depth sensors. It covers camera intrinsic and extrinsic calibration, RGB and point cloud processing, object detection and tracking, multi-sensor fusion and synchronization, model deployment on robot hardware, and debugging of perception failures like latency and frame drops.

View source Repository: robotics-agent-skills

Install in Claude Code

Copy

git clone --depth 1 https://github.com/arpitg1304/robotics-agent-skills /tmp/robot-perception && cp -r /tmp/robot-perception/skills/robot-perception ~/.claude/skills/robot-perception

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Robot Perception Skill

## When to Use This Skill
- Setting up and configuring camera, LiDAR, or depth sensors
- Building RGB, depth, or point cloud processing pipelines
- Calibrating cameras (intrinsic, extrinsic, hand-eye)
- Implementing object detection, segmentation, or tracking for robots
- Fusing data from multiple sensor modalities
- Streaming sensor data with proper threading and buffering
- Synchronizing multi-sensor rigs
- Deploying perception models on robot hardware (GPU, edge)
- Debugging perception failures (latency, dropped frames, misalignment)

## Sensor Landscape

### Sensor Types and Characteristics

```
Sensor Type        Output              Range       Rate     Best For
─────────────────────────────────────────────────────────────────────────
RGB Camera         (H,W,3) uint8       ∞           30-120Hz Object detection, tracking, visual servoing
Stereo Camera      (H,W,3)+(H,W,3)    0.3-20m     30-90Hz  Dense depth from passive stereo
Structured Light   (H,W) float + RGB   0.2-10m     30Hz     Indoor manipulation, short range
ToF Depth          (H,W) float + RGB   0.1-10m     30Hz     Indoor, medium range
LiDAR (spinning)   (N,3) or (N,4)     0.5-200m    10-20Hz  Outdoor navigation, mapping
LiDAR (solid-st.)  (N,3)              0.5-200m    10-30Hz  Automotive, outdoor
IMU                (6,) or (9,)        N/A         200-1kHz Orientation, motion estimation
Force/Torque       (6,) float          N/A         1kHz+    Contact detection, force control
Tactile            (H,W) or (N,3)      Contact     30-100Hz Grasp quality, texture
Event Camera       Events (x,y,t,p)    ∞           μs       High-speed tracking, HDR scenes
```

### Common Sensor Hardware

```
Device             Type               SDK/Driver           ROS2 Package
──────────────────────────────────────────────────────────────────────────
Intel RealSense    Structured Light   pyrealsense2         realsense2_camera
Stereolabs ZED     Stereo + IMU       pyzed                zed_wrapper
Luxonis OAK-D      Stereo + Neural    depthai              depthai_ros
FLIR/Basler        Industrial RGB     PySpin/pypylon       spinnaker_camera_driver
Velodyne           Spinning LiDAR     velodyne_driver      velodyne
Ouster             Spinning LiDAR     ouster-sdk           ros2_ouster
Livox              Solid-state LiDAR  livox_sdk            livox_ros2_driver
USB Webcam         RGB                OpenCV VideoCapture  usb_cam / v4l2_camera
```

## Camera Models and Calibration

### Pinhole Camera Model

```
                    3D World Point (X, Y, Z)
                           |
                    [R | t] — Extrinsic (world → camera)
                           |
                    Camera Point (Xc, Yc, Zc)
                           |
                    K — Intrinsic (camera → pixel)
                           |
                    Pixel (u, v)

K = [ fx   0   cx ]      fx, fy = focal lengths (pixels)
    [  0  fy   cy ]      cx, cy = principal point
    [  0   0    1 ]

Projection:  [u, v, 1]^T = K @ [R | t] @ [X, Y, Z, 1]^T
```

### Intrinsic Calibration

```python
import cv2
import numpy as np
from pathlib import Path

class IntrinsicCalibrator:
    """Camera intrinsic calibration using checkerboard pattern"""

    def __init__(self, board_size=(9, 6), square_size_m=0.025):
        self.board_size = board_size
        self.square_size = square_size_m

        # Prepare object points (3D coordinates of checkerboard corners)
        self.objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
        self.objp[:, :2] = np.mgrid[
            0:board_size[0], 0:board_size[1]
        ].T.reshape(-1, 2) * square_size_m

    def collect_calibration_images(self, camera, num_images=30,
                                    min_coverage=0.6):
        """Collect calibration images with good spatial coverage.

        IMPORTANT: Move the board to cover all regions of the image,
        including corners and edges. Tilt the board at various angles.
        Bad coverage = bad calibration, especially at image edges.
        """
        obj_points = []
        img_points = []
        coverage_map = np.zeros((4, 4), dtype=int)  # Track board positions

        while len(obj_points) < num_images:
            frame = camera.capture()
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

            found, corners = cv2.findChessboardCorners(
                gray, self.board_size,
                cv2.CALIB_CB_ADAPTIVE_THRESH |
                cv2.CALIB_CB_NORMALIZE_IMAGE |
                cv2.CALIB_CB_FAST_CHECK
            )

            if found:
                # Sub-pixel refinement — critical for accuracy
                criteria = (cv2.TERM_CRITERIA_EPS +
                           cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
                corners = cv2.cornerSubPix(
                    gray, corners, (11, 11), (-1, -1), criteria)

                # Track coverage
                center = corners.mean(axis=0).flatten()
                grid_x = int(center[0] / gray.shape[1] * 4)
                grid_y = int(center[1] / gray.shape[0] * 4)
                grid_x = min(grid_x, 3)
                grid_y = min(grid_y, 3)
                coverage_map[grid_y, grid_x] += 1

                obj_points.append(self.objp)
                img_points.append(corners)

        coverage = (coverage_map > 0).sum() / coverage_map.size
        if coverage < min_coverage:
            print(f"WARNING: Only {coverage:.0%} coverage. "
                  f"Move board to uncovered regions.")

        return obj_points, img_points, gray.shape[::-1]

    def calibrate(self, obj_points, img_points, image_size):
        """Run calibration and return camera matrix + distortion coeffs"""
        ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
            obj_points, img_points, image_size, None, None)

        if ret > 1.0:
            print(f"WARNING: High reprojection error ({ret:.3f} px). "
                  f"Check image quality and board dete