Key Design Concepts
Core Data Model
DUSTrack organizes point tracking data using three hierarchical concepts: Annotations, Labels, and Layers.
Annotations
An annotation is a single data point consisting of a frame number (integer) and a 2D pixel location (x, y coordinates as floats).
Example: 5: [120.3, 240.7] means at frame 5, the point is located at pixel coordinates (120.3, 240.7).
Annotations are the atomic units of tracking data—each represents one observation of a point at one moment in time.
Labels
A label is the identifier for one anatomical point tracked throughout the video. Each label contains a collection of annotations across different frames. Labels are numeric strings for keyboard efficiency: press the number keys 0-9 to switch between labels "0" through "9" during annotation. When tracking more than 10 points, reach the extended ranges ("10"-"19", "20"-"29", etc.) with Shift+, and Shift+. (Shift plus the comma or period key).
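The keyboard scheme can be pictured as banks of ten labels that Shift+, and Shift+. move between. The sketch below is a hypothetical model, not DUSTrack's implementation: it assumes Shift+. moves to the next bank, Shift+, moves to the previous one (never below the first bank), and a digit key then selects label bank * 10 + digit. The class and method names are illustrative only.

```python
class LabelSelector:
    """Hypothetical model of numeric-label selection in banks of ten."""

    def __init__(self) -> None:
        self.bank = 0  # bank 0 -> labels "0"-"9", bank 1 -> "10"-"19", ...

    def next_bank(self) -> None:
        # Assumed behavior of Shift+.
        self.bank += 1

    def prev_bank(self) -> None:
        # Assumed behavior of Shift+, (clamped at the first bank)
        self.bank = max(0, self.bank - 1)

    def press_digit(self, digit: int) -> str:
        # A digit key selects one label within the active bank
        if not 0 <= digit <= 9:
            raise ValueError("digit must be 0-9")
        return str(self.bank * 10 + digit)


selector = LabelSelector()
print(selector.press_digit(3))  # 3
selector.next_bank()
print(selector.press_digit(3))  # 13
```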
While numbers make manual annotation efficient, we recommend keeping human-readable names for each anatomical landmark in a separate .txt file. Create a dlc_trackermap.txt file to map numeric labels to anatomical names:
point0 - muscle_boundary
point1 - fascia
point2 - bone_surface
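For analysis scripts, the map can be read back into a dictionary keyed by DUSTrack's numeric labels. This sketch assumes every line has the form pointN - name, as in the example above, and strips the point prefix so that the keys match labels such as "0". The function name parse_trackermap is hypothetical.

```python
import re

# Matches lines such as "point0 - muscle_boundary"
_LINE = re.compile(r"^\s*point(\d+)\s*-\s*(\S.*?)\s*$")


def parse_trackermap(text: str) -> dict[str, str]:
    """Map numeric label strings ("0", "1", ...) to anatomical names."""
    names = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:  # skip blank or malformed lines
            names[match.group(1)] = match.group(2)
    return names


example = """point0 - muscle_boundary
point1 - fascia
point2 - bone_surface"""
print(parse_trackermap(example))
# {'0': 'muscle_boundary', '1': 'fascia', '2': 'bone_surface'}
```

In practice you would pass it the file contents, for example parse_trackermap(open("dlc_trackermap.txt").read()).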
Data structure: Each label stores a dictionary mapping frame numbers to pixel coordinates. In the example below, label "0" holds the pixel location of the same anatomical landmark in three different frames:
"0": {5: [120.3, 240.7], 10: [122.1, 241.5], 15: [123.8, 242.1]} # label "0": 3 annotations
Layers
A layer is a complete annotation set stored in one file. Each layer contains multiple labels (typically tracking different anatomical landmarks).
Complete example:
layer.data = {
"0": {5: [120.3, 240.7], 10: [122.1, 241.5], 15: [123.8, 242.1]}, # label "0" with 3 annotations
"1": {5: [130.5, 250.2], 10: [131.8, 251.0]}, # label "1" with 2 annotations
}
In this structure:
Layer: The entire layer.data dictionary plus metadata (video path, etc.)
Labels: "0" and "1" (two different anatomical points)
Annotations: Individual frame-coordinate pairs like 5: [120.3, 240.7]
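Because a layer's data is a nested dictionary (label → frame → [x, y]), ordinary dictionary operations are enough to inspect it. The sketch below uses only the example values above, with a plain dict named layer_data standing in for layer.data; it does not call the DUSTrack API.

```python
# The complete example above, as a plain dictionary standing in for layer.data
layer_data = {
    "0": {5: [120.3, 240.7], 10: [122.1, 241.5], 15: [123.8, 242.1]},
    "1": {5: [130.5, 250.2], 10: [131.8, 251.0]},
}

# Number of annotations and annotated frames per label
for label, annotations in layer_data.items():
    print(f'label "{label}": {len(annotations)} annotations at frames {sorted(annotations)}')

# One annotation: label "1" at frame 10
x, y = layer_data["1"][10]
print(x, y)  # 131.8 251.0

# Frames where every label has an annotation
common = set.intersection(*(set(a) for a in layer_data.values()))
print(sorted(common))  # [5, 10]
```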
Understanding Layers and File Naming
Layers in DUSTrack represent individual annotation sessions or annotators. Each layer corresponds to a single file and is managed internally by the VideoAnnotation class. By convention, layers are typically named with the annotator's initials.
When you save your annotations using the s keyboard shortcut, DUSTrack generates a JSON file following the pattern: {video_name}_annotations_{layer_name}.json.
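The pattern can be reproduced in a script, for example to check whether a layer has already been saved. This sketch assumes {video_name} is the video file name without its extension and that the JSON file sits in the same folder as the video, which matches the video_annotations_manual.json example under Working with Layers. The helper name annotation_path is hypothetical.

```python
from pathlib import Path


def annotation_path(video_path: str, layer_name: str) -> Path:
    """Build {video_name}_annotations_{layer_name}.json next to the video."""
    video = Path(video_path)
    # video.stem drops the extension: "video.mp4" -> "video"
    return video.with_name(f"{video.stem}_annotations_{layer_name}.json")


path = annotation_path("data/video.mp4", "manual")
print(path.name)      # video_annotations_manual.json
print(path.exists())  # True only if this layer has been saved
```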
DeepLabCut (DLC) uses the .h5 file format. When DUSTrack prepares annotations for training a ResNet model with DLC, it converts them into a .h5 file. DLC's model predictions are also stored in .h5 format and can be loaded directly into DUSTrack (see Working with Layers).
Tip: Use the VideoAnnotation class independently in analysis scripts to read annotations stored in .json or .h5 files.
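If you read a .json file without DUSTrack, keep in mind that JSON object keys are always strings, so integer frame numbers do not survive a round trip unchanged. The sketch below assumes, without confirmation from DUSTrack's documentation, that the file stores a label → frame → [x, y] mapping shaped like layer.data; inspect a saved file before relying on it. The helper name load_layer_json is hypothetical.

```python
import json


def load_layer_json(text: str) -> dict[str, dict[int, list[float]]]:
    """Parse label -> frame -> [x, y] JSON, restoring integer frame numbers."""
    raw = json.loads(text)
    return {
        label: {int(frame): xy for frame, xy in annotations.items()}
        for label, annotations in raw.items()
    }


layer_data = {"0": {5: [120.3, 240.7], 10: [122.1, 241.5]}}
text = json.dumps(layer_data)  # frame keys are written as "5", "10"
print(text)
restored = load_layer_json(text)
print(restored == layer_data)  # True
```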
Why multiple layers?
DUSTrack supports simultaneous layers to enable:
Iterative Refinement Workflow
Layer "manual": Initial manual annotations on sparse frames
Layer "dlc_iter1": First DLC model predictions
Layer "dlc_iter2": Refined model after correcting mislabeled frames
Layer "lkmovavg_0.500": Post-processed results with jitter reduction
Comparison Between Sources
Compare manual ground truth vs. automated predictions
Evaluate different post-processing window sizes
Visually assess inter-rater reliability between annotators
Non-destructive Editing
Keep original annotations while experimenting with refinements
Roll back to previous versions if needed
Current Layer vs. Overlay Layer
The GUI displays two layers simultaneously:
Current Layer (opaque markers): The layer you're actively editing. Press - / = to cycle through layers.
Overlay Layer (translucent trace): A reference layer for comparison. Press [ / ] to cycle through overlays or set the overlay to None.
Common patterns:
Manual refinement: Current = "manual", Overlay = "dlc_iter1" (copy good predictions with the c key)
Quality assessment: Current = "lkmovavg_0.500", Overlay = "dlc_iter1" (compare smoothed vs. raw)
Working with Layers
Creating layers:
import dustrack
# Initialize with single layer
tracker = dustrack.open('video.mp4', "manual")
# If video_annotations_manual.json file exists in the same folder as video.mp4,
# then annotations from this file will be loaded into the "manual" layer.
# Otherwise, an "empty" layer will be created.
# Initialize with multiple layers
tracker = dustrack.open('video.mp4', ["manual", "dlc_iter1"])
# Initialize with specific file paths
tracker = dustrack.open('video.mp4', {
'manual': 'path/to/manual_annotations.json',
'predictions': 'path/to/dlc_predictions.h5'
})
Copying between layers (keyboard shortcuts):
c: Copy the current label's annotation from the overlay to the current layer (current frame only)
Alt+C: Copy annotations at marked frames of interest from the overlay
Ctrl+Alt+C: Copy annotations in the selected interval from the overlay
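The three copy shortcuts differ only in which frames they take from the overlay. The sketch below models that on plain dictionaries for a single label. The function name is hypothetical, and it assumes that a copy overwrites any existing annotation at the same frame and skips frames where the overlay has no annotation.

```python
def copy_from_overlay(current: dict[int, list[float]],
                      overlay: dict[int, list[float]],
                      frames) -> None:
    """Copy one label's overlay annotations at `frames` into `current` in place."""
    for frame in frames:
        if frame in overlay:                        # skip frames the overlay lacks
            current[frame] = list(overlay[frame])   # copy the list, don't alias it


overlay = {f: [100.0 + f, 200.0] for f in range(60)}  # dense, e.g. DLC output
current = {10: [111.0, 201.0]}                         # sparse manual layer

copy_from_overlay(current, overlay, [25])           # like c at frame 25
copy_from_overlay(current, overlay, [30, 42])       # like Alt+C with two marked frames
copy_from_overlay(current, overlay, range(50, 55))  # like Ctrl+Alt+C on frames 50-54
print(sorted(current))  # [10, 25, 30, 42, 50, 51, 52, 53, 54]
```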
The Buffer Layer: Every DUSTrack session includes a special "buffer" layer that serves as scratch space for temporary storage and experimentation.
Derived / dense layers: a few layers are produced from other layers by built-in operations and are rendered as continuous lines (not dots) by default:
dlc_iteration-N_M: DLC inference for iteration N at step M. Produced by Train DLC model; added live to the session after training completes.
dlccorr: Applied-manual-corrections layer. Produced by Apply manual corrections: the active sparse manual layer's edits are spliced into the current DLC overlay's per-frame trace, yielding a dense annotation. Excluded from DLC training input.
lkmovavg_<window>: Lucas-Kanade RSTC jitter-reduction output. Produced by Reduce jitter on a dense source layer (typical input: dlccorr after corrections).
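The splice performed by Apply manual corrections can be pictured as laying sparse edits over a dense trace. The sketch below is an assumption about that operation, not DUSTrack's code: at every frame the manual layer annotates, the manual position replaces the DLC prediction, and every other frame keeps its DLC value. The function name is hypothetical.

```python
def splice_corrections(dlc_trace: dict[int, list[float]],
                       manual: dict[int, list[float]]) -> dict[int, list[float]]:
    """Return a new dense trace: DLC predictions with manual frames substituted."""
    corrected = {frame: list(xy) for frame, xy in dlc_trace.items()}
    for frame, xy in manual.items():
        corrected[frame] = list(xy)  # the manual annotation wins at this frame
    return corrected


dlc_trace = {0: [10.0, 20.0], 1: [11.0, 21.0], 2: [30.0, 5.0], 3: [13.0, 23.0]}
manual = {2: [12.0, 22.0]}  # fixes a mistracked frame
dlccorr = splice_corrections(dlc_trace, manual)
print(dlccorr[2], dlccorr[3])  # [12.0, 22.0] [13.0, 23.0]
print(dlc_trace[2])            # [30.0, 5.0]: the source trace is left unchanged
```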
The rendering style is chosen automatically from the layer name: any layer whose name starts with dlc_, equals dlccorr, or contains lkmovavg renders as a line; everything else (manual annotations) renders as dots.
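The naming rule is simple enough to reuse in your own plotting scripts. The predicate below restates the rule above; it is not taken from DUSTrack's source, and the function name is hypothetical.

```python
def render_style(layer_name: str) -> str:
    """Return "line" or "dots" following the layer-naming rule."""
    if (layer_name.startswith("dlc_")
            or layer_name == "dlccorr"
            or "lkmovavg" in layer_name):
        return "line"
    return "dots"


for name in ["manual", "dlc_iter1", "dlccorr", "lkmovavg_0.500"]:
    print(f"{name}: {render_style(name)}")
# manual: dots
# dlc_iter1: line
# dlccorr: line
# lkmovavg_0.500: line
```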
Video Frame Indexing
DUSTrack uses zero-based frame indexing following Python conventions:
First frame: frame 0
For a 100-frame video: frames 0 to 99
Frame numbers in the GUI:
The current frame number appears in the interface state panel
Right-click on a trajectory plot to jump to a specific frame
Frame markers show your current position in the x and y trajectory plots
Temporal navigation: DUSTrack treats videos as sequences of discrete frames. When you:
Annotate frame 10, then frame 50, you’re creating a sparse annotation
Run optical flow interpolation, it fills frames 11-49 based on the boundary conditions
Load DLC predictions, you typically get dense annotations (one prediction per frame)
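Whether a label is sparse or dense can be read off its frame numbers alone. The sketch below uses a hypothetical helper on plain dictionaries to list the runs of missing frames between consecutive annotations, which are the frames an interpolation between those annotations would fill.

```python
def gaps(annotations: dict[int, list[float]]) -> list[tuple[int, int]]:
    """Return (first_missing, last_missing) runs between annotated frames."""
    frames = sorted(annotations)
    runs = []
    for a, b in zip(frames, frames[1:]):
        if b - a > 1:  # at least one frame lies between a and b
            runs.append((a + 1, b - 1))
    return runs


sparse = {10: [120.0, 240.0], 50: [125.0, 243.0]}
dense = {f: [120.0, 240.0] for f in range(100)}  # one entry per frame, 0-99
print(gaps(sparse))  # [(11, 49)]
print(gaps(dense))   # []
```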
Frames of Interest
When working with videos, you often need to focus on specific frames, jumping back and forth between them, rather than reviewing every frame sequentially. DUSTrack's “frames of interest” feature lets you mark important frames and navigate rapidly between them.
What are Frames of Interest?
Frames of interest are user-marked frames that deserve special attention. Common use cases:
Evaluating model predictions: Mark frames where DLC predictions look questionable
Assessing annotation consistency: Mark frames to compare across multiple layers
Sparse manual annotation: Mark candidate frames for labeling before running optical flow interpolation
Quality control: Mark frames for systematic review during iterative refinement
Workflow Example
To review DLC predictions:
Set the current layer to "manual" and the overlay to "dlc_iter1"
Step through the video and press m on frames with poor predictions
Use Alt+, / Alt+. to cycle between marked frames
Press c to copy good predictions from the overlay, or manually correct bad ones
Without frames of interest, reviewing predictions in a 1000-frame video means clicking through linearly or manually noting frame numbers. With frames of interest, you create a focused review queue that streamlines quality assessment and refinement.
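Cycling through marked frames amounts to finding the nearest marked frame after (or before) the current one. The sketch below assumes navigation wraps around at the ends of the list, which the word "cycle" suggests but the documentation does not state; the function names are hypothetical, and both raise IndexError when no frames are marked.

```python
import bisect


def next_marked(marked: list[int], current: int) -> int:
    """Smallest marked frame after `current`, wrapping to the first one."""
    frames = sorted(marked)
    i = bisect.bisect_right(frames, current)
    return frames[i] if i < len(frames) else frames[0]


def prev_marked(marked: list[int], current: int) -> int:
    """Largest marked frame before `current`, wrapping to the last one."""
    frames = sorted(marked)
    i = bisect.bisect_left(frames, current)
    return frames[i - 1] if i > 0 else frames[-1]


marked = [120, 45, 870, 300]
print(next_marked(marked, 45))   # 120
print(prev_marked(marked, 45))   # 870 (wraps around)
print(next_marked(marked, 900))  # 45 (wraps around)
```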
Event Intervals for Optical Flow Interpolation
DUSTrack provides an interval selection system that works seamlessly with the overlay layer concept to enable optical flow interpolation between sparse annotations.
What are Event Intervals?
An event interval is a range of frames defined by a start frame and end frame. This interval specifies where optical flow algorithms should interpolate point positions based on boundary conditions from the overlay layer.
How It Works
The typical workflow combines three elements:
Overlay layer containing sparse annotations (e.g., manual labels at frames 10 and 50)
Event interval marking the range to interpolate (frames 10-50)
Optical flow interpolation filling in frames 11-49 using Lucas-Kanade RSTC
The z-z-a Pattern
The most common operation is the z, z, a sequence:
First z: Hover your mouse over the trajectory plot at the start frame and press z to mark the interval start
Second z: Hover over the end frame and press z to mark the interval end. A gray shaded region appears showing the selected interval
Press a: Triggers Lucas-Kanade RSTC interpolation for the current label. It uses the overlay layer's positions at the start and end frames as boundary conditions and fills the intermediate frames in the current layer
Example workflow:
import dustrack
# Setup: manual layer has labels at frames 10 and 50
tracker = dustrack.open('video.mp4', "manual")
# In GUI:
# 1. Set overlay to "manual" (to use as reference)
# 2. Create new layer for interpolated results
# 3. Hover over trajectory plot at x=10, press 'z'
# 4. Hover over trajectory plot at x=50, press 'z'
# 5. Press 'a' to interpolate current label
# 6. Result: frames 11-49 now have smoothly interpolated positions
Overlay Layer as Boundary Conditions
The event interval system is designed to work with the overlay layer:
The overlay layer provides the source annotations (start and end points)
The current layer receives the interpolated results
This allows non-destructive interpolation: original annotations remain unchanged
Common pattern: Use manual annotations as overlay, create a new layer for interpolated results:
Current layer: "manual_interpolated" (empty or partially filled)
Overlay layer: "manual" (sparse annotations)
Select an interval spanning two manual annotations
Press a to fill the intermediate frames in the current layer
Interpolating Multiple Labels
Alt+A: Interpolate all labels in the selected interval (not just the current label). This is useful when you've manually labeled multiple points at the interval boundaries.
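The division of labor between the overlay and current layers can be shown with a simplified stand-in. The sketch below replaces Lucas-Kanade RSTC, which follows image content, with plain linear interpolation between the two boundary positions, so it illustrates only the data flow: boundaries are read from the overlay, intermediate frames are written to the current layer, and the overlay is never modified. The function names are hypothetical, and interpolate_all only approximates what Alt+A is described as doing.

```python
Layer = dict[str, dict[int, list[float]]]  # label -> frame -> [x, y]


def interpolate_interval(overlay: Layer, current: Layer,
                         label: str, start: int, end: int) -> None:
    """Fill frames start+1..end-1 of `label` in `current` from overlay boundaries.

    Linear interpolation stands in for Lucas-Kanade RSTC here.
    """
    x0, y0 = overlay[label][start]  # boundary condition at the start frame
    x1, y1 = overlay[label][end]    # boundary condition at the end frame
    target = current.setdefault(label, {})
    for frame in range(start + 1, end):
        t = (frame - start) / (end - start)
        target[frame] = [x0 + t * (x1 - x0), y0 + t * (y1 - y0)]


def interpolate_all(overlay: Layer, current: Layer, start: int, end: int) -> None:
    """Fill every label that has overlay annotations at both boundary frames."""
    for label, annotations in overlay.items():
        if start in annotations and end in annotations:
            interpolate_interval(overlay, current, label, start, end)


manual: Layer = {
    "0": {10: [100.0, 200.0], 50: [140.0, 220.0]},
    "1": {10: [300.0, 100.0]},  # no annotation at frame 50
}
manual_interpolated: Layer = {}

interpolate_interval(manual, manual_interpolated, "0", 10, 50)  # like a
print(min(manual_interpolated["0"]), max(manual_interpolated["0"]))  # 11 49
print(manual_interpolated["0"][30])  # [120.0, 210.0]

interpolate_all(manual, manual_interpolated, 10, 50)  # like Alt+A
print(sorted(manual_interpolated))  # ['0']: label "1" lacks an end boundary
```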
Design Rationale
Without event intervals, you’d need to:
Manually annotate every frame (tedious for 1000+ frame videos)
Write custom scripts to specify frame ranges
Risk overwriting original annotations during interpolation
With event intervals + overlay system:
Visually select ranges on trajectory plots
Preserve original data in overlay layer
Rapidly fill sparse annotations with optical flow