What I completed in Milestone 2
This milestone focuses on building a reproducible dataset + labeling pipeline for my capstone project: eye image clarity assessment on RV1126 (RKNN runtime).
The goal is not yet to train the best model, but to make sure the entire data pipeline is solid, auditable, and ready for deployment-oriented iteration.
Dataset choice: MPIIGaze (subset)
I chose MPIIGaze because it provides large-scale, in-the-wild eye-region images with diverse illumination and head poses.
To keep iteration fast and the repo lightweight, I sampled a subset:
- Participants: p00–p04
- Raw images: 10,000
Raw data stays local (not committed to Git).
Labeling strategy: 5-level ordinal clarity + normalized score
MPIIGaze does not contain blur/clarity labels, so I generated labels via a synthetic degradation strategy:
- 5-level ordinal labels: 0..4 (very_blurry → very_sharp)
- Normalized clarity score in [0, 1], which the model can also output during inference:
  - for labels: score = label_bin / 4
  - for softmax outputs later: score = E[class] / 4
This gives me both:
- stable supervision for training (classification)
- a continuous score for UI/thresholding later (normalized score)
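The two score formulas above can be sketched in a few lines of plain Python. This is only an illustration: the function names (`score_from_label`, `score_from_logits`) are hypothetical, and it assumes the model emits five raw logits, one per clarity level.

```python
import math

NUM_CLASSES = 5  # ordinal levels 0..4 (very_blurry -> very_sharp)


def score_from_label(label_bin: int) -> float:
    """Map an ordinal label in 0..4 to a normalized score in [0, 1]."""
    if not 0 <= label_bin < NUM_CLASSES:
        raise ValueError(f"label_bin must be in 0..{NUM_CLASSES - 1}, got {label_bin}")
    return label_bin / (NUM_CLASSES - 1)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_from_logits(logits) -> float:
    """score = E[class] / 4, where E[class] = sum over k of k * p_k."""
    if len(logits) != NUM_CLASSES:
        raise ValueError(f"expected {NUM_CLASSES} logits, got {len(logits)}")
    probs = softmax(logits)
    expected_class = sum(k * p for k, p in enumerate(probs))
    return expected_class / (NUM_CLASSES - 1)
```

With this mapping, a model that is torn between two neighbouring levels lands between their scores instead of snapping to one of them, which is what makes the value usable for thresholding.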
Pipeline outputs (sanity-checked)
After preprocessing + synthetic blur augmentation:
- processed items: 50,000
- unique relpaths: 50,000 (no overwrite/collision)
- split rule: by session_id = pXX/dayYY, so all images from one session land in the same split (prevents leakage)
- split sizes:
  - train: 40,235
  - val: 5,270
  - test: 4,495
Also produced:
- calib_list.txt (relative paths) for RKNN INT8 calibration
- splits.json for reproducibility
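As a rough sketch of how a session-level split and the two output files could be produced: everything below is an assumption made for illustration, not the actual pipeline code. In particular, the relpath layout (`pXX/dayYY/<file>`), the 10% val/test fractions, the seed, the `splits.json` schema, and drawing the calibration list from the training split are all hypothetical choices.

```python
import json
import random
from collections import defaultdict
from pathlib import Path


def session_id_of(relpath: str) -> str:
    """Assumes relpaths look like 'pXX/dayYY/<file>' (hypothetical layout)."""
    parts = relpath.split("/")
    if len(parts) < 3:
        raise ValueError(f"cannot derive session_id from {relpath!r}")
    return "/".join(parts[:2])


def split_by_session(relpaths, val_frac=0.1, test_frac=0.1, seed=42):
    """Assign whole sessions to train/val/test so no session spans two splits."""
    by_session = defaultdict(list)
    for rp in relpaths:
        by_session[session_id_of(rp)].append(rp)
    sessions = sorted(by_session)
    if len(sessions) < 3:
        raise ValueError("need at least 3 sessions for a 3-way split")
    random.Random(seed).shuffle(sessions)  # seeded, so the split is repeatable
    n_val = max(1, round(len(sessions) * val_frac))
    n_test = max(1, round(len(sessions) * test_frac))
    groups = {
        "val": sessions[:n_val],
        "test": sessions[n_val:n_val + n_test],
        "train": sessions[n_val + n_test:],
    }
    return {
        name: sorted(rp for s in sess for rp in by_session[s])
        for name, sess in groups.items()
    }


def write_outputs(splits, out_dir, calib_size=100, seed=42):
    """Write splits.json plus calib_list.txt (one relative path per line)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "splits.json").write_text(json.dumps(splits, indent=2, sort_keys=True))
    train = splits["train"]
    calib = sorted(random.Random(seed).sample(train, min(calib_size, len(train))))
    (out / "calib_list.txt").write_text("\n".join(calib) + "\n")
    return calib
```

Because the split unit is the session rather than the image, the resulting split sizes will generally not be exact round fractions of the total.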
Important bug fix
I previously discovered a critical issue: output filenames could collide across sessions, causing processed images to be overwritten silently.
The fix is to include session_id (pXX/dayYY) in the processed output path and enforce unique relpaths at runtime.
This ensures:
- every manifest record maps to a real unique image
- baselines and labels remain consistent with the stored file
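The idea behind the fix can be illustrated with a small sketch. The names here (`processed_relpath`, `RelpathRegistry`) and the `<stem>_L<label>.png` filename pattern are hypothetical and only stand in for whatever the real pipeline uses; the point is that the session_id becomes part of the path and that a duplicate relpath fails loudly instead of overwriting.

```python
from pathlib import PurePosixPath


def processed_relpath(participant, day, stem, label_bin, ext=".png"):
    """Build an output relpath that embeds the session_id (pXX/dayYY)."""
    session_id = f"{participant}/{day}"
    # Hypothetical filename pattern: <stem>_L<label><ext>
    return str(PurePosixPath(session_id) / f"{stem}_L{label_bin}{ext}")


class RelpathRegistry:
    """Remembers every relpath handed out and refuses duplicates."""

    def __init__(self):
        self._seen = {}  # relpath -> source image it was generated from

    def register(self, relpath, source):
        if relpath in self._seen:
            raise RuntimeError(
                f"relpath collision: {relpath!r} requested for {source!r}, "
                f"already used by {self._seen[relpath]!r}"
            )
        self._seen[relpath] = source
        return relpath

    def __len__(self):
        return len(self._seen)
```

Usage would be to call `register` right before each image is written; comparing `len(registry)` with the number of processed items at the end gives the "unique relpaths" sanity check.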
Next: Milestone 3
Milestone 3 will focus on:
1) training a small clarity model on PC
2) exporting to ONNX
3) converting to RKNN (FP / INT8)
4) running board-side inference on RV1126 and comparing against traditional metrics
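For step 4, one common traditional sharpness measure is the variance of the Laplacian response. The plan above does not say which metrics will be used, so treat this purely as an assumed example; it is written in dependency-free Python on nested lists so it stays self-contained, and the `box_blur` helper exists only to produce a blurrier image to compare against.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    `gray` is a 2D list of grayscale intensities; higher values mean sharper.
    """
    h, w = len(gray), len(gray[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3")
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def box_blur(gray, radius=1):
    """Mean filter with edge clamping; only used here to create a blurry input."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total, count = 0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += gray[yy][xx]
                    count += 1
            row.append(total / count)
        out.append(row)
    return out
```

A quick check is to blur a high-contrast pattern and confirm the metric drops; the same comparison could later be run between the model's score and this baseline on the test split.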
Note: all of my ideas for this project can be found in CashStolen/AI-embedded-system.
