The Result
8 distilled ADAS inference graphs — lane, object, sign, light, pedestrian, depth, radar, decision — executed in 71 cycles total. 1-cycle bank switch. Verified in RTL simulation.
What VAIDAS Is
VAIDAS = VAI + ADAS — Virtual AI Inference for Advanced Driver Assistance Systems. It applies WioWiz's VAI principle (weights as ROM, 1-cycle model switch) to the ADAS domain.
Traditional ADAS chips process models sequentially: load lane detection weights, infer, unload, load object detection weights, infer, unload... The weight transfer overhead dominates latency.
VAIDAS eliminates this. 8 models stay resident in 8 weight banks. Switching is a mux select, not a memory reload.
VAIDAS targets control-critical inference, not perception pretraining or floating-point accuracy benchmarks.
VAIDAS 8-Bank Architecture: Parallel processing with priority-based fusion
Why not GPUs / FP4?
GPUs optimize throughput. VAIDAS optimizes worst-case latency. In safety systems, p99 latency matters more than TOPS/W.
The 8-Bank Architecture
Each bank is optimized for a specific ADAS function. Bank 7 doesn't average results — it applies weighted priority fusion where pedestrian detection can override everything.
| Bank | Function | Model Source | Priority |
|---|---|---|---|
| 0 | Lane Detection | TuSimple-derived | Medium |
| 1 | Object Detection | YOLO-derived | Medium |
| 2 | Traffic Sign | ResNet-derived | Low |
| 3 | Traffic Light | ResNet-derived | High |
| 4 | Pedestrian Detection | YOLO-derived | CRITICAL |
| 5 | Depth Estimation | MiDaS-derived | Medium |
| 6 | Vehicle Tracking | Custom MLP | High |
| 7 | Decision Fusion | Fusion MLP | Final |
Why 8 Banks? Not 4, not 16. Eight is the sweet spot:
- Covers all critical ADAS functions
- Fits in a single chip with reasonable area
- Matches the weight-buffer banking (8 SRAM banks for parallel loading)
- Allows 1-cycle bank switching during inference
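Bank 7's weighted priority fusion can be sketched in software. This is an illustrative model, not the shipped fusion MLP: the priority weights and decision thresholds below are hypothetical, chosen only to show how a CRITICAL pedestrian detection overrides everything else.

```python
# Sketch of Bank 7's priority fusion. Each bank emits a hazard score
# in [0, 1]; weights and thresholds are illustrative assumptions.
PRIORITY = {  # bank -> fusion weight (hypothetical values)
    "lane": 2, "object": 2, "sign": 1, "light": 3,
    "pedestrian": 8, "depth": 2, "tracking": 3,
}

def fuse(scores: dict) -> str:
    """Map per-bank hazard scores to a GO/SLOW/STOP decision."""
    # Hard override: a confident pedestrian detection always stops.
    if scores.get("pedestrian", 0.0) > 0.5:
        return "STOP"
    # Otherwise take a priority-weighted average of hazard scores.
    total_w = sum(PRIORITY.values())
    hazard = sum(PRIORITY[k] * scores.get(k, 0.0) for k in PRIORITY) / total_w
    if hazard > 0.6:
        return "STOP"
    if hazard > 0.3:
        return "SLOW"
    return "GO"

print(fuse({"pedestrian": 0.9}))           # → STOP (override)
print(fuse({"lane": 0.2, "object": 0.1}))  # → GO (low weighted hazard)
```

The key design point survives the simplification: pedestrian detection is not averaged away by six calmer banks; it short-circuits the fusion entirely.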
The Math: Why 71 Cycles, Not 800
Traditional NPU (Reload Every Switch)
Load Model A ....... 100 cycles
Compute .............. 10 cycles
Load Model B ....... 100 cycles
Compute .............. 10 cycles
... × 8 models ...
─────────────────────────────
Total: 800+ cycles
VAIDAS (VAI Architecture)
Load 8 Banks (boot) . ONE TIME
Compute Bank 0 ....... 8 cycles
Switch ............... 1 cycle
Compute Bank 1 ....... 8 cycles
... × 8 models ...
─────────────────────────────
Total: 71 cycles
At a 100 MHz clock:
Traditional: 8+ μs for 8 models | VAIDAS: 710 ns for 8 models
11× architectural speedup: not from better algorithms, but from eliminating the reload.
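The cycle counts above can be sanity-checked with the figures from the text (100-cycle reload, 10-cycle compute on the traditional NPU; 8-cycle compute and 1-cycle switch on VAIDAS):

```python
# Sanity-check of the cycle arithmetic, using the figures from the text.
RELOAD, OLD_COMPUTE, VAI_COMPUTE, SWITCH, MODELS = 100, 10, 8, 1, 8

# Traditional NPU: reload weights before every model's compute pass.
traditional = MODELS * (RELOAD + OLD_COMPUTE)   # 880 cycles ("800+" in text)

# VAIDAS: weights stay resident; only compute plus 7 bank switches.
vaidas = MODELS * VAI_COMPUTE + (MODELS - 1) * SWITCH  # 64 + 7 = 71 cycles

CLOCK_MHZ = 100
print(f"traditional: {traditional} cycles, VAIDAS: {vaidas} cycles")
print(f"speedup: {traditional / vaidas:.1f}x")
print(f"VAIDAS latency at {CLOCK_MHZ} MHz: {vaidas / CLOCK_MHZ * 1000:.0f} ns")
```

The 11× figure in the text uses the rounded-down 800-cycle baseline; with the exact 880 cycles the architectural gap is slightly wider.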
The SoC Architecture
VAIDAS SoC: Ibex RISC-V + WZ-NPU with 8-Bank VAI Architecture
The RTL Core: 1-Cycle Bank Switch
// vai_wrom_manager.v — The VAI Core
module vai_wrom_manager #(
  parameter NUM_BANKS  = 8,
  parameter BANK_DEPTH = 4096,
  parameter DATA_WIDTH = 8
)(
  input  wire                          clk,
  input  wire [$clog2(NUM_BANKS)-1:0]  bank_sel,  // 3 bits → 8 banks
  input  wire [$clog2(BANK_DEPTH)-1:0] read_addr, // 12 bits → 4096 words
  output wire [DATA_WIDTH-1:0]         read_data  // ACTIVE IN 1 CYCLE
);
  // Weight ROM: filled once at boot (e.g. $readmemh in simulation)
  reg [DATA_WIDTH-1:0] wrom [0:NUM_BANKS-1][0:BANK_DEPTH-1];

  // No state machine. No handshake. No reload.
  // Just a mux.
  assign read_data = wrom[bank_sel][read_addr];
endmodule
This is the invention. Not a cache. Not prefetch. Direct addressing across banks with combinational selection.
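A software analogue makes the point concrete: in this architecture a "bank switch" is nothing more than changing an index between reads. The sketch below models the RTL's read path under the same parameters; the demo fill pattern (bank i holds the constant i) is purely illustrative.

```python
# Software analogue of vai_wrom_manager: bank switching is array
# indexing, not a reload. Sizes match the RTL parameters above.
NUM_BANKS, BANK_DEPTH = 8, 4096

# Weight ROM, loaded once. Demo pattern: bank i is filled with i.
wrom = [[bank] * BANK_DEPTH for bank in range(NUM_BANKS)]

def read(bank_sel: int, read_addr: int) -> int:
    """Combinational read path: one indexed lookup, no state machine."""
    return wrom[bank_sel][read_addr]

# "Switching" from bank 0 to bank 7 is just a different bank_sel.
print(read(0, 42), read(7, 42))  # → 0 7
```

No cache fill, no prefetch window, no reload latency: the cost of switching models is the cost of driving a different select value.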
Verification: What We Proved
| Test | Method | Result |
|---|---|---|
| wz_systolic_8x8 MAC accuracy | Golden model comparison | 64/64 match ✓ |
| 8-bank 1-cycle switch | Verilator waveform | 8 switches in 8 cycles ✓ |
| Ibex → AHB → NPU bus | Handshake timing | rvalid/rready verified ✓ |
| Full SoC integration | Behavioral simulation | PASS ✓ |
| CARLA demo integration | Video pipeline | GO/SLOW/STOP decisions ✓ |
# Simulation command
$ cd ~/widar/verticals/auto/vaidas
$ make sim_npu
# Output: 8 PASS, 0 FAIL
Demo: CARLA + VAIDAS Integration
VAIDAS was integrated with CARLA simulator for end-to-end demonstration. Real feature extraction (YOLOv8, MiDaS) → VAIDAS NPU decisions → Vehicle control.
Tesla Model 3 • YOLOv8 Detection • GO/SLOW/STOP Overlay
Pipeline:
CARLA 3D World → Camera Frame → Feature Extractors
→ 8×8 Feature Vectors → VAIDAS NPU
→ Decision (GO/SLOW/STOP) → Vehicle Control
Result:
Vehicle responds to VAIDAS decisions
Speed: 0 → 27 → 0 km/h based on detected obstacles
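The demo's control loop can be sketched as a decision-to-speed mapping. This is an illustrative stub, not the actual CARLA bridge code: the 27 km/h cruise value comes from the demo result above, and the SLOW fraction is a hypothetical choice.

```python
# Sketch of the demo control mapping: VAIDAS decision -> target speed.
# CRUISE_KMH matches the demo result; the SLOW fraction is assumed.
CRUISE_KMH = 27

def target_speed(decision: str) -> float:
    """Translate a GO/SLOW/STOP decision into a target speed (km/h)."""
    return {"GO": CRUISE_KMH, "SLOW": CRUISE_KMH / 2, "STOP": 0.0}[decision]

# Obstacle clears, then a pedestrian appears: speed follows 0 → 27 → 0.
for decision in ["STOP", "GO", "STOP"]:
    print(decision, target_speed(decision))
```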
The Philosophy
Others optimize algorithms. We optimize silicon behavior.
Algorithms change. Latency must not.
VAIDAS is not an AI accelerator. It is deterministic math silicon that executes inference graphs. The weights are data. The compute is fixed. The latency is architectural.
If a model cannot be fixed-point, bounded-latency, and deterministic — it doesn't belong in safety silicon.
VAIDAS is to ADAS what hard real-time schedulers are to avionics.
What's Next
Indian Dashcam Dataset: Training VAIDAS on real Indian road conditions — chaotic traffic, pedestrians everywhere, unmarked lanes, two-wheelers cutting across. This is the ultimate stress test for ADAS.
- Scale to 64×64 systolic array — 4,096 MACs for production-grade inference
- Indian road dataset collection — Real dashcam videos from Bangalore, Delhi, Mumbai traffic
- Lane detection for unmarked roads — Indian roads don't have clear lane markings
- Two-wheeler priority detection — Most common road users in India
- Multi-chip scaling — VAIDAS clusters for Level 4+ autonomy
⚠️ Technical Notes
- Open models (YOLO, MiDaS, TuSimple, ResNet) used for research and benchmarking
- Weights retrained, adapted, and quantized by WioWiz for INT8 fixed-point execution
- VAIDAS executes distilled, fixed-point inference graphs — not full floating-point models
- Architecture is model-agnostic — any conforming inference graph can be loaded
- VAIDAS is pre-silicon; verification conducted via Verilator simulation