The Result
8 distilled ADAS inference graphs — lane, object, sign, light, pedestrian, depth, radar, decision — executed in 71 cycles total. 1-cycle bank switch. Verified in RTL simulation.
What VAIDAS Is
VAIDAS = VAI + ADAS — Virtual AI Inference for Advanced Driver Assistance Systems. It applies WioWiz's VAI principle (weights as ROM, 1-cycle model switch) to the ADAS domain.
Traditional ADAS chips process models sequentially: load lane detection weights, infer, unload, load object detection weights, infer, unload... The weight transfer overhead dominates latency.
VAIDAS eliminates this. 8 models stay resident in 8 weight banks. Switching is a mux select, not a memory reload.
VAIDAS targets control-critical inference, not perception pretraining or floating-point accuracy benchmarks.
VAIDAS 8-Bank Architecture: Parallel processing with priority-based fusion
Why not GPUs / FP4?
GPUs optimize throughput. VAIDAS optimizes worst-case latency. In safety systems, p99 latency matters more than TOPS/W.
The 8-Bank Architecture
Each bank is optimized for a specific ADAS function. Bank 7 doesn't average results — it applies weighted priority fusion where pedestrian detection can override everything.
| Bank | Function | Model Source | Priority |
|---|---|---|---|
| 0 | Lane Detection | TuSimple-derived | Medium |
| 1 | Object Detection | YOLO-derived | Medium |
| 2 | Traffic Sign | ResNet-derived | Low |
| 3 | Traffic Light | ResNet-derived | High |
| 4 | Pedestrian Detection | YOLO-derived | CRITICAL |
| 5 | Depth Estimation | MiDaS-derived | Medium |
| 6 | Vehicle Tracking | Custom MLP | High |
| 7 | Decision Fusion | Fusion MLP | Final |
Why 8 Banks? Not 4, not 16. Eight is the sweet spot:
- Covers all critical ADAS functions
- Fits in a single chip with reasonable area
- Matches the weight-buffer banking (8 SRAM banks for parallel loading)
- Allows 1-cycle bank switching during inference
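Bank 7's weighted priority fusion can be sketched in software. This is an illustrative model, not the shipped fusion MLP: the priority weights and decision thresholds below are hypothetical, chosen only to show how a CRITICAL pedestrian detection overrides everything else.

```python
# Sketch of Bank 7's priority fusion. Each bank emits a hazard score
# in [0, 1]; weights and thresholds are illustrative assumptions.
PRIORITY = {  # bank -> fusion weight (hypothetical values)
    "lane": 2, "object": 2, "sign": 1, "light": 3,
    "pedestrian": 8, "depth": 2, "tracking": 3,
}

def fuse(scores: dict) -> str:
    """Map per-bank hazard scores to a GO/SLOW/STOP decision."""
    # Hard override: a confident pedestrian detection always stops.
    if scores.get("pedestrian", 0.0) > 0.5:
        return "STOP"
    # Otherwise take a priority-weighted average of hazard scores.
    total_w = sum(PRIORITY.values())
    hazard = sum(PRIORITY[k] * scores.get(k, 0.0) for k in PRIORITY) / total_w
    if hazard > 0.6:
        return "STOP"
    if hazard > 0.3:
        return "SLOW"
    return "GO"

print(fuse({"pedestrian": 0.9}))           # → STOP (override)
print(fuse({"lane": 0.2, "object": 0.1}))  # → GO (low weighted hazard)
```

The key design point survives the simplification: pedestrian detection is not averaged away by six calmer banks; it short-circuits the fusion entirely.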
The Math: Why 71 Cycles, Not 800
Traditional NPU (Reload Every Switch)
Load Model A ....... 100 cycles
Compute .............. 10 cycles
Load Model B ....... 100 cycles
Compute .............. 10 cycles
... × 8 models ...
─────────────────────────────
Total: 800+ cycles
VAIDAS (VAI Architecture)
Load 8 Banks (boot) . ONE TIME
Compute Bank 0 ....... 8 cycles
Switch ............... 1 cycle
Compute Bank 1 ....... 8 cycles
... × 8 models ...
─────────────────────────────
Total: 71 cycles
At a 100 MHz clock:
Traditional: 8+ μs for 8 models | VAIDAS: 710 ns for 8 models
11× architectural speedup: not from better algorithms, but from eliminating the reload.
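The cycle counts above can be sanity-checked with the figures from the text (100-cycle reload, 10-cycle compute on the traditional NPU; 8-cycle compute and 1-cycle switch on VAIDAS):

```python
# Sanity-check of the cycle arithmetic, using the figures from the text.
RELOAD, OLD_COMPUTE, VAI_COMPUTE, SWITCH, MODELS = 100, 10, 8, 1, 8

# Traditional NPU: reload weights before every model's compute pass.
traditional = MODELS * (RELOAD + OLD_COMPUTE)   # 880 cycles ("800+" in text)

# VAIDAS: weights stay resident; only compute plus 7 bank switches.
vaidas = MODELS * VAI_COMPUTE + (MODELS - 1) * SWITCH  # 64 + 7 = 71 cycles

CLOCK_MHZ = 100
print(f"traditional: {traditional} cycles, VAIDAS: {vaidas} cycles")
print(f"speedup: {traditional / vaidas:.1f}x")
print(f"VAIDAS latency at {CLOCK_MHZ} MHz: {vaidas / CLOCK_MHZ * 1000:.0f} ns")
```

The 11× figure in the text uses the rounded-down 800-cycle baseline; with the exact 880 cycles the architectural gap is slightly wider.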
The SoC Architecture
VAIDAS SoC: Ibex RISC-V + WZ-NPU with 8-Bank VAI Architecture
The RTL Core: 1-Cycle Bank Switch
// vai_wrom_manager.v — The VAI Core
module vai_wrom_manager #(
  parameter NUM_BANKS  = 8,
  parameter BANK_DEPTH = 4096,
  parameter DATA_WIDTH = 8
)(
  input  wire                          clk,
  input  wire [$clog2(NUM_BANKS)-1:0]  bank_sel,  // 3 bits → 8 banks
  input  wire [$clog2(BANK_DEPTH)-1:0] read_addr, // 12 bits → 4096 words
  output wire [DATA_WIDTH-1:0]         read_data  // ACTIVE IN 1 CYCLE
);
  // Weight ROM: filled once at boot (e.g. $readmemh in simulation)
  reg [DATA_WIDTH-1:0] wrom [0:NUM_BANKS-1][0:BANK_DEPTH-1];

  // No state machine. No handshake. No reload.
  // Just a mux.
  assign read_data = wrom[bank_sel][read_addr];
endmodule
This is the invention. Not a cache. Not prefetch. Direct addressing across banks with combinational selection.
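A software analogue makes the point concrete: in this architecture a "bank switch" is nothing more than changing an index between reads. The sketch below models the RTL's read path under the same parameters; the demo fill pattern (bank i holds the constant i) is purely illustrative.

```python
# Software analogue of vai_wrom_manager: bank switching is array
# indexing, not a reload. Sizes match the RTL parameters above.
NUM_BANKS, BANK_DEPTH = 8, 4096

# Weight ROM, loaded once. Demo pattern: bank i is filled with i.
wrom = [[bank] * BANK_DEPTH for bank in range(NUM_BANKS)]

def read(bank_sel: int, read_addr: int) -> int:
    """Combinational read path: one indexed lookup, no state machine."""
    return wrom[bank_sel][read_addr]

# "Switching" from bank 0 to bank 7 is just a different bank_sel.
print(read(0, 42), read(7, 42))  # → 0 7
```

No cache fill, no prefetch window, no reload latency: the cost of switching models is the cost of driving a different select value.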
Verification: What We Proved
| Test | Method | Result |
|---|---|---|
| wz_systolic_8x8 MAC accuracy | Golden model comparison | 64/64 match ✓ |
| 8-bank 1-cycle switch | Verilator waveform | 8 switches in 8 cycles ✓ |
| Ibex → AHB → NPU bus | Handshake timing | rvalid/rready verified ✓ |
| Full SoC integration | Behavioral simulation | PASS ✓ |
| CARLA demo integration | Video pipeline | GO/SLOW/STOP decisions ✓ |
# Simulation command
$ cd ~/widar/verticals/auto/vaidas
$ make sim_npu
# Output: 8 PASS, 0 FAIL
Demo: CARLA + VAIDAS Integration
VAIDAS was integrated with CARLA simulator for end-to-end demonstration. Real feature extraction (YOLOv8, MiDaS) → VAIDAS NPU decisions → Vehicle control.
Tesla Model 3 • YOLOv8 Detection • GO/SLOW/STOP Overlay
Pipeline:
CARLA 3D World → Camera Frame → Feature Extractors
→ 8×8 Feature Vectors → VAIDAS NPU
→ Decision (GO/SLOW/STOP) → Vehicle Control
Result:
Vehicle responds to VAIDAS decisions
Speed: 0 → 27 → 0 km/h based on detected obstacles
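The demo's control loop can be sketched as a decision-to-speed mapping. This is an illustrative stub, not the actual CARLA bridge code: the 27 km/h cruise value comes from the demo result above, and the SLOW fraction is a hypothetical choice.

```python
# Sketch of the demo control mapping: VAIDAS decision -> target speed.
# CRUISE_KMH matches the demo result; the SLOW fraction is assumed.
CRUISE_KMH = 27

def target_speed(decision: str) -> float:
    """Translate a GO/SLOW/STOP decision into a target speed (km/h)."""
    return {"GO": CRUISE_KMH, "SLOW": CRUISE_KMH / 2, "STOP": 0.0}[decision]

# Obstacle clears, then a pedestrian appears: speed follows 0 → 27 → 0.
for decision in ["STOP", "GO", "STOP"]:
    print(decision, target_speed(decision))
```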
The Philosophy
Others optimize algorithms. We optimize silicon behavior.
Algorithms change. Latency must not.
VAIDAS is not an AI accelerator. It is deterministic math silicon that executes inference graphs. The weights are data. The compute is fixed. The latency is architectural.
If a model cannot be fixed-point, bounded-latency, and deterministic — it doesn't belong in safety silicon.
VAIDAS is to ADAS what hard real-time schedulers are to avionics.
What's Next
Indian Dashcam Dataset: Training VAIDAS on real Indian road conditions — chaotic traffic, pedestrians everywhere, unmarked lanes, two-wheelers cutting across. This is the ultimate stress test for ADAS.
- Scale to 64×64 systolic array — 4,096 MACs for production-grade inference
- Indian road dataset collection — Real dashcam videos from Bangalore, Delhi, Mumbai traffic
- Lane detection for unmarked roads — Indian roads don't have clear lane markings
- Two-wheeler priority detection — Most common road users in India
- Multi-chip scaling — VAIDAS clusters for Level 4+ autonomy
⚠️ Technical Notes
- Open models (YOLO, MiDaS, TuSimple, ResNet) used for research and benchmarking
- Weights retrained, adapted, and quantized by WioWiz for INT8 fixed-point execution
- VAIDAS executes distilled, fixed-point inference graphs — not full floating-point models
- Architecture is model-agnostic — any conforming inference graph can be loaded
- VAIDAS is pre-silicon; verification conducted via Verilator simulation