Abstract
This paper documents a direct comparison of sequential and non-sequential approaches to drift detection: a large Transformer encoder fails dramatically on the task, while a simpler XGBoost model trained on the same data succeeds.
What it establishes
- More model capacity is not automatically better for drift detection.
- Sequential inductive bias can be a liability when the signal is effectively instantaneous.
- Architecture-task mismatch can produce collapse, not just underperformance.
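To make the second point concrete, here is a minimal sketch (not the paper's setup; the data, statistic, and thresholds are illustrative assumptions) showing that when drift is an instantaneous per-sample distribution shift, a stateless two-sample statistic detects it without any sequence modeling at all:

```python
# Toy illustration: drift as an instantaneous mean shift.
# No temporal ordering is used anywhere -- each sample contributes
# independently, so a sequential inductive bias buys nothing here.
import random
import statistics

random.seed(0)

def sample_window(mean, n=500):
    """Draw n i.i.d. observations; 'drift' = a shift in the mean."""
    return [random.gauss(mean, 1.0) for _ in range(n)]

def mean_shift_score(reference, current):
    """Standardized difference of window means (a z-like statistic)."""
    pooled_sd = statistics.pstdev(reference + current)
    n = len(current)
    diff = abs(statistics.fmean(current) - statistics.fmean(reference))
    return diff / (pooled_sd / (n ** 0.5))

reference = sample_window(0.0)
no_drift = sample_window(0.0)
drifted = sample_window(0.5)  # instantaneous distribution shift

print(mean_shift_score(reference, no_drift))  # small: no drift flagged
print(mean_shift_score(reference, drifted))   # large: drift flagged
```

A model that must first learn which temporal patterns to attend to is solving a harder problem than the task requires, which is one way an architecture-task mismatch turns into outright collapse rather than mild underperformance.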
Why it matters
The paper strengthens a central ABIS argument: the right representation and the right mathematical framing matter more than using the most fashionable model class.