For Whisper, the input mel spectrogram is 3000 frames long (30 s of 16 kHz audio at hop=160), but the encoder's internal sequence length is 1500 because its second conv layer has stride=2. The one-shot enc+dec path should accept encoder_input_features with shape [B,3000,128] ...
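The 3000 → 1500 arithmetic can be checked with the standard 1-D convolution output-length formula (the one given in the torch.nn.Conv1d docs). The sketch below assumes the reference Whisper encoder layout: conv1 with kernel 3, padding 1, stride 1, then conv2 with kernel 3, padding 1, stride 2. `check_encoder_input_features` is a hypothetical helper, not an existing API. Its frame-major [B, 3000, n_mels] layout follows the note above; other implementations may store features as [B, n_mels, 3000].

```python
# Whisper audio front-end constants (as in the openai/whisper reference code)
SAMPLE_RATE = 16_000  # Hz
HOP_LENGTH = 160      # samples between successive mel frames
CHUNK_SECONDS = 30    # Whisper processes fixed 30 s windows


def conv1d_out_len(length: int, kernel: int, stride: int, padding: int, dilation: int = 1) -> int:
    """Output length of a 1-D convolution (same formula as torch.nn.Conv1d)."""
    return (length + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def whisper_lengths() -> tuple[int, int]:
    """Return (mel frames per 30 s chunk, encoder sequence length)."""
    n_frames = SAMPLE_RATE * CHUNK_SECONDS // HOP_LENGTH  # 480000 // 160 = 3000
    # conv1: kernel 3, padding 1, stride 1 -> length unchanged
    after_conv1 = conv1d_out_len(n_frames, kernel=3, stride=1, padding=1)
    # conv2: kernel 3, padding 1, stride 2 -> length halved
    after_conv2 = conv1d_out_len(after_conv1, kernel=3, stride=2, padding=1)
    return n_frames, after_conv2


def check_encoder_input_features(shape: tuple[int, ...], n_mels: int = 128) -> None:
    """Hypothetical shape check for the one-shot enc+dec path: expects [B, 3000, n_mels]."""
    n_frames, _ = whisper_lengths()
    if len(shape) != 3 or shape[1] != n_frames or shape[2] != n_mels:
        raise ValueError(f"expected [B, {n_frames}, {n_mels}], got {list(shape)}")


if __name__ == "__main__":
    mel_len, enc_len = whisper_lengths()
    print(mel_len, enc_len)  # 3000 1500
    check_encoder_input_features((2, 3000, 128))  # passes silently
```

The key point is that the caller passes the full 3000-frame mel input. The 1500-long sequence exists only after conv2 inside the encoder, so validating against 1500 at the input boundary would reject correct inputs.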