"Invalid Argument (80070057)" crash with Transformer-based models (D-FINE/RT-DETR) due to Int64 indices in Gather/Scatter ops

### Description

We are encountering a persistent E_INVALIDARG (80070057) crash when running **D-FINE (RT-DETR based)** ONNX models using DirectMLExecutionProvider on Windows. The same model runs perfectly on CpuExecutionProvider.

After extensive debugging and graph surgery, we identified the root cause as **DirectML's incompatibility with Int64 indices** in operators like Gather, ScatterND, TopK, and NonZero, which are prevalent in Transformer-based architectures exported from PyTorch.

### Reproduction Steps

1. **Model**: D-FINE (RT-DETR architecture) exported from PyTorch.

   - Contains dynamic shapes and Gather/Scatter ops with Int64 indices.

2. **Environment**:

   - ONNX Runtime: 1.23.0
   - Provider: DirectML
   - OS: Windows 10/11
   - GPU: NVIDIA/Intel (the issue reproduces on both vendors)

3. **Code**:

   ```C#
   using Microsoft.ML.OnnxRuntime;

   var options = new SessionOptions();
   options.AppendExecutionProvider_DML(0); // DirectML on device 0
   var session = new InferenceSession("dfine_model.onnx", options); // Crash here or during Run()
   ```

### Observed Behavior

The initialization or execution fails with:

```text
[E:onnxruntime:, inference_session.cc:2545] Exception during initialization: 
... \DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2853) ... 
Exception(1) tid(...) 80070057
```
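
For readers unfamiliar with the code in the log, `80070057` is a standard Windows HRESULT. The minimal sketch below splits it into its bit fields; `decode_hresult` is a hypothetical helper written for illustration and is not part of ONNX Runtime or DirectML.

```python
def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into its severity, facility, and code fields."""
    value &= 0xFFFFFFFF
    return {
        "failure": bool(value >> 31),       # bit 31: severity (1 = failure)
        "facility": (value >> 16) & 0x7FF,  # bits 16-26: facility
        "code": value & 0xFFFF,             # bits 0-15: status code
    }


fields = decode_hresult(0x80070057)
print(fields)  # {'failure': True, 'facility': 7, 'code': 87}
```

Facility 7 is `FACILITY_WIN32` and code 87 is `ERROR_INVALID_PARAMETER`, which is why the value is reported as E_INVALIDARG. The code itself carries no information about which operator or tensor was rejected.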

### Analysis & Attempts

We have tried the following mitigation strategies, but none yielded a fully working graph due to conflicting constraints:

1. **Baseline (Float32 + Int64 Indices)**: Crashes with 80070057 on DirectML. (Works on CPU).
2. **Global Int64 -> Int32 Downcast**:
   - We force-converted all Int64 tensors to Int32.
   - **Result**: Model loading fails ONNX validation with [ErrorCode:InvalidGraph].
   - **Reason**: The ONNX spec **mandates** Int64 for the shape/sizes inputs of operators such as Reshape, Expand, and Resize, so downcasting those inputs makes the graph invalid.
3. **Selective Patching (The Deadlock)**:
   - We tried to cast *only* the inputs for Gather/Scatter to Int32, while keeping Reshape inputs as Int64.
   - **Result**: The model fails to load with a type mismatch or a topology error.
   - **Reason**: PyTorch exports dynamic shape calculations as chains such as Shape -> Gather -> Concat -> Reshape. In such a chain the same tensor must be Int64 for Reshape but Int32 for Gather, and inserting Casts breaks constant folding or shape inference in complex subgraphs.
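
The per-consumer variant of the selective patch in item 3 can be sketched as follows. This is a library-free illustration, not the script we ran: `Node`, `INDEX_INPUTS`, and `insert_index_casts` are hypothetical names, and `Node` is only a stand-in for an ONNX graph node. The idea is to put a Cast on the edge into the `indices` input alone, so every other consumer of the original tensor still sees Int64.

```python
from dataclasses import dataclass, field

# Position of the `indices` input for each operator (per the ONNX operator signatures).
INDEX_INPUTS = {
    "Gather": 1,
    "GatherND": 1,
    "GatherElements": 1,
    "ScatterND": 1,
    "ScatterElements": 1,
}


@dataclass
class Node:
    """Hypothetical stand-in for an ONNX node: op type plus named input/output tensors."""
    op_type: str
    inputs: list
    outputs: list
    attrs: dict = field(default_factory=dict)


def insert_index_casts(nodes):
    """Return a new node list with a Cast-to-int32 in front of each indices input."""
    patched = []
    for i, node in enumerate(nodes):
        slot = INDEX_INPUTS.get(node.op_type)
        if slot is None or slot >= len(node.inputs):
            patched.append(node)
            continue
        src = node.inputs[slot]
        cast_out = f"{src}_int32_{i}"  # unique name per consumer
        # The Cast sits directly before its consumer, so topological order is preserved.
        patched.append(Node("Cast", [src], [cast_out], {"to": "int32"}))
        new_inputs = list(node.inputs)
        new_inputs[slot] = cast_out
        patched.append(Node(node.op_type, new_inputs, list(node.outputs), dict(node.attrs)))
    return patched


# The Shape -> Gather -> Concat -> Reshape chain described above.
graph = [
    Node("Shape", ["x"], ["x_shape"]),
    Node("Gather", ["x_shape", "idx"], ["batch"]),
    Node("Concat", ["batch", "tail"], ["new_shape"]),
    Node("Reshape", ["x", "new_shape"], ["y"]),
]
for n in insert_index_casts(graph):
    print(n.op_type, n.inputs, "->", n.outputs)
```

In ONNX, Gather's output takes the element type of `data`, not of `indices`, so in this chain `new_shape` stays Int64 for Reshape. The sketch does not cover TopK or NonZero, which produce Int64 outputs rather than consume Int64 indices, and whether this narrower patch avoids the DirectML failure is not established here.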

### Request

DirectML seems to lag behind the CPU implementation regarding **Int64 support for indexing operators**. Could the DirectML team:

1. **Native Support**: Add native support for Int64 indices in Gather, Scatter, TopK, NonZero.
2. **Auto-Cast**: Alternatively, implement an automatic graph optimization pass in DmlExecutionProvider that implicitly downcasts Int64 indices to Int32 for supported operators, similar to how TensorRT handles mixed types.

This is a major blocker for deploying modern Transformer-based Vision models (DETR family) on Windows via DirectML.

**Attachments:**

- Visual Studio Debug Log:

```text
Unhandled exception. Microsoft.ML.OnnxRuntime.OnnxRuntimeException: [ErrorCode:Fail] Load model from E:\OpenSource\Github\YoloDotNet_Self\test\assets\Models\dfine_obj365_directml_nuclear.onnx failed:Type Error: Type (tensor(int64)) of output arg (unsqueeze) of node (node_unsqueeze) does not match expected type (tensor(int32)).
at Microsoft.ML.OnnxRuntime.InferenceSession.Init(String modelPath, SessionOptions options, PrePackedWeightsContainer prepackedWeightsContainer)
at Microsoft.ML.OnnxRuntime.InferenceSession..ctor(String modelPath, SessionOptions options)
at YoloDotNet.ExecutionProvider.DirectML.DirectMLExecutionProvider.InitializeYolo(Object model, Int32 gpuId) in E:\OpenSource\Github\YoloDotNet_Self\YoloDotNet.ExecutionProvider.DirectML\DirectMLExecutionProvider.cs:line 86
at YoloDotNet.ExecutionProvider.DirectML.DirectMLExecutionProvider..ctor(String model, Int32 gpuId, OnnxMetadataOverride metadataOverride) in E:\OpenSource\Github\YoloDotNet_Self\YoloDotNet.ExecutionProvider.DirectML\DirectMLExecutionProvider.cs:line 43
```

```text
[E:onnxruntime:, inference_session.cc:2545 onnxruntime::InferenceSession::Initialize::<lambda_73d8de3ce9bc7d47058d99ebffb3c8e5>::operator ()] Exception during initialization: E:_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2853)\onnxruntime.DLL!00007FFBADFEDC2C: (caller: 00007FFBADFFD699) Exception(1) tid(9eec) 80070057
Unhandled exception. Microsoft.ML.OnnxRuntime.OnnxRuntimeException: [ErrorCode:RuntimeException] Exception during initialization: E:_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2853)\onnxruntime.DLL!00007FFBADFEDC2C: (caller: 00007FFBADFFD699) Exception(1) tid(9eec) 80070057 parameter error.
at Microsoft.ML.OnnxRuntime.InferenceSession.Init(String modelPath, SessionOptions options, PrePackedWeightsContainer prepackedWeightsContainer)
at Microsoft.ML.OnnxRuntime.InferenceSession..ctor(String modelPath, SessionOptions options)
at YoloDotNet.ExecutionProvider.DirectML.DirectMLExecutionProvider.InitializeYolo(Object model, Int32 gpuId) in E:\OpenSource\Github\YoloDotNet_Self\YoloDotNet.ExecutionProvider.DirectML\DirectMLExecutionProvider.cs:line 86
at YoloDotNet.ExecutionProvider.DirectML.DirectMLExecutionProvider..ctor(String model, Int32 gpuId, OnnxMetadataOverride metadataOverride) in E:\OpenSource\Github\YoloDotNet_Self\YoloDotNet.ExecutionProvider.DirectML\DirectMLExecutionProvider.cs:line 43
```

```
Unhandled exception. Microsoft.ML.OnnxRuntime.OnnxRuntimeException: [ErrorCode:InvalidGraph] Load model from E:\OpenSource\Github\YoloDotNet_Self\test\assets\Models\dfine_obj365_dml_global.onnx failed:This is an invalid model. Type Error: Type 'tensor(int32)' of input parameter (val_577) of operator (Reshape) in node (node_view) is invalid.
at Microsoft.ML.OnnxRuntime.InferenceSession.Init(String modelPath, SessionOptions options, PrePackedWeightsContainer prepackedWeightsContainer)
at Microsoft.ML.OnnxRuntime.InferenceSession..ctor(String modelPath, SessionOptions options)
at YoloDotNet.ExecutionProvider.DirectML.DirectMLExecutionProvider.InitializeYolo(Object model, Int32 gpuId) in E:\OpenSource\Github\YoloDotNet_Self\YoloDotNet.ExecutionProvider.DirectML\DirectMLExecutionProvider.cs:line 86
at YoloDotNet.ExecutionProvider.DirectML.DirectMLExecutionProvider..ctor(String model, Int32 gpuId, OnnxMetadataOverride metadataOverride) in E:\OpenSource\Github\YoloDotNet_Self\YoloDotNet.ExecutionProvider.DirectML\DirectMLExecutionProvider.cs:line 43
```
