-
Notifications
You must be signed in to change notification settings - Fork 321
Feature | Add SqlBulkCopyOptions.CacheMetadata flag #3939
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Feature | Add SqlBulkCopyOptions.CacheMetadata flag #3939
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds a new SqlBulkCopyOptions.CacheMetadata flag to optimize repeated bulk copy operations to the same destination table by caching metadata after the first query, thus eliminating redundant metadata discovery queries on subsequent WriteToServer calls. This addresses issue #3627 where metadata queries were executed multiple times per operation, causing significant performance degradation in high-frequency bulk insert scenarios.
Changes:
- Adds new
CacheMetadataenum value toSqlBulkCopyOptions(bit 7) - Implements metadata caching logic in
SqlBulkCopywith automatic invalidation on table name changes - Adds public
InvalidateMetadataCache()method for manual cache invalidation - Includes comprehensive test coverage in both functional and manual test suites
- Provides XML documentation for the new API members
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlBulkCopyOptions.cs | Adds CacheMetadata enum value (1 << 7) |
| src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlBulkCopy.cs | Implements caching logic with fields, cache check, cache storage, invalidation method, and dispose cleanup |
| src/Microsoft.Data.SqlClient/tests/FunctionalTests/SqlBulkCopyCacheMetadataTest.cs | Adds unit tests for flag values, combinations, and API methods |
| src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlBulkCopyTest/CacheMetadata.cs | Adds integration tests for caching behavior, invalidation, destination changes, DataTable support, and option combinations |
| src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlBulkCopyTest/SqlBulkCopyTest.cs | Wires up the new manual tests |
| src/Microsoft.Data.SqlClient/tests/ManualTests/Microsoft.Data.SqlClient.ManualTesting.Tests.csproj | Adds CacheMetadata.cs to project and introduces an unrelated project reference change |
| doc/snippets/Microsoft.Data.SqlClient/SqlBulkCopyOptions.xml | Documents CacheMetadata option with warnings about schema changes |
| doc/snippets/Microsoft.Data.SqlClient/SqlBulkCopy.xml | Documents InvalidateMetadataCache method |
src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlBulkCopyTest/SqlBulkCopyTest.cs
Show resolved
Hide resolved
...crosoft.Data.SqlClient/tests/ManualTests/Microsoft.Data.SqlClient.ManualTesting.Tests.csproj
Outdated
Show resolved
Hide resolved
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlBulkCopy.cs
Show resolved
Hide resolved
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlBulkCopyOptions.cs
Show resolved
Hide resolved
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlBulkCopy.cs
Show resolved
Hide resolved
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlBulkCopy.cs
Show resolved
Hide resolved
src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlBulkCopyTest/CacheMetadata.cs
Outdated
Show resolved
Hide resolved
| { | ||
| string initialQuery = string.Format(initialQueryTemplate, dstTable); | ||
|
|
||
| DataTable sourceData = new(); |
Copilot
AI
Feb 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disposable 'DataTable' is created but not disposed.
…amsharma2700/sqlbulkcopy_metadata_optimization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlBulkCopyTest/CacheMetadata.cs
Show resolved
Hide resolved
…amsharma2700/sqlbulkcopy_metadata_optimization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlBulkCopy.cs:3262
ResetWriteToServerGlobalVariablessets_sqlDataReaderRowSource = null;twice. This looks like an accidental duplicate assignment and may be masking a missing reset of a different field. Remove the duplicate (or reset the intended field) to keep the state reset logic clear.
_rowSourceType = ValueSourceType.Unspecified;
_sqlDataReaderRowSource = null;
_sqlDataReaderRowSource = null;
}
| </summary> | ||
| <remarks> | ||
| <para> | ||
| Call this method when you know the destination table schema has changed and you want to force the next <see cref="M:Microsoft.Data.SqlClient.SqlBulkCopy.WriteToServer(System.Data.Common.DbDataReader)" /> operation to refresh the metadata from the server. |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The remarks mention only WriteToServer(DbDataReader) as the operation that will refresh metadata, but InvalidateMetadataCache() affects all WriteToServer* overloads. Consider rephrasing to refer to the next bulk copy/WriteToServer operation generically (or reference multiple overloads) to avoid implying it’s DbDataReader-specific.
| Call this method when you know the destination table schema has changed and you want to force the next <see cref="M:Microsoft.Data.SqlClient.SqlBulkCopy.WriteToServer(System.Data.Common.DbDataReader)" /> operation to refresh the metadata from the server. | |
| Call this method when you know the destination table schema has changed and you want to force the next bulk copy operation (for example, a subsequent call to one of the <see cref="M:Microsoft.Data.SqlClient.SqlBulkCopy.WriteToServer" /> overloads) to refresh the metadata from the server. |
| if (IsCopyOption(SqlBulkCopyOptions.CacheMetadata) && _cachedMetadata != null) | ||
| { | ||
| metaDataSet = metaDataSet.Clone(); | ||
| } |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cloning metaDataSet here may drop Always Encrypted metadata: _SqlMetaDataSet’s copy ctor doesn’t copy cekTable (see AlwaysEncryptedHelperClasses.cs), and _SqlMetaData.Clone()/SqlMetaDataPriv.CopyFrom() don’t copy isEncrypted/baseTI/cipherMD. Using the cloned set for WriteBulkCopyMetaData / LoadColumnEncryptionKeys could therefore send no CEK table and treat encrypted columns as unencrypted. The clone used for CacheMetadata should preserve the encryption-related fields (or use an alternative approach that avoids mutating the cached set without losing AE metadata).
| if (IsCopyOption(SqlBulkCopyOptions.CacheMetadata) && _cachedMetadata != null) | |
| { | |
| metaDataSet = metaDataSet.Clone(); | |
| } | |
| // Cloning _SqlMetaDataSet for cached metadata is safe only when there are | |
| // no encrypted columns. The clone/copy routines do not currently copy the | |
| // Always Encrypted metadata (e.g. cekTable on the set and isEncrypted/baseTI/cipherMD | |
| // on individual columns). Using a cloned set in that case would cause encrypted | |
| // columns to be treated as unencrypted. | |
| bool hasEncryptedColumns = false; | |
| for (int encryptedCheckIndex = 0; encryptedCheckIndex < metaDataSet.Length; encryptedCheckIndex++) | |
| { | |
| _SqlMetaData encryptedCheckMetadata = metaDataSet[encryptedCheckIndex]; | |
| if (encryptedCheckMetadata != null && encryptedCheckMetadata.IsEncrypted) | |
| { | |
| hasEncryptedColumns = true; | |
| break; | |
| } | |
| } | |
| if (IsCopyOption(SqlBulkCopyOptions.CacheMetadata) && _cachedMetadata != null && !hasEncryptedColumns) | |
| { | |
| metaDataSet = metaDataSet.Clone(); | |
| } |
paulmedynski
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reviewed the code and some of the tests. We can discuss my comments before I review the remainder of the tests.
| ]]></format> | ||
| </example> | ||
| </Close> | ||
| <InvalidateMetadataCache> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ClearCachedMetadata is a better choice. You even use the term clears in the docs below.
But I'm not convinced we need this method, so stay tuned...
| Call this method when you know the destination table schema has changed and you want to force the next <see cref="M:Microsoft.Data.SqlClient.SqlBulkCopy.WriteToServer(System.Data.Common.DbDataReader)" /> operation to refresh the metadata from the server. | ||
| </para> | ||
| <para> | ||
| The cache is automatically invalidated when the <see cref="P:Microsoft.Data.SqlClient.SqlBulkCopy.DestinationTableName" /> property is changed to a different table name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This belongs on the DestinationTableName docs, not here.
| <CacheMetadata> | ||
| <summary> | ||
| <para> | ||
| When specified, <b>CacheMetadata</b> caches destination table metadata after the first bulk copy operation, allowing subsequent operations to the same table to skip the metadata discovery query. This can improve performance when performing multiple bulk copy operations to the same destination table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try to stick to our new 100 char max per line.
| // Check if we have valid cached metadata for the current destination table | ||
| if (IsCopyOption(SqlBulkCopyOptions.CacheMetadata) && | ||
| _cachedMetadata != null && | ||
| string.Equals(_cachedDestinationTableName, _destinationTableName, StringComparison.Ordinal)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't possible for the destination table name to differ from the cached name. In fact, you don't need the cached name at all.
| if (IsCopyOption(SqlBulkCopyOptions.CacheMetadata)) | ||
| { | ||
| _cachedMetadata = result; | ||
| _cachedDestinationTableName = _destinationTableName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a race condition here where the DestinationName setter has just cleared _cachedDestinationTableName, this line copies the current _destinationTableName, and then the setter immediately changes _destinationTableName. You can avoid this by eliminating _cachedDestinationTableName entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should definitely avoid race conditions. But also, SqlBulkCopy (like SqlConnection) isn't thread safe and should be documented as such.
| } | ||
|
|
||
| /// <include file='../../../../../../doc/snippets/Microsoft.Data.SqlClient/SqlBulkCopy.xml' path='docs/members[@name="SqlBulkCopy"]/InvalidateMetadataCache/*'/> | ||
| public void InvalidateMetadataCache() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced the public API needs this. If the caller knows the schema has changed, they can create a new SqlBulkCopy instance. Allowing this mutation creates race conditions and maintenance burden. Thoughts?
| AllowEncryptedValueModifications = 1 << 6, | ||
|
|
||
| /// <include file='../../../../../../doc/snippets/Microsoft.Data.SqlClient/SqlBulkCopyOptions.xml' path='docs/members[@name="SqlBulkCopyOptions"]/CacheMetadata/*'/> | ||
| CacheMetadata = 1 << 7, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't understand why the codebase uses this 1 << 7 notation - we're all expected to bit-shift in our heads? Why not use binary literal notation 0b_0100_0000, or hexadecimal 0x40, or even a decimal 64?
This change is consistent with the existing values, so I'm not asking for a change. Just ranting.
| } | ||
|
|
||
| [Fact] | ||
| public void CacheMetadata_DoesNotOverlapExistingFlags() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't find a new enum member that we add next week that has the same underlying value. You can get all of the underlying values and look for duplicates in that list.
| public void InvalidateMetadataCache_CanBeCalledWithoutError() | ||
| { | ||
| using SqlBulkCopy bulkCopy = new(new SqlConnection()); | ||
| bulkCopy.InvalidateMetadataCache(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these tests be in the UnitTests project, and check the private members for side effects? The members would need to become internal in that case.
| public class SqlBulkCopyCacheMetadataTest | ||
| { | ||
| [Fact] | ||
| public void SqlMetaDataSet_Clone_ProducesIndependentCopy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These appear to all be unit tests for the _SqlMetaDataSet class. I think they belong in a file and class named appropriately.
And feel free to rename the class to SqlMetaDataSet if you want.
Description
Add SqlBulkCopyOptions.CacheMetadata to skip redundant metadata queries on repeated WriteToServer calls.
Issue
SqlBulkCopy executes a metadata discovery query against the destination table on every WriteToServer call, even when the same instance is used repeatedly for the same table. This query (which retrieves column types, collations, and encryption metadata) adds a network roundtrip that's unnecessary when the schema hasn't changed.
Related issue: #3627
Solution
Add a new opt-in flag
SqlBulkCopyOptions.CacheMetadatathat caches the metadata result after the first query and reuses it for subsequentWriteToServercalls to the same table.Usage
Behavior
Why Opt-In
The metadata query provides schema information used by the TDS protocol. Caching assumes the destination table schema won't change between calls. Making this opt-in ensures
backwards compatibility and puts the responsibility for schema stability on the caller.
Changes
Testing
Build/Tests are passing.