Skip to content

Conversation

@samsharma2700
Copy link
Contributor

@samsharma2700 samsharma2700 commented Feb 5, 2026

Description

Add SqlBulkCopyOptions.CacheMetadata to skip redundant metadata queries on repeated WriteToServer calls.

Issue

SqlBulkCopy executes a metadata discovery query against the destination table on every WriteToServer call, even when the same instance is used repeatedly for the same table. This query (which retrieves column types, collations, and encryption metadata) adds a network roundtrip that's unnecessary when the schema hasn't changed.

Related issue: #3627

Solution

Add a new opt-in flag SqlBulkCopyOptions.CacheMetadata that caches the metadata result after the first query and reuses it for subsequent WriteToServer calls to the same table.

Usage

	using var bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.CacheMetadata, null);
  	bulkCopy.DestinationTableName = "MyTable";

  	bulkCopy.WriteToServer(batch1);  // executes metadata query, caches result
  	bulkCopy.WriteToServer(batch2);  // reuses cache, no query
  	bulkCopy.WriteToServer(batch3);  // reuses cache, no query

  	// If schema changes, manually invalidate:
  	bulkCopy.InvalidateMetadataCache();

Behavior

  • Cache is scoped to the SqlBulkCopy instance (not per-connection or global)
  • Cache is automatically invalidated when DestinationTableName changes
  • Cache is cleared on Dispose()/Close()
  • New public method InvalidateMetadataCache() allows manual invalidation

Why Opt-In

The metadata query provides schema information used by the TDS protocol. Caching assumes the destination table schema won't change between calls. Making this opt-in ensures
backwards compatibility and puts the responsibility for schema stability on the caller.

Changes

  • SqlBulkCopyOptions.cs : Added CacheMetadata = 1 << 7
  • SqlBulkCopy.cs : Cache logic in CreateAndExecuteInitialQueryAsync(), auto-invalidation in DestinationTableName setter, InvalidateMetadataCache() method
  • XML documentation for new API surface
  • Functional tests (10 tests)
  • Manual/integration tests (6 tests)

Testing

Build/Tests are passing.

@samsharma2700 samsharma2700 requested a review from a team as a code owner February 5, 2026 23:44
Copilot AI review requested due to automatic review settings February 5, 2026 23:44
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a new SqlBulkCopyOptions.CacheMetadata flag to optimize repeated bulk copy operations to the same destination table by caching metadata after the first query, thus eliminating redundant metadata discovery queries on subsequent WriteToServer calls. This addresses issue #3627 where metadata queries were executed multiple times per operation, causing significant performance degradation in high-frequency bulk insert scenarios.

Changes:

  • Adds new CacheMetadata enum value to SqlBulkCopyOptions (bit 7)
  • Implements metadata caching logic in SqlBulkCopy with automatic invalidation on table name changes
  • Adds public InvalidateMetadataCache() method for manual cache invalidation
  • Includes comprehensive test coverage in both functional and manual test suites
  • Provides XML documentation for the new API members

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlBulkCopyOptions.cs Adds CacheMetadata enum value (1 << 7)
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlBulkCopy.cs Implements caching logic with fields, cache check, cache storage, invalidation method, and dispose cleanup
src/Microsoft.Data.SqlClient/tests/FunctionalTests/SqlBulkCopyCacheMetadataTest.cs Adds unit tests for flag values, combinations, and API methods
src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlBulkCopyTest/CacheMetadata.cs Adds integration tests for caching behavior, invalidation, destination changes, DataTable support, and option combinations
src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlBulkCopyTest/SqlBulkCopyTest.cs Wires up the new manual tests
src/Microsoft.Data.SqlClient/tests/ManualTests/Microsoft.Data.SqlClient.ManualTesting.Tests.csproj Adds CacheMetadata.cs to project and introduces an unrelated project reference change
doc/snippets/Microsoft.Data.SqlClient/SqlBulkCopyOptions.xml Documents CacheMetadata option with warnings about schema changes
doc/snippets/Microsoft.Data.SqlClient/SqlBulkCopy.xml Documents InvalidateMetadataCache method

{
string initialQuery = string.Format(initialQueryTemplate, dstTable);

DataTable sourceData = new();
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disposable 'DataTable' is created but not disposed.

Copilot uses AI. Check for mistakes.
@priyankatiwari08 priyankatiwari08 self-assigned this Feb 9, 2026
Copilot AI review requested due to automatic review settings February 9, 2026 20:16
…amsharma2700/sqlbulkcopy_metadata_optimization
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Copilot AI review requested due to automatic review settings February 10, 2026 17:06
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlBulkCopy.cs:3262

  • ResetWriteToServerGlobalVariables sets _sqlDataReaderRowSource = null; twice. This looks like an accidental duplicate assignment and may be masking a missing reset of a different field. Remove the duplicate (or reset the intended field) to keep the state reset logic clear.
            _rowSourceType = ValueSourceType.Unspecified;
            _sqlDataReaderRowSource = null;
            _sqlDataReaderRowSource = null;
        }

</summary>
<remarks>
<para>
Call this method when you know the destination table schema has changed and you want to force the next <see cref="M:Microsoft.Data.SqlClient.SqlBulkCopy.WriteToServer(System.Data.Common.DbDataReader)" /> operation to refresh the metadata from the server.
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The remarks mention only WriteToServer(DbDataReader) as the operation that will refresh metadata, but InvalidateMetadataCache() affects all WriteToServer* overloads. Consider rephrasing to refer to the next bulk copy/WriteToServer operation generically (or reference multiple overloads) to avoid implying it’s DbDataReader-specific.

Suggested change
Call this method when you know the destination table schema has changed and you want to force the next <see cref="M:Microsoft.Data.SqlClient.SqlBulkCopy.WriteToServer(System.Data.Common.DbDataReader)" /> operation to refresh the metadata from the server.
Call this method when you know the destination table schema has changed and you want to force the next bulk copy operation (for example, a subsequent call to one of the <see cref="M:Microsoft.Data.SqlClient.SqlBulkCopy.WriteToServer" /> overloads) to refresh the metadata from the server.

Copilot uses AI. Check for mistakes.
Comment on lines +625 to +628
if (IsCopyOption(SqlBulkCopyOptions.CacheMetadata) && _cachedMetadata != null)
{
metaDataSet = metaDataSet.Clone();
}
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cloning metaDataSet here may drop Always Encrypted metadata: _SqlMetaDataSet’s copy ctor doesn’t copy cekTable (see AlwaysEncryptedHelperClasses.cs), and _SqlMetaData.Clone()/SqlMetaDataPriv.CopyFrom() don’t copy isEncrypted/baseTI/cipherMD. Using the cloned set for WriteBulkCopyMetaData / LoadColumnEncryptionKeys could therefore send no CEK table and treat encrypted columns as unencrypted. The clone used for CacheMetadata should preserve the encryption-related fields (or use an alternative approach that avoids mutating the cached set without losing AE metadata).

Suggested change
if (IsCopyOption(SqlBulkCopyOptions.CacheMetadata) && _cachedMetadata != null)
{
metaDataSet = metaDataSet.Clone();
}
// Cloning _SqlMetaDataSet for cached metadata is safe only when there are
// no encrypted columns. The clone/copy routines do not currently copy the
// Always Encrypted metadata (e.g. cekTable on the set and isEncrypted/baseTI/cipherMD
// on individual columns). Using a cloned set in that case would cause encrypted
// columns to be treated as unencrypted.
bool hasEncryptedColumns = false;
for (int encryptedCheckIndex = 0; encryptedCheckIndex < metaDataSet.Length; encryptedCheckIndex++)
{
_SqlMetaData encryptedCheckMetadata = metaDataSet[encryptedCheckIndex];
if (encryptedCheckMetadata != null && encryptedCheckMetadata.IsEncrypted)
{
hasEncryptedColumns = true;
break;
}
}
if (IsCopyOption(SqlBulkCopyOptions.CacheMetadata) && _cachedMetadata != null && !hasEncryptedColumns)
{
metaDataSet = metaDataSet.Clone();
}

Copilot uses AI. Check for mistakes.
@samsharma2700 samsharma2700 added this to the 7.0.0-preview5 milestone Feb 10, 2026
@paulmedynski paulmedynski self-assigned this Feb 10, 2026
Copy link
Contributor

@paulmedynski paulmedynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed the code and some of the tests. We can discuss my comments before I review the remainder of the tests.

]]></format>
</example>
</Close>
<InvalidateMetadataCache>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ClearCachedMetadata is a better choice. You even use the term clears in the docs below.

But I'm not convinced we need this method, so stay tuned...

Call this method when you know the destination table schema has changed and you want to force the next <see cref="M:Microsoft.Data.SqlClient.SqlBulkCopy.WriteToServer(System.Data.Common.DbDataReader)" /> operation to refresh the metadata from the server.
</para>
<para>
The cache is automatically invalidated when the <see cref="P:Microsoft.Data.SqlClient.SqlBulkCopy.DestinationTableName" /> property is changed to a different table name.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This belongs on the DestinationTableName docs, not here.

<CacheMetadata>
<summary>
<para>
When specified, <b>CacheMetadata</b> caches destination table metadata after the first bulk copy operation, allowing subsequent operations to the same table to skip the metadata discovery query. This can improve performance when performing multiple bulk copy operations to the same destination table.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try to stick to our new 100 char max per line.

// Check if we have valid cached metadata for the current destination table
if (IsCopyOption(SqlBulkCopyOptions.CacheMetadata) &&
_cachedMetadata != null &&
string.Equals(_cachedDestinationTableName, _destinationTableName, StringComparison.Ordinal))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't possible for the destination table name to differ from the cached name. In fact, you don't need the cached name at all.

if (IsCopyOption(SqlBulkCopyOptions.CacheMetadata))
{
_cachedMetadata = result;
_cachedDestinationTableName = _destinationTableName;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a race condition here where the DestinationName setter has just cleared _cachedDestinationTableName, this line copies the current _destinationTableName, and then the setter immediately changes _destinationTableName. You can avoid this by eliminating _cachedDestinationTableName entirely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should definitely avoid race conditions. But also, SqlBulkCopy (like SqlConnection) isn't thread safe and should be documented as such.

}

/// <include file='../../../../../../doc/snippets/Microsoft.Data.SqlClient/SqlBulkCopy.xml' path='docs/members[@name="SqlBulkCopy"]/InvalidateMetadataCache/*'/>
public void InvalidateMetadataCache()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced the public API needs this. If the caller knows the schema has changed, they can create a new SqlBulkCopy instance. Allowing this mutation creates race conditions and maintenance burden. Thoughts?

AllowEncryptedValueModifications = 1 << 6,

/// <include file='../../../../../../doc/snippets/Microsoft.Data.SqlClient/SqlBulkCopyOptions.xml' path='docs/members[@name="SqlBulkCopyOptions"]/CacheMetadata/*'/>
CacheMetadata = 1 << 7,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't understand why the codebase uses this 1 << 7 notation - we're all expected to bit-shift in our heads? Why not use binary literal notation 0b_0100_0000, or hexadecimal 0x40, or even a decimal 64?

This change is consistent with the existing values, so I'm not asking for a change. Just ranting.

}

[Fact]
public void CacheMetadata_DoesNotOverlapExistingFlags()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't find a new enum member that we add next week that has the same underlying value. You can get all of the underlying values and look for duplicates in that list.

public void InvalidateMetadataCache_CanBeCalledWithoutError()
{
using SqlBulkCopy bulkCopy = new(new SqlConnection());
bulkCopy.InvalidateMetadataCache();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these tests be in the UnitTests project, and check the private members for side effects? The members would need to become internal in that case.

public class SqlBulkCopyCacheMetadataTest
{
[Fact]
public void SqlMetaDataSet_Clone_ProducesIndependentCopy()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These appear to all be unit tests for the _SqlMetaDataSet class. I think they belong in a file and class named appropriately.

And feel free to rename the class to SqlMetaDataSet if you want.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants