79 changes: 79 additions & 0 deletions docs/arch/adr-fhir-ingested-data-size-design.md
@@ -0,0 +1,79 @@
## ADR: FHIR Ingested Data Size Calculation

Pull Requests: [Initial Change](https://github.com/microsoft/fhir-server/pull/4856)


### Problem Statement
- Persist the decompressed size of each resource.
- Calculate the total data size from the ingested volume of resources plus the total index size.

### Context
This change supports the AHDS pricing strategy shift from used storage to ingested volume.

### Implementation Details

#### Schema Changes, Resource Persistence Logic, Data Backfill
Add a DecompressedSize column to the Resource table:
- Column: DecompressedSize INT NULL
- Stores the uncompressed size of each resource in bytes
- Nullable to support gradual rollout and historical data backfill

Parameters table entries:
- FHIR_TotalDataSize: Stores (total ingested data size + total index size) in GB
- FHIR_TotalIndexSize: Stores total index size in GB
- Both entries include timestamp of last calculation

Modify all resource write operations to:
- Calculate the decompressed size before compression
- Pass the DecompressedSize value to the data layer
- Populate the new column for all new and updated resources
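
The write path above can be sketched as follows. This is a minimal illustration, not the server's implementation: it assumes the raw resource is UTF-8 JSON and that compression is gzip, and the helper name `compress_with_size` is hypothetical.

```python
import gzip
import json


def compress_with_size(resource: dict) -> tuple[bytes, int]:
    """Return (compressed_bytes, decompressed_size_in_bytes) for one resource.

    Assumption: the stored form is gzip-compressed UTF-8 JSON.
    """
    raw = json.dumps(resource, separators=(",", ":")).encode("utf-8")
    decompressed_size = len(raw)  # measured before compression
    return gzip.compress(raw), decompressed_size


patient = {"resourceType": "Patient", "id": "example", "active": True}
compressed, decompressed_size = compress_with_size(patient)
```

Measuring the size before compressing avoids a second decompression pass on the write path; both values are then handed to the data layer together.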

Historical Data Backfill
- Create a one-time migration script to calculate and populate DecompressedSize for all historical records.
- Execute updates in batches to minimize performance impact.
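
A batched backfill along these lines might look like the sketch below. It uses an in-memory SQLite table as a stand-in for the Resource table and assumes gzip-compressed payloads; the function name and batch size are illustrative only, and the real migration is a T-SQL script.

```python
import gzip
import sqlite3


def backfill_decompressed_size(conn: sqlite3.Connection, batch_size: int = 2) -> int:
    """Populate DecompressedSize where it is NULL, committing one batch at a time."""
    total = 0
    while True:
        rows = conn.execute(
            "SELECT ResourceSurrogateId, RawResource FROM Resource "
            "WHERE DecompressedSize IS NULL LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return total
        updates = [(len(gzip.decompress(raw)), sid) for sid, raw in rows]
        with conn:  # each batch is its own short transaction
            conn.executemany(
                "UPDATE Resource SET DecompressedSize = ? WHERE ResourceSurrogateId = ?",
                updates,
            )
        total += len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Resource (ResourceSurrogateId INTEGER PRIMARY KEY, "
    "RawResource BLOB NOT NULL, DecompressedSize INTEGER NULL)"
)
payloads = [b'{"resourceType":"Patient"}' * n for n in range(1, 6)]
with conn:
    conn.executemany(
        "INSERT INTO Resource (ResourceSurrogateId, RawResource) VALUES (?, ?)",
        [(i, gzip.compress(p)) for i, p in enumerate(payloads, start=1)],
    )
updated = backfill_decompressed_size(conn)
```

Because only rows with a NULL DecompressedSize are selected, the loop can be interrupted and restarted without redoing finished batches.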

#### Background Calculation Job
Implement a periodic background job that runs every 4 hours to:

Calculate metrics:
- Sum of decompressed resource sizes (ingested volume)
- Sum of compressed resource sizes (actual storage)
- Total database used space (from SQL Server DMVs)
- Total index size = Total used space - Compressed resource size
- Total data size = Decompressed resource size + Total index size
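
The two derived metrics can be worked through with a small sketch. The function and class names are hypothetical, and the conversion factor assumes binary gigabytes (1024^3 bytes), which the ADR does not specify.

```python
from dataclasses import dataclass

BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes


@dataclass(frozen=True)
class SizeMetrics:
    total_index_size_gb: float
    total_data_size_gb: float


def calculate_size_metrics(
    decompressed_bytes: int, compressed_bytes: int, used_space_bytes: int
) -> SizeMetrics:
    # Total index size = total used space - compressed resource size
    index_bytes = used_space_bytes - compressed_bytes
    # Total data size = decompressed resource size + total index size
    data_bytes = decompressed_bytes + index_bytes
    return SizeMetrics(index_bytes / BYTES_PER_GB, data_bytes / BYTES_PER_GB)


metrics = calculate_size_metrics(
    decompressed_bytes=10 * BYTES_PER_GB,
    compressed_bytes=2 * BYTES_PER_GB,
    used_space_bytes=5 * BYTES_PER_GB,
)
```

With 10 GB ingested, 2 GB of compressed resources, and 5 GB of used space, the index size comes to 3 GB and the total data size to 13 GB.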

Persist results:
- Update Parameters table with new metrics
- Include timestamp for each update

Emit notification:
- Publish TotalDataSizeNotification event containing:
  - DateTimeOffset: Timestamp of calculation
  - TotalDataSizeInGB: Total ingested volume + Total index size (decimal)
  - TotalIndexSizeInGB: Index overhead only (decimal)
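
As a rough illustration of the event payload, the sketch below models the three fields in Python. The field names are adapted to Python style and `build_notification` is a hypothetical helper; the actual event is a C# type in the server.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal


@dataclass(frozen=True)
class TotalDataSizeNotification:
    timestamp: datetime              # time of calculation, timezone-aware
    total_data_size_in_gb: Decimal   # ingested volume + index size
    total_index_size_in_gb: Decimal  # index overhead only


def build_notification(ingested_gb: Decimal, index_gb: Decimal) -> TotalDataSizeNotification:
    return TotalDataSizeNotification(
        timestamp=datetime.now(timezone.utc),
        total_data_size_in_gb=ingested_gb + index_gb,
        total_index_size_in_gb=index_gb,
    )


notification = build_notification(Decimal("10.5"), Decimal("3.25"))
```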

### Implementation Phases

- Phase 1: Schema Changes, Resource Persistence Logic
- Phase 2: Data Backfill
- Phase 3: Background Calculation Job

### Status
Proposed

### Performance Metrics

**Historical Data Backfill Performance:**
- Estimated completion time: 8 hours per 1 TB of existing data on 32 vCores
- Processing occurs in batches to minimize performance impact during schema upgrade

**Background Calculation Job Performance:**
- Small database (3TB): Approximately 2 minutes per calculation cycle
- Large database (128TB): Approximately 4 hours per calculation cycle
- Job frequency: Runs every 4 hours to maintain current metrics
- Database size correlation: Calculation time increases with database size

### Consequences
- Background job adds periodic database load every 4 hours
- A job failure does not impact core FHIR server functionality
- A job failure leaves the data size metrics stale until the next successful run

6,494 changes: 6,494 additions & 0 deletions src/Microsoft.Health.Fhir.SqlServer/Features/Schema/Migrations/102.sql

Large diffs are not rendered by default.

@@ -111,5 +111,6 @@ public enum SchemaVersion
V99 = 99,
V100 = 100,
V101 = 101,
V102 = 102,
}
}
@@ -8,7 +8,7 @@ namespace Microsoft.Health.Fhir.SqlServer.Features.Schema
public static class SchemaVersionConstants
{
public const int Min = (int)SchemaVersion.V94;
public const int Max = (int)SchemaVersion.V101;
public const int Max = (int)SchemaVersion.V102;
public const int MinForUpgrade = (int)SchemaVersion.V94; // this is used for upgrade tests only
public const int SearchParameterStatusSchemaVersion = (int)SchemaVersion.V6;
public const int SupportForReferencesWithMissingTypeVersion = (int)SchemaVersion.V7;
@@ -37,6 +37,7 @@ public static class SchemaVersionConstants
public const int SearchParameterOptimisticConcurrency = (int)SchemaVersion.V95;
public const int SearchParameterMaxLastUpdatedStoredProcedure = (int)SchemaVersion.V96;
public const int SearchParameterLastUpdatedIndex = (int)SchemaVersion.V98;
public const int DecompressedSize = (int)SchemaVersion.V102;
Review comment (Contributor): As discussed in chat, please change this to DecompressedLength, which is in line with the SQL naming convention.


// It is currently used in Azure Healthcare APIs.
public const int ParameterizedRemovePartitionFromResourceChangesVersion = (int)SchemaVersion.V21;
@@ -1,10 +1,24 @@
CREATE PROCEDURE dbo.CaptureResourceIdsForChanges @Resources dbo.ResourceList READONLY
CREATE PROCEDURE dbo.CaptureResourceIdsForChanges
@Resources dbo.ResourceList READONLY,
@Resources_Temp dbo.ResourceList_Temp READONLY
AS
set nocount on
-- This procedure is intended to be called from the MergeResources procedure and relies on its transaction logic
INSERT INTO dbo.ResourceChangeData
( ResourceId, ResourceTypeId, ResourceVersion, ResourceChangeTypeId )
SELECT ResourceId, ResourceTypeId, Version, CASE WHEN IsDeleted = 1 THEN 2 WHEN Version > 1 THEN 1 ELSE 0 END
FROM @Resources
WHERE IsHistory = 0

IF EXISTS (SELECT 1 FROM @Resources_Temp)
BEGIN
INSERT INTO dbo.ResourceChangeData
( ResourceId, ResourceTypeId, ResourceVersion, ResourceChangeTypeId )
SELECT ResourceId, ResourceTypeId, Version, CASE WHEN IsDeleted = 1 THEN 2 WHEN Version > 1 THEN 1 ELSE 0 END
FROM @Resources_Temp
WHERE IsHistory = 0
END
ELSE
BEGIN
INSERT INTO dbo.ResourceChangeData
( ResourceId, ResourceTypeId, ResourceVersion, ResourceChangeTypeId )
SELECT ResourceId, ResourceTypeId, Version, CASE WHEN IsDeleted = 1 THEN 2 WHEN Version > 1 THEN 1 ELSE 0 END
FROM @Resources
WHERE IsHistory = 0
END
GO
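
The CASE expression in this procedure maps each non-history row to a change type ID. The sketch below restates that mapping in Python for readability. The labels in the comments (creation, update, deletion) are inferred from the conditions rather than stated in the diff, and the function names are hypothetical.

```python
def resource_change_type_id(is_deleted: bool, version: int) -> int:
    """Mirror of: CASE WHEN IsDeleted = 1 THEN 2 WHEN Version > 1 THEN 1 ELSE 0 END."""
    if is_deleted:
        return 2  # presumably a deletion
    if version > 1:
        return 1  # presumably an update
    return 0      # presumably a creation


def rows_to_capture(resources: list[dict]) -> list[tuple[str, int, int]]:
    """Keep only rows with IsHistory = 0, as the WHERE clause does."""
    return [
        (r["ResourceId"], r["Version"], resource_change_type_id(r["IsDeleted"], r["Version"]))
        for r in resources
        if not r["IsHistory"]
    ]


sample = [
    {"ResourceId": "a", "Version": 1, "IsDeleted": False, "IsHistory": False},
    {"ResourceId": "b", "Version": 3, "IsDeleted": False, "IsHistory": False},
    {"ResourceId": "c", "Version": 2, "IsDeleted": True, "IsHistory": False},
    {"ResourceId": "d", "Version": 1, "IsDeleted": False, "IsHistory": True},
]
captured = rows_to_capture(sample)
```

Note that the deletion check comes first, so a deleted row with Version > 1 is still recorded with ID 2.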
@@ -11,6 +11,7 @@ CREATE PROCEDURE dbo.MergeResources
,@TransactionId bigint = NULL
,@SingleTransaction bit = 1
,@Resources dbo.ResourceList READONLY
,@Resources_Temp dbo.ResourceList_Temp READONLY
,@ResourceWriteClaims dbo.ResourceWriteClaimList READONLY
,@ReferenceSearchParams dbo.ReferenceSearchParamList READONLY
,@TokenSearchParams dbo.TokenSearchParamList READONLY
@@ -33,8 +34,43 @@ DECLARE @st datetime = getUTCdate()
,@DummyTop bigint = 9223372036854775807
,@InitialTranCount int = @@trancount
,@IsRetry bit = 0

DECLARE @Mode varchar(200) = isnull((SELECT 'RT=['+convert(varchar,min(ResourceTypeId))+','+convert(varchar,max(ResourceTypeId))+'] Sur=['+convert(varchar,min(ResourceSurrogateId))+','+convert(varchar,max(ResourceSurrogateId))+'] V='+convert(varchar,max(Version))+' Rows='+convert(varchar,count(*)) FROM @Resources),'Input=Empty')
,@HasDecompressedSize bit = 0

-- Create working table and populate from appropriate source
DECLARE @WorkingResources TABLE
(
ResourceTypeId smallint NOT NULL
,ResourceSurrogateId bigint NOT NULL
,ResourceId varchar(64) COLLATE Latin1_General_100_CS_AS NOT NULL
,Version int NOT NULL
,HasVersionToCompare bit NOT NULL -- in case of multiple versions per resource indicates that row contains (existing version + 1) value
,IsDeleted bit NOT NULL
,IsHistory bit NOT NULL
,KeepHistory bit NOT NULL
,RawResource varbinary(max) NOT NULL
,IsRawResourceMetaSet bit NOT NULL
,RequestMethod varchar(10) NULL
,SearchParamHash varchar(64) NULL
,DecompressedSize INT NULL
)

IF EXISTS (SELECT 1 FROM @Resources_Temp)
BEGIN
SET @HasDecompressedSize = 1
INSERT INTO @WorkingResources
(ResourceTypeId, ResourceId, Version, IsHistory, ResourceSurrogateId, IsDeleted, RequestMethod, RawResource, IsRawResourceMetaSet, SearchParamHash, HasVersionToCompare, KeepHistory, DecompressedSize)
SELECT ResourceTypeId, ResourceId, Version, IsHistory, ResourceSurrogateId, IsDeleted, RequestMethod, RawResource, IsRawResourceMetaSet, SearchParamHash, HasVersionToCompare, KeepHistory, DecompressedSize
FROM @Resources_Temp
END
ELSE
BEGIN
INSERT INTO @WorkingResources
(ResourceTypeId, ResourceId, Version, IsHistory, ResourceSurrogateId, IsDeleted, RequestMethod, RawResource, IsRawResourceMetaSet, SearchParamHash, HasVersionToCompare, KeepHistory, DecompressedSize)
SELECT ResourceTypeId, ResourceId, Version, IsHistory, ResourceSurrogateId, IsDeleted, RequestMethod, RawResource, IsRawResourceMetaSet, SearchParamHash, HasVersionToCompare, KeepHistory, NULL
FROM @Resources
END

DECLARE @Mode varchar(200) = isnull((SELECT 'RT=['+convert(varchar,min(ResourceTypeId))+','+convert(varchar,max(ResourceTypeId))+'] Sur=['+convert(varchar,min(ResourceSurrogateId))+','+convert(varchar,max(ResourceSurrogateId))+'] V='+convert(varchar,max(Version))+' Rows='+convert(varchar,count(*)) FROM @WorkingResources),'Input=Empty')
SET @Mode += ' E='+convert(varchar,@RaiseExceptionOnConflict)+' CC='+convert(varchar,@IsResourceChangeCaptureEnabled)+' IT='+convert(varchar,@InitialTranCount)+' T='+isnull(convert(varchar,@TransactionId),'NULL')+' ST='+convert(varchar,@SingleTransaction)

SET @AffectedRows = 0
@@ -60,7 +96,7 @@ BEGIN TRY
IF @InitialTranCount = 0
BEGIN
IF EXISTS (SELECT * -- This extra statement avoids putting range locks when we don't need them
FROM @Resources A JOIN dbo.Resource B ON B.ResourceTypeId = A.ResourceTypeId AND B.ResourceSurrogateId = A.ResourceSurrogateId
FROM @WorkingResources A JOIN dbo.Resource B ON B.ResourceTypeId = A.ResourceTypeId AND B.ResourceSurrogateId = A.ResourceSurrogateId
--WHERE B.IsHistory = 0 -- With this clause wrong plans are created on empty/small database. Commented until resource separation is in place.
)
BEGIN
@@ -69,15 +105,15 @@ BEGIN TRY
INSERT INTO @Existing
( ResourceTypeId, SurrogateId )
SELECT B.ResourceTypeId, B.ResourceSurrogateId
FROM (SELECT TOP (@DummyTop) * FROM @Resources) A
FROM (SELECT TOP (@DummyTop) * FROM @WorkingResources) A
JOIN dbo.Resource B WITH (ROWLOCK, HOLDLOCK) ON B.ResourceTypeId = A.ResourceTypeId AND B.ResourceSurrogateId = A.ResourceSurrogateId
WHERE B.IsHistory = 0
AND B.ResourceId = A.ResourceId
AND B.Version = A.Version
OPTION (MAXDOP 1, OPTIMIZE FOR (@DummyTop = 1))

-- If all resources being merged are already in the resource table with updated versions this is a retry and only search parameters need to be updated.
IF @@rowcount = (SELECT count(*) FROM @Resources) SET @IsRetry = 1
IF @@rowcount = (SELECT count(*) FROM @WorkingResources) SET @IsRetry = 1

IF @IsRetry = 0 COMMIT TRANSACTION -- commit check transaction
END
@@ -92,7 +128,7 @@ BEGIN TRY
INSERT INTO @ResourceInfos
( ResourceTypeId, SurrogateId, Version, KeepHistory, PreviousVersion, PreviousSurrogateId )
SELECT A.ResourceTypeId, A.ResourceSurrogateId, A.Version, A.KeepHistory, B.Version, B.ResourceSurrogateId
FROM (SELECT TOP (@DummyTop) * FROM @Resources WHERE HasVersionToCompare = 1) A
FROM (SELECT TOP (@DummyTop) * FROM @WorkingResources WHERE HasVersionToCompare = 1) A
LEFT OUTER JOIN dbo.Resource B -- WITH (UPDLOCK, HOLDLOCK) These locking hints cause deadlocks and are not needed. Racing might lead to tries to insert dups in unique index (with version key), but it will fail anyway, and in no case this will cause incorrect data saved.
ON B.ResourceTypeId = A.ResourceTypeId AND B.ResourceId = A.ResourceId AND B.IsHistory = 0
OPTION (MAXDOP 1, OPTIMIZE FOR (@DummyTop = 1))
@@ -119,6 +155,7 @@ BEGIN TRY
,RawResource = 0xF -- "invisible" value
,SearchParamHash = NULL
,HistoryTransactionId = @TransactionId
,DeCompressedSize = 0
WHERE EXISTS (SELECT * FROM @PreviousSurrogateIds WHERE TypeId = ResourceTypeId AND SurrogateId = ResourceSurrogateId AND KeepHistory = 0)
ELSE
DELETE FROM dbo.Resource WHERE EXISTS (SELECT * FROM @PreviousSurrogateIds WHERE TypeId = ResourceTypeId AND SurrogateId = ResourceSurrogateId AND KeepHistory = 0)
@@ -159,10 +196,20 @@ BEGIN TRY
--EXECUTE dbo.LogEvent @Process=@SP,@Mode=@Mode,@Status='Info',@Start=@st,@Rows=@AffectedRows,@Text='Old rows'
END

INSERT INTO dbo.Resource
( ResourceTypeId, ResourceId, Version, IsHistory, ResourceSurrogateId, IsDeleted, RequestMethod, RawResource, IsRawResourceMetaSet, SearchParamHash, TransactionId )
SELECT ResourceTypeId, ResourceId, Version, IsHistory, ResourceSurrogateId, IsDeleted, RequestMethod, RawResource, IsRawResourceMetaSet, SearchParamHash, @TransactionId
FROM @Resources
IF @HasDecompressedSize = 1
BEGIN
Review comment (Contributor): No need for BEGIN/END

INSERT INTO dbo.Resource
Review comment (Contributor): Indentation does not match the rest of the code

( ResourceTypeId, ResourceId, Version, IsHistory, ResourceSurrogateId, IsDeleted, RequestMethod, RawResource, IsRawResourceMetaSet, SearchParamHash, TransactionId, DecompressedSize )
SELECT ResourceTypeId, ResourceId, Version, IsHistory, ResourceSurrogateId, IsDeleted, RequestMethod, RawResource, IsRawResourceMetaSet, SearchParamHash, @TransactionId, DecompressedSize
FROM @WorkingResources
END
ELSE
BEGIN
INSERT INTO dbo.Resource
( ResourceTypeId, ResourceId, Version, IsHistory, ResourceSurrogateId, IsDeleted, RequestMethod, RawResource, IsRawResourceMetaSet, SearchParamHash, TransactionId )
SELECT ResourceTypeId, ResourceId, Version, IsHistory, ResourceSurrogateId, IsDeleted, RequestMethod, RawResource, IsRawResourceMetaSet, SearchParamHash, @TransactionId
FROM @WorkingResources
END
SET @AffectedRows += @@rowcount

INSERT INTO dbo.ResourceWriteClaim
@@ -394,8 +441,8 @@ BEGIN TRY
END

IF @IsResourceChangeCaptureEnabled = 1 --If the resource change capture feature is enabled, to execute a stored procedure called CaptureResourceChanges to insert resource change data.
EXECUTE dbo.CaptureResourceIdsForChanges @Resources

EXECUTE dbo.CaptureResourceIdsForChanges @Resources = @Resources, @Resources_Temp = @Resources_Temp
IF @TransactionId IS NOT NULL
EXECUTE dbo.MergeResourcesCommitTransaction @TransactionId

@@ -14,7 +14,8 @@ CREATE TABLE dbo.CurrentResource -- This is replaced by view CurrentResource
IsRawResourceMetaSet bit NOT NULL,
SearchParamHash varchar(64) NULL,
TransactionId bigint NULL,
HistoryTransactionId bigint NULL
HistoryTransactionId bigint NULL,
DecompressedSize int NULL
Review comment (Contributor): As discussed in chat, please change this to DecompressedLength, which is in line with the SQL naming convention.

)
GO
DROP TABLE dbo.CurrentResource
@@ -32,7 +33,8 @@ CREATE TABLE dbo.Resource
IsRawResourceMetaSet bit NOT NULL DEFAULT 0,
SearchParamHash varchar(64) NULL,
TransactionId bigint NULL, -- used for main CRUD operation
HistoryTransactionId bigint NULL -- used by CRUD operation that moved resource version in invisible state
HistoryTransactionId bigint NULL, -- used by CRUD operation that moved resource version in invisible state
DecompressedSize int NULL

CONSTRAINT PKC_Resource PRIMARY KEY CLUSTERED (ResourceTypeId, ResourceSurrogateId) WITH (DATA_COMPRESSION = PAGE) ON PartitionScheme_ResourceTypeId(ResourceTypeId),
CONSTRAINT CH_Resource_RawResource_Length CHECK (RawResource > 0x0)
@@ -0,0 +1,22 @@
--DROP TYPE dbo.ResourceList_Temp
GO
CREATE TYPE dbo.ResourceList_Temp AS TABLE
(
ResourceTypeId smallint NOT NULL
,ResourceSurrogateId bigint NOT NULL
,ResourceId varchar(64) COLLATE Latin1_General_100_CS_AS NOT NULL
,Version int NOT NULL
,HasVersionToCompare bit NOT NULL -- in case of multiple versions per resource indicates that row contains (existing version + 1) value
,IsDeleted bit NOT NULL
,IsHistory bit NOT NULL
,KeepHistory bit NOT NULL
,RawResource varbinary(max) NOT NULL
,IsRawResourceMetaSet bit NOT NULL
,RequestMethod varchar(10) NULL
,SearchParamHash varchar(64) NULL
,DecompressedSize INT NULL
Review comment (Contributor): As discussed in chat, please change this to DecompressedLength, which is in line with the SQL naming convention.


PRIMARY KEY (ResourceTypeId, ResourceSurrogateId)
,UNIQUE (ResourceTypeId, ResourceId, Version)
)
GO
@@ -31,6 +31,7 @@
using Microsoft.Health.Fhir.Core.Features.Persistence.Orchestration;
using Microsoft.Health.Fhir.Core.Features.Search;
using Microsoft.Health.Fhir.Core.Models;
using Microsoft.Health.Fhir.SqlServer.Features.Schema;
using Microsoft.Health.Fhir.SqlServer.Features.Schema.Model;
using Microsoft.Health.Fhir.SqlServer.Features.Storage.TvpRowGeneration;
using Microsoft.Health.Fhir.SqlServer.Features.Storage.TvpRowGeneration.Merge;
@@ -732,7 +733,15 @@ internal async Task MergeResourcesWrapperAsync(long transactionId, bool singleTr
cmd.Parameters.AddWithValue("@IsResourceChangeCaptureEnabled", _coreFeatures.SupportsResourceChangeCapture);
cmd.Parameters.AddWithValue("@TransactionId", transactionId);
cmd.Parameters.AddWithValue("@SingleTransaction", singleTransaction);
new ResourceListTableValuedParameterDefinition("@Resources").AddParameter(cmd.Parameters, new ResourceListRowGenerator(_model, _compressedRawResourceConverter).GenerateRows(mergeWrappers));
if (_schemaInformation.Current >= SchemaVersionConstants.DecompressedSize)
{
new ResourceList_TempTableValuedParameterDefinition("@Resources_Temp").AddParameter(cmd.Parameters, new ResourceListTempRowGenerator(_model, _compressedRawResourceConverter).GenerateRows(mergeWrappers));
}
else
{
new ResourceListTableValuedParameterDefinition("@Resources").AddParameter(cmd.Parameters, new ResourceListRowGenerator(_model, _compressedRawResourceConverter).GenerateRows(mergeWrappers));
}

new ResourceWriteClaimListTableValuedParameterDefinition("@ResourceWriteClaims").AddParameter(cmd.Parameters, new ResourceWriteClaimListRowGenerator(_model, _searchParameterTypeMap).GenerateRows(mergeWrappers));
new ReferenceSearchParamListTableValuedParameterDefinition("@ReferenceSearchParams").AddParameter(cmd.Parameters, new ReferenceSearchParamListRowGenerator(_model, _searchParameterTypeMap).GenerateRows(mergeWrappers));
new TokenSearchParamListTableValuedParameterDefinition("@TokenSearchParams").AddParameter(cmd.Parameters, new TokenSearchParamListRowGenerator(_model, _searchParameterTypeMap).GenerateRows(mergeWrappers));
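
The schema-version gate in this hunk can be summarized with a small sketch. It only models the decision of which table-valued parameter the client sends; the function name is hypothetical and the threshold value 102 is taken from `SchemaVersionConstants.DecompressedSize` in this diff.

```python
DECOMPRESSED_SIZE_SCHEMA_VERSION = 102  # SchemaVersionConstants.DecompressedSize in this diff


def resource_tvp_parameter(current_schema_version: int) -> str:
    """Pick the table-valued parameter name the client would populate."""
    if current_schema_version >= DECOMPRESSED_SIZE_SCHEMA_VERSION:
        return "@Resources_Temp"  # carries the DecompressedSize column
    return "@Resources"           # legacy shape without DecompressedSize


chosen_old = resource_tvp_parameter(101)
chosen_new = resource_tvp_parameter(102)
```

On the database side, MergeResources checks whether @Resources_Temp has rows and otherwise falls back to @Resources with a NULL DecompressedSize, so either client shape is accepted during rollout.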