DecompressedSize added to Resource table #5274
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,79 @@ | ||
| ## ADR: FHIR Ingested Data Size Calculation | ||
|
|
||
| Pull Requests: [Initial Change](https://github.com/microsoft/fhir-server/pull/4856) | ||
|
|
||
|
|
||
| ### Problem Statement | ||
| - Persist the decompressed size of each resource. | ||
| - Calculate total data size using ingested volume of resources and total index size | ||
|
|
||
| ### Context | ||
| This change supports the AHDS pricing strategy shift from used storage to ingested volume. | ||
|
|
||
| ### Implementation Details | ||
|
|
||
| #### Schema Changes, Resource Persistence Logic, Data Backfill | ||
| Add DecompressedSize column to Resource table: | ||
| - Column: DecompressedSize INT NULL | ||
| - Stores the uncompressed size of each resource in bytes | ||
| - Nullable to support gradual rollout and historical data backfill | ||
|
|
||
| Parameter table entries: | ||
| - FHIR_TotalDataSize: Stores (total ingested data size + total index size) in GB | ||
| - FHIR_TotalIndexSize: Stores total index size in GB | ||
| - Both entries include timestamp of last calculation | ||
|
|
||
| Modify all resource write operations to: | ||
| - Calculate decompressed size before compression | ||
| - Pass DecompressedSize value to data layer. | ||
| - Populate the new column for all new/updated resources | ||
|
|
||
| Historical Data Backfill | ||
| - Create a one-time migration script to calculate and populate DecompressedSize for all historical records. | ||
| - Execute updates in batches to minimize performance impact. | ||
|
|
||
| #### Background Calculation Job | ||
| Implement a periodic background job that runs every 4 hours to: | ||
|
|
||
| Calculate metrics: | ||
| - Sum of decompressed resource sizes (ingested volume) | ||
| - Sum of compressed resource sizes (actual storage) | ||
| - Total database used space (from SQL Server DMVs) | ||
| - Total index size = Total used space - Compressed resource size | ||
| - Total data size = Decompressed resource size + Total index size | ||
|
|
||
| Persist results: | ||
| - Update Parameters table with new metrics | ||
| - Include timestamp for each update | ||
|
|
||
| Emit notification: | ||
| - Publish TotalDataSizeNotification event containing: | ||
| - DateTimeOffset: Timestamp of calculation | ||
| - TotalDataSizeInGB: Total ingested volume + Total index size (decimal) | ||
| - TotalIndexSizeInGB: Index overhead only (decimal) | ||
|
|
||
| ### Implementation Phases | ||
|
|
||
| - Phase 1: Schema Changes, Resource Persistence Logic | ||
| - Phase 2: Data Backfill | ||
| - Phase 3: Background Calculation Job | ||
|
|
||
| ### Status | ||
| Proposed | ||
|
|
||
| ### Performance Metrics | ||
|
|
||
| **Historical Data Backfill Performance:** | ||
| - Estimated completion time: 8 hours per 1 TB of existing data on 32 vCores | ||
| - Processing occurs in batches to minimize performance impact during schema upgrade | ||
|
|
||
| **Background Calculation Job Performance:** | ||
| - Small database (3TB): Approximately 2 minutes per calculation cycle | ||
| - Large database (128TB): Approximately 4 hours per calculation cycle | ||
| - Job frequency: Runs every 4 hours to maintain current metrics | ||
| - Database size correlation: Calculation time increases with database size; the two measurements above (about 43x the data, about 120x the time) suggest faster-than-linear growth | ||
|
|
||
| ### Consequences | ||
| - Background job adds periodic database load every 4 hours | ||
| - Failure in job does not impact core FHIR server functionality | ||
| - Failure in job results in stale data size metrics until next successful run |
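
As a rough illustration of the arithmetic the ADR lists for the background job, the Python sketch below derives the two reported values from the three measured inputs. The function name, the `DataSizeMetrics` type, the parameter names, and the bytes-to-GB factor (1024^3) are assumptions made for this example; the ADR does not specify them, and the actual job runs against SQL Server rather than in Python.

```python
from dataclasses import dataclass
from decimal import Decimal

# Assumption: "GB" means binary gigabytes (1024^3 bytes); the ADR does not say.
BYTES_PER_GB = Decimal(1024 ** 3)


@dataclass(frozen=True)
class DataSizeMetrics:
    """Hypothetical container for the two values the ADR says are persisted."""
    total_index_size_gb: Decimal
    total_data_size_gb: Decimal


def compute_data_size_metrics(
    decompressed_bytes: int,   # sum of decompressed resource sizes (ingested volume)
    compressed_bytes: int,     # sum of compressed resource sizes (actual storage)
    used_space_bytes: int,     # total database used space
) -> DataSizeMetrics:
    # Total index size = total used space - compressed resource size
    index_bytes = used_space_bytes - compressed_bytes
    # Total data size = decompressed resource size + total index size
    total_bytes = decompressed_bytes + index_bytes
    return DataSizeMetrics(
        total_index_size_gb=Decimal(index_bytes) / BYTES_PER_GB,
        total_data_size_gb=Decimal(total_bytes) / BYTES_PER_GB,
    )
```

With this reading, the compressed resource bytes are swapped out for their decompressed size, while everything else in the used space is counted as index overhead.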
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -111,5 +111,6 @@ public enum SchemaVersion | |
| V99 = 99, | ||
| V100 = 100, | ||
| V101 = 101, | ||
| V102 = 102, | ||
| } | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,10 +1,24 @@ | ||
| CREATE PROCEDURE dbo.CaptureResourceIdsForChanges @Resources dbo.ResourceList READONLY | ||
| CREATE PROCEDURE dbo.CaptureResourceIdsForChanges | ||
| @Resources dbo.ResourceList READONLY, | ||
| @Resources_Temp dbo.ResourceList_Temp READONLY | ||
| AS | ||
| set nocount on | ||
| -- This procedure is intended to be called from the MergeResources procedure and relies on its transaction logic | ||
| INSERT INTO dbo.ResourceChangeData | ||
| ( ResourceId, ResourceTypeId, ResourceVersion, ResourceChangeTypeId ) | ||
| SELECT ResourceId, ResourceTypeId, Version, CASE WHEN IsDeleted = 1 THEN 2 WHEN Version > 1 THEN 1 ELSE 0 END | ||
| FROM @Resources | ||
| WHERE IsHistory = 0 | ||
|
|
||
| IF EXISTS (SELECT 1 FROM @Resources_Temp) | ||
| BEGIN | ||
| INSERT INTO dbo.ResourceChangeData | ||
| ( ResourceId, ResourceTypeId, ResourceVersion, ResourceChangeTypeId ) | ||
| SELECT ResourceId, ResourceTypeId, Version, CASE WHEN IsDeleted = 1 THEN 2 WHEN Version > 1 THEN 1 ELSE 0 END | ||
| FROM @Resources_Temp | ||
| WHERE IsHistory = 0 | ||
| END | ||
| ELSE | ||
| BEGIN | ||
| INSERT INTO dbo.ResourceChangeData | ||
| ( ResourceId, ResourceTypeId, ResourceVersion, ResourceChangeTypeId ) | ||
| SELECT ResourceId, ResourceTypeId, Version, CASE WHEN IsDeleted = 1 THEN 2 WHEN Version > 1 THEN 1 ELSE 0 END | ||
| FROM @Resources | ||
| WHERE IsHistory = 0 | ||
| END | ||
| GO |
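
The `CASE` expression that computes `ResourceChangeTypeId` in this procedure can be read as a small decision function. The Python sketch below mirrors it for illustration only; the function name is hypothetical, and the diff itself does not name what the numeric codes stand for.

```python
def resource_change_type_id(is_deleted: bool, version: int) -> int:
    """Mirror of: CASE WHEN IsDeleted = 1 THEN 2 WHEN Version > 1 THEN 1 ELSE 0 END."""
    if is_deleted:
        return 2  # the deleted flag is checked first, regardless of version
    if version > 1:
        return 1  # a non-deleted row with a version above 1
    return 0      # a non-deleted row at version 1


def capture_changes(rows):
    """Keep only non-history rows, as the WHERE IsHistory = 0 filter does."""
    return [
        (r["ResourceId"], r["Version"], resource_change_type_id(r["IsDeleted"], r["Version"]))
        for r in rows
        if not r["IsHistory"]
    ]
```

The order of the branches matters: a deleted row at version 3 yields 2, not 1, because the deletion check comes first.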
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -11,6 +11,7 @@ CREATE PROCEDURE dbo.MergeResources | |
| ,@TransactionId bigint = NULL | ||
| ,@SingleTransaction bit = 1 | ||
| ,@Resources dbo.ResourceList READONLY | ||
| ,@Resources_Temp dbo.ResourceList_Temp READONLY | ||
| ,@ResourceWriteClaims dbo.ResourceWriteClaimList READONLY | ||
| ,@ReferenceSearchParams dbo.ReferenceSearchParamList READONLY | ||
| ,@TokenSearchParams dbo.TokenSearchParamList READONLY | ||
|
|
@@ -33,8 +34,43 @@ DECLARE @st datetime = getUTCdate() | |
| ,@DummyTop bigint = 9223372036854775807 | ||
| ,@InitialTranCount int = @@trancount | ||
| ,@IsRetry bit = 0 | ||
|
|
||
| DECLARE @Mode varchar(200) = isnull((SELECT 'RT=['+convert(varchar,min(ResourceTypeId))+','+convert(varchar,max(ResourceTypeId))+'] Sur=['+convert(varchar,min(ResourceSurrogateId))+','+convert(varchar,max(ResourceSurrogateId))+'] V='+convert(varchar,max(Version))+' Rows='+convert(varchar,count(*)) FROM @Resources),'Input=Empty') | ||
| ,@HasDecompressedSize bit = 0 | ||
|
|
||
| -- Create working table and populate from appropriate source | ||
| DECLARE @WorkingResources TABLE | ||
| ( | ||
| ResourceTypeId smallint NOT NULL | ||
| ,ResourceSurrogateId bigint NOT NULL | ||
| ,ResourceId varchar(64) COLLATE Latin1_General_100_CS_AS NOT NULL | ||
| ,Version int NOT NULL | ||
| ,HasVersionToCompare bit NOT NULL -- in case of multiple versions per resource indicates that row contains (existing version + 1) value | ||
| ,IsDeleted bit NOT NULL | ||
| ,IsHistory bit NOT NULL | ||
| ,KeepHistory bit NOT NULL | ||
| ,RawResource varbinary(max) NOT NULL | ||
| ,IsRawResourceMetaSet bit NOT NULL | ||
| ,RequestMethod varchar(10) NULL | ||
| ,SearchParamHash varchar(64) NULL | ||
| ,DecompressedSize INT NULL | ||
| ) | ||
|
|
||
| IF EXISTS (SELECT 1 FROM @Resources_Temp) | ||
| BEGIN | ||
| SET @HasDecompressedSize = 1 | ||
| INSERT INTO @WorkingResources | ||
| (ResourceTypeId, ResourceId, Version, IsHistory, ResourceSurrogateId, IsDeleted, RequestMethod, RawResource, IsRawResourceMetaSet, SearchParamHash, HasVersionToCompare, KeepHistory, DecompressedSize) | ||
| SELECT ResourceTypeId, ResourceId, Version, IsHistory, ResourceSurrogateId, IsDeleted, RequestMethod, RawResource, IsRawResourceMetaSet, SearchParamHash, HasVersionToCompare, KeepHistory, DecompressedSize | ||
| FROM @Resources_Temp | ||
| END | ||
| ELSE | ||
| BEGIN | ||
| INSERT INTO @WorkingResources | ||
| (ResourceTypeId, ResourceId, Version, IsHistory, ResourceSurrogateId, IsDeleted, RequestMethod, RawResource, IsRawResourceMetaSet, SearchParamHash, HasVersionToCompare, KeepHistory, DecompressedSize) | ||
| SELECT ResourceTypeId, ResourceId, Version, IsHistory, ResourceSurrogateId, IsDeleted, RequestMethod, RawResource, IsRawResourceMetaSet, SearchParamHash, HasVersionToCompare, KeepHistory, NULL | ||
| FROM @Resources | ||
| END | ||
|
|
||
| DECLARE @Mode varchar(200) = isnull((SELECT 'RT=['+convert(varchar,min(ResourceTypeId))+','+convert(varchar,max(ResourceTypeId))+'] Sur=['+convert(varchar,min(ResourceSurrogateId))+','+convert(varchar,max(ResourceSurrogateId))+'] V='+convert(varchar,max(Version))+' Rows='+convert(varchar,count(*)) FROM @WorkingResources),'Input=Empty') | ||
| SET @Mode += ' E='+convert(varchar,@RaiseExceptionOnConflict)+' CC='+convert(varchar,@IsResourceChangeCaptureEnabled)+' IT='+convert(varchar,@InitialTranCount)+' T='+isnull(convert(varchar,@TransactionId),'NULL')+' ST='+convert(varchar,@SingleTransaction) | ||
|
|
||
| SET @AffectedRows = 0 | ||
|
|
@@ -60,7 +96,7 @@ BEGIN TRY | |
| IF @InitialTranCount = 0 | ||
| BEGIN | ||
| IF EXISTS (SELECT * -- This extra statement avoids putting range locks when we don't need them | ||
| FROM @Resources A JOIN dbo.Resource B ON B.ResourceTypeId = A.ResourceTypeId AND B.ResourceSurrogateId = A.ResourceSurrogateId | ||
| FROM @WorkingResources A JOIN dbo.Resource B ON B.ResourceTypeId = A.ResourceTypeId AND B.ResourceSurrogateId = A.ResourceSurrogateId | ||
| --WHERE B.IsHistory = 0 -- With this clause wrong plans are created on empty/small database. Commented until resource separation is in place. | ||
| ) | ||
| BEGIN | ||
|
|
@@ -69,15 +105,15 @@ BEGIN TRY | |
| INSERT INTO @Existing | ||
| ( ResourceTypeId, SurrogateId ) | ||
| SELECT B.ResourceTypeId, B.ResourceSurrogateId | ||
| FROM (SELECT TOP (@DummyTop) * FROM @Resources) A | ||
| FROM (SELECT TOP (@DummyTop) * FROM @WorkingResources) A | ||
| JOIN dbo.Resource B WITH (ROWLOCK, HOLDLOCK) ON B.ResourceTypeId = A.ResourceTypeId AND B.ResourceSurrogateId = A.ResourceSurrogateId | ||
| WHERE B.IsHistory = 0 | ||
| AND B.ResourceId = A.ResourceId | ||
| AND B.Version = A.Version | ||
| OPTION (MAXDOP 1, OPTIMIZE FOR (@DummyTop = 1)) | ||
|
|
||
| -- If all resources being merged are already in the resource table with updated versions this is a retry and only search parameters need to be updated. | ||
| IF @@rowcount = (SELECT count(*) FROM @Resources) SET @IsRetry = 1 | ||
| IF @@rowcount = (SELECT count(*) FROM @WorkingResources) SET @IsRetry = 1 | ||
|
|
||
| IF @IsRetry = 0 COMMIT TRANSACTION -- commit check transaction | ||
| END | ||
|
|
@@ -92,7 +128,7 @@ BEGIN TRY | |
| INSERT INTO @ResourceInfos | ||
| ( ResourceTypeId, SurrogateId, Version, KeepHistory, PreviousVersion, PreviousSurrogateId ) | ||
| SELECT A.ResourceTypeId, A.ResourceSurrogateId, A.Version, A.KeepHistory, B.Version, B.ResourceSurrogateId | ||
| FROM (SELECT TOP (@DummyTop) * FROM @Resources WHERE HasVersionToCompare = 1) A | ||
| FROM (SELECT TOP (@DummyTop) * FROM @WorkingResources WHERE HasVersionToCompare = 1) A | ||
| LEFT OUTER JOIN dbo.Resource B -- WITH (UPDLOCK, HOLDLOCK) These locking hints cause deadlocks and are not needed. Racing might lead to tries to insert dups in unique index (with version key), but it will fail anyway, and in no case this will cause incorrect data saved. | ||
| ON B.ResourceTypeId = A.ResourceTypeId AND B.ResourceId = A.ResourceId AND B.IsHistory = 0 | ||
| OPTION (MAXDOP 1, OPTIMIZE FOR (@DummyTop = 1)) | ||
|
|
@@ -119,6 +155,7 @@ BEGIN TRY | |
| ,RawResource = 0xF -- "invisible" value | ||
| ,SearchParamHash = NULL | ||
| ,HistoryTransactionId = @TransactionId | ||
| ,DecompressedSize = 0 | ||
| WHERE EXISTS (SELECT * FROM @PreviousSurrogateIds WHERE TypeId = ResourceTypeId AND SurrogateId = ResourceSurrogateId AND KeepHistory = 0) | ||
| ELSE | ||
| DELETE FROM dbo.Resource WHERE EXISTS (SELECT * FROM @PreviousSurrogateIds WHERE TypeId = ResourceTypeId AND SurrogateId = ResourceSurrogateId AND KeepHistory = 0) | ||
|
|
@@ -159,10 +196,20 @@ BEGIN TRY | |
| --EXECUTE dbo.LogEvent @Process=@SP,@Mode=@Mode,@Status='Info',@Start=@st,@Rows=@AffectedRows,@Text='Old rows' | ||
| END | ||
|
|
||
| INSERT INTO dbo.Resource | ||
| ( ResourceTypeId, ResourceId, Version, IsHistory, ResourceSurrogateId, IsDeleted, RequestMethod, RawResource, IsRawResourceMetaSet, SearchParamHash, TransactionId ) | ||
| SELECT ResourceTypeId, ResourceId, Version, IsHistory, ResourceSurrogateId, IsDeleted, RequestMethod, RawResource, IsRawResourceMetaSet, SearchParamHash, @TransactionId | ||
| FROM @Resources | ||
| IF @HasDecompressedSize = 1 | ||
| BEGIN | ||
|
Contributor review comment: No need for BEGIN/END. |
||
| INSERT INTO dbo.Resource | ||
|
Contributor review comment: Indentation does not match the rest of the code. |
||
| ( ResourceTypeId, ResourceId, Version, IsHistory, ResourceSurrogateId, IsDeleted, RequestMethod, RawResource, IsRawResourceMetaSet, SearchParamHash, TransactionId, DecompressedSize ) | ||
| SELECT ResourceTypeId, ResourceId, Version, IsHistory, ResourceSurrogateId, IsDeleted, RequestMethod, RawResource, IsRawResourceMetaSet, SearchParamHash, @TransactionId, DecompressedSize | ||
| FROM @WorkingResources | ||
| END | ||
| ELSE | ||
| BEGIN | ||
| INSERT INTO dbo.Resource | ||
| ( ResourceTypeId, ResourceId, Version, IsHistory, ResourceSurrogateId, IsDeleted, RequestMethod, RawResource, IsRawResourceMetaSet, SearchParamHash, TransactionId ) | ||
| SELECT ResourceTypeId, ResourceId, Version, IsHistory, ResourceSurrogateId, IsDeleted, RequestMethod, RawResource, IsRawResourceMetaSet, SearchParamHash, @TransactionId | ||
| FROM @WorkingResources | ||
| END | ||
| SET @AffectedRows += @@rowcount | ||
|
|
||
| INSERT INTO dbo.ResourceWriteClaim | ||
|
|
@@ -394,8 +441,8 @@ BEGIN TRY | |
| END | ||
|
|
||
| IF @IsResourceChangeCaptureEnabled = 1 -- If the resource change capture feature is enabled, execute the stored procedure CaptureResourceIdsForChanges to insert resource change data. | ||
| EXECUTE dbo.CaptureResourceIdsForChanges @Resources | ||
|
|
||
| EXECUTE dbo.CaptureResourceIdsForChanges @Resources = @Resources, @Resources_Temp = @Resources_Temp | ||
| IF @TransactionId IS NOT NULL | ||
| EXECUTE dbo.MergeResourcesCommitTransaction @TransactionId | ||
|
|
||
|
|
||
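
The fallback between the two table-valued parameters in this procedure can be summarized in a few lines. The Python sketch below is only an illustration of that selection logic, under the assumption that rows are plain dictionaries; the function name is hypothetical and nothing here is part of the PR.

```python
def select_working_resources(resources, resources_temp):
    """Pick the input rows the way the procedure fills @WorkingResources.

    Returns (rows, has_decompressed_size), where the flag plays the role
    of @HasDecompressedSize in the stored procedure.
    """
    if resources_temp:
        # New-style callers pass @Resources_Temp, which carries DecompressedSize.
        return [dict(row) for row in resources_temp], True
    # Old-style callers pass @Resources; DecompressedSize is filled with NULL.
    return [{**row, "DecompressedSize": None} for row in resources], False
```

One consequence of this design is that a caller populating both parameters has only the `@Resources_Temp` rows processed, since the check looks at that parameter first.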
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -14,7 +14,8 @@ CREATE TABLE dbo.CurrentResource -- This is replaced by view CurrentResource | |
| IsRawResourceMetaSet bit NOT NULL, | ||
| SearchParamHash varchar(64) NULL, | ||
| TransactionId bigint NULL, | ||
| HistoryTransactionId bigint NULL | ||
| HistoryTransactionId bigint NULL, | ||
| DecompressedSize int NULL | ||
|
Contributor review comment: As discussed in chat, please change to DecompressedLength, which is in line with the SQL naming convention. |
||
| ) | ||
| GO | ||
| DROP TABLE dbo.CurrentResource | ||
|
|
@@ -32,7 +33,8 @@ CREATE TABLE dbo.Resource | |
| IsRawResourceMetaSet bit NOT NULL DEFAULT 0, | ||
| SearchParamHash varchar(64) NULL, | ||
| TransactionId bigint NULL, -- used for main CRUD operation | ||
| HistoryTransactionId bigint NULL -- used by CRUD operation that moved resource version in invisible state | ||
| HistoryTransactionId bigint NULL, -- used by CRUD operation that moved resource version in invisible state | ||
| DecompressedSize int NULL | ||
|
|
||
| CONSTRAINT PKC_Resource PRIMARY KEY CLUSTERED (ResourceTypeId, ResourceSurrogateId) WITH (DATA_COMPRESSION = PAGE) ON PartitionScheme_ResourceTypeId(ResourceTypeId), | ||
| CONSTRAINT CH_Resource_RawResource_Length CHECK (RawResource > 0x0) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| --DROP TYPE dbo.ResourceList_Temp | ||
| GO | ||
| CREATE TYPE dbo.ResourceList_Temp AS TABLE | ||
| ( | ||
| ResourceTypeId smallint NOT NULL | ||
| ,ResourceSurrogateId bigint NOT NULL | ||
| ,ResourceId varchar(64) COLLATE Latin1_General_100_CS_AS NOT NULL | ||
| ,Version int NOT NULL | ||
| ,HasVersionToCompare bit NOT NULL -- in case of multiple versions per resource indicates that row contains (existing version + 1) value | ||
| ,IsDeleted bit NOT NULL | ||
| ,IsHistory bit NOT NULL | ||
| ,KeepHistory bit NOT NULL | ||
| ,RawResource varbinary(max) NOT NULL | ||
| ,IsRawResourceMetaSet bit NOT NULL | ||
| ,RequestMethod varchar(10) NULL | ||
| ,SearchParamHash varchar(64) NULL | ||
| ,DecompressedSize INT NULL | ||
|
Contributor review comment: As discussed in chat, please change to DecompressedLength, which is in line with the SQL naming convention. |
||
|
|
||
| PRIMARY KEY (ResourceTypeId, ResourceSurrogateId) | ||
| ,UNIQUE (ResourceTypeId, ResourceId, Version) | ||
| ) | ||
| GO | ||