diff --git a/docs.json b/docs.json
index ce521e23..73afe052 100644
--- a/docs.json
+++ b/docs.json
@@ -192,7 +192,8 @@
           "usage/use-case-examples/postgis",
           "usage/use-case-examples/prioritized-sync",
           "usage/use-case-examples/raw-tables",
-          "usage/use-case-examples/custom-write-checkpoints"
+          "usage/use-case-examples/custom-write-checkpoints",
+          "usage/use-case-examples/pre-seeded-sqlite"
         ]
       },
       {
diff --git a/usage/use-case-examples.mdx b/usage/use-case-examples.mdx
index e7442949..ca334d24 100644
--- a/usage/use-case-examples.mdx
+++ b/usage/use-case-examples.mdx
@@ -18,6 +18,7 @@ The following examples are available to help you get started with specific use c
+
diff --git a/usage/use-case-examples/pre-seeded-sqlite.mdx b/usage/use-case-examples/pre-seeded-sqlite.mdx
new file mode 100644
index 00000000..89a279bf
--- /dev/null
+++ b/usage/use-case-examples/pre-seeded-sqlite.mdx
@@ -0,0 +1,103 @@
+---
+title: "Pre-Seeding SQLite Databases"
+description: "Optimizing initial sync by pre-seeding SQLite databases."
+---
+
+# Overview
+
+When syncing large amounts of data, it can be useful to pre-seed the SQLite database with an initial snapshot of the data. This reduces the initial sync time and improves the user experience.
+
+To achieve this, you can run server-side processes that pre-generate and seed SQLite files. These files can then be uploaded to blob storage, such as AWS S3, Azure Blob Storage, or Google Cloud Storage, and downloaded directly by the client applications, bypassing the initial sync process.
+
+The [PowerSync Node.js SDK](/client-sdk-references/node) comes in handy in these scenarios, because you can run it on the server in Node.js applications.
+
+
+  Example repo using a self-hosted PowerSync instance connected to a PostgreSQL database, the PowerSync Node.js SDK, the React Native SDK, and AWS S3.
+
+
+# Main Concepts
+
+## Generate a client-specific JWT token
+If you want to populate the SQLite database dynamically, you can generate a JWT token that is scoped to a specific client device. In the server-side application, you would typically query the source database directly to fetch the IDs that your sync rules need in order to satisfy the conditions of the parameter queries.
+
+For example, if you have sync rules that look like this:
+
+```yaml
+sync_rules:
+  content: |
+    bucket_definitions:
+      store_products:
+        parameters: SELECT id as store_id FROM stores WHERE id = request.user_id()
+        data:
+          - SELECT * FROM stores WHERE id = bucket.store_id
+          - SELECT * FROM products WHERE store_id = bucket.store_id
+```
+
+In this case, you would query the source database for the store's `id` and generate a JWT token whose subject is that ID, since the parameter query matches `stores.id` against `request.user_id()`.
+
+## Pre-seeding script
+
+A simple script to pre-seed the SQLite database would look like this:
+
+```typescript
+async function prepareDatabase (storeId: string) {
+  const connector = new Connector();
+
+  await powersync.connect(connector);
+
+  // Wait for the first sync to complete so the snapshot contains all synced data.
+  await powersync.waitForFirstSync();
+
+  // Remove the client_id so that it is not shared by every client that downloads the file.
+  await powersync.execute("DELETE FROM ps_kv WHERE key = ?", ["client_id"]);
+
+  const backupPath = `/path/to/sqlite/${storeId}.sqlite`;
+
+  // VACUUM INTO takes a string expression, so the path is passed as a bound parameter.
+  await powersync.execute("VACUUM INTO ?", [backupPath]);
+  await uploadFile(storeId, `${storeId}.sqlite`, backupPath);
+
+  // Disconnect from the PowerSync instance before closing the database.
+  await powersync.disconnect();
+  await powersync.close();
+}
+```
+
+  Some critical points to note:
+  - Wait for the first sync to complete before deleting the `client_id` key and vacuuming the database. This makes sure all of the data is synced to the database before you proceed.
+  - The `client_id` key is used to identify the client device and is typically set when the client connects to the PowerSync instance.
So when pre-seeding the database, we need to delete the `client_id` key to avoid conflicts when the client connects to the PowerSync instance.
+  - Use the [`VACUUM INTO`](https://sqlite.org/lang_vacuum.html) command to create a clean, portable SQLite database file. This reduces the size of the database file and gives the client an optimized file to download.
+  - In this example the upload function uses AWS S3, but you can use any blob storage provider that you prefer.
+
+
+
+### Scheduling and Cleaning Up
+
+To enhance the process, consider doing the following:
+- To keep the pre-seeded SQLite databases fresh, schedule a cron job for periodic regeneration, so that new clients always download the latest snapshot of the initial sync data.
+- After each run, clean up the environment to avoid disk bloat. This can be done by deleting the pre-seeded SQLite database files after they have been uploaded to blob storage.
+
+## Client Side Usage
+
+When the client application boots, before connecting to the PowerSync instance, check whether a pre-seeded SQLite database exists in blob storage. If it does, download it and use it when initializing the `PowerSyncDatabase` class.
+
+
+  It's important that the client stores the downloaded pre-seeded SQLite database in a permanent location on the device, so that the database is not deleted when the app is restarted.
+  Depending on which PowerSync SDK you are using, you may need to use framework-specific methods to store the file in a permanent location on the device. For example, in React Native + Expo you can use the [`expo-file-system`](https://docs.expo.dev/versions/latest/sdk/filesystem/) module.
+
+
+Once the database is downloaded, insert a new `client_id` key into the `ps_kv` table and connect to the PowerSync instance.
+
+```typescript
+async function configureDatabase() {
+  // Call init() first. This ensures the database is initialized, but not yet connected to the PowerSync instance.
+  await powersync.init();
+  // "1234567890" is a placeholder; use a value that is unique to this device.
+  await powersync.execute("INSERT INTO ps_kv (key, value) VALUES (?, ?)", ["client_id", "1234567890"]);
+  await powersync.connect(connector);
+}
+```
+
+
+  It's important that you insert a new `client_id` key into the `ps_kv` table to avoid conflicts when the client connects to the PowerSync instance.
+
+
+At this point the client should be able to connect to the PowerSync instance and sync the data as normal, bypassing the initial sync process.
+
+
+
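The `client_id` inserted on the client is hard-coded in the example. Below is a minimal sketch of generating a fresh value per device instead. It assumes that any unique string is acceptable as a `client_id`, which this page does not confirm, and that the code runs in a Node.js-compatible runtime where `node:crypto` is available; on other platforms, substitute whatever UUID source the platform offers. `buildClientIdInsert` and `Statement` are hypothetical names, not part of the PowerSync SDK.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical shape for a SQL statement plus its bound parameters.
interface Statement {
  sql: string;
  params: string[];
}

// Hypothetical helper (not part of the PowerSync SDK): builds the statement
// that registers a client_id in a downloaded snapshot. A random UUID is used
// when no explicit ID is supplied (assumption: any unique string is accepted).
function buildClientIdInsert(clientId: string = randomUUID()): Statement {
  return {
    sql: "INSERT INTO ps_kv (key, value) VALUES (?, ?)",
    params: ["client_id", clientId],
  };
}
```

The returned `sql` and `params` could then be passed to `powersync.execute(sql, params)` in place of the hard-coded values, between the `init()` and `connect()` calls.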