# [Proposal] Add a delegate method that handles db conflict for records being inserted from the cloud #360
## Conversation
Hi @hishnash, thanks for sharing this proposal. I've also been looking into customization of upserts recently, though from a slightly different angle. In my case, I'm after customizable conflict resolution rather than schema changes (see #272). I did consider hooking into the delegate as well, but decided against it in the end, as I found it's better expressed via a more tailored API. That said, I wanted to comment on your scenario, because I think that even with this hook you would still run into issues around cross-version compatibility. While newer versions of the app could be made to interpret both the old and the new schema variants, older versions would continue to assume the legacy shape. This can easily lead to errors (e.g. if the original enum-based […]

I'm afraid that, in a distributed sync environment, you'll generally need some form of overlap where both representations are handled if older app versions are expected to keep working alongside newer ones. There may be better ways of structuring this, but I don't think removing fields is an appropriate measure here, which also aligns with SQLiteData's guidance around disallowed migrations.
Hi @hishnash, for this kind of proposal it would help to see a concrete example of how it could be used.
Unfortunately the reality is that DB schemas must be backwards compatible, and there really is no way around it. It's the only way to allow old and new schemas to communicate with each other. The delegate action you are proposing may work for new schemas interpreting data from old schemas, but it does nothing to help old schemas interpret data from new schemas. And it can't possibly, because apps with the old schema were shipped long before we knew the demands of the new schema. This does mean that over time your schema may become less than ideal, because you are forced to make compromises to keep backwards compatibility. Sometimes that can be mitigated with various techniques, but it is also just the reality of having a schema defined on device as well as remotely on iCloud.
I mentioned this in our conversation over Slack, but this is not how SQLiteData is supposed to work, and if you are seeing this behavior it would be a bug. We have test coverage on this exact scenario, and I just added more, and so I feel somewhat confident that it works. Each version of the app keeps around a full […]

But if you definitely saw this data loss then we will need a lot more info on how to reproduce it. There could definitely be a bug in our code, and we would rather fix that than consider all new ways to provide compatibility between different schemas.
I did respond in the Slack channel but will post it here as well. I wonder if our issue stems from the fact that we originally created some of our records as fixtures within DB migrations (thus outside the runtime of the sync engine), so the sync engine never created a […]

As to providing backwards compatibility for users that have 2 devices, where one is on the new version and the other is on the old version: we do not intend to support that, and have already triggered prompts on users' devices telling them they must update the device that is on the older version. As in the example above, if one were to keep the […]

I would love it if it were possible to set up some SQLite triggers that are scoped to only run for sync insert/update/deletes, but I don't think SQLite supports scoping triggers that way, so any other updates that happen in the background at the same time as the sync is running would end up triggering them. Also I am a little concerned that such triggers might make the mentioned […]
Yes this would not be a good thing to do in general. You should not make any writes to the database before the sync engine is setup.
That is an example you provided yourself in the original description of this PR, so I assumed it was relevant to you. Also, is it really the case that if you release the app twice just a few days apart you are going to force people who download the first app to upgrade to the newest one? I can only imagine that is going to be a very frustrating experience to force all users to be on the absolute newest version of the app.
Yep, that is a good solution, and exactly what we suggest. And you can distinguish writes to the database from your app versus writes made by the sync engine using […]
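As an aside, the idea of scoping a trigger to local writes only can be illustrated in plain SQLite. The sketch below uses Python's `sqlite3` module purely for demonstration; the `_sync_state` flag table, schema, and names are hypothetical and are not SQLiteData's actual mechanism:

```python
import sqlite3

# Hypothetical sketch: scope a trigger to "app writes" only by checking a
# flag table in the trigger's WHEN clause. A sync engine would set the flag
# before applying remote changes and clear it afterwards.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE record(id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE change_log(record_id INTEGER);
    CREATE TABLE _sync_state(is_syncing INTEGER NOT NULL);
    INSERT INTO _sync_state VALUES (0);

    -- Fires only for local writes: suppressed while is_syncing = 1.
    CREATE TRIGGER log_local_updates AFTER UPDATE ON record
    WHEN (SELECT is_syncing FROM _sync_state) = 0
    BEGIN
        INSERT INTO change_log VALUES (NEW.id);
    END;
""")

db.execute("INSERT INTO record VALUES (1, 'a')")
db.execute("UPDATE record SET value = 'b' WHERE id = 1")   # local write: logged

db.execute("UPDATE _sync_state SET is_syncing = 1")        # "sync engine" write
db.execute("UPDATE record SET value = 'c' WHERE id = 1")   # not logged
db.execute("UPDATE _sync_state SET is_syncing = 0")

print(db.execute("SELECT COUNT(*) FROM change_log").fetchone()[0])  # → 1
```

The caveat raised above still applies: nothing stops an unrelated background write from racing with the flag, which is why a library-level distinction is preferable to a hand-rolled one.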
So we can't do data migrations (or fixture population) within a traditional DB migration flow? We would need to create some other follow-up hook that runs as a side effect of having just run a schema migration. E.g. if schema migration 0001 just ran, then after the sync engine starts, run schema migration 0001's related data migration. Or would there be a way we could just track which records have been updated/inserted/deleted during the data migration and then tap the sync engine on the shoulder when it starts up (in effect calling the same methods in the sync engine that the db triggers would have called had the changes happened while it was running)?

The reason I mention multiple versions is that when the data migration runs (whenever that is) we can't be certain all changes have been synced up from other devices, let alone synced down to the user's device that has the new version. So the user on the old device might well get a prompt telling them to upgrade (this only happens if there is a schema change, of course, so not every update), but while that prompt is on screen that user's device might well still have pending changes it has not yet streamed up to iCloud. So even though the schema has effectively changed, updates (and inserts) with the old schema will still be in flight on the wire.

I am a little concerned that, as the app evolves, if we depend on all migrations using DB triggers we are going to end up with a very messy schema that is very difficult to reason about.
Any changes made to your database before the […]

This is just the complication of having each device be in control of the schema of the database, and so you have to be careful with these things. You can't just approach the problems exactly as you would in a more traditional database setup prior to synchronization.
You can always keep track of the rows you inserted into the DB during the data migration and then, after the sync engine is set up, do the following:

```swift
let idsCreatedDuringMigration = […]
try await database.write { db in
  try Record
    .find(idsCreatedDuringMigration)
    .update { $0.id = $0.id }
    .execute(db)
}
```

That will cause the sync engine to see the records change (even though nothing really changed), and will enqueue changes into […]
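The reason this "touch" update works can be demonstrated in plain SQLite: UPDATE triggers fire for every row the statement touches, even when the new values equal the old ones. The sketch below uses Python's `sqlite3` for illustration; the schema and trigger are hypothetical stand-ins, not SQLiteData's internals:

```python
import sqlite3

# Illustrative schema: a change-tracking trigger like the ones a sync engine
# might install. Rows inserted before the trigger existed were never tracked.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE record(id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE pending_changes(record_id INTEGER);
    CREATE TRIGGER track AFTER UPDATE ON record
    BEGIN
        INSERT INTO pending_changes VALUES (NEW.id);
    END;
""")

# Rows created "before the sync engine was running": the INSERTs fired no trigger.
db.executemany("INSERT INTO record VALUES (?, ?)", [(1, "a"), (2, "b")])

ids_created_during_migration = [1, 2]
db.executemany(
    "UPDATE record SET id = id WHERE id = ?",   # no-op "touch" update
    [(i,) for i in ids_created_during_migration],
)

print(db.execute("SELECT COUNT(*) FROM pending_changes").fetchone()[0])  # → 2
```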
What do you mean "all migrations to use DB trigger"? You should be designing your schema carefully enough so that these kinds of problems are rare. This should not be happening all of the time. And again, this is just the complication of having each device be in control of the schema. The same is true of SwiftData, and honestly the problem is much more pernicious over there.

And to go back to the original purpose of this PR, you still need to explain exactly how this change will be useful. As far as I can tell, it does not help when older devices receive data from newer schemas. Are you sure this actually solves anything? Can you show a concrete example of how it would be used?
Correct, however I don't think that is something one can deal with anyway, at least not if you want to let that older version then make subsequent edits. The issue is that the older version does not know about the new schema, nor does it know about the constraints that might exist upon it in logic, let alone db constraints.

## What I have done in the past

In projects in the past where I have dealt with the ability for database schemas to diverge (due to child db shards being on-prem at remote off-grid locations that later need to sync in changes when they come online), the most robust solution that avoids conflicts has been to allow only one-directional data flow. A new version of the DB can accept changes from an old version (and apply needed data migrations before writing them into the db), but a new version of the schema cannot be accepted by an older version, as this leads to the described loops where local validation logic on the older version is unable to maintain the logical constraints that the newer version requires.

## In the app we are working on right now

In our situation we would impose this by syncing schema version information, and if a client is on an older schema version than the most recent version seen on any record, we would place the client into a read-only state with a prompt informing the user to update the app. This avoids the possibility of the older version making changes that break things, and keeps the schema clean. Our users will be very upset if data is lost or corrupted, and it is not data they can easily recreate, as it would require them to recall exactly what they did and when.

The above PR might well not be the best way to provide a data migration prior to insert.
Maybe what would be better is to have the method that creates the query instead create a dictionary of values (along with mutation data, schema version, etc.) and then pass that (if configured) to a delegate or other method that can mutate that data before the insert is evaluated.

Or create a proxy table that has all the fields (as optional) of all schema versions (so you can insert records), and then at the end of the transaction copy these to the real tables, with the option to replace that copy operation with a custom operation. (This is what we did in past projects to allow us to stream in changes from out-of-date remote db clients that were async and thus not up to date with the schema: in effect we wrote a custom insert-into-real-tables-from-proxy-table data migration.) What was very nice about this is that it was easy to test: we could just load different versions of data into the proxy table and test the SQL that did the select/upsert/delete into the real tables, and let our streaming from remote still do bulk inserts, so we could handle very large blocks of data in one go rapidly.

However, this lib currently (at least) just does upsert/delete on a per-record basis, so maybe it is simpler to go with the option where a handler method in Swift is called that mutates the values to match. (Of course this is more limited from a data migration perspective, as it is per-row, whereas a SQL-level operation can do things like merge rows when inserting into the real tables.)
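A minimal sketch of the proxy-table idea, using Python's `sqlite3` for illustration. The table and column names echo the `PostalAddress`/`AddressType` example from the proposal but are hypothetical; the point is that rows in either schema shape can land in the staging table, and a custom copy step migrates them into the real schema:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE address_type(id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE postal_address(
        id INTEGER PRIMARY KEY,
        address_type_id INTEGER NOT NULL REFERENCES address_type(id)
    );
    -- Proxy table: the union of all schema versions, everything optional.
    CREATE TABLE postal_address_proxy(
        id INTEGER,
        address_type TEXT,        -- old schema: enum value as text
        address_type_id INTEGER   -- new schema: foreign key
    );
""")

# Stream in one old-shape row and one new-shape row.
db.execute("INSERT INTO address_type VALUES (1, 'home')")
db.execute("INSERT INTO postal_address_proxy VALUES (10, 'work', NULL)")  # old shape
db.execute("INSERT INTO postal_address_proxy VALUES (11, NULL, 1)")       # new shape

# Custom copy step: create any missing AddressType rows, then resolve FKs.
db.execute("""
    INSERT OR IGNORE INTO address_type(name)
    SELECT address_type FROM postal_address_proxy WHERE address_type IS NOT NULL
""")
db.execute("""
    INSERT INTO postal_address(id, address_type_id)
    SELECT p.id,
           COALESCE(p.address_type_id,
                    (SELECT t.id FROM address_type t WHERE t.name = p.address_type))
    FROM postal_address_proxy p
""")

rows = db.execute(
    "SELECT id, address_type_id FROM postal_address ORDER BY id"
).fetchall()
print(rows)  # → [(10, 2), (11, 1)]
```

As noted, the copy step is a single set-based SQL statement, which is what made bulk ingestion and testing against mixed-version fixtures convenient.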
While it is nice to assume we can keep our db schema backwards compatible, the reality is this leads to issues.

For example, we had a `PostalAddress` table that has an enum field for `addressType`, and later wanted to let users extend this. The logical step would be to create a new table `AddressType`, run a db migration that creates a row in that table for each enum case, and update `PostalAddress` to have a column that points to this `AddressType` table, with a data migration to set the corresponding FK value into this field on `PostalAddress`.

To ensure our local schema is clean, we do not want to keep an old `addressType` enum field alongside a new optional `addressTypeID` and then ensure every reference in the app that looks up one also checks the other... so we removed the old `addressType` field after doing the data migration that created an `AddressType` row for each referenced address type and set the corresponding `addressTypeID` value.

But now imagine a user that had been using the app on the old version, removed the app from their device and then installed the app again (or, more commonly, got a new phone and thus got the new version of the app but without any local data). As they open the app, the data migration runs against the local (empty) database. As they keep the app open, the user's real data starts to sync in from iCloud; however, this data has the old `addressType` enum values attached to it rather than the `UUID` of the new `AddressType` table rows. The user's `PostalAddress` rows will not sync, as the insert request will be missing the `addressTypeID` that needs to be non-optional.

I propose adding a
`handleUpsertFromServerRecord` method to the delegate that is called if there is a `DatabaseError` when trying to `upsert` a local value. Within this method one can then write any custom logic (such as, in this case, creating an `AddressType` if needed and including that `addressTypeID` in the insert request).

The other use case where this can be important is if the user has 2 devices and they have only updated on one device but make db record changes on the other: the update event that comes over the wire may end up not including a value for the new column and thus setting it to null. This is an issue even if the new column is optional, as the row itself might already have a value set but the update that comes through ends up nuking it. Being able to run whatever data migration logic we would apply locally to these insert requests (if a DB constraint is triggered) would be a great way to avoid such issues.
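To make the proposed behavior concrete, here is a hedged sketch in Python's `sqlite3` rather than SQLiteData's actual API. `upsert_from_server` stands in for the sync engine's upsert path, and `repair` plays the role of the proposed delegate method, which is only consulted when the insert fails with a constraint error; all names are hypothetical:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE address_type(id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE postal_address(
        id INTEGER PRIMARY KEY,
        address_type_id INTEGER NOT NULL REFERENCES address_type(id)
    );
""")

def repair(values):
    # Old-schema record: it has the enum string but no FK. Create the
    # AddressType row if needed and fill in the missing FK value.
    db.execute("INSERT OR IGNORE INTO address_type(name) VALUES (?)",
               (values["addressType"],))
    (type_id,) = db.execute("SELECT id FROM address_type WHERE name = ?",
                            (values["addressType"],)).fetchone()
    return {"id": values["id"], "addressTypeID": type_id}

def upsert_from_server(values):
    try:
        db.execute("INSERT INTO postal_address VALUES (?, ?)",
                   (values["id"], values.get("addressTypeID")))
    except sqlite3.IntegrityError:
        # Only on a constraint failure is the "delegate" asked to migrate
        # the values, then the insert is retried.
        fixed = repair(values)
        db.execute("INSERT INTO postal_address VALUES (?, ?)",
                   (fixed["id"], fixed["addressTypeID"]))

# Record synced down from a device still on the old schema:
upsert_from_server({"id": 1, "addressType": "home"})
print(db.execute("SELECT * FROM postal_address").fetchall())  # → [(1, 1)]
```

Note that the happy path pays no extra cost here: the repair logic runs only when the constraint actually fires, which is the scoping argument made under "Alternatives considered" below.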
Alternatives considered:
Have a method that is called before `insert`/`update`/`delete` for each DB, letting the delegate modify the `CKRecord` (or even suppress an update or delete). This is tempting, but it would increase the cost of syncing for all users all the time, whereas the implemented solution only kicks in when a db constraint is triggered and is thus much more tightly scoped.

TODO:
- `CKRecord`