Duplicate files of different sizes appear in remote storage

Hi, I’m really on a roll lately :sweat_smile:

Problem:
After an odrive sync (upload), the remote storage (Google Drive) contains multiple copies of the same file with varying file sizes, while the local storage contains only one file.

Local:
(screenshot: a single copy of the file)

Remote:
(screenshot: multiple copies of the same file with different sizes)

The problem perhaps has to do with the fact that these video files are rendered directly into this folder, so their file size keeps increasing while the render is in progress.

Strangely though:

  • This has been our modus operandi for ages, and we have never experienced this problem before, as odrive normally detects a change in the file, puts the upload on hold, and checks the file again later.
  • I can’t see any logical relationship between the copy number (x), the date modified, and the file size of the different versions of the same file. Edit: actually, the most recently modified version is the largest, but it does not have the highest increment number.

Hey @skander,
Does one of the main logs show the uploads of these files? If so I would like to take a look, if that is okay (you can send me a direct message).

odrive does look for changes in the file during an upload and will abort if it detects one. However, if the upload is fast enough to finish transferring the existing portion of the file before the change is detected, it can end up completing anyway. Even then, though, I would expect the file to be updated rather than uploaded as a new file… unless Google is not including the file in the listing it gives back to us (eventual consistency again), so we don’t think it’s an update but a new upload, because the file “doesn’t exist” on the remote.
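To make that race concrete, here is a minimal sketch of a mid-upload change check, assuming a simple snapshot of size and modification time before and after the transfer. This is illustrative only, not odrive's actual code, and upload_fn is a hypothetical stand-in for the storage client's upload call:

```python
import os

def snapshot(path):
    """Capture the size and modification time of a file."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime)

def upload_if_stable(path, upload_fn):
    """Upload a file, but treat the result as invalid if the file changed mid-transfer."""
    before = snapshot(path)
    upload_fn(path)                 # the transfer itself; may race with an ongoing render
    after = snapshot(path)
    if before != after:
        # The file grew (or was otherwise modified) while uploading,
        # so the uploaded copy is incomplete: retry later.
        raise RuntimeError(f"{path} changed during upload; retry later")
```

If the transfer finishes before the render touches the file again, the before/after snapshots match and the incomplete copy is accepted, which is the window described above.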

Several things you are seeing definitely point to latency in Google listing remote changes, where we get back stale metadata and then act on it. Maybe they have been experiencing load issues in your region recently? I may be able to see more from the main logs. Unfortunately the diagnostic doesn’t cover this upload sequence (it has already rolled off).

I mentioned that we could disable immediate local change reporting to try to slow things down a bit and reduce the churn. I think this may be worth a shot based on what you are seeing recently and the evidence of what looks like frequent stale information being reported back by Google.

Here is the documentation on the advanced option: https://docs.odrive.com/docs/advanced-client-options#disablefsevents

You will simply change disableFSEvents from false to true.

You will also want to reduce the scan interval from its default, so change localScanIntervalSecs from 3600 to 600 (10 minutes).
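For reference, the end result should be just those two entries changed, something like the fragment below. The exact file name, location, and surrounding options are covered in the linked documentation; this is only an illustrative excerpt:

```
"disableFSEvents": true,
"localScanIntervalSecs": 600,
```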

Hi @Tony,

I think I have the log with all the information (see dm).

I’m up for trying your suggested workaround, but I’m wondering what it will do to our workflow. For instance, there are occasions where we create a folder with images to share with clients for previewing purposes. Normally, after creating the folder locally and putting images in it, it is synced to the remote storage practically instantly, and we can immediately share the remote folder link (Google Drive folder sharing) with the client.

So this would mean waiting up to 10 minutes (for the next scan cycle), plus however long the scan itself takes, before we’d see the changes reflected on the remote storage, right?

I estimate that the total amount of storage we sync with odrive is about 55 TB. Given that size, is there anything else that needs to be considered? Will the scan take relatively long? Will scanning every 10 minutes put a big I/O strain on our local storage?

Hi @skander,

That is correct.

The number of folders that need to be traversed is the primary factor in how intensive the scan is, so the total size of the data is actually not the deciding factor. It may be worth just testing it out and observing the impact to determine the right balance between pick-up speed and scanning overhead.
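If it helps, a quick way to gauge that before committing to the 10-minute interval is to count the folders a scan would have to walk and time the traversal. A rough sketch (the sync root path shown is hypothetical, and odrive's scanner does more work per entry than this, so treat the result as a lower bound):

```python
import os
import time

def estimate_scan_cost(sync_root):
    """Walk the local sync root and report how many folders a periodic scan must traverse."""
    start = time.monotonic()
    folder_count = sum(1 for _ in os.walk(sync_root))   # one entry per directory visited
    elapsed = time.monotonic() - start
    print(f"{folder_count} folders traversed in {elapsed:.1f}s")

# Hypothetical local sync root:
# estimate_scan_cost("/mnt/storage/odrive")
```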

I am also looking at what changes could be made to the file change detection routine. We may be able to tighten that up a bit, too.

Hi @skander,
Here is a build that tweaks a couple of things related to the issues you are seeing:
Officially released now

This build tightens up the file change detection during upload, which should reduce the number of duplicate files you are seeing when a file is being written to. It also adds a slight delay in responding to file system events, which should reduce the new-folder name flip-flopping behavior you are seeing, as long as the new folder name is entered fairly quickly (within ~8 seconds or so).
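For context, the folder-rename part is essentially a debounce: hold off on reacting to a file system event for a short settle window so that a quick rename collapses into a single change. A rough sketch of that idea (not the actual build internals; the callback name in the usage comment is hypothetical):

```python
import threading

class DebouncedHandler:
    """Collapse rapid file system events into a single delayed action."""

    def __init__(self, action, delay_secs=8.0):
        self._action = action        # callback invoked once events settle
        self._delay = delay_secs
        self._timer = None
        self._lock = threading.Lock()

    def on_event(self, path):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()                     # restart the settle window
            self._timer = threading.Timer(self._delay, self._action, args=(path,))
            self._timer.start()

# Example with a hypothetical queue_for_upload callback: an event for "New Folder"
# followed a few seconds later by one for the final name results in a single action
# on the final name once things settle.
```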

These won’t fix the root cause of getting stale data from Google, but they should help, and I feel they are improvements for general use.

I really appreciate your help in tracking some of this stuff down!

Thanks @Tony

Sounds like it’s worth trying this version without the aforementioned workarounds and seeing if it works out. Installing it now!