Unsyncing folders with many files is extremely slow

I’m trying to unsync a lot of folders that contain thousands or tens of thousands of files each. odrive processes the files one by one, and it does so extremely slowly, taking on average more than a second per file.

So a folder that has 10000 files can take almost 3 hours to unsync. All in all, I’m looking at a couple of weeks, in which the machine has to stay on and online 24/7, just to unsync.
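
For reference, the estimate above is simple arithmetic. The short Python sketch below reproduces it, assuming a strictly serial process and a flat rate of one second per file; the function name is made up for illustration.

```python
def unsync_estimate_hours(file_count: int, seconds_per_file: float) -> float:
    """Estimated duration, in hours, of a serial unsync that handles one file at a time."""
    return file_count * seconds_per_file / 3600


# One folder of 10,000 files at the assumed rate of 1 second per file.
print(f"{unsync_estimate_hours(10_000, 1.0):.1f} hours")  # prints "2.8 hours"
```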

Why does odrive have to process the files one by one like this? Why can’t it simply unsync the whole folder in one fell swoop?

I understand if there are issues such as individual files that can’t be unsynced due to permission issues or broken symlinks or whatnot, but those can all be determined in a quick scan. But if the folder is in sync and none of those issues is encountered, why not just quickly delete all the files and replace the folder with a .cloudf file?
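
As an illustration of the kind of quick scan meant here, the following is a minimal Python sketch. It is not how odrive works internally; the function name `scan_for_blockers` and the two checks it performs (broken symlinks and missing read/write permission) are assumptions chosen to match the examples above.

```python
import os
from pathlib import Path


def scan_for_blockers(folder):
    """Walk `folder` and return (path, reason) pairs for items that could block an unsync.

    Nothing is modified; this only inspects the tree.
    """
    blockers = []
    for root, dirs, files in os.walk(folder):
        for name in dirs + files:
            path = Path(root) / name
            if path.is_symlink() and not path.exists():
                # The link itself exists but its target does not.
                blockers.append((str(path), "broken symlink"))
            elif not os.access(path, os.R_OK | os.W_OK):
                blockers.append((str(path), "insufficient permissions"))
    return blockers
```

An empty result would mean none of these particular problems were found; it says nothing about whether the files have actually been uploaded, which only the sync client can know.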

I can understand it taking a few minutes to unsync a folder with 10000 files, but not 3 hours.

Can this be fixed? Is there any way to speed this up?

Hi @noam,
odrive makes an effort to ensure that you will not be unsyncing any items that may not have been synced yet (data loss scenario). It definitely shouldn’t be taking this long under normal circumstances, though. Is this on an external or networked drive? If so, that could introduce a lot of latency.

If you are sure everything is already synced, so that there is no need for odrive to double-check, you could delete the folder yourself and wait for it to show up in the odrive trash. You can then “restore” that folder from the trash, which will bring it back as a placeholder file.

I understand that it has to verify that everything is synced first but, as you said, it shouldn’t take this long.

This is on my main local drive, a 1TB SSD drive.

The only indication I have that everything is synced is that odrive is no longer syncing. That is, the odrive status icon is white rather than pink. However, that apparently doesn’t always guarantee success because I’ve encountered other issues, such as broken symlinks and other weird cases, where it’s white but causes errors when unsyncing.

So I am not ready to delete the folder. That’s a pretty scary proposition, with no way to know if anything was lost, or any recourse if it has. Even restoring the folder from odrive’s cache and then syncing won’t do the trick because it will only “restore” the files odrive knows about, and none of the files that would have produced errors.

I’m guessing that part of the problem is that odrive is unsyncing each file, one at a time, one after the other. If I’m correct, then this is different than what I’d expect and might explain the poor performance. What I’d expect odrive to do is verify all the files in a folder without unsyncing any of them, and then unsync just the folder, deleting all its files in the process.

Basically, if I have a folder with a couple files:

– FOLDER
---- FILE1
---- FILE2

I expect it to follow this process:

  1. Verify that FILE1 is synced, has permissions, and isn’t a weird file
  2. Verify that FILE2 is synced, has permissions, and isn’t a weird file
  3. Verify that FOLDER is synced and has permissions
  4. Delete FOLDER (with FILE1 and FILE2) in one operation (if possible)
  5. Create FOLDER.cloudf
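
The steps above can be sketched in Python as follows. This is only an illustration of the proposed order of operations, not odrive's implementation: `is_synced` is a hypothetical callback standing in for whatever check the sync client would really perform, and the placeholder is written as an empty file purely for demonstration.

```python
import shutil
from pathlib import Path


def unsync_folder(folder, is_synced):
    """Verify every item first, then remove the folder and leave a placeholder.

    `is_synced` is a hypothetical callback (path -> bool); it is an assumption
    of this sketch, not a real odrive API.
    """
    folder = Path(folder)

    # Steps 1-3: verify the folder and everything under it without changing anything.
    not_synced = [p for p in [folder, *folder.rglob("*")] if not is_synced(p)]
    if not_synced:
        raise RuntimeError(f"{len(not_synced)} item(s) not synced; nothing was deleted")

    # Step 4: remove the folder and its contents with a single call.
    shutil.rmtree(folder)

    # Step 5: create FOLDER.cloudf next to where the folder used to be.
    placeholder = folder.with_name(folder.name + ".cloudf")
    placeholder.touch()
    return placeholder
```

Note that `shutil.rmtree` still removes entries one by one at the filesystem level; the point of the sketch is that verification happens up front and per-file bookkeeping is not interleaved with deletion.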

Hi @noam,
Can you run a quick experiment for me and test how long it takes you to unsync a single folder with 100 files in it? This should only take 1-2 seconds, so I want to see if your experience is different.

I am setting up a test right now to unsync 10,000 items distributed through 100 folders on MacOS to benchmark the performance.

Hi @noam,
Just an update on my benchmarking: On my MacOS system it took odrive 1 minute and 32 seconds to unsync a folder containing 10,000 files distributed inside 100 folders.

Is it possible that you have an application running in the background that performs real-time scanning of the filesystem? This could be something like antivirus or other endpoint protection that is performing some work in-between odrive and the filesystem.

Hey @Tony,

I do have Malwarebytes running with real-time protection enabled.

So I ran an experiment as requested. I disabled the real-time protection and then tried to unsync a folder with 4291 files and no nested folders. Most of the files were about 150 bytes.

The folder unsynced successfully but it took about 38 minutes. That’s only about 2 files per second, or roughly half a second per file!

Hi @noam,
I have tried my benchmark on a few MacOS systems now running 10.13.6 and 10.14.6. I am seeing times between 30 seconds and 100 seconds for unsyncing 10,000 items, so I haven’t been able to reproduce what you are seeing, yet.

Can you tell me, approximately, how many local objects (files and folders) odrive is currently monitoring on your system?

Can you also run a diagnostic from the odrive menu and then message me the resulting “current_odrive_status.txt” file that is created in the root of the odrive folder, so I can take a closer look at things?

@Tony,

I have about 2 million local objects, all synced.

I’m trying to unsync about 1.6 million of them, all of which are under a single folder.

I thought I’d be able to unsync that single folder and the whole tree under it would quickly disappear.

I’ve sent the diagnostic report.

Hi @noam,
Wow! Okay, 2 million is a lot to actively track. I can see why you are trying to unsync.

I have a feeling the performance impact is due to the local tracking database size. I’ll try testing with a larger overall object count to see how performance is impacted when the objects get into the millions, and also see if there is any room for performance tweaking.

Thanks @Tony.

Looking forward to hearing the results of your experiment.

Of course I don’t know how things are implemented, but it’s pretty obvious that odrive processes each file completely before moving on to the next file. When processing a folder of any size, perhaps processing its contained objects in bulk could provide an opportunity for optimization?

Hi @noam,
Can you give this build a shot and see if the performance has improved for your situation? https://www.odrive.com/s/e3a5f6ec-eb41-4b1f-887b-4ad7ae0dab1d-5d810174

Hey @Tony,

That works much better! Instead of weeks, it took about a day.

It could actually have taken just a few hours probably, or even less, but odrive didn’t sync all the files in the folder, so I had to intervene in the unsync process.

It seems odrive doesn’t sync files and folders with these patterns:

  • files and folders with a “.tmp” extension
  • folders named “##”
  • folders whose names start with a tilde “~”
  • some “.DS_Store” files (not sure why some would be synced and not others)
  • a file named “Icon?” (with the question mark)
  • a hidden folder named “.localized”
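
For anyone who wants to check a folder for names like these before unsyncing, here is a minimal Python sketch. The patterns are only the ones observed in the list above, written as shell-style wildcards; this is an assumption for illustration and not odrive's actual ignore list or matching logic.

```python
from fnmatch import fnmatchcase

# Patterns transcribed from the observations above (NOT odrive's real ignore list).
# In fnmatch syntax "?" matches any single character, so "Icon?" also matches the
# macOS "Icon\r" file that tools usually display as "Icon?".
OBSERVED_PATTERNS = ["*.tmp", "##", "~*", ".DS_Store", "Icon?", ".localized"]


def looks_ignored(name):
    """Return True if a file or folder name matches one of the observed patterns."""
    return any(fnmatchcase(name, pattern) for pattern in OBSERVED_PATTERNS)


for name in ["cache.tmp", "~scratch", "report.txt"]:
    print(name, looks_ignored(name))
```

Combined with `os.walk`, a check like this could list the suspect items in a tree ahead of time.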

Is there a real reason these aren’t synced, or does odrive simply have problems with their names?

Hi @noam,
Great! Glad to hear it was a big improvement.

The items you listed are part of the set of names or patterns that we ignore. https://docs.odrive.com/docs/sync-changes#section--ignore-list-

The only exception in that list is ##, so I’ll need to check into that. The reason the others are on the list is because they are predominantly temporary, cache, or system files/folders that are not intended to be synced or can actually start to cause problems if synced.

Edit:
The latest versions of our desktop client now have this performance improvement.

@Tony, thanks for the reply. I understand what you’re saying, but I think that’s a problematic approach.

The fact that a file is a temp or cache file doesn’t necessarily mean that it shouldn’t be synced. For example, if I have a backup that contains temp or cache files, I may need those files when I restore the backup. Or if I’m working with someone on a shared folder, the temp and cache files may be necessary.

I can also see the other side of the coin, in which syncing those files can cause problems, including in the very same scenarios I just mentioned.

So odrive should not be opinionated. It should work the way the underlying storage mechanism works. So if I’ve got odrive syncing a Google Drive folder, I expect it to sync exactly what Google Drive would sync.

Another related issue that I encountered, while unsyncing 1.6 million files, that was caused by this behavior is that every time odrive encounters a file that it refuses to sync, an error appears when unsyncing the containing folder, and the unsync process is aborted. This makes unsyncing a very annoying process that requires a lot of manual intervention to complete.

I can understand aborting the unsync process if a file or folder that should be synced is not synced. It means that there’s potential for data loss. But I can’t understand aborting if that same file or folder is intentionally ignored by odrive. Even if I don’t agree that it should be ignored, at least by odrive’s internal logic, that should be safe. I wouldn’t expect those files and folders to be deleted, just skipped over while everything else is unsynced.

It also made me realize that odrive isn’t syncing everything, as I thought it was, which raises concerns.

I know odrive is supposed to show badges, but they don’t always show up in Finder. I also use Path Finder and they never show up there (it lacks odrive integration altogether). But even when the badges do appear, there’s no indication on the containing folders that there might be files buried deep down the folder hierarchy that might not be synced.

Hi @noam,

Over the years we have found that excluding the patterns we have chosen to exclude vastly increases the reliability and efficiency of sync. We haven’t received much push-back from users that do not want these patterns excluded, but have had lots of feedback from users who want even more patterns excluded (we added the feature to add additional exclusions for that).

We haven’t found an instance yet where system, temporary, or cache files/folders that we are excluding are required in a restore scenario. These types of items should always be “disposable”, transient, or regenerated by the application/operating system, when absent. If you know of an instance where this isn’t the case, though, please let me know.

I will pass on the request for an option to remove all non-mandatory exclusions to the product team. Enabling that type of feature would most likely be an unsupported configuration, but I understand why some users may want this.

Ignored items shouldn’t actually trigger a warning when unsyncing, so I’ll look into this. It may have to do with how these files are processed (or not processed) within the internal tracking.

We can’t unsync around the ignored items because that would throw off synchronization. You would end up with the folder structure present and then only ignored files inside. The sync engine would basically see empty folders and need to reconcile that with the remote folders that actually have content in them. In this case odrive could compensate by laying down placeholders for all of the remote content, but that will likely cause confusion and create a result that is unexpected for the user.

With very large structures it can take time to show the roll-up folder badging, since odrive has to determine the state of the content inside the folder to know if the synced badge can be shown on the parent hierarchy. There is definitely a performance penalty and increased overhead as the scope of the objects odrive has to actively track increases. With your 2 million+ objects this was likely pretty significant, but should be much more responsive now that you have unsynced lots of content.

The Finder integration uses Apple’s Finder Extension framework. As far as I know, their framework is only available to Finder and they don’t allow it inside third-party applications like Path Finder, unfortunately.

You should never see a synced overlay on a folder if there are non-ignored files inside that have not yet uploaded to the remote storage. odrive will show a synced overlay if there is partial content that has been downloaded and cached, however. The synced badging basically represents the status of “dirty” files, which are files that have not been uploaded to the remote side.
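
The roll-up rule described above can be sketched in Python as follows. This is a simplified model for illustration, not odrive's code: the `Node` class and its `dirty` and `ignored` flags are assumptions, and treating everything under an ignored folder as ignored is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A file or folder in a simplified, hypothetical tracking tree."""
    name: str
    dirty: bool = False      # has local content not yet uploaded to remote storage
    ignored: bool = False    # matches an ignore pattern, so it never counts as dirty
    children: list = field(default_factory=list)


def shows_synced_badge(node):
    """A node is shown as synced only if no non-ignored descendant is dirty."""
    if node.ignored:
        return True
    if node.dirty:
        return False
    return all(shows_synced_badge(child) for child in node.children)
```

Because the answer for a parent depends on every descendant, the cost of computing the badge grows with the size of the tree under it, which is consistent with the delay on very large structures.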