Performance problems with a large number of files

I have a MacBookPro12,1, 3.1 GHz i7, 16 GB RAM, running macOS 10.12.3.

I am in the process of migrating away from Dropbox. After being a customer of theirs for nearly 10 years, the past year has been a nightmare of performance problems. The cause seems to be that I have a large number of files to sync (around 700K), and in some cases there are symbolic links inside the file structure. I also rely heavily on symlinks pointing outside of my Dropbox folder, since the files I sync live in disparate locations. I chose Amazon Drive because I already use it for streaming backups (I use Arq) and it seems to have no issue with the huge number of files that Arq creates. At $70/year for what is currently 3 TB, that’s a bargain. I am looking at odrive because the Amazon Drive client is basically unusable, even now in March of 2017. odrive also offers a lot of other tantalizing features that I would like to leverage. Right now I am hoping to go with either odrive or GoodSync, but first I am working with odrive, and I am already having problems on my first sync.

The problem is that I am seeing similar CPU/memory problems using odrive to sync my Dropbox folder up to Amazon Drive. Things looked like they were working swimmingly for about 24-31 hours, and I watched it go through all of my subdirectories without incident. Then, about three hours ago, things suddenly stopped progressing at 44% of the directory which contains the bulk of my files, and my machine has been brought to a near crawl. There are currently 248 files in “Waiting” and the number just keeps creeping up. I’ve watched memory consumption climb from around 1 GB to around 2 GB in just a few hours. Looking at my router, outbound traffic, which used to be a nice flat stream of uploaded data with virtually no download traffic, is now a spiky mess with about as much download traffic as upload. I can also see in Activity Monitor that the odrive process is receiving a tremendous amount of network traffic. If I leave it like this, I am going to hit my ISP cap very quickly.

What the heck is going on?

I don’t know if support will even read this, but I will create a diagnostic report now, and then I am going to have to restart because I cannot keep running like this without blowing through my cap!

Hi @DarfNader,
Large-scale bulk operations can definitely spike CPU, memory, and network activity. Having 3/4 of a million files to sync will take its toll on things.

Taking a look at the diagnostic, I see that you have your Dropbox folder set up as a “sync to odrive” folder. I just want to make sure: is the Dropbox application completely closed, including the Dropbox Finder extension? It could cause issues if odrive and Dropbox are working on the same folders at the same time.

The “smoking gun” I see is that there were a lot of exceptions coming from Amazon Drive when the diagnostic was sent. The majority are:
“System unavailable. Please back off for a while and try again later”
“Failed to establish a new connection”
“Cannot read from request”

These exceptions seem to indicate some issues on Amazon Drive’s side of things. I think that Amazon may have been having problems on their backend for a while and things started backing up on the odrive side because of it.

In any case, based on what you’ve described, there are going to be challenges with your scale of data and somewhat demanding use case. We are currently working on major refinements that should improve performance and bulk processing substantially.

A few other points:

Are you still seeing this behavior after a restart? Do you see lots of items filling up the “waiting” queue?

Dropbox is completely closed; all clients are stopped. I am just trying to use odrive as the single client. The issue with Dropbox is that there is a little-known limit on the number of files an API call will return, which is not well documented, so a lot of sync tools get tripped up by it. Dropbox’s native syncing tool doesn’t use the public API, so it doesn’t have the same problem, whereas a third-party app has to go through the API (and generate an authorization code every time). I ended up using a recursive-delete Python script that used the API to slowly remove all of the extraneous files, which were far more than I needed; it was roughly along the lines of the sketch below. Also, I have moved out the symlinks that I could find, but it would be a lot easier if odrive would just skip them automatically for cloud services that don’t support them.
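For reference, here is a minimal sketch of the kind of recursive-delete script I mean, assuming the official Dropbox Python SDK; the access token and target folder below are placeholders, not my real values:

```python
# Rough sketch: slowly delete everything under a Dropbox folder via the API.
# ACCESS_TOKEN and TARGET_FOLDER are placeholders.
import time
import dropbox
from dropbox.files import FileMetadata

ACCESS_TOKEN = "YOUR_DROPBOX_ACCESS_TOKEN"
TARGET_FOLDER = "/path/to/extraneous/files"

dbx = dropbox.Dropbox(ACCESS_TOKEN)

# recursive=True walks all subfolders; results come back in pages.
result = dbx.files_list_folder(TARGET_FOLDER, recursive=True)
while True:
    for entry in result.entries:
        if isinstance(entry, FileMetadata):
            dbx.files_delete_v2(entry.path_lower)
            time.sleep(0.2)  # go slowly to avoid tripping rate limits
    if not result.has_more:
        break
    result = dbx.files_list_folder_continue(result.cursor)
```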

Amazon Drive is a pretty poor service all in all. It will often throw errors, so any client has to be pretty resilient: keep retrying when there are failures, but without overtaxing the API, lest it be throttled so hard it can barely send at a trickle. I use a backup tool which sends over very large numbers of files, but it only writes and verifies the files; it never reads (I have never had to do a restore; I should test that!). The sketch below shows the kind of retry behavior I mean.
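To be concrete about what I mean by “resilient,” something like a generic exponential-backoff retry loop; `upload_chunk` here is just a hypothetical stand-in for whatever call a client makes, not anything from odrive’s actual code:

```python
# Illustrative only: retry a flaky upload call with exponential backoff.
import random
import time

def upload_with_backoff(upload_chunk, max_retries=8):
    """Call upload_chunk(), retrying with growing delays on failure."""
    delay = 1.0
    for attempt in range(1, max_retries + 1):
        try:
            return upload_chunk()
        except (ConnectionError, TimeoutError) as exc:
            # "System unavailable" style errors: back off and try again.
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay + random.uniform(0, 1))  # jitter spreads retries out
            delay = min(delay * 2, 60)                # cap the backoff at 60s
    raise RuntimeError("gave up after repeated failures")
```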

I have cut down the number of files considerably, down to less than 100K. Also, I am splitting the share into several shares. Is there a way to get odrive to sync just one share at a time? I am actually looking to use a lot of vendors so I can take advantage of their “free” offerings as much as possible. It’s a Frankenstein setup, but it is a great way to save some dough! I see that there are file throttles, but is there a way to set a max size for an entire cloud share?

I am sad that you guys are behind the curve with development of your B2 support and can only do read-only. Their API is actually well developed now and their CLI tool is very powerful. The segmenting of large files gets complicated, but it is totally doable.

Hi @DarfNader,
Thanks for the response.

Currently odrive processes each linked storage account independently. There isn’t a way to “pause” a particular link, which I think is what you are asking. We are working on better concurrency and transfer-rate management, at a global level, which will help to control bandwidth saturation and resource overhead.

Can you elaborate on what you mean here?

For B2, they recently implemented an alternate model that we can use. We plan to update that integration once we get over the current hurdles.

What I mean is: can the total aggregate size of a synced directory be capped, so that once the directory reaches that size, odrive stops uploading content to the cloud replica and throws an error message to that effect, letting you know you need to clear some stuff out before you can keep syncing? The point is that for some cloud services I need to keep my usage under a certain amount to stay a “free” customer. I would prefer a hard limit imposed by odrive that prevents me from syncing more content to the cloud directory than the amount I set, rather than having to rely on keeping an eye on it myself. Some services have cap alarms that alert you when you exceed a certain size, which would be OK, but I would prefer a hard stop. Also, major providers like Dropbox do not offer cap alarms, so you would need to home-spin your own monitoring, along the lines of the sketch below.
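For example, the home-spun check could be nothing more than a script run on a schedule that measures the local synced folder and warns when it passes a limit; the path and limit here are placeholders:

```python
# Rough sketch of a home-spun cap check: warn when a local synced folder
# exceeds a size limit. SYNC_DIR and LIMIT_BYTES are placeholders.
import os

SYNC_DIR = "/Users/me/odrive/SomeShare"
LIMIT_BYTES = 15 * 1024 ** 3  # e.g. a 15 GB "free tier" cap

def folder_size(path):
    """Total size of all regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total

used = folder_size(SYNC_DIR)
if used > LIMIT_BYTES:
    print(f"WARNING: {SYNC_DIR} is {used / 1024**3:.1f} GB, over the cap")
else:
    print(f"OK: {used / 1024**3:.1f} GB used of {LIMIT_BYTES / 1024**3:.0f} GB")
```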

Hi @DarfNader,
Thanks for the details. There isn’t a way to do this, and we don’t have any plans to do so, but you can make a request for it in our feature requests category.