Best way to sync over 400,000 files/directories

toupeiro · April 25, 2017, 3:51am

Here’s the gist of it. One personal project I’ve had (more of a batch processing exercise) is to convert the entirety of modarchive.org’s multiple tracker formats to mp3 with on-the-fly demuxing. Well, I was successful, and now I have a few terabytes of mp3 files I want to put into the cloud. I tried the early betas of one-way backup, which succeeded in creating the directory structure and a few files, but then it seemed to get stuck and never uploaded another byte of data. I’ve read other threads saying that hundreds of thousands of files is too much for the sync agent to handle, and I am fine with the cloud being the primary system of record for this data set. What, however, is the best way to get all of this data uploaded using odrive?

Tony · April 25, 2017, 1:41pm

Hi @toupeiro,
So it sounds like you want to do a one-time bulk import of all of this stuff, is that correct? I will assume it is, for now, and continue.

As you may have read in other threads, sync is a somewhat heavy process compared to traditional backup, and certainly to a one-time import. There are things we can do to try to alleviate the sync overhead and simulate more of an import flow, however.

The most basic flow would be:

Copy/move the data in, in batches
Wait for sync to complete on the batch
Unsync the completed batch
Repeat

This constrains the scope of work odrive needs to do to the size of the current batch. You could do this “by hand” or script it up using something like this example with the CLI:
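A minimal sketch of that batch loop in Python, not odrive's own example. The function name `import_in_batches` and the two callables `wait_for_sync` and `unsync` are hypothetical: they stand in for whatever you use to check that a folder has finished syncing and to unsync it (for instance, a wrapper around the CLI), and nothing here assumes a specific odrive command.

```python
import shutil
from pathlib import Path
from typing import Callable, Iterable, List


def import_in_batches(
    batches: Iterable[Path],
    odrive_target: Path,
    wait_for_sync: Callable[[Path], None],  # hypothetical: blocks until the folder is fully synced
    unsync: Callable[[Path], None],         # hypothetical: unsyncs the folder to free local space
) -> List[Path]:
    """Move one batch at a time into the odrive folder, wait, then unsync it."""
    done: List[Path] = []
    for batch in batches:
        dest = odrive_target / batch.name
        shutil.move(str(batch), str(dest))  # 1. move the batch in
        wait_for_sync(dest)                 # 2. wait for sync to complete on the batch
        unsync(dest)                        # 3. unsync the completed batch
        done.append(dest)                   # 4. repeat with the next batch
    return done
```

Because only one batch is inside the odrive folder at any time, the amount of local data the sync engine has to track stays bounded by the batch size.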

How is the data structured? Is it naturally segmented in a way that would allow for easy batching?

toupeiro · April 28, 2017, 12:41am

This is an interesting approach.
tree.txt (114.4 KB)

This is the current directory structure, showing only three layers deep. Below each layer shown here is another layer of directories, each with a name and a single file inside it.

Tony · April 28, 2017, 9:39pm

Hi @toupeiro,
It looks like it’s segmented in a way that is conducive to this type of approach. Is scripting something a possibility for you?

toupeiro · April 29, 2017, 4:02pm

I think this will work, but I might have to modify it slightly, or call it in a for-loop at a particular depth so that $targetdir starts a bit further down the tree. Otherwise a batch might be too big to land in the root of my locally cached odrive home before it starts to sync.
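As a rough illustration of picking batches "at a particular depth", here is a small Python sketch. The helper name `dirs_at_depth` is made up for this example; it only walks the local tree and does not touch odrive at all.

```python
from pathlib import Path
from typing import List


def dirs_at_depth(root: Path, depth: int) -> List[Path]:
    """Return the directories exactly `depth` levels below `root`, sorted.

    depth=0 yields [root]; depth=1 yields its immediate subdirectories, etc.
    """
    level = [root]
    for _ in range(depth):
        # Descend one level: keep only the subdirectories of the current level.
        level = [child for parent in level
                 for child in parent.iterdir() if child.is_dir()]
    return sorted(level)
```

Each returned directory could then be treated as one batch, so a deeper `depth` gives smaller batches at the cost of more iterations.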