Method to SPEED up syncs/unsync with your cloud (almost!)

cli
amazon-drive
bash
osx

#1

I have several cloud accounts but really like Odrive, and lately I have been experimenting with CLI scripting.

I’m finding that syncing files that are stored on my Amazon cloud account has been taking forever lately. My internet connection is like a firehose, with speed tests from my Florida home to Atlanta or DC at 125 Mbps down, 32 Mbps up on average… but my Odrive files come down from Amazon Cloud like a dripping faucet, somewhere in the 10-200 kiloBITS per second range. It fluctuates dramatically, sitting at ZERO more often than moving…

Assuming the bottleneck between me and Amazon may (hopefully) be a “per TCP connection” throttle and not a “by IP” throttle, I decided to write a little CLI script and run multiple instances of it simultaneously, in hopes of scaling my sync speed at least a few times over.

This is a bash script, using the Python CLI commands, that I wrote to run on macOS Sierra
(NOTE: Today I read something about possibly checking the “syncstate” of each file before moving on to the next, so I may need to add that code yet):

#!/bin/bash

#
#   ODRIVE INSTANCE - RECURSIVE SYNC SCRIPT
#

# ------------------------------------------------------------------------------------
#         SET THESE VALUES UNIQUE TO EACH RUN SCRIPT

# Set the instance for this program                                   
INSTANCE=1

# Number of times to restart the recursive find from the root folder
FIND_LOOP_COUNT=100

# move into desired starting root directory
cd "<YOUR DESIRED STARTING ROOT DIRECTORY>"                    

# ------------------------------------------------------------------------------------

printf "CD TO STARTING DIRECTORY, EXIT CODE 0 IS GOOD: $?\n\n"

# Global variables
TOTAL_FILE_ATTEMPTS_SO_FAR=0
AGENT_LOCATION="$HOME/.odrive-agent$INSTANCE"
DIVIDER="-----------------------------------------------------------------------------"

printf "AGENT LOCATION IS: $AGENT_LOCATION\n\n"

# kill previous process if still running
python "$AGENT_LOCATION/bin/odrive.py" shutdown
printf "SHUTDOWN PREVIOUS AGENT IF RUNNING, EXIT CODE 0 IS GOOD: $?\n\n"
sleep 3s

# install instance 
od="$AGENT_LOCATION/bin" && curl -L "https://dl.odrive.com/odrive-py" --create-dirs -o "$od/odrive.py" && curl -L "https://dl.odrive.com/odriveagent-osx" | tar -xvzf- -C "$od/" && curl -L "https://dl.odrive.com/odrivecli-osx" | tar -xvzf- -C "$od/"
printf "INSTALL INSTANCE $INSTANCE, EXIT CODE 0 IS GOOD: $?\n\n"
sleep 3s

# authenticate if necessary
#python "$AGENT_LOCATION/bin/odrive.py" authenticate [YOUR AUTH CODE HERE]
#                              GET YOUR AUTH KEY HERE: https://www.odrive.com/account/authcodes 
#
#echo "AUTHENTICATE THIS INSTANCE, EXIT CODE 0 IS GOOD: $?"

# start this agent in the background
nohup "$AGENT_LOCATION/bin/odriveagent.app/Contents/MacOS/odriveagent">/dev/null&
printf "START THE AGENT IN THE BACKGROUND, EXIT CODE 0 IS GOOD: $?\n\n"
sleep 10s

# THIS IS THE SINGLE-LINE EQUIVALENT COMMAND FOR THE FOR LOOP BELOW (I had to
# break it apart to add debug lines):
# for i in {1..100};  do find . -iname "*.cloud*" | while read f; do python "$HOME/.odrive-agent1/bin/odrive.py" sync "$f"; python "$AGENT_LOCATION/bin/odrive.py" emptytrash; done; done

# RECURSE THIS FOLDER x TIMES - SINCE SYNCED FOLDERS NEED TO BE REVISITED 
# TO THEN SYNC THEIR CONTENTS
# (brace expansion like {1..100} cannot take a variable, so use a C-style loop)
for ((i = 1; i <= FIND_LOOP_COUNT; i++))
	do 
		echo $DIVIDER
		printf "   NOW ON FIND LOOP $i OF $FIND_LOOP_COUNT\n\n"
		echo $DIVIDER
		
		# Feed find's output in through process substitution rather than a pipe, so the
		# loop runs in this shell and TOTAL_FILE_ATTEMPTS_SO_FAR keeps its value
		while IFS= read -r f
			do
				python "$AGENT_LOCATION/bin/odrive.py" status 
				printf "\n\n**\n\nTHIS IS INSTANCE $INSTANCE\n\n$TOTAL_FILE_ATTEMPTS_SO_FAR file sync attempts so far\n\n**\n\n"

				printf "SEARCHING FOR NEXT CLOUD FILE, CURRENT DIRECTORY IS:\n"
				printf "%s\n\n" "$PWD"

				# (find's exit code is not available inside the loop, so it is not printed)
				echo $DIVIDER
				printf "CLOUDFILE FOUND, SYNCING: %s\n\n" "$f"
				python "$AGENT_LOCATION/bin/odrive.py" sync "$f"
				printf "ODRIVE SYNC/UNSYNC COMMAND EXECUTED, EXIT CODE 0 IS GOOD: $?\n\n"

				printf "EMPTYING THE TRASH\n\n"
				python "$AGENT_LOCATION/bin/odrive.py" emptytrash
				printf "ODRIVE EMPTYTRASH COMMAND EXECUTED, EXIT CODE 0 IS GOOD: $?\n\n"
				sleep 2s

				((TOTAL_FILE_ATTEMPTS_SO_FAR++))
			done < <(find . -iname "*.cloud*")
	done

echo $?

printf "COMPLETED $FIND_LOOP_COUNT FINDS, RE-RUN FOR MORE IF NEEDED\n"
		
exit 0

OK… Yeah, yeah, I know: lots of debug output, but I need it until it’s working. I’m not a coder either, but this style “works for me.” lol. I write extra statements so I can open it a year from now and “read it like a book” to remember…

It seems to run every time, and sometimes even snakes its way through as many as 80 files, but eventually it freezes at the sync command… I’ve had as many as four instances running at once, with unpredictable results (any of the four can freeze suddenly at any time for no apparent reason).

Anyone have any ideas on:

  1. Why it freezes at all after running fine for a long time?

  2. Why it will often sit at some random percent complete on a single file (say 32%, for example, but it’s never the same) for 10 minutes, then suddenly take off again…

  3. How important it is for me to check the “syncstate” of the file before proceeding in the script? Since it’s a bash script it already seems to wait until the python command completes before moving to the next step anyway… right?? :slight_smile:

  4. Why separate instances seem to affect each other…
    – I disabled the main/“GUI” app and agent entirely, only running CLI agents. But when you install and use any CLI agent, does it reference some sort of “global” or “folder-level” resources that another instance might mess up, causing a crash/freeze in this one?? I don’t know enough about how odrive.py works to understand why separately installed agents would affect each other.
    – I do see what is probably a trash file growing huge for a while, then it’s deleted. Is sharing the trash file causing the freezes? If so, maybe I can just work on different root folders for each instance/agent running?
    – If you look above, you can see that each instance agent is installed in its own $HOME/.odrive-agent<$INSTANCE-number> agent folder…
    – For example, when I start Instance 1 above, wait a few minutes, then install/create and start an Instance 2 in a separate Terminal window, Instance 1 SOMETIMES (not always) goes absolutely berserk with errors…
    – Yesterday I had four instances running fine in parallel and unsyncing files in the same directory for over an hour (like 20 GB of files!). Then all four instances, running in separate terminal windows, froze together, and CTRL-C couldn’t even stop them…

  5. How to get this to work? Given this “per-thread throttling to Amazon Cloud” issue, I’d like to run as many as 10 of these at a time…
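
On question 3: bash does run foreground commands one after another, so the loop only moves on once `odrive.py` returns. Whether `odrive.py` returns before the agent has finished the transfer is not something this thread establishes. For the freezes in question 1, one option is a watchdog around each sync call. The following is a minimal sketch, not an odrive feature: `run_with_timeout` is a hypothetical helper name, and the exit code 124 is borrowed from the GNU `timeout` convention.

```shell
#!/bin/bash
# Hypothetical helper (not part of odrive): run a command, but give up
# after a number of seconds so one stuck sync cannot stall the whole loop.
# Usage: run_with_timeout SECONDS COMMAND [ARGS...]
# Returns the command's exit code, or 124 if it had to be killed.
run_with_timeout() {
    local limit=$1
    shift
    "$@" &                      # start the command in the background
    local pid=$!
    local waited=0
    # Poll once a second while the process is still alive
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            kill "$pid" 2>/dev/null
            wait "$pid" 2>/dev/null
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    wait "$pid"                 # collect the real exit code
}

# Example (assumes the AGENT_LOCATION variable from the script above):
# run_with_timeout 600 python "$AGENT_LOCATION/bin/odrive.py" sync "$f" \
#     || printf "SYNC FAILED OR TIMED OUT FOR: %s\n\n" "$f"
```

Killing the CLI call only frees the script; it does not necessarily unstick the agent behind it, so a file that times out would simply be retried on the next find loop.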

You might ask “why” would someone want to sync everything?? Well, simple. My files are an absolute mess in this encrypted odrive folder, and the only fast and clean way I know of to remove thousands of duplicate photos and reorganize all folders is to download everything, clean it up locally, then give it a month to all unsync back “up”… I have 6 TB of data tucked away in this particular Odrive folder, so yeah, this will take a while. :slight_smile: Hence the need for some parallelism!

Any help appreciated with getting this to work. I’d like to see 10 threads go for days without fail!

Marc


#2

Hi @spacecommguy,
Thanks for the extensive writeup and contributing to the forum. I love the enthusiasm :slight_smile:.

You are correct that Amazon Drive seems to be susceptible to slow single-instance downloads/uploads and benefits significantly from concurrent transfers, if your bandwidth can support it.

I haven’t actually tested running multiple agent instances at the same time, but I would guess that there could be problems doing so. All of the instances will share the same persistence and config, which may be causing some race conditions and locking out of the instances while one tries to write to the persistence layer. The same applies to the downloads. Downloads are done to a temporary file and then renamed once the file has finished. I am guessing that you may end up with parallel downloads of the same files, as each instance is blissfully ignorant that another instance has already started downloading that file. It will also increase the overhead quite a bit.
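
If several scripts are run side by side anyway, the duplicate-download part of this can be avoided by giving each one a disjoint share of the placeholder files. A rough sketch, assuming workers are numbered from 0 (the script above numbers instances from 1) and using a hypothetical helper named `is_mine`. It does nothing about the shared persistence and config.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a path belongs to worker number $2
# out of $3 workers (workers are numbered from 0).
# cksum is a POSIX tool; the first field of its output is a CRC of the input,
# so the same path always lands on the same worker.
is_mine() {
    local path=$1 worker=$2 total=$3
    local crc
    crc=$(printf '%s' "$path" | cksum | cut -d ' ' -f 1)
    [ $((crc % total)) -eq "$worker" ]
}

# Example: worker 0 of 4 only touches its own share of the placeholders
# (assumes the AGENT_LOCATION variable from the first post):
# find . -iname "*.cloud*" | while IFS= read -r f; do
#     is_mine "$f" 0 4 && python "$AGENT_LOCATION/bin/odrive.py" sync "$f"
# done
```

Because the assignment depends only on the path, no coordination between the workers is needed, and a placeholder that appears after a folder is expanded is picked up by exactly one worker on its next pass.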

Have you taken a look at this post here?

These are a few iterations of some one-liners, including running parallel downloads. You could parse one of the examples out into proper script form, with better feedback and logging.

I actually just used a variant of this one, yesterday, to download 2TB of data from Google Drive in one go:

exec 6>&1;num_procs=10;output="go"; while [ "$output" ]; do output=$(find "$HOME/odrive-agent-mount/Dropbox/" -name "*.cloud*" -print0 | xargs -0 -n 1 -P $num_procs "$HOME/.odrive-agent/bin/odrive.py" sync | tee /dev/fd/6); done

I was averaging about 200MB/sec download with 10 concurrent transfers against a single agent instance.
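
For anyone who wants the one-liner in script form, here is one possible shape. It is a sketch rather than a drop-in replacement: `sync_all` is a hypothetical function name, it loops until no `*.cloud*` placeholders are left (the one-liner loops until sync prints nothing), and the cap of 100 passes is an assumption borrowed from `FIND_LOOP_COUNT` in the first post.

```shell
#!/bin/bash
# Usage: sync_all ROOT NUM_PROCS SYNC_COMMAND [ARGS...]
# Repeatedly finds placeholder files under ROOT and hands each one to
# SYNC_COMMAND, running up to NUM_PROCS of them at once via xargs -P.
sync_all() {
    local root=$1 num_procs=$2
    shift 2
    local max_passes=100   # assumption: same cap as FIND_LOOP_COUNT in the first post
    local pass=0 remaining
    while [ "$pass" -lt "$max_passes" ]; do
        # Count placeholder files/folders still waiting to be synced
        remaining=$(find "$root" -name "*.cloud*" | wc -l | tr -d ' ')
        if [ "$remaining" -eq 0 ]; then
            break
        fi
        pass=$((pass + 1))
        printf 'PASS %s: %s placeholder(s) left\n' "$pass" "$remaining"
        # One placeholder per process; xargs exits non-zero if any call failed
        find "$root" -name "*.cloud*" -print0 |
            xargs -0 -n 1 -P "$num_procs" "$@" ||
            printf 'PASS %s: at least one sync reported an error\n' "$pass"
    done
    printf 'DONE after %s pass(es)\n' "$pass"
}

# Example, using the paths from the one-liner above:
# sync_all "$HOME/odrive-agent-mount/Dropbox/" 10 \
#     python "$HOME/.odrive-agent/bin/odrive.py" sync
```

Multiple passes are needed for the same reason as in the one-liner: syncing a folder placeholder exposes new placeholders inside it, which only the next `find` will see.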


#3

Wow that script looks awesome~

I attempted to run it tho, and I’m getting a response of:

xargs: /Users/root1/.odrive-agent/bin/odrive.py: Permission denied
[1]+ Killed: 9 nohup "$HOME/.odrive-agent1/bin/odriveagent.app/Contents/MacOS/odriveagent" > /dev/null

have to go research this… Once I get it working I’ll open it up!


#4

You may need to add “python” to that command:

exec 6>&1;num_procs=10;output="go"; while [ "$output" ]; do output=$(find "$HOME/odrive-agent-mount/Dropbox/" -name "*.cloud*" -print0 | xargs -0 -n 1 -P $num_procs python "$HOME/.odrive-agent/bin/odrive.py" sync | tee /dev/fd/6); done
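
The “Permission denied” comes from xargs trying to execute `odrive.py` directly, which needs the execute bit set on the file. Prefixing `python`, as above, sidesteps that. The alternative is to set the bit for the file’s owner only. A small sketch, where `ensure_executable` is a hypothetical helper name, and running the script directly also assumes `odrive.py` starts with a suitable shebang line:

```shell
#!/bin/bash
# Hypothetical helper: add execute permission for the file's owner,
# but only if the file is not already executable.
ensure_executable() {
    local script=$1
    [ -x "$script" ] || chmod u+x "$script"
}

# Example (path taken from the error message in the previous post):
# ensure_executable "$HOME/.odrive-agent/bin/odrive.py"
```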


#5

Maybe!

In the meantime I went and did a chmod 777 on $HOME/.odrive-agent1/bin/odrive.py, and re-ran it… and… It’s running!

BUT same problem as before… Ramped up like crazy then throttled back - within a minute… ugh… CPU slowed to 20%… and there’s a stream of statements in the terminal window alternating 61 and 65 bytes remaining…

But CLOSER! Exciting… ttfn


#6

Hmm, that is odd.

Just to test, I am running the same command against my own Amazon Drive and it is flying along at 15-25MB/sec. It’s gone through about 12GB so far. It will slow down when it hits a bunch of folders to expand, or smaller files, but I don’t see any hitches. Keep in mind that doing this will spike your CPU. My Mac sounds a bit like a jet at the moment.

I am running this against the odrive desktop client, however. I would recommend you try that, as the desktop client is a little more robust and running a slightly newer version of the sync engine than the agent. Additionally, if you run the desktop client you can send over a diagnostic for me to take a look at if things hitch again.

When running this, make sure you only have one instance of the odrive engine running at a time. So don’t run agent and the desktop client at the same time, for example.


#7

Thanks Tony. I did, to no avail. I’m out of time and am doing each file manually now. See related thread here: Need a fix for “No options available”