Backup of Shared Cloud Drive (S3) - Any Experience?

We have a couple of shared drives, each one an independent S3 bucket. One is for all employees and the other is restricted to administrative staff. Both use the odrive native file and folder naming, so everything is just an object with a hashed name on S3.

The issue comes into play when someone accidentally deletes a file, or worse a folder, and it is no longer in their trash. I have versioning turned on for the bucket, but because everything is just some hashed object name, this seems to be a waste of money and storage: there is no way to know which object you need to find a prior version of.

Has anyone else tackled this yet and come up with a workable solution? I am looking for any good feedback I can get. A trash can you could restore from within the odrive web console would have been great, but all I could find was documentation saying to use the undelete function of the underlying storage. I guess that is fine if you are using Dropbox, Google Drive, or OneDrive, but with the cost of those options and the amount we store, there is a reason we are using odrive with S3 rather than a native solution from one of those providers. Also, as mentioned above, the way odrive stores the objects makes S3 versioning effectively useless.

Thank you in advance for any help you might provide.

Hi @John.C.Reid,
Thanks for reaching out. This is a good question. I need to think about how we can potentially handle this.

For now, I messed around a little bit with the AWS CLI and PowerShell. This script will list the deleted items in a bucket and print the relevant information for each item so that they can be recovered. (I am assuming you have Windows access, which may be a bad assumption, so just let me know if this isn’t going to work for you):

$bucket=$args[0]  # bucket name
$prefix=$args[1]  # prefix to list (e.g. root, or an odrive oid for a folder)
$deleteMarkers=$args[2] # optional: pass any value (e.g. 1) to list deleted items (DeleteMarkers) instead of versions

# Use aws cli to list the object versions
$jsonOutput = $(aws s3api list-object-versions --bucket $bucket --prefix $prefix | ConvertFrom-Json)

# Look at versions, by default, with the option to look at DeleteMarkers
if($deleteMarkers) {
	$outputType = $jsonOutput.DeleteMarkers
}
else {
	$outputType = $jsonOutput.Versions
}

# Iterate through each item for the chosen type (Versions or DeleteMarkers)
foreach ($item in $outputType) {
	$originalPath = $item.Key  # key format: <prefix>/<Base64-encoded JSON object name>
	$originalName = $originalPath.Split('/',[System.StringSplitOptions]::RemoveEmptyEntries)[1]
	$decodedObject = [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($originalName))
	$decodedObjectFromJson = $decodedObject | ConvertFrom-Json
	$decodedName = $decodedObjectFromJson.name
	$decodedId = $decodedObjectFromJson.oid
	$decodedFolder = $decodedObjectFromJson.isFolder
	$decodedSize = $decodedObjectFromJson.size
	$versionId = $item.VersionId
	$isLatest = $item.IsLatest
	echo "Decoded Object Name: $decodedName"
	echo "Decoded Object ID (odrive oid): $decodedId"
	echo "Decoded Size: $decodedSize"
	echo "Is a Folder: $decodedFolder"
	echo "Decoded Object Attributes: $decodedObject"
	echo "Original Path: $originalPath"
	echo "Version ID: $versionId"
	echo "Latest Version: $isLatest"
	echo ""
}


Example usage from PowerShell:
.\s3enhanced_list.ps1 companys3testodriveoptimized root

Example Output:

Decoded Object Name: SQA
Decoded Object ID (odrive oid): f33d52a7-b38c-4ded-a77b-1ded156d5592
Decoded Size:
Is a Folder: True
Decoded Object Attributes: {"modTime": null, "oid": "f33d52a7-b38c-4ded-a77b-1ded156d5592", "isFolder": true, "name": "SQA", "size": null}
Original Path: root/eyJtb2RUaW1lIjogbnVsbCwgIm9pZCI6ICJmMzNkNTJhNy1iMzhjLTRkZWQtYTc3Yi0xZGVkMTU2ZDU1OTIiLCAiaXNGb2xkZXIiOiB0cnVlLCAibmFtZSI6ICJTUUEiLCAic2l6ZSI6IG51bGx9
Version ID: B2VsEK5saUNNHKcOAJj7hIE86RozToyq
Latest Version: True

For a little bit more information on what odrive is doing:

The Enhanced odrive FS link option will store objects in the following way:

  • A root “folder” (prefix) is used to represent objects that will be in the root of the bucket hierarchy
  • File object names are a Base64-encoded (ASCII) JSON string (a quick way to decode one by hand is shown right after this list). For example:
    • Object name: eyJtb2RUaW1lIjogMTUyMjg1MjY4MywgIm9pZCI6ICI2MzBlOWI2My1hYTQ0LTQwZTAtOWQwMi1kZGY5MzZjZTU5MDgiLCAiaXNGb2xkZXIiOiBmYWxzZSwgIm5hbWUiOiAiTXlGaWxlLnppcCIsICJzaXplIjogNzEwMDU5Nzd9
    • Object string:
      {"modTime": 1522852683, "oid": "630e9b63-aa44-40e0-9d02-ddf936ce5908", "isFolder": false, "name": "MyFile.zip", "size": 71005977}
  • Folders work the same way as the root “folder” (prefix): each folder’s unique id is used as a prefix for storing its child objects. All folders exist as prefixes in the root of the bucket.
  • There is also a pointer object that will “point” to a folder from within another folder.
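
For reference, here is a quick way to decode one of these object names by hand in PowerShell. It is just the same Base64/UTF-8 decoding the script above uses, applied to the example object name shown in the list:

# Decode an Enhanced FS object name (Base64 -> JSON -> object)
$encodedName = "eyJtb2RUaW1lIjogMTUyMjg1MjY4MywgIm9pZCI6ICI2MzBlOWI2My1hYTQ0LTQwZTAtOWQwMi1kZGY5MzZjZTU5MDgiLCAiaXNGb2xkZXIiOiBmYWxzZSwgIm5hbWUiOiAiTXlGaWxlLnppcCIsICJzaXplIjogNzEwMDU5Nzd9"
[System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($encodedName)) | ConvertFrom-Json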

If you use the Simple ‘/’ Delimited link option, the data will be saved in a format that is more accessible to other clients (like the Amazon S3 web console browser), which will make restoration a lot easier, but then you don’t get the improved efficiency of Enhanced FS.
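
For comparison, a file stored with the Simple ‘/’ Delimited option just uses a human-readable key that mirrors its folder path, so the example file above would sit under a key along the lines of (illustrative folder names):

Folder1/Folder2/MyFile.zip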

I do have access to Windows and PowerShell, but I typically only use the AWS CLI in my Linux shells, because that is where my scripts are. I don’t do much with PowerShell. That said, the AWS CLI is really easy to install, so I will experiment and report back.

After installing AWS CLI v2, having to manually add its path to my PowerShell profile for some unknown reason (even after a restart), and configuring the AWS CLI with my keys, I was finally able to run the script you provided. It did output a fairly long list of items, but unfortunately not the ones I was looking for.

Thank you for your efforts. This is an interesting problem indeed, and likely not an easy one to solve without a facility like a global trash can for the share. However, I am sure that would somehow break the paradigm of multiple individuals each with their own odrive account.

The stop-gap fix I currently have in mind is to have a dedicated machine with enough storage to hold everything, keep 100% of the content synced to it, and run backups from that machine. The backups can run from Volume Shadow Copy snapshots, which will of course shift the versioning of files and folders from S3 to the file server. This of course raises the question: if you have to do this, why not set up a dedicated file server to begin with? That dedicated file server is exactly what we were trying to avoid when we devised our current solution.

Hi @John.C.Reid,
Thanks for the response!

For the script, you will likely need to “browse” into a few more paths to find what you are looking for. When I wrote the script last night I neglected to add the ability to browse versions in addition to deleted items, which you will need in order to “browse” properly. I have updated the script above to allow this.

Also, see below for a search script that may work better for you.

For example, let’s say you are looking to recover a file that was in the path /Folder1/Folder2/File1:

  • We will start in the root
    • .\s3enhanced_list.ps1 mybucket root
    • This will list the versions in the root prefix, instead of the deleted items. In my example it should list Folder1 and I can get the odrive oid of that object (which is a pointer to another prefix). I will then use that odrive oid in the next command. In this example we will say that the odrive oid of that folder was:
      • Decoded Object Name: Folder1
      • Decoded Object ID (odrive oid): f33d52a7-b38c-4ded-a77b-1aaa6785111
  • The next command will list that folder by using the odrive oid as the prefix
    • .\s3enhanced_list.ps1 mybucket f33d52a7-b38c-4ded-a77b-1aaa6785111
    • This will list the versions in Folder1. In my example it should list Folder2:
      • Decoded Object Name: Folder2
      • Decoded Object ID (odrive oid): 4820a281-1205-4844-a966-edf4ccd2b538
  • The final command will list the deleted objects in Folder2. I have added a 1 to the command to indicate that I want to turn on listing deleted items (DeleteMarkers):
    • .\s3enhanced_list.ps1 mybucket 4820a281-1205-4844-a966-edf4ccd2b538 1
    • This should list all of the deleted items in the /Folder1/Folder2/ folder and in there should be File1:
      • Decoded Object Name: File1
      • Decoded Object ID (odrive oid): c6b7db97-14be-44ac-9e5a-6912d677e046
      • Original Path: 4820a281-1205-4844-a966-edf4ccd2b538/eyJtb2RUaW1lIjogMTUyMjg1MjY4MywgIm9pZCI6ICI2MzBlOWI2My1hYTQ0LTQwZTAtOWQwMi1kZGY5MzZjZTU5MDgiLCAiaXNGb2xkZXIiOiBmYWxzZSwgIm5hbWUiOiAiTXlGaWxlLnppcCIsICJzaXplIjogNzEwMDU5Nzd9
      • Version ID: .FLQEZscLIcfxSq.jsFJ.szUkmng2Yw6
    • The Original Path provides the path (S3 key) within the S3 bucket, so you can browse to that path through the S3 console and restore that object.
    • The Version ID is the S3 version id of the item (for a deleted item, this is the delete marker’s version id), which can alternatively be used to restore the object with the AWS CLI (see the example command just below this list).
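
As a concrete sketch of that restore step: deleting the delete marker makes the previous version of the object current again. Using placeholders for the values reported by the listing:

aws s3api delete-object --bucket mybucket --key "<Original Path from the listing>" --version-id "<Version ID from the listing>"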

I also created this other script that will search through all of the deleted items in a bucket for a specific string. This will be much easier to use, but could be slow/intensive depending on the number of objects returned.

$bucket=$args[0]  # bucket name
$search=$args[1]  # search string 

# Use aws cli to list the object versions
$jsonOutput = $(aws s3api list-object-versions --bucket $bucket | ConvertFrom-Json)

# Iterate through each deleted item and list those matching the search string
foreach ($item in $jsonOutput.DeleteMarkers) {
	$originalPath = $item.Key
	$originalName = $originalPath.Split('/',[System.StringSplitOptions]::RemoveEmptyEntries)[1]
	$decodedObject = [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($originalName))
	$decodedObjectFromJson = $decodedObject | ConvertFrom-Json
	$decodedName = $decodedObjectFromJson.name
	$decodedId = $decodedObjectFromJson.oid
	$decodedFolder = $decodedObjectFromJson.isFolder
	$decodedSize = $decodedObjectFromJson.size
	$versionId = $item.VersionId
	$isLatest = $item.IsLatest
	if($decodedName -match $search) {
		echo "Decoded Object Name: $decodedName"
		echo "Decoded Object ID (odrive oid): $decodedId"
		echo "Decoded Size: $decodedSize"
		echo "Is a Folder: $decodedFolder"
		echo "Decoded Object Attributes: $decodedObject"
		echo "Original Path: $originalPath"
		echo "Version ID: $versionId"
		echo "Latest Version: $isLatest"
		echo ""
	}
}

Example of usage:

  • .\s3enhanced_search_deleted.ps1 companys3testodriveoptimized 'api call'

  • Output:

    Decoded Object Name: ZZZ - API CALL Test.txt
    Decoded Object ID (odrive oid): b52830c6-4b18-405b-9a23-b5a3684d293d
    Decoded Size: 5902
    Is a Folder: False
    Decoded Object Attributes: {"modTime": 1481227267, "oid": "b52830c6-4b18-405b-9a23-b5a3684d293d", "isFolder": false, "name": "ZZZ - API CALL Test.txt", "size": 5902}
    Original Path: f33d52a7-b38c-4ded-a77b-1ded156d5592/eyJtb2RUaW1lIjogMTQ4MTIyNzI2NywgIm9pZCI6ICJiNTI4MzBjNi00YjE4LTQwNWItOWEyMy1iNWEzNjg0ZDI5M2QiLCAiaXNGb2xkZXIiOiBmYWxzZSwgIm5hbWUiOiAiWlpaIC0gQVBJIENBTEwgVGVzdC50eHQiLCAic2l6ZSI6IDU5MDJ9
    Version ID: B2VsEK5saUNNHKcOAJj7hIE86RozToyq
    Latest Version: True

Thank you for helping and putting so much time into this. I should note that, fortunately, the items deleted this time were not very important, and I am not really concerned with recovering them. However, this incident did bring to light that I really didn’t have a way to recover them, hence this thread in the forums.

I didn’t want to create a feature request, because at this point the issue is too far removed from an easy solution to request anything specific enough to act on.

Really, I was curious how others handle this and whether anyone has a solution in place to deal with it. If not, I am willing to assist in any way I can toward a solution, which might evolve into a feature that would make dealing with this easy. The issue seems to arise from using the odrive file system rather than the native S3 one, so any solution must be tied into the odrive file system, and that is where I started.

I see two potential ways forward toward a permanent solution. One is an odrive-maintained trash can accessible via the web interface, à la the previously mentioned cloud storage providers that offer one. The other is odrive having some kind of insight into the S3 versioning, with the ability to use it to restore previous versions of files, or even deleted ones. I don’t know how feasible either of those options is.

Again, thank you for your time and efforts.

Hi @John.C.Reid,
No problem! I figure this information can help others that may run into this.

I have a feeling most S3 links use the Simple ‘/’ Delimited link option to maintain compatibility with other interfaces. There are trade-offs, of course, so the pros and cons have to be weighed against the use cases.

One thing that we could possibly do is allow a bucket to be linked in “deletes only” mode, where we list the latest version of a deleted item and make them available for recovery.

I think the scripts above should prove useful as a way forward for now. I am hoping that they provide enough utility that you do not need to set up a separate, dedicated box for backup. I will also look at creating bash equivalents so that Windows is not needed. I can probably also create an additional script for the recovery operation, to provide a complete flow.
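
In the meantime, here is a rough sketch of what that recovery script could look like, as a starting point (the name s3enhanced_restore.ps1 is just a placeholder). It takes the bucket, the Original Path (S3 key), and the delete marker’s Version ID reported by the listing/search scripts, and removes the delete marker so the prior version of the object becomes current again:

$bucket=$args[0]    # bucket name
$key=$args[1]       # the "Original Path" (S3 key) reported by the listing/search scripts
$versionId=$args[2] # the "Version ID" of the delete marker reported for the deleted item

# Removing the delete marker makes the most recent remaining version of the object current again
aws s3api delete-object --bucket $bucket --key $key --version-id $versionId

# Confirm the object is visible again
aws s3api head-object --bucket $bucket --key $key

Example usage from PowerShell:
.\s3enhanced_restore.ps1 mybucket "<Original Path>" "<Version ID>"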

Thank you, Tony. When I made the decision to go with the native odrive file system, I did so primarily due to the encouragement in the documentation to do so, and this post:

Perhaps a warning in the documentation, and in the help message when you are adding the bucket, could prove helpful. You mention pros and cons, but I don’t remember seeing this listed as a con anywhere. I had even turned versioning on in the bucket, because this scenario didn’t occur to me when I was making the setup decisions. I almost used the ‘/’ delimiter, but having a good deal of experience with Gladinet a few years back, setting up a white-labeled storage service for medical and legal clients, things like performance and folder and cloud-file moves were on my radar.

Hi @John.C.Reid,
Good idea. I will make sure we add some information about underlying bucket versioning considerations with regard to Enhanced FS.
