Spotlight Search Integration

DarfNader · April 29, 2019, 7:34am

Is it possible to populate Spotlight Search with files that have not been downloaded to your system yet? And if you do download them and they get indexed by Spotlight, does their content stay in the Spotlight index after they are unsynced, so you can still find them even when the contents are not on your system?

Tony · April 30, 2019, 3:26am

Hi @DarfNader,
Spotlight can index placeholder names, but content indexing will not be persisted once a synced file has been unsynced. To Spotlight they are two separate files, with the placeholder having a .cloud extension.

DarfNader · April 30, 2019, 5:24pm

Thanks @Tony. I suppose it would be kind of miraculous if odrive could index the contents of documents that have never been downloaded, at least not without some sort of extension.

I do have an idea of how this might work. I don’t know exactly how to write Spotlight handlers for specific file extensions, or how to make Spotlight index a document manually, but one could create a process that runs in the background and works through every placeholder file whose contents are not yet in the Spotlight database, either one at a time, in batches of an optimal size, or in parallel, until every file in your cloud mount is indexed. It could run at a user-definable rate:

Sync the file(s).
Explicitly direct Spotlight to index the file(s).
Direct Spotlight NOT to remove the newly indexed data when the file is replaced by a .cloud placeholder.
Once the file(s) are indexed, unsync them, making sure Spotlight retains the indexed document contents after the unsync. (See below.)

Requirements:

Configure Spotlight to treat a .cloud placeholder as equivalent to the file it stands in for.
Let Spotlight remove the index data when the .cloud file is emptied from the Trash, as it normally would.
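The background pass described above could look roughly like the following Python sketch. It only walks a folder for .cloud placeholders (the file-placeholder extension Tony mentions), batches them, and throttles between batches. The `sync`, `index` and `unsync` hooks are hypothetical stand-ins, not odrive or Spotlight APIs; on macOS an `index` hook might shell out to the `mdimport` command, but nothing here solves the retention step, since Spotlight offers no way to keep content index data for a file once it has been replaced by a placeholder.

```python
import time
from pathlib import Path
from typing import Callable, Iterator, List

# Extension odrive gives file placeholders, per this thread.
PLACEHOLDER_SUFFIX = ".cloud"


def find_placeholders(root: Path) -> List[Path]:
    """Every file placeholder under root, sorted so runs are repeatable."""
    return sorted(p for p in root.rglob("*" + PLACEHOLDER_SUFFIX) if p.is_file())


def synced_path(placeholder: Path) -> Path:
    """Name the real file has once synced: 'report.pdf.cloud' -> 'report.pdf'."""
    return placeholder.with_name(placeholder.name[: -len(PLACEHOLDER_SUFFIX)])


def batches(items: List[Path], size: int) -> Iterator[List[Path]]:
    """Split items into consecutive groups of at most `size`."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def backfill_index(
    root: Path,
    sync: Callable[[Path], Path],      # hypothetical hook: download, return the real file
    index: Callable[[Path], None],     # hypothetical hook: ask Spotlight to index it
    unsync: Callable[[Path], None],    # hypothetical hook: turn it back into a placeholder
    already_indexed: Callable[[Path], bool] = lambda p: False,
    batch_size: int = 10,
    pause_seconds: float = 1.0,        # user-definable rate between batches
) -> int:
    """Sync, index and unsync each not-yet-indexed placeholder; return how many were processed."""
    pending = [p for p in find_placeholders(root) if not already_indexed(p)]
    processed = 0
    for i, batch in enumerate(batches(pending, batch_size)):
        if i > 0:
            time.sleep(pause_seconds)  # throttle between batches, not before the first
        local_files = [sync(p) for p in batch]  # sync the batch
        for f in local_files:
            index(f)                            # explicitly index each file
        for f in local_files:
            # Unsync again. Keeping the index data after this point is the part
            # Spotlight does not offer today, so this sketch cannot guarantee it.
            unsync(f)
        processed += len(batch)
    return processed
```

Passing the hooks in as functions keeps the loop independent of how syncing and indexing are actually triggered, and `batch_size` plus `pause_seconds` cover the "one at a time or in batches, at a user-definable rate" options; parallel processing is left out to keep the sketch short.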

The only problem with this approach is that if a document changed while you only had its placeholder, Spotlight would not be updated. But I suppose odrive has to catch those changes anyway, since updates to synced files need to be pulled down. If odrive only checks for updates on files that are synced, this would be a problem, but otherwise I suppose it could be managed with the same mechanism, right?
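The change-detection concern could, in principle, reuse the same background pass: if the client can see some per-file revision marker for remote files without downloading them, it only needs to re-queue what changed. A minimal Python sketch, assuming such revision tokens exist (they are hypothetical here, not a documented odrive feature):

```python
from typing import Dict, List

# Hypothetical revision tokens: whatever the sync client can report per remote
# file (an etag, a modification time, a content hash) without downloading it.
Revisions = Dict[str, str]


def needs_reindex(indexed: Revisions, remote: Revisions) -> List[str]:
    """Remote files never indexed, or whose revision changed since they were indexed."""
    return sorted(path for path, rev in remote.items() if indexed.get(path) != rev)


def stale_entries(indexed: Revisions, remote: Revisions) -> List[str]:
    """Index entries for files that no longer exist remotely."""
    return sorted(path for path in indexed if path not in remote)
```

For example, with `indexed = {"a.txt": "r1", "b.pdf": "r2", "gone.doc": "r1"}` and `remote = {"a.txt": "r1", "b.pdf": "r3", "new.key": "r1"}`, `needs_reindex` returns `["b.pdf", "new.key"]` (fed back into the sync/index/unsync pass) and `stale_entries` returns `["gone.doc"]`.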

Seems reasonable for a Feature Request, no? (You know, because we both know the engineers have plenty of time for stuff like this!) I realize this is a very “wanty” sort of request, but it would be pretty cool! I use search all the time, and not being able to search documents that aren’t currently on my system is a drag; it’s why Spotlight feels so limited in this age of cloud computing. It really should have built-in extensions that let you add your cloud drives explicitly, so it can do the legwork by indexing them itself or by plugging into each cloud drive’s native search. But this is Apple… they will more likely give you a way to FaceTime while looking like Daffy Duck before they implement functional features, amirite?

Tony · May 1, 2019, 3:23pm

Hi @DarfNader,
I certainly understand the utility of this type of feature, as a heavy macOS user myself. I have converted this thread to a feature request.

I don’t know how much of a chance there is for implementing this, both because of priority and because I’m not sure Apple supports a way to pull this off cleanly.

We are doing some work on a more seamless filesystem virtualization layer that could more readily facilitate this type of OS-level indexing (without the added placeholder file name extensions), so it may be something that is more doable in the future.

Yes, you know them well

jon1 · March 28, 2022, 3:49pm

I’ll put in a strong “second” for this feature, and not just for macOS. Every major OS has a content indexing system that stores its data on the indexed volume. I’ve hit up Google about server-side content indexes for Spotlight and Windows Search, so clients wouldn’t have to create them in the first place, but so far crickets. This is, to me, the single greatest flaw with cached cloud storage systems. If we can’t do a “google-style” search of everything on the volume, then we can’t find our data…so it might as well not even be there.