Honestly I'd love to see SQL (or something like it) become a first-class citizen in operating systems generally -- as a standard way for manipulating files, logs, preference files, etc.
To be clear: not turning things into databases (keep logs as logs), but making everything interpretable by SQL.
It's bizarre to me that in just a few lines I can achieve magic with a database... but I can't trivially do the same magic with my filesystem.
Just like a shell is an integral part of an OS... shouldn't a query language be too? I'd love it if, out-of-the-box, Linux distributions came with a standardized SQL+JSON-over-the-filesystem approach that functioned as a small-scale working database for any tool that wanted to use it as such.
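A rough sketch of what that could look like -- everything here is hypothetical, no such tables or functions exist today:

SELECT username, shell FROM etc_passwd WHERE uid >= 1000;     -- hypothetical passwd view
SELECT value FROM json_prefs('~/.config/app/settings.json')
  WHERE key = 'theme';                                        -- hypothetical JSON-file table function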
It's halfway to what I'm talking about, thanks. Absolutely yes to the idea of being able to query everything!
But for me the ability to write (CREATE, UPDATE, DELETE) would be an integral part of it as well -- whether inserting a property into a JSON file, adding a cron job as column values rather than a text line, or creating files.
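Continuing the hypothetical sketch above (again, none of these tables or functions exist):

UPDATE json_prefs('~/.config/app/settings.json')
  SET value = 'dark' WHERE key = 'theme';                       -- write a property into a JSON file
INSERT INTO crontab (minute, hour, dom, month, dow, command)
  VALUES ('0', '3', '*', '*', '*', '/usr/local/bin/backup.sh'); -- cron entry as columns, not a text line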
See also https://augeas.net/, which presents structured file contents as navigable paths that you update.
Red Hatters maintained it (and hung out in IRC) until cloud container workflows overtook in-place system administration. It pairs nicely with Puppet, and can also be used manually.
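For the curious, a small augtool session (assuming Augeas and its stock lenses are installed; the /files tree mirrors /etc and friends):

$ augtool
augtool> print /files/etc/hosts/1
/files/etc/hosts/1/ipaddr = "127.0.0.1"
/files/etc/hosts/1/canonical = "localhost"
augtool> set /files/etc/ssh/sshd_config/PermitRootLogin no
augtool> save
Saved 1 file(s)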
This has been cool every time it has come back around; it's a shame it isn't more universal. One simple recipe would be to read the file tables directly for speed (like the program 'Everything' does) and put the results in an SQLite db, roughly as sketched below.
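A minimal sketch of that recipe, assuming some fast enumerator (an NTFS MFT scan, say) feeds it; the table and column names here are made up:

CREATE TABLE files (
  path  TEXT PRIMARY KEY,  -- full path from the fast file-table scan
  name  TEXT,              -- basename, for quick name matches
  size  INTEGER,           -- bytes
  mtime INTEGER            -- unix timestamp
);
CREATE INDEX files_by_name ON files(name);
-- ad-hoc queries then become cheap:
SELECT path, size FROM files WHERE name LIKE '%.iso' ORDER BY size DESC LIMIT 10;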
Something that at least gets halfway there is 'Everything': it indexes files extremely quickly, sorts on multiple attributes extremely quickly, and allows regular expressions. It would be better if these capabilities were built into the OS so that tools could be built on top of them.
My thoughts before reading the README.md: SQL is nice and all but seems verbose and awkward compared to just using find.
My thoughts after reading the README.md: Wow, this tool can query on file format-specific metadata such as ID3 tags in mp3 files and even width/height in pictures. This is really neat. There is a lot of possibility here.
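For instance, queries along these lines should work (the README documents image and MP3 columns; exact names can vary between versions):

$ fselect "width, height, path from /home/user/photos where width gt 2000"
$ fselect "artist, title, path from /home/user/music where year gt 2000"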
Of course a built-in would be nice, but I think one could get 99% of the way there just with a specialized scanner (whose name escapes me) that kept its own database, like ctags or the old OS X Quicksilver (and presumably Spotlight after it).
ColdFusion/CFML has had this capability for a while: Most iterables return a "query" data type, and queries can be queried against using an in-memory SQL engine.
HPC sites are likely to use https://github.com/cea-hpc/robinhood/wiki/Documentation. While its user interface isn't SQL, it does store the filesystem metadata in an SQL database (updated from changelogs for Lustre filesystems). You don't want to run -- or, specifically, have users run -- ls, find, or du on significant amounts of a typical large filesystem. (As I understand it, you could do something similar to Lustre with Isilon/OneFS, for instance.)
The metadata store of OrangeFS (PVFS2) is implemented with a database, though a key-value one, and one of the project ideas for it is to search that directly.
CONTAINS_JAPANESE   Used to check if string value contains Japanese symbols
CONTAINS_KANA       Used to check if string value contains kana symbols
CONTAINS_HIRAGANA   Used to check if string value contains hiragana symbols
CONTAINS_KATAKANA   Used to check if string value contains katakana symbols
CONTAINS_KANJI      Used to check if string value contains kanji symbols
The perfect complement for this would be a reboot of the BeOS-style filesystem.
For the unaware, BeFS combines the attributes of a hierarchical filesystem and (some of) a relational database. It could do interesting things like storing a contact as a zero-length file whose attributes held key-value pairs for things like names and email addresses.
There are some open-source reimplementations of it, thanks to the Haiku project; I wonder if they've ever been integrated with the Linux kernel and distros.
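On Haiku today the contacts example looks roughly like this (addattr and query are real Haiku commands; this assumes the attribute is indexed, as the META:* attributes used by People are; the exact query syntax may differ):

$ touch alice                                  # zero-length "contact" file
$ addattr META:name "Alice Smith" alice
$ addattr META:email "alice@example.com" alice
$ query 'META:email == "*@example.com"'
/boot/home/alice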
I have a lot of files in my home directory, especially in the code folder where all my projects live -- you can imagine all the files sitting in Python virtualenvs and node_modules. Even so, the command finished in 45s, which I think is very impressive!
Grep integration would be awesome -- e.g. "find files with name satisfying x and content satisfying y", or (even better) "files in which grep query y is satisfied within n lines of a spot satisfying grep query z".
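The first form can be approximated today by piping into grep (assuming fselect and GNU xargs; -d '\n' keeps paths with spaces intact). The proximity query is the part that is genuinely painful in shell:

$ fselect "path from /home/user/src where name = '*.rs'" | xargs -d '\n' grep -l 'unsafe'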
~= | =~ | regexp | rx   Used to check if the column value matches the regex pattern
!=~ | !~= | notrx       Used to check if the column value doesn't match the regex pattern
Am I the only one who always creates an alias pointing grep to egrep? Not sure why I would ever want my regexes to not be extended, or why it would be fun to have to escape characters by default.
SQLite does have the fileio extension, which includes an fsdir() function that will traverse a directory.
sqlite> .headers on
sqlite> .mode column
sqlite> SELECT name,mode,mtime FROM fsdir("/usr") where name like '%.h' LIMIT 3;
name                     mode       mtime
------------------------ ---------- ----------
/usr/include/_G_config.h 33188      1607359089
/usr/include/aio.h       33188      1607359089
/usr/include/aliases.h   33188      1607359089
It's not nearly as full-featured as this fselect tool, though. I'm somewhat surprised nobody has cloned or updated the extension to add more stat() fields or things like extended file attributes. It doesn't even expose the file size.
In my experience the limiting factor in response time is the traversal of the FS/OS structures in your step 1. It seems unlikely that anything this program is doing would be any slower than what you are describing.
On Windows, for example, there is the Everything search engine, which scans the NTFS master file table and installs a filter driver. It's instant regardless of disk size. If it kept its database in SQLite, we would have exactly what AtlasBarfed suggested.
I am drawing a distinction between actually using an SQL database as the dynamic attribute store of the file system, vs “dumping the FS to SQLite”, which implies an on-demand traversal to me.