Fselect: Find files with SQL-like queries (github.com/jhspetersson)
225 points by ingve on April 6, 2021 | 42 comments



I just want to say this is really cool.

Honestly I'd love to see SQL (or something like it) become a first-class citizen in operating systems generally -- as a standard way for manipulating files, logs, preference files, etc.

To be clear: not turning things into databases (keep logs as logs), but make everything interpretable to SQL.

It's bizarre to me that in just a few lines I can achieve magic with a database... but I can't trivially do the same magic with my filesystem.

Just like a shell is an integral part of an OS... shouldn't a query language be too? I'd love it if, out-of-the-box, Linux distributions came with a standardized SQL+JSON-over-the-filesystem approach that functioned as a small-scale working database for any tool that wanted to use it as such.


Have you used osquery? Seems like a good chunk of what you’re looking for!

https://www.osquery.io/schema/4.7.0/


It's halfway what I'm talking about, thanks. Absolutely yes to the idea of being able to query everything!

But for me the idea of writing (CREATE, UPDATE, DELETE) would be an integral part of it as well -- whether inserting a property into a JSON file, adding a cronjob as column parameters rather than a text line, or creating files.


The writing part is what NixOS does (with Nixlang rather than SQL).


See also https://augeas.net/, which presents structured file contents as navigable paths that you update.

Red Hatters maintained it (and hung out in IRC) until cloud container workflows surpassed in-place system administration. It pairs nicely with Puppet, and can also be used manually.



This has been cool every time it has come back around. It is a shame it isn't more universal. One simple recipe would be to read the file tables directly for speed (like the program 'Everything') and put it in an sqlite db.

Something that at least gets halfway is 'Everything' since it indexes files extremely quickly, sorts on multiple attributes extremely quickly, and allows regular expressions. It would be better if these things were built into the OS so that tools could be built on top of them.


Good point. SQL is very powerful for some things.

Having some cli tools that focussed on this would be relatively easy to implement, and you could even create a special GUI based tool.

What do you have in mind when it comes to it being a “first class citizen” of the OS?


q is a general-purpose programming language that has a subset of SQL embedded in it. And it's amazing. You can write stuff like:

    c : a + b
    select col from my_table where other_col > c
So no more faffing around with manually building queries, and queries are run by the same process that has access to all your variables.


If you're skeptical, I was too.

My thoughts before reading the README.md: SQL is nice and all but seems verbose and awkward compared to just using find.

My thoughts after reading the README.md: Wow, this tool can query on file format-specific metadata such as ID3 tags in mp3 files and even width/height in pictures. This is really neat. There is a lot of possibility here.


Ahh BeFS/BFS. I had my whole MSX-rom collection with file system attributes, including thumbnails


I loved BeOS/BeFS. I remember exporting ID3 tags from mp3 files into file system attributes and querying them to create playlists.

Aside from Haiku, does anyone know if any of the open-source Unix-like operating systems support arbitrary extended file attributes?


It seems like most of them do[1] (inc. ext4, zfs, btrfs), but they’re fairly limited in size (but should be fine for media tags and such).

[1] https://en.wikipedia.org/wiki/Extended_file_attributes#Linux
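
For anyone curious what that looks like in practice, here is a minimal Python sketch of BeOS-style tagging on Linux using the `user.` extended attribute namespace. It assumes Linux and a filesystem mounted with user xattr support; the helper names (`tag_file`, `read_tags`, `demo`) and the tag values are made up for illustration.

```python
import os
import tempfile


def tag_file(path, attrs):
    """Store each key/value pair as a user.* extended attribute (Linux only)."""
    for key, value in attrs.items():
        os.setxattr(path, f"user.{key}", value.encode("utf-8"))


def read_tags(path):
    """Return all user.* extended attributes of a file as a dict of strings."""
    tags = {}
    for name in os.listxattr(path):
        if name.startswith("user."):
            tags[name[len("user."):]] = os.getxattr(path, name).decode("utf-8")
    return tags


def demo():
    """Tag a temporary file; return None if xattrs are unsupported here."""
    with tempfile.NamedTemporaryFile(suffix=".mp3") as f:
        try:
            tag_file(f.name, {"artist": "Example Artist", "title": "Example Title"})
        except (OSError, AttributeError):
            # OSError: filesystem refuses user xattrs; AttributeError: non-Linux platform
            return None
        return read_tags(f.name)
```

Note that nothing indexes these attributes, so querying them still means walking the tree, which is the integration gap compared with BeFS.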


They do... but integration is almost nonexistent.

All e-mails were separate eml files, with attributes for most of the important data.

Same for contacts.

In BeOS everything was more a file than in most *nix


Of course a built-in would be nice, but I think one could get 99% of the way there just with a specialized scanner (that I don't know the name of) that kept its own database, like ctags or the old OS X Quicksilver (and probably Spotlight after it).


Not unix-y but OS/2 had something similar.


The BeOS filesystem had something like this built-in, back in the day.

https://arstechnica.com/information-technology/2018/07/the-b...



ColdFusion/CFML has had this capability for a while: Most iterables return a "query" data type, and queries can be queried against using an in-memory SQL engine.

https://cfdocs.org/directorylist

https://cfdocs.org/queryexecute


HPC sites are likely to use Robinhood (https://github.com/cea-hpc/robinhood/wiki/Documentation). While its user interface isn't SQL, it does store the filesystem metadata in an SQL database (updated from changesets for Lustre filesystems). You don't want to run -- or, specifically, have users run -- ls, find, or du on significant amounts of a typical large filesystem. (As I understand it, you could do something similar to Lustre with Isilon/OneFS, for instance.)

The metadata store of OrangeFS (PVFS2) is implemented with a database, though a key-value one, and one of the project ideas for it is to search that directly.


This is very similar to fsql, which looks similarly useful.

https://github.com/kashav/fsql


I'm surprised to see this in the `--help` menu:

```
Japanese string:
        CONTAINS_JAPANESE           Used to check if string value contains Japanese symbols
        CONTAINS_KANA               Used to check if string value contains kana symbols
        CONTAINS_HIRAGANA           Used to check if string value contains hiragana symbols
        CONTAINS_KATAKANA           Used to check if string value contains katakana symbols
        CONTAINS_KANJI              Used to check if string value contains kanji symbols
```


The perfect complement for this would be a reboot of the BeOS-style filesystem.

For the unaware, BeFS combines the attributes of a hierarchical filesystem and (some of) a relational database. It could do interesting things like store contacts as a zero-contents file which had key-value pairs for things like names and email addresses.

There are some open-source reimplementations of it, thanks to the Haiku project; I wonder if they've ever been integrated with the Linux kernel and distros.


Microsoft tried this in NT5.0 and ended up shelving it.

Trying again was a substantial reason for the delay of one of the more recent versions. Vista perhaps?


I have a lot of files in my home directory, especially in the code folder where all my projects live. You can imagine all the files in the Python virtualenv and node_modules directories, but even so the command finished in 45s. I think this is very impressive!


Grep integration would be awesome -- e.g. "find files with name satisfying x and content satisfying y", or (even better) "files in which grep query y is satisfied within n lines of a spot satisfying grep query z".


It does support regular expressions directly:

  ~= | =~ | regexp | rx       Used to check if the column value matches the regex pattern
  !=~ | !~= | notrx           Used to check if the column value doesn't match the regex pattern


    find $PATH -regex $FILENAME_REGEX -exec grep $CONTENT_REGEX {} +

    grep -r [-E] $CONTENT_REGEX --include $FILEGLOB
Your bonus case:

    find $PATH -regex $FILENAME_REGEX \
        -exec sh -c "grep [-E] -$NUMBER_OF_CONTEXT_LINES $CONTENT_REGEX1 \"\$1\" | grep -q [-E] $CONTENT_REGEX2" sh {} \; -print
The "[-E]" here is just to mention the command switch if you want to use extended regex.
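
If the quoting gets out of hand, the same "name matches x, and y occurs within n lines of z" idea can be sketched in a few lines of Python. This is only an illustration, not how fselect or grep do it; the function names and parameters are hypothetical, and it reads each file fully into memory.

```python
import re
from pathlib import Path


def matches_within(text, pattern_a, pattern_b, max_distance):
    """True if some line matching pattern_a is within max_distance lines
    of some line matching pattern_b."""
    lines = text.splitlines()
    hits_a = [i for i, line in enumerate(lines) if re.search(pattern_a, line)]
    hits_b = [i for i, line in enumerate(lines) if re.search(pattern_b, line)]
    return any(abs(a - b) <= max_distance for a in hits_a for b in hits_b)


def find_files(root, name_regex, pattern_a, pattern_b, max_distance):
    """Yield paths under root whose name matches name_regex and whose
    content satisfies matches_within()."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or not re.search(name_regex, path.name):
            continue
        # errors="ignore" keeps binary or oddly encoded files from raising
        text = path.read_text(errors="ignore")
        if matches_within(text, pattern_a, pattern_b, max_distance):
            yield path
```

For example, `list(find_files(".", r"\.py$", r"^import", r"print", 3))` would list Python files where an import line sits within three lines of a print.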


Am I the only one that always creates an alias pointing grep to egrep? Not sure why I would ever want my regex to not be extended or why it would be fun to have to escape characters by default.


Isn't this what pipes are for?


Reminds me of osquery!

Looks like osquery can also be used for file searching, but it's not as optimized for command-line usage.

https://blog.kolide.com/the-file-table-in-osquery-is-amazing...


This is amazing. A lot of people forget the fact that the filesystem is in a way a database, and a flexible one at that!


I know this is rust and has its own query engine and it is pretty cool, but makes me wonder how much can be done with:

1) dump filesystem (or some other data schema generation) metadata into SQLite
2) pass query to SQLite
3) pass SQLite response to stdout

and it would still be faster than the ad hoc query engines like this.
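
As a rough sketch of those three steps, assuming Python's standard `sqlite3` module and a made-up `files` table (the schema and function names are illustrative, not anything fselect uses):

```python
import os
import sqlite3


def index_tree(root, conn):
    """Walk root once and store basic stat() metadata in a `files` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(path TEXT PRIMARY KEY, name TEXT, size INTEGER, mtime REAL)"
    )
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            rows.append((path, name, st.st_size, st.st_mtime))
    conn.executemany("INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def query(conn, sql, params=()):
    """Run a query against the index and return all rows."""
    return conn.execute(sql, params).fetchall()
```

After `index_tree(".", sqlite3.connect(":memory:"))`, something like `SELECT path, size FROM files ORDER BY size DESC LIMIT 5` gives the five largest files. Whether this beats a purpose-built engine depends mostly on how expensive the walk in step 1 is.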


SQLite does have the fileio extension, which includes an fsdir() function that will traverse a directory.

  sqlite> .headers on
  sqlite> .mode column
  sqlite> SELECT name,mode,mtime FROM fsdir('/usr') where name like '%.h' LIMIT 3;
  name                      mode        mtime     
  ------------------------  ----------  ----------
  /usr/include/_G_config.h  33188       1607359089
  /usr/include/aio.h        33188       1607359089
  /usr/include/aliases.h    33188       1607359089
It's not nearly as full featured as this Fselect tool, though. I'm somewhat surprised nobody has cloned or updated the extension and added more stat() fields or things like extended file attributes. It doesn't even have the file size.
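
Short of patching the C extension, one way to get extra stat() fields into queries is to register scalar functions from a host language. The sketch below uses Python's `sqlite3.Connection.create_function`; it is not the fileio extension, and the SQL function names `file_size` and `file_uid` are invented for this example.

```python
import os
import sqlite3


def _stat_field(field):
    """Build a SQL-callable function returning one os.stat() field, or NULL on error."""
    def getter(path):
        try:
            return getattr(os.stat(path), field)
        except OSError:
            return None  # surfaces as NULL in SQL
    return getter


def connect_with_stat_functions(database=":memory:"):
    """Open a connection with hypothetical file_size() and file_uid() SQL functions."""
    conn = sqlite3.connect(database)
    conn.create_function("file_size", 1, _stat_field("st_size"))
    conn.create_function("file_uid", 1, _stat_field("st_uid"))
    return conn
```

With that in place, `SELECT file_size('/etc/hostname')` works from any query on that connection, and the same pattern could presumably be extended to extended attributes.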


In my experience the limiting factor in response time is the traversal of the FS/OS structures in your step 1. It seems unlikely that anything this program is doing would be any slower than what you are describing.


Not necessarily.

On Windows, for example, there is the Everything search engine, which scans the NTFS table and installs a filter driver. It's instant on any disk size. If it kept its database in SQLite, we would have exactly what AtlasBarfed suggested.


I am drawing a distinction between actually using an SQL database as the dynamic attribute store of the file system, vs “dumping the FS to SQLite”, which implies an on-demand traversal to me.


I've been wanting to have something like this.

There is a nice CPAN module called File::Find::Rule which has similar functionality, but you need to write some code, of course.


I've found myself using this rather than coming up with the equivalent `find` predicate.


I haven’t tried it yet, but this looks potentially super useful.


This on macOS' Spotlight database...


Awesome, loving



