I use gron a lot, because I can never remember how to use jq to do anything fancy but can usually make awk work. (I may be unusual in that department, in that I actually like awk)
One warning to note is that gron burns RAM. I've killed 32GB servers working with 15MB JSON files. (I think gron -u is even worse, but my memory is a bit fuzzy here).
Thanks, it would be great to make it the official gron 2.0; I tried to achieve full feature parity with the original. Also, a serious buffer overflow bug was just fixed, so make sure to upgrade to 0.7.
I'm thinking of doing some marketing (for example, a blog post showing the main lessons in I/O and memory management that made this speed possible).
Maybe I'm misunderstanding, but why does it need to read the file into memory at all? Can't it just parse directly as the data streams in? It should be possible to gron-ify a JSON file that is far bigger than the memory available - the only part that needs to stay in memory is the key you are currently working on.
edit: Interestingly, whilst doing this test, I piped the output into `fastgron -u` (39.5G resident) and `jq` rejected the result. Will have to investigate further, but it's a bit of a flaw if it can't rehydrate its own output into valid JSON.
If I remember correctly, it took a 128GB AWS EC2 instance to parse that file without OOMing. Go is not that efficient at deeply nested data structures of unknown size and type.
I have a version of `gron` which uses almost no RAM to parse files (uses the streaming JSON parser rather than loading the file.) Processed a 4GB JSON file on a Pi using it (admittedly, it took forever) taking, IIRC, about 64MB RAM tops.
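For anyone curious, the core idea looks roughly like this (a simplified sketch using encoding/json's streaming token API, not my fork's actual code; key quoting, output sorting and error reporting are all hand-waved):

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"os"
)

// frame tracks where we are inside one container on the current path.
type frame struct {
	object  bool   // true for an object, false for an array
	key     string // current key (objects only)
	index   int    // next index to fill (arrays only)
	keyNext bool   // objects only: the next token is a key
}

// path renders the gron-style path for the value about to be read.
// (Real gron also quotes keys that aren't valid identifiers; skipped here.)
func path(stack []frame) string {
	p := "json"
	for _, f := range stack {
		if f.object {
			p += "." + f.key
		} else {
			p += fmt.Sprintf("[%d]", f.index)
		}
	}
	return p
}

// advance marks the current value slot in f as consumed.
func advance(f *frame) {
	if f.object {
		f.keyNext = true
	} else {
		f.index++
	}
}

func main() {
	dec := json.NewDecoder(os.Stdin)
	var stack []frame // the only state that grows with nesting depth

	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return
		}
		if err != nil {
			log.Fatal(err)
		}

		// Closing delimiter: pop, then mark the parent's value slot consumed.
		if d, ok := tok.(json.Delim); ok && (d == '}' || d == ']') {
			stack = stack[:len(stack)-1]
			if len(stack) > 0 {
				advance(&stack[len(stack)-1])
			}
			continue
		}

		// Object keys arrive as plain string tokens; grab them before values.
		if len(stack) > 0 && stack[len(stack)-1].object && stack[len(stack)-1].keyNext {
			stack[len(stack)-1].key = tok.(string)
			stack[len(stack)-1].keyNext = false
			continue
		}

		p := path(stack)

		if d, ok := tok.(json.Delim); ok { // '{' or '['
			if d == '{' {
				fmt.Printf("%s = {};\n", p)
				stack = append(stack, frame{object: true, keyNext: true})
			} else {
				fmt.Printf("%s = [];\n", p)
				stack = append(stack, frame{object: false})
			}
			continue
		}

		// Scalar: string, float64, bool or nil (JSON null).
		if s, ok := tok.(string); ok {
			fmt.Printf("%s = %q;\n", p, s)
		} else if tok == nil {
			fmt.Printf("%s = null;\n", p)
		} else {
			fmt.Printf("%s = %v;\n", p, tok)
		}
		if len(stack) > 0 {
			advance(&stack[len(stack)-1])
		}
	}
}
```

The whole document is never held in memory; the only state is the stack of frames for the path currently being emitted.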
`gron -u` is basically impossible to optimise unless you know the input is in "sorted" order (ie the order it comes out of `gron`, including the `json.a = {};` bits) in which case my code can handle that in almost no RAM also. But if it's not sorted or you're missing the `json.a = {};` lines, there's not a lot you can do since you have to hold the whole data structure in RAM.
That 15MB JSON expands when piped through `gron` - my 7MB pathological test file is 143MB and 2M lines after going through `gron` (which is lines like `json[0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][1][1][0][0] = "x";`)
That's 20 levels of unknown-size, unknown-type slices of slices of `any` in Go, which is not super-efficient, alas. It gets worse when you have maps of slices of maps, etc. `fastgron` gets around this by being able to manage its own memory.
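To make that concrete, this is roughly the shape a Go ungron ends up allocating for one of those lines (illustrative only, not gron's or fastgron's actual code):

```go
package main

import "fmt"

func main() {
	// Roughly the shape a Go ungron builds for a line like
	// json[0][0][0]...[0] = "x": one []any per bracket, each holding a
	// single interface value that points at the next slice, so every
	// level costs its own slice header and heap allocation.
	v := any("x")
	for i := 0; i < 20; i++ {
		v = []any{v}
	}
	fmt.Printf("%T, wrapped 20 levels deep\n", v)
}
```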
(`gron` can, however, reconstruct the output correctly if you shuffle the input. `fastgron` cannot. Which suggests to me it's maybe using the same 'output as we go' trick that my `gron` fork uses for its "input is sorted" mode which uses almost no RAM but cannot deal with disordered input.)
(`gron` could/should maybe indicate the maximum size of the slices and whether they hold a single type, which would make things more efficient; I might add that to my fork.)
I remembered where some of my old files are and re-tested; forward-gron was "only" about 7GB for the 15MB file. gron -u was the real killer, clocking in around 53GB.
> as you just have to fill a data structure in memory as you go
You don't know the size, shape, or type of any of the levels in the data structure until you get to a line specifying one part of it. If you did, yep, it would be trivial!
But you do: if the first line is `users[15].name.family_name = "Foo"`, all you need to know is there: there is an array of users, each is a map containing a field called name, which is a map with a field called family_name.
If users[14] is a string, or there are 1500 users, the amount of memory needed to ungron that line is exactly the same. Prove me wrong, I can't think of any way it would not be trivial, provided one uses the correct data structures.
How big is it? All you know at this point is that it's at least 16 entries long. If the next line starts with `users[150]`, now it's 151 entries. Next line might make it 2000 entries long. You have no idea until you see the line.
> each is a map
But a map of what? `string -> object`? Ok, the next line is `users[15].flange[15]` which means your map is now `string -> (object|array)`.
Then the next line is `users[15].age = 15` and you've got `string -> (object|array|int)`. Each line can change what you've got, and in Go this isn't a trivial thing to handle without resorting to `interface{}` (or `any`) all over the place and reflection to manage the data structures.
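For illustration, a disordered ungron in Go ends up looking something like this (a deliberately simplified sketch, not gron's code, with type assertions standing in for the reflection the real thing leans on):

```go
package main

import "fmt"

// step is one component of a gron path: either a map key or an array index.
type step struct {
	key string
	idx int
	arr bool
}

// insert places value at path under root, creating or replacing containers
// as it goes, and returns the (possibly new) root. Everything has to be
// map[string]any or []any because the next line can change the shape.
func insert(root any, path []step, value any) any {
	if len(path) == 0 {
		return value
	}
	s := path[0]
	if s.arr {
		slice, _ := root.([]any) // if root was something else, it gets replaced
		for len(slice) <= s.idx {
			slice = append(slice, nil) // grow: the size is only learned line by line
		}
		slice[s.idx] = insert(slice[s.idx], path[1:], value)
		return slice
	}
	m, ok := root.(map[string]any)
	if !ok {
		m = map[string]any{}
	}
	m[s.key] = insert(m[s.key], path[1:], value)
	return m
}

func main() {
	var root any
	// users[15].name.family_name = "Foo"; followed by users[15].age = 15;
	root = insert(root, []step{{key: "users"}, {idx: 15, arr: true}, {key: "name"}, {key: "family_name"}}, "Foo")
	root = insert(root, []step{{key: "users"}, {idx: 15, arr: true}, {key: "age"}}, 15)
	fmt.Println(root.(map[string]any)["users"].([]any)[15])
}
```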
> Prove me wrong, I can't think of any way it would not be trivial, provided one uses the correct data structures.
All I can suggest is that you try to build `ungron` in Go and have it correctly handle disordered input. If you find a better way of doing it, I'd be happy to hear about it because I spent several months fighting Go in 2021-22 trying to optimise this without success.
As someone who has been using jq for years, my first instinct was "why not jq?", and while the README answers that too, it isn't very clear until you compare the output to jq's.
Theoretically jq can be coerced into printing similar output. Realistically, though, if you've already written that jq query and you wanted the email field, you'd just append .email.
`gron` is super useful when you don't directly know the structure of some JSON - gives you a nice simple path to locate things you can then construct a `jq` query to deal with (since e.g. dealing with multiple items in the same list can be a faff with `gron`.)
I think of gron and jq as siblings: gron + grep for discovery, then you take the path from gron's output (for the desired line) and turn it into the jq query that shows you what you were after.
With jq alone, you have to already understand the structure, which isn't always a given if you're combing through k8s manifests for instance.
> The jq version isn't greppable, as you can't do `| grep '.author.email'` for example.
Truth be told, with jq you don't need to grep it; you can grab just the emails directly. I find gron a lot more useful for grep -v, that is, for filtering out the parts that you don't need. Super easy to clean up data.
The idea is pretty brilliant, but it is not a new one. I remember a similar program from about 20 years ago, when XML was all the rage. I don't recall the name, but it was something like "py*", I think.
Thumbs up for gron. Been using it for a couple of years to get the jsonpath to a property I need. It's super handy with kubectl and other ctls of the sort.
Hmmm… at first glance, this feels like I’d use it for the same sorts of things I’d use jq for, only easier to use but also way less powerful. Jq does have a little bit of a learning curve necessary to get good use out of it, so I could see this being a nice quick tool for people who don’t want to make that investment. Having already learned jq, I’m not sure why I would reach for gron, but maybe I’m missing something.
Looks like an interesting tool. I use jq for any JSON related task, but it can often be finicky and complex when I just need to get at a value or search for something.
Looks like gron would be a nice addition to my workflow with JSON tasks.
"maybe doing something depending on context for some nodes in a nested structure by using transforms or side-effects while iterating over the regularized representation of the structure's nodes"
It's a library but also includes CLI utilities that do the same as gron. Well, I hope without that memory ballooning problem described in sibling comments.
jq is awesome, and a lot more powerful than gron, but with that power comes complexity. gron aims to make it easier to use the tools you already know, like grep and sed.
I know grep, but sed is one of those I always have to look up whenever I have to escape a weird character or something.
I wonder if you could accomplish the same with structural search tools like Comby (https://comby.dev/) - with the bonus that you can target specific levels of nesting since it's syntax-aware.
gron is one of my favorite tools because I don't do this type of searching as much anymore and can't remember the options for more advanced tools (jq, etc). I can easily and confidently compose from gron -> grep though.
https://github.com/adamritter/fastgron as an alternative has been pretty good to me in terms of performance, I think both in speed and RAM usage.