If you expect Jena to be more battle-tested because it is older, forget it: if the process is killed by an unexpected shutdown or for some other reason, it results in data corruption. At least this was my experience a few years ago.

I found graph databases a beguiling idea when I first learned about them, and this is a welcome addition, but I've since tempered my excitement. They are not as flexible and universal a model as is often promised. Everything is a graph, sure, but the result of your SPARQL query is not necessarily one.

I found classical DBMSs based on sets/multisets to be much easier to compose from a querying point of view. A table is a set/multiset and the result of a query is also a set/multiset; SPARQL guarantees no such composability. Maybe you can get there if you start mucking around with inference engines, but then you run into problems of undecidability.


Jena lets you make little in-memory triple stores that you can use the way people use the list-map-scalar trinity. I've been working on this publication about that (RDF for difficult cases and when ordering counts) for years, and it just got published last week:

https://www.iso.org/standard/76310.html

I'll call out my collaborator Liju Fan for being the only person I've met who knew how to do anything interesting with OWL. (Well, I can do interesting things now, but I owe it all to her.)

(For the research for that paper I used rdflib under PyPy because CPython was not fast enough.)

When I needed big persistent triple stores (that you use the way you might use postgres) I used to use

https://en.wikipedia.org/wiki/Virtuoso_Universal_Server

and had pretty good luck loading a billion triples if I used plenty of 'stabilizers' (create a new AWS instance with ample RAM, use scripts to load a billion triples starting from an empty database, shut it down, make an AMI, start a new instance with the AMI, and expect it to warm up for 20 minutes or so before query performance is good)

I don't regularly build systems on SPARQL today because of problems with updating. In particular, SQL has an idea of a "record", which is a row in a table; document-oriented databases have an idea of a "record" which is a bit more flexible. Updating a SPARQL database is a little bit dangerous because there is no intrinsic idea of what a record is. I mean, you can define one by starting at a particular URI and traversing to the right across blank nodes and calling that a 'record', and it works OK. But it's a discipline I impose with my libraries; it ought to be baked into the standards, baked into the databases, wrapped up in transactions, etc. For anything OLTP-ish I am still using SQL or document-oriented databases, but document-oriented databases lack the namespaces and similar affordances that make SPARQL scale in the sense of "smash together a bunch of data from different sources", whereas SPARQL is missing the affordances document-oriented databases have for handling ordered collections. We badly need a SPARQL 2 which makes the kind of work that I talk about in that technical report easy.
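
To make that concrete, here is a rough sketch of the 'record' discipline I mean, with made-up IRIs: grab the triples hanging off a root URI, plus whatever sits one hop away behind a blank node.

    # Cut the "record" rooted at a hypothetical order IRI out of the graph:
    # the root's own triples plus anything one hop away through a blank node.
    CONSTRUCT {
      ?root ?p1 ?o1 .
      ?o1   ?p2 ?o2 .
    }
    WHERE {
      VALUES ?root { <urn:example:order/42> }
      ?root ?p1 ?o1 .
      OPTIONAL {
        ?o1 ?p2 ?o2 .
        FILTER ( isBlank(?o1) )
      }
    }

A real library recurses deeper than one hop, but that recursion lives in my application code rather than in the standard or the store.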


> Updating a SPARQL database is a little bit dangerous because there is no intrinsic idea of what a record is

SPARQL has a notion of a transactional boundary just like SQL does. You can combine multiple SPARQL queries in one transaction, and they will all succeed or all fail, just as you'd expect.
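
Concretely (a hedged sketch with made-up IRIs): SPARQL 1.1 Update lets you put several operations in a single request, separated by semicolons, and stores generally execute the whole request as one unit, though the spec leaves the precise atomicity guarantees to the implementation.

    # One update request containing two operations, executed in order
    # and, in most stores, committed or rolled back together.
    DELETE { <urn:example:order/42> <urn:example:status> ?old }
    WHERE  { <urn:example:order/42> <urn:example:status> ?old } ;
    INSERT DATA { <urn:example:order/42> <urn:example:status> "shipped" }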


Sorta kinda.

Your code has to put the right things in a transaction all the time for transactions to work right. If there is some flow of information like

   application does query -> application thinks -> application does update
you have to wrap the whole sandwich in a transaction, and people frequently don't do that. If I'm writing 20 of those for an application, I want something that I know is bulletproof.

My experience with SQL is that the average SQL developer doesn't really understand how to do transactions right, but their ass gets saved (in a probabilistic sense) by the grouping of updates that is implicit in running an INSERT or an UPDATE against a table.

There's also the fact that a lot of triple stores are seriously half-baked, research-quality code if that. Many triple stores struggle if you just try to load 100,000 triples sequentially, which rules them out for an application like my YOShInOn RSS reader, which I expect to use every day without having to patch or maintain anything for 18+ months. (Ok, a 20GB database that needs to be pruned crept up on me gradually, but that's an ArangoDB problem; I'd expect the average triple store to have crumbled 17 months ago.)

I'd love to have something that updates like a document-oriented database but lets you run a SPARQL query against the union of all the documents. Database experts, though, always seem to change the subject when it comes to having a graph algebra that lets you UNION 10 million graphs.
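
As a rough sketch of what I mean (assuming one named graph per "document"), the query side is easy enough to write down:

    # Query the union of every named graph; ?doc tells you which
    # "document" each triple came from.
    SELECT ?doc ?s ?p ?o
    WHERE {
      GRAPH ?doc { ?s ?p ?o }
    }
    LIMIT 100

Writing that is the easy part; getting a store to do it efficiently over ten million graphs is where the conversation stalls.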

(For that matter, I sure as hell couldn't pitch any kind of "boxes-and-lines" query tool [1] etc. that passed JSON documents/RDF graphs over the lines between the operators to the VCs and private equity people who were buying up query engines circa 2015, because they were hung up on the speed of columnar query engines... despite the fact that the ones that pass relational rows over the lines require people who really aren't qualified to do so to create analysis jobs that look like terrible hairballs because of all the joins they do.)

[1] Alteryx, KNIME


> you have to wrap the whole sandwich in a transaction

True, SPARQL does not allow "opening" transactions such that you can run one query, do some logic, and run another query before committing. That was a pain for me. RDF4J has a non-standard API to do that; I think they are trying to upstream it into SPARQL 1.2.

> There's also the fact that a lot of triple stores are seriously half-baked, research-quality code if that.

Also true. Although the excellent researchers who wrote one of the best reasoners (Pellet) did decide to leave academia and build a production-grade system. They succeeded with Stardog, but you don't want to know how much a license costs.

> couldn't pitch any kind of "boxes-and-lines" query tool [1] etc. that passed JSON documents/RDF graphs

I really enjoy this talk from one of the creators of OWL [1]. There, he makes the point that OWL is unpopular not because it's too complex but because it's not advanced enough to solve real problems people care about (read: are ready to pay money for). I think the case you described involves VCs having clarity on how to make money off one thing but not the other. I do think that Semantic Web 3.0 (if we count Linked Data as a Semantic Web 2.0 aka Semantic Web Lite attempt) will need a better (appealing to business) case than the one presented in the 2001 SciAm paper.

[1]: https://videolectures.net/videos/eswc2011_hendler_work


OWL ontologies are making a big comeback as part of Knowledge Graph groundings for LLM outputs. And several SPARQL and RDF knowledge graph startups are VC-backed and thriving. The world is a big place.

Well, there is the new use case that appeals to VCs! And I guess it's a good reminder that I should re-subscribe to your blog :)

Personally I thought Stardog was trash, but if I'd had different requirements I might have been happy with it.

The trouble w/ OWL as I see it (talked about in that TR) is that people don't really want "first order logic"; they want "first order logic + arithmetic", which is the nightmare that Kurt Gödel warned you about. (The ISO 20022 standard that the TR is related to is about the financial domain, which is all about arithmetic.)

After Doug Lenat's death a lot of stuff came out that revealed the problems w/ Cyc, not least that even if you try to build something that is "knowledge based", it can't practically solve all the problems you want using an SMT-based strategy; instead you have to build a library of special-purpose algorithms for everything you want to do, and it turns out to be a godawful mess.

I'm disappointed that the semweb community hasn't made a serious crack at usable and efficient production rules (dealing w/ problems like negation, controlling execution order, RETE execution, retraction); instead we get half-answers like SPIN with fixed-point execution (I used an even more half-baked version of that to research that TR; it gets you somewhere). Of course, production rules never got standardized in any domain because nobody can agree on the way to address those four issues, even though it usually isn't hard to find an answer that's fine for a particular application.

(It's a frequent problem that experts on a technology can get by on half-baked specific answers that would need a general solution if they were going to be useful for a general audience. One reason why parser generators are so bad is that if you understand parser generators well enough to write a parser generator, you aren't bothered by the terrible developer experience of parser generators.)


You seem nice.

Sorry for the negativity, Kendall, but the semweb didn't return the love that I gave it. I did hundreds of sales calls that went nowhere, but my phone kept ringing with people who wanted me to work on neural nets.

That’s tough. Not sure what that has to do with Stardog. Biggest companies in the world rely on it daily and you say it’s trash. I couldn’t find an email from you using it since 2013. I guess we figured something out. NNs are cool too; at last count we use half a dozen different ones including GNNs… NeSy is hot and I can hardly read a paper these days that doesn’t talk about triples.

(1) I'll grant it was a long time ago. Things could have changed a lot.

(2) It's generic that a new database comes out, gets hyped, but turns out to be "trash" when you try to use it. If a new database was actually good that would be exceptional. (Probably in 2013 it satisfied somebody's requirements but the hype for Stardog in 2013 seemed to be entirely out of line with what I needed for the project I was doing at the time)

I thought Postgres was trash in 2001 and called it CrashGreSlow; now I swear by it. Early on people were making big claims for it that were not substantiated, but people did the hard work over a long time to make it great.

I thought MongoDB was trash when it came out; then I worked for a place that used it despite the engineers believing it was trash and begging me not to use it for a spike prototype. It never got better. Now it is common knowledge that MongoDB is trash.

(3) Maybe it's not fair, but I was hurt by the experience; my wife was furious at the balance I'd run up on the HELOC chasing my Moby Dick. As an applications programmer who was accustomed to getting things right, I had a terrible opinion of most of the luminaries in the semantic web field at the time, many of whom were shipping code that was academic quality at best.


You mean brutally honest.

Datomic (and partly XTDB, formerly Crux) are OLTP-ish and use only such "tuples"; essentially it's up to the user to define what constitutes an entity, if anything ("row", "object", "document", whatever) - maybe some entity id and everything linked to it, but maybe other, less identity-related stuff. Which might feel freeing to an extent, but as you said, it also demands great responsibility/discipline to cobble the proper properties together.

Mathematically the boundaries of a record can be defined by production rules

https://en.wikipedia.org/wiki/Business_rules_engine

which could be written as SPARQL queries. I've used these to cut records out of a big graph, but I haven't thought seriously about whether they could be built into a large-scale, general-purpose system.
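
For illustration, a production rule of the "if condition, then assert" shape can be written as a single SPARQL Update, roughly like this (all IRIs hypothetical):

    # Rule: if every line item of an order has a price, mark the order
    # as a complete record.
    INSERT { ?order a <urn:example:CompleteRecord> }
    WHERE {
      ?order a <urn:example:Order> .
      FILTER NOT EXISTS {
        ?order <urn:example:hasItem> ?item .
        FILTER NOT EXISTS { ?item <urn:example:price> ?price }
      }
    }

Cutting records out of a big graph amounts to running a handful of rules like that to a fixed point; the hard parts (negation, execution order, retraction) are the ones nobody standardized.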

The most fun I ever had with Jena was when I used the rules engine for the control plane of a batch processing system which used stream processing primitives [1]

https://jena.apache.org/documentation/inference/

The Jena folks said my use was completely unsupported, but I had looked at the source code, come to understand how the rules engine worked, and knew damn well there was nothing wrong with what I was doing.

I've thought a lot about why production rules have had so little impact on the industry; I mean, people really hate Drools

https://www.drools.org/

That kind of system is particularly strong at handling deep asynchrony, like when a business process at a bank involves steps where you might have to wait for a loan officer to approve a loan. It's disappointing to me that nobody has tried to use them (so far as I can tell) to deal with the asynchronous comms problems in JavaScript, though I've yet to get a clear picture in my mind of how to get started on that. (Funny, I am getting an idea now, so I'm putting a ticket on my personal Kanban board.)

[1] I worked later at a place that had a similar engine written in very awkward Scala that allegedly used Either and Optional for error handling but actually dropped errors most of the time. I knew what algebra my engine supported; they argued about whether something like that even had an algebra. My engine got the same answers every time because it tore down the system properly at the end; their engine gave different answers every time, but they didn't seem to care.


I said suitable for newcomers, aka people touching RDF for the first time. If you want production-ready, you probably want Stardog, Ontotext GraphDB, or AWS Neptune - none of which is cheap. https://github.com/the-qa-company/qEndpoint is also an interesting project that's used in production.

> SPARQL guarantees no such composability.

SPARQL has a CONSTRUCT clause which gives you RDF as your query output. Isn't that compositional enough?
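
For example (a minimal sketch with made-up IRIs), this query takes RDF in and hands RDF back, so its output can be loaded and queried again:

    # Derive a new graph of worksWith edges from shared team membership.
    CONSTRUCT { ?person <urn:example:worksWith> ?colleague }
    WHERE {
      ?person    <urn:example:memberOf> ?team .
      ?colleague <urn:example:memberOf> ?team .
      FILTER ( ?person != ?colleague )
    }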


Ok, that is true, but how do I tell my graph database that the result of the construct query is some other graph in my DB?

> how do I tell my graph database that the result of the construct query

I am assuming you are asking how to do a CONSTRUCT query that will return you the contents of a given named graph?

https://www.w3.org/TR/sparql11-http-rdf-update/#http-get is a much simpler way to get a graph. As the spec says, it's equivalent to the following query

       CONSTRUCT { ?s ?p ?o } WHERE { GRAPH <graph_uri> { ?s ?p ?o } }

Property graphs map to relational databases pretty well. Using Neo4j's terminology:

table name -> node label

table row -> node

table column -> node property

The result of a query is a sub-graph and is very composable.


I think it maps much better to document databases.

Nodes are just documents.

You just need to slap on a relations document type for the graph edges, and to store edge properties

I was close to finishing at least version 1.0 of a document/graph database on top of Cassandra and DynamoDB.


With Neo4j, properties and list members cannot be complex objects, just like DB table rows. I was thinking that my dream DB would be a hybrid of MongoDB documents with Neo4j-style relationships between them.

I understand, but what if I want to use the enums the way they are used in C, as a label for a number, probably as a way to encode some type or another. Sum types of literal numbers are not very practical here because the labels should be part of the API.


What in your view is the downside to doing this?

    export const MyEnumMapping = {
      active: 0,
      inactive: 1
    } as const

    export type MyEnum = typeof MyEnumMapping[keyof typeof MyEnumMapping];
So you have the names exposed, but the underlying type is the number.


I would do this instead:

  type MyEnum = {
    active: 0;
    inactive: 1;
  }

  const MyEnum: MyEnum = {
    active: 0, 
    inactive: 1,
  }

  const showAge = MyEnum.active;
  const showPets = MyEnum.inactive;

It's slightly more duplication, but a lot more readable (imo) to those unfamiliar with utility types. TypeScript also enforces keeping them in sync.


That doesn't give you a type that you can use for the actual enum values. If you wanted a function argument to take in one of your enum values, you'd still have to use keyof in the signature like:

   function doSomethingWithMyEnum(val: MyEnum[keyof MyEnum])
You could do `val: number`, but now you're allowing any number at all.

Ultimately, the type syntax in TypeScript is a key part of the language, and I don't think it's unreasonable to expect developers to learn the basic typeof and keyof operators. If we were talking about something wonkier like mapped types or conditional types, sure, it might make sense to avoid those for something as basic as enums.


This is way harder to parse and understand than the enum alternative.

Personally I am definitely not skilled enough at TypeScript to come up with this on my own before seeing this thread, so this was not even an option until now.


You get used to it! It's also much easier to read in an editor with IntelliSense than on your first exposure as plain text. If you're going to spend any considerable amount of time writing TypeScript, the parent's advice is good.


Well, that's basically how you would have done it in vanilla JS before TypeScript came around. The main awkwardness is the type definition. I often prefer to use a library like type-fest for this kind of thing so you can just say:

    export type MyEnum = ValueOf<typeof MyEnumMapping>;
TypeScript not having enough sugar in its built-in utility types is definitely a fair criticism.

But more to the point, the above is not usually how you do enums in TS unless you have some very specific reason to want your values to be numbers at all times. There are some cases like that, but usually you would just let the values be strings, and map them to numbers on demand if that's actually required (e.g. for a specific serialization format).


Is this what you're referring to when you're talking about more elegant alternatives? Come on. You're not going to convince anyone with this.


No, this is what I suggest for someone who wants to do something non-idiomatic because they're used to C. What I suggest in most cases is just a string union type.

Edit: But for what it's worth, yes, the above is still more elegant than enums. The syntax may feel less elegant, but among other things, the above does not depart from the structural type paradigm that the rest of the language uses, all for something as simple as an enum.
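
For what it's worth, the string-union version I mean looks roughly like this (a minimal sketch, with a hypothetical mapping for the rare case where a number is actually needed):

    // A plain string union; the values are the labels themselves.
    export type MyEnum = "active" | "inactive";

    // Hypothetical lookup used only at a serialization boundary.
    const toWireValue: Record<MyEnum, number> = {
      active: 0,
      inactive: 1,
    };

    function setStatus(status: MyEnum): number {
      return toWireValue[status];
    }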


TypeScript devs love doing anything except developing scripts with types


In that case you can just use object literals `as const`.


How is this different from the Durable Promises idea by ResonateHQ? It seems to me a bit easier to get started with as a small standalone project. Would you mind comparing this solution to that?


I skimmed a bit about Durable Promises, and if I understand correctly, they’re similar to workflow tools like Temporal but come with high-quality language bindings that let you write workflow steps using async/await.

We built something almost identical in Rust to let us use async/await for long-running & failure-prone workflows. This powers almost everything we do at Rivet. We have a technical writeup coming later, but here are a couple of links of interest:

- Rough overview: https://github.com/rivet-gg/rivet/blob/58b073a7cae20adcf0fa3...

- Example async/await-heavy workflow: https://github.com/rivet-gg/rivet/blob/58b073a7cae20adcf0fa3...

- Example actor-like workflow: https://github.com/rivet-gg/rivet/blob/58b073a7cae20adcf0fa3...

Rivet Actors can start, stop, or crash at any time and still continue functioning, much like Durable Promises.

However, their scope differs: Rivet Actors are broader and designed for anything stateful & realtime, while Durable Promises seem focused on workflows with complex control flow.

Rivet Actors can (and likely will) support workflow-like applications in the future, since state management and rescheduling are already built-in.

For a deeper dive, Temporal has a writeup comparing actors and workflows: https://temporal.io/blog/workflows-as-actors-is-it-really-po...


Maybe homology can help; it is a sort of calculus for discrete structures where you count how many N-dimensional holes there are over time. Dunno about NNs, but that is what they can do with fMRI.


Active acoustic camouflage.


I imagine that to look something like this:

https://www.youtube.com/watch?v=vSK3maq8Cyk


No mention of ARX, but it is also a tool that lets you calculate those metrics: https://arx.deidentifier.org


Thank you for the recommendation! I've added it to the list of resources.


There is also a difference between "bible study" and "historical criticism", where the latter is devoid of any specific religious interpretation.

I am a secular person and I find it interesting how ideas evolve over time. I think it was around the year 1000 that Jewish scholars started to wonder why the Old Testament didn't mention the planets, which were a Greek discovery and cultural "meme". People were worried about things like that but couldn't formulate a good answer. Also, most people don't know this, but ancient Judaism was a polytheistic religion, and it only became monotheistic after the return from the Babylonian exile.


Do you think this can be useful for a computational algebra system?


I'm honestly not sure if I can answer that question. There's plenty of applications for hypergraphs (e.g. https://www.sciencedirect.com/science/article/pii/S209580992...), so I'd say why not?


Are you implementing this to solve any particular application?


Not really - yet. I made that as an exercise to learn some Zig and to see how fast/efficient it might be.


Sorry for being a nerd, but your internal organs are going to get UV damage.


UV damage to internal tissues seems unlikely given that the tartrazine dye they used absorbs strongly in the UV region of the spectrum. You can see this in Figure S1 A & B:

https://www.science.org/doi/suppl/10.1126/science.adm6869/su...

Also the abstract of the article notes that strong UV absorption is likely a prerequisite for this effect:

> We hypothesized that strongly absorbing molecules can achieve optical transparency in live biological tissues. By applying the Lorentz oscillator model for the dielectric properties of tissue components and absorbing molecules, we predicted that dye molecules with sharp absorption resonances in the near-ultraviolet spectrum (300 to 400 nm) and blue region of the visible spectrum (400 to 500 nm) are effective in raising the real part of the refractive index of the aqueous medium at longer wavelengths when dissolved in water, which is in agreement with the Kramers-Kronig relations. As a result, water-soluble dyes can effectively reduce the RI contrast between water and lipids, leading to optical transparency of live biological tissues.

https://www.science.org/doi/10.1126/science.adm6869

However, this kind of research into the effects of absorption bands on the transmission properties at interfaces might ultimately bring about more effective sunscreen formulations.


> UV damage to internal tissues seems unlikely given that the tartrazine dye they used absorbs strongly in the UV region of the spectrum

To expand: "the most hazardous UV radiation has wavelengths between 240 nm and 300 nm" [1]. While tartrazine has a lambda max at 425 nm in water [2], it has a second ridiculously convenient peak around 260 nm [3].

TL;DR: It should be mildly UV-protective, ceteris paribus.

[1] https://ehs.umass.edu/sites/default/files/UV%20Fact%20Sheet....

[2] https://pubchem.ncbi.nlm.nih.gov/compound/Tartrazine#section...

[3] https://www.aatbio.com/absorbance-uv-visible-spectrum-graph-...


According to this study, tartrazine can cause kidney and liver damage in rats:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5326541/

Like an X-ray, I'd risk that for a one-off doctor's appointment, but I'd probably not risk it on my body at all times. Maybe there are safer dyes that have the same effect.


> maybe there are safer dyes that have the same effect

Given the effect is optical, perhaps encapsulation in benign, transparent beads? (Could be particularly effective if the goal is a tattoo.)


So that whole "get UV light inside the body to fight COVID" trope could come true? >smile<


His work is absolutely fantastic in both meanings of that word.

