This type of error is better remedied by making the balance property be *require...

ldb · on May 12, 2019

Hhm, this time your answer doesn't convince me. Setting "required" is not always possible, and using UPDATE to set a value works only as an ex-post solution, i.e. you must already know that there is an error. But this was exactly the point of my hypothetical example: the balance-query returns 0 which is a perfectly legal account balance and there is no reason to suspect an error in the first place.

So, I still think that the definition "sum({}) = 0" has more potential to hide errors than the definition "sum({}) = {}" would have.

vpetrovykh · on May 13, 2019

A few things come to mind:

1) It's a little unlikely that at the same time you have data where an {} is an error, but you are not making the property required AND with knowledge of that you still don't bother with other validation approaches. The point is that if you're aware that this property is potentially incorrect, you would want to check or restrict it. But yes, if this situation is completely unexpected, then "sum" won't notice any issues.

2) You have to remember that {} can arise from perfectly normal operations, such as filtering. So if you filter by a certain date range and there's no Payments there, then "(SELECT Payment FILTER .timestamp > <datetime>$date).amount" expression becomes {} even if the "amount" property itself is required. So doing a sum over it is simply a question of "What's the total amount in payments since $date?" and if there aren't any payments, the answer is 0, not {}. Plus what you certainly don't want is to get an {} from the "sum" here and do "{} + 100" to add some other charge (perhaps a sum from a different account) and still end up with {}, which now creates an error in a situation where there's nothing wrong with the data.

3) Rather than imbuing {} with special meaning to signal errors, a separate property would be more appropriate. Such as a boolean "valid" flag that gets set after the record checks out or can even be a dynamically computable expression that looks at the record (say, Payments from our example) and does something like "valid := EXISTS .amount". Then you'd filter things by the valid property before feeding them to "sum" like so:

  SELECT sum( (SELECT Payment FILTER .valid).amount );

ldb · on May 14, 2019

Thanks for your thoughts which are all good and reasonable.

Over the past years I've seen many programmers struggle with the combination of "data aggregation" and NULL. In the beginning I thought that these programmers should just "RTFM", but as this problem occurs so often it might very well be that the practical implementation of aggregation and NULL is "a bit off".

I like your "set" approach a lot, however, we still have "sum({1, 1, {}})" unequal to "1+1+{}" and "sum({}) = 0" which, while consistent, I think are somewhat counterintuitive and I am pretty sure will lead to misunderstanding.

Having said that, I suspect that any decent and consistent approach to this problem is subject to a very reduced form of John Lydgate's famous quote, that is "You can only please some of the programmers some of the time".

Many thanks for your time and this interesting and insightful conversation.