This type of error is better remedied by making the balance property be required (so that it cannot be set to {} and produce an exception at the time the error is introduced). This way you will know about the error early enough. The point is that if an empty value is NOT valid then forbidding it at schema level is the best solution. So required keyword is going to do that for you.
Alternatively, if making the property required is not possible due to some workflow constraints, you could do "SELECT Payment{customer} FILTER NOT EXISTS .balance" to find all payments (and the associated customer), which don't have any balance set. Then once you know what they are you might use "UPDATE" to fix the problem.
Empty sets have fairly well-defined and consistent behavior w.r.t. functions (and operators). You learn it once and it applies in all contexts - specifically that empty sets are just sets like any other.
Hhm, this time your answer doesn't convince me. Setting "required" is not always possible, and using UPDATE to set a value works only as an ex-post solution, i.e. you must already know that there is an error. But this was exactly the point of my hypothetical example: the balance-query returns 0 which is a perfectly legal account balance and there is no reason to suspect an error in the first place.
So, I still think that the definition "sum({}) = 0" has more potential to hide errors than the definition "sum({}) = {}" would have.
1) It's a little unlikely that at the same time you have data where an {} is an error, but you are not making the property required AND with knowledge of that you still don't bother with other validation approaches. The point is that if you're aware that this property is potentially incorrect, you would want to check or restrict it. But yes, if this situation is completely unexpected, then "sum" won't notice any issues.
2) You have to remember that {} can arise from perfectly normal operations, such as filtering. So if you filter by a certain date range and there's no Payments there, then "(SELECT Payment FILTER .timestamp > <datetime>$date).amount" expression becomes {} even if the "amount" property itself is required. So doing a sum over it is simply a question of "What's the total amount in payments since $date?" and if there aren't any payments, the answer is 0, not {}. Plus what you certainly don't want is to get an {} from the "sum" here and do "{} + 100" to add some other charge (perhaps a sum from a different account) and still end up with {}, which now creates an error in a situation where there's nothing wrong with the data.
3) Rather than imbuing {} with special meaning to signal errors, a separate property would be more appropriate. Such as a boolean "valid" flag that gets set after the record checks out or can even be a dynamically computable expression that looks at the record (say, Payments from our example) and does something like "valid := EXISTS .amount". Then you'd filter things by the valid property before feeding them to "sum" like so:
Thanks for your thoughts which are all good and reasonable.
Over the past years I've seen many programmers struggle with the combination of "data aggregation" and NULL. In the beginning I thought that these programmers should just "RTFM", but as this problem occurs so often it might very well be that the practical implementation of aggregation and NULL is "a bit off".
I like your "set" approach a lot, however, we still have "sum({1, 1, {}})" unequal to "1+1+{}" and "sum({}) = 0" which, while consistent, I think are somewhat counterintuitive and I am pretty sure will lead to misunderstanding.
Having said that, I suspect that any decent and consistent approach to this problem is subject to a very reduced form of John Lydgate's famous quote, that is "You can only please some of the programmers some of the time".
Many thanks for your time and this interesting and insightful conversation.
Alternatively, if making the property required is not possible due to some workflow constraints, you could do "SELECT Payment{customer} FILTER NOT EXISTS .balance" to find all payments (and the associated customer), which don't have any balance set. Then once you know what they are you might use "UPDATE" to fix the problem.
Empty sets have fairly well-defined and consistent behavior w.r.t. functions (and operators). You learn it once and it applies in all contexts - specifically that empty sets are just sets like any other.