Wuffs the Language

oconnor663 · on April 8, 2021

> There is no operator precedence. A bare a * b + c is an invalid expression. You must explicitly write either (a * b) + c or a * (b + c).

Honestly I've often wished for this in mainstream languages. It seems like operator precedence should go the way of bracketless if and implicit int casts. (Though I wonder if they wind up making exceptions here for chains of method calls? I guess technically those rely on operator precedence sort of?)

Edit: Yeah I see the example code has "args.src.read_u8?()". So it looks like they figured out how to keep the good stuff.

mekkkkkk · on April 8, 2021

Yes please! I'm always using parenthesis for every compound expression, and I've heard so many times from coworkers or code reviewers smuggly going "you know you can skip that, right?". At the same time I've heard the same people having discussions and scratching their heads about precedence in some attempt to code golf their way through a feature. Not to mention bugs caused by incorrect assumptions. Or pausing to figure out what some previously written expression actually does. Meanwhile, I'll gladly write `X + (Y / Z)`. You can thank me later.

ZaoLahma · on April 9, 2021

Thank you!

In some sane industries where functional safety is required, this is strictly enforced. It leaves no ambiguity - what you write is what you get - and when the excrement hits the quickly revolving pressure modification device, you can just glance at the expressions and tell if they make sense or not.

wuschel · on April 8, 2021

LISP-like languages have enforced operator precedence due to polish notation e.g. (+ (* a b) (+ c d))

aidenn0 · on April 8, 2021

In addition the variadic prefix-notation means the operators are not limited to being binary:

  3*x*y*z+w

becomes:

  (+ (* 3 x y z) w)

User23 · on April 9, 2021

Also, pleasantly the 0-ary invocation is the identity so (+) is 0 and (*) is 1.

ganafagol · on April 9, 2021

In addition to requiring remembering the precedence of + versus * this requires you to remember order of evaluation. Is it (ab)c or a(bc)? And no, with certain types those are not necessarily the same. Floats, for example.

aidenn0 · on April 14, 2021

1. It doesn't require you to remember precedence, since there is no ambiguity

2. It doesn't require you to remember order of evaluation because the order is unspecified (* x y z) is defined to be "The product of x, y, and z" with no requirement on the order of evaluation. If you need a well-defined order of evaluation then you can do that explicitly: (* x (* y z))

spicybright · on April 9, 2021

At least you could write a simple macro to do left to right evaluation with standard operators to get the same effect if you wanted.

hansvm · on April 9, 2021

Does lisp have native bigints, or would something like (+ MAX_INT MAX_INT MIN_INT) still suffer from operator precedence issues?

kazinator · on April 9, 2021

Common Lisp's integers are transparently multi-precision. There is no need to work with a separate type, or to use special syntax for writing bignum tokens in source code.

Bignum support first appeared in the MacLisp dialect in 1970 or 1970, one of the main predecessor dialects of Common Lisp.

According to Gabriel and Steele's Evolution of Lisp paper, "bignums—arbitrary precision integer arithmetic—were added [to MacLisp] in 1970 or 1971 to meet the needs of Macsyma users".

aidenn0 · on April 9, 2021

I dont' know about other lisps, but both common-lisp and scheme have native bignums and fixnums are promoted to bignums automatically.

sqrt17 · on April 8, 2021

there's no operator precedence if you don't have (multiple) operators that could precede each other. In LISP-like languages these are simply functions (or more correctly, forms) which have other expressions as arguments, like any other functions or forms. LISP works just fine without much of the things we take for granted in ALGOL-like languages.

stkdump · on April 9, 2021

Polish notation enforces binary operators. LISP doesn't, so you have to have the parentheses. (+ a b c) is + a + b c or + + a b c in polish notation. These are the same, of course until thet are not, such as with floating point arithmetic or in case you trap on integer overflows.

kazinator · on April 9, 2021

That is a completely wrong-headed view. There is no precedence there because there is no ambiguity. The parentheses in your example are the function call parentheses, not the optional grouping parentheses. They are mandatory.

There are some issues of associativity in the semantics of some Lisp functions. For instance we know that that the syntax (+ 1.0 2 3.0 4) is a list object containing certain items in a known order.

But how are they added? This could depend on dialect. I think in Common Lisp, the result has to be as if they were added left to right. When inexact numbers are present, it matters.

This isn't a matter of syntax; it's a semantic matter, which depends on the operator.

For instance in (- 1 2 3 4), the 2 3 4 are treated as subtrahends which are subtracted from the 1. But (- 2) means subtract 2 from 0.

In TXR Lisp, I allowed the expt operator to be n-ary: you can write (expt 2 3 4). But this actually means (expt 2 (expt 3 4)): it is a right to left reduction! It makes more sense that way, because it corresponds to:

     4
    3
   2

The left-to-right interpretation would be

     3 + 4
   2

which is less useful: you can code that yourself using (expt 2 (+ 3 4)), for any number of additional terms after the 4.

sleepydog · on April 8, 2021

APL (and, its derivatives, I think) evaluate strictly right to left, so

a * b + c

is a * (b + c). It might be jarring at first but I really came to enjoy the consistency, I never had to remember operator precedence, which helps in a language like APL where most functions are infix.

tragomaskhalos · on April 8, 2021

Conversely, Smalltalk is left-to-right, so

a + b * c is (a + b) * c

which is simply a result of every operation being a message send - muddling the rules with precedence would be likewise confusing, and would ruin the simplicity of the grammar.

guram11 · on April 9, 2021

and in forth

a b + c *

I believe language with "default precedence" was meant to help us write less (parenthesis) but in the long run we ended up abusing (it)

brundolf · on April 8, 2021

I tend to think it's fine for the very most common and obvious operators (MDAS, etc), but as soon as you get outside of those I agree. In particular I've been bitten by the precedence of JavaScript's ?? operator:

  function foo(a) {
    return a ?? 10 + " is the num";  // a ?? (10 + " is the num")
  }

  foo(12) // 12

nestorD · on April 8, 2021

Me too! So far I have seen four actual bugs in large numerical code bases that were caused by overlooking operator precedence. I expect to see more in the years to come.

I think that precedence of '*' over '+' is acceptable (as everyone knows it instinctively) but I would love a way to require parenthesis for everything else.

avmich · on April 9, 2021

APL wasn't about to bet that * is obviously over + in precedence. There everything is of equal precedence, right-to left. Until you put parenthesis. And that's for verbs (monadic and dyadic operators over nouns), not for other parts...

niccl · on April 9, 2021

Everyone doesn't know it instinctively, although I used to think so.

A while ago I taught an introductory spreadsheet class for adults. I got them to try "=2*3+4" and about half the class were surprised that the result wasn't 20. It's a lesson that has stayed in my mind.

mekkkkkk · on April 9, 2021

I don't know if it was a joke (good one if so), but you seem to have mixed the operators up in your example.

lifthrasiir · on April 9, 2021

I don't think ditching the "basic" operator precedence (MDAS etc.) is a good idea, but I strongly agree that operator precedence should be a partial order, not a total order. See also [1].

[1] https://foonathan.net/2017/07/operator-precedence/

puzzlingcaptcha · on April 8, 2021

Have there been attempts at creating languages that use a postfix (RPN) notation?

benhoyt · on April 8, 2021

Forth is (I think?) the oldest and most well-known. Postscript, the printer control language, is possibly more widely-deployed. And Factor is a modern take on Forth.

huachimingo · on April 8, 2021

Dont forget to add UNIX's dc (in old times bc was a wrapper for dc)

pdpi · on April 9, 2021

There's also Bitcoin Script, which is a forth-like language.

avhon1 · on April 9, 2021

to add to the list: HPL, the programming language on Hewlett-Packard's RPN calculators.

robobro · on April 8, 2021

like forth?

sedatk · on April 9, 2021

That's just brilliant, and now I think about it, I wonder why no other newer languages have adopted this. I wish this to be the new norm of 2020's.

rurban · on April 9, 2021

Only the good use it like that. Pony eg enforces it, and it was way before wuffs. Rust on the other hand lives with precedence rules which you have to remember by hard.

smt1 · on April 9, 2021

That isn't a bad idea, but keep in mind, there is always a usability aspect (I'll just call it the programmer computer interface problem) of "what makes a programming language popular". For example, consider PL/I: https://en.wikipedia.org/wiki/PL/I#Implementation_issues

When people see for example, * or + (or a, b, c). They may have some preassumptions about some implied associativity from arithmetic (depending on what they are taught and what level of math they are at), that may be hard to break. If you have learned some college (abstract) algebra, it may mean something quite different. How about the = sign? Of course, a, b, c may be meaningless to someone who is not a native latin-1 speaker either. My point I guess is that these are just matters of convention, there is just some implied commutativity or associativity usually implied, but this is all arbitrary.

Now, one intereting "quirk" with with PL/I was that certain things looked similar "to what people were used to" (relative to say other PL/I code, or FORTRAN or COBAL), but worked differently even in some small spatial area on a screen (two blocks of nearby code in some editor). For example, if the programmer's eye saw a block of code, reflexively, depending on their experience they may be able to predict what the result of the computation could do. PL/I was an interesting experiment because of the lack of reserved keywords. This made it very expressive but very hard to understand code in context. For example, in pseudo PL/I: foo = 1; = = 2; bar 2 + foo. You are basically changing the grammatical syntax of the language in 3 lines.

But on the other hand, everything is just a symbol and this may not be completely unusual. Consider the diversity of the world's languages and how they are written and how meaning is derived. Natural language grammars may connotate very different representations and transformations, but people learn because they see enough examples. Consider for the differences between Han, Brahmic scripts, Arabic BiDi, various African scripts, Cuneiform, Emoji, whatever. Perhaps all computer languages are "overfit" due to for example, Chomsky's ideas and BNF (keep in mind Chomsky's ideas about morphology were quite different).

Now, let's consider mathematical notation. Depending on how much pure math (or say, mathematical physics or other sciences) you consider, there may be more and more semantic overhead with the conventions of mathematical notation, and people often historically just "cartesianize" and "euclidized" things for convenience because of lack of tooling (think of a sheet of paper metaphor, we've simply moved it over to a computer. it's a skewmorph). Clearly we have better computer graphics, so why haven't developer tools and languages changed along with it? Maybe with more immersive manipulation they will.

dang · on April 8, 2021

Surprisingly little discussed so far, aside from these past related threads:

Wuffs’ PNG image decoder - https://news.ycombinator.com/item?id=26714831 - April 2021 (135 comments)

C performance mystery: delete unused string constant - https://news.ycombinator.com/item?id=23633583 - June 2020 (105 comments)

That first one was just yesterday but this is a rare case where we would not downweight the follow-up post (https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...).

zokier · on April 8, 2021

Apparently the language was renamed at some point, google/puffs redirects to google/wuffs?

Puffs was discussed few years back https://news.ycombinator.com/item?id=15711767

pjmlp · on April 8, 2021

Thankfully they changed the name, in Germany it would be quite impossible to use it in any public discussion, as Der Puff is a special kind of boys club.

avmich · on April 9, 2021

Yeah, Coq also feels the heat with name...

teraflop · on April 8, 2021

Yeah, I was curious about that as well. The README file links to a Google Groups discussion about the name change that seems to have been memory-holed, but apparently it was renamed to avoid confusion with a NetBSD component: https://news.ycombinator.com/item?id=15712659

pnathan · on April 8, 2021

This is a fascinating spin: a pure language, designed for libraries, not for complete programs. A tip of the hat to whoever was able to break out of the "a language has to do x y and z" thinking and perceive that this is a possibility.

dkersten · on April 8, 2021

Interestingly, I was thinking about this exact thing a few hours ago, before I saw this on HN. The thought was that you can design much more interesting languages by not making them be everything for everyone. Not quite domain specific, but also not quite general purpose.

I made a language a while back that was used to implement custom logic for a product (I've since replaced it with a more declarative system that's basically TOML but where values can be expressions that get evaluated to generate the actual values). One goal of this language was that it should always terminate[1], so it had no unbounded loops. Another goal was that it should be deterministic, so all input was gathered before execution and all output was accumulated to be processed at the end. The entire thing ran in a database transaction (so input could be queried, then the code was executed as a pure function of this input, then the result would be written back to the database or sent elsewhere). Externally triggered events would cause this to run. Essentially an event driven synchronous[2] language of the transformational system variety. It was slightly inspired by Lustre[3], which is used in critical systems like aeroplanes, trains and power plants. I'm a big fan of this style of language.

Basically, by constraining what the language can do or can be used for, you can design much more powerful semantics or language features for the things that it is designed for, similar to a domain specific language. I guess it really would be a somewhat more general domain specific language, or at least domain specific to multiple domains.

I was thinking about this while walking home from the shops and was wondering if such a language would be beneficial to solve some challenges I hit in my work and I was going to spend some time thinking what semantics would be useful, but haven't done so yet and then came across this HN submission. :)

[1] I was also thinking about how the halting problem doesn't really say that determining if a program halts is impossible, just that there are programs that are not computable. If you add constraints (like not being able to feed the program to itself as is done in the halting problem and not allowing unbounded loops) then it is possible to determine if a program will terminate or not.

[2] https://en.wikipedia.org/wiki/Synchronous_programming_langua...

[3] https://en.wikipedia.org/wiki/Lustre_(programming_language)

avmich · on April 9, 2021

> If you add constraints (like not being able to feed the program to itself as is done in the halting problem and not allowing unbounded loops) then it is possible to determine if a program will terminate or not.

Dhall is a good example - https://github.com/dhall-lang/dhall-haskell .

dkersten · on April 9, 2021

I'm not familiar with Dhall, but that looks pretty neat! Thanks for sharing.

kovac · on April 8, 2021

I'm not sure if languages are designed to build libraries or apps specifically. The languages I use are designed to communicate with a computer. It's the frameworks that dictate how we package a set of instructions.

So a "pure language" here is just a bs marketing term rather than any inherent feature in the language. As far as purity goes, a language like c does that just fine. That's as pure a language as it can get.

This whole opinionated "can never allocate memory" is condescending to engineers. A powerful language should be safe by default (to take some of the pressure off the developers of having to be always careful) but have the knobs to let them take full control when needed. C# does this very well.

handsaway · on April 8, 2021

Arguably Elm takes this approach to SPA development. The language spec itself is general purpose but in practice it's been developed to serve a narrow purpose (too narrow for some!)

tomthe · on April 8, 2021

Yes, an interesting concept. But I guess it won't see a wide adoption: Who wants to learn a new language, if you can't use it for anything but X? And who will use this language for X, if you haven't been able to learn it while doing Y?

dkersten · on April 8, 2021

Who would want to learn Javascript for the frontend and Python/Ruby/C#/Java/PHP/Whatever for the backend? And HTML for UI with CSS for styling?

You say this, but people do it all the time.

enneff · on April 8, 2021

People who want to do X well will learn the language. You can learn more than one language.

brundolf · on April 8, 2021

I disagree. It's just a question of how many people are trying to do what it's designed for, and how much benefit it provides for that focused domain.

peteretep · on April 9, 2021

> Traditionally, the first program anyone writes in a given programming language is something that prints "Hello world". This doesn't work for Wuffs, for two reasons. One is that Wuffs doesn't have a string type per se. Two is that Wuffs code doesn't even have the capability to write to files directly, such as to stdout. Wuffs is a language for writing libraries, not complete programs, and the less Wuffs can do, the less Wuffs can do that is surprising (such as upload your files to the internet), even when processing untrusted input.

grawprog · on April 8, 2021

To be honest, I'm not sure what to make of this. Wuff the library makes sense as a drop in for the C standard library, but the language, I'm not sure how it fits.

It seems to offer some of the features offered by languages like D and Rust, while staying more C like, but also removing one of the few actual reasons to use C, which both D and Rust also provide on top of the other features offered by Wuff.

It's cool and all but it seems confused as to whether it wants to be a library for C, an extension to C or a standalone language. As a stand alone language, I'm not sure I really see the benefits over alternatives as a C library, it does have some interesting ideas.

zokier · on April 8, 2021

The readme has more explanation:

> Wuffs (Wrangling Untrusted File Formats Safely) is formerly known as Puffs (Parsing Untrusted File Formats Safely).

> Wuffs is a memory-safe programming language (and a standard library written in that language) for wrangling untrusted file formats safely. Wrangling includes parsing, decoding and encoding. Example file formats include images, audio, video, fonts and compressed archives.

> Wuffs is not a general purpose programming language. It is for writing libraries, not programs. The idea isn't to write your whole program in Wuffs, only the parts that are both performance-conscious and security-conscious. For example, while technically possible, it is unlikely that a Wuffs compiler would be worth writing entirely in Wuffs.

chrchang523 · on April 8, 2021

The purpose of the language is clear to me: make it practical to prove that >90% of a C file-munging library is safe from several common types of errors. I will be looking into reorganizing some of my existing C library code of this type into large Wuffs components and small interaction-with-outside-world components.

mjevans · on April 8, 2021

The only part of the Wuffs spec I just read that I dislike:

Strings. I would really prefer strings to work like existing C and 'bash' style quoting. At least the simple aspects of it, the parts of the rules that are easy to remember and simple. A string should always be a sequence of octets, but easily coerced by a casting operator to a numeric format from any index. I'm not sure what the syntax for that would be offhand.

incrudible · on April 9, 2021

> I would really prefer strings to work like existing C...

The way "strings" work in C is a big source of the kind of bugs that Wuffs is supposed to prevent.

> A string should always be a sequence of octets, but easily coerced by a casting operator to a numeric format from any index

The Wuffs language just uses "a sequence of octets" in favor of a string type.

rattray · on April 8, 2021

Yeah, when I got to the bottom and saw that they don't have strings, I immediately decided not to spend any more time learning about the language.

I'm not sure what I would want them to be like for this language, but I'd definitely want something.

incrudible · on April 9, 2021

It isn't made especially clear on the linked page, but Wuffs is a language to write parsers with. It's not a "general purpose" language, though it might find use in other domains.

If you've written a parser, you will have noticed that the built-in string types of other languages are counter-productive. You really want to work with plain ranges over bytes, and Wuffs offers that to you.

incrudible · on April 9, 2021

I like the idea. The inability to do something is an often underrated feature.

ledauphin · on April 9, 2021

I'm confused by the "all functions are methods" restriction. the "what" seems clear, but the "why" is eluding me, and I'd love to read an explanation.

mkj · on April 9, 2021

How does error handling work in Wuffs? That seems to be an important aspect for a reliable language, it wasn't immediately clear from the docs.

Edit ah found it, https://github.com/google/wuffs/blob/main/doc/note/statuses....

peter_d_sherman · on April 9, 2021

>"Wuffs (Wrangling Untrusted File Formats Safely) is formerly known as Puffs (Parsing Untrusted File Formats Safely). Wuffs is a memory-safe programming language (and a standard library written in that language) for wrangling untrusted file formats safely. Wrangling includes parsing, decoding and encoding. Example file formats include images, audio, video, fonts and compressed archives.

It is also fast. On many of its GIF decoding benchmarks, Wuffs measures 2x faster than "giflib" (C), 3x faster than "image/gif" (Go) and 7x faster than "gif" (Rust).

Goals and Non-Goals

Wuffs' goal is to produce software libraries that are as safe as Go or Rust, roughly speaking, but as fast as C, and that can be used anywhere C libraries are used. This includes very large C/C++ projects, such as popular web browsers and operating systems (using that term to include desktop and mobile user interfaces, not just the kernel).

Wuffs the Library is available as transpiled C code. Other C/C++ projects can use that library without requiring the Wuffs the Language toolchain. Those projects can use Wuffs the Library like using any other third party C library. It's just not hand-written C.

However, unlike hand-written C,

Wuffs the Language is safe with respect to buffer overflows, integer arithmetic overflows and null pointer dereferences.

A key difference between Wuffs and other memory-safe languages is that all such checks are done at compile time, not at run time. If it compiles, it is safe, with respect to those three bug classes.

The trade-off in aiming for both safety and speed is that Wuffs programs take longer for a programmer to write, as they have to explicitly annotate their programs with proofs of safety. A statement like x += 1 unsurprisingly means to increment the variable x by 1. However, in Wuffs, such a statement is a compile time error unless the compiler can also prove that x is not the maximal value of x's type (e.g. x is not 255 if x is a base.u8), as the increment would otherwise overflow. Similarly, an integer arithmetic expression like x / y is a compile time error unless the compiler can also prove that y is not zero.

Wuffs is not a general purpose programming language. It is for writing libraries, not programs. The idea isn't to write your whole program in Wuffs, only the parts that are both performance-conscious and security-conscious. For example, while technically possible, it is unlikely that a Wuffs compiler would be worth writing entirely in Wuffs."

PDS: Would like to see a future AV1 / AOM / libaom / FFmpeg -- written/compiled in Wuffs...

chubot · on April 8, 2021

Wuffs seems fascinating and I really wanted to like it. But when I look at the code for the JSON decoder it seems so low level, and full of places for bugs to hide. JSON is a pretty simple spec and this obscures it (although to be fair it's also handling UTF-8).

https://github.com/google/wuffs/blob/main/std/json/decode_js...

Yes it prevents buffer overflows and integer overflow, but it can't prevent logical errors.

I'd rather see efficient code generated from a short high level spec, not an overwhelming amount of detail in a language verified along a few dimensions.

---

Logical errors in parsing also lead to security vulnerabilities. For example, here is an example of parser differentials in HTTP parsing:

https://about.gitlab.com/blog/2020/03/30/how-to-exploit-pars...

The canonical example of this class of bug is forging SSL certificates to take advantage of buggy parsers, but I don't have a link handy. There should be one off of https://langsec.org/ if anyone can help dig it up.

Again, this has nothing to do with buffer or integer overflows.

(aside: while googling for that I found the claim that mRNA vaccines work by parser differentials: https://twitter.com/maradydd/status/1342891437537505280?lang... If anyone understands that I'd be curious on an opinion/analysis :) )

At the very least, any language for parsing should include support for regular languages (regexes). The RFCs for many network protocols use this metalanguage, and there's no reason it shouldn't be executable. They compile easily to efficient code.

The VPRI project claimed to generate a TCP/IP implementation from 200 lines of code, although it's not really a fair comparison because it hasn't been tested in the wild: https://news.ycombinator.com/item?id=846028 .

Still I think that style has better engineering properties. Oil's lexer, which understands essentially all of bash, is generated from a short source file

https://www.oilshell.org/release/0.8.8/source-code.wwz/front...

which generates

https://www.oilshell.org/release/0.8.8/source-code.wwz/_devb...

which goes on to generate 28,000 lines of C code. It's short, but it really needs a better regex metalanguage to be readable: https://www.oilshell.org/release/latest/doc/eggex.html

A large part of JSON can be described by regular languages, and same with HTTP, etc.

-----

edit: An re2c target for wuffs could make sense. The generated code already doesn't allocate any memory, although it uses tons of pointers which could be dangling.

And in fact that was a problem Cloudflare, which sprayed the user data of their customers all over the Internet back in 2017: https://en.wikipedia.org/wiki/Cloudbleed

That was with Ragel and not re2c, which perhaps has a more error prone API.

summerlight · on April 8, 2021

> I'd rather see efficient code generated from a short high level spec

This is a holy grail for many PL researchers, but I don't think that there's any languages that reached this level of sophistication with expressiveness/practicality enough for production usages. At least with the status quo, you will probably need to write a massive amount of formal proofs if you want logical correctness, even with deceivingly simple specifications.

chubot · on April 8, 2021

It doesn't have to be the same metalanguage for every program. You can write your own code generators adapted to the specific problems. They should have a "pit of success", and the knowledge of the domain is used to ensure that.

There are few research-level issues here; it's just good engineering.

The 0% or 100% mindset is bad engineering. You want something that's short, and that you can explain to other people, and that other people can write an independent implementation of. If the proof is 10x longer than normal code, and it's written in a metalanguage that the relevant people don't know, then it's not very useful.

Proofs are not guarantees. CompCERT has had logic bugs despite being written in a formal language. (It does reduce the number of bugs drastically in general, but it's also an extremely expensive technique, and not what I'm advocating.)

There are no guarantees in engineering, just good practices. Groveling through bytes one at a time in imperative languages is not an ideal engineering practice, even if the imperative language comes with more guarantees than most.

dmitriid · on April 8, 2021

> JSON is a pretty simple spec a

Yet there are no proper implementations because it's too simple, sometimes ambiguous, and there are several standards. JSON parsing is a minefield, http://seriot.ch/parsing_json.php

chubot · on April 9, 2021

JSON is simple compared to any textual format and many binary formats. Compare it with HTTP 1.1, 2, 3, or say Apache Arrow.

JSON is a compromise, and specifies only syntax, without semantics, and without an API. But I think that was a good tradeoff compared to say XML syntax, schemas, and APIs like SAX and DOM. You can build a lot of stuff on top, and people have.

kstenerud · on April 9, 2021

Unfortunately, the lack of specification is becoming a security liability in modern times. That's a big reason why I'm developing https://concise-encoding.org/

- 1:1 compatible binary and text formats. Edit in text, send in binary.

- The binary encoding is super simple, and what's in use 99% of the time since it's usually machines talking to each other. Text is only for humans to see the data and to input the data.

- The format is more complex than JSON, but FAR better specced to avoid variance in implementations (important for security).

lifthrasiir · on April 9, 2021

Tony Garnock-Jones (known for AMQP and RabbitMQ) is also developing Preserves [1] which I found promising.

[1] https://preserves.gitlab.io/preserves/

umanwizard · on April 9, 2021

Avro for example is extremely complex compared to json.

fulafel · on April 9, 2021

Bencode is nice and simple arguably text based format, though needs a bit of editor support to be really human friendly.

Ericson2314 · on April 9, 2021

No, binary formats are often simpler. Really anything that involves text is hugely complicated.

chubot · on April 9, 2021

Not sure how this relates to what I said, maybe you misread it.

refulgentis · on April 9, 2021

The first sentence: "JSON is simple compared to any textual format and many binary formats. "

SonOfLilit · on April 9, 2021

The Pfizer vaccine indeed contains an important parser differential exploit to evade a security system. Ctrl+F "1-methyl-3’-pseudouridylyl" in this excellent post:

https://berthub.eu/articles/posts/reverse-engineering-source...

lancepioch · on April 8, 2021

So a similar sense to Haxe: https://haxe.org

chubot · on April 9, 2021

I think they're pretty different. Haxe is designed to compile to many different languages, sort of like a least common denominator.

Whereas Wuffs translates only C, and is pretty semantically close to it. Its goal is really safety, while Haxe seems to be a portability (e.g. running the same game on many platforms).