Hacker News new | past | comments | ask | show | jobs | submit login
Dynamic Typing in SQL (rockset.com)
97 points by canadi on Nov 2, 2018 | hide | past | favorite | 24 comments



SQLite is also dynamically typed, https://www.sqlite.org/datatype3.html


hi, this is Venkat from Rockset.

yes, that is correct. We are big fans of SQLite too.


Just learned about Rockset today. Seems interesting but that pricing is way off considering that a columnar data warehouse like Snowflake has great JSON support and can handle way more data for far less. For 10M rows, even a single-node Postgres or MongoDB install will run just fine.

It's not an OLTP system, so is the primary use-case for Rockset to be an OLAP system with low response times?


Hey, Igor from Rockset here.

Rockset’s primary use-cases are: 1/ developers building low-latency operational applications, esp. combining real-time data sets with other structured data sets (eg: you are building a microservice to relieve pressure from your OLTP system) 2/ data scientists wanting to quickly test hypotheses on different structured and semi-structured datasets without having to stand-up any servers or do any ETL or data prep. (you can suspend collections/documents in Rockset when you don’t use them -- our pricing page currently only lists Active Documents’ pricing)

Rockset is mutable which allows it to keep itself in sync with any data source, unlike columnar data warehouses, which are not optimized for data manipulation.

Rockset’s strong dynamic typing allows it to treat JSON as a data representation format rather than a special data type or a storage format. So, once you load JSON data into Rockset, you can access all fields at all levels without any special JSON operators or functions.

Comparing Snowflake with Rockset is perhaps akin to comparing Teradata with Elasticsearch. Both useful systems but built for very different use cases.

The biggest thing Rockset has in common with Snowflake is in sharing the philosophy that data management systems have to be built ground up for the cloud to take full advantage of cloud economics. Our blog (https://rockset.com/blog/) has a few posts on these already and we will write more.


The SQL with dot notation is nice, and something other databases have failed at doing with JSON. As for updates, it looks like _id is set automatically, and there's no UPDATE endpoint listed in the docs, so how is data mutated (by key)?

And my comment was more about the price, because at low volume of 10M records, any database system can already do fast queries. From the whitepaper, it looks like the indexing is intensive and that's where most of the cost is coming from, and explains the active/passive storage and tiered rocksdb-cloud setup.

Interesting use-case for prototyping but I don't see how it is cost-effective for higher-scale usages. Congrats on the launch though.


add_docs() API always UPSERTS and so yes, updates are through "_id". The system auto-assigns an "_id" only when it is not supplied by the user or when an existing field is not mapped as the "_id" field at collection creation time. You will have to use delete_docs() before add_docs(), if you want replace-document behavior.

Our backend architecture is quite scalable and actually grows and shrinks with the demand continuously.

And yes, all documents are automatically indexed and replicated for fast query performance, which is more expensive than just storing them in "_id"->"doc" format. For our use cases and value prop, this one time indexing cost pays for itself several times over by saving time during query processing.


> Strong, because values have one specific type (or None)

Because its type system does not provide implicit type conversion I guess. But some languages have a stronger type system with, in addition, highly qualified types (e.g. constness, range, sign or any custom contract).

> Dynamic, because variables acquire type information at runtime, and the same variable can, at different points in time

This is not a characteristic of dynamic typing, some statically typed languages supports variable shadowing in the same scope. A dynamic (or hybrid) typing system is IMHO simply a system which provides a variant type as first class citizen.


> But some languages have a stronger type system with, in addition, highly qualified types (e.g. constness, range, sign or any custom contract).

Sure. Those are overly simplistic categories, but a full discussion of type systems in programming languages would be out of scope for that post.

> some statically typed languages supports variable shadowing in the same scope

Shadowing is a different issue. Shadowing refers to two different variables (which happens to have the same name), so it's orthogonal to whether the same variable is allowed to hold values of different types at different times.


This is not a characteristic of dynamic typing, some statically typed languages supports variable shadowing in the same scope. A dynamic (or hybrid) typing system is IMHO simply a system which provides a variant type as first class citizen.

You're welcome to your humble opinion.

Your opinion is not generally accepted, though. The most common definitions of static versus dynamic typing boil down to whether both names and values have types and those types must be compatible (static) or whether only values have types and no checking for compatibility with names is required (dynamic). That is:

    a = 3
    a = 'foo'
The above probably fails in a statically-typed language, because 'a' will be a name of a type incompatible with the value 'foo'. In a dynamically-typed language it's perfectly OK.


> The above probably fails in a statically-typed language

I am not sure that this is always correct. I am pretty sure that your example would be accepted in some static type systems with union types such as https://www.cl.cam.ac.uk/~sd601/papers/mlsub-preprint.pdf


You know, I considered the possibility someone would "well, actually..." me there, and I thought about putting in a pedantic "unless the language you are using supports a type which can be either an integer or a string, and you have previously indicated that the name in question is of such a type, and the values you are attempting to assign are legal for that previously-indicated type".

But then I decided I wanted to live in a world where people don't need to write a Ph.D. thesis with a hundred pages of footnotes covering every conceivable logically-possible eventuality just to make a simple point.


> and you have previously indicated that the name in question is of such a type

This is not needed under the inference system proposed in that paper.

> But then I decided I wanted to live in a world where people don't need to write a Ph.D. thesis with a hundred pages of footnotes covering every conceivable logically-possible eventuality just to make a simple point.

Have you considered finding better examples instead? I think that a better explanation on the differences between static and dynamic typing would be to simply say that in dynamic typing type checking happens during the runtime while in static typing type checking happens during the compile time.


Have you considered finding better examples instead?

Have you considered being a better person? Rushing to "well actually" someone else over even the tiniest omission in something they say is not a positive character trait. What you should do is immediately and permanently and irrevocably stop doing it.


> Rushing to "well actually" someone else over even the tiniest omission

Except that I don't believe that it was a tiny omission - especially when you are trying to correct someone.

> is not a positive character trait

Neither is saying "You're welcome to your humble opinion" and continuing with an incorrect example to be honest.

> What you should do is immediately and permanently and irrevocably stop doing it

No can do. I have multiple times fallen "victim" to incorrect examples or claims online - and have multiple times also been "saved" by people posting corrections (even minor ones). If this is something that I can prevent from happening to others, is the ethical thing not to post a correction?

In any case, I did not mean to insult you nor was my post meant to be an aggression to you. I would suggest for the future to take corrections with a more open mind - not everyone is after you.


There are also static type systems that support dynamic inheritance (eg the one designed for Cecil) where the type of a value can change over time.


SQL types are actually default nullable, which puts them to the "not typed enough" category for my taste.


Then you really won't like it that we're moving in the opposite direction :)

I personally like strong, static type systems. But a lot of real world data doesn't match static schemas, which is why businesses hire data engineers to (build systems to) clean it up.


Oh, the real world is strongly typed. It's just the that the type is "json.Value" :(

I of course recognize that a database is not very useful if you can't get your data in there.


It would be helpful to see a side by side comparison with Postgres JSON schemas/queries.


Even supposing this is an easier way to do without constraints, the effort to weaken static guarantees should be looked at askance. The great merit of Postgres' JSON types is that they <i>can</i> be used without losing strong data integrity guarantees.


Yes. In our case, though, Rockset is not a transactional database, but a query engine on top of existing data, so we're not the right place to enforce constraints anyway.


That's on my TODO list, both in terms of performance and in terms of features / ease-of-use. (Another interesting thing to compare against is SQL++ (https://ci.apache.org/projects/asterixdb/sqlpp/manual.html) which has a couple of usability affordances that we might adopt.)


And a comparison to CockroachDB which also has excellent JSON support along with all the benefits of SQL. See: https://www.cockroachlabs.com/blog/json-coming-to-cockroach/


I think the main difference between our flavor of SQL and CockroachDB's (or PostgreSQL's) is that we don't have a specific "json" data type, but rather:

1. We don't need you to specify the types of any fields, indeed, we don't even need to know upfront which top-level fields ("columns") may exist in the document.

2. You don't need to use special operators / functions to deal with JSON-typed values.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: