> I don’t know yet if the implementations of this yet are good enough to use at scale. Maybe they’re slow or maybe the bugs aren’t ironed out yet.
This technique is very well supported in the big commercial engines. In MSSQL's Indexed View case, the views are synchronously updated when the underlying tables are modified. This has implications at insert/update/delete time, so if you are going to be doing a lot of these you might want to do it on a read replica to avoid impact to production writes.
What a great post. Humble and honest and simple and focused on an issue most developers think is so simple (“why not just vibe code SQL?”, “whatever, just scale up the RDS instance”).
Compliments aside, where this article stops is where things get exciting. Postgres shines here, as does Vitess, Cassandra, ScyllaDB, even MongoDB has materialized views now. Vitess and Scylla are so good, it’s a shame they’re not more popular among smaller startups!
What I haven’t seen yet is a really good library for managing materialized views.
Curious if anyone know any implementation where they would be automatically updated?
Now that would be awesome!
EDIT: come to think of it, it would require going through CDC stream, and figuring out if any of the tables affected are a dependency of given materialized view. Maybe with some ast parsing as well to handle tenants/partitions. Sounds like it can work?
> (Technically speaking, if 100 people load the same page at the same time and the cache isn’t populated yet, then we’ll end up sending 100 queries to the database which isn’t amazing, but let’s just pretend we didn’t hear that.)
Isn't their tech to address that, like golang's "singleflight"?
> I don’t know yet if the implementations of this yet are good enough to use at scale. Maybe they’re slow or maybe the bugs aren’t ironed out yet.
This technique is very well supported in the big commercial engines. In MSSQL's Indexed View case, the views are synchronously updated when the underlying tables are modified. This has implications at insert/update/delete time, so if you are going to be doing a lot of these you might want to do it on a read replica to avoid impact to production writes.
https://learn.microsoft.com/en-us/sql/relational-databases/v...
https://learn.microsoft.com/en-us/sql/t-sql/statements/creat...
What a great post. Humble and honest and simple and focused on an issue most developers think is so simple (“why not just vibe code SQL?”, “whatever, just scale up the RDS instance”).
Compliments aside, where this article stops is where things get exciting. Postgres shines here, as does Vitess, Cassandra, ScyllaDB, even MongoDB has materialized views now. Vitess and Scylla are so good, it’s a shame they’re not more popular among smaller startups!
What I haven’t seen yet is a really good library for managing materialized views.
> And then by magic the results of this query will just always exist and be up-to-date.
With PostgreSQL the materialized view won't be automatically updated though, you need to do `REFRESH MATERIALIZED VIEW` manually.
Just landed here to write this. Materialized Views are _very_ implementation specific and are definitely _not_ magic.
It's important to understand how your implementation works before committing to it.
Curious if anyone know any implementation where they would be automatically updated?
Now that would be awesome!
EDIT: come to think of it, it would require going through CDC stream, and figuring out if any of the tables affected are a dependency of given materialized view. Maybe with some ast parsing as well to handle tenants/partitions. Sounds like it can work?
MSSQL and Sybase SQLAnywhere has options for that, we use it a fair bit with both.
At least on SQLAnywhere it seems to be implemented using triggers, ie not unlike what one would do if rolling your own.
Postgres materialized views are pretty terrible / useless compared to other rdbms. I’ve never found a usecase for the very limited pg version.
having a dataset refresh on a timer and cache the result for future queries is pretty useful
> (Technically speaking, if 100 people load the same page at the same time and the cache isn’t populated yet, then we’ll end up sending 100 queries to the database which isn’t amazing, but let’s just pretend we didn’t hear that.)
Isn't their tech to address that, like golang's "singleflight"?