Timescale Interview English | embedded data GmbH

You have built up an impressive community very quickly, and many of your specialists work remotely from all over the world. How did TimescaleDB come about in the first place, and how did you manage to grow globally so quickly? Was this your goal from the beginning?

When we started Timescale, we didn’t set out to build the leading relational database for time-series data. We were initially building a platform for IoT data: we saw a future where all of our devices are connected, especially industrial ones, generating high volumes of data, and developers needed a way to easily collect, store, and analyze that data. But while building that platform, we quickly grew unhappy with the existing time-series database options. They failed to provide the scale and performance we needed, especially for high-cardinality data. They weren’t sufficiently reliable or flexible. Their SQL-ish dialects fell short in important ways; for example, we couldn’t JOIN our device metrics with additional information about the devices. The list went on.

Thinking more architecturally about the nature of IoT and time-series workloads, we figured out how we could build what we needed by extending PostgreSQL: inherit all its goodness, but supercharge it for time-series data. We quickly saw that this project, which we initially built internally to satisfy our own needs, had much broader demand. So we refocused the company and released TimescaleDB as open source in early 2017.

We’ve been growing quickly since then, both in terms of our community and company. And being a widely distributed team has certainly helped.

For most of our history, at least 60% of our organization worked remotely from various locations, while the remaining 40% worked in one of our two offices in New York and Stockholm. Then the COVID-19 pandemic made all of our teams 100% remote. (We share more about our experience building a remote-first team culture on our blog.)

We have grown significantly in the past few months. We realized that if we want to attract the best talent, we need to make it simple, and even encouraged, for people to work from wherever they want. So, since March 2020, we have been a fully remote company with employees on every continent (except Antarctica). And, if you want to join us and build the next great database company, we are hiring across all teams and departments!

Most time-series databases are rooted in the NoSQL movement. You decided to endow an established SQL database with superpowers. What motivated you to take this approach, and what are your strengths?

Typically, most developers adopt NoSQL time-series databases because of the perception (incorrect, in my opinion) that only NoSQL can scale. Relational databases have many useful features that most NoSQL databases lack (e.g., rich schemas and transactional operations, robust secondary indexing, advanced query planning, support for complex predicates, a rich query language, JOINs and foreign keys), but before TimescaleDB they were not designed specifically around time-series workloads.

We always believed that relational databases could be very powerful for time-series data if carefully designed for time-series workloads. When we launched the first version of TimescaleDB in April 2017, we got a lot of positive feedback but also heard from the skeptics. They found it hard to believe that it was possible to build a scalable time-series database on a relational database. (We’ve since shown that this skepticism is unwarranted; in several benchmark analyses, TimescaleDB outperforms NoSQL databases on time-series workloads.)

We decided to build TimescaleDB on top of PostgreSQL because we knew we could build on top of a rock-solid foundation and enable developers to use the skills and tools they already knew and loved. We also just liked PostgreSQL’s reliability and ease of use – and we’re not the only ones: PostgreSQL has a huge ecosystem and consistently ranks among the top databases.

Our customers say that full SQL support, combined with massive scale, our hybrid row-columnar compressed storage, and continuous aggregates, are key differentiators for TimescaleDB. And, as I mentioned, because TimescaleDB is built on PostgreSQL, it inherits PostgreSQL’s reliability, functionality, and tooling. Now, with the horizontally scalable architecture in TimescaleDB 2.0, combined with compression rates of 94% to 97%, one can reach petabyte scale with all the goodness of PostgreSQL.

By using continuous aggregates, we can offer impressive performance to the users of our IoT platform. Did you focus on the use of your database in IoT from the beginning?

Very happy to hear about your success with continuous aggregates. They are a powerful capability that’s very useful in IoT applications. You define what looks like a materialized view on your data, and the database incrementally maintains it: say, each device’s min, max, and average reading per hour, so you can quickly run longitudinal queries that scan months or years of data.
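
To make the hourly rollup described above concrete, here is a minimal Python sketch of what such an aggregate computes. It is a toy in-memory model, not TimescaleDB's API or implementation; the names `hour_bucket` and `hourly_rollup` and the sample readings are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def hourly_rollup(readings):
    """Group (device_id, timestamp, value) rows by device and hour,
    then compute min, max, and average for each group."""
    groups = defaultdict(list)
    for device_id, ts, value in readings:
        groups[(device_id, hour_bucket(ts))].append(value)
    return {
        key: {"min": min(vals), "max": max(vals), "avg": sum(vals) / len(vals)}
        for key, vals in groups.items()
    }


# Hypothetical sample data: two devices, readings spread over two hours.
readings = [
    ("dev-1", datetime(2021, 1, 1, 10, 5), 20.0),
    ("dev-1", datetime(2021, 1, 1, 10, 35), 22.0),
    ("dev-1", datetime(2021, 1, 1, 11, 10), 25.0),
    ("dev-2", datetime(2021, 1, 1, 10, 15), 7.0),
]
rollup = hourly_rollup(readings)
```

A long-range query then reads one small row per device per hour instead of scanning every raw reading. In the database the rollup is maintained incrementally rather than recomputed from scratch as it is here.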

But let me touch on two unique aspects of TimescaleDB’s continuous aggregates that we’re especially happy with. First, they handle backfill correctly, seamlessly, and efficiently. Let’s say some data comes in late, and you’ve already computed a “rollup” for that time period. That rollup is now inaccurate. But TimescaleDB tracks “invalidation records”, so it transparently and asynchronously recomputes only those narrow regions that received late data.
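
The backfill idea can be sketched in a few lines of Python. This is only a conceptual model under stated assumptions (integer timestamps in seconds, one-hour buckets, averages only); the class `RollupWithInvalidation` and its methods are hypothetical and do not reflect how TimescaleDB stores or processes invalidation records internally.

```python
from collections import defaultdict

BUCKET = 3600  # bucket width in seconds (assumed: one hour)


class RollupWithInvalidation:
    def __init__(self):
        self.raw = defaultdict(list)  # bucket start -> raw values
        self.materialized = {}        # bucket start -> stored average
        self.invalidated = set()      # buckets whose stored average is stale

    def insert(self, ts, value):
        bucket = ts - ts % BUCKET
        self.raw[bucket].append(value)
        # Late data for an already materialized bucket: record an invalidation
        # instead of recomputing right away.
        if bucket in self.materialized:
            self.invalidated.add(bucket)

    def materialize(self, up_to):
        """Compute averages for complete buckets that are not yet stored."""
        for bucket, values in self.raw.items():
            if bucket + BUCKET <= up_to and bucket not in self.materialized:
                self.materialized[bucket] = sum(values) / len(values)

    def refresh_invalidated(self):
        """Recompute only the buckets that received late data."""
        refreshed = sorted(self.invalidated)
        for bucket in refreshed:
            values = self.raw[bucket]
            self.materialized[bucket] = sum(values) / len(values)
        self.invalidated.clear()
        return refreshed


agg = RollupWithInvalidation()
agg.insert(100, 10.0)
agg.insert(200, 20.0)
agg.insert(3700, 30.0)
agg.materialize(up_to=7200)          # both hours are now rolled up
agg.insert(300, 30.0)                # late reading for the first hour
stale = agg.materialized[0]          # still the old average
refreshed = agg.refresh_invalidated()
```

Only the first hour is recomputed; the second hour's stored value is left untouched, which is the point of tracking invalidations per region.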

Second, continuous aggregates support real-time aggregation. You want historical data but also the latest readings, and you don’t want your database to update the latest aggregate on every insert, because that is inefficient. So, again totally transparently, when you query a continuous aggregate, the database engine combines the previously calculated regions with the latest raw data. Your results are always up to date, while queries stay fast and simple for users.
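
A rough Python sketch of that query-time combination follows. It assumes integer timestamps in seconds, one-hour buckets, and a single "watermark" separating already materialized buckets from recent raw data; the function names and the watermark variable are illustrative assumptions, not TimescaleDB's interface.

```python
BUCKET = 3600  # bucket width in seconds (assumed: one hour)


def bucket_of(ts):
    return ts - ts % BUCKET


def average_by_bucket(rows):
    """Aggregate (timestamp, value) rows into per-bucket averages."""
    sums = {}
    for ts, value in rows:
        b = bucket_of(ts)
        total, count = sums.get(b, (0.0, 0))
        sums[b] = (total + value, count + 1)
    return {b: total / count for b, (total, count) in sums.items()}


def realtime_query(materialized, watermark, raw_rows):
    """Serve buckets before the watermark from stored results and
    aggregate everything at or after it directly from raw data."""
    result = {b: avg for b, avg in materialized.items() if b < watermark}
    recent = [(ts, v) for ts, v in raw_rows if ts >= watermark]
    result.update(average_by_bucket(recent))
    return result


raw_rows = [(100, 10.0), (200, 20.0), (3700, 30.0), (7300, 40.0), (7400, 60.0)]
watermark = 7200  # everything before this point has been materialized
materialized = average_by_bucket([r for r in raw_rows if r[0] < watermark])
combined = realtime_query(materialized, watermark, raw_rows)
```

The stored results cover the first two hours, and the third hour is computed on the fly from the two newest readings, so the caller sees a single, current result without the aggregate being rewritten on every insert.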

We’ll continue to expand the use cases that continuous aggregates support over the coming months.

As I mentioned, we think a lot about IoT use cases because our roots at Timescale came from IoT. We were storing hundreds of thousands of time-series data points from the IoT devices and needed a database to store all this data. At that time, we were using two databases: a NoSQL database for storing time-series data and PostgreSQL to store our relational data. This led to problems: it fragmented our dataset into silos, led to complex joins at the application layer, and required us to maintain and operate two different systems.

We weren’t happy with this setup. It was operationally complex, didn’t allow us to ask the questions we wanted, and wasn’t performant.

We - and many developers - love and use PostgreSQL for many reasons, and we built TimescaleDB to combine the best of both worlds: a relational database for time-series data.

But while we started in IoT, time-series data is everywhere, and in the last few years time-series workloads have exploded for many reasons (e.g., growth in IT monitoring, product analytics, financial data and crypto, gaming, machine learning, logistics, and much more).

As my co-founder Ajay wrote in our Series B fundraising announcement, “Everyone wants to make better data-driven decisions faster, which means collecting data at the highest fidelity possible. Time-series is the highest fidelity of data one can capture because it tells you exactly how things are changing over time. While traditional datasets give you static snapshots, time-series data provides the dynamic movie of what’s happening across your system: e.g., your software, your physical power plant, your game, your customers inside your application.”

What new features can we expect in the near future that will benefit the users of our IoT Platform?

We have many new capabilities and enhancements in the works, both for specific capabilities - like hyperfunctions and easier analytics, continuous aggregates, compression, and multi-node - and at the product level. Our docs always include the full release notes for details about the latest features, improvements, and our future plans.

Interview with Mike Freedman, co-founder / CTO of Timescale
