Transcript
Introduction and welcome [00:05]
Wes Reisz: The CAP theorem, also called Brewer’s theorem, is often cited when discussing availability guarantees for distributed systems (systems of networked, interconnected nodes). In general, it says that if you have, say, a database that runs on two or more computers, you can have linearizability (a specific consistency requirement that, I’ll just say, constrains what outputs are possible when an object is accessed by multiple processes concurrently) or availability, which means you can always get an answer.
You may have heard the CAP theorem described as consistency, availability, or partition tolerance: pick two, and by the way, you must pick partition tolerance. Today on the podcast we talk to Oren Eini, CEO of Hibernating Rhinos and founder of RavenDB. RavenDB is a fully transactional, open source NoSQL document database. RavenDB offers a distributed database cluster, automatic indexing, an easy-to-use graphical interface, top-notch security, and in-memory performance for a persistent database.
Today we will discuss the background of RavenDB. We’ll discuss how Raven thinks about the CAP theorem (spoiler alert: it thinks about it on two different levels), a little about choosing an implementation language, C#, and finally managing indexes with Raven. As always, thank you for joining me on your jogs, walks, and commutes. Oren, welcome to the podcast.
Oren Eini: Thank you for having me.
Where does the name Hibernating Rhinos come from? [01:22]
Wes Reisz: So the first question I have is right behind you on the screen. People listening can’t see your screen, but it says Hibernating Rhinos, the name of the company behind RavenDB. Where does that come from?
Oren Eini: It’s a joke, an old joke. Around 2004 (18 years ago) I started working on the NHibernate project, which is a port of the Hibernate ORM to .NET, and I was very busy with it for a long time. I was also involved in several other open source projects, and the naming convention for most of them was Rhino something: Rhino Mocks, Rhino Service Bus, Rhino DistributedHashTable, and others. So when I needed a name, I just mashed those together, Hibernating Rhinos came out, and I thought it was incredibly funny. Just to be clear, I live in Israel, which means that every time I go to the store to buy a pen, they ask me, “Who should the invoice be made out to?” and I have to tell them, “Hibernating Rhinos.” Then I have to explain how to spell it in Hebrew. So right now, I really wish I had called the company “One, Two, Three” or something, even just for that reason.
Wes Reisz: Now you’re taking me back to my early days. I started out as a Java developer and then moved on to working with C# at a few different companies and in a few different roles. So I remember NHibernate very well, from when I was trying to find tools similar to the ones I knew from the Java world and bring them over to the C# world. You definitely brought back memories for me.
Oren Eini: Triggered, I think that’s the right word.
What is the problem that RavenDB solves? [02:57]
Wes Reisz: Triggered. That’s a good word, huh? So in the introduction I mentioned that Raven is a fully transactional NoSQL document database. Let’s go deeper than that. What is the problem that RavenDB solves?
Oren Eini: Go back to 2006 or 2007, around there. At the time, I was very involved in the NHibernate community. I moved from client to client, building business applications and helping them improve productivity, reliability, all the usual things they want. The databases I worked with most were SQL Server, MySQL, sometimes Oracle, things like that, and every single client was having problems. Working with NHibernate, I had a very deep knowledge of how to get NHibernate to do things. I was also a committer, so I could change the code base and get those changes in. Still, it felt like an endless journey of trying to make things work. At one point I realized that the problem was not the tools I had; the problem was the actual problem I was trying to solve. Also, keep in mind that at that time domain-driven design was in vogue, and everyone wanted to build a proper domain model for their system and do things right.
But there was this huge impedance mismatch between the way you represent your data in memory, as objects with a lot of interconnections and complex structure, and the database. You need to persist that to a relational database and, oh my God, you have 17 tables for one object. That happens if you want complex object graphs with polymorphism, queries, and things like that: you have a discriminator column, you have union tables, you have table-per-subclass inheritance. We’re on a podcast, but folks, if you could see his face right now: triggered, triggered, triggered everywhere. The problem was, you couldn’t do that well. The database you used was actually fighting you. Your option was basically to build transaction scripts, where everything is just reads and writes against the database, but we build such complex applications today, and 15 to 20 years ago it was the same in many ways. The amount of back and forth you had to do between the application and the database was huge.
So I started looking for other options, and at that time I came across NoSQL databases. It was amazing, because it broke the assumption in my head that the relational model is the way you store data consistently. So I started looking at all these non-relational databases, and there were a lot of them at the time. In 2007 or so, the Dynamo paper from Amazon came out, and it had a huge impact on the database scene. There were all kinds of projects, like Riak, Project Voldemort, and a lot of others, plus MongoDB and CouchDB, and they all gave you a much better way to deal with many of these problems.
However, if you really looked at what you were giving up, it was awful. NoSQL databases usually had no transactions, which is fine if you’re building a system where you can afford to lose data. MongoDB was famously designed with storing logs in mind, and, well, if we lose one or two records from time to time, no big deal; that wasn’t the point. So transactions were something that happened to someone else, but then you realize: I want to build a business application, and I actually care about my data. I cannot accept having no transactions. Some databases said, “You can get transactions if you write these terribly complicated things to do it,” which still wasn’t really safe, but looked like it was. So I started to think that I wanted a database that would be transactional, that would give me all the usual benefits I get from a relational database, but would also give me the modeling I want.
Wes Reisz: What did that first MVP look like? You set out with this goal; what came out of it?
Oren Eini: The first version was basically this: there’s a storage engine that comes with Windows called ESENT. It was open sourced fairly recently, two or three years ago. It’s the engine that ships with every version of Windows and implements ACID and ISAM, the indexed sequential access method. That’s very old terminology, going back to the 70s, even the 60s, but it’s an amazing technology in many ways. I took it as my ACID layer, added Lucene for indexing, and combined them with what was a fairly primitive user interface at the time. Then I created the client API, which largely mimics the API of NHibernate and other ORMs. This lets you work with real domain objects and save them transactionally, with all the usual things like change tracking and so on. On top of that, you had full-text search, you could query the data in any way you wanted, and the database would just answer you.
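ISAM, the indexed sequential access method mentioned above, keeps records ordered by key so you can either jump straight to a key through an index or seek to a position and then read forward in order. The following is a toy in-memory sketch of that idea only. It assumes nothing about ESENT’s actual API or on-disk format; `ToyIsamTable` and its methods are hypothetical names chosen for illustration, and the sketch has none of the ACID guarantees a real engine provides.

```python
import bisect
from typing import Any, Iterator


class ToyIsamTable:
    """Toy illustration of indexed sequential access: records are kept sorted by key,
    so a lookup can seek through the index and then read sequentially.
    Hypothetical class; not the ESENT API, and neither durable nor transactional."""

    def __init__(self) -> None:
        self._keys: list[str] = []    # sorted index of keys
        self._values: list[Any] = []  # values stored in the same order as the keys

    def put(self, key: str, value: Any) -> None:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # overwrite an existing record
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def seek(self, key: str) -> Any:
        # Indexed access: binary search over the sorted key list.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        raise KeyError(key)

    def scan_from(self, start: str) -> Iterator[tuple[str, Any]]:
        # Sequential access: seek to the first key >= start, then walk forward in key order.
        # Callers stop iterating once keys leave the range they care about.
        i = bisect.bisect_left(self._keys, start)
        for j in range(i, len(self._keys)):
            yield self._keys[j], self._values[j]


table = ToyIsamTable()
for k, v in [("users/3", "Carol"), ("users/1", "Alice"), ("orders/7", "Pen"), ("users/2", "Bob")]:
    table.put(k, v)

print(table.seek("users/2"))
print(list(table.scan_from("users/")))
```

The prefix scan over `users/` is the kind of access pattern the sequential half of ISAM makes cheap; a real storage engine would typically keep the sorted structure on disk in a B-tree rather than in in-memory lists.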
Wes Reisz: So fast forward 15 or 16 years since the release of that MVP. When you look at the NoSQL landscape, it’s definitely crowded, and there’s a lot of overlap. So where does RavenDB fit into that landscape, next to things like Cassandra and MongoDB? Where is RavenDB different, and how?
Oren Eini: RavenDB is a document database that stores documents in JSON format. In addition to the JSON document itself, you can attach additional data to a document. That data can be attachments, binary data of any size; we like to think of them as email attachments, and they have proved to be a really useful feature. You can also store distributed counters. These are very similar to Cassandra’s counters: you have a high-performance counter that is modified on many nodes at once and still converges to the correct value. Again, this is a value you attach to a document. A great example is an Instagram post where you want to count the number of likes; that’s very easy to do with this kind of feature. We also have revisions support, so you get an automatic audit trail of any changes that happen to a document.
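The convergence described above, where many nodes update a counter concurrently and still end up with the correct total, is commonly achieved with a conflict-free replicated data type. Below is a minimal PN-counter sketch using the Instagram-likes example. It assumes each node writes only to its own slot and that nodes occasionally exchange their full state; the `PNCounter` class and node names are hypothetical, and this is not a description of how RavenDB or Cassandra actually implement their counters.

```python
from dataclasses import dataclass, field


@dataclass
class PNCounter:
    """Sketch of a PN-counter CRDT. Each node records only its own increments and
    decrements; replicas merge by taking the per-node maximum, so they converge to
    the same value regardless of merge order. Hypothetical; not RavenDB's implementation."""

    node_id: str
    increments: dict[str, int] = field(default_factory=dict)
    decrements: dict[str, int] = field(default_factory=dict)

    def add(self, delta: int) -> None:
        # Record the change under this node's own slot only.
        if delta >= 0:
            self.increments[self.node_id] = self.increments.get(self.node_id, 0) + delta
        else:
            self.decrements[self.node_id] = self.decrements.get(self.node_id, 0) - delta

    def merge(self, other: "PNCounter") -> None:
        # Per-node max is commutative, associative, and idempotent.
        for mine, theirs in ((self.increments, other.increments), (self.decrements, other.decrements)):
            for node, count in list(theirs.items()):
                mine[node] = max(mine.get(node, 0), count)

    @property
    def value(self) -> int:
        return sum(self.increments.values()) - sum(self.decrements.values())


# Three replicas receive likes (and one unlike) for the same post concurrently.
a, b, c = PNCounter("A"), PNCounter("B"), PNCounter("C")
a.add(3)
b.add(2)
c.add(1)
c.add(-1)

# Exchange state between every pair of replicas.
nodes = [a, b, c]
for x in nodes:
    for y in nodes:
        x.merge(y)

print([n.value for n in nodes])
```

Because merging takes a per-node maximum, replicas can exchange state in any order, any number of times, and still agree; the trade-off is that the counter’s state grows with the number of nodes that have ever written to it.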
Finally, we have the concept of time series: we can attach to a document a value that changes over time. For example, I have a smartwatch that tracks my heart rate over time, and I can add that data to RavenDB. The idea here is that each of these four or five different aspects of a document has its own dedicated and …