joins are bad, mkay?

Graph databases are great. But if I read more FUD about RDBMSs, I’m going to have another flashback to the 80’s where someone is telling me about how you don’t want to dismantle your car every time you park it in the garage (OODBMS reference for the oldies).
I recently read a graph-database book that explained that every SQL join requires computing a cartesian product. And yesterday, a post that explained that equijoins have an exponential order of complexity. (I guess… if the exponent is 1.) It’s getting very frustrating to read through the inaccuracies and FUD, and now, to me, it is all just turning into


Which is why it was nice to see @guyharrison‘s even-handed and accurate treatment in his new book Next Generation Databases, and then to see this on LinkedIn (thanks to Shahram Ghandeharizadeh at USC for posting it and @markcallaghan for tweeting it).  I’m reading the paper that used Neo4j and SQL-X first.  I look forward to learning more about the many forms of data storage, access and usage, based on real research and data.  Graph DB writers: don’t ruin your case by turning it all into “joins are bad, mkay” kinds of regurgitations.


9 Responses to “joins are bad, mkay?”

  1. Justin Swanhart Says:

    Well to be just a little fair, it would be nice if mysql had more than nested loops. This graph database thing ignores the performance characteristics of many algorithms for joins and pretends that there is one join to rule them all. Just not true on so many levels, which is ironic considering we are talking about a graph.

  2. Justin Swanhart Says:

    Look how redshift with carefully crafted sortkey attributes can bypass the sort phase of the sort/merge join. It is incredibly fast at joining presorted data! Cartesian product my ass.

  3. Jan Steinman Says:

    Still, it would be nice if open-source databases supported graphs better.

    I’ve been struggling with trying to use OQGRAPH to do graphs. It seems that if you want WITH RECURSIVE, you have to pay big bucks for the big-iron databases.

    • Justin Swanhart Says:

      This is a reply saying to read my other comments, because I didn’t use reply. Let me know if you need any assistance. There are some other solutions that might work too like this one:

    • Ben Krug Says:

      Hi Jan –

      Thanks for your comment!

      It’s certainly true that different products make different tasks simpler, or faster. I’m just asking for educated honesty in the comparisons (and marketing)!

      Regarding WITH RECURSIVE, as Justin replied, you can do that with stored procs in MySQL. PostgreSQL does support WITH RECURSIVE directly, as well.

      As far as open-source databases supporting graphs, there are also options like OrientDB or ArangoDB, multi-modal databases. It’s an interesting time!

      But your point is well-taken, and I believe change is inevitable! 🙂

  4. Justin Swanhart Says:

    Use a stored procedure Jan – it can be recursive. I have an example of a recursive stored procedure in Flexviews. It is the procedure that calculates deltas for the view:

    See execute_refresh_step – it calls itself and stores state into a temp table and tracks depth of the recursion with a session variable.

  5. Justin Swanhart Says:

    Sorry, execute_refresh() is recursive. execute_refresh_step is not. complex code I don’t normally have to look at 🙂

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: