joins are bad, mkay?

February 18, 2016

Graph databases are great. But if I read any more FUD about RDBMSs, I’m going to have another flashback to the ’80s, where someone is telling me how you don’t want to dismantle your car every time you park it in the garage (an OODBMS reference, for the oldies).
I recently read a graph-database book that explained that every SQL join requires computing a cartesian product. And yesterday I read a post explaining that equijoins have an exponential order of complexity. (I guess… if the exponent is 1.) Neither claim holds up, as the sketch at the end of this post shows. It’s getting very frustrating to wade through the inaccuracies and FUD, and by now, to me, it all just turns into

[image: badjoins]

Which is why it was nice to see @guyharrison‘s even-handed and accurate treatment in his new book Next Generation Databases, and then to see this on LinkedIn (thanks to Shahram Ghandeharizadeh at USC for posting it and @markcallaghan for tweeting it). I’m reading the paper that used Neo4j and SQL-X first. I look forward to learning more about the many forms of data storage, access, and usage, based on real research and data. Graph DB writers: don’t ruin your case by turning it all into “joins are bad, mkay” regurgitations.
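For the record, here is a minimal sketch of why an equijoin needn’t (and in practice doesn’t) compute a cartesian product. The table and column names are hypothetical:

    -- with an index on the join column, the optimizer does an index lookup
    -- per outer row; no N-by-M product is ever materialized
    EXPLAIN
    SELECT o.id, c.name
    FROM orders o
    JOIN customers c ON c.id = o.customer_id;

The typical MySQL plan here is a scan of orders plus an eq_ref primary-key lookup into customers for each row – roughly O(N log M) with a B-tree, or O(N + M) with a hash join. Nothing exponential, and no cartesian product unless you actually ask for one by omitting the join condition.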

on CAP

January 19, 2016

The “CAP theorem” is often presented as something deep and hard to understand, and then discussed in confusing terms, as if to make sure it stays hard to understand. Really, it’s almost common sense: e.g., if you have a network partition, you have to give up either availability or consistency. Anyone familiar with, say, DRBD, split-brain, and STONITH knows all about this.

I was reading a bit about it (again), and I use the sneer quotes on “theorem” because I think the “proof” really just adds some mathematical terms to the mix and restates it – QED. (Assuming the proof in question is “Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services”.) “Conjecture” doesn’t seem right either. Maybe “CAP observation”?

It’s refreshing to find an update written by Eric Brewer (the originator) in 2012, “CAP Twelve Years Later: How the ‘Rules’ Have Changed”. It’s very readable (down-to-earth), interesting, and informative. I recommend it to anyone interested in the subject who hasn’t already read it.


long, strange trip

November 25, 2015

Thanks to Google’s insatiable appetite for scanning and offering documents, you can take a trip back to a 1994 issue of Computerworld magazine, read about Windows NT, cc:Mail, and Pentium processors, and see an ad for Oracle CDE featuring yours truly looking over his shoulder. (Remember Oracle CDE?) This was my first Oracle DBA gig, and the team received a Smithsonian Award for a global client/server customer information system that we built on Oracle 7 (and DB2, via Oracle Transparent Gateway).

Tarski and Codd

November 25, 2015

Wikipedia says that “Relational calculus is essentially equivalent to first-order logic, and indeed, Codd’s Theorem had been known to logicians since the late 1940s.”  I couldn’t find the cited sources online, but did find these interesting papers:

Applications of Alfred Tarski’s Ideas in Database Theory

and

“Tarski’s influence on computer science” (see the section starting “The final thing I want to tell something about is the connection of Tarski’s ideas and work with database theory.”)

If you’ve studied mathematical logic (or math generally – e.g., topology), you are probably familiar with Tarski’s name. The historical development of mathematical logic and relational database theory is an interesting topic I’d like to understand better. Codd’s 1970 paper is fairly readable, and this 1972 paper covers the correspondences between the different approaches.

Two Common Reasons for Replication Lag

October 5, 2015

As a MySQL support engineer, I see this so often that I felt it could help to write a post about it.

Customers contact us to ask about replication lag – i.e., a slave is very far behind the master and not catching up. (“Very far” meaning hours behind.)

The most common reason I encounter is databases with InnoDB tables that have no explicit primary keys. Especially if you are using row-based replication (“RBR”), you want explicit primary keys on all your tables; otherwise, the slave will scan the entire table for each row that is updated. (See bug 53375.) Maybe I’m a relational purist, but why would you want tables without explicit primary keys anyway? (On the other, less purist, hand, for performance reasons a short surrogate PK is sometimes preferable to a lengthy logical one.)
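If you’re stuck with such a table, the fix is simple. A minimal sketch – the table name is hypothetical:

    -- without an explicit PK, each RBR row event forces a full table scan
    -- on the slave; a surrogate key turns that into a point lookup
    ALTER TABLE orders
      ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;

(Watch out: on a big table, this ALTER is itself expensive, so schedule it accordingly.)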

The other common reason is that the slave is single-threaded, and single-threaded performance can’t keep up with the multi-threaded master. In this case, if multiple databases are being updated, enabling the multi-threaded slave can help. (See the manual for more.)
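A minimal sketch of enabling it, assuming MySQL 5.6 or later (which parallelizes per database); the worker count here is just an example:

    -- on the slave: slave_parallel_workers is dynamic, but the SQL thread
    -- must be restarted for the change to take effect
    STOP SLAVE SQL_THREAD;
    SET GLOBAL slave_parallel_workers = 4;
    START SLAVE SQL_THREAD;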

Nothing new or groundbreaking here, but I hope this helps someone who is googling “MySQL replication lag”.

MySQL for Oracle DBAs – slides available

August 21, 2015

For those of you who would like a copy of the slides from my webinar, they are now online at SlideShare. Thanks again if you attended!

I wasn’t able to answer all the questions that were submitted at the end, so I’m collecting those, plus more from your emails, for the next post: a Q&A.

MySQL for Oracle DBAs webinar

August 20, 2015

If you attended my webinar, MySQL for Oracle DBAs, thank you!  I hope you enjoyed it.

We ran out of time during the Q&A, so I apologize if I didn’t get to your question. Please email any questions to me at ben-dot-krug-at-oracle-dot-com. Also, if you asked for a copy of the slides (or would like one), email me – I’ll be happy to send them!

Thanks again!

MySQL as an Oracle DBA

June 3, 2015

A quick question, which I’ve asked once before –

if you were an Oracle DBA and then learned MySQL, what do you wish you had known before you got started?

Also, what was helpful to you as you learned? (Websites, #mysql on IRC, documents, etc.) What do you wish you had had (or want to have now) as a DBA for both Oracle and MySQL databases?

I.e., what would be good to give an Oracle DBA who wants to start learning or supporting MySQL as well? Please respond with comments here, or email me directly at ben-dot-krug-at-oracle-dot-com.

Thanks!

isolation levels

March 25, 2015

I gots to know…

Big web companies that use MySQL – what isolation level(s) do you use? Do you take the default? Do you think about it at all? And if you use, e.g., memcached, what do you do there? Does your code think about optimistic versus pessimistic locking?

I’ve been reading up on isolation levels after seeing Peter Bailis’
http://www.bailis.org/blog/understanding-weak-isolation-is-a-serious-problem/
(thanks to Mark Callaghan for re-tweeting about it).

I’m learning things…  here’s one for today –

I always thought that the “repeatable” in repeatable read applied to the data in general, meaning that a query run twice in a transaction would “repeat” the same results. Apparently it is only supposed to apply to individual records – i.e., other transactions’ updates to an existing row will not affect our view of that row. According to ANSI SQL-92, repeatable read does allow “phantom” (newly inserted) rows to appear. (MySQL seems to be less permissive than that.)
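To make the phantom case concrete, here’s a sketch using a hypothetical table t, with two sessions:

    -- session 1
    SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    START TRANSACTION;
    SELECT COUNT(*) FROM t WHERE k > 10;  -- say this returns 5

    -- session 2 (autocommit)
    INSERT INTO t (k) VALUES (42);

    -- session 1 again
    SELECT COUNT(*) FROM t WHERE k > 10;  -- SQL-92 repeatable read permits 6
                                          -- here (a phantom); InnoDB's
                                          -- consistent-read snapshot still
                                          -- returns 5
    COMMIT;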

But enough of my talking – I’m going back to my reading. So, big MySQL users – let me know your thoughts on isolation levels, application-wide. Do you care about them? How do you enforce them (or not) at the application level, and why?

undo and redo

April 18, 2014

Here’s something that used to make my head spin a bit… A quick quiz: does undo generate redo? Does redo generate undo?

When I heard that undo generates redo, it threw me for a loop. But undo is stored in the system tablespace (in undo segments), as regular data, and therefore generates redo. Then I thought: OK, does redo generate undo? If so, we’d obviously be in a vicious cycle. So, no. But why?

We need redo for undo so that if there is a crash, and some data has been written to disk but not committed, and the undo wasn’t yet written to disk, we can recreate the undo from the redo and then use it to roll back the uncommitted-but-written data. (Corrections welcome if I’m inaccurate – this is all from my head.)

We don’t need undo for redo: if there is a crash and a transaction was not fully committed, we need the undo to roll it back – and once a change is rolled back, its redo is simply irrelevant; there is nothing we need to “undo the redo” with. The redo logs are special files outside of the database proper.
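To spell out the crash-recovery sequence, here’s a toy timeline with a hypothetical table t (a sketch of the reasoning above, not output from a real server):

    START TRANSACTION;
    UPDATE t SET x = 2 WHERE id = 1;  -- an undo record ("x was 1") is created;
                                      -- both the page change and the undo
                                      -- record are protected by redo
    -- suppose the dirty page (x = 2) is flushed to disk, then the server
    -- crashes before COMMIT; on restart:
    --   1. roll forward: the redo log re-creates any undo that never
    --      reached disk
    --   2. roll back:    that undo restores x = 1 for the uncommitted change
    -- redo itself needs no undo: an uncommitted change is rolled back via
    -- undo, and its redo simply becomes irrelevant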

So, this may all be clear as mud from my writing, but I’ve always wanted to think it through out loud. Please correct me in the comments if I’m wrong on anything here.