Archive for the ‘MySQL’ Category

joins are bad, mkay?

February 18, 2016

Graph databases are great. But if I read more FUD about RDBMSs, I’m going to have another flashback to the 80’s where someone is telling me about how you don’t want to dismantle your car every time you park it in the garage (OODBMS reference for the oldies).
I recently read a graph-database book that explained that every SQL join requires computing a cartesian product. And yesterday, a post that explained that equijoins have an exponential order of complexity. (I guess… if the exponent is 1.) It’s getting very frustrating to read through the inaccuracies and FUD, and now, to me, it is all just turning into


Which is why it was nice to see @guyharrison‘s even-handed and accurate treatment in his new book Next Generation Databases, and then to see this on LinkedIn (thanks to Shahram Ghandeharizadeh at USC for posting it and @markcallaghan for tweeting it).  I’m reading the paper that used Neo4j and SQL-X first.  I look forward to learning more about the many forms of data storage, access and usage, based on real research and data.  Graph DB writers: don’t ruin your case by turning it all into “joins are bad, mkay” kinds of regurgitations.


Two Common Reasons for Replication Lag

October 5, 2015

As a MySQL support engineer, I see this so often, that I felt it could help to write a post about it.

Customers contact us to ask about replication lag – ie, a slave is very far behind the master, and not catching up. (“Very far” meaning hours behind.)

The most common reason I encounter is databases having InnoDB tables without explicit primary keys. Especially if you are using row-based replication (“RBR”), you want explicit primary keys on all your tables. Otherwise, MySQL will scan the entire table for each row that is updated. (See bug 53375 . )  Maybe I’m a relational purist, but why would you want to have tables without explicit primary keys, anyway?  (On the other, less-purist, hand, for performance reasons, sometimes a short surrogate PK may be preferred to a lengthy logical one. )

The other common reason is that the slave is single-threaded, and single-threaded performance can’t keep up with the multi-threaded master.  In this case, if multiple databases are being updated, enabling the multi-threaded slave can help.  ( See the manual for more.)

Nothing new or groundshaking here, but I hope this helps someone who is googling “MySQL replication lag”.

MySQL for Oracle DBAs – slides available

August 21, 2015

For those of you who would like a copy of the slides from my webinar, they are now online at slideshare .  Thanks again if you attended!

I wasn’t able to answer all the questions that were submitted at the end, so I’m collecting those, and more from your emails, for the next post: Q&A.

MySQL for Oracle DBAs webinar

August 20, 2015

If you attended my webinar, MySQL for Oracle DBAs, thank you!  I hope you enjoyed it.

We ran out of time while I was answering questions for Q&A, so I apologize if I didn’t answer your question.  Please email any questions to me at ben-dot-krug-at-oracle-dot-com.  Also, if you were asking for a copy of the slides (or would like to ask), also email me – I’ll be happy to send them!

Thanks again!

MySQL as an Oracle DBA

June 3, 2015

A quick question, which I’ve asked once before –

if you were an Oracle DBA, then learned MySQL, what do you wish you knew before you got started?

Also, what was helpful to you as you learned?  (Websites, #mysql on irc, documents, etc.)  What do you wish you had (or want to have now) if you are a DBA for both Oracle databases and MySQL databases?

Ie, what would be good to give an Oracle DBA who wants to start learning or supporting MySQL as well?  Please respond with comments here, or directly email me at ben-dot-krug-at-oracle-dot-com.


isolation levels

March 25, 2015

I gots to know…

Big web companies who use MySQL – what isolation level(s) do you use? Do you take the default? Do you think about it at all? How about if you use, eg, memcached, what there? And, does your code think about optimistic or pessimistic locking?

I’ve been reading up on isolation levels after seeing Peter Bailis’
(thanks to Mark Callaghan for re-tweeting about it).

I’m learning things…  here’s one for today –

I always thought that the “repeatable” in repeatable read applied to the data in general. Meaning that a query run twice in a transaction would “repeat” the same results. I guess it actually is supposed to apply to records?  Ie, other transactions’ updates to an existing row will not affect our view of the record.  According to ISO ANSI 92, apparently, repeatable read does allow the appearance of “phantom” (newly added) rows. (MySQL seems to be less permissive than that.)

But enough of my talking – I’m going back to my reading.  So, big MySQL users – let me know your thoughts on isolation levels, application-wide.  Do you care about them?  How do you enforce, or not enforce them at an application level, and why?

undo and redo

April 18, 2014

Here’s something that used to make my head spin a bit… A quick quiz: does undo generate redo? does redo generate undo?

When I heard that undo generated redo, it sent me for a loop. Undo is stored in the system tablespace (undo segments), as regular data, and therefore generated redo. Then I thought, OK, would redo generate undo? If so, we’re obviously in a vicious cycle. So, no. Why?

We need redo for undo so that if there is a crash, and some data has (have) been written to disk but not committed, and the undo wasn’t yet written to disk, we can recreate it from the redo, then use it to undo the uncommitted but written data. (Corrections if I’m inaccurate, please.  This is all from my head only.)

We don’t need undo for redo – if there is a crash, and a transaction was not fully completed, we need the undo to roll it back. And if we don’t need the redo, it’s not something we need to undo. The redo logs are special files outside of the database proper.

So, this may all be clear as mud from my writing, but I’ve always wanted to think it out loud.  Please correct me in the comments if I’m wrong on anything here.

MySQL for Beginners Training on Demand course, Day 3

December 14, 2012

It’s been a while since I posted about Day 1 and Day 2 of the MySQL for Beginners online Training on Demand course offered by Oracle (  Yesterday was 90 days from the first posting, so that’s how long you would have had to use this course!

Day 3 is a whirlwind tour of some further aspects of SQL.  Here’s what it covers:

Introduction to Data Manipulation Language (DML) in SQL
– introduces INSERT, REPLACE, UPDATE and DELETE statements and their syntax and uses

Intro to Functions
– string functions, temporal, numeric, control flow and aggregate functions

Intro to Joins
– describes cartesian, inner, and outer joins – their syntax and uses

Intro to Subqueries
– types and placements of subqueries
– how and when to convert to joins

This is intended for beginners, and if you know the material, you might look at the headings and think “I know this” and skip it.  Personally, I always like to review, and try to see what I do and don’t know.  While some of this material was covered at a high level, there was enough detail that I learned some things, and found several stepping-off points to research and learn more.  Here are some of my own notes on things I learned more about:

First, when I read about REPLACE, I did some tests to make sure I understood how it worked.  From the course, we know that “it is possible for a single row to replace more than one old row if the table contains multiple unique constraints.”  I haven’t used REPLACE in too much detail, so I created some tables with a primary key and two other unique, multi-column constraints, and did some playing around.  For example, I used a single-row REPLACE to get three rows replaced by the one new row.  (I also was able refresh my memory on how REPLACE works, by deleting then inserting entire rows, as opposed to UPDATE.)

I enjoyed the introduction to functions – I always find it good to review the built-in functions, and always find some functions I’d forgotten about, never really used, or didn’t even know existed.  I got practice on the syntax and usage of some of the less common functions.

For one of my stepping-off points, I did some more reading on ROUND() and how it deals with numbers ending in .5, depending on whether the datatype is an exact-value datatype, or an approximate-value datatype.  (You may notice a trend in my interests, if you read Day 2.)

Then, in the intro to joins, I found out, for example, that joining with “ON” is not exactly the same as joining with “USING”.  “USING” treats the join column as one, whereas “ON” treats it as two.  Here’s an example.  Table j1 has (1,’j1.1′),(2,’j1.2′) and table j2 has (1,’j2.1′),(2,’j2.2′).

mysql> SELECT * FROM j1 JOIN j2 USING (a);
| a    | b    | b    |
|    1 | j1.1 | j2.1 |
|    2 | j1.2 | j2.2 |
2 rows in set (0.00 sec)

mysql> SELECT * FROM j1 JOIN j2 ON (j1.a=j2.a);
| a    | b    | a    | b    |
|    1 | j1.1 |    1 | j2.1 |
|    2 | j1.2 |    2 | j2.2 |
2 rows in set (0.00 sec)

In the intro to subqueries, I reviewed correlated and uncorrelated subqueries, and where subqueries can be used (in the SELECT, the FROM or the WHERE).  Also, the possible result types (scalar, row, column, and table) and how the different result types can be used differently in queries (eg, for comparisons, in functions, etc).  These are fairly simple concepts, but somehow they keep slipping around in my mind, and reviewing them is very helpful to me.

So, the course covers a wide ground, but also gives enough detail to chew on and learn from.  Having it available for 90 days give a lot of good time for chewing.  I also find it useful in this regard that you get the streaming video of the classroom course, with captioning available, and the pdf student guide and activity guide – sometimes I prefer using one or the other, and for reviewing and “chewing”, the guides are particularly helpful.  This course covers a lot of material, and I’m glad I get to review it!  Stay tuned for day 4…

MySQL for Beginners Training on Demand course, Day 2

November 1, 2012

It’s been a while since my post on taking day 1 of this course; as promised, here are my notes and observations after taking Day 2.

First, this is one of the nice things about taking Training on Demand.  I could take Day 1, go about my other business, apply what I learned, etc, then, when ready, take Day 2, even a month or so later!  I was able to go back and review Day 1, then move on to Day 2, entirely at my own schedule.

I remembered something from earlier, and wanted to find it again, so I tried out the search capabilities.  I searched for the phrase I wanted, and up came links to the lessons it appeared in.  I could click on any of the results to view that lesson, and watch, or check the captioning to find the phrase I was looking for.  It worked perfectly!

Day 2 covers data types, basic SELECT queries, and how to create and maintain tables and databases (using DROP and ALTER, and how they work).  Again, I enjoy reviewing this kind of material.  And if it’s new, it will give you plenty to chew on and try out.

In my case, the discussion of data types sent me off on a tangent of my own, about floating-point data and rounding errors.  I was able to google and research, then get back to the course when I was done.  (By the way, I found a good link on the subject at at – “What Every Computer Scientist Should Know About Floating-Point Arithmetic” – in case you’re interested.)  So, again, that’s a nice thing about Training on Demand – you can start, stop, go off on tangents, and resume when ready.

Activities in Training on Demand include quizzes to test your knowledge (with answers in the back of the activity guide, or given by the instructor in a separate video segment for each activity).

In summary, I enjoyed Day 2, and look forward to Days 3 and 4, and the bonus material.  Stay tuned!

Oh for the Wait Interface, part 3

October 31, 2012

To repeat myself (from “part 2”):

“a while back I wrote a post expressing a longing for wait-interface tuning”.

The original post, and “part 2” were both written in 2007, while I was working for a startup in Pasadena, as a DBA.  Now I work for Oracle, supporting MySQL, and by now, MySQL has incorporated a wait interface.  So, to do justice, I am writing this “part 3” and saying – it’s been done!

MySQL now has the performance schema, which instruments a vast number of events and waits.  The man page for the current GA version (5.5) is at .   It’s a big chapter, as the wait interface is highly configurable and versatile. gives a quick overview about the performance schema (“P_S”).  To get an idea of the wide variety of things instrumented, see, for example,
and .

Because there is so much that can be configured with the performance schema (enabling the tracking of various events and waits), and because the queries to get the information out can be complicated, you might also want to check out Mark Leith’s collection of useful examples at .

So, there you go, now we have it!

(And now I have three posts on the subject, each with the heading capitalized a little differently, ’cause that’s the way I roll…)