January 19, 2016
The “CAP theorem” is often presented as something deep and hard to understand, and then discussed in confusing terms, as if to make sure it stays hard to understand. Really, it’s almost common sense: if you have a partition, you have to give up availability or consistency. Anyone familiar with, e.g., DRBD, split-brain, and STONITH knows all about this.
I was reading a bit about it (again), and I use the sneer quotes on “theorem” because I think the “proof” really just adds some mathematical terms to the mix and restates it – QED. (Assuming this is considered the proof: “Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services”.) “Conjecture” doesn’t seem right either. Maybe the CAP observation?
It’s refreshing to find an update written by Eric Brewer (the originator) in 2012, “CAP Twelve Years Later: How the ‘Rules’ Have Changed”. It’s very readable (down-to-earth), interesting, and informative. I recommend it to anyone interested in the subject who hasn’t already read it.
November 25, 2015
Thanks to Google’s insatiable appetite for scanning and offering documents, you can take a trip back to a 1994 issue of Computerworld magazine, read about Windows NT, cc:Mail, and Pentium processors, and see an ad for Oracle CDE, featuring yours truly looking over his shoulder. (Remember Oracle CDE?) This was my first Oracle DBA gig, and the team received a Smithsonian Award for a global client/server customer information system that we built on Oracle 7 (and DB2, via Oracle Transparent Gateway).
November 25, 2015
Wikipedia says that “Relational calculus is essentially equivalent to first-order logic, and indeed, Codd’s Theorem had been known to logicians since the late 1940s.” I couldn’t find the cited sources online, but did find these interesting papers:
“Applications of Alfred Tarski’s Ideas in Database Theory”
“Tarski’s influence on computer science.” (see the section starting “The final thing I want to tell something about is the connection of Tarski’s ideas and work with database theory.”)
If you’ve studied mathematical logic (or math generally, e.g., topology), you are probably familiar with Tarski’s name. The historical development of mathematical logic and relational database theory is an interesting topic I’d like to understand better. Codd’s 1970 paper is fairly readable, and this 1972 paper is related to the correspondences between the different approaches taken.
October 5, 2015
As a MySQL support engineer, I see this so often that I felt it could help to write a post about it.
Customers contact us to ask about replication lag – i.e., a slave is very far behind the master and not catching up. (“Very far” meaning hours behind.)
The most common reason I encounter is databases with InnoDB tables that lack explicit primary keys. Especially if you are using row-based replication (“RBR”), you want explicit primary keys on all your tables. Otherwise, MySQL will scan the entire table for each row that is updated on the slave. (See bug 53375.) Maybe I’m a relational purist, but why would you want tables without explicit primary keys anyway? (On the other, less-purist, hand, for performance reasons a short surrogate PK may sometimes be preferred to a lengthy logical one.)
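As a minimal sketch of the fix (the table and column names here are made up for illustration):

```sql
-- A table like this has no explicit primary key; with RBR, applying
-- each replicated row change on the slave can mean a full table scan:
CREATE TABLE events (
  user_id INT,
  payload VARCHAR(255)
) ENGINE=InnoDB;

-- Adding an explicit (here, surrogate) primary key lets the slave
-- find each changed row by an index lookup instead:
ALTER TABLE events
  ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;
```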
The other common reason is that the slave is single-threaded, and single-threaded apply can’t keep up with the multi-threaded master. In this case, if multiple databases are being updated, enabling the multi-threaded slave can help. (See the manual for more.)
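For example, on a 5.6+ slave (the worker count below is just an illustration – tune it for your workload):

```sql
-- The slave SQL thread must be stopped to change the worker count.
STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_workers = 4;  -- default 0 = single-threaded
START SLAVE SQL_THREAD;
```

Note that in 5.6 the parallelism is per-database, which is why this only helps when multiple databases are being updated.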
Nothing new or earth-shaking here, but I hope this helps someone who is googling “MySQL replication lag”.
August 21, 2015
For those of you who would like a copy of the slides from my webinar, they are now online at SlideShare. Thanks again if you attended!
I wasn’t able to answer all the questions that were submitted at the end, so I’m collecting those, and more from your emails, for the next post: Q&A.
August 20, 2015
If you attended my webinar, MySQL for Oracle DBAs, thank you! I hope you enjoyed it.
We ran out of time during the Q&A, so I apologize if I didn’t answer your question. Please email any questions to me at ben-dot-krug-at-oracle-dot-com. And if you asked for a copy of the slides (or would like one), email me as well – I’ll be happy to send them!
June 3, 2015
A quick question, which I’ve asked once before –
if you were an Oracle DBA, then learned MySQL, what do you wish you knew before you got started?
Also, what was helpful to you as you learned? (Websites, #mysql on irc, documents, etc.) What do you wish you had (or want to have now) if you are a DBA for both Oracle databases and MySQL databases?
I.e., what would be good to give an Oracle DBA who wants to start learning or supporting MySQL as well? Please respond with comments here, or email me directly at ben-dot-krug-at-oracle-dot-com.
March 25, 2015
I gots to know…
Big web companies who use MySQL – what isolation level(s) do you use? Do you take the default? Do you think about it at all? And if you use, e.g., memcached, what do you do there? Does your code assume optimistic or pessimistic locking?
I’ve been reading up on isolation levels after seeing Peter Bailis’ post (thanks to Mark Callaghan for re-tweeting about it).
I’m learning things… here’s one for today –
I always thought that the “repeatable” in repeatable read applied to the data in general – meaning that a query run twice in a transaction would “repeat” the same results. I guess it is actually supposed to apply to individual records? I.e., other transactions’ updates to an existing row will not affect our view of that record. According to ANSI SQL-92, apparently, repeatable read does allow the appearance of “phantom” (newly inserted) rows. (MySQL seems to be less permissive than that.)
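A sketch of what I mean, assuming a toy table t with an indexed column k:

```sql
-- Session A
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT COUNT(*) FROM t WHERE k > 10;  -- say this returns 3

-- Meanwhile, session B:
--   INSERT INTO t (k) VALUES (42);
--   COMMIT;

-- Back in session A:
SELECT COUNT(*) FROM t WHERE k > 10;
-- SQL-92 repeatable read would permit 4 here (a "phantom" row);
-- InnoDB's consistent snapshot still returns 3 for this plain SELECT.
COMMIT;
```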
But enough of my talking – I’m going back to my reading. So, big MySQL users – let me know your thoughts on isolation levels, application-wide. Do you care about them? How do you enforce them (or not) at the application level, and why?
April 18, 2014
Here’s something that used to make my head spin a bit… A quick quiz: Does undo generate redo? Does redo generate undo?
When I heard that undo generates redo, it sent me for a loop. Undo is stored in undo segments as regular database data, and changes to it therefore generate redo. Then I thought: OK, does redo generate undo? If so, we’re obviously in a vicious cycle. So, no. Why?
We need redo for undo so that if there is a crash, and some uncommitted data has been written to disk but the corresponding undo hasn’t, we can recreate the undo from the redo, then use it to roll back the uncommitted but written data. (Corrections if I’m inaccurate, please. This is all from my head only.)
We don’t need undo for redo: if there is a crash and a transaction was not fully completed, we need undo to roll it back – but the redo logs themselves are special files outside of the database proper, not data that transactions modify, so writing redo generates no undo.
So, this may all be clear as mud from my writing, but I’ve always wanted to think it out loud. Please correct me in the comments if I’m wrong on anything here.
January 30, 2014
I’m reading a 2012 Google paper about Spanner. I saw Spanner described somewhere as “semi-relational” and wanted to read more. The paper is “Spanner: Google’s Globally-Distributed Database”.
Early on, on page 4, is this paragraph:
“Spanner’s data model is not purely relational, in that rows must have names. More precisely, every table is required to have an ordered set of one or more primary-key columns. This requirement is where Spanner still looks like a key-value store: the primary keys form the name for a row, and each table defines a mapping from the primary-key columns to the non-primary-key columns. A row has existence only if some value (even if it is NULL) is defined for the row’s keys. Imposing this structure is useful because it lets applications control data locality through their choices of keys.”
This made no sense to me. It’s not purely relational because every table needs a primary key? In relational theory, every relation is required to have at least one candidate key. Is this a confusion between “logical” relational theory and current implementations that allow duplicate rows in tables? Or is the point that it’s an *ordered* set?
To me, the not-relational part sounds like the fact that primary keys can include NULLs.
Or are they really referring to the fact that data are grouped somewhat hierarchically? (As explained later in the paper.) That would make more sense to me.
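(That hierarchical grouping is concrete in the paper’s own example schema – paraphrased here from memory of its Figure 4, so treat the exact syntax as approximate. Each Albums row is “named” by (uid, aid), and interleaving stores a user’s albums physically near the Users row with that uid – which is how key choice controls locality.)

```sql
CREATE TABLE Users (
  uid   INT64 NOT NULL,
  email STRING
) PRIMARY KEY (uid), DIRECTORY;

CREATE TABLE Albums (
  uid  INT64 NOT NULL,
  aid  INT64 NOT NULL,
  name STRING
) PRIMARY KEY (uid, aid),
  INTERLEAVE IN PARENT Users ON DELETE CASCADE;
```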
Anyway, those first three sentences confuse me. But I’m new to a lot of this. I’m just a simple caveman. Your modern ways confuse me. I’m not arguing that Spanner is purely relational, just saying that I don’t get those first three sentences. Maybe someone can explain them to me.