Thanks to Google’s insatiable appetite for scanning and serving up documents, you can take a trip back to a 1994 issue of Computerworld magazine, read about Windows NT, cc:Mail, Pentium processors, and see an ad for Oracle CDE featuring yours truly looking over his shoulder. (Remember Oracle CDE?) That was my first Oracle DBA gig, and the team received a Smithsonian Award for a global client/server customer information system we built on Oracle 7 (and DB2, via Oracle Transparent Gateway).
Wikipedia says that “Relational calculus is essentially equivalent to first-order logic, and indeed, Codd’s Theorem had been known to logicians since the late 1940s.” I couldn’t find the cited sources online, but did find these interesting papers:
“Tarski’s influence on computer science.” (see the section starting “The final thing I want to tell something about is the connection of Tarski’s ideas and work with database theory.”)
If you’ve studied mathematical logic (or math generally, e.g., topology), you are probably familiar with Tarski’s name. The historical development of mathematical logic and relational database theory is an interesting topic I’d like to understand better. Codd’s 1970 paper is fairly readable, and this 1972 paper is related to the correspondences between the different approaches taken.
As a MySQL support engineer, I see this so often that I felt it could help to write a post about it.
Customers contact us to ask about replication lag – i.e., a slave is very far behind the master and not catching up. (“Very far” meaning hours behind.)
The most common reason I encounter is databases having InnoDB tables without explicit primary keys. Especially if you are using row-based replication (RBR), you want explicit primary keys on all your tables; otherwise, the slave will scan the entire table for each row that is updated. (See bug 53375.) Maybe I’m a relational purist, but why would you want tables without explicit primary keys, anyway? (On the other, less-purist hand, for performance reasons a short surrogate PK is sometimes preferable to a lengthy logical one.)
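To see whether you’re affected, you can hunt for InnoDB tables with no explicit primary key via information_schema. A sketch – the schema and table names in the ALTER are made up for illustration:

```sql
-- Find InnoDB tables that lack an explicit PRIMARY KEY
-- (a common cause of RBR slave lag).
SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
  ON  c.table_schema    = t.table_schema
  AND c.table_name      = t.table_name
  AND c.constraint_type = 'PRIMARY KEY'
WHERE t.engine = 'InnoDB'
  AND t.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema')
  AND c.constraint_name IS NULL;

-- Then give each offending table a key, even a short surrogate one
-- (mydb.events is a hypothetical example):
ALTER TABLE mydb.events
  ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;
```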
The other common reason is that the slave is single-threaded, and its single-threaded performance can’t keep up with the multi-threaded master. In this case, if multiple databases are being updated, enabling the multi-threaded slave can help. (See the manual for more.)
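On MySQL 5.6 and later, the multi-threaded slave is enabled with slave_parallel_workers (in 5.6, the parallelism is per-database, which is why it only helps when multiple schemas see writes). The worker count here is just an illustration – size it to your workload:

```sql
-- On the slave: slave_parallel_workers is dynamic, but the SQL thread
-- must be restarted for the change to take effect.
STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_workers = 4;  -- hypothetical value; 0 disables it
START SLAVE SQL_THREAD;
```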
Nothing new or earth-shaking here, but I hope this helps someone who is googling “MySQL replication lag”.
For those of you who would like a copy of the slides from my webinar, they are now online at SlideShare. Thanks again if you attended!
I wasn’t able to answer all the questions that were submitted at the end, so I’m collecting those, and more from your emails, for the next post: Q&A.
If you attended my webinar, MySQL for Oracle DBAs, thank you! I hope you enjoyed it.
We ran out of time while I was answering questions for Q&A, so I apologize if I didn’t answer your question. Please email any questions to me at ben-dot-krug-at-oracle-dot-com. Also, if you were asking for a copy of the slides (or would like to ask), also email me – I’ll be happy to send them!
A quick question, which I’ve asked once before –
if you were an Oracle DBA, then learned MySQL, what do you wish you knew before you got started?
Also, what was helpful to you as you learned? (Websites, #mysql on irc, documents, etc.) What do you wish you had (or want to have now) if you are a DBA for both Oracle databases and MySQL databases?
I.e., what would be good to give an Oracle DBA who wants to start learning or supporting MySQL as well? Please respond with comments here, or email me directly at ben-dot-krug-at-oracle-dot-com.
I gots to know…
Big web companies who use MySQL – what isolation level(s) do you use? Do you take the default? Do you think about it at all? And if you use, e.g., memcached, what do you do there? Does your code take an optimistic or a pessimistic approach to locking?
I’ve been reading up on isolation levels after seeing Peter Bailis’ post on the topic (thanks to Mark Callaghan for re-tweeting about it).
I’m learning things… here’s one for today –
I always thought that the “repeatable” in repeatable read applied to the data in general – meaning that a query run twice in a transaction would “repeat” the same results. It seems it is actually supposed to apply to individual records; i.e., other transactions’ updates to an existing row will not affect our view of that row. According to ANSI SQL-92, apparently, repeatable read does allow the appearance of “phantom” (newly inserted) rows. (MySQL seems to be less permissive than that.)
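A sketch of what SQL-92 permits at REPEATABLE READ, with two concurrent sessions (the table t and column k are made up):

```sql
-- Session 1                              -- Session 2
START TRANSACTION;
SELECT COUNT(*) FROM t WHERE k > 10;      -- say this returns 5
                                          INSERT INTO t (k) VALUES (42);
                                          COMMIT;
SELECT COUNT(*) FROM t WHERE k > 10;
-- SQL-92 REPEATABLE READ allows this second read to return 6 (a "phantom"
-- row appeared); InnoDB's consistent snapshot still returns 5 here for
-- plain SELECTs, which is the "less permissive" behavior I mentioned.
COMMIT;
```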
But enough of my talking – I’m going back to my reading. So, big MySQL users – let me know your thoughts on isolation levels, application-wide. Do you care about them? How do you enforce, or not enforce them at an application level, and why?
Here’s something that used to make my head spin a bit… A quick quiz: does undo generate redo? Does redo generate undo?
When I heard that undo generates redo, it threw me for a loop. But undo is stored in undo segments as regular data, and changes to regular data generate redo. Then I thought: OK, does redo generate undo? If so, we’re obviously in a vicious cycle. So, no. Why?
We need redo for undo so that if there is a crash after some uncommitted data blocks have been written to disk, but the corresponding undo hasn’t been, we can recreate the undo from the redo and then use it to roll back the uncommitted changes. (Corrections welcome if I’m inaccurate – this is all from my head.)
We don’t need undo for redo. After a crash, an incomplete transaction gets rolled back using undo; but redo itself is just a log of changes, not data we ever roll back. If a change never made it into the redo stream, then as far as recovery is concerned it never happened, so there is nothing to undo. The redo logs are special files outside of the database proper.
So, this may all be clear as mud from my writing, but I’ve always wanted to think it out loud. Please correct me in the comments if I’m wrong on anything here.
I’m reading a 2012 Google paper about Spanner. I saw Spanner described somewhere as “semi-relational” and wanted to read more. The paper is “Spanner: Google’s Globally-Distributed Database”.
Early on, on page 4, is this paragraph:
“Spanner’s data model is not purely relational, in that rows must have names. More precisely, every table is required to have an ordered set of one or more primary-key columns. This requirement is where Spanner still looks like a key-value store: the primary keys form the name for a row, and each table defines a mapping from the primary-key columns to the non-primary-key columns. A row has existence only if some value (even if it is NULL) is defined for the row’s keys. Imposing this structure is useful because it lets applications control data locality through their choices of keys.”
This made no sense to me. It’s not purely relational because every table needs a primary key? In relational theory, every relation is required to have at least one candidate key. Is this a confusion between “logical” relational theory and current implementations that allow duplicate rows in tables? Or is the point that it’s an *ordered* set?
To me, the not-relational part sounds more like the fact that primary keys can include NULLs.
Or are they really referring to the fact that data are grouped somewhat hierarchically? (As explained later in the paper.) That would make more sense to me.
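For reference, here is roughly what that hierarchical grouping looks like in Spanner’s schema language – paraphrased from memory of the paper’s example (check the paper itself for the exact syntax):

```sql
-- Child rows of Albums are physically interleaved with their parent
-- Users row, so the choice of primary key controls data locality.
CREATE TABLE Users {
  uid INT64 NOT NULL, email STRING
} PRIMARY KEY (uid), DIRECTORY;

CREATE TABLE Albums {
  uid INT64 NOT NULL, aid INT64 NOT NULL,
  name STRING
} PRIMARY KEY (uid, aid),
  INTERLEAVE IN PARENT Users ON DELETE CASCADE;
```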
Anyway, those first three sentences confuse me. But I’m new to a lot of this. I’m just a simple caveman; your modern ways confuse me. I’m not arguing that Spanner is purely relational – just saying that I don’t get those first three sentences. Maybe someone can explain them to me.
It’s been a while since I posted about Day 1 and Day 2 of the MySQL for Beginners online Training on Demand course offered by Oracle (http://education.oracle.com). Yesterday was 90 days from my first posting – 90 days being how long you get access to the course!
Day 3 is a whirlwind tour of some further aspects of SQL. Here’s what it covers:
Introduction to Data Manipulation Language (DML) in SQL
– introduces INSERT, REPLACE, UPDATE and DELETE statements and their syntax and uses
Intro to Functions
– string, temporal, numeric, control-flow, and aggregate functions
Intro to Joins
– describes Cartesian, inner, and outer joins – their syntax and uses
Intro to Subqueries
– types and placements of subqueries
– how and when to convert to joins
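The DML statements above are quick to try out; here’s a toy example (the pets table is made up for illustration):

```sql
-- A hypothetical table to exercise each DML statement from the lesson.
CREATE TABLE pets (id INT PRIMARY KEY, name VARCHAR(30), age INT);

INSERT INTO pets VALUES (1, 'Rex', 3), (2, 'Mia', 5);
REPLACE INTO pets VALUES (2, 'Mia', 6);      -- deletes old row 2, inserts a new one
UPDATE pets SET age = age + 1 WHERE id = 1;  -- modifies the row in place
DELETE FROM pets WHERE name = 'Rex';
```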
This is intended for beginners, and if you know the material, you might look at the headings and think “I know this” and skip it. Personally, I always like to review, and try to see what I do and don’t know. While some of this material was covered at a high level, there was enough detail that I learned some things, and found several stepping-off points to research and learn more. Here are some of my own notes on things I learned more about:
First, when I read about REPLACE, I did some tests to make sure I understood how it worked. From the course, we know that “it is possible for a single row to replace more than one old row if the table contains multiple unique constraints.” I haven’t used REPLACE in much detail, so I created some tables with a primary key plus two other unique, multi-column constraints, and did some playing around. For example, I used a single-row REPLACE to get three rows replaced by the one new row. (I also refreshed my memory on how REPLACE works: by deleting and then inserting entire rows, as opposed to UPDATE, which modifies them in place.)
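A simplified version of that experiment, using single-column unique keys to keep it short (the table and values are contrived):

```sql
-- A table with a PK and two other unique constraints.
CREATE TABLE r (
  id INT PRIMARY KEY,
  u1 INT, u2 INT,
  UNIQUE KEY (u1),
  UNIQUE KEY (u2)
);
INSERT INTO r VALUES (1, 10, 100), (2, 20, 200), (3, 30, 300);

-- This new row conflicts with row 1 on id, row 2 on u1, and row 3 on u2,
-- so all three old rows are deleted and one new row is inserted:
REPLACE INTO r VALUES (1, 20, 300);
-- Reports 4 rows affected (3 deletes + 1 insert); the table now has one row.
```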
I enjoyed the introduction to functions – I always find it good to review the built-in functions, and always find some functions I’d forgotten about, never really used, or didn’t even know existed. I got practice on the syntax and usage of some of the less common functions.
For one of my stepping-off points, I did some more reading on ROUND() and how it deals with numbers ending in .5, depending on whether the datatype is an exact-value datatype, or an approximate-value datatype. (You may notice a trend in my interests, if you read Day 2.)
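The exact-vs-approximate distinction is easy to demonstrate, since a decimal literal is exact and a scientific-notation literal is approximate:

```sql
SELECT ROUND(2.5), ROUND(2.5E0);
-- ROUND(2.5)   -> 3   (DECIMAL literal: exact value, rounds half away from zero)
-- ROUND(2.5E0) -> 2   (DOUBLE literal: approximate value, uses the C library's
--                      round-half-to-even behavior)
```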
Then, in the intro to joins, I found out, for example, that joining with “ON” is not exactly the same as joining with “USING”: “USING” coalesces the join column into a single output column, whereas “ON” keeps both tables’ copies. Here’s an example. Table j1 has rows (1,'j1.1') and (2,'j1.2'), and table j2 has (1,'j2.1') and (2,'j2.2').
mysql> SELECT * FROM j1 JOIN j2 USING (a);
+---+------+------+
| a | b    | b    |
+---+------+------+
| 1 | j1.1 | j2.1 |
| 2 | j1.2 | j2.2 |
+---+------+------+
2 rows in set (0.00 sec)

mysql> SELECT * FROM j1 JOIN j2 ON (j1.a=j2.a);
+---+------+---+------+
| a | b    | a | b    |
+---+------+---+------+
| 1 | j1.1 | 1 | j2.1 |
| 2 | j1.2 | 2 | j2.2 |
+---+------+---+------+
2 rows in set (0.00 sec)
In the intro to subqueries, I reviewed correlated and uncorrelated subqueries, and where subqueries can be used (in the SELECT list, the FROM clause, or the WHERE clause). Also, the possible result types (scalar, row, column, and table) and how the different result types can be used in queries (e.g., for comparisons, in functions, etc.). These are fairly simple concepts, but somehow they keep slipping around in my mind, and reviewing them is very helpful to me.
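To pin those categories down for myself, here’s a sketch using a hypothetical emp table (name, salary, dept – all made up):

```sql
-- Scalar subquery in the SELECT list, and a correlated subquery in the WHERE:
SELECT name,
       (SELECT MAX(salary) FROM emp) AS top_salary   -- scalar: a single value
FROM emp e
WHERE salary > (SELECT AVG(salary)                   -- correlated: re-evaluated
                FROM emp                             -- against each outer row's dept
                WHERE dept = e.dept);

-- A table subquery in the FROM clause (a "derived table"):
SELECT d.dept, d.avg_sal
FROM (SELECT dept, AVG(salary) AS avg_sal FROM emp GROUP BY dept) AS d;
```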
So, the course covers a lot of ground, but also gives enough detail to chew on and learn from. Having it available for 90 days gives a lot of good time for chewing. I also find it useful in this regard that you get the streaming video of the classroom course, with captioning available, plus the PDF student guide and activity guide – sometimes I prefer one or the other, and for reviewing and “chewing”, the guides are particularly helpful. This course covers a lot of material, and I’m glad I get to review it! Stay tuned for Day 4…