I’m reading a 2012 Google paper about Spanner. I saw Spanner described somewhere as “semi-relational” and wanted to read more. The paper is “Spanner: Google’s Globally-Distributed Database”.
Early on, on page 4, is this paragraph:
“Spanner’s data model is not purely relational, in that rows must have names. More precisely, every table is required to have an ordered set of one or more primary-key columns. This requirement is where Spanner still looks like a key-value store: the primary keys form the name for a row, and each table defines a mapping from the primary-key columns to the non-primary-key columns. A row has existence only if some value (even if it is NULL) is defined for the row’s keys. Imposing this structure is useful because it lets applications control data locality through their choices of keys.”
This made no sense to me. It’s not purely relational because every table needs a primary key? In relational theory, every relation is required to have at least one candidate key. Is this a confusion between “logical” relational theory and current implementations that allow duplicate rows in tables? Or is the point that it’s an *ordered* set?
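To spell out the candidate-key point for myself, here is a small Python sketch. It is only my own illustration, not anything from the paper: `is_key`, the column names, and the sample rows are all made up. A relation is a set of tuples, so the full list of columns always identifies a row; a SQL-style table that allows duplicate rows behaves like a bag, and there the same check can fail.

```python
from collections import Counter

def is_key(rows, columns, candidate):
    """Return True if no two rows agree on every column in `candidate`."""
    idx = [columns.index(c) for c in candidate]
    projected = Counter(tuple(row[i] for i in idx) for row in rows)
    return all(count == 1 for count in projected.values())

columns = ("uid", "email")  # hypothetical columns

# A relation in the theoretical sense: a set, so duplicates cannot exist.
relation = {(1, "a@example.com"), (2, "b@example.com")}

# A SQL-style table with no declared key: a bag, so duplicates are allowed.
sql_table = [(1, "a@example.com"), (1, "a@example.com")]

# Using all columns as the candidate: always a key for a set, not for a bag.
relation_has_key = is_key(relation, columns, columns)
sql_table_has_key = is_key(sql_table, columns, columns)
```

So in the theory a key of some kind always exists, which is why “every table must have a primary key” doesn’t sound like a departure from it to me.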
To me, the not-relational part sounds like the fact that primary keys can include NULLs.
Or are they really referring to the fact that data are grouped somewhat hierarchically? (As explained later in the paper.) That would make more sense to me.
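Here is how I picture the “mapping from the primary-key columns to the non-primary-key columns” and the data-locality remark, as a toy Python sketch. This is my own mental model, not how Spanner is implemented: `ToyTable`, `put`, `get`, `scan_prefix`, and the column names are all hypothetical. The idea is that if rows are kept sorted by an ordered key, rows sharing a key prefix end up next to each other.

```python
from bisect import bisect_left, insort

class ToyTable:
    """Toy model: an ordered mapping from primary-key tuples to non-key columns."""

    def __init__(self, key_columns):
        self.key_columns = tuple(key_columns)  # ordered primary-key columns
        self._keys = []   # primary-key tuples, kept in sorted order
        self._rows = {}   # primary-key tuple -> dict of non-key columns

    def put(self, key, **non_key_columns):
        if len(key) != len(self.key_columns):
            raise ValueError("key must supply every primary-key column")
        if key not in self._rows:
            insort(self._keys, key)  # keep keys sorted on insert
        self._rows[key] = non_key_columns

    def get(self, key):
        return self._rows[key]

    def scan_prefix(self, prefix):
        """Yield (key, row) for every key starting with `prefix`, in key order."""
        start = bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if key[:len(prefix)] != prefix:
                break
            yield key, self._rows[key]

# Hypothetical example: albums keyed by (uid, aid), inserted out of order.
albums = ToyTable(["uid", "aid"])
albums.put((2, 1), name="beach")
albums.put((1, 2), name="cats")
albums.put((1, 1), name="dogs")

# All of user 1's albums sit together in key order.
user_1_albums = list(albums.scan_prefix((1,)))
```

Because the key is ordered and starts with `uid`, everything belonging to one user is contiguous, which is at least how I read “control data locality through their choices of keys.”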
Anyway, those first three sentences confuse me. But I’m new to a lot of this. I’m just a simple caveman. Your modern ways confuse me. I’m not arguing that Spanner is purely relational, just saying that I don’t get those first three sentences. Maybe someone can explain them to me.