XML vs. Table DBs [was: Re: [nycphp-talk] Many pages: one script]
Kenneth Downs
ken at secdat.com
Wed Aug 29 07:51:23 EDT 2007
Elliotte Harold wrote:
> Kenneth Downs wrote:
>
>> Select title
>> ,SUBSTRING(text ...insert regexp here...)
>> from chapters
>> where book_name = 'XML in a Nutshell'
>>
>
> Regexps can't do that though. Regular expressions are an insufficiently
> powerful tool for processing XML. Trying to do that is just a world of
> pain.
????
The example shows a query of a table, not XML. The purpose is to
demonstrate with a quick snippet that all examples of a supposedly
indispensable need for the "XML Database" stem from ignorance of the
abilities of other tools.
Say that you prefer XML, say that you like it, say that you are used to
using it, but don't say that it is a fundamental requirement of the data
itself because it just ain't so.
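
To make the quoted query concrete, here is a minimal sketch using Python's
built-in sqlite3 module. The table layout, the sample rows, and the
regexp_substr helper are all made up for illustration (SQLite has no
regexp-based SUBSTRING of its own, so the sketch registers one). It only
pulls a pattern out of a plain text column; it does not parse XML.

```python
import re
import sqlite3


def regexp_substr(text, pattern):
    """Return the first match of pattern in text, or None if no match."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


conn = sqlite3.connect(":memory:")
# Register the hypothetical helper so SQL statements can call it.
conn.create_function("regexp_substr", 2, regexp_substr)

conn.execute("CREATE TABLE chapters (book_name TEXT, title TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO chapters VALUES (?, ?, ?)",
    [
        # Invented sample rows, not real book content.
        ("XML in a Nutshell", "Chapter A", "See page 42 for details."),
        ("XML in a Nutshell", "Chapter B", "No reference here."),
        ("Another Book", "Preface", "See page 7."),
    ],
)

rows = conn.execute(
    "SELECT title, regexp_substr(text, ?) "
    "FROM chapters WHERE book_name = ? ORDER BY title",
    (r"page \d+", "XML in a Nutshell"),
).fetchall()

for title, excerpt in rows:
    print(title, excerpt)
```

A server such as PostgreSQL has a pattern-matching substring built in, so
the helper would not be needed there.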
>
>> Rusty, you appear to be arguing from ignorance, very unusual coming
>> from you.
>
> Funny how you confuse different experiences with ignorance. Have you
> ever worked in publishing? Or in library science? Or on anything that
> operates at web scale like Yahoo or Google? There are many use cases
> where a couple of months of hard labor will rapidly disabuse anyone of
> the belief that relational databases are the one true solution to all
> problems. Your career just happens not to have taken you down those
> paths yet.
My observation on your arguments stems from your repeatedly ignoring
obvious examples of where tables do just fine to store data, and from
the claim that 80% of the world's apps need an XML database.
If you have gotten used to using XML for text, then say so. If you like
it, then say so. Don't say it is the only tool available because it is
not. It has many very serious drawbacks, verbosity being the very
first, not to mention the confounding of structure and implementation,
encouraging the illusion of "structureless" data, and so on.
>
>> The true difference between us in this argument is that I understand
>> that I have a prejudice for relational over hierarchical, based on my
>> knowledge and use of both, and based on judgment calls as to how to
>> get through the day. I daresay however that you are promoting a
>> religious favoring of XML w/o a working knowledge of the alternatives.
>
> Ken, you know me. Do you really think I don't know the relational
> model or what it's good for? I use relational databases all the time,
> and I'm using them now. However unlike you I've hit their limits.
> While I'm sure many people can profitably spend their life doing
> nothing but relational databases, I happen to be working on
> applications where neither the relational model nor the actual SQL
> databases out there can come close to managing my data. I've never
> said that all applications should use XML databases or other
> non-relational systems. You keep trying to put those words into my
> mouth. I do say that some applications, especially in publishing and
> web publishing, do not fit the relational model well and can be
> better served by XML databases.
I do know you, and that is why I was struck by your pro-XML stance for
"80% of applications", in which you must either be ignorant of what most
applications really need, or what modern RDBMS's can do, or both.
Forget about EF Codd and the relational model for a moment, let's just
look at the real products that have come along, the table-based servers
we call RDBMS's. These have all solved the very basic issues of data
storage. Most of their power comes from so-called "ACID" compliance,
the ability to allow multiple simultaneous users to access a data store
with assurances of predictable behavior. Your XML databases must solve
these same issues.
What about security? The modern RDBMS defines security on all objects.
Your XML databases will have to provide the ability to define security
on the complete tree. (By the way, I'm sure they'll get there, just keep
reading).
But there is one aspect of the relational model where XML, as a format,
takes a huge leap backward. Codd realized the incredible productivity
gains that could be had if a programmer could access data by name and
not worry about its internal storage structure. He separated the
implementation from the interface. XML, as a format (file, data,
whatever), confounds these two. It is a verbose format for hierarchical
data. There are better formats for nearly all uses.
Here's the clincher. Let's say the XML database grows up and has all of
these things. On that day the only thing it will have in common with
XML is a hierarchical model; the XML format itself will be the first to
go. The ability to accept XQuery statements will be a historical
footnote, and people will end up hating XQuery as much as they hate SQL
(everybody's least favorite part of the RDBMS world). These databases
will end up supporting output formats such as YAML, JSON, and others, and
probably inputs as well. There is just not a lot in the XML format that
really contributes to data storage.
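
As a rough illustration of the point that the hierarchy is separate from
the format, the sketch below writes one nested structure out as both JSON
and XML using only the Python standard library. The book data and the
element and attribute names are invented for the example.

```python
import json
import xml.etree.ElementTree as ET

# One hierarchical structure; the contents are made-up sample data.
book = {
    "title": "Sample Book",
    "chapters": [
        {"title": "One", "pages": 10},
        {"title": "Two", "pages": 12},
    ],
}


def to_json(data):
    """Serialize the hierarchy as JSON text."""
    return json.dumps(data, sort_keys=True)


def to_xml(data):
    """Serialize the same hierarchy as XML text (tag names are assumptions)."""
    root = ET.Element("book", title=data["title"])
    for chapter in data["chapters"]:
        ET.SubElement(
            root, "chapter", title=chapter["title"], pages=str(chapter["pages"])
        )
    return ET.tostring(root, encoding="unicode")


print(to_json(book))
print(to_xml(book))
```

Both outputs carry the same tree; only the notation differs.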
We can thank XML for making us conscious of the ubiquitous need for
hierarchical data. I use it all of the time. Personally I store my
database definitions in YAML, a hierarchical data format that is human
readable/writable (unlike XML) as well as machine readable/writable.
My programs return hierarchical data from AJAX requests as JSON, because
that's what the browser works best with, and all of my PHP programs
handle all data universally as associative arrays, which are just
hierarchical data in yet another disguise. I love hierarchies, but have
no use for a format that is not human readable/writable, which is
incredibly verbose, and which
So when I say you are arguing from ignorance, I am saying that you are
generalizing your own experience with heavy-duty text management, and
since you have never mentioned any of the topics above, you may not have
the entire picture.
Now, to your point about my own limited experience, I picked a path some
years ago that has made me an expert in some areas and ignorant in
others. But I don't go claiming that "80% of the world's applications
cannot use RDBMS". In fact, the examples you raise are all examples of
text management. This is a new area that the RDBMS was never intended
to solve. Many people have found it easily possible to extend the RDBMS
in a few areas, but others (such as you) are saying we need to start
over. But it is amusing that the look-again crowd has started over with
hierarchical data. In the end it won't be the format that is used, but
the basic abilities to manage and store text. I submit that the clear
solution has yet to emerge from that pursuit.
>
>> You simply cannot defend a file format as a foundation for frameworks
>> and databases. The best you can do is defend the model, such as the
>> hierarchical model.
>
> XML is not a file format. We've been down this road before. A native
> XML database is no more based on a file format than MySQL is based on
> tab delimited text.
But you are not saying what it is based upon. My statements above about
ACID compliance, security, and separation of implementation from
interface provide a basis for a database. The structure of the data is
given by tables. This makes a complete system.
If you cannot provide the basis for the entire picture of data
management, we are left with what the XML books tell me: how to format
the file.
>
>> Going further, you cannot defend a file format as a foundation for
>> anything based on how it handles large text (or binary) fields.
>> There are three issues here:
>>
>> -> Data model, hierarchical vs. relational.
>> -> File format, XML vs. YAML or JSON or any other format you like
>> -> Handling of large text (and binary) columns.
>>
>> Finally, if we can all admit that XML is just a file format, then the
>> entire framework crumbles as soon as somebody comes up with a better
>> one, because let's admit it, XML is just about the worst you're going
>> to find.
>
> Troll. Troll. Troll.
???? Geez Rusty, come on. My conclusion is worded harshly, yes, but do
you really label as a troll a description of the larger issues of
formats, data models, and everything else that makes up the larger picture?
>
>
>> In conclusion, the examples you provide appear to give advantage to
>> XML because tools exist to handle data that has been buried in opaque
>> formats and poorly defined structures. If the data had been
>> structured properly in the first place and put into formats that were
>> not so opaque, using (pardon me for saying) a *real* database,
>> designed on solid principles, the examples you give become child's play.
>
> LOL. Seriously, try storing a book or an encyclopedia in a relational
> database with anything approximating 1NF, not even 2NF. Then try and
> make it perform adequately.
>
> Not all data fits neatly into tables.
>
Actually most data does not, not at first glance. But since a table is
simply a mapping of properties to entities, it turns out that most data
does when you look at it closely. It takes about the same effort as
deciding upon a set of tags, since it is of course exactly the same process.
The crucial question is, does your book have structure? Can you make up
tags as you go or are you limited to a pre-defined set, such as DocBook?
Once you commit to a specific set of tags, you have committed to a
structure, and you may as well use tables as anything else. Methinks
however that at this point it comes down to what you are comfortable
with. If you want to use XML, go for it; if you want to use tables, go
for it. Just don't confuse the structure of the data with a fundamental
need for either system.
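
For what it's worth, here is a small sketch of what committing to a fixed
set of tags looks like as tables, again with Python's sqlite3. The two-level
schema (chapters containing paragraphs), the column names, and the sample
rows are assumptions for illustration only. It covers a simple fixed
nesting and says nothing about performance or about mixed inline markup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one table per "tag", with a position column for order.
conn.executescript(
    """
    CREATE TABLE chapters (
        chapter_id INTEGER PRIMARY KEY,
        book_name  TEXT NOT NULL,
        position   INTEGER NOT NULL,
        title      TEXT NOT NULL
    );
    CREATE TABLE paragraphs (
        chapter_id INTEGER NOT NULL REFERENCES chapters(chapter_id),
        position   INTEGER NOT NULL,
        body       TEXT NOT NULL,
        PRIMARY KEY (chapter_id, position)
    );
    """
)
conn.executemany(
    "INSERT INTO chapters VALUES (?, ?, ?, ?)",
    [(1, "Sample Book", 1, "Start"), (2, "Sample Book", 2, "End")],
)
conn.executemany(
    "INSERT INTO paragraphs VALUES (?, ?, ?)",
    [(1, 1, "First."), (1, 2, "Second."), (2, 1, "Last.")],
)


def load_book(connection, book_name):
    """Rebuild the chapter/paragraph hierarchy from the flat rows."""
    book = {"book": book_name, "chapters": []}
    by_id = {}
    query = """
        SELECT c.chapter_id, c.title, p.body
        FROM chapters c
        JOIN paragraphs p ON p.chapter_id = c.chapter_id
        WHERE c.book_name = ?
        ORDER BY c.position, p.position
    """
    for chapter_id, title, body in connection.execute(query, (book_name,)):
        if chapter_id not in by_id:
            by_id[chapter_id] = {"title": title, "paragraphs": []}
            book["chapters"].append(by_id[chapter_id])
        by_id[chapter_id]["paragraphs"].append(body)
    return book


book = load_book(conn, "Sample Book")
print(book)
```

The nested dictionary that comes back is the same hierarchy the tags would
have described, reassembled with one join.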
--
Kenneth Downs
Secure Data Software, Inc.
www.secdat.com www.andromeda-project.org
631-689-7200 Fax: 631-689-0527
cell: 631-379-0010