[comp.text.sgml] On Life, Death, and Other Matters (long)

	Best regards, -- Boris.

---- Begin included message ----
Hi all,

Some days ago, I privately replied to a question Brad McCormick posted on
comp.text.sgml: "Is SGML a dead language?" Brad asked me if I would be
willing to have what I wrote publicly distributed, since he felt it had
value as a contribution to the history of computing. I realized that my
initial response was neither clear nor well argued, and that Brad's question
was worth something better (furthermore, a recent posting from Bill Trippe
shows that the question is still bothering people). May I request once again
your kind attention, despite the additional inconvenience that this message
is considerably longer than my previous one?
Thank you in advance. Here we go.

[I thank Brad McCormick for many editorial corrections and useful
suggestions. However, any mistake, misunderstanding, or otherwise
controversial statement, must be considered mine.]

                        XML: THE BIG SECRET

Many people in the SGML community have had their attention caught recently
by a number of apparently unrelated signals from organizations, journals,
individuals, and software vendors. A paper in <TAG> about the end of SGML; a
new peer newsgroup next to comp.text.sgml; SGML conferences renamed; SGML
products re-packaged. That's not very much. That's enough to trigger lots of
rumors, and this is neither new nor important. That's enough, however, to
raise a number of legitimate questions, because the above-mentioned faint
signals are just the visible part of a dramatic change in the SGML world.

This change has a name: it's XML [eXtensible Markup Language]. I don't care
for the time being if XML is a plain SGML subset; or if it is an entirely
new thing under SGML disguises. XML's birth is of utmost importance because
it introduces completely new business perspectives and cultural paradigms
into both the SGML world and the Web world. Regardless of its technical
essence, XML is an economic and cultural *change*. It is not an evolution.
Hence the questions, hopes, fears, and rumors, around it.


Information systems have their own evolution history, made of a succession
of small, discrete steps. From time to time, however, an evolutionary step
is understood, often in retrospect, as having had a much larger stride and
impact than others, and above all, as being of a very different nature. Such
evolutionary steps are actually breaks in an otherwise nearly linear
evolution. They are *changes*, "revolutions" if you prefer. Such breaks
change the way people think about information systems; they change the way
information systems are applied to human endeavours; and of course, they
change the economic deal in the information systems industry.

True changes in information systems have not been so many. In the recent
past, I can see three of them: cheap and powerful personal computers in the
eighties; new user's operating modes (GUIs) in the nineties; a worldwide
network of computing systems today. Please observe that I am speaking of
economic and cultural changes, not of technical ones. Economic and cultural
changes may become real years after a technical revolution made them
possible: it took 25 years to turn the Internet into a mass medium.

Observe, too, that it would be over-simplistic to chop the information
systems history into successive, disjoint "eras". The birth of a new
economic and cultural paradigm does not imply that the old ones disappear.
Rather, they co-exist, at least for a period of time, which can moreover be
very long. Old information systems become socially deprecated as "old"
precisely because new paradigms exist to evaluate them, both
economically and culturally; conversely, "new" information systems become
praised because there is a social consensus about them as being
materializations of the new paradigms. If you can manage with
over-simplifications, then "new" means "good" to "progressive" people, while
it means "bad" to "conservative" ones.

Co-existence is not peace. Economic pressures with opposite goals are at
work. On the vendors' side, the competition is between "progressive" and
"conservative" products and services; between different "progressive"
vendors; and between "conservative" vendors as well, who would compete about
what and how to be conservative about! On the customers' side, from line
employees to top management, different and often opposite interests are
conflicting inside a single company or organization, regarding investment
policies, users' training, operating and maintaining costs, ergonomics, and
many more factors, up to and including personal motivations, tastes,
prejudices, feelings, and emotions.


Back to SGML. Since my first professional exposure to SGML, around 1990,
I've been fascinated by the way the SGML community has reacted to
paradigmatic changes in information systems: personal computers,
user-friendly interfaces, and now the Web. Symmetrically, I've been
interested too in the way other people in the information systems milieu
reacted to the SGML phenomenon. Both "reactive systems" originated in the
conditions in which SGML was put to work ten years ago.

At that time, SGML lived on mainframes and, more rarely, on mini-computers.
Expensive and organization-centric machines, with very basic GUIs or no GUI
at all. Yet at the same time personal computers had begun their massive
spread into work places. Cheap and... well, personal machines, with nice and
easy-to-use GUIs. In the office next door, people could compare both
environments, and make up their mind.

A conceptual gap was born. On one side, long-term investments, long and
intricate design phases, concerns in reusability, mistrust of proprietary
environments, expensive and slow training curves, data consolidation,
presentation issues viewed as a separate ancillary matter, focus on
structure and contents, etc. On the other side, fast short-term investment
turn-over, quick-and-dirty design phases, little reusability concerns, love
for proprietary systems, fast and cheap training curves, data dispersal,
presentation as the central issue, structure and contents haphazardly
derived from presentation.

The spread of personal computers, GUIs, and extremely sophisticated low-end
publishing software introduced conflicting values into the companies and
organizations who were using SGML, or who planned to use it. SGML became, in
many situations, extremely difficult to sell. Whizz-bang software paradigms
had so much polluted people's, including managers', perceptions, judgements,
and prejudices, that the lethal argument against SGML amounted to this: it
was not *immediately rewarding*, and especially not *visually rewarding*. Of
course, a manager could not state such silliness so crudely. Rather, he
would argue about "ergonomics", "learning curves", and so forth.

I don't mean that technical issues are unimportant. To the contrary,
technical improvements can make new solutions possible and desirable, they
can make existing ones cheaper and easier, they can boost quality and
shorten delays. However, human beings always evaluate technical matters
through a complex filtering process, in which rational thinking (e.g.
carefully weighing economic implications) interacts with irrational
backgrounds and impulses (e.g. blindly sticking to a software brand).
Collective evaluation processes -- and decisions -- may be even more

I am not arguing that personal computers and GUIs are inherently bad and
useless. The point here is that their introduction into mass markets has
been something radically new. Computer users had never been exposed before
to such marketing pressures. Marketing hype turned everybody into a computer
expert. Rationality levels have dropped dramatically regarding the
evaluation of information systems: just listen to any of your colleagues (or
yourself!) talking about her/his new laptop...

Another widespread argument against SGML has been that it is "complex and
difficult". Yes, SGML is complex and difficult. MacOS is complex and
difficult. The MS Windows API is complex and difficult. POSIX + sockets +
TCP/IP + PPP + POP3 + MIME is complex and difficult. Anything powerful and
generic is likely to be complex and difficult. Again, deciding to stay away
from SGML and to use InterLeaf + Lisp, say, instead, is a matter of human
choice. Such a choice can be defensible and make good sense. What is
irrational is to argue about SGML difficulty and complexity on this
occasion, thus inferring that InterLeaf + Lisp is simple and easy.

Somehow, many SGML users feel frustrated and guilty: They missed the First
and Second Information Systems Revolutions: the exploding needs for cheap
and powerful personal computers and new user's operation modes (GUIs). The
SGML community is suffering a kind of "original sin" syndrome. I don't mean
that SGML users are actual sinners in any respect! I mean that there is a
*social feeling* (this implies a fundamental degree of irrationality) within
the SGML community, and among the broader computing milieu regarding the
SGML community, a social feeling of *lateness* about SGML, SGML providers,
and SGML users. I know very well that SGML users now have their own
whizz-bang software. But again, please note that I'm talking about something
irrational: feelings. Nevertheless feelings are real. They have actual
consequences, including economic ones.


Now the Third Revolution is on the way. The Web. Electronic mass business.
The SGML community cannot afford to miss this one. Until now, successful
SGML software vendors had a sales figure in the thousands (of units sold)
worldwide (correct me if I'm wrong). SGML-on-the-Web means a jump to tens,
maybe hundreds, of thousands. Nobody wants to miss that.

Unfortunately, before putting SGML on the Web (actually, before putting
anything on the Web), SGML vendors had to overcome an absolute prerequisite:
a "nihil obstat" from Microsoft. You don't want to invest time and money in
a new Web product, if you know that Microsoft is cooking up its own
(remember Netscape?). You don't want to commit yourself and your customers
to a new protocol, language, format, standard, or whatever, if at Microsoft
there is a big no-no (remember Java?). That's a huge problem. That's the
*main* problem with SGML-on-the-Web. As you probably noticed, I do not think
this is primarily a technical problem.

This leads us to SGML-at-Microsoft, a seldom evoked topic. Microsoft
consistently ignored SGML. There have been many reasons for that. One of
them, the most unseemly, but having had measurable consequences, is that
SGML was said to come from IBM. It was almost a death sentence. This
particular misconception, regarding SGML, is a part of a more general trend
toward ignorance of foreign matters at Microsoft.

Whenever a company builds a strategy, this strategy ends up in some sort of
embodiment into people's brains in the company. That's not necessarily a bad
thing. But when the strategy mainly consists in fencing a fortified island,
trying to attract and enslave customers into the fence, and fighting very
hard to keep competitors away, the net result is *isolation*. Even highly
talented and open-minded people at Microsoft (there are several) had a
second-hand, rumor-based SGML knowledge. When the company eventually set up
a small SGML working group in 1995, SGML specialists at Microsoft remained,
so to speak, "in partibus".

Another reason for Microsoft's attitude toward SGML is that SGML is an ISO
standard, and Microsoft does not love standards. True standards mean a high
degree of user freedom. You can count the standards Microsoft adheres to:
fewer than ten. Microsoft reluctantly adopts a standard when doing otherwise
has become really impossible, or would prove commercially detrimental. For
example, Microsoft included TCP/IP and socket support into MS Windows years
after public domain and third party software had appeared for these
platforms. Microsoft did not suddenly fall in love with TCP/IP. It happened
that so many Microsoft customers had connection needs to Unix sites they
implemented with -- "horresco referens" -- non-Microsoft software, that
significant market losses for Windows NT became very likely.

The last reason is that, if Microsoft had had to consider doing SGML at
all, it would have been *against* the MS Office suite. Strategic nonsense.
SGML users typically don't throw away their tools every summer like bathing
suits, to buy new ones just because they are available. The MS Office suite
was (and still is) aimed at low- to middle-end desktop publishing, because
it is a true mass market with fast product renewal rates.

SGML paradigms do not apply there. One may well find the situation
unfortunate, but the fact is: desktop publishing software users rarely
bother about document reuse, content structuring, platform and vendor
independence, etc. They care about low costs, ease of use, rapid learning
curve, WYSIWYG presentation, "features". They are happy with built-in
templates they can use inconsistently. They are happy if they can swap their
files between MS Windows and MacOS. They are happy if they can cut and paste
across files (that's apparently what "document reuse" means to them). They
are happy if their printed pages look exactly like their displays. Full

NOTE - Everybody has a personal repertoire of Microsoft horror stories.
Here's my favorite: Three years ago, a small group of Microsoftees had
in-house SGML training. One of them decided to practice a bit. He tried
for days to shoehorn elements from the DocBook DTD into MS Word templates.
Eventually he achieved something not bad at all, if one considers the
initial challenges. The SGML evangelist then came in, looked at the job
done, and said: "Great, indeed! Let's try another DTD now." The hacker's
face expressed the utmost stupor and consternation: "Hey! Do you mean... do
you mean there are *many* SGML DTDs?" he asked.


Besides Microsoft, SGML-on-the-Web had another nightmare: HTML. We all
learned the First Web Axiom: "HTML is an SGML application". So why bother
with SGML at all, since HTML is *already* SGML?

The often-described HTML limitations are quite paradoxical, since HTML is
claimed to be an SGML application. Yet SGML allows you to create new
document types, or to enrich existing ones, ad infinitum. It allows you
notations for foreign objects, whatever they are, including active objects.
It allows platform-independent entity management with exactly the
granularity level you wish. It allows arbitrary character sets. It allows
sophisticated HyTime links. It allows independent DSSSL rendering for any
media. It allows strict conformance checking. And so on.

How come an SGML application (that's what HTML is supposed to be, isn't it?)
cannot do anything like that? How come you *must* hack ugly and inherently
non-standard features into HTML in order to get what you want? How come
"enriched" versions of HTML become mutually incompatible, and are used
mainly in a suicidal war between software vendors, since SGML is a
vendor-independent standard?

The first HTML implementors did not feel it necessary to dig very deep into
ISO 8879:1986, and considering in retrospect their limited goals, they were
right. They paid lip service to SGML: as we all know, "HTML is an SGML
application". At the time, it was just wishful thinking. Forgive them. But
years later, the legend goes on, carefully maintained by the W3C.
Ladies and gentlemen, I'm proud to bring to public evaluation the Revised
First Web Axiom: "HTML is an *empty* SGML application", because the most
widely used HTML *systems* are *not* SGML systems. HTML systems do not
support entities at all. HTML systems do not support markup declarations at
all. Etc.

Around 1996, the Web had tens of millions of users. HTML and HTML systems
were demonstrating everyday their weaknesses. Meanwhile, vendors of
browsers, servers, messaging tools, and database systems, released new
plug-ins, new add-ons, new bells-and-whistles, almost overnight, increasing
the Web chaos. It was just about time to get rid of quick-and-dirty HTML
implementations, to try to tame the anarchy, to redesign HTML from the
ground up, to agree on really SGML-conforming HTML systems -- or maybe to
design something entirely new, and unrelated to SGML, why not?

Instead, the W3C wasted its time in exhausting vendor wars. Because of
unwillingness, true ignorance, short-sighted visions, stubbornness, the
daunting SGML-on-the-Web issue has been consistently postponed while HTML
versions and revisions were piling up.


Eventually, the W3C could not delay any more giving a clear answer to the
question: Should we do SGML-on-the-Web, or get rid of it? That's what the
Editorial Review Board (ERB, originally called SGML ERB) working group at
the W3C was set up for.

Quite quickly, it appeared that doing SGML was impossible, because Microsoft
(and others) disagreed. Getting rid of it, on the other hand, seemed
reckless: the ISO label is highly praised by W3C members, and the SGML
market is of significant weight. Therefore, the only solution was to stay
somewhere in between. The ERB chose to keep an SGML look-and-feel the SGML
community could be fooled with, and to remove from SGML the things Microsoft
disliked. That's XML. XML comes with a new First Web Axiom: "XML is an SGML
subset". That's really an axiom, this time: take it or throw it, but don't
try to prove it. Unfortunately, many people spent lots of time on
comp.text.sgml, arguing that XML was not an SGML subset; or just the
opposite as well. Too bad. Nobody tries to "prove" Peano's axioms any

Incidentally, ISO 8879:1986 deals with conformance, variants, optional
features...; it does not deal with "subsets". It is common practice in
standard texts to make explicit provisions for variants, options, subsets,
and the like. So standard implementors can choose to limit themselves to a
variant, an option, a subset, or whatever limitation the standard permits,
and still legitimately claim they are conforming to the standard. However,
as long as a given standard text does not explicitly allow either variants,
or options, or subsets, something claiming to be anyway a variant, an
option, or a subset of the said standard, no longer conforms to the said
standard. It's something *else*. Hence, the much-debated question: "Is XML
an SGML subset?", has no sensible answer inside of the conceptual framework
provided by ISO 8879:1986.

OK. XML is something else. Of course, it is *related* to SGML (much like
cans of Budweiser are related to beer, I would say). So why is it so
important to refer to XML as "an SGML subset"? Why is it so important to
refer to Budweiser cans as "beer", given that everybody knows that they are
something *else*? Well, you probably already know the an$wer.

I think that some of the ERB members sincerely (and naively) expected that
Microsoft would embrace SGML. At least SGML-on-the-Web. They were ready for
bloody compromises. Maybe they shared the widespread belief that
something that Microsoft does not embrace is something doomed to die (My
personal view is rather that they kill everything they hug!). They wanted
SGML to live. They didn't want SGML to share the fate many predict for Unix,
the NC, and non-Intel chips, to name but a few. They didn't notice the trap
leading straight to the elephants' graveyard.

I've said that before fighting against SGML, Microsoft has been ignoring it.
Microsoft's scorn in the pre-ERB era contributed a lot to SGML users'
frustration, as I've described it above. Now Microsoft loves XML. We can
then understand why so many SGMLers have been eager to bury SGML, to promote
XML, to maintain public confusion about both, and to do and tell anything
Microsoft wants. At last, <CHIMES>Microsoft</CHIMES> moves and speaks. Mind
you! The SGML community had been expecting that for years! The end of ghetto
life! Their pathetic relief has been literally *readable* in the
enthusiastic prose from the former ERB, and in many postings to

So, the ERB threw away a significant part of ISO 8879:1986. Roughly
speaking, they simplified and froze the concrete syntax, and they allowed
DTD-less document instance sets. (I don't want to deal with syntactic issues
here, not because they are unimportant, but because I have to keep this
document inside reasonable size limits.)

The central issue with XML, as far as comparison with SGML is meaningful, is the
alleged "degree of freedom" introduced by the ability to build entire XML
systems without any DTD at all. Document type declarations are the heart of
SGML. They have been put into the standard because they are invaluable tools
in consistent design, reusability, exchange, data structuring, the very
basic and strong concepts SGML has been built upon.

Of course, doing without any DTD is an XML option. XML does allow DTDs. But
you can bet your parser (or purser) that 90% of the XML applications that
will spread over the Web (and elsewhere) will be DTD-less applications, if
there is the faintest opportunity to do so. And opportunities to dispense
with DTDs are plentiful. DTD design is time-consuming. It is expensive. It is
skill-demanding. It forces you to build long-term plans about your data. It
is in many situations a death sentence against your previous structuring
practices, or lack of them. All of these requirements go contrary to the
"Web culture".
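
To make the distinction concrete, here is a minimal sketch in Python. It is
an illustration only, not anything from the XML or SGML specifications: the
ALLOWED_CHILDREN table is a hypothetical stand-in for a DTD, and the memo
vocabulary is invented for the example. It shows that a DTD-less parser
accepts any well-formed markup, while a declared content model is what lets
a system reject structures nobody agreed on.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model standing in for a DTD:
# element name -> set of child element names it may contain.
ALLOWED_CHILDREN = {
    "memo": {"to", "body"},
    "to": set(),
    "body": set(),
}


def is_well_formed(text):
    """True if the text parses as XML at all (no DTD involved)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def violations(text):
    """List the ways a document breaks the hypothetical content model."""
    problems = []
    root = ET.fromstring(text)
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            problems.append(f"undeclared element <{parent.tag}>")
            continue
        for child in parent:
            if child.tag not in allowed:
                problems.append(f"<{child.tag}> not allowed in <{parent.tag}>")
    return problems


GOOD = "<memo><to>Brad</to><body>Hello</body></memo>"
ODD = "<memo><blink>Hello</blink></memo>"

if __name__ == "__main__":
    for doc in (GOOD, ODD):
        print(is_well_formed(doc), violations(doc))
```

Both documents are well-formed, so a DTD-less system takes them without
complaint; only the check against the declared model flags the second one.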

The Web culture is about "freedom". To most Web geeks, DTDs are just dull
and boring stuff, intolerable obstacles to their creative freedom. Joe
Webmaster wants bouncing logos, blinking commercials, 3D buttons, flashy
fonts, background music like in Star Wars. Joe Webmaster had extensive
training with MS InterDev, JavaScript, ActiveX. He had no training with SGML
and does not plan to have: old-timer stuff. Joe Webmaster's freedom is about
what amazing font to use, how many frames to pack into a 21" screen, and so
forth. Joe Webmaster's freedom is immense. Do you want an indication of how
immense it is? Just have a look at the shelves at your favorite computer
store. Awesome, isn't it? (Then try to find the SGML books -- if there are

G.W.F. Hegel had his definition for freedom: "a well-understood necessity".
Electronic mass business and professional intranets will need Joe
Webmaster's skills and talents, sure. They will need much more: inventories,
customer profile databases, payment tracking systems, transaction design and
support, robust electronic forms, etc. I'm confident that smart software
vendors will flood us with plenty of fully-featured XML tools for
everything, from business cards to IRS forms. And I'm pretty sure that they
will not support any kind of DTD, because a DTD is something *you* choose,
not the software vendor.

Removing DTDs from SGML is a lethal strike. SGML environments *cannot
survive* DTD removal, any more than an animal can survive removal of its
skeleton. I'm afraid that many SGML addicts recently turned into XML
enthusiasts do not yet realize that. SGML users will have to build a fence
against DTD-less XML systems, if they want to keep their SGML systems and
applications alive. DTD users will become a minority, an endangered species
in the XML world (as the <TAG> article Brad cited says: like Latin).


ISO did a lot of work on network protocol standards. They were based on the
well-known "ISO OSI Model". Well, I'm not so sure if the OSI model is
actually well-known. At least, it is very often cited. You could hardly find
any text about some network protocol, which does not have a respectful
reference to the OSI model, usually inside a short paragraph in the front
matter, with a small picture of the seven little boxes quietly piled up.
Some pages later, you eventually discover that the protocol under
consideration has actually little, if anything, to do with ISO standards and
recommendations. Yet the OSI model still remains a useful paragon for
network protocol designers.

The truth is that the ISO OSI model is now dead. This certainly has
technical justifications. But the main reason is that network hardware and
software vendors no longer bother with standards, be they from the ISO or
from other organizations. Vendors want to deliver new modems, new routers,
new switches, new software stacks, at will. They call for standardization
*afterwards*. Organizations who were traditionally responsible for network
standardization no longer *issue* network standards. They merely *register*
the ones kindly provided by the industry.

This situation has far-reaching consequences. Network standards (I would
definitely prefer to call them "registrations") have become commercial
weapons. Their life-cycles are short. They overlap and contradict one
another. They no longer provide their users (end-users and systems
implementors) stability and inter-operability. Making investment decisions
based on such registrations has become unwise. New hardware and software
must deal with competing and unstable registrations, and they are born
crippled. Operating and maintaining costs rise to the unimaginable. Network
registrations today mean network chaos.

The registration wars do not limit their battlegrounds to networks. They
pervade application APIs, file formats, character encodings, fonts, graphic
widgets, object models, database systems, programming languages, up to and
including hardware interfaces and chipsets. They eventually reached the area
of document management architectures and systems. It was unavoidable. There
is no more room for real, collaborative, long-term standardization processes
in the information systems industry. Forget about that.

I think ISO 8879:1986 is now in the situation of the ISO OSI Model a few
years ago. New document management systems will pay lip service to it, just
to get rid of it. We will see more and more vendors registering dozens of
mutually incompatible ?ML specializations, each one religiously stating in
its preamble: "This is an SGML subset". We will see more and more two-hatted
salespersons for the same product, claimed to be SGML- or ?ML-compliant,
depending on the sales target. We will see more and more quick-and-dirty
"SGML Lite" implementations, "subsets", and "stripped-down applications",
deviating more and more from the standard, until vendors feel free to not
refer to SGML at all. We will see more and more naive users fooled by SGML
look-alikes, who will help devastate SGML's reputation. Not within ten
years. Just now. The primary agency for the confusion of tongues (The Tower
of Babel...) is now ourselves.

Meanwhile, some others will continue to do their best to build reliable,
stable, open systems. To share and communicate, rather than to fight and
dissimulate. To compete for usefulness and freedom, not for colonization and
flashiness. To look for consent and harmony, not for coercion and chaos.
". . . and History continues . . ."

                        -= END =-

Laurent Sabarthez
14 August 1998

---- End included message ----