Content Types

With "Content Types" we reach an interesting juncture in our top-down review of the hierarchy (from the outermost Sitemap down to the atomic Element). This subject, which follows "Module," is not a further descent into the Module but a change of perspective on it. Until now we have been looking at the visible structure of things (Sitemap, directory structure, Navigation, Site Pages, Modules). Although these items can be described as "visible," they cannot really be called presentational (as in our famous dictum, "to separate content from presentation"). Nor have they had much to do with content per se. They have been relatively benign, strictly functional containers that do a good job of nesting (or rolling up, depending on how you look at it). They perform the structural, skeletal, architectural job of holding the site together, making it possible to provide a coherent set of well-organized, well-labelled objects within which a site visitor can find information.

But what the site visitor has actually come to find has not yet been touched on. (The closest we have come is a look at the "Tree" version of some of the XML files.) In addressing Content Types, we begin to give shape to this, and to provide a place in the set of site artifacts to record information about it.

This is to say, each structural Module is only a neutral container for "Content." To capture a more informative indicator of what is being contained, we give every instance of a Module a name, or classification (see scheme below), that identifies the "Type" of its content. As we will see, this typing is important to the site build process, since different content types are transformed by different XSLT stylesheets. Once the content type is known, certain assumptions can be made about the applicable document type rules (DTDs), so that automated processing efficiencies can be devised. (To give one example: for the Press Release ("CT10"), the header of the press release summary module can automatically be made into the correct link to the URL of the full press release page.)
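The dispatch described above can be sketched in a few lines of Python. This is an illustration only, not the project's build code. The codes CT01, CT10 and CT40 and the "ctNN_" filename prefix come from this documentation; the stylesheet filenames, the "press/<slug>.html" URL pattern, and all function names are assumptions made for the example.

```python
import re
from html import escape

# Hypothetical mapping from content type code to XSLT stylesheet.
# The codes appear in this documentation; the filenames are invented.
STYLESHEETS = {
    "CT01": "ct01_general.xsl",
    "CT10": "ct10_press_release.xsl",
    "CT40": "ct40_article.xsl",
}


def content_type_of(filename):
    """Read the content type code from a 'ctNN_' filename prefix."""
    match = re.match(r"ct(\d{2})_", filename)
    if match is None:
        raise ValueError(f"no content type prefix in {filename!r}")
    return "CT" + match.group(1)


def stylesheet_for(filename):
    """Pick the stylesheet that would transform this module."""
    return STYLESHEETS[content_type_of(filename)]


def press_release_header(title, slug):
    """Turn a press release summary header into a link to the full page.

    The 'press/<slug>.html' URL pattern is an assumption.
    """
    return f'<a href="press/{slug}.html">{escape(title)}</a>'
```

Because the type is known before transformation, a rule such as the press release link can live in one place (here, one function) rather than being repeated by hand in every summary module.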

Note

Please note: as will be seen, we are still describing Content at the level of identifying types. We have not yet arrived at the more advanced stage, where metadata about the topics or subjects of the actual Content "instances" would be recorded.

(In fact, this website project (and this documentation) never did go so far as to address that issue, entirely in accordance with project goals. It will become more germane (and worth doing) when a true Content Management System is in place. Thus far, simply achieving a higher level of organization by identifying and using Content Types is already a significant step forward from the previous state of non-modular, inconsistent, entirely page-centric content development.)

The task of finding "types" in this sense can be most simply expressed as the discovery of similarities in information objects. (For our practical purposes, we are referring to information as recorded in plain text, though "Content Type" discussions can include various multimedia objects as well.)

Similarity. The essence of type, or "class" identification is the discovery in the objects under consideration of aspects or parts or "facets" of them that are either the same, or sufficiently similar, such that each of those aspects, or parts, or facets of the whole:

Granularity. Perhaps the first challenge to Content Type discovery is the design decision to determine appropriate level(s) of granularity at which to begin looking for "types." While some are straightforward, others might go one way (up, absorbed into the larger enveloping container) or another (down, dividing into its parts), when it comes to carving up the page, or page fragment, into its component pieces. This is especially true of pieces of content that lend themselves to aggregation (e.g., numerous brief journal publication records: all in a single component, or the whole made up of many small components?).

Quantities. The next factor to be considered in "type" identification concerns the number of instances found. There is no ironclad rule about this, but numbers will influence design decisions in various ways. Perhaps the key one is to avoid creating too many Content Types, especially in too many "ad hoc" (one-off) ways. In plain terms, to qualify as a "type" there must be some reasonable number of examples of it. (The corollary challenge is how to then resolve each of those ad hoc "type candidates" that fail the "sufficient quantities" test into a more abstract, higher-level, generalized content type.)
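As a rough sketch of the "sufficient quantities" test, the Python below counts the instances of each candidate type and folds any candidate that falls short into a generic type. The threshold of 5, the fallback name "general" as the catch-all, and the sample labels are all assumptions made for illustration; the project itself states no fixed rule.

```python
from collections import Counter

# Hypothetical threshold: the text gives no ironclad rule,
# so this number is purely illustrative.
MIN_INSTANCES = 5


def resolve_types(candidate_labels, fallback="general"):
    """Fold candidate types with too few instances into a generic type."""
    counts = Counter(candidate_labels)
    return [
        label if counts[label] >= MIN_INSTANCES else fallback
        for label in candidate_labels
    ]


# Invented sample: six modules of one candidate type, two one-offs.
labels = ["press release"] * 6 + ["one-off notice"] * 2
resolved = resolve_types(labels)
```

In practice the decision is a design judgment rather than a mechanical cutoff; the sketch only shows where the instance count enters into it.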

Semantics. Usually, though not in all cases, the name given a Content Type will answer the question, "What kind of information is this?" with a response like "a press release," or "a journal publication," etc. For the less clear-cut cases, the name of the Type might be more generic ("general" or "article"), or more loosely defined (e.g. see the "logic" example below, which is used for both HTML forms and ASP code). Other (future) Content Types may be more strongly related to the constructs of a particular Content Management System (CMS), particularly in the realm of "related" or "cross-sell" modules of "related links" and similar, which work especially well with personalization profiles. Further growth in Content Types may stem from the greater granularity that a full CMS can accommodate, such that smaller pieces of the page (e.g. subheaders, perhaps, or captions, callouts, etc.) can be managed individually with ease.

For the present, semantic meaning at the "type" level is captured in the set of Content Types for this website project as outlined below. More than half of the site Modules (approx. 30% "general" and 25% "article") have been "typed" without much semantic significance, and an additional 22% are accounted for by the premier, unmistakable Content Type, namely the "press release." This leaves less than a quarter of the identified page fragments (Modules) with a "type" that can be described semantically, that is, with some indication of what it is and what it contains. These include Management Biographies; site "Highlights"; Glossary; HTML Forms and ASP Code ("Logic"); Journal Publications; Patents. (Please note: these last two were not in fact processed through the site system as "Content Types." Patents were eventually suppressed from publication to the website altogether, but they can still serve as a useful example in discussing content types.)

To further demonstrate some of the considerations in establishing content types, two examples are addressed below of candidate "types" that were not implemented in the website re-design project.

Home Page Section Descriptions. The home page has a brief paragraph-like description for each "Site Section" (see screen capture below). These could be considered a content type, although most likely the benefit in doing so would only be realized with the use of a CMS. The CMS could provide the necessary degree of granularity and control to extend editing capability to the desktop of the business unit owner of each site section, such that she or he could modify the home page introduction to their site section and immediately preview the results (in a testing environment, of course). For the present, the home page introductory paragraphs are not likely to change often, so working any changes through the website staff is quite acceptable in this case.

More interesting, perhaps, are two other questions raised by looking at that home page screen capture:

  1. One Type, or Two? We see that the site design reflects two different presentations of arguably very similar content: the six "white" blocks (lower middle of page), vs. the two "yellow" blocks (upper right). While the "yellow" treatment version of presentation incorporates an additional photograph, and each treatment uses a slightly different rollover graphic and colored "shim" graphic line, arguably they could both come from a single content "type."

    Tip

    One important distinction is that the text for the yellow treatment is much shorter (in words or characters) than for the white treatment. You might prepare two descriptive paragraphs for each module: a longer one (for whenever the module gets the "white" treatment) and a shorter one (in case the module, for whatever reason, needs to be put through the "yellow" treatment instead).

    As you can see from this discussion, the special cases almost invariably found on site Home Pages are the main reason why content "typing" is less often done there: there is too much ad hoc design to warrant content types. This is not to say it can't be done. In fact, with the introduction of a true CMS, the Home Page often comes to be regarded as a more utilitarian asset, one that can be "put to work" harder through standardized, editable components. A home page regularly updated with new information then comes to be seen for the real value it can deliver, in addition to its original, important task of making a good first impression, design-wise.

  2. Granularity (Again). The "red", "blue" and "pink" rectangles outlining the content in the screenshot below depict lower, middle, and higher degrees of granularity.

    • The least granular (red) really wouldn't be a content type at all, but instead a structural page component, assembling together a number (six, for "white"; two, for "yellow") of smaller (more granular) components. (Having said that, it still would be possible to consider this as a content type, but it's not likely to be a very successful one. In practice, items like this are brought under the large, generic umbrella of the "CT01" "general" type: ct01_home-body.html.)

      Note

      One further interesting comment on this: if broken down to the "blue" level of modularity (see next bullet item), then the "author" for each Module ought to be the business unit owner of the site section—decentralized. If broken down only to the "red" level, then arguably the "author" of this larger Module really ought to be someone in a commensurately more central role, like corporate communications staff, or possibly website technical staff.

      And so you see the impact that Content Type discovery can have on the whole approach to thinking about website development and maintenance.

    • The middle level (blue) would reflect a content type that needed to capture not only its paragraph description, but also the filename of the graphic that shows its header (e.g. "> R&D"). This is not unreasonable, and is probably even a very good thing. But the blue rectangle also includes the purely presentational orange vertical line (a single, stretched pixel graphic), which raises a couple of deeper, interesting questions. First: "Where does the content type leave off holding information, and where does the XSLT pick up?" Related to that: "Where does the outer container (the Page Layout, perhaps, or even an aggregating "super-Module") start to pick up the information, vs. holding it in the 'local' Module?"

      (In this case, the "SiteHTML" XSLT could attend to the writing in of the <img src="images/shim.gif"> etc., into the proper <td> table data cell, but it could only do so in concert with the Module Type and the "Element-Zone Assignment" (EZA) Module Recipe file. More information on these concepts is found elsewhere in the documentation.)

    • Finally, the highest degree of granularity (pink), in this case, would divorce the descriptive paragraph from its own header (perhaps not such a good idea), and would also require the assembly of the overall Module (the one indicated by "blue") from an overall Page Recipe (see Buildlist) that would begin to get overly complex. This is probably breaking things down too far. (Though again, with a CMS, given improved management of images, approaches like separately tracking (graphical) headers from text content paragraphs may well prove the better design decision. Depends on the system.)
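To make the "blue" level of granularity concrete, and to show the idea of keeping both a longer and a shorter description per module for the "white" and "yellow" treatments, here is a minimal Python sketch. The class name, field names, image filename, and sample text are all hypothetical; the sketch only illustrates what such a content type would need to hold, and deliberately leaves the purely presentational shim graphic out of the content.

```python
from dataclasses import dataclass


@dataclass
class SectionDescription:
    """One 'blue'-level module: a header graphic plus descriptive text."""

    header_image: str  # filename of the graphical header (hypothetical)
    long_text: str     # used for the "white" treatment
    short_text: str    # used for the "yellow" treatment

    def text_for(self, treatment):
        """Return the description that fits the given treatment."""
        if treatment == "white":
            return self.long_text
        if treatment == "yellow":
            return self.short_text
        raise ValueError(f"unknown treatment: {treatment!r}")


rnd = SectionDescription(
    header_image="images/hdr_rnd.gif",  # invented filename
    long_text="A longer paragraph introducing the R&D section.",
    short_text="A short R&D blurb.",
)
```

With a record like this, the choice of treatment belongs to the page layout or stylesheet, while the module itself only supplies the content for either case.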

Home Page "Site Section Descriptions"

"Resources for Patients". These sets of off-site hyperlinks to various information resources on the World Wide Web are used in two locations on the current site: in the Clinicians section, as "Information Resources for Your Patients," and in the Patients section, as "Important Sources For Information About...".

While rich in semantics, the modules of this type were not many in number, and so did not warrant the work effort to process them into their own Content Type. These paragraphs and lists of links were instead simply subsumed under the larger overall "CT40" (Article) that encompassed the whole page (ct40_article_pat_inflam.html).

As an example of how such a content type might be used beyond the presentation seen here, if there were more such "Resources" links, and they were well organized by topic, you could easily program (XSLT) a single "convenience" page that brought together the links on all four major subjects (Cardiovascular, Oncology, Inflammation, Metabolic Disease) onto a single page, within which they would of course be grouped by major subject, with further groupings on sub-topics. Further, additional groupings or organizations could be applied, or the list(s) could even be made sortable on the client side (for visitors with newer browsers).
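The grouping logic for such a "convenience" page is sketched below in Python rather than XSLT, purely to show the idea. The record layout (subject, sub-topic, title, URL), the function names, and the sample links with their example.org URLs are assumptions made for illustration; the subject names are taken from the text above.

```python
from collections import defaultdict


def group_links(links):
    """Group (subject, subtopic, title, url) records by subject, then sub-topic."""
    grouped = defaultdict(lambda: defaultdict(list))
    for subject, subtopic, title, url in links:
        grouped[subject][subtopic].append((title, url))
    return grouped


def render(grouped):
    """Render the grouped links as an indented plain-text outline."""
    lines = []
    for subject in sorted(grouped):
        lines.append(subject)
        for subtopic in sorted(grouped[subject]):
            lines.append("  " + subtopic)
            for title, url in sorted(grouped[subject][subtopic]):
                lines.append(f"    {title} <{url}>")
    return "\n".join(lines)


# Placeholder records; titles and URLs are invented.
links = [
    ("Oncology", "Screening", "Example resource B", "https://example.org/b"),
    ("Cardiovascular", "Prevention", "Example resource A", "https://example.org/a"),
    ("Oncology", "Screening", "Example resource C", "https://example.org/c"),
]
page = render(group_links(links))
```

The same grouped structure could feed other orderings (by title, by sub-topic across subjects) without touching the source modules, which is the practical payoff of treating these links as their own content type.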