Published in Brighton, UK

Clagnut

Headings defining document structure

Tomas Jogin has started an interesting discussion about how heading level choices (h1, h2, etc.) can give a different perception of document structure. For example, running the Clagnut home page through the W3C validator in Outline mode reveals this document structure:

  H2 Site contents
H1 The website of Richard Rutter, a web producer from...
  H2 (blog post) I'm back posted 1 day ago
    H3 (subheading in blog post) So what's been happening...
  H2 (blog post) Back in a week posted 2 weeks ago
  H2 (blog post) Multimap redesign posted 2 weeks ago
  H2 (blog post) Tiger posted 3 weeks ago
  H2 (blog post) Gmail invites posted 3 weeks ago
  H2 (blog post) iTunes Music Store UK is empty posted...
  H2 (blog post) Dynamically underlining accesskeys posted...
  H2 (blog post) Footie mad posted 1 month ago
  H2 (blog post) British Sea Power posted 1 month ago
  H2 (blog post) Collaborative Design posted 1 month ago
  H2 (side bar box) Search blog
  H2 (side bar box - random photo) Back seat drivers
  H2 (side bar box) Switch typefaces
  H2 (side bar box) Blogmarks
  H2 (side bar box) One year ago
  H2 (side bar box) Listening right now
  H2 (side bar box) New Music
  H2 (side bar box) Weather in Brighton
  H2 (side bar box) Webring
  H2 (side bar box) Syndicate
  H2 (blog roll category) Web design
  H2 (blog roll category) Burgled
  H2 (blog roll category) Looks, listens & reads
  H2 (blog roll category) Affiliated efforts
  H2 (blog roll category) Acquaintances

The implication here is that every blog post and every side bar box is an equally important sub-section of the whole page. So based on headings alone, blog posts are not distinct from side bars, and my blog roll categories are not distinct from other side bar boxes. However, I use other mark-up for this purpose – blog posts are contained in their own list, as is the blog roll – so what’s the problem?

Well there isn’t really a problem as the document is still quite well structured, but better use of headings could be useful. Headings can be used for automatic generation of tables of contents (there is already a Mozilla sidebar which does this); they are used by JAWS to quickly navigate through a document; and headings are used by Google in its ranking algorithms.
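To make the table-of-contents idea concrete, here is a minimal Python sketch that collects the h1 to h6 headings from an HTML string and prints them as an indented outline. It relies only on the standard library's `html.parser`. The names `OutlineParser` and `outline`, and the sample markup, are assumptions made for illustration; this is not how the W3C validator or the Mozilla sidebar actually work.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineParser(HTMLParser):
    """Collects (level, text) pairs for every h1-h6 element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.headings = []   # (level, text) pairs in document order
        self._level = None   # level of the heading currently open, if any
        self._text = []      # text fragments seen inside the open heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._text).split())
            self.headings.append((self._level, text))
            self._level = None


def outline(html):
    """Return the heading outline as lines indented two spaces per level below h1."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return ["  " * (level - 1) + f"H{level} {text}" for level, text in parser.headings]


# Sample markup loosely modelled on a blog home page (assumed, not the real page).
sample = """
<h1>The website of Richard Rutter</h1>
<h2>Most recent ten posts</h2>
<h3>I'm <em>back</em></h3>
<h4>So what's been happening</h4>
<h2>Tools</h2>
<h3>Search blog</h3>
"""

for line in outline(sample):
    print(line)
```

Because the indentation comes straight from the heading level, a page where everything is an h2 produces a flat list, while nested levels produce a tree a reader can actually navigate.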

So it seems my home page structure could be more useful by changing all the h2s to h3s and adding in a couple of sub-headings:

H1 The website of Richard Rutter, a web producer from...
  H2 Site contents
  H2 Most recent ten posts
    H3 (blog post) I'm back posted 1 day ago
      H4 (subheading in blog post) So what's been happening...
    H3 (blog post) Back in a week posted 2 weeks ago
    H3 (blog post) Multimap redesign posted 2 weeks ago
    H3 (blog post) Tiger posted 3 weeks ago
    H3 (blog post) Gmail invites posted 3 weeks ago
    H3 (blog post) iTunes Music Store UK is empty posted...
    H3 (blog post) Dynamically underlining accesskeys posted...
    H3 (blog post) Footie mad posted 1 month ago
    H3 (blog post) British Sea Power posted 1 month ago
    H3 (blog post) Collaborative Design posted 1 month ago
  H2 Tools    
    H3 (side bar box) Search blog
    H3 (side bar box) Switch typefaces
  H2 Additional Info  
    H3 (side bar box) Random Photo
    H3 (side bar box) Blogmarks
    H3 (side bar box) One year ago
    H3 (side bar box) Listening right now
    H3 (side bar box) New Music
    H3 (side bar box) Weather in Brighton
    H3 (side bar box) Webring
    H3 (side bar box) Syndicate
  H2 Recommended Links
    H3 (blog roll category) Web design
    H3 (blog roll category) Burgled
    H3 (blog roll category) Looks, listens & reads
    H3 (blog roll category) Affiliated efforts
    H3 (blog roll category) Acquaintances
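The first half of that change, turning the h2s into h3s (and the h3s beneath them into h4s), is mechanical enough to sketch in code. The Python below is a rough illustration using a regular expression on the tag names; the function name `demote_headings` and its parameters are hypothetical, and a regex like this is a crude tool that ignores edge cases such as heading tags inside comments or scripts.

```python
import re

# Matches the start of an opening or closing h1-h6 tag, e.g. "<h2" or "</h2".
# The lookahead makes sure the tag name ends there (whitespace or ">").
HEADING_TAG = re.compile(r"(</?h)([1-6])(?=[\s>])", re.IGNORECASE)


def demote_headings(html, from_level=2, by=1):
    """Shift every heading at from_level or deeper down by `by` levels, capped at h6."""

    def shift(match):
        level = int(match.group(2))
        if level >= from_level:
            level = min(level + by, 6)
        return f"{match.group(1)}{level}"

    return HEADING_TAG.sub(shift, html)


# Assumed sample markup, not the real Clagnut source.
before = "<h1>Clagnut</h1>\n<h2>Tiger</h2>\n<h3>Subheading</h3>"
print(demote_headings(before))
```

The h1 is left alone because it sits above `from_level`. The new section headings such as "Most recent ten posts" would still need to be added by hand.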

Which brings up a question asked by Andy Budd: does it make sense to write a complete structured document only to then hide some of the content? That would be the case with the aforementioned structure: I already hide the h1 (it’s there for Google’s benefit) and I would also hide the new h2s. Given the reliance on headings to determine document structure, a few hidden ones to help add clarity would not be a bad thing.

That we’re having this discussion at all is due to the origins of HTML. It was originally conceived as a way of marking up scientific documents with a conventional heading, sub-heading, sub-sub-heading structure. Nowadays we are trying to apply the same methodology to more mature, hypertextually complex Web pages which, visually and functionally, are somewhat different from academic papers.

Comments

  1.

    “Given the reliance on headings to determine document structure a few hidden ones to help add clarity would not be a bad thing.”

    my thoughts exactly.

    Patrick H. Lauke
    21 Jul 2004
    10:21 GMT
  2.

    I tend to use multiple <h1> to mark up sections instead of one <h1> for the site name. I like the idea of separating the content from the rest of the page. The content section in your example would be “Most recent ten posts”. This should be the first <h1> and should be in sync with the <title> IMHO. (I think the “Site contents” could be replaced with unstructured text because it is the only section before the content section.)

    Tonico
    21 Jul 2004
    11:46 GMT
  3.

    Yeah, I liked Tomas’ observation a lot and dove right into changing the markup of my blog – I like the document structure better now and can see how it was confusing before to alternative user agents.

    I agree with Richard (that his header tags now look like they need cleanup) as well as with Tonico – the ‘cleaned up’ version or proposed new header order needs some pruning. The h1 text should be in the title only, no need for redundancy, the ‘site contents’ h2 should also go (or give us more verbiage about why it’s there in the first place, is it an accessibility thing?) and everything else should be bumped up a header level.

    And just so it doesn’t come across otherwise, clagnut already rocks in so many ways and this would be just a minor tune-up. Wow, that made me reminisce about how long I’ve been lurking around this blog and I dug up Eric’s original link to it – holy crap, that was a year and a half ago! Man, I’m feeling old…

    Al Abut
    21 Jul 2004
    18:48 GMT
  4.

    This topic is explored about as fully as is possible at CollyLogic, with contributions from me, Jason Santa Maria, Andy Budd, Andy Clarke, Mike Davidson, D. Keith Robinson, Jon Hicks and Paul Scrivens.

    Rich
    2 Aug 2004
    16:15 GMT
  5.

    Amen to your last paragraph. We tend to forget that HTML is, basically, rubbish. Utter rubbish. It describes documents. Who says a blog is a document? It’s more a sort of website/journal/magazine/discussion forum hybrid. In other words: it’s a blog.

    I’ve never quite figured out why we haven’t all ended up using an XML-defined language for things like this. Well, I guess it’s obvious: backwards compatibility. But just imagine it.

    Blogs are a great case in point. A very, very large degree of similarity in content, that’s evolved by consensus over a few years. Instead of this faffing around deciding which header tag best semantically describes a post or a sidebar (clue: none of them really do), someone could write BML, just like they did RSS, and sort it all out. An entry element. A blogroll element. A sidebar element. A comments element. Actual semantics in the mark-up.

    I dunno. I guess BML would be a sledgehammer to crack a nut. Plus, without HTML allowing us creative freedom to play around, the very workable consensus would probably never have been reached.

    Still. HTML. Not that great.

    Small Paul
    3 Aug 2004
    14:54 GMT
