About a year ago I wrote about how tagging had become more than just a way to annotate content with keywords. Tags had become the wires by which disparate content could be connected. Through consistent and detailed tagging, I could begin associating my blog posts with my content on other web sites, such as my Flickr photos and my ma.gnolia links. This is, of course, the joy of APIs in action, and it made me think differently about how I was tagging my stuff.
Part of that thought was the inclusion of tags such as geo:lat=50.382, which enabled me to associate a physical location with a blog post. Called triple tags at the time, we now know this format as machine tags thanks to Flickr. A year on, and other services such as Upcoming and Last.fm are encouraging their users to tag Flickr photos with machine tags such as upcoming:event=144002 and lastfm:event=97947, thereby associating photos with events and gigs, and enabling these services to illustrate content with relevant photography.
When it comes to searching and API goodness, I believe Flickr is still the leader in what can be done with machine tags. When tagging, the format is crucial and must be of the form namespace:predicate=value. Flickr recognises the three parts and stores them separately, thus enabling one to search within any namespace, for any predicate and/or for any value. Powerful stuff, as explained in Rev Dan Catt’s post earlier this year.
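As a rough sketch of that three-way split, here is a small Python parser. The name parse_machine_tag and the regular expression are assumptions made for illustration; Flickr's real parsing rules (for instance, which characters a namespace may contain) may well differ.

```python
import re
from typing import NamedTuple, Optional


class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: str


# Assumed grammar: namespace and predicate start with a letter and contain
# only letters, digits and underscores; the value is everything after "=".
_MACHINE_TAG = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.*)$")


def parse_machine_tag(tag: str) -> Optional[MachineTag]:
    """Return the three parts of a machine tag, or None for an ordinary tag."""
    match = _MACHINE_TAG.match(tag)
    if match is None:
        return None
    return MachineTag(*match.groups())


print(parse_machine_tag("geo:lat=50.382"))
print(parse_machine_tag("isbn=0713998393"))  # no namespace, so this prints None
```

Keeping the three parts as separate fields is what makes it possible to query on any one of them independently.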
Anyone can invent any namespace and nomenclature. For example, I have tagged this photo with clagnut:blogid=1900 to explicitly associate it with this blog post. Flickr recognised the machine tag and now I can use the API to search for any photo tagged with the clagnut namespace.
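To illustrate the kind of query this enables, here is an in-memory sketch in Python. It does not call the Flickr API; the photo identifiers, the search function and the use of "*" as a wildcard are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple


def split_tag(tag: str) -> Optional[Tuple[str, str, str]]:
    """Split namespace:predicate=value into its parts, or return None."""
    namespace, colon, rest = tag.partition(":")
    predicate, equals, value = rest.partition("=")
    if not (colon and equals and namespace and predicate) or "=" in namespace:
        return None
    return namespace, predicate, value


def search(photos: Dict[str, List[str]], namespace: str = "*",
           predicate: str = "*", value: str = "*") -> List[str]:
    """Return ids of photos with at least one machine tag matching the query."""
    query = (namespace, predicate, value)
    hits = []
    for photo_id, tags in photos.items():
        for tag in tags:
            parts = split_tag(tag)
            # Each query part matches if it is the wildcard or equal to the tag part.
            if parts and all(q in ("*", p) for q, p in zip(query, parts)):
                hits.append(photo_id)
                break
    return hits


# Hypothetical photo ids; the tags are taken from the post.
photos = {
    "photo-a": ["clagnut:blogid=1900", "typography"],
    "photo-b": ["upcoming:event=144002"],
    "photo-c": ["iso:isbn=0713998393", "clagnut:blogid=1900"],
}

print(search(photos, namespace="clagnut"))  # ['photo-a', 'photo-c']
print(search(photos, predicate="isbn"))     # ['photo-c']
```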
The ability for anyone to create a namespace opens up an interesting future as we begin to see patterns and conventions emerge: amazon:asin=B000AA4I1M anyone? Which brings me to the point of this post. This time last year I was tagging photos and posts which discussed books in the following manner: isbn=0713998393. In and of itself this works fine, but it is not a valid machine tag, which means we cannot make use of the aforementioned API goodness within Flickr (and where Flickr is leading so others will follow). We therefore need a triple-tag version of the ISBN tag, and here’s my suggestion: iso:isbn=0713998393. ISBN is a standard recognised by the International Organization for Standardization (ISO) so I thought it made a certain sense for ISO to be the namespace. Other standardised entities could be tagged in a similar way, such as iso:issn=15340295.
Is an ISO namespace something likely to be used in the future? Is it fatally flawed, and if so how would you suggest machine tagging ISBNs instead? Discuss.
Update: Further to Peter Firminger’s comment, it seems Machinetags.org might become a good place to propose and define new namespaces.
Angus McIntyre wrote:
The use of ‘iso’ makes lots of sense to me.
However, the fact that namespaces can consist of a single keyword that anyone may create seems potentially problematic to me. Cases like ‘amazon’, ‘flickr’ or even ‘clagnut’ are perhaps not controversial, but what happens when five online photo galleries all decide to start populating a ‘photo’ namespace with tags? I could see a case arising where you had several different sites endorsing, for example, ‘photo:category=...’, all with wildly-varying values and semantics.
Java uses domain names for package namespaces. Would it make sense to adopt a similar convention, at least for tags in a taxonomy that is specifically linked to a particular domain (such as the ‘lastfm’ and ‘upcoming’ examples)?
Rich wrote:
Aaron from Flickr has this to say on the subject of multiple namespaces:
For example:
dc:subject=tags
xmlns:dc=http://purl.org/dc/elements/1.1/
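One way a consumer might act on such a pair of tags is to treat the xmlns tag as a prefix declaration and expand the other machine tags against it. The Python below is only a sketch of that reading of the example; expand_namespaces is a hypothetical helper, not something Flickr provides.

```python
from typing import Dict, List, Tuple


def expand_namespaces(tags: List[str]) -> List[Tuple[str, str, str]]:
    """Expand prefixes declared via xmlns:<prefix>=<uri> machine tags.

    Returns (namespace, predicate, value) triples. A prefix with no
    matching xmlns declaration is left unchanged.
    """
    declared: Dict[str, str] = {}
    triples: List[Tuple[str, str, str]] = []
    for tag in tags:
        head, equals, value = tag.partition("=")
        namespace, colon, predicate = head.partition(":")
        if not (equals and colon):
            continue  # an ordinary tag, not a machine tag
        if namespace == "xmlns":
            declared[predicate] = value  # e.g. "dc" -> the Dublin Core URI
        else:
            triples.append((namespace, predicate, value))
    return [(declared.get(ns, ns), pred, val) for ns, pred, val in triples]


tags = ["dc:subject=tags", "xmlns:dc=http://purl.org/dc/elements/1.1/", "sunset"]
print(expand_namespaces(tags))
```

With the declaration present, two sites that both use a short prefix such as dc can be told apart by the URI the prefix expands to.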
Bruce Boughton wrote:
I had the same thought as Angus about namespace collisions. At first, Aaron’s solution of tagging the namespace seems a little kludgey, but it’s actually probably more sensible than
or (Java-style)
if you’re going to be using unambiguous namespaces predominantly, as I think XML has shown us is the common usage.
As to whether iso is a valid namespace for isbn, it seems to be so. (Perhaps those with greater knowledge of ISO can offer greater enlightenment)
I had not previously really encountered this machine tagging phenomenon but if it takes off it is surely one of the more powerful approaches to the Semantic Web. HTML has shown us it is not sufficient to assume that user agents will be able to understand human-oriented content in a universally agreed manner, so it seems we must bow to the needs of our boxes.
Bruce Boughton wrote:
Perhaps we could replace co-comment (which I found to not be that good at its job) with comment tagging which associates a blog comment with an OpenID?
Get your blog pinging a comment aggregator and you’ve got decentralised comment tracking (or maybe not)?
lqd wrote:
IIRC ISBN are URNs, so why not urn:isbn=12345 ?
Paul Watson wrote:
ISBNs are indeed URNs (I know Wikipedia isn’t an authoritative source but check http://en.wikipedia.org/wiki/Uniform_Resource_Name )
Even more interesting about tagging ISBNs is the fact that ISBNs aren’t just randomly generated unique identifiers, but within each ISBN is encoded the language of the book, the publisher, and a unique identifier for that book (plus a checksum digit). That’s quite a bit of machine-readable information!
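The checksum digit in particular is easy to verify in code. The Python below is a minimal sketch using the standard ISBN-10 (weights 10 down to 1, modulo 11) and ISBN-13 (alternating weights 1 and 3, modulo 10) rules; the function names are made up for illustration, and hyphens or spaces are assumed to have been removed already.

```python
DIGITS = "0123456789"


def isbn10_is_valid(isbn: str) -> bool:
    """Check an unhyphenated ISBN-10; the last character may be X (meaning 10)."""
    if len(isbn) != 10:
        return False
    body, check = isbn[:9], isbn[9]
    if any(c not in DIGITS for c in body) or check not in DIGITS + "Xx":
        return False
    values = [int(c) for c in body] + [10 if check in "Xx" else int(check)]
    # Weights run 10, 9, ..., 1; a valid ISBN-10 sums to a multiple of 11.
    total = sum(weight * v for weight, v in zip(range(10, 0, -1), values))
    return total % 11 == 0


def isbn13_is_valid(isbn: str) -> bool:
    """Check an unhyphenated ISBN-13."""
    if len(isbn) != 13 or any(c not in DIGITS for c in isbn):
        return False
    # Weights alternate 1, 3, 1, 3, ...; a valid ISBN-13 sums to a multiple of 10.
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(isbn))
    return total % 10 == 0


print(isbn10_is_valid("0713998393"))  # True
print(isbn10_is_valid("0713998394"))  # False: wrong check digit
```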
Paul Wib wrote:
For books I’d be tempted to keep the namespace as general as possible, you know, like “book”:
The generality increases the possible richness of tags, but at a trade off with the specificity of standards based tags like “iso:” or “urn:”. But maybe the concept “book” is a little more enduring than any standard?
kellan wrote:
I’d second Paul’s suggestion of using book:isbn= as the format. Neither iso: nor urn: adds any real meaning/specificity to the tag.
Or, to think of it another way, how often will you want to search for all photos which have been marked up with absolutely any ISO standard? (iso:*=*)
Paul Watson wrote:
I like Paul Wib’s suggestions – but I’d point out that “book:lang=en” is just duplicated information. The 1st digit of a 10 digit ISBN (or the 4th digit of the new 13 digit ISBNs which have become the standard since Jan 1st this year) identifies the language.
Using Paul’s example of “0976072696” – in this case the first digit “0” identifies it as an English language book (a 0 or 1 indicates English language, a 2 indicates French, a 3 indicates German etc.; full list at http://www.isbn-international.org/en/identifiers/allidentifiers.html).
I think any tagging standard should definitely use the new 13 digit ISBN standard (introduced because the number of available ISBNs was going to fall short of the expected number of new books), so 0976072696 becomes 9780976072690 (note: the final digit of the ISBN is a checksum digit, so you can’t just prefix a 10 digit ISBN with “978” – there’s a conversion script at http://www.isbn.org/converterpub.asp).
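The conversion itself is short enough to sketch in Python: drop the old check digit, prefix “978”, and recompute the check digit using the ISBN-13 rule (alternating weights 1 and 3, modulo 10). The function name isbn10_to_isbn13 is made up for illustration, and the input is assumed to be unhyphenated.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an unhyphenated ISBN-10 to its 978-prefixed ISBN-13 form."""
    if len(isbn10) != 10:
        raise ValueError("expected a 10 character ISBN")
    # The old check digit (the last character) is discarded, not carried over.
    body = "978" + isbn10[:9]
    if any(c not in "0123456789" for c in body):
        raise ValueError("the first nine characters must be digits")
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(body))
    check = (10 - total % 10) % 10
    return body + str(check)


isbn13 = isbn10_to_isbn13("0976072696")
print(isbn13)                 # 9780976072690
print("iso:isbn=" + isbn13)   # one of the tag formats proposed in the post
```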
James Aylett wrote:
The problem with book: is that it doesn’t actually make semantic sense for everything. Not all things with ISSNs are books, for instance (doesn’t ALA have an ISSN now?).
Also, we don’t need book:title et al, because (as per Rich’s example in one of the comments) we have Dublin Core covering a lot of this. I’d favour using either iso: or urn: for the ISBN/ISSN – the former is probably slightly better because URNs don’t have the traction they perhaps should have, but ISO is pretty well recognised.
book:text and book:pdf are more interesting. I’m not yet convinced they’re worth it. Tags are (generally) for expressing things in a machine-readable and human-understandable way that we can’t do otherwise – but we have URIs and links. (Which gets us into content negotiation territories, but that’s not something machine tags should try to solve.)
There’s another problem I have, with the semantics: book:pdf:<uri> is linking to the text of the book, just as book:text:<uri> is, merely in a different format (and possibly a different fidelity). These are, at the end of the day, only representations of the resource that is the book (urn:isbn:FOO as a URI, possibly iso:isbn:FOO as a tag). There have been discussions of ISBN-URI to dereferenceable-URI services in the past. Machine tags shouldn’t be used to boil the ocean :-)
Paul Wib wrote:
James, ISSN stands for International Standard Serial Number and is used for things published in successive parts, like journals, comics or possibly a website (although apparently the British Library, who control ISSNs in the UK, are not too keen on giving them to websites).
ISBN stands for International Standard Book Number and is for any one-off publication which (in the UK) the British Library deems worthy of one. So yes, an ISSN can be for many things, but an ISBN is always a book (hence my example).
I totally agree the semantics in my example weren’t rigorously thought out, but isn’t that exactly why tagging is so successful – it’s an easy way for your average human to create and understand meta-data without having to think too hard or know anything about schemas and standards. This is not OWL/RDF! Maybe book:pdf=url would emerge as a convention, maybe not, it doesn’t really matter when you don’t have hard semantics up front.
So in this context I’d still argue ‘book’ makes the most sense as a namespace. Most people I know don’t have a clue what a URN is and think Dublin Core is some new style of breakbeat with fiddles, but I’d wager most of them know what a book is :)
Paul Watson wrote:
The format of the tag needs to be dependent on how it’s used. If the function of the tag is to link to the electronic form of a book then that’s one thing, but what if the tag is going to be more commonly used in a blog post to link to a place to buy a book (e.g. a link to the product record page on Amazon)?
I’m sure the latter would be far more commonly used in blogs – and therefore far more quickly adopted. And besides, links to electronic versions of books can already be made machine-readable by the use of a DOI link.
I guess one alternative is to use something like “book:buy” for a link to Amazon or the Publisher’s online shop.
Pelle wrote:
My thought is this:
ISO is an organisation that makes standards. Until they define this themselves, perhaps one shouldn’t make it look as if they had? Wouldn’t it be a better idea to do something under your own flag? Like Microformats or DC, and start something new?
Like Machinetags.com, and on that site standardize machine tags like these. For example mt:isbn=887r89w, mt standing for Machinetags.com
It could perhaps even be a project in the scope of Microformats since machine tags are to tags what microformats are to html.
Rich wrote:
Paul Wib – I really like the idea of using ‘book’ as the namespace. It’s much more in the spirit of tagging (machine or otherwise).
As you say, it seems unlikely that anyone would want to search for all stuff tagged with the namespace ‘iso’, whereas searching for stuff tagged with the namespace ‘book’ seems potentially more useful.
Peter Firminger wrote:
Pelle - I think you're referring to http://machinetags.org/wiki/ instead of .com. Submit the "book" or "isbn" namespace (and join the mailing list) if you think it's important.
Pelle wrote:
I referred to the concept rather than a specific site, nice to see that such a site exists – it or something like it should be built upon, but instead of defining different namespaces it should define predicates under a namespace shared by the complete project…
Tim Beadle wrote:
For ISSNs, there’s already the PRISM (Publishing Requirements for Industry Standard Metadata) standard namespace. You can see it in action in any of IOP Publishing’s RDF (RSS 1.0) feeds, e.g. Inverse Problems Latest Papers, along with DC and Taxonomy metadata.
While I understand the need to take this stuff out of the ivory tower of academia and make it as usable as possible, it would make sense to minimise the number of wheel-reinventions as well.
Dan Champion wrote:
I’ve posted a draft book: namespace on machinetags.org to get the ball rolling there. All comments and input on the wiki and the machine tags mailing list very welcome.