Comments by Phil Archer

The Internet Content Rating Association is a global non-profit body funded largely by the internet industry. Having consulted international opinion-formers, we produced a matrix of questions designed to address parental concerns about electronic content from as many cultures as possible. This questionnaire forms the basis of the ICRA self-labelling system, against which parents can set filters to block/allow content as they see fit. Currently the ICRA rating vocabulary is available using the PICS standard but will soon also be available as an RDF schema.
Our efforts to encourage content providers to self-label continue to evolve and we are now actively seeking ways in which labelling using the ICRA vocabulary can be seen simply as an adjunct to self-labelling for a variety of other purposes. Increasingly we're talking about "ICRA vocabulary in the metadata" rather than just using the term "self-labelling." This group's work is therefore of significant interest and I am pleased to see that Ernie Miller is actively involved. He and I are working on aspects of RDF and RSS as relevant to these issues.
ICRA's experience highlights a number of substantial problems with self-labelling that I'll run through quickly.
1. Webmaster inertia. Webmasters typically worry about things like browser and screen size compatibility, design, functionality and so on. Metadata is barely mentioned in webmaster manuals; it's largely ignored or misunderstood by people at the front end. You therefore begin with an educational task.
2. Website structure. How do you define "a document?" An HTML file sitting on a server that someone can access through its URL is a nice idea and has some relevance as a historical curiosity but, of course, most pages on substantial websites are created dynamically on request. How do you label a document that doesn't exist until the server responds to a request to create it, perhaps pulling elements from multiple servers in several locations?
3. Trust. How do you know that the self-label is trustworthy? Most search engines ignore whatever metadata they find precisely because it is so notoriously inaccurate. A few webmasters still add "sex" as a keyword just to try (in vain) to improve their search engine rankings.
4. Chicken and egg. What's the point in labelling your site if no one's looking for labels? And what's the point in looking for labels if no one's labelling?
These are issues we've been working on for some time, from a different perspective and with different emphases than will apply to Democracy Inventory. I don't pretend we have found a magic solution but I can tell you where we're beginning to focus our efforts.
First of all, as mentioned above, I'm working with Ernie Miller and others on aspects of RDF. An RDF description is designed to apply to a single URI, which is a severe limitation. From ICRA's perspective, we need a single RDF description to be applicable to multiple URIs, as defined by a part of their URL, a regular expression, etc. Then you can produce an RDF description, probably including Dublin Core and other metadata as well as an ICRA set, and apply it to lots of content. One can imagine, perhaps, a film classification board making its 5 or 6 RDF descriptions available (and mirrored around the world) with films pointing to the relevant classification.
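A minimal sketch of applying one shared description to many URLs, as described above. Everything named here is an assumption for illustration: the `LabelRule` class, the `find_description` function, the example.org URLs and the description identifiers are hypothetical and are not part of ICRA, PICS or RDF.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabelRule:
    """Hypothetical rule: one shared description applied to many URLs."""
    description_id: str             # identifier of a shared description (assumed)
    prefix: Optional[str] = None    # match URLs that start with this string
    pattern: Optional[str] = None   # or match URLs against this regular expression

    def matches(self, url: str) -> bool:
        if self.prefix is not None and url.startswith(self.prefix):
            return True
        if self.pattern is not None and re.search(self.pattern, url):
            return True
        return False


def find_description(url: str, rules: list) -> Optional[str]:
    """Return the description of the first rule that matches, else None."""
    for rule in rules:
        if rule.matches(url):
            return rule.description_id
    return None


# Illustrative rules only: the more specific rule is listed first.
RULES = [
    LabelRule("rating-adult", pattern=r"^http://films\.example\.org/.*/18/"),
    LabelRule("rating-general", prefix="http://films.example.org/"),
]
```

Because the first matching rule wins, a handful of descriptions can cover an entire site, with narrower rules placed ahead of broader ones.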
The webmaster inertia and website structure issues are leading us towards a new project to build a tool designed to be used by an editor/policy person/archivist, i.e. not a webmaster or server engineer. This would interact with their organisation's servers, wherever they may be, and allow annotations/descriptions/labels to be added to both electronic and physical resources using a variety of standards (RDF, PICS etc.). This is a nascent project I'm hoping to launch in the new year.
Trust? Chicken and egg? These are trickier, but we're looking at Artificial Intelligence to help us out here. An AI analysis of a page/site can reveal in broad terms what the subject matter is, but it will never give a fully accurate description. An AI unit might tell you that a document is "something to do with Europe and the law" but it would take a human-written label, preferably written by the person who created the site, to tell you that it was about article 10 of the European Human Rights Act. In other words, the incentive to add accurate metadata remains strong, but the user/search engine can have greater faith in the metadata if it matches the AI analysis within a few percentage points, and you have a working system for sites that have no metadata.
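The cross-check between a self-label and an AI analysis might be sketched as follows. This is an illustration under stated assumptions only: the topic shares, the default `tolerance` of five percentage points and the function name `label_is_plausible` are hypothetical, and no particular AI classifier is implied.

```python
def label_is_plausible(declared: dict, detected: dict,
                       tolerance: float = 0.05) -> bool:
    """Compare self-declared topic shares with machine-estimated ones.

    Both arguments map a broad topic name to a share between 0 and 1.
    The label counts as plausible only if every topic differs by no
    more than `tolerance` (an assumed threshold).
    """
    topics = set(declared) | set(detected)
    return all(
        abs(declared.get(topic, 0.0) - detected.get(topic, 0.0)) <= tolerance
        for topic in topics
    )
```

A label that passes the check can then be trusted for its finer detail, which the automated analysis cannot supply; for a site with no label at all, the detected topics alone would be the fallback.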
I would be very pleased to discuss these issues and others with members of the group. Different people are looking for ways to apply different metadata vocabularies to resources. A system that allows plurality through a common approach reduces the requirement for specialist education and increases the incentives.
Phil Archer
CTO
ICRA
Comments by Richard Swetenham

Ernest's idea about automating the updating process is excellent.
However, while it may be obvious to him, I have to admit I wouldn't be sure where to start with an RSS feed for my QuickLinks site.
I therefore agree with Robert Heverly that we should not overestimate the average contributor's technical skills.
Perhaps EM can help develop a blog-like tool. You could start with using existing software (open source so it can be tweaked?) for an e-democracy blog, where registered members could contribute items and could also upload documents.
[little plug - in all due modesty I think that "Status of EU initiatives" http://www.qlinks.net/quicklinks/status.htm counts as an e-democracy resource - although I see I need to add to that page what the European Commission is doing on the "participatory" side]
b) avoid getting obsessed by the database paradigm and beware being captured by your programmers. The database should be as unobtrusive as possible.
It is too easy to design a search form with too many choices, which actually makes finding stuff more difficult.
There should be a "simple" query level and an "advanced" level.
An alternative approach using pre-cooked queries should also be available to the user, with links which produce catalogue-like lists that can be browsed either by theme or by country / region.
c) what does Google tell us? Can Ernie explain?
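The "simple" level, the "advanced" level and the pre-cooked browsing lists suggested under b) could look roughly like this. It is a sketch only: the `Resource` record, its fields and the sample entries are hypothetical and do not describe any existing Democracy Inventory design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    title: str
    theme: str
    region: str


# Placeholder entries for illustration only.
CATALOGUE = [
    Resource("Example resource A", "consultation", "Europe"),
    Resource("Example resource B", "voting", "Europe"),
    Resource("Example resource C", "consultation", "Africa"),
]


def simple_search(text, catalogue):
    """Simple level: one free-text box matched against titles."""
    needle = text.lower()
    return [r for r in catalogue if needle in r.title.lower()]


def advanced_search(catalogue, theme=None, region=None):
    """Advanced level: optional filters, each ignored when left empty."""
    return [
        r for r in catalogue
        if (theme is None or r.theme == theme)
        and (region is None or r.region == region)
    ]


def browse_by(field, catalogue):
    """Pre-cooked query: group titles by 'theme' or 'region'."""
    groups = {}
    for r in catalogue:
        groups.setdefault(getattr(r, field), []).append(r.title)
    return groups
```

The browsing function needs no input from the user beyond following a link, which keeps the database out of sight for most visitors.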
Comments by Steven H. Johnson

Topic Label/Prompt suggestions
December 22, 2002
I see this as a “starter list” – the broad categories may well be stable and permanent, but the topics within the broad categories will evolve, change, expand.
In other words, this is as much a list of “prompts” as a list of “topics.”
The users – seeing any existing list – may wish to classify their interest using an existing term, or they may wish to expand their search by offering a related term that isn’t on the list yet.
Government Program & Institution Categories
Social Security
Medicare/Medicaid
Health and Medicine
Higher Education
Public Education/Public Schools
Police, Judiciary, Prisons
Transportation (Highways, Ports, Railroads, Subways, Buses)
Tax System/Tax Policy
Treasury Dept
Federal Reserve
World Bank
United Nations
World Health Organization
World Trade Organization
etc
Environmental Categories
Water (Rivers, streams, aquifers, wetlands, lakes, oceans, rain)
Air
Plant life (land, marine), Animal life (land, marine), Biodiversity
Air pollution
Water pollution
Soil pollution
etc
Health Categories
AIDS/HIV
Tuberculosis
Malaria
Alcoholism
Drug addiction
Obesity
Fitness & exercise
Holistic medicine
Western medicine
Nutritionals
Diet strategies
Prescription drugs
etc
Technology Categories
Internal combustion engine
Genetic engineering
Hydropower
Chemical warfare
Nuclear energy
Germ warfare
Land mines
Biological weapons
Internet
High tech
Biotech
etc
Industry Categories
Oil Industry
Auto industry
Chemical industry
Biotechnology industry
Defense industries
Pharmaceutical industry
Medical industry
Hospital industry
Entertainment industry
etc
Economic Categories
Economic growth
Globalization
World Bank
International Trade
World Trade Organization
Underdevelopment
Global Poverty
etc
Social Categories
Crime
Divorce
Abortion
Aging
Drug addiction
Christianity
Judaism
Islam
Interfaith Dialogue
Human rights
Rights of women
Slavery
etc
Political Categories
Conservatism
Liberalism
Radicalism
Libertarianism
Populism
Zionism
Radical Islam
etc
Geographic Categories
Urban problems
Middle East
Europe
Africa
Pacific Rim
Indian Subcontinent
Asia
Former Soviet Union
North America
Latin America
South America
Caribbean
Australia/New Zealand
Oceania
East Asia
etc
Trend Categories
Population explosion
Globalization
Violence
Volunteerism
etc