
Comments by Phil Archer


 

The Internet Content Rating Association is a global non-profit body funded largely by the internet industry. Having consulted international opinion-formers, we produced a matrix of questions designed to address parental concerns about electronic content from as many cultures as possible. This questionnaire forms the basis of the ICRA self-labelling system, against which parents can set filters to block or allow content as they see fit. Currently the ICRA rating vocabulary is available using the PICS standard, but it will soon also be available as an RDF schema.

 

Our efforts to encourage content providers to self-label continue to evolve and we are now actively seeking ways in which labelling using the ICRA vocabulary can be seen simply as an adjunct to self-labelling for a variety of other purposes. Increasingly we're talking about "ICRA vocabulary in the metadata" rather than just using the term "self-labelling."  This group's  work is therefore of significant interest and I am pleased to see that Ernie  Miller is actively involved.  He and I are working on aspects of RDF and RSS  as relevant to these issues.

 

ICRA's experience highlights a number of substantial problems with self-labelling that I'll run through quickly.

 

1. Webmaster inertia. Webmasters typically worry about things like browser and screen size compatibility, design, functionality and so on. Metadata is barely mentioned in any webmaster manuals, and it's largely ignored or misunderstood by people at the front end. You begin therefore with an educational task.

 

2. Website structure. How do you define "a document?" An HTML file sitting on a server that someone can access through its URL is a nice idea and has some relevance as a historical curiosity, but, of course, most pages on substantial websites are created dynamically on request. How do you label a document that doesn't exist until the server responds to a request to create it, perhaps pulling elements from multiple servers in several locations?

 

3. Trust. How do you know that the self-label is trustworthy? Most search engines ignore whatever metadata they find precisely because it is so notoriously inaccurate. A few webmasters still add "sex" as a keyword just to try (in vain) to improve their search engine rankings.

 

4. Chicken and egg. What's the point in labelling your site if no one's looking for labels? What's the point in looking for labels if no one's labelling?

 

These are issues we've been working on for some time, though from a different perspective and with different emphases than will apply to Democracy Inventory. I don't pretend we have found a magic solution, but I can tell you where we're beginning to focus our efforts.

 

First of all, as mentioned above, I'm working with Ernie Miller and others on aspects of RDF. An RDF description is designed to apply to a single URI, which is a severe limitation. From ICRA's perspective, we need a single RDF description to be applicable to multiple URIs, identified by a part of their URL, a regular expression, etc. You could then produce one RDF description, probably including Dublin Core and other metadata as well as an ICRA set, and apply it to lots of content. One can imagine, perhaps, a film classification board making its 5 or 6 RDF descriptions available (and mirrored around the world), with films pointing to the relevant classification.
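
The idea of one description covering many URIs can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ICRA's design and not real RDF: the `Description` class, the `descriptions_for` function, the example.org URLs and the property names are all hypothetical placeholders, and the patterns are ordinary regular expressions matched against the URL.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Description:
    """One shared metadata description (hypothetical structure, not real RDF)."""
    name: str
    pattern: str  # regular expression tested against the full URL
    properties: dict = field(default_factory=dict)


def descriptions_for(url, descriptions):
    """Return every description whose pattern matches the given URL."""
    return [d for d in descriptions if re.search(d.pattern, url)]


# Placeholder descriptions: the URLs and property values are invented.
DESCRIPTIONS = [
    Description(
        name="family-section",
        pattern=r"^https?://example\.org/kids/",
        properties={"example:audience": "children"},
    ),
    Description(
        name="archived-documents",
        pattern=r"\.pdf$",
        properties={"example:type": "archived document"},
    ),
]

if __name__ == "__main__":
    for url in (
        "http://example.org/kids/games/index.html",
        "http://example.org/kids/report.pdf",
        "http://example.org/news/",
    ):
        names = [d.name for d in descriptions_for(url, DESCRIPTIONS)]
        print(url, "->", names)
```

A URL may match several descriptions or none. How overlapping descriptions should be combined is left open here, as it is in the text above.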

 

The webmaster inertia and website structure issues are leading us towards a new project to build a tool designed to be used by an editor, policy person or archivist, i.e. not a webmaster or server engineer. This would interact with their organisation's servers, wherever they may be, and allow annotations, descriptions and labels to be added to both electronic and physical resources using a variety of standards (RDF, PICS etc.). This is a nascent project I'm hoping to launch in the new year.

 

Trust? Chicken and egg? These are trickier, but we're looking at Artificial Intelligence to help us out here. An AI analysis of a page/site can reveal in broad terms what the subject matter is, but it will never give a fully accurate description. An AI unit might tell you that a document is "something to do with Europe and the law", but it would take a human-written label, preferably written by the human who created the site, to tell you that it was about article 10 of the European Human Rights Act. In other words, the incentive to add accurate metadata remains strong, but the user/search engine can have greater faith in the metadata if it matches the AI analysis to within a few percentage points, and you also have a working system for sites that have no metadata.
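
One way to picture this cross-check is the Python sketch below. It is a rough illustration rather than anything ICRA has built: the function names, the idea of the analysis returning a score between 0 and 1 per topic, and the `min_score` and `tolerance` values are all assumptions made for the example.

```python
def supported_fraction(declared, scores, min_score=0.5):
    """Fraction of declared topics that the automated analysis also detects."""
    supported = sum(1 for topic in declared if scores.get(topic, 0.0) >= min_score)
    return supported / len(declared)


def assess(declared, scores, tolerance=0.05, min_score=0.5):
    """Compare a self-label with an automated analysis (hypothetical scheme).

    declared: topics claimed in the site's own metadata (may be empty)
    scores:   topic -> confidence in [0, 1] from some automated analysis
    """
    if not declared:
        # No metadata at all: fall back to the broad topics the analysis found.
        inferred = sorted(t for t, s in scores.items() if s >= min_score)
        return ("inferred", inferred)
    if supported_fraction(declared, scores, min_score) >= 1.0 - tolerance:
        # The label agrees with the analysis to within the tolerance.
        return ("trusted", sorted(declared))
    return ("disputed", sorted(declared))


if __name__ == "__main__":
    analysis = {"europe": 0.9, "law": 0.8, "sport": 0.1}  # invented scores
    print(assess(["europe", "law"], analysis))
    print(assess(["cooking"], analysis))
    print(assess([], analysis))
```

The analysis only confirms the label in broad terms. The detail, such as which article of which act a page discusses, still comes from the human-written label.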

 

I would be very pleased to discuss these issues and others with members of the group. Different people are looking for ways to apply different metadata vocabularies to resources. A system that allows plurality through a common approach reduces the requirement for specialist education and increases the incentives.

 

Phil Archer

CTO

ICRA

 

 

Comments by Richard Swetenham


 

Ernest's idea about automating the updating process is excellent.

 

However, while it may be obvious to him, I have to admit I wouldn't be sure where to start with an RSS feed for my QuickLinks site.

 

I therefore agree with Robert Heverly that we should not overestimate the average contributor's technical skills.

 

Perhaps EM can help develop a blog-like tool. You could start by using existing software (open source so it can be tweaked?) for an e-democracy blog, where registered members could contribute items and could also upload documents.

 

[little plug - in all due modesty I think that "Status of EU initiatives" http://www.qlinks.net/quicklinks/status.htm counts as an e-democracy resource - although I see I need to add to that page what the European Commission is doing on the "participatory" side]

 

b) avoid getting obsessed by the database paradigm and beware being captured by your programmers. The database should be as unobtrusive as possible.

 

It is too easy to design a search form with too many choices, which actually makes finding stuff more difficult.

 

There should be a "simple" query level and an "advanced" level.

 

An alternative approach using pre-cooked queries should be available to the user, with links which produce catalogue-like lists that can be browsed either by theme or by country / region.

 

c) What does Google tell us? Can Ernie explain?

 

 

Comments by Steven H. Johnson


 

Topic Label/Prompt suggestions

December 22, 2002

 

 

I see this as a “starter list” – the broad categories may well be stable and permanent, but the topics within the broad categories will evolve, change, expand.

 

In other words, this is as much a list of “prompts” as a list of “topics”.

 

The users – seeing any existing list – may wish to classify their interest using an existing term, or they may wish to expand their search by offering a related term that isn’t on the list yet.

 

Government Program & Institution Categories

Social Security

Medicare/Medicaid

Health and Medicine

Higher Education

Public Education/Public Schools

Police, Judiciary, Prisons

Transportation (Highways, Ports, Railroads, Subways, Buses)

Tax System/Tax Policy

Treasury Dept

Federal Reserve

World Bank

United Nations

World Health Organization

World Trade Organization

etc

 

Environmental Categories

Water (Rivers, streams, aquifers, wetlands, lakes, oceans, rain)

Air

Plant life (land, marine), Animal life (land, marine), Biodiversity

Air pollution

Water pollution

Soil pollution

etc

 

Health Categories

AIDS/HIV

Tuberculosis

Malaria

Alcoholism

Drug addiction

Obesity

Fitness & exercise

Holistic medicine

Western medicine

Nutritionals

Diet strategies

Prescription drugs

etc

 

Technology Categories

Internal combustion engine

Genetic engineering

Hydropower

Chemical warfare

Nuclear energy

Germ warfare

Land mines

Biological weapons

Internet

High tech

Biotech

etc

 

Industry Categories

Oil Industry

Auto industry

Chemical industry

Biotechnology industry

Defense industries

Pharmaceutical industry

Medical industry

Hospital industry

Entertainment industry

etc

 

Economic Categories

Economic growth

Globalization

World Bank

International Trade

World Trade Organization

Underdevelopment

Global Poverty

etc

 

Social Categories

Crime

Divorce

Abortion

Aging

Drug addiction

Christianity

Judaism

Islam

Interfaith Dialogue

Human rights

Rights of women

Slavery

etc

 

Political Categories

Conservatism

Liberalism

Radicalism

Libertarianism

Populism

Zionism

Radical Islam

etc

 

Geographic Categories

Urban problems

Middle East

Europe

Africa

Pacific Rim

Indian Subcontinent

Asia

Former Soviet Union

North America

Latin America

South America

Caribbean

Australia/New Zealand

Oceania

East Asia

etc

 

Trend Categories

Population explosion

Globalization

Violence

Volunteerism

etc


This work is licensed under a Creative Commons License.

Copyright 2004 - 2009 Beth Simone Noveck