
Page Last modified 20 Apr 2016, 02:30 PM

Summary

Metainformation [1] for the EEA metadatabase (hereinafter CDS [2]), discussed in this report, are descriptions of written reports, magazine articles, CD-ROMs, maps, data files, databases, WEB pages and the EIONET directory. The objective for the CDS is to provide the European environmental community with relevant metainformation at European level. A common understanding that there should be a core set of metainformation available on the CDS may be discerned among the actors on the EIONET [3] scene. This report presents criteria on how to select that metainformation.

The methodology used treats the current use of, or demand for, data as the most important criterion. In addition, different weights are given to sets of data and other items depending on the pressure of demand for them, thus giving a method of selection. Quality aspects are discussed and it is proposed that poor quality data records be improved before they can be accepted for entry in the metadatabase. In the future it should be regarded as normal procedure to deliver metainformation to the CDS with quality information built into the records delivered.

A simple procedure is proposed, by which it is possible to designate a score for the metainformation candidate record for the CDS. The scoring procedure operates in a way closely resembling a taxonomic system for species. The scores vary between 0 and 5, where 5 is the highest possible score.

The following are proposed:

To qualify at all as relevant, metainformation should have some relevance from an international or a European point of view.

It is proposed that the CDS should be kept as a high quality information tool with metainformation of high quality having the highest relevance to the assessment activities within the EIONET.

It is proposed that a scoring system be used to find a way of determining which metainformation should populate the CDS. The proposed scoring system takes into account availability of metainformation, metainformation quality, its sectorial and thematic relevance, the cost of the production of metainformation, geographical coverage, update and maintenance.

It is proposed that metainformation scoring 4 and 5 in the proposed scoring procedure will be entered in the CDS.

It is proposed that metainformation be collected on items from 1994 onwards. However, metainformation on databases covering long time series starting before 1994, and still being updated, should be accepted.

It is proposed that the metainformation collection should start in 1998 by making full use of the proposed selection criteria and that the metadatabase will be in a steady-state operational mode by the end of year 2000.

As a result of the proposals the CDS should consist of metainformation records on the following environmental items:

  • Data deliveries to the EU as a result of legislative reporting
  • Data requested by the EEA/EIONET on a regular and scheduled basis
  • Data requested by several international bodies
  • Items produced by the EEA/EIONET
  • Environmental databases operated by UN, OECD, EU, FAO and environmental conventions such as HELCOM etc.
  • Official National State of the Environment Reports
  • Official National Environmental Monitoring Programmes
  • National Environmental Resource Libraries
  • National metadatabases or reference databases on the environment

For national resource libraries and databases only metainformation on the databases and the libraries themselves (and not on their real data sets) should be kept in the CDS.

As a result, if the proposed selection criteria are used, it is estimated that approximately 300 metadata records per country and year would be produced, at an estimated annual cost of 4500 ECU per country.


1. Introduction

1.1 ETC/CDS work

The work plan of the ETC/CDS comprises the development of a common multilingual thesaurus (GEMET), the development of software to help collect and disseminate metainformation, and the introduction of these tools within the EIONET framework. It is of great importance for the EEA to create a metadatabase that will be recognised as a quality metadatabase meeting high standards of reliability and consistency. The most important dissemination tool is the WebCDS, which resides on the Internet and has been operational since May 1997.

The scope of this report is to focus on the need for metainformation as a means of gaining access to real data sets in the environmental work at European level. Metainformation of interest to users in the environmental community not only concerns environmental conditions but also includes metainformation on a wide range of conditions of society as a whole. One of the reasons for this is the more common use of the DPSIR methodology in environmental assessment studies.

1.2 Stepwise approach

This report and its proposals reflect the current varying treatment of metainformation in the different European countries. The very varied situation ranges from no national activity at all to managing a full metainformation database. The proposed methodology allows a stepwise approach to development of a European metainformation CDS tool. In the future, when the level proposed in this report has been attained, it will be possible to make a new decision to lower the threshold, thus making it possible to enter more information into the database.

In the last decade there has been an increasing demand for background materials for better policy and decision making in the environmental field. This has led to an increase in evaluations and integrated environmental assessments. Activities of this kind are also one of the key functions of the EEA itself.

Environmental sciences are multidisciplinary per se. To achieve a broader understanding of the environmental problems in an evaluation sense, still more fields will need to be appended to the list of disciplines. Examples of such new fields recently recognised are the different economic sectors of society and basic demographic conditions.

The basis of this report and its methodology reflects the need for data in all these sectors and different areas of interest to meet the demand from evaluators and assessment makers.

1.3 CDS users

It has been assumed that the following groups are the main users of the CDS:

  • EEA and its affiliated bodies such as the ETCs
  • National Focal Points
  • National Reference Centres
  • Main Component Elements
  • EU Commission and its DGs including EUROSTAT (with GISCO)
  • EU Parliament
  • Global Environmental Conventions
  • Regional Environmental Conventions
  • NGOs, the general public and organisations

This year the EU Commission is revising basic regulation R1210/90 EEC, governing the EEA and its operations. It has been announced that the EEA "should develop into a European Reference Centre, a one-stop-shop for environmental information and data with modern Internet-based communications to facilitate access across Europe". The CDS fits well into such a framework.

1.4 Metainformation

Data and information resources required for environmental assessment and evaluation processes are stored in many hands in many countries. Some data might also be restricted and made available only on the basis of a detailed contract between the user and data provider. It is therefore impossible for any single user or evaluator always to be able to rely upon his or her own databases. Co-operation between many actors is therefore essential in order to make data available.

The use of metainformation is a shortcut to finding the real data sets or other information resources and to gaining access to them. Typically, metainformation consists of formalised descriptions of information resources. These may be of many different kinds, e.g. books, reports, CD-ROMs, magazine articles, databases and digital data files. When compiled in a register, the metainformation will provide an excellent catalogue of where to find what data, given a common classification system. The objective for the CDS is to provide the European environmental community with relevant metainformation at European level.

The technical solutions so far available in the ETC/CDS context are the GEMET, the WinCDS and the WebCDS. Ideally, the different national reference databases could use the WinCDS to operate and maintain national CDSs. Extractions from national CDSs into the European CDS, for presentation on the Internet by the WebCDS, could be made on schedule. It might be possible (as has already been done with the WinCDS) to disseminate the WebCDS software for national use as well. This report, however, does not discuss technical solutions for the CDS in any further detail.

1.5 The Selection Criteria project

Intensive discussions in the ETC/CDS Advisory Committee, among the National Focal Points and within the European Environment Agency itself, have concentrated on the lack of clear aims as to the kind of metainformation that should be stored and updated in the CDS. Fears have been expressed that a mandatory request would have to be made to NFPs to deliver metainformation on all (or a huge number of) data sets produced in the various countries to the CDS, restricted information included. Some countries have also expressed concern that the CDS might not be of interest to them in a national sense and that there would therefore be no incentive to maintain deliveries. Some of the fears and doubts may be due to the present lack or scarcity of national resources in many countries for the production of metainformation, including delivery to a European CDS. In addition, many of the member states currently have no national metadatabases in operation from which to extract data.

However, it may be possible to find a common understanding in the member states for developing a core set of accessible metainformation on the CDS. As a result of the work in this project (task 6.1 in the ETC/CDS Work programme for 1996/97), criteria have been proposed for selecting metainformation for the CDS.

2. Limitations

This report does not propose general methodologies for indexing or rules for the classification of metainformation and the classification systems. These are of course very important issues which must be dealt with. This could be a task for a future joint project with the experts on GEMET and library classification systems.

The question of who delivers metainformation records is not dealt with in this report, although this question has been raised several times during the process of reviewing the report. It thus seems important to develop delivery plans for future maintenance and updates of the database. This report assumes that the main national deliveries to the CDS will pass through the NFPs (or their nominees) in the various countries.

This report does not discuss technical software or hardware solutions for the CDS.


3. Methodology

3.1 Metainformation at European level, the very basic selection criterion

To qualify at all as relevant metainformation for the CDS, metainformation should at least have some relevance from an international or European viewpoint. It could be said that this fact is the very basic or axiomatic selection criterion for an ETC/CDS metainformation record. More specific criteria are discussed in the following sections.

3.2 The demand and current use of information resources are the main selection criteria

It seems reasonable that the real data or information resources currently in use by international bodies for evaluation and assessment purposes are those that should also be the most important for the CDS.

A simple procedure is used, whereby a score is given to a metainformation candidate record for the CDS. Scores can vary from 0 to 5, where 5 is the highest possible score.

The current use of, or demand for, data is the most important criterion. Different weight is given to data sets and other items depending on the pressure of demand, thus providing a method for selection. This is explained in further detail below.

An inventory of data sets and reports currently in use for international reporting has been made and is presented in Annex 1. Most of the information resources listed originate from a national inventory on international reporting carried out at the Swedish Environmental Protection Agency. Data requests as they appeared in the guideline for the Dobris+3 report have been added, as well as some (albeit incomplete) information from central and southern Europe. The material is grouped by sector. Some records appear on more than one row. This is due to the multi-sectorial reporting in some inventories, e.g. CORINAIR, and to the chosen grouping. The inventory in Annex 1 is used to produce the first proposal for an actual selection of metainformation records to be included in the CDS; see the Annex 1 column headed "score".

3.3 Additional criteria

The sectorial and thematic relevance of real data sets and other items is discussed. Issues of great importance for the establishment of environmental action plans should be regarded as important for the CDS. Among all sectors, themes and issues it may particularly be noted that some of those entering the environmental scene in a late phase do not yet have fully developed methodologies for monitoring, analyses and assessment. Metainformation from those sources should therefore be treated with particular care. In the scoring procedure described below, allowances should be made if their metainformation records are not as good as those originating from the more traditional environmental actors. The themes, sectors and issues that should be given relevance in this sense are discussed further below.

Metainformation quality, the actual possibility of obtaining metainformation, keeping it updated, and the cost for obtaining it should also be considered selection criteria. These criteria are also discussed below. The quality criteria, dealing with comparability and the common standard for subject indexing, are very important. Another awkward question to be dealt with is the possibility that some of the metainformation records might be restricted in some countries. It may also be possible for metainformation records to be produced and published within the frame of the CDS even though the real information resource or data set is restricted. One case already familiar to the data processors within the EIONET concerns the production of activity rates to accompany the emission factors within the CORINAIR inventories.

3.4 Scoring procedure

A simple procedure is used, whereby a score is given to a metainformation candidate record for the CDS. The scoring procedure operates in a way closely resembling a taxonomic determining system for species. The scores vary from 0 to 5, where 5 is the highest possible score.

It is proposed that the scores 4 and 5 will render a metainformation record eligible for inclusion in the CDS. The basis for this proposal is the recognition and reviewing process for this report among peers during its development.
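As a minimal sketch of this threshold rule (in Python, with hypothetical record names and scores that are not taken from Annex 1), eligibility can be expressed as a simple filter. The `threshold` parameter is an assumption added for illustration; it reflects the possibility, mentioned in section 1.2, of lowering the limit at a later stage.

```python
# Illustrative sketch only: names and example records are hypothetical.
INCLUSION_THRESHOLD = 4  # scores 4 and 5 qualify, as proposed in this report


def is_eligible(score: int, threshold: int = INCLUSION_THRESHOLD) -> bool:
    """Return True if a candidate record's score qualifies it for the CDS."""
    if not 0 <= score <= 5:
        raise ValueError("scores range from 0 to 5")
    return score >= threshold


# Hypothetical candidate records with already-assigned scores.
candidates = {"Record A": 5, "Record B": 3, "Record C": 4}
selected = sorted(name for name, score in candidates.items() if is_eligible(score))
```

With these example values, `selected` contains "Record A" and "Record C"; "Record B" would only qualify if the threshold were lowered to 3.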

3.5 Different media used for real data

In this report the main prerequisite for developing metainformation criteria does not concern the media on which the real data originally appears. However, it might be useful to go through those media which may be considered to be carriers of real data sources and from which metainformation can be generated.

3.5.1 Published written material, CD-ROMs and WEB-pages

Written material such as reports, articles, CD-ROMs etc. is taken into consideration. It should be indexed according to the GEMET in order to fit into the ETC/CDS system of harmonised indexed metainformation. A list of possible written material candidates for the CDS can be found in Annex 1, Table 2.

Materials from web sites are considered of interest in the same way as printed material. The rapid turnover of pages on the Internet, however, makes it more difficult to apply the selection criteria for the ETC/CDS database to these resources. Moreover, a web site might consist of one single page or many pages. A metainformation record can be connected to virtually every page, to the web site as such or to a suitable grouping of pages.

3.5.2 Maps

Maps produced for environmental purposes are of interest. They can be in any format or reside on several media available for use by the environment community.

3.5.3 Data files

Descriptions of data files or other digital information stored in databases are the data most commonly thought to inhabit an environmental metadatabase. This will probably also apply to the CDS.

3.5.4 EIONET directory (address database)

The EIONET directory is the address database and thus describes the institutions and persons connected to the EIONET. It has already been decided and agreed among NFPs that the EIONET directory should be included in the ETC/CDS database.


4. Proposed selection criteria

4.1 Discussion

4.1.1 Requirements

Since the ETC/CDS database should serve the European environmental community in its efforts to create evaluations and assessments of the environment in Europe, it is an axiomatic prerequisite that data should have at least some relevance from an international or European point of view. This means that information resources merely mirroring national conditions not of international interest do not qualify as resources to be identified in the ETC/CDS database.

National reference metadata of interest to the international community and identified during the review process for the report are descriptions of

  • Official National State of the Environment Reports
  • Official National Environmental Monitoring Programmes
  • National Environmental Resource Libraries
  • National metadatabases or Reference databases on the Environment

Metainformation on these issues should be included in the CDS.

A large number of data sets and other information resources are already in circulation in the field of international and European environmental reporting. An inventory of this material is presented in Annex 1 with a view to showing what is included as important from a reporting viewpoint. It contains the current requests and the demand for data for use in compiling scheduled environmental evaluations, creating scenarios or reporting to various international fora on the state of environment or pollution rates for compliance reasons. The inventory is more complete for northern Europe than for the south, where some information is clearly lacking. However, this may have little effect on further discussion of the selection criteria. The actual proposed selection will be affected however, since full information on the existing data requirements is not available.

Data sets, apart from those generated for international environmental reporting purposes (and shown in Annex 1), can also be found. They are probably produced ad-hoc within the international community, being generated to carry out different projects or as the results of specific research activities. It is not possible or meaningful to set up an inventory of such data sets. However, from time to time they may be of sufficient importance to meet the criteria and then qualify for inclusion in the CDS.

Records of metainformation for possible entry in the ETC/CDS database might originate from the indexing of articles, books and web pages. The flood of information into the general library classification systems is very large. It is far beyond the scope of the CDS to include general library classification within its walls. The inclusion of links to relevant library services already available on environmental topics should be considered, however. It is therefore proposed that metainformation concerning National Resource Libraries themselves be collected in order to create metainformation records that can be used to link those libraries into the CDS.

4.1.2 Quality assurance

The role of quality assurance as a selection criterion for metainformation in the CDS should also be discussed. This is dealt with further below.

4.1.3 Scores

Different criteria will affect the metainformation record examined and give each metainformation set a relative score:

  • very high 5
  • high 4
  • medium 3
  • low 2
  • unusable 1

The given score will determine whether the metainformation record qualifies for inclusion in the CDS. The designation procedure and scoring are discussed in the next section and the limit for inclusion of the record in the database is discussed in the recognition section.

4.1.4 Formal EC legislative demands

Many data sets and reports are delivered to different EU institutions pursuant to EC environmental legislation. Metainformation describing these items would be of interest for the CDS and they should have the highest score if they are produced on a regular basis.

Proposed score: 5

It is worth noting that the formal regulatory demand made of the EEA itself is that the EEA should improve data availability, data harmonisation, comparability and consistency. These matters are more of a general nature and are discussed in section 4.1.11 on quality.

4.1.5 The potential use of information resources

4.1.5.1 The pressure of demand for data periodically requested or used

Many data sets and other items produced are presented internationally to many recipients such as global conventions, regional conventions and the EU Commission and/or the EEA. In addition, neighbouring countries frequently have mutual agreements on information sharing on issues of common interest.

Data available in different places vary with the relative importance they have been given in the past. One useful approach is to grade importance in relation to the obligations and requests made to governments from various international bodies to deliver real data sets and reports. It is reasonable to assume that if many such bodies require data on a certain issue, that piece of information is likely to be important. Metainformation on such data sets or reports should be given a high score.

Proposed score if the information resource is requested from at least three bodies: 5
Proposed score if the information resource is requested from at least two bodies: 4
Proposed score if the information resource is requested from at least one body: 3

The EEA/EIONET should be regarded as an authoritative international body and should be treated in the same way as the other international bodies discussed in the previous paragraph on "pressure of demand". Metainformation describing information resources requested or used should therefore be given a high score. Since the CDS is the EEA metainformation tool it is fair to score EEA/EIONET data one point higher, however.

A distinction should be drawn between metainformation describing information resources that are requested and used on a regular basis and those used ad hoc or for a single project. The score should be higher for data produced on a regular basis.

Proposed score if the information resource is requested or used regularly by the EEA/EIONET: 5
Proposed score if the information resource is not requested or used on a regular basis by the EEA/EIONET: 4 [7]
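The two scoring rules in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not part of the proposal: the function names are hypothetical, `requesting_bodies` is assumed to count international bodies other than the EEA/EIONET, and taking the higher of the two scores when both rules apply is an assumption the report does not spell out.

```python
# Illustrative sketch only: function and parameter names are hypothetical.
from typing import Optional


def demand_score(requesting_bodies: int) -> Optional[int]:
    """Score from the number of international bodies requesting the resource."""
    if requesting_bodies >= 3:
        return 5
    if requesting_bodies == 2:
        return 4
    if requesting_bodies == 1:
        return 3
    return None  # no score is proposed when no body requests the resource


def eea_score(used_by_eea: bool, regular: bool) -> Optional[int]:
    """Score for resources requested or used by the EEA/EIONET."""
    if not used_by_eea:
        return None
    return 5 if regular else 4


def base_score(requesting_bodies: int, used_by_eea: bool,
               regular: bool) -> Optional[int]:
    """Assumption: when both rules apply, the higher score is kept."""
    scores = [s for s in (demand_score(requesting_bodies),
                          eea_score(used_by_eea, regular)) if s is not None]
    return max(scores) if scores else None
```

For example, a resource requested by one other body and used ad hoc by the EEA/EIONET would receive 4 under these assumptions, since the EEA/EIONET rule scores one point higher than a single request from another body.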

In order to maintain high credibility as a quality metadatabase the quality rules, requirements or conditions for metainformation should be the same for the EEA itself as for other metainformation providers. Please refer to the discussion below on metainformation quality, 4.1.11.

4.1.5.2 Different geographical aggregation levels required for a specific determinant

Various international fora frequently request data on the same issue. It is also common that these fora require different levels of geographical aggregation to meet their specific needs. In these cases an attempt should be made to confine the metainformation records at European level in the CDS to data sets representing country levels. But if EC legislation requires more detailed data, the metainformation records in the CDS should meet those requirements.

4.1.6 Sectorial and thematic relevance

Metainformation describing information resources from sectors and themes ranked highly because of their great environmental interest should be considered important for the CDS. Some of these sectors and themes are still emerging and knowledge is evolving. It is therefore important to encourage developments in those emerging fields and the EEA should try to place additional emphasis on this metainformation production.

Some real data records from evolving areas may not yet fulfil all quality assurance requirements. In order not to discriminate against metainformation records from emerging sectors with a high environmental interest potential, these metainformation records should be given an extra scoring point.

Sectors and themes to be taken into account should be those raised in the Fifth Action Programme and the Dobris+3 report: Industry, Energy, Transport, Agriculture, Fishery, Tourism, Climate change, Acidification, Chemicals, Radiation, Ozone depletion, Air quality, Water resources, Marine environment, Nature and biodiversity, Natural Resources, Urban environment, Noise, Coastal zones, Soils, Waste management and Land use.

Among all those sectors, themes and issues the following should be given a high priority according to the conclusions and findings of the European Environment Agency Review of the 5th Action Program: 

  • CO2 emissions
  • Traffic related issues: NOx emissions and noise
  • Water abstraction
  • Quality of ground water
  • Quality of marine waters
  • Chemicals in the environment
  • Erosion and desertification

Proposed increase of score for metainformation from the sectors and themes in the list above: 1.

4.1.7 Possibilities of obtaining metainformation

Since it is important to present high quality data in the CDS (e.g. consistent data with no geographical gaps), no attempt should be made to collect a certain category or type of metainformation unless it is clear that this is practicable under existing administrative and networking conditions. The reader of this report should also refer to the comments below on metainformation updating requirements, 4.1.13, and the comments in section 4.1.12 on "Geographical coverage".

Obstacles to be taken into account are:

  • Lack of specification of the real information resource that is described by the metainformation
  • Real data item and its metainformation records are restricted
  • Data providers do not have the knowledge and skills to process information/data of a specific kind or on a specific issue
  • Geographical coverage is severely affected by delivery problems

If there is great interest in proceeding with development in such areas, the ETC/CDS or the EEA itself should endeavour to start a development project on this specific issue before collecting metainformation records for the CDS.

It might be the case, however, that metainformation can nonetheless be delivered from a vast majority of countries on a certain issue. Although the consistency between countries will suffer, it may still be useful to include important metainformation in the CDS.

Proposed reduction of score for metainformation which is difficult to produce: 1.

4.1.8 Guideline on the classification of depth and level of metainformation record aggregation

Although this report does not deal with classification methodologies, it is important to discuss the degree of detail with which a real data set or report should be described so as to avoid overflow or counterproductiveness in the system. The following guidelines on the depth of the classification should therefore be followed. Please refer also to the comments under section 4.1.10 below on the costs of producing metainformation.

  • Written material and the like: classification at a level sufficing to describe the report on an upper thesaurus (GEMET) level, preferably level 2.
  • Data series such as those listed in Annex 1: classification at country level even if the actual data series contains broader and more detailed material. The information should relate to main areas of a river basin, lake, coastal area etc.
  • The determinants (e.g. total phosphorus, total Hg etc.) should be given at an aggregate level.
  • Time resolution should be confined to a yearly basis (if appropriate)

With regard to the current status of GEMET (version 1.0), it must be stated here that additional items or lists will have to be introduced to develop a high consistency metadatabase for the CDS: geographical information such as countries, cities, municipalities, rivers, lakes, drainage areas, coastal areas etc. There is also a pressing need for the main determinants related to international reporting on pollution, such as total phosphorus, CO2, etc.

It is also of importance to create a forum for people involved in the indexing process, in order to facilitate the transparency and comparability of the material being indexed. It is proposed that this index forum meet once a year within the framework of the ETC/CDS. The forum might also propose to develop a guideline for indexing if needed.

4.1.9 Proposed limit for retrospectiveness

In order to avoid overloading the information supply channels available, a strict approach to retrospectiveness is recommended. It is proposed that, as the general rule, metainformation should not be gathered on written material available prior to 1994. This is the year in which the EEA started its operations in Copenhagen. This rule should apply particularly to written items such as reports and magazine articles etc. Databases covering long time series, starting before 1994, and still being updated, should be accepted.
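The cut-off rule and its exception can be sketched as a small Python predicate. The function and parameter names are hypothetical; in particular, interpreting `year` as the publication year for written items and as the first year of the time series for databases is an assumption made for this illustration.

```python
# Illustrative sketch only: the function and its parameters are hypothetical.
CUTOFF_YEAR = 1994  # year the EEA started its operations in Copenhagen


def within_retrospective_limit(year: int, is_database: bool = False,
                               still_updated: bool = False) -> bool:
    """Apply the proposed 1994 cut-off.

    `year` is the publication year for written items, or the first year of
    the time series for databases (an assumption of this sketch).
    """
    if year >= CUTOFF_YEAR:
        return True
    # Exception: long time series starting before 1994 and still being updated.
    return is_database and still_updated
```

Under this reading, a report published in 1993 is excluded, while a database with a time series starting in 1970 is accepted as long as it is still being updated.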

4.1.10 Cost of producing metainformation

There is a cost involved in producing metainformation. Whenever a decision is made to start indexing a certain type of material, this must be considered a long-term cost. Indexers themselves argue that, given good computer software with a well-integrated thesaurus, the level of indexing is not the main factor determining the cost. Instead, it is of greater importance to decide the type of material to be indexed.

It has been estimated that the initial creation of a metainformation record will take 30 min (15 ECU) per item on average, including proper indexing and registration in a database. The same effort, however, is involved in creating the (same) national metainformation record, which already occurs in several member states. Hence, it is not self-evident that the need for metainformation at the ETC/CDS will always increase costs. Nevertheless, the cost will be evident to countries not currently operating metadata or reference databases of their own. On the other hand, countries already operating a national metainformation system might meet the cost of selecting records to submit to the CDS. Additional costs incurred by countries will then be unevenly distributed among the EEA member states.

Different information resource material generates different costs because their abundance varies. Magazine articles and web pages currently dominate the information flow. It is therefore necessary to take a conservative view on how much material of this kind should be indexed for the ETC/CDS. Please refer also to the comments above under section 4.1.8, "Aggregation of metainformation".

If the proposals in this report are put into practice, some 300 metainformation records per year and country will be created, at a cost of approximately 4500 ECU.
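The annual figure appears to follow directly from the per-record estimate given earlier in this section, assuming every record costs the 15 ECU average. The short Python calculation below reproduces it; the variable names are illustrative, and the hours and hourly rate are derived values not stated in the report.

```python
# Figures taken from this section; variable names are illustrative.
MINUTES_PER_RECORD = 30
COST_PER_RECORD_ECU = 15
RECORDS_PER_COUNTRY_PER_YEAR = 300

annual_cost_ecu = RECORDS_PER_COUNTRY_PER_YEAR * COST_PER_RECORD_ECU
annual_hours = RECORDS_PER_COUNTRY_PER_YEAR * MINUTES_PER_RECORD / 60
implied_hourly_rate_ecu = COST_PER_RECORD_ECU / (MINUTES_PER_RECORD / 60)

print(annual_cost_ecu)          # 4500
print(annual_hours)             # 150.0
print(implied_hourly_rate_ecu)  # 30.0
```

In other words, the estimate corresponds to roughly 150 working hours per country and year at an implied rate of 30 ECU per hour.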

In certain cases metainformation records may be difficult to index or create due to a lack of classification systems in a specific area etc. Costs could then rise dramatically. In such cases the score should be reduced by one point.

Proposed decrease of score for metainformation involving high production costs [8]: 1.

4.1.11 Metainformation quality

4.1.11.1 Metainformation standards

The CDS is designed to follow the GEMET (thesaurus) standards for metainformation so as to improve accessibility, comparability and consistency between countries and within issues, themes and sectors. Indexing of metainformation in the ETC/CDS metadatabase should therefore follow the GEMET set of index terms. The better the compliance with GEMET, the more useful the database record will be. Metainformation records not indexed or not transferable into indices according to GEMET should be given a relatively low score: 3. The efforts already made in 1997 to collect metainformation have not fully taken this requirement into consideration. This gap should be recognised and filled in conjunction with subsequent updates. Please refer also to the proposals in section 8 on the implementation of the criteria.

Proposed highest score for metainformation not using GEMET indexing terms: 3

4.1.11.2 Real data quality reflected by the metainformation record

One of the overall objectives of the EEA work is to improve consistency and comparability of environmental data. Quality controlled data sets and the identification of quality control procedures will gradually develop over time. At present, however, there are no general quality guidelines for European environmental data. This means that this report only discusses these matters on a very basic level.

Many of the real data sets produced on schedule throughout Europe are the result of environmental monitoring activities or emission/discharge monitoring. In most cases they are probably sampled, analysed and presented by using national or international standard methods described in guidelines for monitoring and analyses in their respective areas. It should be regarded as normal procedure to deliver metainformation to the CDS accompanied by such quality information. To make basic quality information on data of this kind available is a matter of good housekeeping rather than lack of information or knowledge.

There is a need to harmonise the description of standardised metainformation records so as to bring quality control procedures into play. Deliveries including quality descriptions should be encouraged. The EEA should try to initiate a general project on harmonisation of quality control procedures for metainformation records.

Proposed change of score gained in the earlier procedure for metainformation on sets of data produced according to authoritative guidelines: 0.

Other real sets of data produced ad hoc may or may not follow common guidelines or standard procedures in sampling and analysing. It is therefore suggested that descriptions of such data not being produced according to common guidelines should be considered somewhat less reliable; the score should be lowered by 1 point.

Proposed reduction of score for metainformation on sets of data produced ad hoc and not in accordance with authoritative guidelines: 1.

4.1.12 Geographical coverage

Metainformation covering the whole of Europe is much more valuable than metainformation displaying gaps in geographical coverage. A reduction of the score is proposed where it is evident that data will not be obtained from a certain area which would be useful and relevant in terms of coverage. Please refer to the comments in section 4.1.7 "Possibilities of obtaining metainformation".

4.1.13 Update and maintenance

Real data sets and reports produced on a regular basis and reported internationally may be said to be ideal items when updating metainformation records. In most other cases it is less clear when and from whom an update will eventually arrive.

The credibility of the CDS is dependent on updates and proper maintenance; otherwise it will soon earn a poor reputation and will not be used by people searching for quality information. Before giving a certain set of metainformation the go-ahead for inclusion in the CDS, it should also be considered how, when and by whom these specific metainformation records will be updated.

A distinction must be drawn between metainformation concerning frequently produced items and items produced in projects or ad hoc. Information resources produced on a scheduled basis should be accompanied by a clear commitment on maintenance and updating procedures, whilst other records might be viewed somewhat less strictly in these respects.

Proposed reduction of score for metainformation to describe data sets or other items produced on a scheduled basis and without a maintenance and updating procedure: 2.

Proposed reduction of score for metainformation to describe data sets or other items produced on an ad hoc basis and without a maintenance procedure: 1.

It is also proposed that the CDS should contain an additional data field showing the name of the person responsible for the update so as to ease updating and maintenance procedures. This field should be kept internally in the database and not be accessible to the general public. The content of this proposed field might differ from the fields currently presenting information on the holder of the real information resource.
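
The separation between the internal maintenance field and the publicly visible fields could look roughly as follows. This is only a minimal sketch: the record layout and the names `MetaRecord`, `update_contact`, `INTERNAL_FIELDS` and `public_view` are hypothetical and are not taken from the CDS data model.

```python
from dataclasses import dataclass, asdict

# Hypothetical set of fields kept internal to the metadatabase.
INTERNAL_FIELDS = {"update_contact"}


@dataclass
class MetaRecord:
    title: str
    holder: str          # holder of the real information resource
    update_contact: str  # person responsible for updating the record (internal only)


def public_view(record: MetaRecord) -> dict:
    """Return the record as a dict without the internal-only fields."""
    return {k: v for k, v in asdict(record).items() if k not in INTERNAL_FIELDS}


record = MetaRecord(
    title="Example monitoring data set",
    holder="Example national agency",
    update_contact="A. Person",
)
print(public_view(record))
```

Keeping the contact in its own field, rather than reusing the holder field, reflects the point made above that the two may differ.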

4.2 Proposed selection criteria and scoring path

4.2.1 Criteria proposal

Metainformation to be selected to the CDS should meet the following criteria:

4.2.1.1 Main criterion

  • The metainformation should have international or pan-European relevance

4.2.1.2 Other proposed criteria

  • The metainformation records should describe items that have been delivered on a formal or regulatory basis to the EU/EEA/EIONET
  • The metainformation records should describe items that have an intensive current international or pan-European use
  • The metainformation records should describe items produced within the EEA/EIONET frame or work plan
  • Records should link to the EEA/EIONET directory (address list) when produced within the EEA/EIONET frame
  • A metainformation record may describe an official national State of the Environment Report, an official national environmental monitoring programme, a national environmental resource library or an official national metainformation database

4.2.1.3 Conditions that might affect the relevance of the selection criteria and make them less valid

  • The possibilities of obtaining the metainformation
  • Metainformation quality
  • The costs of producing the metainformation
  • Metainformation update and maintenance procedures

4.2.1.4 Conditions that might affect the relevance of the selection criteria and make them more valid

  • Metainformation records describing information resources representing sectors and themes with a high thematic and sectorial relevance

4.2.2 The selection criteria path

To enter a certain set of metainformation in the selection process, please follow the path below. Start at part A and end at part C, following the instructions. The score remaining after passage of the selection criteria path is then compared to what has been agreed between data providers and users in section 5.2.

 

Path, part A

Criteria | Score
Information resource of some international/European relevance? | No = 0, leave; Yes = continue
Information resource produced according to EC regulation? | Yes = 5, go to B; No = continue
Information resource requested by EEA/EIONET on a regular basis? | Yes = 5, go to B; No = continue
Information resource requested by EEA/EIONET on a non-regular basis and at least 1 more international body? | Yes = 5, go to B; No = continue
Information resource requested only by EEA/EIONET9 not on a regular basis? | Yes = 4, go to B; No = continue
Information resource requested by 3 or more international bodies other than EEA/EIONET? | Yes = 5, go to B; No = continue
Information resource requested by 2 international bodies other than EEA/EIONET? | Yes = 4, go to B; No = continue
Information resource requested by 1 international body other than EEA/EIONET? | Yes = 3, go to B; No = continue
Information resource is a national environmental resource library, or an official national metainformation database, or an official "National State of the Environment Report" or an official national environmental monitoring programme? | Yes = 4, go to B; No = 0, leave
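
Part A is a first-match decision list, so it can be sketched as a chain of early returns. The function below is an illustration only: the name `part_a_score` and its parameters are hypothetical, and `other_bodies` is assumed to count the international bodies other than EEA/EIONET that request the resource.

```python
def part_a_score(
    relevant: bool,
    ec_regulation: bool = False,
    eea_regular: bool = False,
    eea_non_regular: bool = False,
    other_bodies: int = 0,
    national_official_resource: bool = False,
) -> int:
    """Return the part A starting score; 0 means the candidate leaves the path."""
    if not relevant:
        return 0
    if ec_regulation:
        return 5
    if eea_regular:
        return 5
    if eea_non_regular and other_bodies >= 1:
        return 5
    if eea_non_regular:
        return 4  # requested only by EEA/EIONET, not on a regular basis
    if other_bodies >= 3:
        return 5
    if other_bodies == 2:
        return 4
    if other_bodies == 1:
        return 3
    if national_official_resource:
        return 4  # library, metainformation database, SoE report or monitoring programme
    return 0
```

The order of the tests matters: each row is only reached when every earlier row has answered "No".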

Path, part B

Criteria | Scoring
Problems in obtaining the metainformation10 | Yes = -1, continue; No = continue
High costs of producing metainformation | Yes = -1, continue; No = continue
Metainformation not meeting GEMET | Yes = score <= 3, continue; No = continue
Information resource not produced according to authoritative guidelines | Yes = -1, continue; No = continue
Frequently produced metainformation without updating and maintenance procedures | Yes = -2, continue; No = continue
Non-frequently produced metainformation without updating and maintenance procedures | Yes = -1, go to C; No = go to C
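
The part B adjustments can be sketched as a sequence of deductions applied in table order. Note that the GEMET row is a cap (the score becomes at most 3), not a deduction. The function name and flags below are hypothetical, and flooring the result at 0 is an assumption of this sketch, based on the report's statement that scores range from 0 to 5.

```python
def part_b_score(
    score: int,
    hard_to_obtain: bool = False,
    high_cost: bool = False,
    gemet_indexed: bool = True,
    follows_guidelines: bool = True,
    scheduled: bool = False,
    has_update_procedure: bool = True,
) -> int:
    """Apply the part B adjustments, in table order, to a part A score."""
    if hard_to_obtain:
        score -= 1
    if high_cost:
        score -= 1
    if not gemet_indexed:
        score = min(score, 3)  # cap, not a deduction
    if not follows_guidelines:
        score -= 1
    if not has_update_procedure:
        # scheduled (frequently produced) items lose 2 points, ad hoc items 1
        score -= 2 if scheduled else 1
    return max(score, 0)  # assumption: scores never fall below 0
```

For example, a candidate entering part B with 5 points that is not GEMET-indexed and not produced according to authoritative guidelines is first capped at 3 and then reduced to 2.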

Path, part C

Criteria | Scoring
Information resource with high thematic and sectorial relevance | Yes = +1, ready; No = ready
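
Part C adds a single bonus point. A minimal sketch follows; the name `part_c_score` is hypothetical, and capping the result at 5 is an assumption of this sketch (the report gives 5 as the highest possible score but does not say how a bonus on top of 5 is handled).

```python
MAX_SCORE = 5  # highest score named in the report


def part_c_score(score: int, high_relevance: bool) -> int:
    """Add the part C bonus for high thematic and sectorial relevance."""
    if high_relevance:
        score += 1
    return min(score, MAX_SCORE)  # assumption: the bonus cannot exceed the top score
```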



5 Consensus and level of recognition

5.1 Recognition procedure

The recognition procedure when developing the selection criteria is considered to be very important if the EEA, data providers and the users of the CDS are to be able to find a common platform.

Since it has been important to involve all the main ETC/CDS users and data providers in the selection criteria project, a special reference group has been set up. Before the project started the project plan was discussed within the EEA, the ETC/CDS consortium and the reference group.

The project work has been as open as time limits have permitted. During the first phase of the work the unfinished first draft report was reviewed as a "pre-draft" by peers, the ETC/CDS itself and by EEA staff. Valuable comments and suggestions were collected and taken into account.

The first draft was distributed for review to NFPs, EEA staff, the ETC/CDS consortium, the ETC/CDS Advisory Committee, ETC leaders and other peers in mid-July 1997.

In September the first draft report was presented and discussed in Hannover at the ETC/CDS consortium meeting, at the ETC/CDS workshop and at the ETC/CDS Advisory Committee meeting.

In October 1997 the first draft report was presented and discussed at the NFP/EIONET meeting at the EEA in Copenhagen. The report was discussed in great detail at the second meeting of the Reference group in October 1997.

During the recognition procedure it was found that data achieving a "Point 4 and 5 score level" should be included in the database. "Point 3 score level" might be considered for inclusion in the CDS at a later stage. Data from this category should be developed in respects that are currently weak and eventually be brought up to a higher level. This is preferable to merely lowering the threshold value.

The Hannover and Copenhagen meetings resulted in e-mail correspondence and telephone discussions. At the meetings a common understanding was expressed on the main results and the methodology used. Official national reference databases and official national environmental monitoring programmes were proposed for inclusion in the selection system since they are of interest to most users. The stepwise approach suggested in the methodology should also be more clearly described in the final report.

5.2 Results of the recognition procedure

A very high degree of common understanding between the different players was found. It also seems that a consensus regarding national deliveries and dissemination of metainformation to populate the CDS has been reached as follows.

Score 5: Metainformation should be included in the CDS

Score 4: Metainformation should be included in the CDS

Score 3: Metainformation should be further developed and might be included at a later stage

Score 2: Metainformation should not be included in the CDS

Score 1: Metainformation should not be included in the CDS

Score 0: Metainformation is useless for the CDS
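
The agreed thresholds amount to a three-way decision on the final score. A minimal sketch, in which the function name `cds_decision` and the returned labels are hypothetical:

```python
def cds_decision(score: int) -> str:
    """Map a final score (0 to 5) to the agreed handling of the record."""
    if score >= 4:
        return "include"
    if score == 3:
        return "develop further, possible later inclusion"
    return "do not include"


for s in range(5, -1, -1):
    print(s, cds_decision(s))
```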


6 Proposal for the first selection

Annex 2 contains illustrative examples of how to use the selection criteria path.

The data resources described and listed in Annex 1 have been processed through the selection criteria path proposed in this report. The scores achieved after passage of the selection criteria path are given in the column headed "score". About 200 items will gain a score >= 4, and their metainformation will be entered in the metadatabase.

6.1 Number of metainformation records per year and annual cost per country

Approximately 200 real information items from the international reporting scene may be expected to qualify per year per country for the CDS. In addition about 100 reports, web pages, maps and magazine articles may be expected to qualify per year and country. This will entail a cost of about 4500 ECU per country and year.
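
The annual figure follows from the record counts above and the estimated 15 ECU per record. The short calculation below simply reproduces that arithmetic; the variable names are illustrative.

```python
COST_PER_RECORD_ECU = 15  # estimated cost of creating one metainformation record

reporting_items = 200  # items from the international reporting scene, per country and year
other_items = 100      # reports, web pages, maps and magazine articles, per country and year

records_per_year = reporting_items + other_items
annual_cost_ecu = records_per_year * COST_PER_RECORD_ECU

print(records_per_year, annual_cost_ecu)  # 300 4500
```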

6.2 Additional metainformation records of special interest

It has been mentioned earlier that a few types of metainformation are of particular interest for the CDS even if they do not obviously achieve a high score during the scoring procedure. These types of metainformation records are descriptions of:

  • National environmental resource libraries
  • National environmental reference or metadatabases
  • Environmental metadatabases and databases operated by UN, OECD, EU, HELCOM and similar bodies
  • Official national monitoring programmes
  • Official national "State of the Environment Reports"

It is proposed that metainformation records describing those resources should be established and maintained jointly by the ETC/CDS and the respective institution.

7 Delivery of metainformation

It is outside the scope of this report to propose how metainformation for the CDS should be delivered and from whom. Nevertheless, this question affects some of the criteria previously discussed: the quality assurance and the possibilities of obtaining metainformation.

7.1 National submissions

It has been assumed above that the main national deliveries to the CDS will occur via the National Focal Points.

It is clear that the original data producer has the very best knowledge of the real data and is therefore also the one best equipped to provide a set of metainformation. Metainformation produced by original data producers in different countries could be transferred to the CDS through the NFP, or the transfer could at least be supervised or coordinated by the NFP, so as to ensure that metainformation deliveries comply with the requested format and really do meet the selection criteria. Once the telematic EIONET has become operational, the data transfer to the ETC/CDS could use EIONET as the transfer medium.

7.2 EEA and ETC submissions

Metainformation created from material produced by the EEA itself could be transferred to the ETC/CDS directly over the telematic EIONET.

A special group of data processing institutions are the ETCs. They collect copies of data from the original data providers in different countries or institutions and store these real data in databases of their own. Most often they also create new aggregated and condensed data sets from collected data. It is then possible, and desirable, for the ETCs to generate metainformation descriptions of their databases and deliver them to the CDS. It is important, however, to avoid duplication of metainformation records referring to the same data resource. It would be advantageous, if a standard id-description for metainformation sets could be developed in the near future. It is proposed that this could be achieved in international co-operation between the EEA and other international bodies managing information storage and retrieval.
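
If such a standard identifier existed, duplicate deliveries for the same data resource could be filtered on it. The sketch below is purely illustrative: no standard id has been defined, so the `resource_id` key, the sample values and the function name `merge_deliveries` are all hypothetical.

```python
def merge_deliveries(deliveries):
    """Keep the first record seen for each resource_id, preserving delivery order."""
    seen = {}
    for record in deliveries:
        # setdefault only stores the record if the id has not been seen before
        seen.setdefault(record["resource_id"], record)
    return list(seen.values())


deliveries = [
    {"resource_id": "xx-0001", "source": "NFP"},
    {"resource_id": "xx-0001", "source": "ETC"},  # same resource, delivered twice
    {"resource_id": "xx-0002", "source": "ETC"},
]
unique = merge_deliveries(deliveries)
print(len(unique))
```

Keeping the first delivery is an arbitrary choice here; which party's record should take precedence is not settled in this report.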

Another solution would be to find metainformation providers for the real data and information resources other than the NFPs and ETCs. This might ease the burden on the NFPs and ETCs, since they would then not be responsible for metainformation deliveries. However, this method of metadata generation may adversely affect the quality of the metainformation to an extent that is hard to predict. Moreover, there is no such general metainformation provider available at present, although it should be possible to develop a dedicated metainformation generating unit connected directly to the ETC/CDS.

7.3 Quality checking

The ETC/CDS is responsible for the operation and maintenance of the metadatabase. Checking deliveries before updating the database with new or corrected records should be the responsibility of the topic centre. This task could be very time-consuming. In addition, ensuring that all scheduled deliveries are made promptly to the ETC/CDS from various parties might place a constant and heavy demand on the resources of the topic centre. Deciding who will be responsible for correct indexing and classification of records collected and submitted to the CDS is very much a question of management within the EEA itself and between the EEA and its member states. This report cannot propose how this should be dealt with; it can merely identify this area as deserving attention in the future.


8 Time Frame for Criteria Implementation

It is proposed that, following possible adoption of the selection criteria, the member states of the EEA and the ETCs start to deliver metainformation to the CDS. In practice this could start in 1998. However, it is not realistic to expect full participation of the countries until 1999. It should be possible to achieve the full proposed retrospective coverage (metainformation from 1994 onwards) for all countries by the end of 2000. A delay of approximately half a year must be expected between the creation of the original information resource and the appearance of its metainformation record in the metadatabase.

This timetable is proposed for the collection of MI (metainformation):

Year | Activity | Responsible
1997 | Start of MI collection | ETCs
1998 | Quality control of collected MI material to comply with GEMET etc. | ETCs
1998 | Start of MI collection for 1997 items | NFPs
1998 | Start of MI collection on international DBs | ETC/CDS
1998 | Start of MI collection on older items | NFPs
1999 | Full work | ETCs and NFPs
2000 | Full work | ETCs and NFPs
late 2000 | The metadatabase in steady-state operation |

9 Proposed further developments

Various needs and proposals for future work mentioned in the report are listed below, in no particular order.

  • There is a need to develop a delivery plan for metainformation to the CDS.
  • There is a need to harmonise metainformation records so that quality control and quality control procedures are also brought into play
  • There is a need for metainformation to be given a unique identification tag to avoid duplication of records and to knit metainformation as closely as possible to the creator/author of the real data
  • There is a need for the EEA and the ETC/CDS to determine who is to deliver metainformation describing real data sets stored in databases and web pages. Attention should be paid to delivery procedures
  • It should be investigated whether it is possible to disseminate the WebCDS software to interested countries for possible use at national level
  • The question of responsibility for indexing and classification of the records collected for the CDS should be addressed by the EEA and between EEA and its member states
  • Additional items must be introduced into GEMET (or connected to GEMET as accessory lists) such as geographical information (countries, cities, municipalities, rivers, lakes, drainage areas, coastal areas etc). There is also a pressing need for lists of the main determinands related to international reporting on pollution
  • A forum should be created so that people involved in the indexing work could meet and exchange views, in order to facilitate transparency and comparability. It is proposed that this index forum meet once a year within the framework of the ETC. The forum should, if necessary, initiate the development of indexing guidelines.
  • A specific field should be created in the metadatabase for maintenance purposes. It should contain the name of the person working on update and maintenance of the record.

1 Metadata and metainformation are terms used to describe different information resources. Most often the two are used interchangeably, but the ideal would be to use "metadata" to describe databases and "metainformation" to describe information resources as a whole.

2 CDS: Catalogue of Data Sources

3 EIONET European Environmental Information and Observation Network

4 GEMET, General European Multilingual Environmental Thesaurus

5 DPSIR, Driving Forces, Pressure, State, Impact and Response indicator approach

6 NFP, National Focal Point (to EEA)

7 If there is an additional higher pressure of demand from outside the EEA/EIONET the score is 5.

8 The cost of creating a normal metainformation record is estimated to be 15 ECU

9 NB - If the address of the data producer is missing it should be entered in the CDS as part of the EIONET directory.

10 Including access restrictions
