Creating metadata for datasets on FinBIF

When sharing data through the Finnish Biodiversity Information Facility (FinBIF), comprehensive metadata is essential. Metadata serves as a detailed guide to the dataset, outlining its content, context, and utility. This makes the data more transparent and also more accessible and usable for a diverse audience, from researchers to policymakers. This guideline describes the metadata used for collections in FinBIF, covering both mandatory and optional fields.

While only some of the metadata elements are compulsory, providing a full spectrum of metadata significantly enhances the data’s value, ensuring that users can effectively apply, interpret, and benefit from the biodiversity information provided.

For an example of comprehensive metadata, see the "4th Bird Atlas observations from Notebook" dataset (partly in Finnish).

Mandatory metadata

Name of the collection in Finnish and English (Swedish optional): A descriptive, distinctive name that distinguishes the collection from other collections. The name should be understandable to outsiders as well as to the team maintaining the collection. (For example, "Northern Fennoscandia" is not a suitable collection name, because it does not say what the collection contains. "Lichens of Northern Fennoscandia" or something similar would be more appropriate.)

Even when a collection is part of other collections higher in the hierarchy, the name of the organisation holding the collection should also be included in the names of lower-level collections to keep the names distinctive. For example, "Lichens of Fennoscandia of Taka-Hikiä natural history museum" is distinct from "Lichens of Fennoscandia" collections held by other organisations.

Collection type: The kind of collection, e.g. a museum specimen collection, a living garden collection, a monitoring scheme or something else.

Description: The Finnish name and description must be understandable to the general public. The English name and description can be written from a more scientific point of view, e.g. using established technical terms where general terms are not sufficient.

License for use: The license under which the data may be used. According to FinBIF's data policy, all data is open data. The license is usually Creative Commons Attribution 4.0 (CC BY 4.0). An exception can be made if the use of another license has been agreed with FinBIF. See FinBIF's data policy for further information.

Collection quality: A data quality assessment of the collection on a scale of three categories. For more information, see Collection quality rating.

Person responsible: The person who is responsible for the collection and has the authority to decide, for example, on the collection's accumulation, preservation and the publication of its information. Given in the form "Lastname, Firstname".

Contact email: The email address of the contact person. It can be a personal address or a general contact address through which the contact person can be reached.
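Before submitting metadata, it can help to check that every mandatory field listed above is filled in. The sketch below is a minimal illustration, not part of FinBIF's own tooling: the dictionary keys (name_fi, name_en, collection_type and so on) are hypothetical names chosen for this example, and the only format rule it checks is the "Lastname, Firstname" form required for the person responsible.

```python
import re

# Hypothetical keys for the mandatory fields; FinBIF's actual form or API
# field names may differ.
MANDATORY_FIELDS = [
    "name_fi",
    "name_en",
    "collection_type",
    "description",
    "license",
    "collection_quality",
    "person_responsible",
    "contact_email",
]

# "Lastname, Firstname": two non-empty parts separated by a comma and a space.
PERSON_PATTERN = re.compile(r"^[^,]+, [^,]+$")


def check_mandatory(metadata: dict) -> list[str]:
    """Return a list of problems in a metadata record (empty if none found)."""
    problems = []
    for field in MANDATORY_FIELDS:
        value = metadata.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing mandatory field: {field}")
    person = metadata.get("person_responsible", "")
    if person and not PERSON_PATTERN.match(person):
        problems.append('person_responsible should be in the form "Lastname, Firstname"')
    return problems


if __name__ == "__main__":
    draft = {
        "name_en": "Lichens of Fennoscandia of Taka-Hikiä natural history museum",
        "person_responsible": "Karen Curator",  # wrong order, no comma
    }
    for problem in check_mandatory(draft):
        print(problem)
```

A check like this only catches missing or obviously malformed values; whether a name is genuinely descriptive still needs human judgement.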

Optional metadata

Collection code: Use this only if the collection already has an official, established abbreviation. New codes do not need to be invented. If a collection code is given in the metadata, laji.fi automatically adds it in brackets to the end of the collection name.

Taxonomic coverage: Give the name of the lowest-ranking taxon that covers the data, e.g. a family or an order. If the data includes observations or specimens of many kinds of taxa, you can enter e.g. "Biota" as the coverage.

Temporal coverage: Give simple, machine-readable years in the form "beginning year - ending year" (e.g. "1860 - 1910"). If no exact information is available, give an educated guess; there is no need to explain here that the years are approximate. If the collection is still accumulating, leave the ending year open (e.g. "1970 - ").
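Because the temporal coverage is meant to be machine-readable, a simple parser can confirm that a value follows the "beginning year - ending year" form. This is a minimal sketch under stated assumptions: it expects four-digit years, tolerates extra whitespace around the hyphen, and returns None as the ending year for open-ended coverage such as "1970 - ". The function name is hypothetical and not part of any FinBIF API.

```python
import re
from typing import Optional, Tuple

# "1860 - 1910" (closed range) or "1970 - " (ongoing, open ending year).
# Assumes four-digit years; whitespace around the hyphen is optional.
TEMPORAL_PATTERN = re.compile(r"^\s*(\d{4})\s*-\s*(\d{4})?\s*$")


def parse_temporal_coverage(text: str) -> Tuple[int, Optional[int]]:
    """Parse 'beginning year - ending year'; the ending year is None if open."""
    match = TEMPORAL_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not in 'YYYY - YYYY' form: {text!r}")
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else None
    if end is not None and end < start:
        raise ValueError(f"ending year {end} is before beginning year {start}")
    return start, end


if __name__ == "__main__":
    print(parse_temporal_coverage("1860 - 1910"))
    print(parse_temporal_coverage("1970 - "))
```

Free-text values such as "around 1900" are rejected on purpose: the guideline asks for an educated guess expressed as plain years rather than an explanation.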

Geographic coverage: Give a name that most accurately describes the smallest area that covers the whole collection. Preferably use terms that are used in the collection or that are contemporary with it (e.g. for a collection whose temporal coverage is "1950 - 1970", the geographic coverage could be "the Soviet Union"). If the collection includes observations or specimens from around the world, the geographic coverage could be "World".

Coverage basis: Give a concise definition of the collection if it is not apparent from the collection name, for example "Winter birds of Finland".

Methods: The standardized methods used at any stage of creating the collection, e.g. sampling methods, census methods, instruments, tools and software.

Publisher name (en): The name of the organisation or institution publishing the data (used when the data is published secondarily outside the primary system). If this field is not filled in for the collection or any of its parent collections, FinBIF is shown as the publisher of the collection in data sharing (CETAF RDF). (This field was formerly called "Copyright owner".)
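The fallback rule for the publisher name can be pictured as a walk up the collection hierarchy. The sketch below assumes that the nearest ancestor with a publisher name wins, which the guideline does not state explicitly; the Collection class and the effective_publisher function are hypothetical illustrations, not FinBIF's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Collection:
    # Hypothetical minimal model of a collection in a hierarchy.
    name: str
    publisher_name_en: Optional[str] = None
    parent: Optional["Collection"] = None


def effective_publisher(collection: Collection, default: str = "FinBIF") -> str:
    """Return the first publisher name found on the collection or its ancestors.

    Assumption: the nearest ancestor's value takes precedence. If no collection
    in the chain has a publisher name, the default ("FinBIF") is returned.
    """
    node: Optional[Collection] = collection
    while node is not None:
        if node.publisher_name_en:
            return node.publisher_name_en
        node = node.parent
    return default


if __name__ == "__main__":
    museum = Collection("Taka-Hikiä natural history museum collections",
                        publisher_name_en="Taka-Hikiä natural history museum")
    lichens = Collection("Lichens of Fennoscandia of Taka-Hikiä natural history museum",
                         parent=museum)
    print(effective_publisher(lichens))
```

Setting the publisher name once on a parent collection is therefore enough for all its sub-collections under this assumption.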

Secure level: Determines how strictly the collection's data is concealed, unless it has already been concealed on other grounds, such as the list of sensitive species or the concealment of individual specimens or observations. The levels correspond to the security levels used for sensitive observations. By default, no security level is applied.

The choice must be justified in the "Basis for concealment or embargo" field. The basis for concealment must apply to the whole collection: "The data includes some observations of endangered species" is not sufficient grounds for concealing the entire collection.

Embargo in years: The length of the embargo in years. During the embargo (quarantine), the data is processed as if it were concealed to the 100 x 100 km level. Public authorities (viranomaiset) can see the full, unconcealed observations in the public authority portal. Others can make a data request (aineistopyyntö), which the data owner approves or rejects.

The choice must be justified in the "Basis for concealment or embargo" field.

Basis for concealment or embargo: The grounds for concealment and/or embargo (e.g. the name of a research project), i.e. the justification for an exception to the data policy. Must be filled in if Secure level or Embargo in years is used.
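The dependency between these three fields is a simple conditional rule: if either a secure level or an embargo is set, the justification must not be empty. The sketch below expresses that rule as a check. The parameter names are hypothetical, secure levels are treated as opaque strings because their actual values are not listed here, and an embargo of 0 years is assumed to mean that no embargo is used.

```python
from typing import Optional


def check_concealment(secure_level: Optional[str],
                      embargo_years: Optional[int],
                      basis: Optional[str]) -> list[str]:
    """Return problems with the concealment/embargo fields (empty if none)."""
    problems = []
    # Assumption: an empty secure level and an embargo of None or 0 mean "not used".
    uses_secure_level = bool(secure_level)
    uses_embargo = bool(embargo_years)
    if (uses_secure_level or uses_embargo) and not (basis and basis.strip()):
        problems.append('"Basis for concealment or embargo" must be filled in '
                        "when Secure level or Embargo in years is used")
    if embargo_years is not None and embargo_years < 0:
        problems.append("Embargo in years cannot be negative")
    return problems


if __name__ == "__main__":
    print(check_concealment(None, 2, ""))
    print(check_concealment(None, 2, "Research project data under analysis"))
```

The rule cannot judge whether a given justification concerns the whole collection; that part of the requirement still has to be assessed by a person.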

Special terms for data use: A free-text description of any special terms of data use, for example restrictions on commercial use.

Size (approx.): An estimate of the number of specimens, observations or records in the collection, given as a number. Explanations can be added to the notes field. Used for calculating Luomus digitisation statistics (digistat.luomus.fi).

% digitised (approx.): An estimate of the proportion of the collection that has been digitised/databased in an information system or a file. Explanations can be added to the notes field.

Location of data and backups: Ideally, give the name of the information system that contains the digital data (e.g. Kotka, Vihko). If the data is located on a PC hard drive, this description is only useful if the data can be found based on it. For example, "table on my floppy disc" is not useful, whereas "Excel file collection.xls in Karen Curator's folder on the network drive" is much more useful.

Location of collection: The physical location of the collection specimens. The building or room number is accurate enough.

Citation recommendation: The recommended citation to use when citing the data, e.g. in a scientific paper. This is shown to data users.

Language: The language the data is mainly written in.

URL: If more information about the collection is available online, add the address here (e.g. the winter bird monitoring website, where the methods are described and linked to results, or a page on the history of Mannerheim's collection).

Data quality description: A free-text description of the data quality, including reasons, explanations, known errors, etc.

Notes about the data: Diary-style notes about the data belonging to the collection, for example corrections made. Note that there is a separate field for notes about the metadata itself.