Data

Page history last edited by Tim Davies 11 years, 2 months ago

The important bit (summary)

 

When you store or share information only as finished information, it has a limited set of possible uses. When you store and share the raw data behind that information, and provide the meta-data people need to interpret and make sense of it, you allow:

 

  • The data to be used in many different ways - generating useful information for people in many different settings;
  • Citizens to perform their own analysis of the data, rather than relying on one single interpretation;

 

(In more detail): Understanding Data

[Figure: the DIKW pyramid (Data > Information > Knowledge > Wisdom)]

The Oxford English Dictionary defines data as “a thing given or granted; something known or assumed as fact” (OED Online 2000).

 

It can be useful to distinguish clearly between data and information. The DIKW (Data, Information, Knowledge, Wisdom) hierarchy provides one way of thinking about their relationship. Information is what human minds work with day-to-day. We are limited in how much 'data' we can hold in our heads (how many tables of numbers can you remember?), but we can process lots of information. Information can provide the answer to "Who?", "What?", "Where?" and "When?" types of questions (Ackoff, 1989), but the content of those answers comes from data.

 

Data becomes information when 'meaning' or 'context' is added to it. Often we need to 'reduce' the data, or to sort, sift and visualise it, in order to turn it into usable information.

 

A formal philosophical definition of information (The General Definition of Information) states that there can be "No information without data representation" (Floridi, 2004). The representation of data may take many different forms:

 

  • Writing a summary of the data;
  • Providing summary statistics; 
  • Plotting data points on a map;
  • Drawing graphs; 
  • Providing a search interface for looking at the data;
  • Providing a browse interface onto the data; 
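
As a minimal sketch of two of these representations (summary statistics and a written summary), the Python below reduces a small table of raw records to a handful of figures and then to one sentence. The school names, ward codes and place counts are invented for illustration, and the function names are hypothetical.

```python
from statistics import mean, median

# Hypothetical raw data: one record per school (all values invented).
schools = [
    {"school": "School A", "ward_code": "W01", "places": 210},
    {"school": "School B", "ward_code": "W01", "places": 420},
    {"school": "School C", "ward_code": "W02", "places": 180},
    {"school": "School D", "ward_code": "W02", "places": 630},
]


def summarise_places(records):
    """Reduce the raw records to a few summary statistics."""
    places = [r["places"] for r in records]
    return {
        "count": len(places),
        "total": sum(places),
        "mean": mean(places),
        "median": median(places),
    }


def describe(summary):
    """Turn the statistics into a one-sentence written summary."""
    return (
        f"{summary['count']} schools offer {summary['total']} places in total "
        f"(mean {summary['mean']:.0f}, median {summary['median']:.0f})."
    )


summary = summarise_places(schools)
print(describe(summary))
```

Both outputs are 'static' representations: a reader who wants a different breakdown (by ward, say) needs access to the raw records, not just the summary.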

 

Representation of data can be 'static', where only one possible representation is presented (as in a printed or PDF report or map), or 'interactive', where the user of the information source can customise what data is being represented (as in an interactive map). See Improving Visualisation for more on static and interactive visualisation.

 

Types of Data

There are different types of data. Floridi (2004) lists four key distinctions:

 

  • Raw data or primary data - the product of original measurements or data collection. 
  • Meta-data - a description of the data. For example, how it was collected, how often it is updated, and what different codes mean.
  • Derivative data - for example, summaries of columns in a spreadsheet, cross-tabulations of different variables in a primary dataset, a sub-set of a dataset, or a combination of multiple datasets. 
  • Operational data - describing how data is currently being used - for example, service-status data on the database servers, or details of how often the data has been accessed. 
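
One way to picture the four types side by side is the short Python sketch below. Everything in it is an assumption made for illustration: the records, the `phase_code` field, the meta-data keys and the download counter are all invented, not taken from any real dataset.

```python
from collections import Counter

# Raw (primary) data: one row per school, as originally collected.
# All values here are invented for illustration.
raw_data = [
    {"school": "School A", "phase_code": "P"},
    {"school": "School B", "phase_code": "S"},
    {"school": "School C", "phase_code": "P"},
]

# Meta-data: describes the raw data, including what the codes mean.
meta_data = {
    "collection_method": "annual survey",
    "update_frequency": "yearly",
    "codes": {"phase_code": {"P": "Primary", "S": "Secondary"}},
}

# Operational data: describes how the data is currently being used.
operational_data = {"downloads_this_month": 0}


def count_by_phase(rows, meta):
    """Derivative data: schools per phase, decoded using the meta-data."""
    labels = meta["codes"]["phase_code"]
    return dict(Counter(labels[row["phase_code"]] for row in rows))


def download(rows, usage):
    """Hand out a copy of the raw data and record the access."""
    usage["downloads_this_month"] += 1
    return list(rows)


derivative_data = count_by_phase(raw_data, meta_data)
copy_of_raw = download(raw_data, operational_data)
print(derivative_data)
print(operational_data)
```

Note that the derivative count can only be labelled sensibly because the meta-data says what "P" and "S" stand for.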

 

The first three are all important for open data initiatives. Releasing raw data without any meta-data makes it harder for people to re-use it. Releasing raw data without letting people create derivative data (i.e. licensing it in ways that prevent re-use) can limit the potential for users to be creative in the ways they turn that data into information.

 

Raw data

Open data initiatives are generally concerned with promoting access to 'raw data', although local and national government frequently publish only 'derivative data' (e.g. spreadsheets of summary statistics).

 

There is sometimes a balance to be struck between providing completely raw data (which may contain private information, or allow private information to be discovered; see Ohm, 2009) and providing data that has been aggregated to remove records which could identify individual people.
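
One common way of striking this balance is to publish counts per group and to withhold the count for any group that is too small. The Python below is a sketch of that idea only: the threshold of 5, the field names and the records are assumptions chosen for illustration, and suppressing small counts does not by itself guarantee that individuals cannot be re-identified.

```python
from collections import Counter

# Assumed threshold: groups smaller than this are not published.
MIN_GROUP_SIZE = 5

# Hypothetical person-level records (invented): six people in W01, two in W02.
records = (
    [{"person_id": i, "ward_code": "W01"} for i in range(6)]
    + [{"person_id": i, "ward_code": "W02"} for i in range(6, 8)]
)


def aggregate_with_suppression(rows, key, min_size=MIN_GROUP_SIZE):
    """Count rows per group, replacing small counts with None (suppressed)."""
    counts = Counter(row[key] for row in rows)
    return {
        group: (n if n >= min_size else None)
        for group, n in counts.items()
    }


published = aggregate_with_suppression(records, "ward_code")
print(published)
```

The published table no longer contains any person-level rows, which is exactly what makes it less re-usable than the raw data it was derived from.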

 

One way of checking whether the data you have is good raw data is to look at it in a table (or spreadsheet) and to look at just the first two rows. The first row will usually give column headers, and the second row should:

 

  • Be about one (and only one) entity - for example, a school, a hospital, a councillor or a local ward
  • Contain different facts about that entity in each column - for example, a column for the postcode, for the number of places / people, and so on
  • Refer to other entities by common names or codes - for example, when referring to a ward, using the ONS Ward Code. This could be used alongside a human-readable name of the ward, but using the common code makes it easier to link together different raw datasets. 
  • Contain all the facts about that entity that the dataset contains - for example, the date the facts were last updated, the person who updated them and so on. Often these 'facts' get left in the file-name (e.g. SchoolsData-LastUpdatedMay2010-BH.xls) - but this important data can get lost when the file gets renamed. 
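
Some of the checks above can be roughly automated. The sketch below uses Python's standard `csv` module to test that a table has one row per entity, a complete set of cells in every row, and columns for the linking code and for the 'last updated' facts. The column names, the sample rows and the `check_raw_table` function are all hypothetical; a real dataset would need its own list of required columns.

```python
import csv
import io

# Hypothetical sample: the update date and initials live in columns,
# not in the file-name. All values are invented.
SAMPLE_CSV = """school_id,school_name,postcode,places,ward_code,last_updated,updated_by
S001,School A,AB1 2CD,210,W01,2010-05-01,BH
S002,School B,AB1 3EF,420,W02,2010-05-01,BH
"""


def check_raw_table(csv_text, id_column, required_columns):
    """Return a list of problems found; an empty list means no problems."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [
        f"missing column: {col}"
        for col in [id_column, *required_columns]
        if col not in header
    ]
    if problems:
        return problems

    seen = set()
    for line_no, row in enumerate(reader, start=2):
        # DictReader stores surplus cells under the key None and
        # fills missing cells with the value None.
        if None in row or None in row.values():
            problems.append(f"line {line_no}: wrong number of cells")
        entity = row[id_column]
        if entity in seen:
            problems.append(f"line {line_no}: duplicate entity {entity}")
        seen.add(entity)
    return problems


required = ["ward_code", "last_updated", "updated_by"]
print(check_raw_table(SAMPLE_CSV, "school_id", required))
```

A check like this cannot tell whether a row really describes one entity; it only catches the structural symptoms, such as the same identifier appearing twice.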

 

By providing well-structured raw data, and licensing the data in re-use-friendly ways, you maximise the range of ways your data can be turned into useful information for citizens.
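
To illustrate why common codes matter for re-use, the sketch below links two separately published tables through a shared ward code. The codes ("W01", "W02") are placeholders rather than real ONS Ward Codes, and the ward names, school names and place counts are invented.

```python
# Two hypothetical raw datasets from different sources (all values invented).
wards = [
    {"ward_code": "W01", "ward_name": "North Ward"},
    {"ward_code": "W02", "ward_name": "South Ward"},
]
schools = [
    {"school": "School A", "ward_code": "W01", "places": 210},
    {"school": "School B", "ward_code": "W01", "places": 420},
    {"school": "School C", "ward_code": "W02", "places": 180},
]


def places_per_ward(school_rows, ward_rows):
    """Link the two datasets on ward_code and total the places per ward."""
    names = {w["ward_code"]: w["ward_name"] for w in ward_rows}
    totals = {}
    for row in school_rows:
        code = row["ward_code"]
        totals[code] = totals.get(code, 0) + row["places"]
    return [
        {"ward_code": code, "ward_name": names.get(code, "unknown"), "places": total}
        for code, total in sorted(totals.items())
    ]


for line in places_per_ward(schools, wards):
    print(line)
```

Had the schools table referred to wards only by a free-text name, differences in spelling between the two sources could have made this link unreliable.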

 

Extra issues

The section below deals with a few additional issues relating to data.

 

Five stars to open, structured, linked data

This post outlines a progressive approach to publishing data online.

 

Is data neutral?

No. Datasets generally include all sorts of assumptions in the way they are created (that's why good meta-data is important) - and any dataset will involve reducing facts about the world into easy-to-record dataset entries. The way data is reduced down will affect the ways it can be re-used and what sort of information can be generated from it in future.

 

Creating information and creating data both involve value judgements as to what is important. As you collect and share raw datasets, think about the sorts of interpretations and information generation they can support. It is important to balance the practical requirements of creating the dataset against the value of adding extra fields and facts which may facilitate different re-uses.

 


References:

 

  • Ackoff, R.L., 1989. From data to wisdom. Journal of Applied Systems Analysis, 16(1), 3–9.  
  • Floridi, L., 2004. Information. In L. Floridi, ed. The Blackwell Guide to the Philosophy of Computing and Information. Wiley-Blackwell.  
  • OED Online, 2000. datum. In Oxford English Dictionary. Oxford University Press. Available at: http://dictionary.oed.com/cgi/entry/50057802/ [Accessed May 22, 2010].   
  • Ohm, P., 2009. Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization. SSRN eLibrary. Available at: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1450006 [Accessed March 4, 2010].   
  • Rowley, J., 2006. Where is the wisdom that we have lost in knowledge? Journal of Documentation, 62(2), 251–270.  
  • Rowley, J.E., 2007. The wisdom hierarchy: representations of the DIKW hierarchy. Journal of Information Science, 0165551506070706v1.  

 

Page History

Page originally drafted by Tim Davies based on notes for the Open Data Impacts study. As of 6th June 2010 still very much a work in progress. Comments on usefulness of page / relevance / other elements to address very welcome. 
