Introduction

About 25 years ago, Tim Berners-Lee designed the foundations of what we know today as the World Wide Web. The WWW is a system of hypertext documents, accessible over the Internet, that are linked together via hyperlinks. This allows a user to navigate to relevant information by following links that point to other documents.

If you are reading this, you are also a user of the World Wide Web. You probably use it mostly for entertainment, for keeping in touch with your friends and for gathering information. But this information has to be presented in a concise way so that you can understand it, filter it and find other relevant information. This is where open data becomes important.

Accessing and reusing available information

The information on the web can be presented in many forms. Websites have different designs, structures and styles. Sometimes the information isn't well organized, or you would like to restructure it or sort it by certain characteristics. When you're reading a webpage, there isn't much you can do about it. Things get worse when you're working with a PDF file or an office document. Of course, it would be great if all this information could be displayed in a structured manner in a custom application where you could manipulate it as you want. But the information first has to be processed in order to be displayed in that application. There are several ways to do this:

Manual input of data. The least reliable way: it takes a lot of time and requires human effort, which is prone to mistakes.
Web scraping. This means parsing a web page, extracting the required data and storing it in a suitable format for later processing. While this is faster than manual input, even a small change in the structure of the source webpage can break the parser and stop the data gathering.
Using an Application Programming Interface (API). This is the best way of collecting data, as it can be done programmatically and quickly, without worrying about webpage layout changes or copying mistakes.
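To make the difference between the last two approaches concrete, here is a minimal Python sketch. The page markup, the `price` class and the JSON payload are hypothetical stand-ins for a real site and its API: the scraper relies on one specific tag and class, so a cosmetic redesign breaks it, while the API client reads the data by field name.

```python
import json
from html.parser import HTMLParser


class PriceScraper(HTMLParser):
    """Collects the text of every <span class="price"> element (hypothetical markup)."""

    def __init__(self) -> None:
        super().__init__()
        self.prices: list[str] = []
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        # The scraper depends on the exact tag name and class attribute.
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(data.strip())


def scrape_prices(html: str) -> list[str]:
    parser = PriceScraper()
    parser.feed(html)
    return parser.prices


def prices_from_api(payload: str) -> list[str]:
    # A JSON API exposes the data by field name, independent of page layout.
    return [item["price"] for item in json.loads(payload)["items"]]


old_page = '<ul><li><span class="price">10.50</span></li><li><span class="price">7.25</span></li></ul>'
# After a redesign the class is renamed; the scraper now finds nothing.
new_page = '<ul><li><span class="cost">10.50</span></li><li><span class="cost">7.25</span></li></ul>'
api_payload = '{"items": [{"price": "10.50"}, {"price": "7.25"}]}'

print(scrape_prices(old_page))       # ['10.50', '7.25']
print(scrape_prices(new_page))       # []
print(prices_from_api(api_payload))  # ['10.50', '7.25']
```

Note that the scraper does not fail loudly after the redesign; it simply returns an empty list, which is why layout changes are easy to miss in scraping pipelines.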

Another important aspect of reusing information from the web is its licensing terms. There are many licenses out there, some more restrictive than others. Among the best-known open licenses are the Creative Commons licenses, which allow creators to indicate which rights they want to reserve and which rights they waive for the benefit of recipients or other creators. These licenses apply not only to data, but also to music, pictures and other artistic content.

Data availability

When it comes to the WWW, there are two primary ways in which information is stored:

  • on the web: data is accessible through the World Wide Web, but in an “opaque” format, that is, without references to other related resources (e.g. PDF documents, DOC documents, ODT documents, etc.)
  • in the web: data is stored in an open and structured format, with links and references to other related data or resources, and can be parsed and processed in a platform-agnostic way.
Five Star Open Data
Figure: Five Star Steps

It's great when you can access information on the web without restrictions such as paid subscriptions. But sometimes this is not enough: you would like to be able to alter that information, organize it or find other related information. Following the five-star scheme proposed by Tim Berners-Lee, we can sort open data into five categories, based on its format and on how easily we can manipulate it:

One Star Data: openly licensed data available in a format which can be viewed, printed, stored or shared, but which cannot be easily processed because it comes in an unstructured format (e.g. raster images or scanned documents).
Two Star Data: data is available in a structured way, but in a format which requires proprietary software to be viewed, edited or parsed (e.g. an Excel spreadsheet instead of a scanned table). Because it is structured, it can also be exported to other similar formats.
Three Star Data: mostly the same as Two Star Data, but it can be viewed, edited or parsed without proprietary software (e.g. CSV instead of Excel). There are still no links to related data, so other data cannot reference or query it.
Four Star Data: now we're talking about “data in the web”. It uses URIs (Uniform Resource Identifiers) to identify things, so other resources can reference them and the data can be shared on the Web. Parts of the data can also be reused. The data is usually expressed in RDF (Resource Description Framework), a World Wide Web Consortium (W3C) standard.
Five Star Data: the data is interlinked with other data. You can discover more related data while you are processing it. Both the consumer and the publisher benefit from the network effect.
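The step from three to five stars can be sketched in a few lines of Python, assuming a small hypothetical CSV file and a made-up `http://example.org/city/` namespace. Each row gets its own URI (four stars) and an `owl:sameAs` link to the corresponding DBpedia resource (five stars), written out as N-Triples, one of the plain-text RDF serializations. In practice an RDF library would normally be used instead of building strings by hand.

```python
import csv
import io

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
BASE = "http://example.org/city/"  # hypothetical namespace for our own data

# Three-star data: an open, structured format (CSV), but no links to other data.
CSV_DATA = """id,name,dbpedia
suceava,Suceava,http://dbpedia.org/resource/Suceava
bucharest,Bucharest,http://dbpedia.org/resource/Bucharest
"""


def escape_literal(text: str) -> str:
    # N-Triples string literals must escape backslashes, quotes and newlines.
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def csv_to_ntriples(csv_text: str) -> list[str]:
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row['id']}>"
        # Four stars: each thing gets its own URI that others can point at.
        triples.append(f'{subject} <{RDFS_LABEL}> "{escape_literal(row["name"])}" .')
        # Five stars: link our URI to the same entity in another dataset.
        triples.append(f"{subject} <{OWL_SAME_AS}> <{row['dbpedia']}> .")
    return triples


for triple in csv_to_ntriples(CSV_DATA):
    print(triple)
```

The `owl:sameAs` lines are what turns isolated data into linked data: a consumer can follow them to DBpedia and discover more facts about the same cities.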

Sources of open data

Most sources of open data are national governments, which publish information about institutions, land borders, public procurement, activity reports, etc. It is important for governments to open their data to the public in order to increase their transparency and accountability. Open data also helps developers create applications that address public and private needs.

DBpedia is another source of open data: it extracts structured content from the information created as part of the Wikipedia project. It allows users to query relationships and properties using SPARQL, an SQL-like query language. Such queries are not easily possible by simply scraping the HTML content of Wikipedia's webpages.
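As an illustration, the sketch below prepares a SPARQL query for DBpedia's public endpoint (`https://dbpedia.org/sparql`) using only the Python standard library, and reads the W3C SPARQL JSON results format. The query, which asks for a few resources typed as `dbo:City` whose `dbo:country` is `dbr:Romania`, is an assumption about how DBpedia models this data; the actual results depend on the current dataset and on the endpoint being available. The helper names are illustrative.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"  # DBpedia's public SPARQL endpoint

# Assumed modelling: cities typed dbo:City, linked to their country via dbo:country.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?city WHERE {
  ?city a dbo:City ;
        dbo:country dbr:Romania .
} LIMIT 5
"""


def build_request(query: str) -> urllib.request.Request:
    # SPARQL 1.1 Protocol: pass the query in the "query" parameter of a GET
    # request and ask for JSON results through the Accept header.
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def extract_column(results_json: str, var: str) -> list[str]:
    # SPARQL JSON results: {"head": {"vars": [...]}, "results": {"bindings": [...]}}
    data = json.loads(results_json)
    return [b[var]["value"] for b in data["results"]["bindings"] if var in b]


def fetch(query: str, var: str) -> list[str]:
    # Performs a real network request; it is not called automatically.
    with urllib.request.urlopen(build_request(query), timeout=30) as response:
        body = response.read().decode("utf-8")
    return extract_column(body, var)
```

Calling `fetch(QUERY, "city")` sends the request and returns the URIs bound to `?city`. Keeping request building and result parsing in separate functions makes both testable without network access.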

Figure: Linking Open Data Diagram
Conclusions

If you know the importance of open source software, it's easy to understand the importance of open data as well. For the everyday user, open data means easier ways to get information, to find related information and to reuse and improve that information. Keeping your own data in an open format also makes it easier to locate, process and improve later.

© 2024 ASSIST Software. All rights reserved. Designed with love.