The technology behind DataCleaner: Open Refine
Our DataCleaner tool is based on OpenRefine software, to which we have added a number of extensions to work with the Platform.
OpenRefine is a Java tool released under an open source license (BSD-3). It uses a Microsoft Excel-style web interface that lets you load data from different sources and in different formats, understand it, clean it, reconcile it and, of course, improve it.
First, bear in mind that OpenRefine is designed to run the transformations on your own computer; the only difference is that, instead of a client application, you work from your browser (although, as always, there are ways to take this concept to the Cloud).
Interested in learning more about OpenRefine? The code is in its GitHub repository, and its Wiki has more information. In the meantime, let's look at where it comes from.
OpenRefine was initially developed by Metaweb under the name Freebase Gridworks, and was later acquired and evolved by Google under the name Google Refine. In 2012, Google released the code, which became an open source project called OpenRefine.
When Google handed the software over to the community, the project got off to a slow start. To give you an idea, this is how it has developed over time:
|2013||– Google Refine 2.5||Latest version with Google branding.|
|2015||– Open Refine 2.6-rc1||Took two years to generate a Release Candidate, from which no final version came out.|
|2017||– Open Refine 2.7 Release; – Open Refine 2.8 Release||At least we got one release – well, we actually got two.|
|2018||– Open Refine 3.0 Release; – Open Refine 3.1 Release||It has been five years since there was a major release of Open Refine.|
|2019||– Open Refine 3.2 Release|
|2020||– Open Refine 3.3 Release|
The current version is 3.4.1, which was released at the end of September 2020. As you can see in the table, the project has been reactivated since 2018-2019.
All very interesting, but how do I install OpenRefine? Well, that's very simple, really.
As we have said, OpenRefine is designed to be used on your local computer, so to use it you just have to download the distribution for your operating system.
On the releases page, you can find the installers for each type of environment.
Once you have downloaded the software and launched the executable, a browser window will open on localhost, pointing to port 3333 (http://127.0.0.1:3333).
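If the browser does not open by itself, a quick check that something is answering at that address can help. The following is a minimal sketch in Python using only the standard library; the function name `is_reachable` and the two-second timeout are our own choices for illustration, not part of OpenRefine, and the check only confirms that some HTTP server is responding at the URL, not that it is OpenRefine.

```python
import http.client
import urllib.error
import urllib.request

# Address mentioned in the text; adjust it if your instance uses another port.
OPENREFINE_URL = "http://127.0.0.1:3333"


def is_reachable(url: str = OPENREFINE_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url` (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Any successful or redirected response counts as "up".
            return 200 <= response.status < 400
    except urllib.error.HTTPError:
        # The server answered, but with an error status: it is still listening.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a malformed reply: nothing usable there.
        return False
```

Calling `is_reachable()` with no arguments checks the default address; pass another URL if you started the tool on a different port.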
Easy, wasn't it? From here you can start working with your files and cleaning them up.