On the way to conciliator 3.0

When I started writing conciliator, I focused solely on implementing the main URL endpoint for reconciliation, since that’s what I needed at the time. There are actually several APIs that work together in OpenRefine: the Preview, Suggest, and recently added Data Extension APIs provide functionality that complements Reconciliation. And there may be more in the future as OpenRefine continues to evolve.
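
To make the Reconciliation part concrete, here is a rough sketch of how a client can batch several names into one request. It assumes the usual convention of a `queries` parameter holding a JSON object keyed `q0`, `q1`, and so on. The class name, the helper methods, and the example.org URL are made up for illustration and are not part of conciliator.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative only: none of these names come from conciliator or OpenRefine.
final class ReconcileQuerySketch {

    // Escapes backslashes and double quotes so the text can sit inside a JSON
    // string. Control characters are not handled; a real client should use a
    // JSON library instead.
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a batch payload such as {"q0":{"query":"..."},"q1":{"query":"..."}}.
    static String buildQueriesJson(String... names) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < names.length; i++) {
            if (i > 0) {
                sb.append(",");
            }
            sb.append("\"q").append(i).append("\":{\"query\":\"")
              .append(jsonEscape(names[i])).append("\"}");
        }
        return sb.append("}").toString();
    }

    // Appends the URL-encoded payload to the service URL as the "queries" parameter.
    static String buildRequestUrl(String serviceUrl, String... names) {
        try {
            // The two-argument String form of encode() also works on Java 8.
            return serviceUrl + "?queries="
                    + URLEncoder.encode(buildQueriesJson(names), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder URL; substitute the address of a real reconciliation service.
        System.out.println(
                buildRequestUrl("http://example.org/reconcile", "Twain, Mark", "Austen, Jane"));
    }
}
```

The service answers with a JSON object using the same `q0`, `q1` keys, each holding a list of candidate matches. Preview, Suggest, and Data Extension are separate requests layered on top of this one.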

My code didn’t extend very easily, so I’ve rewritten a ton of stuff. The new code is currently in the master branch and is running on http://refine.codefork.com. I will probably release 3.0 sometime in January.

Most users don’t need to do anything. Out of the box, it should work as the previous version did. If you modified the conciliator.properties file, you should look at the changes in that file.

Here are the changes under the hood so far:

– Spring controllers and components are now used more effectively for better separation of concerns, less manual plumbing, better extensibility and maintainability.
– The classes representing data in/out for the various APIs are more fully fleshed out.
– The conciliator.properties file allows for less configurability than before, but I don’t know how useful that ever was, really.
– Tests have been rewritten for more “real world” coverage.
– Custom cache code has been replaced with Ehcache.
– Requires Java 8.

With this scaffolding in place, I can start actually implementing Data Extension for specific data sources. I plan to start with OpenLibrary as it provides the richest data.

Stay tuned.

Thanks to everyone who contributed bug reports and suggestions! I’m amazed and gratified that this software gets used as much as it does.

Announcing conciliator

I’ve just created a GitHub repository for conciliator, a growing collection of OpenRefine reconciliation services, as well as a Java framework for creating them.

conciliator is a major refactoring of my refine_viaf project and supersedes it. This new project cleanly separates the VIAF-specific parts from the more “boilerplate” pieces needed for any OpenRefine reconciliation service. The result is a framework that allows you to easily write new reconciliation services. My intent here is to make some existing code way more flexible, so that it might be useful to more users and have a longer lifespan.

http://refine.codefork.com has already been running conciliator for a week now; if you’ve been using it, you don’t need to make any changes in OpenRefine.

Currently, conciliator out-of-the-box can query VIAF exactly like refine_viaf does, down to the same URLs. Additionally, conciliator can now query ORCID names. This was a somewhat arbitrary choice; I’ve been doing some ORCID integration at work so it was convenient for me to implement a data source for it as a proof of concept.

With VIAF and ORCID, conciliator acts as an intermediate or “bridge” service, but it would be possible to use conciliator to query other types of data sources as well: files, SQL databases, etc. Right now, you’d have to write your own code to read and parse files, open database connections, etc. But in the future, I hope to add support for these options to make them easier to implement.
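
To give a feel for what a non-bridge data source might involve, here is a toy sketch that searches an in-memory list of names, the kind of thing you might load from a file. Everything in it (`NameSource`, `Candidate`, `InMemorySource`, and the scoring values) is hypothetical and invented for this example; it is not conciliator’s actual API, which the README describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: these types are NOT part of conciliator.
final class DataSourceSketch {

    // One candidate match, roughly what a reconciliation result carries.
    static final class Candidate {
        final String id;
        final String name;
        final double score;

        Candidate(String id, String name, double score) {
            this.id = id;
            this.name = name;
            this.score = score;
        }
    }

    // Assumed contract: turn a query string into ranked candidates.
    interface NameSource {
        List<Candidate> search(String query);
    }

    // Searches {id, name} rows held in memory, e.g. after parsing a file.
    static final class InMemorySource implements NameSource {
        private final List<String[]> rows;

        InMemorySource(List<String[]> rows) {
            this.rows = rows;
        }

        @Override
        public List<Candidate> search(String query) {
            String needle = query.toLowerCase(Locale.ROOT);
            List<Candidate> out = new ArrayList<>();
            for (String[] row : rows) {
                String haystack = row[1].toLowerCase(Locale.ROOT);
                if (haystack.equals(needle)) {
                    out.add(new Candidate(row[0], row[1], 1.0)); // exact match
                } else if (haystack.contains(needle)) {
                    out.add(new Candidate(row[0], row[1], 0.5)); // partial match
                }
            }
            // Highest score first; the scores are arbitrary illustration values.
            out.sort((a, b) -> Double.compare(b.score, a.score));
            return out;
        }
    }

    public static void main(String[] args) {
        NameSource source = new InMemorySource(Arrays.asList(
                new String[] {"1", "Mark Twain"},
                new String[] {"2", "Twain"}));
        for (Candidate c : source.search("twain")) {
            System.out.println(c.id + " " + c.name + " " + c.score);
        }
    }
}
```

A SQL-backed source would have the same shape, with the loop replaced by a database query.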

For details on how to write your own service in Java using conciliator, see the README.

Are there data sources you’d like to see available as a reconciliation service? Leave a comment on this post. No promises, but I’ll at least consider all requests. And if you write your own service for a data source, please consider submitting your code as a pull request so that others can use it too!