Poc-opendata-indexing

A PoC showing how primary data search could revolutionize searching in open data portals.

Description

Opendata indexing - PoC

This proof of concept indexes the primary data available in the datastore for all datasets
that can be retrieved via the opendata.swiss API.

In theory, it can also be used to index the datastore of other CKAN portals.

Installation

This PoC uses Drifter for provisioning a VM.

Assuming that Vagrant is installed on your machine:

bash
git submodule update --init
vagrant plugin install vagrant-hostmanager
vagrant up

The API endpoint will be available at opendata-indexing.test.

Running Solr

A Solr instance should be configured and running as soon as the VM is provisioned.
It is reachable at opendata-indexing.test:8983
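
To check from the host that the instance answers queries, a request against Solr's standard `/select` handler can be built as in the following sketch. The `http` scheme, the `/solr` path prefix, and the core name `opendata` are assumptions that this README does not confirm; substitute the core that is actually configured in the VM.

```python
from urllib.parse import urlencode

# Assumption: plain HTTP and the default /solr path prefix.
SOLR_BASE = "http://opendata-indexing.test:8983/solr"


def solr_select_url(core, query, rows=3):
    """Build a URL for Solr's standard /select request handler."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{SOLR_BASE}/{core}/select?{params}"


# "opendata" is a placeholder core name, not taken from this project.
print(solr_select_url("opendata", "Küsnacht"))
```

Opening the printed URL in a browser or with `curl` inside the host should return a JSON document if the core exists.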

Indexing Data

Use the script scripts/index-demo.sh to index a selection of datastores.

Alternatively, use the script scripts/index-all.sh, which queries the CKAN package_list API endpoint
to get all datasets available on a portal and then tries to index all of their primary data contained
in the datastore.
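
The discovery step described above can be sketched in Python as follows. This is not the code of scripts/index-all.sh, only an illustration of the idea using the CKAN Action API (`package_list`, `package_show`). It assumes that resources held in the datastore carry the `datastore_active` flag, as set by CKAN's datastore extension; the portal URL in the usage comment is a placeholder, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def action_url(portal, action, **params):
    """Build a CKAN Action API (v3) URL for the given action."""
    query = "?" + urlencode(params) if params else ""
    return f"{portal.rstrip('/')}/api/3/action/{action}{query}"


def call_action(portal, action, fetch=None, **params):
    """Call a CKAN action and return its "result" payload.

    `fetch` is injectable so the function can be exercised without a network.
    """
    fetch = fetch or (lambda url: urlopen(url).read())
    payload = json.loads(fetch(action_url(portal, action, **params)))
    if not payload.get("success"):
        raise RuntimeError(f"CKAN action {action} failed")
    return payload["result"]


def datastore_resources(portal, fetch=None):
    """Yield (package_name, resource_id) for resources stored in the datastore."""
    for name in call_action(portal, "package_list", fetch=fetch):
        package = call_action(portal, "package_show", fetch=fetch, id=name)
        for resource in package.get("resources", []):
            # Assumption: the datastore extension marks its resources this way.
            if resource.get("datastore_active"):
                yield name, resource["id"]


# Usage (placeholder portal URL):
# for name, resource_id in datastore_resources("https://ckan.example.org"):
#     print(name, resource_id)
```

Each resource ID found this way could then be read through the `datastore_search` action and passed to the indexer.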

API Endpoint

Examples:
- Search for Küsnacht, matching a record of a resource/package.
- Search for Geburtsort, matching the package title in any indexed language: DE, FR, IT, EN.

Example Result

This example shows the first three results when requesting data for Küsnacht:

json
{
  "items": [
    {
      "packageName": "1-personenhaushalte-anz",
      "packageId": "57366922-1e5b-416c-aeb2-2b4d885262ae",
      "packageAuthor": "Statistisches Amt des Kantons Zürich",
      "resourceId": "2d924e1c-df2b-4dd0-92a2-0e6854e37544",
      "resourceFormat": null,
      "packageTitleDe": "1-Personenhaushalte [Anz.]",
      "score": 6.0147357
    },
    {
      "packageName": "2-personenhaushalte-anz",
      "packageId": "98bfdac8-a34f-40fe-ba40-b5d0a53fa84d",
      "packageAuthor": "Statistisches Amt des Kantons Zürich",
      "resourceId": "526e34fb-c50e-4cd5-93df-0473df046b9a",
      "resourceFormat": null,
      "packageTitleDe": "2-Personenhaushalte [Anz.]",
      "score": 6.0147357
    },
    {
      "packageName": "3-personenhaushalte-anz",
      "packageId": "294ccfcd-4cc0-4f0a-87df-424b10e5bfaf",
      "packageAuthor": "Statistisches Amt des Kantons Zürich",
      "resourceId": "9553d86c-9cdd-4779-9e17-5ce44b4a6dcf",
      "resourceFormat": null,
      "packageTitleDe": "3-Personenhaushalte [Anz.]",
      "score": 6.0147357
    }
  ]
}
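
A client consuming this response might process it as in the following sketch. It relies only on the fields visible in the example above (`items`, `score`, `packageName`, `resourceId`); the function name `summarize` is hypothetical, and the shape of responses for other queries is assumed to match.

```python
import json


def summarize(response_text, limit=3):
    """Return (score, packageName, resourceId) tuples, best score first."""
    items = json.loads(response_text).get("items", [])
    # sorted() is stable, so items with equal scores keep their original order.
    ranked = sorted(items, key=lambda item: item["score"], reverse=True)
    return [
        (item["score"], item["packageName"], item["resourceId"])
        for item in ranked[:limit]
    ]


# Usage, with `body` holding the JSON text returned by the API endpoint:
# for score, package, resource in summarize(body):
#     print(f"{score:.2f}  {package}  {resource}")
```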

Development

This project uses phive for library management (beta); it is installed in the VM by Drifter.

Be sure to run phive install from the project root directory, inside the VM, to download all the libraries.

The code is analyzed with both php-cs-fixer and phpstan. Run the script ./scripts/php-prereq.sh to check the
current code.

To check the PHP coding style manually, run: ./tools/php-cs-fixer fix --dry-run.

Learnings and outcome of this PoC

Though we, as developers of the portal, see a lot of value in also being able to search the
primary data, the client's main goal remains a search that queries only the metadata.