POC

Swissinfo.ch search and topic clustering

Crawl the swissinfo.ch website and make the results available as a REST API.


Description

What is it?

This proof of concept has been developed to crawl and index the English-language articles available on the swissinfo.ch website.

The PoC exposes a set of APIs to search for articles and to analyze which topics are most common across all indexed pages (clustering).
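A minimal client-side sketch, assuming the APIs sit in front of a Solr core: it builds a standard Solr `/select` search URL and ranks topic clusters by size. The core name `swissinfo`, the field list, and the use of Solr's clustering component (whose response carries a `clusters` list with `labels` and `docs` per entry) are assumptions, not confirmed details of the PoC.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, core: str, query: str, rows: int = 10) -> str:
    """Build a Solr /select URL for a full-text article search.

    `core` and the field list are hypothetical; adjust them to the real schema.
    """
    params = {
        "q": query,
        "fl": "id,title,url",  # hypothetical stored fields
        "rows": rows,
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"


def top_topics(response: dict, n: int = 5) -> list[tuple[str, int]]:
    """Rank clusters by how many documents they contain.

    Assumes the response shape of Solr's clustering component:
    {"clusters": [{"labels": [...], "docs": [...]}, ...]}.
    """
    clusters = response.get("clusters", [])
    ranked = sorted(
        ((" / ".join(c.get("labels", [])), len(c.get("docs", []))) for c in clusters),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:n]
```

Ranking by document count is only one simple way to surface the "most common topics"; the clustering engine's own cluster scores could be used instead.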

Additional API endpoints demonstrate strategies for search auto-completion and spelling suggestions, both common features of search interfaces.
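The two strategies can be illustrated without Solr: prefix completion over a sorted vocabulary, and similarity matching for misspelled words. This sketch uses Python's `bisect` and `difflib.get_close_matches` (a similarity-ratio matcher, not Solr's spellchecker) over a hypothetical term list; it shows the idea only, not how the PoC implements it.

```python
import bisect
import difflib


class Suggester:
    """Toy auto-complete and spelling suggester over a fixed term list."""

    def __init__(self, terms):
        # A sorted, de-duplicated, lower-cased vocabulary allows binary search by prefix.
        self.terms = sorted({t.lower() for t in terms})

    def complete(self, prefix: str, limit: int = 5) -> list[str]:
        """Return up to `limit` terms starting with `prefix`, in alphabetical order."""
        prefix = prefix.lower()
        start = bisect.bisect_left(self.terms, prefix)
        results = []
        for term in self.terms[start:]:
            # Sorted order: once a term no longer matches, no later term can.
            if len(results) >= limit or not term.startswith(prefix):
                break
            results.append(term)
        return results

    def did_you_mean(self, word: str, limit: int = 3) -> list[str]:
        """Return the closest vocabulary terms for a possibly misspelled word."""
        return difflib.get_close_matches(word.lower(), self.terms, n=limit, cutoff=0.75)
```

Usage: with a vocabulary such as `["Switzerland", "swiss", "swissinfo", "Geneva", "glacier", "referendum"]`, `complete("swi")` yields the three "swi…" terms, and `did_you_mean("referendun")` suggests `"referendum"`.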

Demo

Online demo

Benefit

  • For users: Gives them a sense of what content they can look for on the site, both through auto-complete and spell-check corrections and through the automated overview of the most common topics.
  • For swissinfo.ch: Better search leads to happier users and fewer visitors dropping out. The topic clustering analysis could also help revise the website's navigation and tags to better reflect the actual content.

Effort

About 10 person-days

Data

  • swissinfo.ch pages crawled via their API (primary pages only)

Technologies used:

  • Symfony: Web APIs and crawling
  • Solr: Search engine and indexer
  • RabbitMQ: queue processing for crawling requests
  • VueJS: API frontend and UI
  • Platform.sh: online hosting for the demo
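The crawl pipeline decouples discovering URLs from fetching and indexing them through a queue. The sketch below uses Python's in-process `queue.Queue` as a stand-in for RabbitMQ, and a caller-supplied `fetch` function in place of the swissinfo.ch API client; the message shape and the document fields (`id`, `url`, `title`, `body`) are hypothetical.

```python
import json
import queue
from typing import Callable


def crawl(seed_urls: list[str], fetch: Callable[[str], dict], max_pages: int = 100) -> list[dict]:
    """Consume crawl requests from a queue and collect search-ready documents.

    `fetch(url)` is a hypothetical callable returning
    {"title": str, "body": str, "links": [str, ...]} for one page.
    """
    requests = queue.Queue()  # in-process stand-in for a RabbitMQ queue
    seen = set()
    docs = []

    for url in seed_urls:
        # Messages are JSON strings, as they would be on a real message broker.
        requests.put(json.dumps({"url": url}))

    while not requests.empty() and len(docs) < max_pages:
        url = json.loads(requests.get())["url"]
        if url in seen:
            continue  # the same URL may have been queued more than once
        seen.add(url)
        page = fetch(url)
        docs.append({
            "id": url,
            "url": url,
            "title": page.get("title", ""),
            "body": page.get("body", ""),
        })
        # Enqueue newly discovered links as further crawl requests.
        for link in page.get("links", []):
            if link not in seen:
                requests.put(json.dumps({"url": link}))
    return docs
```

The resulting list could then be serialized with `json.dumps` and posted as a JSON array to Solr's `/update` handler for indexing. With a real broker, the producer and the workers would run as separate processes, which lets crawling scale independently of the web API.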

Vision

What could be built on this PoC with more time:
  • Integrate more third-party clustering algorithms
  • Integrate Solr Semantic Knowledge Graph analysis
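As a possible starting point, Solr exposes Semantic Knowledge Graph scoring through the `relatedness()` aggregation of its JSON Facet API. The sketch below only builds a request body; the `topics` facet field and the example foreground query `body:glacier` are hypothetical.

```python
import json


def skg_request(fore_query: str, field: str = "topics", limit: int = 10) -> dict:
    """Build a JSON request body ranking values of `field` by relatedness.

    The foreground set is the documents matching `fore_query`;
    the background set is the whole index.
    """
    return {
        "query": "*:*",
        "params": {"fore": fore_query, "back": "*:*"},
        "facet": {
            "related": {
                "type": "terms",
                "field": field,  # hypothetical facet field
                "limit": limit,
                "sort": {"r": "desc"},
                "facet": {"r": "relatedness($fore,$back)"},
            }
        },
    }


body = json.dumps(skg_request("body:glacier"))  # hypothetical full-text field
```

The body would be sent to the core's `/query` endpoint; each returned bucket then carries a relatedness score indicating how much more strongly that topic is associated with the foreground set than with the index as a whole.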