PoC search and topic clustering

Crawl the swissinfo website and make the results available through a REST API.



What is it?

This proof of concept was developed to crawl and index the English-language articles available on the swissinfo website.

The PoC exposes a set of APIs to search for articles, and also to analyze which topics are most common among all indexed pages (= clustering).
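As a rough illustration of what a search request against the index might look like, the sketch below builds a query URL for Solr's standard `/select` handler. The host, the core name `articles`, and the field names `title` and `url` are assumptions made for this example; the PoC's own API routes are not documented here and may wrap Solr differently.

```python
from urllib.parse import urlencode

# Assumed values: the PoC's actual Solr host and core name are not documented here.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "articles"


def build_search_url(query: str, rows: int = 10, start: int = 0) -> str:
    """Build a URL for Solr's standard /select handler."""
    params = {
        "q": query,         # the user's search terms
        "rows": rows,       # page size
        "start": start,     # offset of the first result
        "fl": "title,url",  # assumed field names
        "wt": "json",       # ask Solr for a JSON response
    }
    return f"{SOLR_BASE}/{CORE}/select?{urlencode(params)}"


print(build_search_url("swiss elections", rows=5))
```

A REST endpoint could forward such a request to Solr and return a trimmed-down version of the JSON response to the frontend.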

Additional API endpoints demonstrate some strategies for search auto-completion and misspelling suggestions - both quite common features of search interfaces.
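To give an idea of the two features, here is a minimal sketch that uses only the Python standard library. It is a stand-in for illustration, not the PoC's implementation (which presumably relies on Solr): the vocabulary `TERMS` and the function names `autocomplete` and `suggest_spelling` are hypothetical.

```python
import bisect
import difflib

# Hypothetical vocabulary; in practice the terms would come from the indexed articles.
TERMS = sorted(["economy", "education", "election", "energy", "europe", "switzerland"])


def autocomplete(prefix: str, limit: int = 5) -> list[str]:
    """Return up to `limit` known terms that start with `prefix`."""
    prefix = prefix.lower()
    # Binary search for the first term that is >= the prefix.
    start = bisect.bisect_left(TERMS, prefix)
    matches = []
    for term in TERMS[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        matches.append(term)
        if len(matches) == limit:
            break
    return matches


def suggest_spelling(word: str, limit: int = 3) -> list[str]:
    """Return known terms that closely resemble a possibly misspelled word."""
    return difflib.get_close_matches(word.lower(), TERMS, n=limit, cutoff=0.7)


print(autocomplete("el"))
print(suggest_spelling("eletion"))
```

Auto-completion here is plain prefix matching on a sorted list, while the spelling suggestion ranks terms by string similarity. A search engine would typically also weight suggestions by how often a term occurs in the index.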



  • For users: Gives them a feel for what kind of content they can look for on the site (through auto-complete and spell-check corrections, but also through the automated overview of the most common topics).
  • For the site owner: Better search leads to happier users and fewer drop-offs. The topic clustering analysis could also help revise the website's navigation and tags to better reflect the actual content.


Development effort: about 10 person-days


  • Pages crawled via the swissinfo API (primary pages only)

Technologies used:

  • Symfony: Web APIs and crawling
  • Solr: Search engine and indexer
  • RabbitMQ: Queue processing for crawling requests
  • VueJS: API frontend and UI
  • online hosting for the demo
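
How these pieces might fit together can be sketched with in-memory stand-ins: a standard-library queue takes the place of RabbitMQ and a dictionary takes the place of the Solr index. The `fetch` function and the example URLs are hypothetical, and the real PoC implements this flow in Symfony rather than Python.

```python
import queue
import threading

# In-memory stand-ins: the PoC uses RabbitMQ for the queue and Solr for the index.
crawl_queue = queue.Queue()
index = {}


def fetch(url: str) -> str:
    """Hypothetical fetcher; a real crawler would download the article here."""
    return f"content of {url}"


def worker() -> None:
    """Consume crawl requests until a None sentinel arrives."""
    while True:
        url = crawl_queue.get()
        if url is None:
            crawl_queue.task_done()
            break
        index[url] = fetch(url)  # "index" the fetched document
        crawl_queue.task_done()


thread = threading.Thread(target=worker)
thread.start()
for url in ["https://example.org/a", "https://example.org/b"]:
    crawl_queue.put(url)  # publish one crawl request per page
crawl_queue.put(None)     # tell the worker to stop
crawl_queue.join()
thread.join()
print(sorted(index))
```

Decoupling crawl requests from the workers that process them lets the crawl run in the background and makes it easy to add more workers later.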


What can be done beyond the prototype phase:
  • Integrate more third-party clustering algorithms
  • Integrate Solr Semantic Knowledge Graph analysis