---
post_type: "project"
title: Knowledge Recognition
blurb: "An early system that let a collection of books and files understand itself and be searched by idea, not keyword."
chapter: "HCI Research · Suslib · 2018–2020"
org: suslib
year: [2018]
tags: ["AI", "semantic-search", "ML", "knowledge systems"]
stack: ["YOLOv3", "Mask R-CNN", "Tesseract OCR", "spaCy", "LDA", "Word2Vec", "Elasticsearch", "TensorFlow", "PyTorch", "REST API", "Streaming API"]
link: https://suslib.com/core/knowledge-recognition
---

Knowledge Recognition was the first big experiment at [Suslib](https://suslib.com). The question was whether a collection of books, images, or files could understand itself — stop being a flat archive and become something you could search by idea and explore by context.

This was 2018, before pretrained transformer models and ready-made semantic search APIs were widely available, so we built the pieces ourselves: YOLOv3 for object detection, Mask R-CNN for instance segmentation, Tesseract for OCR, spaCy for named entities, LDA for topics, TextRank for summaries. For meaning we trained Word2Vec embeddings and stored them in Elasticsearch so a collection could be searched by idea rather than keyword. I led the team that built it (backend, ML, and MLOps engineers), driving the product architecture and the way the parts fit together, while also doing model training and pipeline work myself. It shipped as a REST and streaming API that sat between a CMS and a database, light enough to power our own AR inventory app and still useful to outside developers.
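To make "search by idea" concrete, here is a minimal sketch of the general technique, not the code we shipped. It assumes a document can be represented as the average of its word vectors and ranked against a query by cosine similarity. The tiny `TOY_VECTORS` table, the `embed`, `cosine`, and `search` helpers, and the sample documents are all hypothetical stand-ins: the real system used trained Word2Vec embeddings and Elasticsearch rather than an in-memory loop.

```python
import math

# Hypothetical 3-dimensional vectors standing in for trained Word2Vec embeddings.
TOY_VECTORS = {
    "ship": [0.9, 0.1, 0.0],
    "boat": [0.8, 0.2, 0.0],
    "sea": [0.7, 0.3, 0.1],
    "harbor": [0.8, 0.1, 0.1],
    "bread": [0.1, 0.0, 0.8],
    "oven": [0.0, 0.2, 0.9],
    "recipe": [0.0, 0.1, 0.9],
}


def embed(text):
    """Average the vectors of known words; return None if no word is known."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, documents):
    """Return document ids ordered from most to least similar to the query."""
    query_vec = embed(query)
    if query_vec is None:
        return []
    scored = []
    for doc_id, text in documents.items():
        doc_vec = embed(text)
        if doc_vec is not None:
            scored.append((cosine(query_vec, doc_vec), doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Hypothetical two-document collection.
documents = {
    "maritime": "ship harbor sea",
    "baking": "bread oven recipe",
}
results = search("boat", documents)
```

The query word "boat" appears in neither document, so a keyword match would find nothing, yet the maritime document ranks first because its vectors sit close to the query's.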

KR is what put Suslib on the map: it got us into Dutch Design Week and helped secure funding. It was also early work on the thing I still do — making software understand context, so people can reach knowledge without translating it into keywords first.

`with` [Martijn de Heer](https://suslib.com), [Homayoun Moradi](https://suslib.com)