A document management and ingestion service built with Flask, SQLAlchemy, and PostgreSQL with pgvector. It provides OCR, semantic search, and AI-powered question answering for academic research and homework preparation.
Knowledge Hub is a document management system I developed to streamline my MS in CS coursework at USC. It addresses a common problem: managing and searching large collections of course documents, research papers, and other academic materials.

Built with Flask and PostgreSQL with the pgvector extension, the system offers both traditional full-text search and vector-based semantic search. Key features include automated OCR processing for PDFs and images, document chunking, vector embeddings for semantic search, and integration with a local LLM (Ollama) for AI-powered question answering. The system is containerized with Docker for easy deployment and exposes API endpoints for document management.

This project demonstrates my expertise in full-stack development, AI/ML integration, database design, and system architecture. It has improved my academic workflow by enabling quick retrieval of relevant information from course materials and research papers.
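To make the document chunking step concrete, here is a minimal sketch of one common approach: fixed-size word windows with overlap. The function name `chunk_text` and the `chunk_size` and `overlap` defaults are assumptions for illustration only; the project's actual chunking strategy may differ.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words.

    Hypothetical helper for illustration, not the project's actual code.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk made only of overlap words is emitted.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Overlapping windows are a common choice because a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval quality.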
Upload, store, and organize documents with automatic metadata extraction and categorization
Automatic text extraction from PDFs and images using OpenCV, PyMuPDF, and Tesseract
Vector-based similarity search using pgvector and Sentence-Transformers for intelligent content discovery
RAG-powered Q&A system with local LLM integration for contextual answers with citations
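The semantic search and RAG features above can be illustrated with a small in-memory sketch. It ranks chunks by cosine similarity to a query vector and assembles a prompt with numbered citations. Everything here is an assumption for illustration: in the real system the embeddings would come from Sentence-Transformers and the nearest-neighbor ranking would be done by pgvector inside PostgreSQL, while the helper names (`cosine_similarity`, `top_k`, `build_prompt`), the chunk dictionary layout, and the toy 2-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: list[dict], k: int = 2) -> list[dict]:
    """Return the k chunks whose embeddings are most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_vec, c["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, retrieved: list[dict]) -> str:
    """Assemble a prompt that asks the LLM to answer with bracketed citations."""
    context = "\n".join(
        f"[{i}] ({c['source']}) {c['text']}"
        for i, c in enumerate(retrieved, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite sources by their bracketed number.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Toy data: hypothetical sources with made-up 2-dimensional embeddings.
chunks = [
    {"source": "lecture3.pdf", "text": "B-trees keep keys sorted.", "embedding": [1.0, 0.0]},
    {"source": "paper.pdf", "text": "Transformers use attention.", "embedding": [0.0, 1.0]},
    {"source": "notes.png", "text": "B+ trees store data in leaves.", "embedding": [0.9, 0.1]},
]
retrieved = top_k([1.0, 0.0], chunks, k=2)
prompt = build_prompt("How do B-trees organize keys?", retrieved)
```

The resulting `prompt` string would then be sent to the local LLM. Numbering the context entries gives the model stable labels to cite, which is one simple way to get answers with citations.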
Comprehensive technical documentation covering architecture, implementation details, and system design decisions.