Querying and summarizing text using LLMs

Simplify your research with multi-document querying and summarization

This article gives a brief overview of how query-focused multi-document summarization (QFMS) works, along with a small demo in Python. On the web, we often face redundant information when working through multiple web pages and textual documents. Advances in Natural Language Processing (NLP) and computer hardware now enable efficient summarization and querying, saving us a lot of time and cutting through that redundancy.

Summarization falls into two main categories: generic summarization and query-focused summarization. In generic summarization, the content of the documents alone determines what the summary includes. In query-focused summarization, by contrast, the user's query decides which information is essential and whether it belongs in the output summary.

Relevance, diversity, and redundancy are the three main challenges in QFMS. The summary must answer the user's query, so the query text is compared against a large pool of candidate sentences using distance metrics to capture the relevant ones. The selected sentences should also differ semantically from one another, so that the summary covers all aspects of the user's question (diversity). Finally, the summary should avoid repeating the same information (redundancy).
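One classic way to balance these three goals is greedy Maximal Marginal Relevance (MMR) selection. The sketch below illustrates the idea with simple bag-of-words cosine similarity standing in for the semantic distance metrics a real QFMS system would use (e.g., LLM embeddings); the sentence data and the `lam` trade-off value are illustrative assumptions, not part of the article's demo.

```python
import math
from collections import Counter

def bow(text: str) -> Counter:
    """Bag-of-words vector for a text (a stand-in for real embeddings)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mmr_select(query: str, sentences: list[str], k: int = 2,
               lam: float = 0.7) -> list[str]:
    """Greedily pick k sentences, trading off relevance to the query
    against redundancy with sentences already selected."""
    q = bow(query)
    vecs = [bow(s) for s in sentences]
    selected: list[int] = []
    while len(selected) < min(k, len(sentences)):
        best, best_score = -1, -math.inf
        for i, v in enumerate(vecs):
            if i in selected:
                continue
            relevance = cosine(q, v)
            redundancy = max((cosine(v, vecs[j]) for j in selected),
                             default=0.0)
            score = lam * relevance - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return [sentences[i] for i in selected]
```

A higher `lam` favors relevance to the query; a lower value penalizes sentences that merely repeat what has already been selected.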

Figure: General architecture of QFMS

Our QFMS demo is built from Python libraries whose functions mirror the architecture above; the packages come from the popular OpenAI repository. At a high level, the demo works as follows:

  • Initialize the query engine, which is simply our OpenAI LLM, with an appropriate prompt to keep the conversation/summary relevant (in our case, the challenges faced by small business owners)
  • Read three documents from three different data sources: web scraping for a blog, a PDF reader for a PDF, and audio-to-text transcription for a YouTube video
  • Store the contents of the three documents in a format the query engine can use to answer the user's query, which is taken through an input prompt
  • Store the queries and outputs in whatever format is required; in our demo they are saved as JSON

Lotus Labs is currently working on multiple such NLP use cases utilizing LLMs. Connect with us to explore more use cases and give shape to your business's transformative ideas.

Contact us at www.lotuslabs.ai
