Large language model topic modelling

Extract topics and summarise outputs using Large Language Models (LLMs): Gemma 3 4b or GPT-OSS 20b if running locally (see tools/config.py to modify), Gemini 2.5, or AWS Bedrock models such as Claude 3 Haiku or Claude Sonnet 3.7. The app queries the LLM with batches of responses to produce summary tables. These tables are then compared iteratively to build a single output table of general topics, subtopics, topic sentiment, and the text rows relevant to each. The prompts are designed for topic modelling of public consultations, but they can be adapted to other contexts (see the LLM settings tab to modify them).
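The batch-then-combine loop described above can be sketched as follows. This is a minimal illustration, not the app's implementation: the function names, the table shape (a dict keyed by general topic, subtopic and sentiment), and the idea of passing the running table into each call are all assumptions. The app compares tables with the LLM itself; this sketch swaps in simple exact-key pooling, and `keyword_stub` is a hypothetical stand-in for the real model call.

```python
from typing import Callable

# Hypothetical table shape: (general topic, subtopic, sentiment) -> row numbers
TopicKey = tuple[str, str, str]
TopicTable = dict[TopicKey, list[int]]
Batch = list[tuple[int, str]]


def batch_rows(responses: list[str], batch_size: int) -> list[Batch]:
    """Split responses into batches of (row_number, text) pairs, numbering rows from 1."""
    numbered = list(enumerate(responses, start=1))
    return [numbered[i:i + batch_size] for i in range(0, len(numbered), batch_size)]


def merge_tables(running: TopicTable, new: TopicTable) -> TopicTable:
    """Fold one batch's table into the running table, pooling rows under identical keys."""
    merged = {key: list(rows) for key, rows in running.items()}
    for key, rows in new.items():
        merged.setdefault(key, []).extend(rows)
    return merged


def extract_topics(
    responses: list[str],
    batch_size: int,
    summarise_batch: Callable[[Batch, TopicTable], TopicTable],
) -> TopicTable:
    table: TopicTable = {}
    for batch in batch_rows(responses, batch_size):
        # Assumption: the running table is passed in so the model can reuse topics already found
        table = merge_tables(table, summarise_batch(batch, table))
    return table


def keyword_stub(batch: Batch, existing: TopicTable) -> TopicTable:
    """Hypothetical stand-in for an LLM call: tags rows mentioning 'parking', others as general."""
    out: TopicTable = {}
    for row, text in batch:
        if "parking" in text.lower():
            key = ("Transport", "Parking", "Negative")
        else:
            key = ("Other", "General", "Neutral")
        out.setdefault(key, []).append(row)
    return out


result = extract_topics(
    ["Not enough parking", "Love the new park", "Parking fees are too high"],
    batch_size=2,
    summarise_batch=keyword_stub,
)
print(result)
# {('Transport', 'Parking', 'Negative'): [1, 3], ('Other', 'General', 'Neutral'): [2]}
```

Keeping row numbers in the table is what lets the final output point back to the relevant text rows for each topic.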

Instructions on use can be found in the README.md file. Try the app with this dummy development consultation dataset, which you can also use with zero-shot topics, or with this dummy case notes dataset.

You can use an AWS Bedrock model (paid) or Gemini (free API access is available for Flash). Using Gemini requires an API key; to set up your own Gemini API key, go here.

NOTE: Large language models are not 100% accurate and may produce biased or harmful outputs. All outputs from this app must be reviewed by a human for harmful content, hallucinations, and accuracy.

Choose a tabular data file (xlsx, csv, parquet) of open text to extract topics from.
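Reading the three supported formats and listing their columns might look like the sketch below. It assumes pandas (with an Excel engine such as openpyxl, and a Parquet engine such as pyarrow) is installed; `load_tables` and `columns_across_sheets` are hypothetical helper names, not functions from this app. `pd.read_excel(path, sheet_name=None)` loads every sheet into a dict, which is one way to show columns across all sheets of an Excel file.

```python
from pathlib import Path

SUPPORTED = {".xlsx", ".csv", ".parquet"}


def load_tables(path: str) -> dict:
    """Return {sheet_name: DataFrame}; CSV and Parquet files are treated as a single sheet."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED:
        raise ValueError(f"Unsupported file type {suffix!r}; expected one of {sorted(SUPPORTED)}")
    import pandas as pd  # imported lazily so the extension check works without pandas installed

    if suffix == ".xlsx":
        return pd.read_excel(path, sheet_name=None)  # sheet_name=None loads every sheet
    if suffix == ".csv":
        return {"csv": pd.read_csv(path)}
    return {"parquet": pd.read_parquet(path)}


def columns_across_sheets(sheet_columns: dict[str, list[str]]) -> list[str]:
    """Unique column names across all sheets, in first-seen order."""
    seen: dict[str, None] = {}
    for columns in sheet_columns.values():
        for col in columns:
            seen.setdefault(col, None)
    return list(seen)
```

For a hypothetical file `responses.xlsx`, the column list would be `columns_across_sheets({name: list(df.columns) for name, df in load_tables("responses.xlsx").items()})`.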

Language model
Select the open text column of interest. In an Excel file, this shows columns across all sheets.
Select the column to group responses by
Force responses into zero-shot topics
Ask the model to assign responses to only a single topic
Ask the model to produce structured summaries using the zero-shot topics as headers rather than extracting topics
Choose sentiment categories to split responses
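The grouping and topic options above could translate into per-group inputs and extra prompt instructions roughly as follows. This is a hypothetical sketch: `RunOptions`, `group_rows`, `build_instructions`, the default sentiment categories, and the instruction wording are all illustrative assumptions rather than the app's actual prompts (those can be viewed and edited in the LLM settings tab).

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunOptions:
    """Hypothetical mirror of the settings above; field names are illustrative only."""
    zero_shot_topics: list[str] = field(default_factory=list)
    force_zero_shot: bool = False
    single_topic: bool = False
    structured_summary: bool = False
    # Assumed default categories; the app lets you choose these yourself
    sentiment_categories: list[str] = field(
        default_factory=lambda: ["Negative", "Neutral", "Positive"]
    )


def group_rows(rows: list[dict], text_col: str, group_col: Optional[str]) -> dict[str, list[str]]:
    """Collect open-text responses under each value of the group-by column."""
    groups = defaultdict(list)
    for row in rows:
        key = str(row[group_col]) if group_col else "All"
        groups[key].append(row[text_col])
    return dict(groups)


def build_instructions(opts: RunOptions) -> list[str]:
    """Turn the selected options into extra instruction lines for the prompt."""
    lines = []
    if opts.zero_shot_topics:
        topics = ", ".join(opts.zero_shot_topics)
        if opts.structured_summary:
            lines.append(f"Write a structured summary using these headers: {topics}.")
        elif opts.force_zero_shot:
            lines.append(f"Assign each response only to these topics: {topics}.")
        else:
            lines.append(f"Start from these topics, adding new ones if needed: {topics}.")
    if opts.single_topic:
        lines.append("Assign each response to exactly one topic.")
    if opts.sentiment_categories:
        lines.append("Label sentiment as one of: " + ", ".join(opts.sentiment_categories) + ".")
    return lines


rows = [
    {"area": "North", "response": "More buses"},
    {"area": "South", "response": "Fix roads"},
    {"area": "North", "response": "Cheaper fares"},
]
groups = group_rows(rows, "response", "area")
instructions = build_instructions(
    RunOptions(zero_shot_topics=["Transport", "Housing"], force_zero_shot=True, single_topic=True)
)
```

Grouping first means each group's responses can be batched and summarised separately, so topics can be compared across groups afterwards.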
