Large language model topic modelling
Extract topics and summarise outputs using Large Language Models (LLMs, Gemma 3 4b/GPT-OSS 20b if local (see tools/config.py to modify), Gemini, Azure, or AWS Bedrock models (e.g. Claude, Nova models). The app will query the LLM with batches of responses to produce summary tables, which are then compared iteratively to output a table with the general topics, subtopics, topic sentiment, and a topic summary. Instructions on use can be found in the README.md file. You can try out examples by clicking on one of the example datasets under 'Test with an example dataset' at the bottom of the page, which will show you example outputs from a local model run. API keys for AWS, Azure, and Gemini services can be entered on the settings page (note that Gemini has a free public API).
NOTE: Large language models are not 100% accurate and may produce biased or harmful outputs. All outputs from this app absolutely need to be checked by a human to check for harmful outputs, hallucinations, and accuracy.
Choose a tabular data file (xlsx, csv, parquet) of open text to extract topics from.
Test with an example dataset
Load in previously completed Extract Topics output files ('reference_table', and 'unique_topics' files) to modify topics, deduplicate topics, or summarise the outputs. If you want pivot table outputs, please load in the original data file along with the selected open text column on the first tab before deduplicating or summarising.
Summarised table will appear here
Create an overall summary from an existing topic summary table.
Define settings that affect large language model output.
Querying Bedrock models with API keys requires a role with IAM permissions for the bedrock:InvokeModel action.