Large language model topic modelling
Extract topics and summarise outputs using large language models (LLMs): Gemma 2B Instruct if running locally, Gemini Flash/Pro, or Claude 3 through AWS Bedrock if running on AWS. The app queries the LLM with batches of responses to produce summary tables. These tables are then compared iteratively to produce a final table of general topics, subtopics, topic sentiment, and the text rows relevant to each. The prompts are designed for topic modelling of public consultations, but they can be adapted to other contexts (modify them on the LLM settings tab).
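The batching loop described above can be illustrated with a minimal Python sketch. It is not the app's actual code. The function names, the prompt wording, and the representation of a topic table as a mapping from topic name to row numbers are all hypothetical assumptions, and the LLM call itself is left out. The sketch splits responses into fixed-size batches, numbers each batch's rows and wraps them in a prompt that lists the topics found so far, and folds each batch's result into a running master table.

```python
from typing import Dict, Iterator, List, Sequence


def batch_responses(responses: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most batch_size responses."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(responses), batch_size):
        yield list(responses[start:start + batch_size])


def build_prompt(batch: List[str], known_topics: List[str], first_row: int) -> str:
    """Hypothetical prompt: list the topics found so far, then the numbered responses."""
    topic_text = ", ".join(known_topics) if known_topics else "none yet"
    # Number rows globally (not per batch) so results can be traced back to the input file
    rows = "\n".join(f"{first_row + i}: {text}" for i, text in enumerate(batch))
    return (
        "Existing topics: " + topic_text + "\n"
        "Assign each response to a topic, subtopic and sentiment:\n" + rows
    )


def merge_topics(master: Dict[str, List[int]],
                 batch_result: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Fold one batch's topic -> row-number mapping into the running master table."""
    for topic, rows in batch_result.items():
        master.setdefault(topic, []).extend(rows)
    return master
```

In a real run, each prompt would be sent to the selected model and its reply parsed into a mapping before calling `merge_topics`. Passing the known topics into every prompt is one plausible way to let later batches reuse earlier topics instead of inventing near-duplicates.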
Instructions for use can be found in the README.md file. Try the app with this dummy development consultation dataset (which you can also use with zero-shot topics) or this dummy case notes dataset.
You can use an AWS Bedrock model (Claude 3, paid) or Gemini (a free API, but with strict limits on the Pro model). Because of the strict API limits on the best model (Gemini 1.5 Pro), using Gemini requires your own API key. To set up a Gemini API key, go here.
NOTE: API calls to Gemini are not considered secure, so please submit only redacted, non-sensitive tabular files to this source. Also, large language models are not 100% accurate and may produce biased or harmful outputs. All outputs from this app must be checked by a human for harmful content, hallucinations, and accuracy.
Choose a tabular data file (xlsx or csv) of open text to extract topics from.
Load previously completed Extract Topics output files (the 'reference_table' and 'unique_topics' files) to modify topics, deduplicate topics, or summarise the outputs. If you want pivot table outputs, first load the original data file and select the open text column on the first tab, then deduplicate or summarise.
Load the output files from a previous topic extraction run and continue extracting topics from new data.
View a 'unique_topic_table' csv file in markdown format.
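Converting a CSV topic table to a markdown table can be sketched with only the Python standard library. This is an illustrative assumption, not the app's implementation, and the column names in the example are hypothetical. pandas also offers `DataFrame.to_markdown`, but that method requires the optional `tabulate` package.

```python
import csv
import io
from typing import List


def csv_to_markdown(csv_text: str) -> str:
    """Render CSV text as a markdown table, using the first row as the header."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if not rows:
        return ""
    header, body = rows[0], rows[1:]

    def fmt(cells: List[str]) -> str:
        # Pad short rows to the header width, and escape pipes so cell text
        # cannot be mistaken for a column separator
        padded = list(cells) + [""] * (len(header) - len(cells))
        return "| " + " | ".join(c.replace("|", "\\|") for c in padded) + " |"

    lines = [fmt(header), "|" + "|".join("---" for _ in header) + "|"]
    lines += [fmt(row) for row in body]
    return "\n".join(lines)


# Hypothetical column names, for illustration only
print(csv_to_markdown("General Topic,Sentiment\nHousing,Negative\n"))
```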
Define settings that affect large language model output.