Using LLM to Compress Columns

Tim Burns
May 28

Building categories and segments is a good task for an LLM. While looking for interesting forecasting trends in the Iowa Liquor Store data, I found that the categories and item descriptions leave much to be desired.

item_description	category_name
MALIBU COCONUT RUM	FLAVORED RUM
SMIRNOFF STRAWBERRY	AMERICAN FLAVORED VODKA
PARAMOUNT WHITE RUM PET	WHITE RUM

The LLM's task is to examine the item description and category and classify the item into segments.
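As a rough illustration of that task, here is a minimal sketch of building a classification prompt from the two columns and validating the reply. The segment list, the prompt wording, and the function names `build_prompt` and `parse_segments` are assumptions made for this example, not taken from the linked code, and the reply is hand-written rather than a real model response.

```python
import json

# Hypothetical segment vocabulary; the post does not list its segments.
SEGMENTS = ["RUM", "VODKA", "WHISKEY", "OTHER"]


def build_prompt(rows):
    """Build one classification prompt for a batch of item rows."""
    lines = [
        "Classify each liquor item into exactly one segment from this list: "
        + ", ".join(SEGMENTS) + ".",
        "Reply with a JSON array of objects with the keys "
        '"item_description" and "segment", one object per item.',
        "",
        "Items:",
    ]
    for row in rows:
        lines.append(
            f"- item_description: {row['item_description']}"
            f" | category_name: {row['category_name']}"
        )
    return "\n".join(lines)


def parse_segments(reply_text):
    """Parse the JSON reply; segments outside the vocabulary become None."""
    result = {}
    for entry in json.loads(reply_text):
        segment = entry.get("segment")
        result[entry["item_description"]] = segment if segment in SEGMENTS else None
    return result


rows = [
    {"item_description": "MALIBU COCONUT RUM", "category_name": "FLAVORED RUM"},
    {"item_description": "SMIRNOFF STRAWBERRY", "category_name": "AMERICAN FLAVORED VODKA"},
]
prompt = build_prompt(rows)

# Hand-written reply standing in for a model response (assumption).
# The second segment is deliberately outside the vocabulary.
fake_reply = json.dumps([
    {"item_description": "MALIBU COCONUT RUM", "segment": "RUM"},
    {"item_description": "SMIRNOFF STRAWBERRY", "segment": "FRUITY"},
])
parsed = parse_segments(fake_reply)
```

Mapping out-of-vocabulary answers to None keeps bad labels out of the table and leaves those rows to be retried on a later pass.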

https://github.com/timowlmtn/azrius-core/blob/main/python/src/core/LLMColumnCompressor.py

I have code that updates the item_dim table: it augments the data with segment columns and condenses item_description and category_name into those segments. It works in four steps:

1. Query the original raw data for rows that have not been added yet or that have null values. As the LLM gets better, we will get better matches.
2. Create a prompt asking the LLM to classify the data into segments, using the selected columns.
3. Merge the new segments returned by the LLM back into a dataframe.
4. Merge the dataframe back into the ITEM_DIM table, so that we have a new column to augment the data.
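The steps above can be sketched end to end with an in-memory SQLite table. This is only an illustration under stated assumptions: the `classify` function is a hypothetical keyword rule standing in for the LLM call, a plain list of tuples stands in for the dataframe, and the table layout is guessed from the column names in this post rather than copied from the linked code.

```python
import sqlite3


def classify(item_description, category_name):
    """Hypothetical stand-in for the LLM call: a keyword rule, not a model."""
    words = f"{item_description} {category_name}".upper().split()
    for segment in ("RUM", "VODKA"):
        if segment in words:
            return segment
    return None  # stays NULL so a later, better pass can retry it


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_dim (item_description TEXT, category_name TEXT, segment TEXT)"
)
conn.executemany(
    "INSERT INTO item_dim (item_description, category_name) VALUES (?, ?)",
    [
        ("MALIBU COCONUT RUM", "FLAVORED RUM"),
        ("SMIRNOFF STRAWBERRY", "AMERICAN FLAVORED VODKA"),
        ("PARAMOUNT WHITE RUM PET", "WHITE RUM"),
    ],
)

# Step 1: select only the rows that still have no segment.
pending = conn.execute(
    "SELECT item_description, category_name FROM item_dim WHERE segment IS NULL"
).fetchall()

# Steps 2 and 3: classify each row and collect the results
# (this list plays the role of the dataframe).
classified = [(classify(desc, cat), desc, cat) for desc, cat in pending]

# Step 4: merge the new segments back into the table.
conn.executemany(
    "UPDATE item_dim SET segment = ? "
    "WHERE item_description = ? AND category_name = ?",
    classified,
)
conn.commit()

segments = dict(
    conn.execute("SELECT item_description, segment FROM item_dim").fetchall()
)
```

Because step 1 filters on `segment IS NULL`, rerunning the script only touches rows that are still unclassified, which matches the idea of getting better matches as the LLM improves.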

This is a good way to densify the data in the ITEM_DIM table for analysis. Once the segments are merged in, we can drop sparse columns and create more meaningful ones.
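One way to decide which columns count as sparse is to measure how often each one is empty. The sketch below assumes the rows are available as dictionaries. The `legacy_code` column and the 0.5 threshold are invented for the example and do not come from the Iowa data.

```python
def null_fraction(rows, column):
    """Share of rows where the column is missing, None, or an empty string."""
    if not rows:
        return 0.0
    empty = sum(1 for row in rows if row.get(column) in (None, ""))
    return empty / len(rows)


def sparse_columns(rows, threshold=0.5):
    """Columns whose null fraction exceeds the (assumed) threshold."""
    columns = sorted({key for row in rows for key in row})
    return [c for c in columns if null_fraction(rows, c) > threshold]


rows = [
    {"item_description": "MALIBU COCONUT RUM", "segment": "RUM", "legacy_code": None},
    {"item_description": "SMIRNOFF STRAWBERRY", "segment": "VODKA", "legacy_code": None},
    {"item_description": "PARAMOUNT WHITE RUM PET", "segment": "RUM", "legacy_code": "X1"},
]
to_drop = sparse_columns(rows)
```

Here `legacy_code` is empty in two of three rows, so it is the only column flagged, while the fully populated `segment` column is kept.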
