Receipt Indexer

I built a receipt indexer using Python and Gemma-3-4b.

To keep better track of my spending habits, I created an easily accessible web-based tool in Python and HTML that lets me look up receipts across my entire purchase history. Everything runs locally, aside from my choice to use Google Sheets for record keeping.

Function

This app lets a user take a photo of a receipt, upload an image, or paste one directly into the main page. Once uploaded, the image is received by the Flask-based Python backend, assigned a UUID, and stored. The image is then processed asynchronously by a GPU node on my internal network, where Gemma-3-4b is running. Because the prompt is prepended with instructions on how to process the image, the LLM returns the receipt contents as JSON, which is then written to a Google Sheet via the Google Sheets Python API.
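
To make the "assigned a UUID and stored" step concrete, here is a minimal sketch of what that part of the backend might look like. It is not the actual code from this project: the function name, the allowed file types, and the upload directory are all assumptions for illustration. It uses only the standard library so it can run on its own; in a Flask view, the bytes and filename would come from the uploaded file object.

```python
import uuid
from pathlib import Path

# Assumed set of accepted image types; adjust to whatever the upload page allows.
ALLOWED_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def store_receipt_image(data: bytes, original_name: str, upload_dir: Path) -> str:
    """Save uploaded image bytes under a fresh UUID and return that UUID.

    Hypothetical helper: the name and signature are illustrative only.
    """
    suffix = Path(original_name).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported file type: {suffix or 'none'}")

    # A random (version 4) UUID avoids filename collisions and keeps the
    # user's original filename out of the storage path.
    receipt_id = str(uuid.uuid4())
    upload_dir.mkdir(parents=True, exist_ok=True)
    (upload_dir / f"{receipt_id}{suffix}").write_bytes(data)
    return receipt_id
```

Returning the UUID right away means the web request can finish quickly, while the slower GPU step picks the file up later by its ID.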

============================================

receiptIndexerMainPage

============================================

receiptExample

============================================

indexedExample

============================================

Closing Notes

As you can see from the images above, I uploaded a screenshot of a receipt from a nearby store. The LLM parsed the store name, date, time, items, quantities, and total price with no issues, and the result was added to the Google Sheet for logging.
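
For anyone curious how the fields above could go from model output to spreadsheet rows, here is a rough sketch. The JSON key names (store, date, time, items, name, quantity, total) are my assumption of a plausible schema, not the exact format my prompt asks for, and both function names are hypothetical. The sketch also allows for the model wrapping its JSON in extra text, which smaller models sometimes do.

```python
import json


def extract_json(reply: str) -> dict:
    """Parse the outermost JSON object in a model reply, ignoring any text around it."""
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    return json.loads(reply[start:end + 1])


def receipt_to_rows(receipt: dict) -> list[list]:
    """Flatten one receipt into one row per item (assumed column order)."""
    rows = []
    for item in receipt.get("items", []):
        rows.append([
            receipt.get("store", ""),
            receipt.get("date", ""),
            receipt.get("time", ""),
            item.get("name", ""),
            item.get("quantity", 1),  # assume a quantity of 1 when the model omits it
            receipt.get("total", ""),
        ])
    return rows
```

The resulting list of rows is the shape that a Sheets append call typically expects, so the remaining step is handing it to the Google Sheets client.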

My favorite part of this technology is the versatility of the OCR capability of the Gemma model I was running. Portions of the image could be blurred or stretched, or the photo could be taken from any angle, and the model was still able to decipher what was on the receipt and format it as I asked.

I'm also really excited by the fact that this all takes place over Tailscale, running fully through my own subnet routers, using my DNS servers for resolution, with the GPU node hosted on a completely separate box from the VPS this application is running on.

My next planned upgrade to this project will most likely be integrating a USB receipt scanner for automatic scanning and indexing, rather than requiring the user to take a photo.