From On-Device OCR to LLMs: Lessons from a 2019 Medical NLP Project


Back in 2019, I worked on a side project to extract medicine names from text on labels using on-device AI. The idea was simple: snap a picture, extract the text, and find the medicine name — all without needing the cloud. But the execution? Not so simple. Devices were slower, models weren’t as advanced, and every improvement took effort.

Fast forward to 2025, and things have changed dramatically. With modern tools like transformers, large language models (LLMs), and faster devices, the same task is easier, faster, and more accurate. This post reflects on what I built in 2019, the deep technical journey through OCR and NER, and how I’d approach it today with everything we have.

If you’re into AI, mobile apps, or just curious about how fast things evolve, this one’s for you.

In 2019, running AI on a phone was still pretty new. I wanted everything to work offline — no cloud calls, just fast answers on-device. The task was two-fold:

  • OCR (Optical Character Recognition): Convert a photo into text.
  • NER (Named Entity Recognition): Find the medicine name in that text.

Sounds simple, right? But devices were limited, fonts were weird, and data was messy. Most of the text wasn’t medicine-related, so models often missed the bits that actually mattered. Accuracy was misleading: a model that labeled everything as “not medicine” looked great on paper but failed at the real goal.

So I built a pipeline. I used Tesseract OCR, tuned it heavily, trained it on synthetic data, and tested every preprocessing trick I could find. Then I tried multiple NER models (LSTMs, random forests) and eventually settled on a simple memory-based approach: a dictionary lookup. It wasn’t perfect, but it worked.
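To make that concrete, here’s a minimal sketch of that 2019-style pipeline, assuming pytesseract and Pillow are installed and the Tesseract binary is on the PATH. The preprocessing step and the medicine list are simplified placeholders, not what the original project actually shipped.

```python
# Minimal sketch of the 2019-style pipeline: Tesseract OCR plus a
# memory-based (dictionary) lookup for medicine names.
import pytesseract
from PIL import Image, ImageOps

# Toy placeholder list; the real project used a much larger dictionary.
MEDICINE_DICTIONARY = {"paracetamol", "ibuprofen", "amoxicillin"}


def preprocess(image_path: str) -> Image.Image:
    """Grayscale + autocontrast: the kind of cheap preprocessing that helps Tesseract."""
    image = Image.open(image_path).convert("L")
    return ImageOps.autocontrast(image)


def extract_text(image: Image.Image) -> str:
    """Run Tesseract OCR on the preprocessed image."""
    return pytesseract.image_to_string(image)


def find_medicines(text: str) -> list[str]:
    """Dictionary lookup: flag any token that matches a known medicine name."""
    tokens = {token.strip(".,;:()").lower() for token in text.split()}
    return sorted(tokens & MEDICINE_DICTIONARY)


if __name__ == "__main__":
    label_text = extract_text(preprocess("label.jpg"))
    print(find_medicines(label_text))
```

Crude as the lookup is, that was roughly the shape that ended up beating the heavier models on this narrow task.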

Here’s a quick look at what’s now possible:

  • Modern OCR: Tools like TrOCR and Donut use transformers to read text directly from images. They’re smarter, handle layout better, and need less image tweaking (first sketch after this list).
  • LLMs for Text Extraction: You can now prompt models like GPT-4 or Claude to identify medicine names from noisy OCR output, even in zero-shot scenarios (second sketch below).
  • Improved NER: With models like BioBERT and ClinicalBERT, you don’t need a huge dataset. These models already know medical terms and perform well with minimal fine-tuning (third sketch below).
  • On-device Performance: Phones today have neural engines, so you can run medium-sized models locally in real time. Or go hybrid: do some work locally and call the cloud when needed.
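Here’s what transformer-based OCR looks like with TrOCR through the Hugging Face transformers library; a rough sketch assuming the transformers, torch, and Pillow packages are installed, using the publicly available printed-text checkpoint.

```python
# Sketch: transformer-based OCR with TrOCR via Hugging Face transformers.
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Printed-text checkpoint; there are handwritten variants too.
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-printed")

image = Image.open("label.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```

Worth noting: TrOCR works on cropped text lines, so a full label photo still needs a line-detection step in front of it, whereas Donut-style models take the whole document image.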
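And here’s a sketch of the zero-shot extraction idea with the OpenAI Python client; the model name and the prompt wording are illustrative assumptions, not a recommendation.

```python
# Sketch: zero-shot medicine-name extraction from noisy OCR output via an LLM.
# Assumes the openai package (v1+) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()


def extract_medicines(ocr_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "Extract medicine names from noisy OCR text. "
                    "Return a comma-separated list, or 'none' if there are none."
                ),
            },
            {"role": "user", "content": ocr_text},
        ],
    )
    return response.choices[0].message.content


print(extract_medicines("Paracetmol 500mg tabIets - take twice dai1y"))
```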
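Finally, a sketch of the NER step using the transformers token-classification pipeline. The checkpoint name below is a placeholder; in practice you’d substitute any BioBERT or ClinicalBERT model from the Hub that has already been fine-tuned for drug or chemical entity recognition.

```python
# Sketch: biomedical NER with a fine-tuned transformer via the Hugging Face pipeline.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="your-org/biobert-finetuned-drug-ner",  # hypothetical checkpoint name
    aggregation_strategy="simple",  # merge sub-word pieces into whole entity spans
)

for entity in ner("Take amoxicillin 500 mg three times daily with food."):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```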

This project taught me more than just technical tricks — it taught me to work around constraints, think critically about evaluation metrics, and experiment fast.

If I were to do it again today, I’d use transformer-based OCR out of the box, pair it with BioBERT or an instruction-tuned LLM for NER, and let a hybrid on-device/cloud setup handle the rest. The tools are mature, the models are more capable, and the time-to-prototype is much shorter.
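Here’s a rough sketch of what I mean by hybrid: try the cheap on-device path first and fall back to the cloud only when confidence is low. Every function in it is a hypothetical stand-in, stubbed with dummy return values; the cloud path would be something like the zero-shot LLM call shown earlier.

```python
# Sketch of a hybrid on-device/cloud setup with a confidence-gated fallback.
CONFIDENCE_THRESHOLD = 0.8  # tuning knob, not a recommended value


def run_on_device(image_path: str) -> tuple[list[str], float]:
    """On-device OCR + NER; in a real app, a quantized OCR model plus a small NER head."""
    return ["paracetamol"], 0.65  # dummy result


def run_in_cloud(image_path: str) -> list[str]:
    """Cloud fallback, e.g. the zero-shot LLM extraction shown earlier."""
    return ["paracetamol", "caffeine"]  # dummy result


def extract(image_path: str) -> list[str]:
    names, confidence = run_on_device(image_path)
    if confidence >= CONFIDENCE_THRESHOLD:
        return names
    return run_in_cloud(image_path)


print(extract("label.jpg"))
```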

But one thing hasn’t changed: building a solid pipeline requires care. Whether choosing the right model or cleaning up OCR output, thoughtful engineering makes the difference.

If you’re working on something similar today, start with the best tools, but always test deeply and stay pragmatic. Sometimes, a lookup list still wins.

📌 TL;DR: Back in 2019, I built an on-device OCR + NER pipeline for medicine names. It was slow and clunky, but a great learning experience. Today, transformers and LLMs make that same problem easier and more accurate.
