From On-Device OCR to LLMs: Lessons from a 2019 Medical NLP Project


Back in 2019, I worked on a side project to extract medicine names from text on labels using on-device AI. The idea was simple: snap a picture, extract the text, and find the medicine name — all without needing the cloud. But the execution? Not so simple. Devices were slower, models weren’t as advanced, and every improvement took effort.

Fast forward to 2025, and things have changed dramatically. With modern tools like transformers, large language models (LLMs), and faster devices, the same task is easier, faster, and more accurate. This post reflects on what I built in 2019, the deep technical journey through OCR and NER, and how I’d approach it today with everything we have.

If you’re into AI, mobile apps, or just curious about how fast things evolve, this one’s for you.

In 2019, running AI on a phone was still pretty new. I wanted everything to work offline — no cloud calls, just fast answers on-device. The task was two-fold:

  • OCR (Optical Character Recognition): Convert a photo into text.
  • NER (Named Entity Recognition): Find the medicine name in that text.

Sounds simple, right? But devices were limited, fonts were weird, and data was messy. Most of the text wasn’t medicine-related, so models often missed the bits that actually mattered. Accuracy was misleading: a model that labeled everything as “not medicine” looked great on paper but failed at the real goal.

So I built a pipeline. I used Tesseract OCR, tuned it heavily, trained it on synthetic data, and tested every preprocessing trick I could find. Then I tried multiple NER models (LSTMs, random forests) and eventually settled on a simple memory-based approach: a dictionary lookup. It wasn’t perfect, but it worked.
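To make that concrete, here’s a minimal sketch of that 2019-style pipeline, assuming pytesseract and Pillow are installed and the Tesseract binary is on the PATH. The preprocessing step and the medicine list are simplified placeholders, not what the original project actually shipped.

```python
# Minimal sketch of the 2019-style pipeline: Tesseract OCR plus a
# memory-based (dictionary) lookup for medicine names.
import pytesseract
from PIL import Image, ImageOps

# Toy placeholder list; the real project used a much larger dictionary.
MEDICINE_DICTIONARY = {"paracetamol", "ibuprofen", "amoxicillin"}


def preprocess(image_path: str) -> Image.Image:
    """Grayscale + autocontrast: the kind of cheap preprocessing that helps Tesseract."""
    image = Image.open(image_path).convert("L")
    return ImageOps.autocontrast(image)


def extract_text(image: Image.Image) -> str:
    """Run Tesseract OCR on the preprocessed image."""
    return pytesseract.image_to_string(image)


def find_medicines(text: str) -> list[str]:
    """Dictionary lookup: flag any token that matches a known medicine name."""
    tokens = {token.strip(".,;:()").lower() for token in text.split()}
    return sorted(tokens & MEDICINE_DICTIONARY)


if __name__ == "__main__":
    label_text = extract_text(preprocess("label.jpg"))
    print(find_medicines(label_text))
```

Crude as the lookup is, that was roughly the shape that ended up beating the heavier models on this narrow task.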

Here’s a quick look at what’s now possible:

  • Modern OCR: Tools like TrOCR and Donut use transformers to read text directly from images. They’re smarter, handle layout better, and need less image tweaking (first sketch after this list).
  • LLMs for Text Extraction: You can now prompt models like GPT-4 or Claude to identify medicine names from noisy OCR output, even in zero-shot scenarios (second sketch below).
  • Improved NER: With models like BioBERT and ClinicalBERT, you don’t need a huge dataset. These models already know medical terms and perform well with minimal fine-tuning (third sketch below).
  • On-device Performance: Phones today have neural engines, so you can run medium-sized models locally in real time. Or go hybrid: do some work locally and call the cloud when needed.
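Here’s what transformer-based OCR looks like with TrOCR through the Hugging Face transformers library; a rough sketch assuming the transformers, torch, and Pillow packages are installed, using the publicly available printed-text checkpoint.

```python
# Sketch: transformer-based OCR with TrOCR via Hugging Face transformers.
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Printed-text checkpoint; there are handwritten variants too.
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-printed")

image = Image.open("label.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```

Worth noting: TrOCR works on cropped text lines, so a full label photo still needs a line-detection step in front of it, whereas Donut-style models take the whole document image.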
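And here’s a sketch of the zero-shot extraction idea with the OpenAI Python client; the model name and the prompt wording are illustrative assumptions, not a recommendation.

```python
# Sketch: zero-shot medicine-name extraction from noisy OCR output via an LLM.
# Assumes the openai package (v1+) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()


def extract_medicines(ocr_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "Extract medicine names from noisy OCR text. "
                    "Return a comma-separated list, or 'none' if there are none."
                ),
            },
            {"role": "user", "content": ocr_text},
        ],
    )
    return response.choices[0].message.content


print(extract_medicines("Paracetmol 500mg tabIets - take twice dai1y"))
```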
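Finally, a sketch of the NER step using the transformers token-classification pipeline. The checkpoint name below is a placeholder; in practice you’d substitute any BioBERT or ClinicalBERT model from the Hub that has already been fine-tuned for drug or chemical entity recognition.

```python
# Sketch: biomedical NER with a fine-tuned transformer via the Hugging Face pipeline.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="your-org/biobert-finetuned-drug-ner",  # hypothetical checkpoint name
    aggregation_strategy="simple",  # merge sub-word pieces into whole entity spans
)

for entity in ner("Take amoxicillin 500 mg three times daily with food."):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```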

This project taught me more than just technical tricks — it taught me to work around constraints, think critically about evaluation metrics, and experiment fast.

If I were to do it again today, I’d use transformer-based OCR out of the box, pair it with BioBERT or an instruction-tuned LLM for NER, and let a hybrid on-device/cloud setup handle the rest. The tools are mature, the models are more capable, and the time-to-prototype is much shorter.
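Here’s a rough sketch of what I mean by hybrid: try the cheap on-device path first and fall back to the cloud only when confidence is low. Every function in it is a hypothetical stand-in, stubbed with dummy return values; the cloud path would be something like the zero-shot LLM call shown earlier.

```python
# Sketch of a hybrid on-device/cloud setup with a confidence-gated fallback.
CONFIDENCE_THRESHOLD = 0.8  # tuning knob, not a recommended value


def run_on_device(image_path: str) -> tuple[list[str], float]:
    """On-device OCR + NER; in a real app, a quantized OCR model plus a small NER head."""
    return ["paracetamol"], 0.65  # dummy result


def run_in_cloud(image_path: str) -> list[str]:
    """Cloud fallback, e.g. the zero-shot LLM extraction shown earlier."""
    return ["paracetamol", "caffeine"]  # dummy result


def extract(image_path: str) -> list[str]:
    names, confidence = run_on_device(image_path)
    if confidence >= CONFIDENCE_THRESHOLD:
        return names
    return run_in_cloud(image_path)


print(extract("label.jpg"))
```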

But one thing hasn’t changed: building a solid pipeline requires care. Whether choosing the right model or cleaning up OCR output, thoughtful engineering makes the difference.

If you’re working on something similar today, start with the best tools, but always test deeply and stay pragmatic. Sometimes, a lookup list still wins.

📌 TL;DR: Back in 2019, I built an on-device OCR + NER pipeline for medicine names. It was slow and clunky, but a great learning experience. Today, transformers and LLMs make that same problem easier and more accurate.
