📕 This is the first in a multi-part series on creating web applications with generative AI integration.
Introduction
The AI space is a vast and complicated landscape. Matt Turck famously does his Machine Learning, AI, and Data (MAD) landscape every year, and it always seems to get crazier and crazier. Check out the latest one made for 2024.
Overwhelming, to say the least.
However, we can use abstractions to help us make sense of this crazy landscape of ours. The primary one I will be discussing and breaking down in this article is the idea of an AI stack. A stack is just a combination of technologies that are used to build applications. Those of you familiar with web development likely know of the LAMP stack: Linux, Apache, MySQL, PHP. This is the stack that powers WordPress. Using a catchy acronym like LAMP is a good way to help us humans grapple with the complexity of the web application landscape. Those of you in the data field likely have heard of the Modern Data Stack: typically dbt, Snowflake, Fivetran, and Looker (or the Post-Modern Data Stack. IYKYK).
The AI stack is similar, but in this article we will stay a bit more conceptual. I won’t prescribe specific technologies you should be using at each layer of the stack; instead, I will simply name the layers and let you decide where you fit in, as well as what tech you will use to achieve success in that layer.
There are many ways to describe the AI stack. I prefer simplicity; so here is the AI stack in four layers, organized from furthest from the end user (bottom) to closest (top):
- Infrastructure Layer (Bottom): The raw physical hardware necessary to train and do inference with AI. Think GPUs, TPUs, cloud services (AWS/Azure/GCP).
- Data Layer (Second from Bottom): The data needed to train machine learning models, as well as the databases needed to store all of that data. Think ImageNet, TensorFlow Datasets, Postgres, MongoDB, Pinecone, etc.
- Model and Orchestration Layer (Middle): This refers to the actual large language, vision, and reasoning models themselves. Think GPT, Claude, Gemini, or any machine learning model. This also includes the tools developers use to build, deploy, and observe models. Think PyTorch/TensorFlow, Weights & Biases, and LangChain.
- Application Layer (Top): The AI-powered applications that are used by customers. Think ChatGPT, GitHub Copilot, Notion, Grammarly.
Many companies dip their toes in several layers. For example, OpenAI has both trained GPT-4o and created the ChatGPT web application. For help with the infrastructure layer, they have partnered with Microsoft to use its Azure cloud for on-demand GPUs. As for the data layer, they built web scrapers to pull in tons of natural language data to feed their models during training, not without controversy.
The Virtues of the Application Layer
I agree very much with Andrew Ng and many others in the space who say that the application layer of AI is the place to be.
Why is this? Let’s start with the infrastructure layer. This layer is prohibitively expensive to break into unless you have hundreds of millions of dollars of VC cash to burn. The technical complexity of attempting to create your own cloud service or craft a new type of GPU is very high. There is a reason why tech behemoths like Amazon, Google, Nvidia, and Microsoft dominate this layer. Ditto on the foundation model layer. Companies like OpenAI and Anthropic have armies of PhDs to innovate here. In addition, they had to partner with the tech giants to fund model training and hosting. Both of these layers are also rapidly becoming commoditized. This means that one cloud service/model more or less performs like another. They are interchangeable and can be easily replaced. They mostly compete on price, convenience, and brand name.
The data layer is interesting. The advent of generative AI has led to quite a few companies staking their claim as the most popular vector database, including Pinecone, Weaviate, and Chroma. However, the customer base at this layer is much smaller than at the application layer (there are far fewer developers than there are people who will use AI applications like ChatGPT). This area is also quickly becoming commoditized. Swapping Pinecone for Weaviate is not a difficult thing to do, and if, for example, Weaviate dropped their hosting prices significantly, many developers would likely make the switch from another service.
It’s also important to note innovations happening at the database level. Projects such as pgvector and sqlite-vec are taking tried-and-true databases and enabling them to handle vector embeddings. This is an area where I would like to contribute. However, the path to profit is not clear, and thinking about profit here feels a bit icky (I ♥️ open source!)
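To make that concrete, here is a minimal sketch of what storing and querying embeddings in plain Postgres with pgvector can look like. The connection string, table name, tiny 3-dimensional vectors, and fake `embed()` function are all illustrative stand-ins; a real app would call an embedding model and use a realistic dimension.

```python
# Minimal sketch (not production code): semantic search in plain Postgres
# via pgvector. Assumes the pgvector extension is available; the table
# name, 3-dim vectors, and fake embed() are illustrative stand-ins.
import psycopg2

conn = psycopg2.connect("dbname=app user=app")  # hypothetical connection string
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id bigserial PRIMARY KEY,
        content text,
        embedding vector(3)
    );
""")

def embed(text: str) -> str:
    """Stand-in for a real embedding model; returns a pgvector literal."""
    fake = [float(len(text) % 7), 0.5, 1.0]
    return "[" + ",".join(str(x) for x in fake) + "]"

doc = "City council meets the first Tuesday of every month."
cur.execute(
    "INSERT INTO documents (content, embedding) VALUES (%s, %s::vector)",
    (doc, embed(doc)),
)

# Nearest-neighbor search by cosine distance (pgvector's <=> operator).
cur.execute(
    "SELECT content FROM documents ORDER BY embedding <=> %s::vector LIMIT 5",
    (embed("When does the city council meet?"),),
)
print(cur.fetchall())
conn.commit()
```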
That brings us to the application layer. This is where the little guys can notch big wins. The ability to take the latest AI tech innovations and integrate them into web applications is and will continue to be in high demand. The path to profit is clearest when offering products that people love. Applications can either be SaaS offerings or they can be custom-built applications tailored to a company’s particular use case.
Remember that the companies working on the foundation model layer are constantly releasing better, faster, and cheaper models. As an example, if you are using the gpt-4o model in your app and OpenAI updates the model, you don’t have to do a thing to receive the update. Your app gets a nice bump in performance for nothing. It’s similar to how iPhones get regular updates, except even better, because no installation is required. The streamed chunks coming back from your API provider are just magically better.
If you want to change to a model from a new provider, just change a line or two of code to start getting improved responses (remember, commoditization). Think of the recent DeepSeek moment; what may be frightening for OpenAI is thrilling for application builders.
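As a rough illustration of that “line or two of code”, here is a sketch of swapping providers when both expose an OpenAI-compatible chat completions API, streaming included. The DeepSeek base URL and model name are assumptions about one possible alternative provider, and the flag is just a stand-in for however your app handles configuration.

```python
# Sketch of swapping model providers behind an OpenAI-compatible client.
# The DeepSeek base URL and model name are assumptions about one possible
# alternative provider; check your provider's docs for the actual values.
import os
from openai import OpenAI

USE_ALTERNATE_PROVIDER = False  # stand-in for real configuration

if USE_ALTERNATE_PROVIDER:
    client = OpenAI(
        api_key=os.environ["DEEPSEEK_API_KEY"],
        base_url="https://api.deepseek.com",  # assumed OpenAI-compatible endpoint
    )
    model = "deepseek-chat"
else:
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    model = "gpt-4o"

# Everything downstream stays the same, including streaming.
stream = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Recommend an obscure sci-fi movie."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```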
It is important to note that the application layer is not without its challenges. I’ve noticed quite a bit of hand-wringing on social media about SaaS saturation. It can feel difficult to get users to register for an account, let alone pull out a credit card. It can feel as though you need VC funding for marketing blitzes and yet another in-vogue black-on-black marketing website. The app developer also has to be careful not to build something that will quickly be cannibalized by one of the big model providers. Think about how Perplexity initially built its fame by combining the power of LLMs with search capabilities. At the time this was novel; nowadays most popular chat applications have this functionality built in.
Another hurdle for the application developer is obtaining domain expertise. Domain expertise is a fancy term for knowing about a niche field like law, medicine, or automotive. All of the technical skill in the world doesn’t mean much if the developer doesn’t have access to the domain expertise needed to ensure their product actually helps someone. As a simple example, one can theorize about how a document summarizer might help a legal firm, but without actually working closely with a lawyer, its usefulness remains theoretical. Use your network to become friends with some domain experts; they can help power your apps to success.
An alternative to partnering with a domain expert is building something specifically for yourself. If you enjoy the product, likely others will as well. You can then proceed to dogfood your app and iteratively improve it.
Thick Wrappers
Early applications with gen AI integration were derided as “thin wrappers” around language models. It’s true that taking an LLM and slapping a simple chat interface on it won’t succeed. You are essentially competing with ChatGPT, Claude, etc. in a race to the bottom.
The canonical thin wrapper looks something like:
- A chat interface
- Basic prompt engineering
- A feature that will likely soon be cannibalized by one of the big model providers, or that can already be done using their apps
An example would be an “AI writing assistant” that just relays prompts to ChatGPT or Claude with basic prompt engineering. Another would be an “AI summarizer tool” that passes text to an LLM to summarize, with no processing or domain-specific knowledge.
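For illustration, here is roughly all the code such a summarizer amounts to (the model name and prompt are arbitrary choices, and the OpenAI client is just one possible provider). The problem is not that this code is broken; it is that it adds nothing a user couldn’t get by pasting the same text into ChatGPT or Claude.

```python
# The canonical thin wrapper: relay text to an LLM behind a generic prompt.
# No retrieval, no domain knowledge, no processing of the model's output.
# Model name and prompt wording are arbitrary illustrative choices.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def summarize(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful summarizer."},
            {"role": "user", "content": f"Summarize the following text:\n\n{text}"},
        ],
    )
    return response.choices[0].message.content
```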
With our experience in developing web apps with AI integration, we at Los Angeles AI Apps have come up with the following criterion for how to avoid creating a thin wrapper application:
If the app can’t best ChatGPT with search by a significant factor, then it’s too thin.
A few things to note here, starting with the idea of a “significant factor”. Even if you are able to exceed ChatGPT’s capability in a particular domain by a small factor, it likely won’t be enough to ensure success. You really need to be a lot better than ChatGPT for people to even consider using the app.
Let me motivate this insight with an example. When I was learning data science, I created a movie recommendation project. It was a great experience, and I learned quite a bit about RAG and web applications.

Would it be a good production app? No.
No matter what question you ask it, ChatGPT will likely give you a movie recommendation that is comparable. Despite the fact that I was using RAG and pulling in a curated dataset of films, it is unlikely a user would find the responses much more compelling than ChatGPT + search. Since users are familiar with ChatGPT, they would likely stick with it for movie recommendations, even if the responses from my app were 2x or 3x better than ChatGPT (of course, defining “better” is tricky here).
Let me use another example. One app we had considered building out was a web app for city government websites. These sites are notoriously large and hard to navigate. We thought if we could scrape the contents of the website domain and then use RAG we could craft a chatbot that would effectively answer user queries. It worked fairly well, but ChatGPT with search capabilities is a beast. It oftentimes matched or exceeded the performance of our bot. It would take extensive iteration on the RAG system to get our app to consistently beat ChatGPT + search. Even then, who would want to go to a new domain to get answers to city questions, when ChatGPT + search would yield similar results? Only by selling our services to the city government and having our chatbot integrated into the city website would we get consistent usage.
One way to differentiate yourself is via proprietary data. If there is private data that the model providers are not privy to, then that can be valuable. In this case the value is in the collection of the data, not in the innovation of your chat interface or your RAG system. Consider a legal AI startup that provides its models with a large database of legal files that cannot be found on the open web. Perhaps RAG can be used to help the model answer legal questions over those private documents. Can something like this outdo ChatGPT + search? Yes, assuming the legal files cannot be found on Google.
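To show why the moat is the data rather than the plumbing, here is the retrieve-then-answer loop in miniature. The `embed` and `vector_search` stubs are hypothetical placeholders for whatever embedding model and vector store you choose; the interesting part is the private corpus they sit on top of, not this code.

```python
# Minimal RAG sketch over a private corpus. `embed` and `vector_search`
# are hypothetical placeholders for your embedding model and vector store
# (pgvector, Pinecone, etc.); the private documents are the real moat.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def embed(text: str) -> list[float]:
    """Placeholder: call an embedding model here."""
    raise NotImplementedError

def vector_search(query_vector: list[float], k: int = 5) -> list[str]:
    """Placeholder: nearest-neighbor lookup over the private legal files."""
    raise NotImplementedError

def answer(question: str) -> str:
    context = "\n\n".join(vector_search(embed(question)))
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[
            {
                "role": "system",
                "content": "Answer using only the provided documents. "
                           "If the answer is not in them, say so.",
            },
            {
                "role": "user",
                "content": f"Documents:\n{context}\n\nQuestion: {question}",
            },
        ],
    )
    return response.choices[0].message.content
```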
Going even further, I believe the best way to have your app stand out is to forgo the chat interface entirely. Let me introduce two ideas:
- Proactive AI
- Overnight AI
The Return of Clippy
I read an excellent article from Evil Martians that highlights the innovation starting to occur at the application level. They describe how they have forgone a chat interface entirely and are instead trying something they call proactive AI. Recall Clippy from Microsoft Word. As you were typing out your document, it would butt in with suggestions. These were oftentimes not helpful, and poor Clippy was mocked. With the advent of LLMs, you can imagine making a much more powerful version of Clippy. It wouldn’t wait for a user to ask it a question, but instead could proactively give users suggestions. This is similar to GitHub Copilot in VS Code. It doesn’t wait for the programmer to finish typing, but instead offers suggestions as they code. Done with care, this style of AI can reduce friction and improve user satisfaction.
Of course, there are important considerations when creating proactive AI. You don’t want your AI pinging the user so often that it becomes frustrating. One can also imagine a dystopian future where LLMs are constantly nudging you to buy cheap junk or spend time on some mindless app without your prompting. Of course, machine learning models are already doing this, but putting human language on it can make it even more insidious and annoying. It is imperative that the developer ensures their application is used to benefit the user, not to swindle or unduly influence them.
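As a rough sketch of what “proactive, but not annoying” can look like, the snippet below waits for the user to pause and enforces a cooldown before asking the model for a suggestion. The idle threshold, cooldown, and `llm_suggest` callback are all illustrative assumptions rather than recommended values.

```python
# Sketch of a "proactive AI" helper: act on idle, rate-limit the nudges.
# The idle/cooldown thresholds and the llm_suggest callback are
# illustrative assumptions, not recommended values or a real API.
import time
from typing import Callable, Optional

IDLE_SECONDS = 2.0       # how long the user must pause before we speak up
COOLDOWN_SECONDS = 60.0  # minimum gap between suggestions

class ProactiveAssistant:
    def __init__(self, llm_suggest: Callable[[str], str]):
        self.llm_suggest = llm_suggest           # e.g. a wrapped LLM call
        self.last_keystroke = time.monotonic()
        self.last_suggestion = float("-inf")

    def on_keystroke(self) -> None:
        # Call this from the editor/UI every time the user types.
        self.last_keystroke = time.monotonic()

    def maybe_suggest(self, document_text: str) -> Optional[str]:
        # Call this periodically (e.g. on a timer). Returns a suggestion
        # only when the user is idle and we haven't nagged them recently.
        now = time.monotonic()
        idle = now - self.last_keystroke >= IDLE_SECONDS
        cooled_down = now - self.last_suggestion >= COOLDOWN_SECONDS
        if idle and cooled_down:
            self.last_suggestion = now
            return self.llm_suggest(document_text)
        return None  # stay quiet: nobody liked Clippy's constant interruptions
```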
Getting Stuff Done While You Sleep

Another alternative to the chat interface is to use the LLMs offline rather than online. As an example, imagine you wanted to create a newsletter generator. This generator would use an automated scraper to pull in leads from a variety of sources. It would then create articles for leads it deems interesting. Each new issue of your newsletter would be kicked off by a background job, perhaps daily or weekly. The important detail here: there is no chat interface. There is no way for the user to have any input; they just get to enjoy the latest issue of the newsletter. Now we’re really starting to cook!
I call this overnight AI. The key is that the user never interacts with the AI at all. It just produces a summary, an explanation, an analysis, etc. overnight while you are sleeping. In the morning, you wake up and get to enjoy the results. There should be no chat interface or suggestions in overnight AI. Of course, it can be very beneficial to have a human in the loop. Imagine that the issue of your newsletter comes to you with proposed articles. You can either accept or reject the stories that go into your newsletter. Perhaps you can build in functionality to edit an article’s title, summary, or cover photo if you don’t like something the AI generated.
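Here is what the overnight pattern can boil down to: a scheduled job runs a scrape → filter → draft pipeline and parks the results for human review, with no chat interface anywhere. The function names, the weekly schedule, and the use of the `schedule` library are all illustrative assumptions; cron or a proper job queue would work just as well.

```python
# Sketch of "overnight AI": a scheduled pipeline with a human review step.
# `scrape_leads`, `draft_article`, and `queue_for_review` are hypothetical
# functions standing in for your scraper, LLM call, and review UI/storage.
import time
import schedule  # pip install schedule; cron or a job queue works just as well

def scrape_leads() -> list[dict]:
    """Placeholder: pull candidate stories from RSS feeds, APIs, etc."""
    raise NotImplementedError

def draft_article(lead: dict) -> dict:
    """Placeholder: ask an LLM whether the lead is interesting and draft it."""
    raise NotImplementedError

def queue_for_review(drafts: list[dict]) -> None:
    """Placeholder: store drafts so a human can accept, reject, or edit them."""
    raise NotImplementedError

def build_issue() -> None:
    leads = scrape_leads()
    drafts = [draft_article(lead) for lead in leads]
    queue_for_review(drafts)  # the human-in-the-loop step; no chat interface

schedule.every().monday.at("06:00").do(build_issue)  # weekly, while you sleep

if __name__ == "__main__":
    while True:
        schedule.run_pending()
        time.sleep(60)
```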
Summary
In this article, I covered the basics behind the AI stack. This covered the infrastructure, data, model/orchestration, and application layers. I discussed why I believe the application layer is the best place to work, mainly due to the lack of commoditization, proximity to the end user, and opportunity to build products that benefit from work done in lower layers. We discussed how to prevent your application from being just another thin wrapper, as well as how to use AI in a way that avoids the chat interface entirely.
In part two, I will discuss why the best language to learn if you want to build web applications with AI integration is not Python, but Ruby. I will also break down why the microservices architecture for AI apps may not be the best way to build your apps, despite it being the default that most go with.
🔥 If you’d like a custom web application with generative AI integration, visit losangelesaiapps.com