AI Classification in Big Databases: How We Achieved 99%+ Accuracy

Transcript: How to Do Accurate Product Classification in Large Databases Using AI

How do you do classification in big databases with AI?

There are a few challenges here. Classifying data with AI promises a process that is autonomous, consistent, and scalable. But once you're dealing with thousands, or even hundreds of thousands, of records, things can become unreliable.

The same AI logic that works perfectly on a small dataset can produce inconsistent results on a massive one: the process becomes unreliable, time-consuming, and expensive. In fact, AI models get very expensive at scale.

We had a use case from one of our customers: they needed to assign the correct HS codes for a huge list of products. They were importing hundreds of thousands of items, and part of the process involved generating product descriptions, images, and assigning the correct HS code.

For those unfamiliar, HS codes are used to categorize imported products for customs and tax reporting. When you import a product into the EU, you must assign it to the right category from a list of roughly 50,000 HS codes, organized into chapters and subheadings.

So, asking AI to pick the right code from 50,000 entries is technically possible — but doing it in one go with a large context window is expensive and not very efficient.

To solve this, we built a progressive search process inside Make.com.

First, we tried vector search: we loaded the HS codes into a vector database and asked the AI to match product descriptions to codes. This approach is generally known as RAG (Retrieval Augmented Generation).
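To make that concrete, here's a minimal sketch of the vector-search setup in Python, assuming OpenAI embeddings and a Pinecone index. The index name, sample codes, and metadata fields are illustrative, not our exact production flow (which ran inside Make.com):

```python
from openai import OpenAI
from pinecone import Pinecone

client = OpenAI()                      # reads OPENAI_API_KEY from the environment
pc = Pinecone(api_key="YOUR_API_KEY")  # placeholder key
index = pc.Index("hs-codes")           # hypothetical index name

def embed(text: str) -> list[float]:
    """Embed a text snippet with a small embedding model."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

# One-time load: embed each HS code description and store it with metadata.
hs_codes = [
    {"code": "6109.10", "description": "T-shirts, singlets and other vests, of cotton, knitted"},
    {"code": "8517.13", "description": "Smartphones for cellular networks"},
]
index.upsert(vectors=[
    {"id": row["code"], "values": embed(row["description"]), "metadata": row}
    for row in hs_codes
])

# Lookup: embed a product description and fetch the nearest HS codes.
result = index.query(vector=embed("Men's cotton t-shirt, short sleeve"),
                     top_k=5, include_metadata=True)
for match in result.matches:
    print(match.metadata["code"], match.score)
```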

However, even with vector search, the results weren't consistent enough. The dataset was simply too large. So we moved to a step-by-step process:

  1. First, we asked the AI to generate a search term based on the product.
  2. Then we ran that search progressively through the vector database.
  3. Each step would return a refined set of results.
  4. We repeated this 3–4 times.

With just four iterations, we reached about 95% accuracy.
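Translated out of Make.com into plain Python, that loop might look something like the sketch below, reusing the `embed` helper and `index` from earlier. The prompts, helper names, and round count are illustrative:

```python
def generate_search_term(product: str, candidates: list[dict]) -> str:
    """Ask the model for a refined search term, given earlier rounds' results."""
    seen = "\n".join(f'{c["code"]}: {c["description"]}' for c in candidates) or "none yet"
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "You write short search terms for looking up HS codes."},
            {"role": "user",
             "content": f"Product: {product}\nCandidates so far:\n{seen}\n"
                        "Reply with a single refined search term."},
        ],
    )
    return resp.choices[0].message.content.strip()

def progressive_search(product: str, rounds: int = 4) -> list[dict]:
    """Run several rounds of vector search, each seeded by an AI-generated term."""
    candidates: list[dict] = []
    for _ in range(rounds):
        term = generate_search_term(product, candidates)
        result = index.query(vector=embed(term), top_k=5, include_metadata=True)
        candidates = [m.metadata for m in result.matches]  # refined set feeds the next round
    return candidates
```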

But we wanted even better.

So, we expanded the logic:

  • After each vector search, instead of picking a single result, we asked the AI to return three top matching categories.
  • The AI reasoned through those options and narrowed them down to a single best match.
  • We applied filters progressively through each round of search to refine accuracy.

As a result, we reached 99%+ accuracy in classifying products to the correct HS codes.
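The selection step could be sketched like this, again with hypothetical prompts. The point is that the model reasons over the three retrieved candidates instead of blindly trusting the top vector match:

```python
def pick_best_code(product: str, candidates: list[dict]) -> str:
    """Have the model reason over the top three categories and commit to one code."""
    options = "\n".join(f'{i}. {c["code"]}: {c["description"]}'
                        for i, c in enumerate(candidates[:3], start=1))
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "You are a customs classification assistant. Reason through the "
                        "options step by step, then answer with only the chosen HS code."},
            {"role": "user", "content": f"Product: {product}\nOptions:\n{options}"},
        ],
    )
    return resp.choices[0].message.content.strip()

# Example usage:
# best = pick_best_code("Men's cotton t-shirt, short sleeve",
#                       progressive_search("Men's cotton t-shirt, short sleeve"))
```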

Pro Tip:

When using a vector database, always choose one that supports metadata filtering. This allows you to limit records by conditions like region, product type, or user access level. In our case, we used Pinecone, but you could also use PGVector with Supabase or another vector DB of your choice.
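As a sketch, a filtered Pinecone query might look like this, assuming the records were upserted with `region` and `chapter` metadata fields (both names are illustrative):

```python
# Constrain the search to records tagged for the EU and a specific HS chapter.
result = index.query(
    vector=embed("Men's cotton t-shirt, short sleeve"),
    top_k=3,
    filter={"region": {"$eq": "EU"}, "chapter": {"$eq": "61"}},
    include_metadata=True,
)
```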

Bonus Insight: AI Module Mapping

One of the newer features in this flow is called module mapping.

Let’s say you’re sending info to an AI model — for example, we had a model that searched the internet for product-related data. With module mapping, you can automatically pass the output of that module into the next step, with no extra configuration.

In our setup, module 22 outputs a result, and that result is mapped into the next module automatically, keeping the entire automation clean and streamlined.
