How can ChatGPT "know" all the records in a resource database?

Neil McKechnie • January 3, 2024

Converting your records into a format that AI can understand is the key

Large language models (LLMs) like GPT-4 have been trained on billions of documents from the Internet and other sources. That means older versions of your resource database records may be buried somewhere deep in their training data, assuming you published your database on the web at some point.


But if you've tried asking ChatGPT questions as if you were a help seeker, expecting to be guided to your most relevant resource records, you've probably discovered it does not perform well. Even if it does happen to find some decent referrals, details like phone numbers and hours of operation could be months or years out of date.


It may be a relief to know that it's not as good as your talented staff and well-designed public website. But does that mean that while the whole world benefits from all the amazing things we can do now in ChatGPT, we in I&R are stuck in the "pre-ChatGPT" days for the foreseeable future?


Absolutely not! I&R can benefit now from intelligently applying AI to all the great data we have.


You don't have to give your data to an AI vendor to benefit from AI


That raises a conundrum.


For AI to do better with I&R work, it might appear that the LLM has to directly ingest and learn your current I&R data, like your resource database, now and on an ongoing basis.


But most I&R organizations are reluctant to hand over their resource databases to just anyone, especially a third party like an AI vendor that won't agree to limit its use of your data to serving only you. (Yanzio will make such an agreement with you, by the way. Your data is your data.)


And what are the chances that an AI vendor like OpenAI, Microsoft, Google or Anthropic would put this work very high on their list for any I&R organization that might approach them, while they are feverishly pursuing the massive other opportunities these new technologies are unlocking?


Fortunately there is a way to use AI in close conjunction with your current resource data, without the direct involvement of an AI vendor. You just have to understand how to connect the two.


A brief, nerdy digression on AI: math and vectors


(Skip to the next section if the words "math" and "vector" may as well be a foreign language to you.)


Deep inside, an LLM represents chunks of text as "vectors". If you took an upper-level math class in high school, you may remember what a vector is: an arrow pointed in a certain direction, with its length representing how big or strong the quantity is. Essentially it is a mathematical representation of a concept or physical phenomenon (like wind or gravity).


A simple example might be on a weather map, with a vector representing the direction wind is blowing and how strong it is. That would be a two-dimensional vector since it is on a flat map, like a piece of paper with width and length. A three-dimensional vector would also incorporate height, so our weather map now could also tell us if the wind was blowing upwards or downwards, too.


An embedding model takes a sentence like this one and converts it into a vector. But instead of two or three dimensions, a model like OpenAI's text-embedding-ada-002 uses 1,536 dimensions, which can represent the subtle underlying meaning of the text mathematically. It needs that many to differentiate sentences like "My favorite pet is a cat" from "George Washington was the first president of the United States".
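To make this concrete, here is a toy illustration of the idea. Real embedding models use roughly 1,536 learned dimensions; this sketch just counts words, which is enough to show how "similarity" becomes a calculation between vectors rather than a judgment call.

```python
import math
from collections import Counter

def toy_embed(text):
    """Toy 'embedding': represent text as word counts.
    A real LLM embedding uses ~1,536 learned dimensions,
    but the core idea is the same: text becomes numbers."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """How closely two vectors point in the same direction (1.0 = identical)."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

cat1 = toy_embed("my favorite pet is a cat")
cat2 = toy_embed("a cat is my favorite pet")
president = toy_embed("george washington was the first president")

# Sentences built from the same words land close together...
print(cosine_similarity(cat1, cat2))       # close to 1.0
# ...while unrelated sentences land far apart.
print(cosine_similarity(cat1, president))  # 0.0 here, since no shared words
```

A real embedding model goes much further than word counts: it would also place "my favorite pet is a kitten" near the cat sentences, because the 1,536 dimensions capture meaning, not just vocabulary.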


For a sense of scale, GPT-3 has 175 billion parameters: the learned weights that encode everything the model absorbed from the data it trained on. (OpenAI has not published GPT-4's parameter count, but it is believed to be even larger.) That's a huge amount.


Speaking the language of AI: vectors


Instead of handing a resource database over to an AI vendor, which raised the conundrum described a few paragraphs ago, we flip matters around.


We convert your resource database into its vector representations. It's a highly technical process but all we're doing is making your resource database directly understandable by AI. We can do this on an ongoing basis so that it's always pretty current.


We keep those vectors in a safe place: a special database called a "vector database". That way your data does not have to be directly ingested into an LLM owned by an AI vendor. It stays under your control in a private place of your choosing.
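In spirit, a vector database is just a searchable collection of (record ID, vector) pairs. The sketch below is a minimal in-memory stand-in for a real vector database product (the product names and this class design are illustrative, not a specific recommendation):

```python
import math

class VectorStore:
    """A minimal in-memory stand-in for a vector database:
    it holds (record_id, vector) pairs under your own control
    and finds the records most similar to a query vector."""

    def __init__(self):
        self.records = []  # list of (record_id, vector) tuples

    def add(self, record_id, vector):
        self.records.append((record_id, vector))

    def search(self, query_vector, top_k=3):
        """Return the top_k record ids most similar to the query vector."""
        scored = [(self._cosine(query_vector, v), rid) for rid, v in self.records]
        scored.sort(reverse=True)  # highest similarity first
        return [rid for _, rid in scored[:top_k]]

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

store = VectorStore()
store.add("food-bank-42", [0.9, 0.1, 0.0])  # vectors shortened to 3 dims for illustration
store.add("day-camp-7",  [0.1, 0.9, 0.2])
print(store.search([1.0, 0.0, 0.0], top_k=1))  # ['food-bank-42']
```

Production vector databases add indexing so searches stay fast across hundreds of thousands of records, but the contract is the same: vectors in, nearest matches out, all on infrastructure you choose.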


Doing this lets us have ChatGPT-style interactions with your resource data, allowing people you authorize to ask powerful questions like:


  • "What food banks are open on Wednesday afternoons that serve veterans and are free?" or,
  • "Show me all the summer day camps for teens that can be reached with low-cost public transportation and offer swimming with a lifeguard."


Whenever we want to ask these kinds of questions about your resource data, we do the following:


  • Convert the question into a vector (or a set of vectors) using embedding tools available from the LLM vendor.
  • Search your private vector database for resource data that are highly similar to the question vectors. "Highly similar" is actually a geometric calculation (cosine similarity) that compares vectors by their angles and lengths.
  • Take those "highly similar" results, choose the best ones, and interpret them back into their original text-based resource records.
  • As an optional final step, feed those results into the LLM and ask it to do some quality checking and summarization.
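The four steps above can be sketched end to end. Here `embed()` stands in for an LLM vendor's embedding API; this toy version scores a tiny fixed vocabulary so the example runs without network access, and the record IDs and texts are invented for illustration:

```python
import math

# A tiny stand-in vocabulary; a real embedding API needs none of this.
VOCAB = ["food", "veterans", "camp", "swimming"]

def embed(text):
    """Stand-in for the vendor's embedding API: text -> vector."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return [float(words.count(w)) for w in VOCAB]

# Done ahead of time: every resource record is embedded and stored.
RESOURCE_DB = {
    "food-bank-42": "Food bank serving veterans, open Wednesday afternoons, free.",
    "day-camp-7": "Summer day camp for teens with swimming and a lifeguard.",
}
VECTOR_DB = {rid: embed(text) for rid, text in RESOURCE_DB.items()}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a)) or 1.0
    norm_b = math.sqrt(sum(y * y for y in b)) or 1.0
    return dot / (norm_a * norm_b)

def answer(question, top_k=1):
    # 1. Convert the question into a vector.
    q = embed(question)
    # 2. Search the private vector database for highly similar records.
    ranked = sorted(VECTOR_DB, key=lambda rid: cosine(q, VECTOR_DB[rid]), reverse=True)
    # 3. Interpret the best matches back into their original text records.
    results = [RESOURCE_DB[rid] for rid in ranked[:top_k]]
    # 4. (Optional) hand `results` to the LLM for quality checks and a summary.
    return results

print(answer("what food banks serve veterans"))
```

Notice that the LLM itself never stores your records: it is consulted only to turn text into vectors at the edges (and optionally to summarize at the end), while the data stays in your own store.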


You retain control of your data


One of the main benefits of this approach is that you retain control of your resource database records, as well as any tools or user interfaces that make these capabilities available to your staff, partners or the general public. That means you also get to refine and improve how it works over time and keep the data as current as you like. And you have full access to the usage of the tools and data, which is vital for understanding the needs and demographics of the people you serve.


Does this really work? How do I get started?


I'm seeing very promising results using this process with real I&R resource databases given to me recently by 2-1-1s specifically for this experimentation. Using it, passively or directly, in the referral-making process is an obvious place to start.


But there are so many other ways to use this too, such as in training, reporting and assessing service gaps in communities.


If this excites you, please reach out to me to learn more.

