Deploying GPT4 in the enterprise: How I created a NetSuite/Slack support bot
See how I attempted to revolutionize our Tier 1 NetSuite support using ChatGPT.
Like many others, we at DEPT® have been working hard to learn how to leverage LLM's like ChatGPT in a corporate setting. One example of this is our initiative to use ChatGPT-4 as our Tier-1 NetSuite support.
In this post, I'll share some insights on both the personal and technical sides of my journey to deploying ChatGPT in the enterprise.
ChatGPT appears on stage
Somewhere towards the end of December, I was in the midst of doing the Advent of Code challenges. This is an annual coding competition, and I thoroughly enjoyed working my way through a bunch of the problems. I was challenging myself to use languages I’d never used before, such as Python (I know, right!?) or libraries I hadn’t used in a long time like Ramda.js.
It was at this time that ChatGPT suddenly appeared on the world stage.
I’m not sure what I expected to happen when I took one of my Ramda.js implementations and asked ChatGPT to explain what the code did, but I was certainly surprised when it gave me back a flawless breakdown of the code in natural language - especially because I’ve always found Ramda.js hard to read. When I then asked it to translate the code into Python, and the result was perfect except for a single ‘if-statement’ that it missed, I was truly impressed. Finally, when I asked it about the if-statement it apologized for the oversight in its now familiar tone and proceeded to fix the code. By that time I knew I would not see my friends and family for while.
Experimentation
Equal parts excited and terrified of this new technology, I started playing around with the API and the chat interface. I’ve spent many nights and weekends in those months coding up all kinds of interesting things. A personal favorite was a bot that would take a question from a user about the AdventureWorks database. I came up with this design where ChatGPT was put in between the user and an SQL server as is shown here:
One cool feature of this design was that ChatGPT could independently decide how many queries it would send to the server before answering the user. This also allowed it to correct any errors it may create when formulating queries. All in all, I was pretty proud of my design, so you can imagine my disappointment when I found out that there were many people before me that came up with similar ideas that were executed much better, such as the langchain framework.
Along with the conclusion that I had in fact not created The Next Big Thing, I also learned from this that autonomous agents are currently just too unreliable to be very useful in a real world setting. The essence of the issue is that these models seem to have e.g. a 95% success rate of doing something right, which means that if you start chaining x actions together the success rate starts coming down quickly (success % = .95^x).
Hype cycle
I had clearly arrived on the downhill slope of my personal GPT hype cycle, so I decided that I’d focus my efforts on something a little more productive. After all, I had invested all this time expecting that something useful would probably come out at the other end. I needed something that would be more exciting than plain old ChatGPT, that was more realistic than a fully autonomous data analyst and that would address the main obstacles I saw for using ChatGPT in a business setting:
- We’re not comfortable sharing any PII or otherwise sensitive data
- Even with ChatGPT 4 hallucinations are still an issue
- It has no knowledge of our company, vocabulary, processes, etc.
It turns out that Langchain offers one concept that addresses all three of these downsides, and that is its question answering solution. I decided to go ahead and give that one a go.
ChatGPT in the enterprise
I decided to (attempt to) revolutionize our Tier 1 NetSuite support. Users would be able to ask any question and the bot would answer based on our extensive internal NetSuite documentation. In case the documentation did not contain an answer, the bot would point users to the Tier-2 service desk. This seemed like a solution that struck the perfect balance – it was innovative without overreaching, and practical in addressing real-world needs. Also, our internal documentation is just that - internal - it is not so sensitive that the risk of a data breach is prohibitive for this use case.
To bring this idea to life, the first step was to integrate ChatGPT into Slack. This ensured that the bot was easily accessible since Slack is already widely used for internal communication.
Moreover, the NetSuite support bot was placed in a slack channel where our NetSuite admins could monitor its responses. This was crucial for quality control, as it allowed the admins to step in and correct the bot if necessary.
The Nitty Gritty
To start off, you’ll need all the documentation you want to use for Q&A in a single directory. This allows you to use the Langchain directory loader that will recursively go through the folder and parse all files (optionally filtered to a specific set of file extensions). It splits each file into chunks of e.g. 4000 characters that will fit in the ChatGPT context window, and then send them to an embeddings API that creates an embedding for them (a mathematical representation of the chunk). Once that’s done, the chunk is saved in an in-memory vector database, indexed by it’s embedding. This all happens in two lines of python code ❤️.
loader = DirectoryLoader('./netsuite_docs/', glob="**/*.html", loader_cls=UnstructuredHTMLLoader)
index = VectorstoreIndexCreator().from_loaders([loader])Once setting up the vector database with the documentation was done, the system was ready for question answering. The process begins when a user poses a question. This question is then sent to the Embeddings API, which generates an embedding. The system uses this embedding to query the Vector Database, which has the chunks of documentation along with their respective embeddings. The Vector Database returns the chunks of documentation that are most relevant to the question, based on the similarity of their embeddings. These relevant chunks, along with the original question, are then passed to ChatGPT. ChatGPT reads the documentation chunks and answers the question based on that information. This is another line of code.
response = index.query("How do I modify a memorized journal?")Finally, the system delivers this answer back to the user in slack, completing the question-answering process. The slack integration needed much more code, so I won’t share those here (thankfully chatgpt4 was able to write that). I do think it’s worth pondering that using this brand new AI technology is easier than creating a slack bot. All in all, the process now looks like the one visualized below.
Every Rose Has Its Thorn
The fact that a working (CLI) prototype can be created with Langchain in less than 10 lines code simply amazing. However, this of course also comes with some downsides.
First of all, because it’s so easy to get it to work, initially it wasn’t at all obvious to me how it worked. As a matter of fact, it came as a bit of a surprise when I found out that I’d been using an ‘embeddings API’ from OpenAI that I was actually paying for using my API key, which I thought was only needed for the ChatGPT API (don’t worry though, the embeddings API is so cheap it may as well have been free).
Second, because everything is abstracted away, the basic implementation is also fairly generic. For example, it turns out that the standard template used by Langchain for question anwering is designed to work with pretty much any LLM, which means that it has a very elaborate few-shot examples in it’s prompts which ChatGPT3.5 and 4 don’t really need. I ended up writing my own prompt to save tokens and avoid confusing the models with irrelevant example conversations.
Looking back, I’d still recommend using Langchain. It’s super effective for rapid prototyping and exploring the possibilities of ChatGPT API’s. However, I think I will probably implement a future production version without the framework. In the end of the day it is mostly the OpenAI API’s - rather than the Langchain framework - that allow for these applications to be so powerful with so little effort. This became only more true with the recent launch of the functions feature.
Enter: The User
Once we started a closed beta with a selected group of users, I learned that my testing effort was a bit of a farce and the bot wasn’t performing very well. I hadn’t done a a very good at coming up with questions that real users would be asking. It turns out that when you read the docs and then try to come with a questions about them, the questions will be worded in ways ‘similar’ to the docs (even if you try not to do that). This bias then helps both the similarity search retrieving snippets and the chatbot that has to answer the question. I’m fairly sure others will face this same issue since the core competency of LLM’s is that they can parse unpredictable inputs. And if you can’t predict the inputs, it’s difficult to test the system properly.
Shortly after the initial deploy we switched from ChatGPT 3.5 to 4 which made a huge difference and it was much better able more unexpected inputs. I would therefore definitely recommend to follow Andrej Karpathy’s advice on deploying LLM’s, which is to first roll out with the best technology you have (GPT 4) and then scale back - rather than other way around. In a world with many LLM skeptics, it’s valuable to make a good first impression.
After this initial hiccup, the reception was overwhelmingly positive. Not only did the NetSuite / Slack support bot reduce the response time for queries dramatically, it also allowed our NetSuite admins to focus on more complex issues.
Conclusion
At the start of this year I set out to build something cool with ChatGPT. While experimenting I learned that ChatGPT and other LLM’s show enormous potential, but there are also many things to keep in mind when trying to deploy generative AI responsibly. From avoiding hallucinations to managing costs and ensuring proper testing, these systems come with new challenges. It turned out more difficult than expected to create something that starts delivering value in the real world, based on todays state of the technology.
That being said, I’m very happy with the end-result. Perhaps in 12 months this will all be hilariously outdated, but it can’t hurt to try to keep up with the current state of tech. And whatever may come next, I had a lot of fun tinkering along during these first few months of the generative AI revolution, and if nothing else, that to me is a comforting thought.