
Building an NLP Image: Amazon AMI for NVIDIA NeMo


TL;DR: I built a NeMo AWS image (AMI) for my NLP work. It is simple and sparse on purpose. Feel free to use it by itself or as a base image to build on top of:

What is NeMo?

NeMo is a toolkit from NVIDIA that provides reusable modules for a variety of AI tasks: components like encoders, decoders, and loss functions.

From NVIDIA

NVIDIA NeMo, part of the NVIDIA AI platform, is a framework for building, training, and fine-tuning GPU-accelerated speech and natural language understanding (NLU) models with a simple Python interface. Using NeMo, developers can create new model architectures and train them using mixed-precision compute on Tensor Cores in NVIDIA GPUs through easy-to-use application programming interfaces (APIs).

I chose it specifically because I wanted to fine-tune pre-trained NLP models for a client. This is possible with other toolkits, such as Hugging Face, but I wanted to take advantage of the massive pre-trained models that NVIDIA makes available.

Why create an AMI?

AWS machine images are called AMIs. You can use existing AMIs for general use cases, for example a clean install of Ubuntu Linux, or you can create custom AMIs for private use or publish them for public use.

NVIDIA already has a NeMo AMI available. I tried using it, but I was getting dependency conflicts on certain libraries I was trying to install. I also just like to understand how the underlying system is built when I'm working on something, so I decided to build my own AMI.

Building an AMI

Instance type and size

I chose the p3.2xlarge instance to build on. NeMo needs to run on NVIDIA GPUs, so there is a limited number of instance types to choose from. The p3.2xlarge is reasonably powerful, and since I want to fine-tune some pretty large models, I figured it would be a good compromise between performance and cost.

A warning: do not just spin this machine up and leave it running. These instances are expensive, around $3/hr at the cheapest.

I also bumped the root device size to 100GB to accommodate all of the packages and libraries that need to be built for NeMo.
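
For a sense of what that adds up to, here is a hypothetical AWS CLI launch with the same instance type and root volume size (in practice Packer sets these in the build template; the AMI ID and key name below are placeholders):

```bash
# Launch a p3.2xlarge with a 100GB root volume (placeholder AMI and key name)
aws ec2 run-instances \
  --image-id ami-xxxxxxxx \
  --instance-type p3.2xlarge \
  --key-name my-keypair \
  --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=100}'
```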

Prerequisites

NeMo requires quite a few packages to run. The full list is in the code but I’ll go through some of the bigger ones.

NVIDIA Drivers

The p3.2xlarge uses V100 Tensor Core GPUs. A quick check with ubuntu-drivers devices shows me that the nvidia-driver-525 package is the recommended install.
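
For reference, the check and install look roughly like this on the build instance (a sketch; the recommended driver version may differ on other Ubuntu releases):

```bash
# List detected hardware and the recommended driver package
ubuntu-drivers devices

# Install the recommended driver for the V100 (nvidia-driver-525 here)
sudo apt-get update
sudo apt-get install -y nvidia-driver-525

# After a reboot, confirm the driver can see the GPU
nvidia-smi
```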

CUDA

The entire point of using the NVIDIA chipset is the highly parallel performance boost delivered by CUDA. CUDA allows programmers to use GPUs to run highly concurrent code. It's the foundation for much of the modern AI tools ecosystem and definitely required for NeMo. I had trouble using NeMo with the latest version (CUDA 12, right now), so this image uses CUDA 11.7.
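
As a sketch, installing CUDA 11.7 from NVIDIA's apt repository looks something like the following. This assumes Ubuntu 20.04 on x86_64; adjust the repo path for other releases:

```bash
# Register NVIDIA's CUDA apt repository
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update

# Install the CUDA 11.7 toolkit (the GPU driver is handled separately above)
sudo apt-get install -y cuda-toolkit-11-7

# Make the toolkit visible to the shell
echo 'export PATH=/usr/local/cuda-11.7/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
```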

Python

Most of the AI universe runs on Python. Ubuntu ships with Python 3.7, so Python was already available, but I wanted something a little more modern and installed Python 3.10.9 on this image. It's installed using pyenv under the ubuntu user, so it's immediately available when you log in.

Changing the primary Python version makes me squeamish since so many operating system processes and tools rely on the “factory installed” Python. I use pyenv to shim another version into the shell. This keeps the system happy and lets me use whatever I want.
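
A sketch of that setup, run as the ubuntu user (the installer and shell init lines come from pyenv's own docs):

```bash
# Build dependencies so pyenv can compile Python from source
sudo apt-get install -y build-essential libssl-dev zlib1g-dev libbz2-dev \
  libreadline-dev libsqlite3-dev libffi-dev liblzma-dev

# Install pyenv and wire it into the shell
curl https://pyenv.run | bash
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init -)"' >> ~/.bashrc

# Build 3.10.9 and make it this user's default; the system Python is untouched
pyenv install 3.10.9
pyenv global 3.10.9
```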

NeMo

The whole point of this exercise is to install NeMo, so that's the final step. I installed it straight from the main branch. I only pull in the NLP packages, though, so if you want to use NeMo for something else, you'll want to pip install the full toolkit.
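
Following NeMo's install docs, pulling just the NLP collection from the main branch looks roughly like this (swap [nlp] for [all] to get the full toolkit):

```bash
# Cython is needed to build some of NeMo's dependencies
pip install Cython

# Install only the NLP collection, straight from the main branch
pip install "git+https://github.com/NVIDIA/NeMo.git@main#egg=nemo_toolkit[nlp]"
```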

Creating the AMI

I’ve used HashiCorp’s Packer to create the AMI. I like Packer for a bunch of reasons. I like that I can basically take the history of commands I used to build the original image and arrange them into scripts. I also like that it looks similar to Terraform, which I’m already familiar with. But the biggest, and probably most embarrassing, reason is that I don’t like working with Docker.

I know Docker is amazing and it’s the backbone of so much of today’s modern infrastructure. But it’s also another abstraction, and abstractions introduce unknowns. AMIs are abstractions too, but ones I’m more familiar with. So it boils down to knowing my own areas of ignorance and planning around them. And, hey, if you love Docker, Packer builds Docker images too.
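
With the provisioning scripts arranged, producing the AMI is just a couple of Packer commands (the template file name here is hypothetical):

```bash
# Validate the template, then build the AMI in your AWS account
packer init .
packer validate nemo.pkr.hcl
packer build nemo.pkr.hcl
```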

My primary philosophy with this image is to get NeMo’s NLP library working and that’s it. Feel free to use this as a base image to build a more featureful set of tools. Adding Jupyter is probably a good start, and I might do that soon, but not in this image. I want to keep this simple because it was hard enough to get the version dependencies correct.

The AMI name is dept-nemo-image-{{timestamp}} so it’s easy to pick up the latest build. You can see how in my Terraform code.
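
If you want to find the latest build outside of Terraform, an AWS CLI query like this works (assuming your credentials and region are already configured):

```bash
# Return the newest AMI whose name matches the dept-nemo-image-* pattern
aws ec2 describe-images \
  --owners self \
  --filters "Name=name,Values=dept-nemo-image-*" \
  --query 'sort_by(Images, &CreationDate)[-1].ImageId' \
  --output text
```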

Launching an Instance from the AMI

I have included Terraform code to build a simple VPC and launch the instance into that VPC. I allow SSH ingress and egress to anywhere. This setup is pretty simplistic so I imagine anyone wanting to use this in a more complex environment will need to make changes. Again, I don’t think I’ll include much more complex code in this repo though. It’s meant to be simple to launch and understand.

You’ll have to have Terraform set up to run in your AWS account, but that’s beyond the scope of this article. Luckily, Terraform already has a good set of instructions for getting started here: https://developer.hashicorp.com/terraform/tutorials/aws-get-started
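
Once Terraform can talk to your account, launching is the usual flow. The variable name below is illustrative; check the repo’s variables file for the real one:

```bash
terraform init

# Review the plan, then create the VPC and instance
terraform plan -var="key_name=my-keypair"
terraform apply -var="key_name=my-keypair"
```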

Connecting and Using

I only have one variable right now: the keypair name. It allows you to add your key to the machine under the ubuntu user.
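
Connecting is then standard SSH as the ubuntu user (the key path and IP are placeholders):

```bash
# Use the private half of the keypair you passed to Terraform
ssh -i ~/.ssh/my-keypair.pem ubuntu@<instance-public-ip>
```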

To test it out, I used this Notebook as a guide: https://github.com/NVIDIA/NeMo/blob/main/tutorials/nlp/Question_Answering.ipynb

Something to keep in mind: the first import of pytorch_lightning and some of the NLP models takes a while. I haven’t figured out exactly what they’re doing, but I suspect they’re dynamically caching some information. After the first run, however, they’re fast to import.
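
One way to pay that cost up front is a throwaway import right after login. This is just a sanity check, not part of the image:

```bash
# The first import is slow while caches warm up; later imports are fast
time python -c "import pytorch_lightning; import nemo.collections.nlp as nemo_nlp"
```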

Conclusion

I started this little project to help me understand how the different software pieces work together so that, in the future, I can easily make changes if I find that my setup doesn’t support what I’m trying to do. I don’t know enough about how to fine-tune LLMs yet to know the size of the machine I’ll need, additional supporting libraries, or even other pieces of AWS infrastructure.

My hope is that, by using Terraform and Packer with AWS, I give myself enough flexibility to quickly change my underlying build and then, when I’ve gotten it right, lock it in quickly so that I can replicate it.

I hope you find it useful too.