How AWS is building a tech stack for generative AI

Generative artificial intelligence (GenAI) is expected to be a game-changer in the world of business and IT, driving organisations across the Asia-Pacific region to intensify their efforts to harness the transformative potential of this technology.

With the strength of their ecosystems and the symbiotic relationship between cloud computing and GenAI, hyperscalers like Amazon Web Services (AWS), Microsoft and Google are expected to be a dominant force in the market.

In an interview with Computer Weekly, Olivier Klein, chief technologist for Asia-Pacific and Japan at AWS, delves into the technology stack the company has built to ease GenAI adoption while addressing common concerns related to the cost of running GenAI workloads, security, privacy and the support for emerging use cases.

Tell us more about how AWS is helping customers leverage GenAI capabilities.

Klein: First, our vision is to democratise AI, including machine learning and GenAI. Our approach is a little different from others. We believe there won’t be one model that’s going to rule them all and we want to give our customers flexibility and choice of best-in-class models.

With Amazon Bedrock, we not only provide Amazon models like Titan, but also others like Jurassic from AI21 Labs, as well as models from Cohere and Stability AI. We’re also investing up to $4bn in Anthropic, so we can co-build some things and make their latest and greatest features available on the Bedrock platform.

You’d also get direct integration into our existing data stores, specifically vector databases, allowing you to feed customer and transactional data from Amazon RDS for PostgreSQL and Amazon Aurora databases into your large language models. Then, you can augment the models through retrieval augmented generation (RAG), where you supplement an initial prompt with additional data from your live database. This will enable you to personalise or tailor an answer on the fly for a customer, for example.
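The RAG pattern Klein describes can be sketched in a few lines. This is a hypothetical illustration, not AWS sample code: the retrieved rows are hard-coded stand-ins for a vector search against Aurora or RDS, and the request body follows the Anthropic-on-Bedrock message convention, which is an assumption here.

```python
import json

def build_rag_prompt(question: str, retrieved_rows: list[str]) -> str:
    """Prepend retrieved database context to the user's question."""
    context = "\n".join(f"- {row}" for row in retrieved_rows)
    return (
        "Use only the context below to answer.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

def build_invoke_body(prompt: str, max_tokens: int = 300) -> str:
    """JSON body for a bedrock-runtime invoke_model call (assumed schema)."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

# In a real deployment, the rows would come from a vector similarity
# search, and the body would be passed to:
#   boto3.client("bedrock-runtime").invoke_model(modelId=..., body=body)
rows = ["Customer tier: gold", "Last order: 2023-11-02"]
body = build_invoke_body(build_rag_prompt("What tier is this customer?", rows))
```

The key point is that the model itself is untouched: personalisation comes from what is injected into the prompt at request time, inside your own VPC.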

All of that is securely and privately run within your virtual private cloud (VPC) within your environment, so you have full control and ownership of your data and how your models will be retrained, which is important for a lot of our customers.

At the same time, we are continuously looking to make it cost-effective, which goes back to our e-commerce roots of providing choice and flexibility and passing savings to our customers. Besides GenAI models, we also offer a choice of hardware, whether it’s Intel’s Habana Gaudi, the latest Nvidia GPUs or our custom silicon like AWS Trainium, which is 50% more cost-effective than comparable GPU instances. Our second iteration of AWS Inferentia is also 40% more cost-effective than the previous chip.

On top of that, we have use case-specific AI services like Amazon Personalize, Amazon Fraud Detector and Amazon Forecast, giving you access to the same forecasting and fraud detection capabilities that Amazon uses. We’ve also announced AWS Supply Chain, for example, which overlays machine learning capabilities over your ERP [enterprise resource planning] system. In the GenAI space, there are things like Amazon CodeWhisperer, an AI coding companion that can be trained on software fragments and artefacts within your environment.

You’d see us venturing out to provide more solutions for specific industries. For example, AWS HealthScribe uses GenAI to help a clinician do clinical documentation faster on the fly with transcripts of patient-clinician conversations. That’s very useful in a telehealth setting, but it also works face-to-face. I envision a future where we’d work with more partners to offer more industry-specific foundation models.

When it comes to open-source models, do you allow customers to bring their own models and train them using their data in Bedrock?

Klein: There’s a variety of things. We provide some of these foundation models and lately, we’ve also added Meta’s Llama, making Bedrock the first fully managed service that provides you with Llama. All of these foundation models can also be used in Amazon SageMaker, which lets you bring in and fine-tune more specific models like those from Hugging Face. With SageMaker, you absolutely have the choice to create a different model that is not based on the foundation models in Bedrock. SageMaker is also capable of serverless inference, so you can scale up your service if usage spikes.
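The serverless inference option Klein mentions is configured when you create a SageMaker endpoint config. A minimal sketch of the request parameters, mirroring the boto3 `sagemaker` `create_endpoint_config` shape, with model name and sizes chosen purely for illustration:

```python
def serverless_endpoint_config(model_name: str, memory_mb: int = 4096,
                               max_concurrency: int = 10) -> dict:
    """Request params for a serverless SageMaker endpoint configuration."""
    return {
        "EndpointConfigName": f"{model_name}-serverless",
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "ServerlessConfig": {
                "MemorySizeInMB": memory_mb,        # per-invocation memory
                "MaxConcurrency": max_concurrency,  # ceiling on parallel requests
            },
        }],
    }

# The dict would then be passed to:
#   boto3.client("sagemaker").create_endpoint_config(
#       **serverless_endpoint_config("my-hf-model"))
cfg = serverless_endpoint_config("my-hf-model")
```

Unlike a provisioned instance, a serverless endpoint only bills while requests are being served, which is what makes it suit spiky inference traffic.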

More enterprises are running distributed architectures and AI is likely going to follow suit as well. How is AWS supporting use cases where customers might want to do more inferencing at the edge? Can they take advantage of the distributed infrastructure that AWS has built?

Klein: Absolutely. It’s really that continuum that starts with training models in the cloud, while inferencing can be done in Local Zones, and possibly in AWS Outposts, your own datacentre or on your phone. Some of the models we offer in SageMaker JumpStart, such as Falcon 40B, a 40-billion-parameter model, can be run on a device. Our strategy is to support training that’s generally done in the regions, with some services that allow you to run things at the edge. Some of them might integrate into our IoT [internet-of-things] or AppSync services, depending on the use case.

Like Greengrass?

Klein: Yes, Greengrass would be a great way to push out a model. You often need to do pre-processing at the edge which requires some processing power. You wouldn’t quite run the models on a Raspberry Pi, so for additional answers, you’d always need to connect back to the cloud and that’s why Greengrass is a perfect example. We don’t have customers that do that yet, but from a technical point of view, that’s feasible. And I could envision this being more relevant as more LLMs [large language models] make their way into mobile apps.

I’d think many of these use cases would go along with 5G edge deployments?

Klein: You make a really good point. AWS Wavelength would enable you to run things at the edge and leverage the cell towers of telcos. If I’m a software provider with a specific model that runs at the edge within the coverage of a 5G cell tower, then the model can connect back to the cloud with very low latency. So that makes sense. If you look at something like Wavelength, it is after all an Outposts deployment that we offer with our telecommunications partners.

AWS has a rich ecosystem of independent software vendor (ISV) partners such as the likes of Snowflake and Cloudera which have built their services on top of the AWS platform. Those companies are also getting into the GenAI space by positioning data platforms as places where customers can do the training. How do you see the dynamics working out between what AWS is doing versus what some of your partners or even your customers are doing in that space?

Klein: We have great partnerships, from Snowflake to Salesforce, whose Einstein GPT is trained on AWS. Salesforce directly integrates with AWS AppFabric, which is a service that connects SaaS [software-as-a-service] partners, and together with Bedrock, we can support GenAI with our SaaS partners. Some of our partners make models available, but we also innovate on the underlying level to reduce the cost of training and running the models.

HPE has been positioning its supercomputing infrastructure as being more efficient than hyperscale infrastructure for running GenAI workloads. AWS has high-performance computing (HPC) capabilities as well, so what are your thoughts around HPC or supercomputing resources being more efficient for crunching GenAI workloads?

Klein: I’m glad you brought that up because this is where the devil is always in the details. When you think about HPC, the proximity between nodes matters. The further away they are, the more time I lose when the nodes talk to each other. We address that in the way we design our AWS infrastructure through things like AWS Nitro, which offloads hypervisor functions to dedicated hardware, both for security and to speed up communications on your network plane.

There’s also AWS ParallelCluster, a service that checks all the boxes on Amazon EC2 features to create a cluster that allows you to have low-latency internode communication through EC2 placement groups. What it means is that we ensure that the physical locations of these virtual machines are close to each other. Generally, you’d rather have them further apart for availability, but in an HPC scenario, you want them to be as close as possible.

One thing that I would add is that you still get the benefit of flexibility and scale, and the pay-as-you-go model, which I think is game-changing for training workloads. And, if you think about LLMs, which need to be stored in memory, the closer you can get memory to compute, the better. You might have seen some of the announcements on Amazon ElastiCache for Redis and how Redis integrates with Bedrock, giving you a large and scalable cache where your LLM can be stored and executed.

So, not only do you get scalability, but you also have the flexibility of offloading things into the cache. For training purposes, you’d want to run the model as close to as many nodes as possible, but once you have your model trained, you need to host that somewhere in memory which you’d want to be flexible because you don’t want to sit on a massive permanent cluster just to make a few queries.

It’s still early days for many organisations when it comes to GenAI. What are some of the key conversations you’re having with customers?

Klein: There are a few common themes. First, we always design our services in a secure and private manner to address customer concerns about whether it’s their model and whether their data will be used for retraining.

One of the common questions is how you fine-tune and customise models and inject data on the fly. Do existing models have the flexibility to bring in your data securely and privately, and, with a click of a button, integrate with the Aurora database? From a business perspective, that’s where we think GenAI will be most relevant.

There’s that customer experience angle. With Agents for Bedrock, you’re able to execute predefined tasks through your LLM, so if a conversation with a customer goes in a certain way, you could trigger a workflow and change their customer profile, for example. Under the hood, there’s an AWS Lambda function that gets executed, but you can define it based on a conversation driven by your LLM.
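The Lambda side of that flow might look like the hypothetical handler below. The event and response shapes are sketched from the agent action-group "function details" convention and should be treated as assumptions; the profile update itself is a stand-in for a real CRM or database write.

```python
def lambda_handler(event, context):
    """Handler an Agents-for-Bedrock action group could invoke mid-conversation."""
    # The agent passes extracted parameters as a list of name/value pairs.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}

    # Illustrative side effect only: a real handler would update the
    # customer profile in your CRM or database here.
    result = (
        f"Profile {params.get('customerId', '?')} "
        f"set to {params.get('tier', '?')}"
    )

    # Echo the action group and function back so the agent can route the reply.
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event.get("actionGroup"),
            "function": event.get("function"),
            "functionResponse": {
                "responseBody": {"TEXT": {"body": result}}
            },
        },
    }

# A sample event of the kind the agent might send after the conversation
# establishes that customer C-42 should be upgraded.
sample_event = {
    "actionGroup": "customer-profile",
    "function": "update_tier",
    "parameters": [
        {"name": "customerId", "value": "C-42"},
        {"name": "tier", "value": "gold"},
    ],
}
resp = lambda_handler(sample_event, None)
```

The LLM never touches the backend directly; it only decides when to call the function and with which parameters, which keeps the actual side effects deterministic and auditable.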

There are also a lot of questions about how to integrate GenAI into existing systems. They don’t want to have a GenAI bot on the side and then have their agents copy and paste answers. A good example of where we see this today is in call centres, where our customers are transcribing conversations and feeding them into their Bedrock LLM and then rendering possible answers for the agent to pick from.
