From the course: AWS Certified Data Analytics – Specialty (DAS-C01) Cert Prep: 1 Collection

Data ingestion pipelines

- [Presenter] AWS data ingestion and processing pipelines solve complex problems by using services that are managed for you. A good example is AWS Batch. It takes the many complex parts of running recurring jobs and assembles them into a few simple pieces. You start with a trigger, which could be a timer, a call from your code, or some other event. In this scenario, let's say it fires at some interval: a new job is created and placed into the job queue. The job queue could hold thousands of jobs, and they get processed according to how much compute you allow AWS Batch to create for you. So you might have, say, 200 containers launched, all backed by GPUs, and you could do things like fine-tune Hugging Face models on whatever schedule your organization needs. AWS Batch is general purpose, but it works very well for machine learning, especially for jobs like GPU fine-tuning.

A second approach is AWS Step Functions. Step Functions is also a serverless technology, so it's managed for you. When an event comes in, whether that's an API call or a trigger on data arriving somewhere, you can pipe together multiple Lambda functions. Each Lambda function takes a payload, processes whatever is inside that payload, and passes it along, and at the very end of the pipeline you get your results. What's really nice about Step Functions is the orchestration built in between the steps: you can see the logs, you can see how long each step took, and you don't have to build that orchestration ecosystem yourself. It's built for you; you just have to put the logic together.
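
The Batch flow described above (trigger, new job, job queue, containers) can be sketched with boto3. This is a minimal sketch, not code from the course; the job queue name, job definition, and model name are hypothetical placeholders you would replace with your own resources.

```python
# Minimal sketch: submitting a recurring GPU fine-tuning job to an AWS Batch
# job queue. The queue, job definition, and model names are hypothetical.
import boto3

batch = boto3.client("batch")

def submit_finetune_job(model_name: str) -> str:
    """Create a job and place it on the job queue; AWS Batch launches the
    GPU containers and works through the queue for you."""
    response = batch.submit_job(
        jobName=f"finetune-{model_name}".replace("/", "-"),
        jobQueue="gpu-finetune-queue",          # hypothetical job queue
        jobDefinition="hf-finetune:1",          # hypothetical job definition
        containerOverrides={
            "command": ["python", "train.py", "--model", model_name],
            "resourceRequirements": [
                {"type": "GPU", "value": "1"},  # one GPU per container
            ],
        },
    )
    return response["jobId"]

if __name__ == "__main__":
    # A scheduled trigger (for example, an EventBridge rule on a timer)
    # could call this at whatever interval your organization needs.
    job_id = submit_finetune_job("distilbert-base-uncased")
    print("Submitted Batch job:", job_id)
```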

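The Step Functions pipeline, an event payload flowing through several Lambda functions with orchestration, timing, and logs handled for you, can be sketched in a similar way. The state machine name, Lambda ARNs, role ARN, and input below are hypothetical placeholders; this is a sketch under those assumptions, not the course's implementation.

```python
# Minimal sketch: a Step Functions state machine that pipes a payload through
# two Lambda functions in sequence. All names and ARNs are hypothetical.
import json
import boto3

sfn = boto3.client("stepfunctions")

# Amazon States Language definition: each Task state invokes a Lambda
# function, and the output of one state becomes the input of the next.
definition = {
    "Comment": "Ingest then transform a payload",
    "StartAt": "IngestPayload",
    "States": {
        "IngestPayload": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ingest-payload",
            "Next": "TransformPayload",
        },
        "TransformPayload": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:transform-payload",
            "End": True,
        },
    },
}

# Create the state machine once; Step Functions records logs and timing for
# every step, so there is no orchestration code to build or maintain.
state_machine = sfn.create_state_machine(
    name="data-ingestion-pipeline",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/stepfunctions-execution-role",
)

# Each incoming event (an API call, data arriving somewhere) starts an
# execution with its payload as the input to the first Lambda function.
execution = sfn.start_execution(
    stateMachineArn=state_machine["stateMachineArn"],
    input=json.dumps({"source": "s3://example-bucket/raw/events.json"}),
)
print("Started execution:", execution["executionArn"])
```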