trainML Inference Jobs allow you to run new data through trained models and deliver the results back without having to manage, scale up, or scale down server clusters.
Inference activity is usually initiated by customers rather than your team, making the balance between utilization and contention of GPU-enabled servers even more difficult to manage. Over-provisioning to meet peak demand is very costly and leads to waste, while under-provisioning leads to long wait times and a poor customer experience.
trainML Inference Jobs allow you to service any number of inference tasks in parallel without the cost of peak load provisioning. Inference jobs have the same cost per execution hour whether you're running one or one hundred at a time. Like Training Jobs, they automatically stop when the inference task finishes so you don't have to worry about instance management.
Most of the data that needs inference applied to it isn't generated by the GPU-enabled server itself. As such, an important step in the inference process is getting the data to the GPU-enabled server. Running inference directly on object storage or network file systems can introduce severe performance bottlenecks, wasting expensive GPU time.
With trainML inference jobs, all you have to provide is the file path of the input data and the file path where you want the inference results saved. trainML does the rest. As part of the inference job provisioning process, the input data will automatically be copied to the high-performance local NVMe SSD storage on the GPU-enabled server. Once inference is complete, the outputs will automatically be copied to the destination you specified. The input data will automatically be purged from the trainML platform, so you don't have to worry about additional storage charges or create persistent datasets for one-time-use data.
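The lifecycle described above (copy in, run, copy out, purge) can be pictured with a small local sketch. This is only a conceptual illustration, not trainML's implementation: a temporary directory stands in for the local NVMe storage, and `run_staged` and `infer` are hypothetical names introduced for this example.

```python
import shutil
import tempfile
from pathlib import Path


def run_staged(input_dir, output_dir, infer):
    """Stage inputs to scratch space, run `infer`, then deliver outputs.

    Conceptual only: a temporary directory stands in for local NVMe
    storage, and `infer` is a hypothetical callable taking
    (staged_input_dir, staged_output_dir).
    """
    with tempfile.TemporaryDirectory() as scratch:
        staged_in = Path(scratch) / "input"
        staged_out = Path(scratch) / "output"
        # 1. Copy the input data onto fast local storage.
        shutil.copytree(input_dir, staged_in)
        staged_out.mkdir()
        # 2. Run inference against the local copy only.
        infer(staged_in, staged_out)
        # 3. Copy the results to the requested destination.
        shutil.copytree(staged_out, output_dir, dirs_exist_ok=True)
    # 4. Leaving the `with` block deletes the scratch copy of the input.
```

The original input location is never modified; only the staged copy is removed once the results have been delivered.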
Inference can only be considered production-ready when it's fully automated. If engineers have to manually wrangle models and data or provision instances, the consumers of the model's output will eventually be disappointed by delays or errors.
With the trainML Python SDK and CLI, you can programmatically invoke an inference job as soon as new data arrives. Whether you're using Lambda triggers on an S3 bucket, a Linux cron job or file listener, or a sophisticated workflow engine like Airflow, spawning trainML inference jobs can be done with just a few lines of code.
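As a rough illustration of the Lambda-on-S3 case, the sketch below turns an S3 event notification into an input and output location for a job. It deliberately stops short of calling the trainML SDK: `submit_job` is a hypothetical callable standing in for whatever SDK or CLI call you use, and the `results/` prefix and the `input_uri`/`output_uri` field names are assumptions made for this example rather than the SDK's schema.

```python
from urllib.parse import unquote_plus


def build_job_spec(event, output_prefix="results/"):
    """Derive input/output locations from an S3 event notification.

    The field names in the returned dict are illustrative, not the
    trainML SDK's schema.
    """
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    # Object keys in S3 event notifications are URL-encoded.
    key = unquote_plus(record["s3"]["object"]["key"])
    return {
        "input_uri": f"s3://{bucket}/{key}",
        "output_uri": f"s3://{bucket}/{output_prefix}{key}",
    }


def handler(event, context, submit_job=print):
    """Lambda-style entry point; `submit_job` is a hypothetical hook."""
    spec = build_job_spec(event)
    submit_job(spec)
    return spec
```

If the results land in the same bucket as the inputs, scope the trigger to an input prefix so that newly written outputs do not fire the handler again.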