trainML Models enable you to store an immutable version of model code and its artifacts for reuse in other jobs. Models can be populated by saving notebooks, running training jobs, or even copying from external sources.
Reliable production inference is only possible when you can trust the model version that is running. When model code and parameters are stored on servers, network file systems, or cloud object storage, knowing which model version is in production at any given time is a challenge. In these setups, reverting to a previous model version is even more difficult and error-prone.
With trainML models, you can be sure that every inference task using a given model always uses the exact same code and parameters. A model is effectively a snapshot of the model code, the saved parameters, and any other artifacts produced by the training process that the inference process needs. Once created, a model can never be changed; only new versions can be created.
As you publish new model versions, enabling one in production is as simple as starting the next job with the new model ID. Reverting is just as simple: start the next job with the ID of the previous known-good model.
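To make this versioning workflow concrete, here is a toy, in-memory Python sketch. It is not the trainML SDK: `ModelStore`, `ModelVersion`, and `start_inference_job` are hypothetical names, and random UUIDs stand in for whatever ID scheme trainML actually uses. It illustrates the properties described above: a stored version cannot be edited, each job is pinned to one model ID, and promoting or reverting comes down to which ID you pass to the next job.

```python
import uuid
from dataclasses import dataclass
from types import MappingProxyType
from typing import Dict


@dataclass(frozen=True)
class ModelVersion:
    """Hypothetical immutable snapshot: model code, parameters, artifacts."""
    model_id: str
    files: MappingProxyType  # read-only view of filename -> file contents


class ModelStore:
    """Toy in-memory registry illustrating immutable versions (NOT the trainML API)."""

    def __init__(self) -> None:
        self._versions: Dict[str, ModelVersion] = {}

    def create(self, files: Dict[str, bytes]) -> str:
        # Copy the input so later edits to the caller's dict cannot leak in,
        # then expose the snapshot only through a read-only mapping.
        model_id = uuid.uuid4().hex
        self._versions[model_id] = ModelVersion(model_id, MappingProxyType(dict(files)))
        return model_id

    def get(self, model_id: str) -> ModelVersion:
        return self._versions[model_id]


def start_inference_job(store: ModelStore, model_id: str) -> dict:
    """Hypothetical job launcher: every job references exactly one model ID."""
    model = store.get(model_id)
    return {"model_id": model.model_id, "files": sorted(model.files)}


store = ModelStore()
v1 = store.create({"predict.py": b"v1 code", "weights.bin": b"\x00"})
# "Modifying" a model means creating a new version from the old one's files.
v2 = store.create({**store.get(v1).files, "weights.bin": b"\x01"})

job = start_inference_job(store, v2)  # promote the new version
job = start_inference_job(store, v1)  # roll back by reusing the known-good ID
```

Because jobs only ever reference an ID, rolling back in this sketch requires no copying or redeployment: the earlier snapshot is still stored, unchanged.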
Running GPU-enabled servers is a lot of work. Syncing data, model code, and artifacts across many of them is even more work. This leads many companies to run training and inference on the same systems. Besides the model versioning issues described above, this setup adds a training/inference resource trade-off to an already challenging GPU utilization and contention problem.
Using trainML models, the resources for training are completely decoupled from the resources for inference. You can train a model on a single 4-GPU instance and then run thousands of 1-GPU inference jobs on that trained model without any additional setup. There is no limit to the number of simultaneous jobs using the same model.
Many model management and ML pipeline tools exist, and several of them add significant value in certain areas. Often, however, this comes at the cost of flexibility. Many of these tools are "opinionated" about how your code must be written, which interfaces and methods must be implemented, and how input and output data must be formatted and specified. If you choose different frameworks for different parts of the pipeline, their conflicting paradigms can add significant overhead when translating data and business logic between them.
trainML Models do not impose such requirements. Use any framework you want for any portion of your process. All trainML requires is the command to run the model. Models can be populated from a variety of external sources, including your local computer, and training job results can be saved automatically as a trainML model.
Even though trainML models are immutable, making adjustments to models is also very easy. Just start a Notebook with the model you wish to modify. Once it starts, edit the code as desired and save the notebook as a new model. Once you are satisfied, use the new model version for future inference jobs.