Introduction
Ray is a Python framework for the parallel and distributed execution of Python applications. It is primarily designed for machine learning and AI workloads, but it can be used in any Python application that requires distributed computing.
In the previous blog post, Ray Cluster, we saw how to deploy a Ray cluster on a single machine with Docker (Docker Compose) and briefly covered the main libraries of the framework: Ray Data, Ray Train, Ray Tune, and Ray Serve. We also showed how the latter, Ray Serve, can be used to serve a Hugging Face model that translates text from English to French.
In this post, we will see how we deployed a Ray cluster on our production machines and how we have started using it to train and serve the machine learning models our products need.
Cluster Nodes

A Ray cluster consists of a head node and one or more worker nodes.
The head node is the master node of the cluster: it coordinates the other nodes and serves the web monitoring interface (the Ray dashboard).
The worker nodes are responsible for executing the Python tasks submitted to the cluster.
The head node also acts as a worker node, but it is distinguished by running the singleton processes that manage the cluster:
- GCS: Global Control Service, which centralizes the cluster’s metadata, manages the associated nodes, and the directory of running processes.
- Autoscaler: This is the process that reacts when tasks are launched and checks whether the resources they demand exceed the cluster’s current capacity. It adds worker nodes if they do, and stops nodes that have been idle for the configured amount of time.
- Raylet: Unlike the two above, this process runs on every node (head and workers) and is responsible for executing tasks and managing the objects and actors on that node.
Deploying the Taniwa Ray Cluster (On-Premise)
Configuration
Ray clusters can be deployed on almost any infrastructure: on the AWS, GCP, or Azure clouds, on Kubernetes, On-Premise, etc.
We have deployed a Ray cluster on two of our On-Premise machines using the framework’s cluster launcher.
The cluster launcher is a set of commands (`ray up`, `ray down`, etc.) that interpret a cluster configuration defined in a YAML file, which specifies the number of nodes, the machine type, the number of CPUs and GPUs, and so on.
In the On-Premise case, the configuration file template can be downloaded from here: On-Premise Template
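As a hedged sketch (not our actual file), an On-Premise configuration based on that template might look like the following, with placeholder IPs and SSH user:

```yaml
cluster_name: taniwa

# On-Premise: the launcher connects to fixed machines over SSH
# instead of provisioning cloud instances.
provider:
  type: local
  head_ip: 192.168.0.10      # placeholder IP
  worker_ips:
    - 192.168.0.11           # placeholder IP

auth:
  ssh_user: ray              # placeholder user
  ssh_private_key: ~/.ssh/id_rsa

# With a fixed set of machines, min and max workers usually match.
min_workers: 1
max_workers: 1
```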
Let’s explain the main parts of the file in our cluster (ray_cluster.yaml):

