[Feature][Docs][Discussion] Provide consistent guidance on resource Requests and Limits #744
Comments
I don't think we can assume only one Ray cluster runs in a K8s cluster, so Ray clusters of different sizes will fill up the K8s cluster.
I've amended that bullet slightly. It's of course subject to the constraints of the user's Kubernetes environment, but generally speaking, it does not make sense to pack many tiny Ray pods into a single K8s node.
A common setup (and one we've used at Anyscale in the past) is to set up cluster autoscaling such that you get a new K8s node of the appropriate type each time you request a Ray pod. The Ray pod and supporting machinery then fill up the entire K8s node.
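To make that concrete, here is a minimal sketch of a RayCluster worker group sized so that one Ray pod takes up (nearly) an entire 8-CPU / 32Gi K8s node. The group name, image tag, and resource values are illustrative assumptions, not taken from this issue:

```yaml
# Sketch: one Ray worker pod per 8-CPU / 32Gi K8s node (values illustrative).
# Requests are set equal to limits, slightly below node capacity to leave
# headroom for the kubelet and system pods.
workerGroupSpecs:
  - groupName: node-sized-workers   # hypothetical name
    replicas: 1
    template:
      spec:
        containers:
          - name: ray-worker
            image: rayproject/ray:latest
            resources:
              requests:
                cpu: "7"
                memory: 28Gi
              limits:
                cpu: "7"
                memory: 28Gi
```

With a node autoscaler configured for 8-CPU / 32Gi instances, each such pod should trigger a fresh node and fill it.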
Hey @DmitriGekhtman, for number 2:
Is there a way to configure the autoscaler to use STRICT_SPREAD? https://docs.ray.io/en/latest/ray-core/scheduling/placement-group.html?highlight=placement It seems that it is only possible from the SDK and not when configuring the cluster? https://github.com/ray-project/kuberay/blob/3aebd8c9f5ae5d9d9d12489d5636d3cf1b97548e/ray-operator/config/samples/ray-cluster.autoscaler.large.yaml
STRICT_SPREAD helps to spread Ray tasks across different Ray pods. To spread Ray pods across different Kubernetes nodes, I think the thing to look into would be pod anti-affinity.
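For reference, a minimal sketch of what pod anti-affinity could look like in a worker group's pod template; the `ray.io/cluster` label selector and the cluster name `raycluster-example` are assumptions for illustration:

```yaml
# Sketch: require this cluster's Ray pods to land on different K8s nodes via
# pod anti-affinity (label key and cluster name are assumptions).
template:
  spec:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                ray.io/cluster: raycluster-example
            topologyKey: kubernetes.io/hostname
```

With the `required...` variant, pods that cannot be spread stay pending; `preferredDuringSchedulingIgnoredDuringExecution` is the softer alternative.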
I noticed that Ray node CPU capacity must be an integer. Why is that?
Mostly because the Ray core APIs originally targeted running Ray nodes directly on VMs and bare metal, rather than running Ray nodes as Kubernetes pods. It would require substantive changes to allow Ray node CPU capacity to be non-integer. On the other hand, with Ray custom resources, you can express whatever resource accounting semantics you want.
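As a sketch of that last point, custom resources can be advertised through `rayStartParams` in a KubeRay manifest; the resource name `accelerator_units` and its quantity are made up for illustration:

```yaml
# Sketch: advertise a custom Ray resource on a worker group. The quoted JSON
# string is passed through to `ray start --resources`; "accelerator_units"
# is a hypothetical resource name.
workerGroupSpecs:
  - groupName: custom-resource-workers   # hypothetical name
    rayStartParams:
      resources: '"{\"accelerator_units\": 4}"'
```

Tasks and actors can then request arbitrary (including fractional) amounts of that resource, e.g. `@ray.remote(resources={"accelerator_units": 0.5})`.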
Search before asking
Description
Documentation on resource requests and limits should be made clearer and more prominent, perhaps repeated in several places in the docs.
Best practices are laid out here:
https://discuss.ray.io/t/questions-for-configurations-using-helm-chart/7762?u=dmitri
https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/config.html#resources
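As a quick illustration of the guidance in those links, here is a hedged sketch of container resources with requests set equal to limits and a whole number of CPUs; the values are arbitrary examples, not recommendations from this issue:

```yaml
# Sketch: requests equal to limits, integer CPU count (illustrative values).
resources:
  requests:
    cpu: "4"
    memory: 16Gi
  limits:
    cpu: "4"
    memory: 16Gi
```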
Subtleties:
We currently make several recommendations in the docs linked above; the comments on this issue discuss some issues with these recommendations.
Use case
Less user confusion when figuring out resources for Ray on K8s.
Related issues
No response
Are you willing to submit a PR?