Reference
ToRC-Scheduler
The ToRC-Scheduler is responsible for managing all of the core services / containers. The scheduler is a Mesos framework and interacts closely with the Mesos master.
Please refer to the different reference sections:
Command-Line Arguments
To get a list of all possible arguments, type:
./torc-scheduler --help
Arguments:
Argument | Explanation | Required |
---|---|---|
--master | IP Address of Mesos Master | yes |
--config | Configuration File to be used | yes |
--myip | IP Address of Scheduler in case IP is different from Mesos Master IP | no |
Command line used in our demo setup:
./torc-scheduler --master 10.250.3.20 --config /config.yml
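If the scheduler's IP address differs from the Mesos master IP, the optional --myip argument can be added. A sketch of such an invocation, assuming the scheduler runs on 10.250.3.25 (an illustrative address, not part of the demo setup):
./torc-scheduler --master 10.250.3.20 --myip 10.250.3.25 --config /config.yml   # 10.250.3.25 is illustrative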
Configuration
The ToRC Scheduler is highly configurable through its configuration file. The configuration is formatted as a YAML file.
The configuration files used in our demo setup can be found in the torc-scripts project.
The file contains the following sub-structures:
name:
Key | Value |
---|---|
name | The name of this scheduler / Mesos Framework |
Example:
name: torc-scheduler
nodes:
Configuration of all the nodes managed by this scheduler.
Key | Value |
---|---|
name | The name of the node |
ip | Main IP address of the node. The variable $MASTER_IP can be used as a placeholder for the control/master node. |
external_ip | External IP address of the node. This address is used as the endpoint for external routes |
type | Node type can be either master or slave |
Example:
nodes:
- name: wedge
ip: $MASTER_IP
external_ip: $MASTER_IP
type: master
- name: bladerunner1
ip: 10.250.3.21
external_ip: 10.250.3.21
type: slave
dns-addons:
Additional DNS entries can be added to the service.torc domain.
Key | Value |
---|---|
name | The name of the DNS entry |
ip | The IP for this DNS entry. The variable $MASTER_IP can be used as a placeholder for entries that are running on the control/master node |
Example:
dns-addons:
- name: etcd
ip: $MASTER_IP
- name: network-agent
ip: $MASTER_IP
network-agent:
Specifies which network-agent is used to communicate with the switching silicon (Broadcom Trident II).
Key | Value |
---|---|
type | Either snaproute or fboss |
connection | The connection argument for the API of the network-agent, in the form ip:port. By default the controller / master runs on the same node as the network-agent, therefore $MASTER_IP can be used |
Example:
network-agent:
type: snaproute
connection: $MASTER_IP:8080
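An fboss-based setup is configured analogously; a minimal sketch, assuming the fboss agent also runs on the master node and listens on port 8080 (the port is taken from the snaproute example above and may differ):
network-agent:
  type: fboss
  connection: $MASTER_IP:8080   # port is an assumption; use the port of your fboss API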
statesync:
Interval at which the scheduler checks on the state of the managed services, updates their “last_update” timestamp, and if necessary adds DNS entries in Consul and routes in the network-agent.
Key | Value |
---|---|
poll_interval_in_seconds | Interval in seconds |
Example:
statesync:
poll_interval_in_seconds: 10
stateclean:
The garbage collector in the scheduler periodically checks the internal state for service entries whose “last_update” timestamp is older than the timeout setting, and for those services it deletes their DNS entries and routes.
If the services are of type “system services” (see below), it will initiate a restart after a configured delay.
Key | Value |
---|---|
poll_interval_in_seconds | Interval in seconds for the cleanup thread |
timeout_in_seconds | Timeout interval after which a service gets marked as down |
restart_delay_in_seconds | Delay between the time a service got marked as “timed-out” and a restart gets initiated |
Example:
stateclean:
poll_interval_in_seconds: 18
timeout_in_seconds: 30
restart_delay_in_seconds: 30
services:
The services managed by the scheduler are currently all deployed using Docker containers.
There are two types of services managed by the scheduler. System Services and On-Demand Services.
System Services are the core ToRC services. The scheduler guarantees that they are always up and running; if one fails, the scheduler will try to restart it. System Services are defined in the healthcheck:system_services: section of the config file.
On-Demand Services are services that can be started on demand via the scheduler API. On-Demand Services can be arranged in groups and started as a group. On-Demand Services are defined in the api:service-groups: section.
Placement of a service can be constrained based on node-specific attributes like node_name, node_type, or node_function. Those attributes can be found on each node in /etc/mesos-slave/attributes.
Placement of a service can also be constrained based on resource requirements like memory size or CPU allocation.
Service Definition:
Key | Value | Required |
---|---|---|
name | Name of the service. Displayed in Mesos as task-name | Yes |
image_name | Name of the Docker image | Yes |
node_name | Container will only be placed on node with the specified name | No |
node_type | Container will only be placed on nodes with matching type | No |
node_function | Container will only be placed on nodes with matching function | No |
memory | Container will only be placed on a node with that amount of free memory | No |
cpu | Container will only be placed on a node that can satisfy the specified cpu allocation | No |
arguments | Arguments passed to the container | No |
parameters | Additional parameters passed to Docker as part of the run command | No |
privileged | If the container needs to run in privileged mode | No |
sla | Two options are currently supported. “singleton_each_node” tells the scheduler to start exactly one instance of this service on each node, including the controller. “singleton_each_slave” starts exactly one instance of this service on each slave. | No |
is_metered | Metrics of this container will be collected and stored in the time series database | No |
network_type | Type of network adapter to use. “host” specifies that this container will run on the host adapter, typical for system services. “torc” stands for “torc managed” L3 routable network adapter, typical for use-case specific containers. | |
volumes | Lets you define host directories which get mounted inside of a container. See below for required fields for volumes | No |
Example:
The DNS container has to run on the controller node with the host adapter and expects the bind argument to be set.
- name: dns
image_name: dns
arguments: -bind=$MASTER_IP
node_function: controller
network_type: host
InfluxDB runs on bladerunner3, needs 1GB of memory, and listens on the host adapter.
- name: influxdb
image_name: influxdb
memory: 1024.0
node_name: bladerunner3
network_type: host
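As a further sketch, resource and node-type constraints can be combined in one service definition; the service name, image, and values below are illustrative and not part of the demo configuration:
- name: analytics            # hypothetical service name
  image_name: analytics      # hypothetical Docker image
  node_type: slave           # place only on nodes of type slave
  memory: 512.0              # requires 512 MB of free memory
  cpu: 0.5                   # requires 0.5 CPU to be available
  is_metered: true           # collect metrics into the time series database
  network_type: torc         # torc-managed L3 routable network adapter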
volumes:
Volumes define the mount points for Docker containers as part of the service definition.
Key | Value |
---|---|
host_path | Path on the host the container needs access to |
container_path | Mount point inside of the container |
read_only_mode | If set to “true”, the container has read-only access to the host folder; if set to “false”, the container can read and write to the host folder |
Example:
Performance Co-Pilot (PCP) is our metrics collection agent and needs access to host folders to be able to collect the different host and container metrics.
- name: pcp
image_name: attinnovate/charmander-pcp
privileged: true
sla: singleton_each_node
volumes:
- host_path: /sys
container_path: /sys
read_only_mode: true
- host_path: /etc/localtime
container_path: /etc/localtime
read_only_mode: true
- host_path: /var/lib/docker
container_path: /var/lib/docker
read_only_mode: true
- host_path: /run
container_path: /run
read_only_mode: false
- host_path: /var/log
container_path: /var/log
read_only_mode: false
- host_path: /dev/log
container_path: /dev/log
read_only_mode: false
parameters: --ipc=host
network_type: host
healthcheck:
Configuration for the healthcheck thread responsible for supervising and running the system services.
Key | Value |
---|---|
poll_interval_in_seconds | Interval in seconds at which the health check runs and verifies that all the system_services are up and running |
system_services | List of system services; see the Service Definition above |
Example:
healthcheck:
poll_interval_in_seconds: 12
system_services:
- name: dns
image_name: dns
arguments: -bind=$MASTER_IP
node_function: controller
network_type: host
- name: vector
image_name: vector
arguments: -p 9091
node_function: controller
network_type: host
api:service-groups:
List of the groups of On-Demand services. Those services can be started using the REST API (see below).
Key | Value |
---|---|
name | Name of the service group. Referred to by the service_group start request, see below |
services | One or more service definitions that belong to this group |
Example:
api:
service-groups:
- name: torc-dns-scheduler
services:
- name: torc-dns-scheduler
image_name: torc-dns-scheduler
node_name: bladerunner3
memory: 1024.0
arguments: --master $MASTER_IP --config config.yml
network_type: host
REST API
The ToRC Scheduler offers REST APIs on port 3000. The APIs can be used to request service and node information, to start On-Demand services/service_groups, and to kill On-Demand services.
Ping/Pong
Simple connection check. Response should be ‘pong’.
$ curl http://wedge:3000/admin/ping
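A successful check returns pong, for example:
$ curl http://wedge:3000/admin/ping
pong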
List Nodes
List details of all known nodes.
$ curl http://wedge:3000/nodes
List Services
List all services: system services and use-case specific ones managed by sub-schedulers.
$ curl http://wedge:3000/services
List Running Services
List all running services and use-case specific services.
$ curl http://wedge:3000/services/running
List Metered Services
List all services, including use-case specific services, whose metrics are collected and stored in the time series database.
$ curl http://wedge:3000/services/metered
Start Service Group
Starts the service(s) configured as part of the named service group.
$ curl http://wedge:3000/start/group?name=torc-dns-scheduler
Kill Service
Kills specified On-Demand service.
$ curl -X "DELETE" http://wedge:3000/service?name=torc-dns-scheduler