Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
345 changes: 345 additions & 0 deletions src/launch_manager_daemon/docs/concepts/configuration.md
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

general: please use rst in top ./doc folder

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want an extra rst file in the doc folder, that will includes this md file?
Or to translate this md file into a rst file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep doc in doc folder so doc-as-code works, so this file shall be rst too.

Original file line number Diff line number Diff line change
@@ -0,0 +1,345 @@
# Launch Manager Daemon Configuration Concept

## Version

The version of the configuration schema should be included in every configuration file.
When loading the configuration file at runtime, the launch_manager will check that it matches the supported schema version.

## Component vs deployment configuration

A component (also referred to as a software component) is an independent software unit - such as an application or a software container - that can be developed separately and subsequently deployed into a target system (for example machine). Components are usually combined into a functioning system during the deployment, or integration, phase.
Because a single component may be deployed into multiple systems, the deployment configuration is intentionally separated from the component properties. In summary:
* The **deployment configuration** defines information specific to a particular system setup.
* The **component properties** describe the fundamental characteristics of the software component, which are typically determined during that component’s development phase.

For illustration, please refer to the following example.

```json
"components": {
"setup_filesystem_sh": {
"component_properties": {
"is_native_application": true,
"is_supervised": false,
"alive_supervision": {
"reporting_cycle": 0.5,
"failed_cycles_tolerance": 2,
"min_indications": 1,
"max_indications": 3
},
"is_self_terminating": true
},
"deployment_config": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion it would make sense if - as intermediate solution - we also add here the expectation of alive notifications for the launch manager.

Until we have the new runtime registration mechanism implemented both in the launch_manager and in the health_monitoring_library we need this information from the configuration to keep the current functionality working. Without this information, launch_manager will not know the expected frequency of alive notifications.

Once the runtime registration mechanism is implemented, we can remove it from the configuration again.

"bin_dir" : "/opt/scripts",
"process_arguments": ["-a", "-b"],
"scheduling_policy": "SCHED_OTHER",
"scheduling_priority": "0",
"uid" : 1000,
"gid" : 1000,
"supplementary_group_ids": [500, 600, 700]
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we combine deployment configuration with component description in single JSON file, it's not reusable so that separation is rather only for json handling (this contradicts the above explanations).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. I think the rough workflow that was discussed early January was that the app developer will somehow provide the component-related config + maybe partially the deployment config.

However, since this workflow is currently rather unclear and splitup into multiple files also raised more open points. Therefore, in the last discussion we concluded that as a first step everything shall to into a single file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then maybe even in a single json the object should be separated so later splitting it to multiple files does not change schema really.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pawelrutkaq we were talking internally about this separation as well, but as Nicolas said, a lot of things is unclear.

Just for clarification, are you talking about something like this?

{
    "component_configs": {
        "test_app1": {
            "is_native_application": false,
            "is_supervised": true,
            "is_self_terminating": false,
            "is_state_manager": false
        }
    },
    "deployment_configs": {
        "test_app1": {
            "startup_timeout": 0.5,
            "shutdown_timeout": 0.5,
            "uid" : 1000,
            "gid" : 1000,
            "supplementary_group_ids": [500, 600, 700],
            "scheduling_policy": "SCHED_OTHER",
            "scheduling_priority": "0"
        }
    }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

}
```

## Usage of Defaults

We should have a mechanism to define default values to avoid repetition of common properties.

### Proposal for first version: Default component and default deployment_config definition

In the `defaults` section of file, the user can define default values for component and deployment_config properties.
All keys have the identical names as used for the configuration of a concrete component / deployment_config.

```json
"defaults": {
"deployment_config": {
"bin_dir" : "/opt",
"startup_timeout": 0.5,
"shutdown_timeout": 0.5,
"uid" : 1000,
"gid" : 1000,
"supplementary_group_ids": [500, 600, 700],
"environmental_variables": {
"LD_LIBRARY_PATH": "/opt/lib",
"EMPTY_ENV_VAR": ""
},
"scheduling_policy": "SCHED_OTHER",
"scheduling_priority": "0",
...
},
"component_properties": {
"is_native_application": false,
"is_supervised": true,
"alive_supervision": {
"reporting_cycle": 0.5,
"failed_cycles_tolerance": 2,
"min_indications": 1,
"max_indications": 3
},
"is_self_terminating": false,
"is_state_manager": false,
"depends_on": [],
"process_arguments": [],
},
"run_target": {
"transition_timeout": 2
}
}
```

Any setting not defined as part of a component or deployment_config definition will automatically use the default value.
Overwriting Behavior:
* Primitive values (like `uid`) simply get overwritten if configured
* Dictionaries (like `environmental_variables`) get merged. Keys can be overwritten by configuring the same keys.
* Lists (like `supplementary_group_ids`) get overwritten. In the future we may add further properties like `supplementary_group_ids_extend` to add to the default list instead of overwriting - if this use case gets necessary.

### Proposal for future extension: Defining different component and deployment_config templates

As per our discussion there may be a use case to have different kinds of components and different kinds of deployment configurations with their own defaults.
The approach above can be extended in this direction, by defining multiple deployment configurations and multiple components in the "defaults" section.
Component and deployment config definition could then refer to one of the predefined alternatives.

## Dependencies between components

Each dependency is configured as a json object, as we foresee that further properties might be added to the dependency in the future (i.e. support for kinds of dependencies).

Initially (when mapping the new configuration to the existing software), the component states are identical to the process state:
* `Running`: The process finished initialization (or is a self-terminating processes)
* `Terminated`: The process terminated

```json
"components": {
"test_app2": {
"component_properties": {
"depends_on": {
"test_app4": {
"required_state": "Running"
}
}
}
}
}
```

## Supported component properties

Currently following component properties are supported:
* `binary_name`: Relative path, describing location of the executable file inside `bin_dir` folder. Full path to the executable will be equal to `{bin_dir}/{binary_name}`. Valid examples of `binary_name` include, but are not limited to:
* `test_app1`
* `bin/test_app1`
* `is_native_application`: Property describing if component is using S-CORE platform functionality (true / false).
* `is_supervised`: Property describing if component sends alive notifications to Launch Manager (true / false).
* `alive_supervision`: Description of alive monitoring, that should be applied to this component.
* `reporting_cycle`: Property describing the duration of time period, used for evaluation of alive supervision.
* `failed_cycles_tolerance`: Property describing number of reporting cycles (see `reporting_cycle`) that could fail in a row, before recovery action will be triggered.
* `min_indications`: Property describing minimal number of checkpoints, that shall be reported in a given `reporting_cycle`.
* `max_indications`: Property describing maximal number of checkpoints, that can be reported in a given `reporting_cycle`.
* `is_self_terminating`: Property describing if component intends to terminate on its own after startup. Please note that daemons usually stay running and waits for termination request from Launch Manager (true / false).
* `is_state_manager`: Property describing if component needs access to the API that changes current Run Target (true / false).
* `depends_on`: List of components on which this component depends.
* `process_arguments`: List of arguments, also know as command line arguments, that should be passed to this component.

## Supported deployment configuration elements

Currently following configuration elements are supported:
* `startup_timeout`: Period of time in which, `running` notification should be reported from the component to the Launch Manager. This timeout is measured from the moment the child process is created, till notification is received by LM.
* `shutdown_timeout`: Period of time in which component should terminate, after receiving SIGTERM signal from Launch Manager. This timeout is measured from the moment LM sends SIGTERM, till Operating System (OS) notifies LM that child process finished execution.
* `uid`: POSIX User ID assigned to this component.
* `gid`: POSIX Group ID assigned to this component.
* `supplementary_group_ids`: List of supplementary group IDs, that are assigned to this component.
* `security_policy`: Name of the security policy assigned to this component.
* `environmental_variables`: List of environmental variables that should be passed to this component.
* `scheduling_policy`: Scheduling policy that should be applied to the first thread of this component.
* `scheduling_priority`: Scheduling priority that should be applied to the first thread of this component.
* `bin_dir`: Path to a folder where component has bin installed.
* `working_directory`: Working directory that should be set for this component.
* `restarts_during_startup`: If component does not send `running` notification in time, see `startup_timeout` configuration for more details, Launch Manager will consider this a startup failure. This setting specify how many more times LM should try to start this component, before giving up and considering Run Target transition to fail. If `restarts_during_startup` is set to 0 (zero), no restart attempts will be performed and Run Target transition will fail immediately.
* `resource_limits`: Resource limitations that should be applied to this component.
* `memory_usage`: Maximum memory (in bytes) that can be consumed by this component during runtime.

## Assignment to RunTargets

For every dependency, the required state of the component is implicitly "Running".

```json
"run_targets": {
"run_target_01": {
"description": "Example description of a Run Target number 01.",
"includes": {
"components": ["test_app1"]
}
}
...
}
```

## Timeout Values

All time periods will be configured in seconds. If a smaller time period is needed, then a fraction numbers should be used. For example we should use `0.001` for one millisecond timeout value.

## Configuration Example

![Example Graph](../images/example_graph.png)

```json
{
"schema_version": 1,
"defaults": {
"deployment_config": {
"startup_timeout": 0.5,
"shutdown_timeout": 0.5,
"uid" : 1000,
"gid" : 1000,
"supplementary_group_ids": [500, 600, 700],
"security_policy": "",
"environmental_variables": {
"LD_LIBRARY_PATH": "/opt/lib",
"GLOBAL_ENV_VAR": "abc",
"EMPTY_GLOBAL_ENV_VAR": ""
},
"scheduling_policy": "SCHED_OTHER",
"scheduling_priority": "0",
"bin_dir": "/opt",
"working_directory": "/tmp",
"restarts_during_startup": 2,
"resource_limits": {}
},
"component_properties": {
"binary_name": "",
"is_native_application": false,
"is_supervised": true,
"alive_supervision": {
"reporting_cycle": 0.5,
"failed_cycles_tolerance": 2,
"min_indications": 1,
"max_indications": 3
},
"is_self_terminating": false,
"is_state_manager": false,
"depends_on": [],
"process_arguments": []
},
"run_target": {
"transition_timeout": 2
}
},
"components": {
"setup_filesystem_sh": {
"component_properties": {
"binary_name": "bin/setup_filesystem.sh",
"is_native_application": true,
"is_supervised": false,
"is_self_terminating": true,
"process_arguments": ["-a", "-b"]
},
"deployment_config": {
"bin_dir": "/opt/scripts"
}
},
"dlt-daemon": {
"component_properties": {
"binary_name": "dltd",
"is_native_application": true,
"is_supervised": false,
"depends_on": {
"setup_filesystem_sh": {
"required_state": "Terminated"
}
}
},
"deployment_config": {
"bin_dir" : "/opt/apps/dlt-daemon"
}
},
"someip-daemon": {
"component_properties": {
"binary_name": "someipd"
},
"deployment_config": {
"bin_dir" : "/opt/apps/someip"
}
},
"test_app1": {
"component_properties": {
"binary_name": "test_app1",
"depends_on": {
"dlt-daemon": {
"required_state": "Running"
},
"someip-daemon": {
"required_state": "Running"
}
}
},
"deployment_config": {
"bin_dir" : "/opt/apps/test_app1"
}
},
"state_manager": {
"component_properties": {
"binary_name": "sm",
"is_state_manager": true,
"depends_on": {
"setup_filesystem_sh": {
"required_state": "Terminated"
}
}
},
"deployment_config": {
"bin_dir" : "/opt/apps/state_manager"
}
}
},
"run_targets": {
"Minimal": {
"description": "Minimal functionality of the system",
"includes": {
"components": ["state_manager"]
}
},
"Full": {
"description": "Everything running",
"includes": {
"components": ["test_app1"],
"run_targets": ["Minimal"]
},
"transition_timeout": 5
},
"Off": {
"description": "Nothing is running"
},
"initial_run_target": "Minimal"
},
"health_monitoring" : {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no health_monitoring at LaunchDaemon so shall be removed. is you meant here app supervision, tthe cycle comes via IPC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This setting is about the evaluation of alive notifications in the launch manager daemon.
Currently, the evaluation is performed in a cycle. E.g. every 50ms launch_manager wakes up and is evaluating the checkpoints that have been received in the last 50ms. It would make sense to have this configurable. Depending on the system and the required detection time of failures, this number might be small (maybe 10ms) or big (maybe 100ms).

Otherwise, if the launch_manager would wake up on every received alive notification and do the evaluation immediately - the performance would be too bad if many applications report alive notifications

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I get it. But first of all, change the name - highly confusing with other components. Second of all, you mean you use single thread cycled by this value to check alive notifications presence for all processes ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current implementation does use a single thread. I guess that could be extended in the future if this is required.

Do you have a suggestion for another name? Both the library and the daemon are doing some kind of health monitoring. But yeah, having the same name is confusing at the moment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we said once the api betwen daemon and users app is SupervisorAPI, maybe it shall be supervisor_cycle_time (app_supervision_cycle_time) or something in this manner ?

"evaluation_cycle": 0.5,
"watchdogs": {
"simple_watchdog": {
"device_file_path": "/dev/watchdog",
"max_timeout": 2,
"deactivate_on_shutdown": true,
"require_magic_close": false
}
}
}
}
```

## Open Topics

### RecoveryAction

In the initial version (mapping to the config to the existing software) the only supported RecoveryAction will be to switch to a predefined recovery RunTarget in case of any error.

How to define the SwitchRunTarget RecoveryAction without entangling the component definition with the RunTarget definition?

### Execution dependencies: Component State vs. Process State

For the initial version, it will be easiest to have component state identical to process state as this is already supported by the existing code.

### How to configure the initial RunTarget

Should we reserve a name for the initial RunTarget or make this configurable (see `initial_run_target`)?

### Defining json schema

After agreement on the rough structure, we will define a json schema depicting all possible configuration options.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading