34 changes: 17 additions & 17 deletions arch.rst
@@ -187,10 +187,10 @@ cluster built out of bare-metal components, each of the SD-Core CP
subsystems shown in :numref:`Figure %s <fig-aether>` is actually
deployed in a logical Kubernetes cluster on a commodity cloud. The
same is true for AMP. Aether’s centralized components are able to run
in Google Cloud Platform, Microsoft Azure, and Amazon’s AWS. They also
in Google Cloud Platform, Microsoft Azure, and Amazon’s AWS. They can also
run as an emulated cluster implemented by a system like
KIND—Kubernetes in Docker—making it possible for developers to run
these components on their laptop.
these components on their laptops.

To be clear, Kubernetes adopts generic terminology, such as “cluster”
and “service”, and gives it a very specific meaning. In
@@ -239,8 +239,8 @@ There is a potential third stakeholder of note—third-party service
providers—which points to the larger issue of how we deploy and manage
additional edge applications. To keep the discussion tangible—but
remaining in the open source arena—we use OpenVINO as an illustrative
example. OpenVINO is a framework for deploying AI inference models,
which is interesting in the context of Aether because one of its use
example. OpenVINO is a framework for deploying AI inference models.
It is interesting in the context of Aether because one of its use
cases is processing video streams, for example to detect and count
people who enter the field of view of a collection of 5G-connected
cameras.
@@ -274,11 +274,11 @@ but for completeness, we take note of two other possibilities. One is
that we extend our hybrid architecture to support independent
third-party service providers. Each new edge service acquires its own
isolated Kubernetes cluster from the edge cloud, and then the
3rd-party provider subsumes all responsibility for managing the
3rd-party provider takes over all responsibility for managing the
service running in that cluster. From the perspective of the cloud
operator, though, the task just became significantly more difficult
because the architecture would need to support Kubernetes as a managed
service, which is sometimes called *Container-as-a-Service (CaaS)*.\ [#]_
service, which is sometimes called *Containers-as-a-Service (CaaS)*.\ [#]_
Creating isolated Kubernetes clusters on-demand is a step further than
we take things in this book, in part because there is a second
possible answer that seems more likely to happen.
@@ -355,7 +355,7 @@ Internally, each of these subsystems is implemented as a highly
available cloud service, running as a collection of microservices. The
design is cloud-agnostic, so AMP can be deployed in a public cloud
(e.g., Google Cloud, AWS, Azure), an operator-owned Telco cloud, (e.g,
AT&T’s AIC), or an enterprise-owned private cloud. For the current pilot
AT&T’s AIC), or an enterprise-owned private cloud. For the pilot
deployment of Aether, AMP runs in the Google Cloud.

The rest of this section introduces these four subsystems, with the
@@ -485,9 +485,9 @@ Given this mediation role, Runtime Control provides mechanisms to
model (represent) the abstract services to be offered to users; store
any configuration and control state associated with those models;
apply that state to the underlying components, ensuring they remain in
sync with the operator’s intentions; and authorize the set API calls
users try to invoke on each service. These details are spelled out in
Chapter 5.
sync with the operator’s intentions; and authorize the set of API
calls that users try to invoke on each service. These details are
spelled out in Chapter 5.


2.4.4 Monitoring and Telemetry
@@ -526,13 +526,13 @@ diagnostics and analytics.

This overview of the management architecture could lead one to
conclude that these four subsystems were architected, in a rigorous,
top-down fashion, to be completely independent. But that is not
the case. It is more accurate to say that the system evolved bottom
up, solving the next immediate problem one at a time, all the while
top-down fashion, to be completely independent. But that is not the
case. It is more accurate to say that the system evolved bottom up,
solving the next immediate problem one at a time, all the while
creating a large ecosystem of open source components that can be used
in different combinations. What we are presenting in this book is a
retrospective description of an end result, organized into four
subsystems to help make sense of it all.
in different combinations. What this book presents is a retrospective
description of the end result, organized into four subsystems to help
make sense of it all.

There are, in practice, many opportunities for interactions among the
four components, and in some cases, there are overlapping concerns
@@ -686,7 +686,7 @@ own. The Control and Management Platform now has its own DevOps
team(s), who in addition to continually improving the platform, also
field operational events, and when necessary, interact with other
teams (e.g., the SD-RAN team in Aether) to resolve issues that come
up. They are sometimes called System Reliability Engineers (SREs), and
up. They are sometimes called Site Reliability Engineers (SREs), and
in addition to being responsible for the Control and Management
Platform, they enforce operational discipline—the third aspect of
DevOps discussed next—on everyone else.
78 changes: 40 additions & 38 deletions authors.rst
@@ -6,47 +6,49 @@ Science, Emeritus at Princeton University, where he served as Chair
from 2003-2009. His research focuses on the design, implementation,
and operation of Internet-scale distributed systems, including the
widely used PlanetLab and MeasurementLab platforms. He is currently
contributing to the Aether access-edge cloud project at the Open
Networking Foundation (ONF), where he serves as Chief Scientist.
Peterson is a member of the National Academy of Engineering, a Fellow
of the ACM and the IEEE, the 2010 recipient of the IEEE Kobayashi
Computer and Communication Award, and the 2013 recipient of the ACM
SIGCOMM Award. He received his Ph.D. degree from Purdue University.
contributing to the Aether access-edge cloud project at the Linux
Foundation. Peterson is a member of the National Academy of
Engineering, a Fellow of the ACM and the IEEE, the 2010 recipient of
the IEEE Kobayashi Computer and Communication Award, and the 2013
recipient of the ACM SIGCOMM Award. He received his Ph.D. degree from
Purdue University.

**Scott Baker** is a Cloud Software Architect at Intel, which he
joined as part of Intel's acquisition of the Open Networking
Foundation (ONF) engineering team. While at ONF, he led the Aether
DevOps team. Prior to ONF, he worked on cloud-related research
projects at Princeton and the University of Arizona, including
PlanetLab, GENI, and VICCI. Baker received his Ph.D. in Computer
Science from the University of Arizona in 2005.
**Scott Baker** is a Cloud Software Architect at Intel, where he works
on the Open Edge Platform. Prior to joining Intel, he was on the Open
Networking Foundation (ONF) engineering team that built Aether,
leading the runtime control effort. Baker has also worked on
cloud-related research projects at Princeton and the University of
Arizona, including PlanetLab, GENI, and VICCI. He received his
Ph.D. in Computer Science from the University of Arizona in 2005.

**Andy Bavier** is a Cloud Software Engineer at Intel, which he joined
as part of Intel's acquisition of the Open Networking Foundation (ONF)
engineering team. While at ONF, he worked on the Aether project. Prior
to joining ONF, he was a Research Scientist at Princeton University,
where he worked on the PlanetLab project. Bavier received a BA in
Philosophy from William & Mary in 1990, and MS in Computer Science
from the University of Arizona in 1995, and a PhD in Computer Science
from Princeton University in 2004.
**Andy Bavier** is a Cloud Software Engineer at Intel, where he works
on the Open Edge Platform. Prior to joining Intel, he was on the Open
Networking Foundation (ONF) engineering team that built Aether,
leading the observability effort. Bavier has also been a Research
Scientist at Princeton University, where he worked on the PlanetLab
project. He received a BA in Philosophy from William & Mary in 1990,
and MS in Computer Science from the University of Arizona in 1995, and
a PhD in Computer Science from Princeton University in 2004.

**Zack Williams** is a Cloud Software Engineer at Intel, which he
joined as part of Intel's acquisition of the Open Networking
Foundation (ONF) engineering team. While at ONF, he worked on the
Aether project, and led the Infrastructure team. Prior to joining ONF,
he was a systems programmer at the University of Arizona. Williams
received his BS in Computer Science from the University of Arizona
in 2001.
**Zack Williams** is a Cloud Software Engineer at Intel, where he
works on the Open Edge Platform. Prior to joining Intel, he was on the
Open Networking Foundation (ONF) engineering team that built
Aether, leading the infrastructure provisioning effort. Williams has also
been a systems programmer at the University of Arizona. He received
his BS in Computer Science from the University of Arizona in 2001.

**Bruce Davie** is a computer scientist noted for his contributions to
the field of networking. He is a former VP and CTO for the Asia
Pacific region at VMware. He joined VMware during the acquisition of
Software Defined Networking (SDN) startup Nicira. Prior to that, he
was a Fellow at Cisco Systems, leading a team of architects
responsible for Multiprotocol Label Switching (MPLS). Davie has over
30 years of networking industry experience and has co-authored 17
RFCs. He was recognized as an ACM Fellow in 2009 and chaired ACM
SIGCOMM from 2009 to 2013. He was also a visiting lecturer at the
Massachusetts Institute of Technology for five years. Davie is the
author of multiple books and the holder of more than 40 U.S. Patents.
the field of networking. He began his networking career at Bellcore
where he worked on the Aurora Gigabit testbed and collaborated with
Larry Peterson on high-speed host-network interfaces. He then went to
Cisco where he led a team of architects responsible for Multiprotocol
Label Switching (MPLS). He worked extensively at the IETF on
standardizing MPLS and various quality of service technologies. He
also spent five years as a visiting lecturer at the Massachusetts
Institute of Technology. In 2012 he joined Software Defined Networking
(SDN) startup Nicira and was then a principal engineer at VMware
following the acquisition of Nicira. In 2017 he took on the role of VP
and CTO for the Asia Pacific region at VMware. He is a Fellow of the
ACM and chaired ACM SIGCOMM from 2009 to 2013. Davie is the author of
multiple books and the holder of more than 40 U.S. patents.

42 changes: 21 additions & 21 deletions control.rst
@@ -81,7 +81,7 @@ deployments of 5G, and to that end, defines a *user* to be a principal
that accesses the API or GUI portal with some prescribed level of
privilege. There is not necessarily a one-to-one relationship between
users and Core-defined subscribers, and more importantly, not all
devices have subscribers, as would be the case with IoT devices that
devices have subscribers; a concrete example would be IoT devices that
are not typically associated with a particular person.

5.1 Design Overview
@@ -115,7 +115,7 @@ Central to this role is the requirement that Runtime Control be able
to represent a set of abstract objects, which is to say, it implements
a *data model*. While there are several viable options for the
specification language used to represent the data model, for Runtime
Control we use YANG. This is for three reasons. First, YANG is a rich
Control Aether uses YANG. This is for three reasons. First, YANG is a rich
language for data modeling, with support for strong validation of the
data stored in the models and the ability to define relations between
objects. Second, it is agnostic as to how the data is stored (i.e.,
@@ -155,7 +155,7 @@ that we can build upon.
from (1) a GUI, which is itself typically built using another
framework, such as AngularJS; (2) a CLI; or (3) a closed-loop
control program. There are other differences—for example,
Adapters (a kind of Controller) use gNMI as a standard
Adaptors (a kind of Controller) use gNMI as a standard
interface for controlling backend components, and persistent
state is stored in a key-value store instead of a SQL DB—but the
biggest difference is the use of a declarative rather than an
@@ -168,11 +168,11 @@ x-config, in turn, uses Atomix (a key-value store microservice), to
make configuration state persistent. Because x-config was originally
designed to manage configuration state for devices, it uses gNMI as
its southbound interface to communicate configuration changes to
devices (or in our case, software services). An Adapter has to be
devices (or in our case, software services). An Adaptor has to be
written for any service/device that does not support gNMI
natively. These adapters are shown as part of Runtime Control in
natively. These adaptors are shown as part of Runtime Control in
:numref:`Figure %s <fig-roc>`, but it is equally correct to view each
adapter as part of the backend component, responsible for making that
adaptor as part of the backend component, responsible for making that
component management-ready. Finally, Runtime Control includes a
Workflow Engine that is responsible for executing multi-step
operations on the data model. This happens, for example, when a change
@@ -428,8 +428,8 @@ models are changing due to volatility in the backend systems they
control, then it is often the case that the models can be
distinguished as "low-level" or "high-level", with only the latter
directly visible to clients via the API. In semantic versioning terms,
a change to a low-level model would then effectively be a backwards
compatible PATCH.
a change to a low-level model would then effectively be a
backward-compatible PATCH.


5.2.3 Identity Management
@@ -467,15 +467,15 @@ the case of Aether, Open Policy Agent (OPA) serves this role.
<https://www.openpolicyagent.org/>`__.


5.2.4 Adapters
5.2.4 Adaptors
~~~~~~~~~~~~~~

Not every service or subsystem beneath Runtime Control supports gNMI,
and in the case where it is not supported, an adapter is written to
and in the case where it is not supported, an adaptor is written to
translate between gNMI and the service’s native API. In Aether, for
example, a gNMI :math:`\rightarrow` REST adapter translates between
example, a gNMI :math:`\rightarrow` REST adaptor translates between
the Runtime Control’s southbound gNMI calls and the SD-Core
subsystem’s RESTful northbound interface. The adapter is not
subsystem’s RESTful northbound interface. The adaptor is not
necessarily just a syntactic translator, but may also include its own
semantic layer. This supports a logical decoupling of the models
stored in x-config and the interface used by the southbound
@@ -484,15 +484,15 @@ Control to evolve independently. It also allows for southbound
devices/services to be replaced without affecting the northbound
interface.

An adapter does not necessarily support only a single service. An
adapter is one means of taking an abstraction that spans multiple
An adaptor does not necessarily support only a single service. An
adaptor is one means of taking an abstraction that spans multiple
services and applying it to each of those services. An example in
Aether is the *User Plane Function* (the main packet-forwarding module
in the SD-Core User Plane) and *SD-Core*, which are jointly
responsible for enforcing *Quality of Service*, where the adapter
responsible for enforcing *Quality of Service*, where the adaptor
applies a single set of models to both services. Some care is needed
to deal with partial failure, in case one service accepts the change,
but the other does not. In this case, the adapter keeps trying the
but the other does not. In this case, the adaptor keeps trying the
failed backend service until it succeeds.
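
The adaptor behavior described above is easy to picture with a small sketch. The snippet below is not Aether code; the endpoint URLs, payload fields, and function name are hypothetical, and it only illustrates the pattern of fanning a single model update out to two backend services over REST and retrying whichever call fails until it succeeds.

.. code-block:: python

   import time
   import requests

   # One high-level QoS update (field names invented for illustration).
   qos_update = {"slice": "enterprise-a", "uplink_mbps": 50, "downlink_mbps": 100}

   # The two backend services the adaptor applies the update to
   # (hypothetical endpoints standing in for the UPF and SD-Core).
   BACKENDS = [
       "http://upf.example.local/api/v1/qos",
       "http://sd-core.example.local/api/v1/qos",
   ]

   def apply_update(update, backends, retry_interval=5.0):
       """Apply one update to every backend, retrying partial failures."""
       pending = list(backends)
       while pending:
           still_failing = []
           for url in pending:
               try:
                   resp = requests.put(url, json=update, timeout=10)
                   resp.raise_for_status()
               except requests.RequestException:
                   # This backend has not accepted the change yet; keep trying.
                   still_failing.append(url)
           pending = still_failing
           if pending:
               time.sleep(retry_interval)

   apply_update(qos_update, BACKENDS)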

5.2.5 Workflow Engine
@@ -519,7 +519,7 @@ ongoing development.
gNMI naturally lends itself to mutual TLS for authentication, and that
is the recommended way to secure communications between components
that speak gNMI. For example, communication between x-config and
its adapters uses gNMI, and therefore, uses mutual TLS. Distributing
its adaptors uses gNMI, and therefore, uses mutual TLS. Distributing
certificates between components is a problem outside the scope of
Runtime Control. It is assumed that another tool will be responsible
for distributing, revoking, and renewing certificates.
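
Because gNMI runs over gRPC, mutual TLS amounts to each side presenting a certificate and validating its peer against a shared CA. The following is a minimal client-side sketch, not taken from Aether: the file paths, target address, and port are placeholders, and the generated gNMI stub is only mentioned in a comment.

.. code-block:: python

   import grpc

   # Placeholder paths; as noted above, distributing and renewing these
   # certificates is handled by a separate tool, not Runtime Control.
   with open("ca.crt", "rb") as f:
       ca_cert = f.read()
   with open("client.crt", "rb") as f:
       client_cert = f.read()
   with open("client.key", "rb") as f:
       client_key = f.read()

   # Mutual TLS: present the client certificate/key and validate the server
   # against the shared CA (the server is configured to require the same).
   credentials = grpc.ssl_channel_credentials(
       root_certificates=ca_cert,
       private_key=client_key,
       certificate_chain=client_cert,
   )

   channel = grpc.secure_channel("x-config.example.local:9339", credentials)

   # A stub generated from gnmi.proto would then issue Get/Set/Subscribe
   # RPCs over this authenticated channel.
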
@@ -738,7 +738,7 @@ that it supports the option of spinning up an entirely new copy of the
SD-Core rather than sharing an existing UPF with another Slice. This is
done to ensure isolation, and illustrates one possible touch-point
between Runtime Control and the Lifecycle Management subsystem:
Runtime Control, via an Adapter, engages Lifecycle Management to
Runtime Control, via an Adaptor, engages Lifecycle Management to
launch the necessary set of Kubernetes containers that implement an
isolated slice.
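
The book does not show the actual call path between Runtime Control and Lifecycle Management, but the general pattern (an adaptor asking Kubernetes to stand up an isolated set of containers for a slice) can be sketched with the official Kubernetes Python client. The namespace, image, and label names below are invented for illustration; in Aether the equivalent work is driven through Lifecycle Management rather than direct API calls like these.

.. code-block:: python

   from kubernetes import client, config

   config.load_kube_config()  # or load_incluster_config() inside the cluster

   slice_name = "enterprise-a-slice"  # hypothetical slice identifier

   # Give the slice its own namespace to keep its workloads isolated.
   core = client.CoreV1Api()
   core.create_namespace(
       client.V1Namespace(metadata=client.V1ObjectMeta(name=slice_name))
   )

   # Launch a dedicated UPF instance for the slice (image name illustrative).
   deployment = client.V1Deployment(
       metadata=client.V1ObjectMeta(name="upf"),
       spec=client.V1DeploymentSpec(
           replicas=1,
           selector=client.V1LabelSelector(match_labels={"app": "upf"}),
           template=client.V1PodTemplateSpec(
               metadata=client.V1ObjectMeta(labels={"app": "upf"}),
               spec=client.V1PodSpec(
                   containers=[client.V1Container(name="upf", image="example/upf:latest")]
               ),
           ),
       ),
   )
   client.AppsV1Api().create_namespaced_deployment(namespace=slice_name, body=deployment)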

@@ -802,7 +802,7 @@ Giving enterprises the ability to set isolation and QoS parameters is
an illustrative example in Aether. Auto-generating that API from a
set of models is an attractive approach to realizing such a control
interface, if for no other reason than it forces a decoupling of the
interface definition from the underlying implementation (with Adapters
interface definition from the underlying implementation (with Adaptors
bridging the gap).

.. sidebar:: UX Considerations
@@ -839,15 +839,15 @@ configuration change requires a container restart, then there may be
little choice. But ideally, microservices are implemented with their
own well-defined management interfaces, which can be invoked from
either a configuration-time Operator (to initialize the component at
boot time) or a control-time Adapter (to change the component at
boot time) or a control-time Adaptor (to change the component at
runtime).

For resource-related operations, such as spinning up additional
containers in response to a user request to create a *Slice* or
activate an edge service, a similar implementation strategy is
feasible. The Kubernetes API can be called from either Helm (to
initialize a microservice at boot time) or from a Runtime Control
Adapter (to add resources at runtime). The remaining challenge is
Adaptor (to add resources at runtime). The remaining challenge is
deciding which subsystem maintains the authoritative copy of that
state, and ensuring that decision is enforced as a system invariant.\ [#]_
Such decisions are often situation-dependent, but our experience is