What is Cisco's recommendation regarding the number of transport interfaces on a vSmart controller?

As you know, in the Cisco SD-WAN solution all connections, both control connections and data-plane connections, are encrypted by default. Control connections use DTLS or TLS, and data-plane tunnels use IPsec, so security is built into the solution rather than being a separate challenge. All of these connections follow a zero-trust model.

The Cisco SD-WAN solution supports segmentation: you can use VPN 1 through VPN 511 as service-side VPNs throughout your enterprise overlay. VPN 0 is the transport VPN, which connects the overlay to the underlay, and VPN 512 can be used as the out-of-band management VPN.

In this solution, each WAN edge router first builds a transient DTLS control connection to the vBond controller to authenticate and to learn the addresses of the vManage and vSmart controllers. (Note that the vBond controller uses only UDP port 12346 for these transient DTLS connections.)

After that, the WAN edge router establishes exactly one DTLS/TLS control connection to vManage, even if the management plane is clustered. To choose which transport interface carries this connection, use the "vmanage-connection-preference" command on the WAN edge router's transport (tunnel) interfaces. The default value is 5, and a higher value means a higher preference.

WAN edge routers also establish DTLS/TLS control connections to two vSmart controllers by default. You can change this with the "max-control-connections" command under the transport tunnel interface; by default, its value equals the "max-omp-sessions" value. (Older software used the "max-controllers" command for this, but it was deprecated in Release 15.4 in favor of "max-control-connections".)

Control connections use DTLS by default. You can switch to TLS with "security control protocol tls" and set its port with "security control tls-port"; the default TLS port is 23456. (Using TLS is recommended, especially when a firewall sits between the components.)

Finally, by default each WAN edge router establishes a data-plane tunnel to each of the other WAN edge routers. These tunnels use IPsec encapsulation by default; you can use GRE encapsulation instead, but GRE tunnels are not encrypted.
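The control-connection rules above can be captured in a small model. This is a simplified sketch, not Cisco code: the `Transport` class and its fields are hypothetical, ties in `vmanage-connection-preference` are assumed to go to the interface listed first, and each transport is assumed to connect to `max-control-connections` vSmart controllers, capped by how many vSmarts exist.

```python
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_VMANAGE_PREF = 5      # default vmanage-connection-preference
DEFAULT_MAX_OMP_SESSIONS = 2  # default max-omp-sessions (two vSmarts)


@dataclass
class Transport:
    """Hypothetical model of one VPN 0 tunnel interface."""
    color: str
    vmanage_pref: int = DEFAULT_VMANAGE_PREF
    max_control_connections: Optional[int] = None  # None: inherit max-omp-sessions


def vmanage_transport(transports: List[Transport]) -> str:
    """Color of the single transport used for the vManage connection.

    max() returns the first of several equal maxima, so ties go to the
    interface listed first (an assumption, not documented behavior).
    """
    return max(transports, key=lambda t: t.vmanage_pref).color


def vsmart_connection_count(transports: List[Transport],
                            max_omp_sessions: int = DEFAULT_MAX_OMP_SESSIONS,
                            vsmarts_available: int = 2) -> int:
    """Total DTLS/TLS control connections to vSmart across all transports."""
    total = 0
    for t in transports:
        limit = (t.max_control_connections
                 if t.max_control_connections is not None
                 else max_omp_sessions)
        total += min(limit, vsmarts_available)
    return total


links = [Transport("mpls"), Transport("biz-internet", vmanage_pref=8)]
print(vmanage_transport(links))        # biz-internet
print(vsmart_connection_count(links))  # 4 (two vSmarts on each transport)
```

Adding the one vManage connection to this count gives the steady-state number of control connections; the vBond connection is transient and not counted.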


The Cisco SD-WAN solution runs BFD on every data-plane tunnel, and you cannot disable it (nor is there any reason to). BFD serves two goals in Cisco SD-WAN: 1- liveness detection and 2- link-quality measurement. It runs in echo mode: the remote WAN edge router sends the echo packets straight back without processing them in its control plane, so the processing overhead is minimal, which suits enterprise deployments well. Based on whether the echo packets come back, the state of each BFD session is determined as shown in the table below:

Furthermore, BFD packets are marked with DSCP 48 (equivalent to CS6), so they are placed in the low-latency queue (LLQ) before transmission.
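To make the two BFD goals concrete, the sketch below derives liveness and link-quality figures from a window of echo round trips. It is a simplified model under stated assumptions: `None` marks an echo that never came back, the session is declared down after `multiplier` consecutive misses (7 is used here as an assumed default), and jitter is taken as the mean absolute difference between consecutive round-trip times. Cisco's exact formulas may differ, and `bfd_stats` is a hypothetical helper.

```python
from statistics import mean
from typing import List, Optional


def bfd_stats(rtts_ms: List[Optional[float]], multiplier: int = 7) -> dict:
    """Summarize a window of BFD echo samples (None = lost echo)."""
    lost = sum(1 for r in rtts_ms if r is None)
    received = [r for r in rtts_ms if r is not None]

    # Liveness: length of the trailing run of consecutive lost echoes.
    trailing_misses = 0
    for r in reversed(rtts_ms):
        if r is not None:
            break
        trailing_misses += 1
    state = "down" if trailing_misses >= multiplier else "up"

    # Link quality: loss percentage, average latency, and jitter.
    loss_pct = 100.0 * lost / len(rtts_ms) if rtts_ms else 0.0
    latency = mean(received) if received else None
    jitter = (mean(abs(b - a) for a, b in zip(received, received[1:]))
              if len(received) > 1 else 0.0)
    return {"state": state, "loss_pct": loss_pct,
            "latency_ms": latency, "jitter_ms": jitter}


print(bfd_stats([20, 22, None, 21, 25]))
print(bfd_stats([10] + [None] * 7)["state"])
```

These three measurements (loss, latency, jitter) are exactly what an app-route SLA class is compared against.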

Because BFD continuously measures loss, latency, and jitter on each tunnel, you can use an application-aware routing (app-route) policy in Cisco SD-WAN to enforce a minimum SLA per application.

High availability is supported for every component of the Cisco SD-WAN solution:

1- vBond HA: deploy more than one vBond controller and use a DNS FQDN that resolves to the different vBond controllers.

2- vManage HA: build a management-plane cluster of at least three vManage controllers. Use an odd number of nodes to avoid split-brain (for the even cluster sizes listed in the table below, you need to turn off services on one vManage). Each vManage node supports up to 2,000 WAN edge routers.

3- vSmart HA: deploy more than one vSmart controller. Note that when you deploy more than two vSmart controllers, the control connections are distributed across them according to the default behavior of Viptela OS or IOS XE SD-WAN. You can override this by configuring controller-group-list on the WAN edge router with the desired controller group IDs; before that, you must assign an ID to each vSmart controller with the controller-group-id command.
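The vManage sizing rules above can be turned into a quick calculation. This sketch uses only the figures given in the text (about 2,000 WAN edge routers per node, at least three nodes, an odd node count); it ignores every other sizing factor, and `vmanage_cluster_size` is a hypothetical helper rather than an official sizing tool.

```python
import math

DEVICES_PER_NODE = 2000  # per-node capacity quoted in the text
MIN_CLUSTER_NODES = 3    # smallest vManage cluster


def vmanage_cluster_size(wan_edges: int) -> int:
    """Smallest odd vManage cluster (at least 3 nodes) for the device count."""
    nodes = max(MIN_CLUSTER_NODES, math.ceil(wan_edges / DEVICES_PER_NODE))
    if nodes % 2 == 0:
        nodes += 1  # odd node count to avoid split-brain
    return nodes


for count in (1500, 7000, 10001):
    print(count, vmanage_cluster_size(count))
# 1500 3
# 7000 5
# 10001 7
```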

In the Cisco SD-WAN solution, reporting is provided through alarms, events, and logs. Of these, only alarms can be configured to send email notifications via SMTP. Alarm severity levels are 1- critical, 2- major, 3- medium, and 4- minor, while event severity levels are 1- critical, 2- major, and 3- minor. There are two types of logs: 1- audit and 2- ACL. Logs can be stored on the WAN edge router's local disk or sent to a remote log server. The maximum log file size is 20 MB and the default is 10 MB; you can change it with the "system logging disk file size" command. Likewise, you can keep up to 10 files and rotate them with the "system logging disk file rotate" command.
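The two logging limits combine into a worst-case disk budget. In this sketch the upper bounds come from the text, while the lower bound of 1 for both settings and the assumption that every rotated file fills to the configured size are illustrative assumptions; `log_disk_budget_mb` is a hypothetical helper.

```python
MAX_FILE_SIZE_MB = 20  # largest allowed log file
MAX_ROTATE = 10        # most rotated files kept


def log_disk_budget_mb(file_size_mb: int, rotate: int) -> int:
    """Worst-case local log storage, assuming every rotated file is full."""
    if not 1 <= file_size_mb <= MAX_FILE_SIZE_MB:  # lower bound 1 is assumed
        raise ValueError(f"file size must be 1-{MAX_FILE_SIZE_MB} MB")
    if not 1 <= rotate <= MAX_ROTATE:              # lower bound 1 is assumed
        raise ValueError(f"rotate must be 1-{MAX_ROTATE} files")
    return file_size_mb * rotate


print(log_disk_budget_mb(10, 10))  # 100: default size, ten files
print(log_disk_budget_mb(20, 10))  # 200: the maximum
```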

With "show control connections-history" (on the Viptela platform) or "show sdwan control connections-history" (on the IOS XE SD-WAN platform), you can see why control-plane tunnels failed. There are many error codes; some of the most important are listed below:

1-     BIDNTVRFD, CRTREJSER, SERNTPRES: these error codes show that the device's serial number is missing on the controllers. To solve this, send it to the controllers again from the vManage GUI: Configuration -> Certificates -> vEdge List tab -> select the device and click Send to Controllers.

2-     CTORGNMMIS: this error code shows an organization-name mismatch between devices. To solve it, check the device configuration and correct it with the organization-name command.

3-     DCONFAIL: this error code indicates a vSmart reachability problem.

4-     DISCVBD, SYSIPCHNG: these error codes reflect the normal behavior of the overlay network, so no action is required.

5-     DISTLOC: this error code tells us that the TLOC is disabled. When is a TLOC disabled? If any of these conditions occurs: 1- control connections are cleared, 2- the TLOC color changes, or 3- the system IP changes. Check these conditions to solve the problem.

6-     LISFD: this error code says that a socket error has occurred. Duplicate IP addresses in the overlay network, packet corruption, a DTLS/TLS mismatch at either end of the tunnel, or blocked forwarding ports can cause the socket error.

7-     NOVMCFG: this error code shows that no device/configuration template was attached to the vEdge when it came up.

8-     RDSIGFBD, TXCHTOBD: these error codes show that the trusted board ID is not initialized. This error occurs especially in unstable networks, where connections go up and down frequently. (As you know, the board ID of a WAN edge router is one of the parameters used in the authentication process.)

9-     VB_TMO, VS_TMO, VM_TMO: these error codes show a peer timeout (to vBond, vSmart, or vManage respectively). To solve the problem, check connectivity to the controllers, and also consider increasing the "hello-interval" and "hello-tolerance" values on the transport interfaces. In addition, if packets are rate-limited to below 1 Mbps on the underlay, the connections may not form and VM_TMO appears.

10-  VECRTREV, VSCRTREV: these error codes say that the certificate of the vEdge or vSmart has been revoked. To solve this problem, first check time synchronization across all components.
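When a device shows many failures, it can help to tally the error codes and attach the guidance from the list above. This sketch assumes you have already extracted the error-code strings from the connections-history output; parsing that output is not shown. `GUIDANCE` and `triage` are hypothetical names, and the hints are condensed from this article only.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Guidance condensed from the list above, grouped by error code.
_GROUPS = {
    ("BIDNTVRFD", "CRTREJSER", "SERNTPRES"):
        "Serial number missing: re-send it to the controllers from vManage.",
    ("CTORGNMMIS",): "Organization name mismatch: check organization-name.",
    ("DCONFAIL",): "Check vSmart reachability.",
    ("DISCVBD", "SYSIPCHNG"): "Normal overlay behavior: no action required.",
    ("DISTLOC",): "TLOC disabled: check cleared connections, color or system-ip changes.",
    ("LISFD",): "Socket error: check duplicate IPs, DTLS/TLS mismatch, blocked ports.",
    ("NOVMCFG",): "Attach a device/configuration template.",
    ("RDSIGFBD", "TXCHTOBD"): "Board ID not initialized: check link stability.",
    ("VB_TMO", "VS_TMO", "VM_TMO"): "Peer timeout: check controller reachability and hello timers.",
    ("VECRTREV", "VSCRTREV"): "Certificate revoked: check time synchronization first.",
}
GUIDANCE = {code: hint for codes, hint in _GROUPS.items() for code in codes}


def triage(error_codes: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Count error codes (most frequent first) and attach a hint to each."""
    counts = Counter(error_codes)
    return [(code, n, GUIDANCE.get(code, "Not in this list: see Cisco documentation."))
            for code, n in counts.most_common()]


for row in triage(["VS_TMO", "DCONFAIL", "VS_TMO"]):
    print(row)
```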

Policies are one of the main benefits of the Cisco SD-WAN solution. The different kinds of policies and their characteristics are summarized below:

a- Centralized Policy: defines the topology of the overlay network (for example, a custom hub-and-spoke design that uses "tloc-list" as an action in a centralized policy). Centralized policies are configured on the vSmart controller, and control policies are applied there in either direction: inbound (before the OMP best-path calculation, when vSmart receives OMP updates from a WAN edge router) or outbound (after the best-path calculation, when vSmart sends the best paths to the other WAN edge routers).

a-1- Control Policy

a-1-1- Control Policy

a-1-2- VPN Membership Policy

a-2- Data Policy

a-2-1- Data Policy

You can manipulate data-packet forwarding across the overlay network, for example with DIA (direct internet access), FEC, and so on.

a-2-2- Application-Aware Routing Policy

Application-aware routing is one of the most useful results of the BFD measurements in the Cisco SD-WAN solution. You can ensure that each application's minimum SLA (loss, latency, and jitter) is met according to its own requirements. With "sla-class <name> preferred-color <color>" you define the preferred color(s) on which the application's traffic is forwarded. If none of the allowed tunnels meets the SLA and you still want the traffic forwarded on specific transport interfaces, use "backup-sla-preferred-color <color1> <color2>". If you instead want to drop the traffic when no tunnel meets the SLA, add the "strict" keyword. Finally, remember that app-route policy only chooses among ECMP routes.
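The decision order described above can be sketched as a small function. This is a simplified model, not Cisco's implementation: `Tunnel`, `SlaClass`, and `select_tunnels` are hypothetical, and it assumes that when the preferred colors miss the SLA any other SLA-compliant tunnel is used, that `strict` is evaluated before the backup colors, and that without either option traffic falls back to all tunnels.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Tunnel:
    color: str
    loss_pct: float
    latency_ms: float
    jitter_ms: float


@dataclass
class SlaClass:
    loss_pct: float
    latency_ms: float
    jitter_ms: float

    def met_by(self, t: Tunnel) -> bool:
        return (t.loss_pct <= self.loss_pct
                and t.latency_ms <= self.latency_ms
                and t.jitter_ms <= self.jitter_ms)


def select_tunnels(tunnels: Sequence[Tunnel], sla: SlaClass,
                   preferred: Sequence[str] = (),
                   backup: Sequence[str] = (),
                   strict: bool = False) -> List[Tunnel]:
    """Tunnels an app-route policy would use for one flow (simplified)."""
    compliant = [t for t in tunnels if sla.met_by(t)]
    if compliant:
        # SLA-compliant tunnels on preferred colors first, else any compliant one.
        return [t for t in compliant if t.color in preferred] or compliant
    if strict:
        return []  # no tunnel meets the SLA: drop the traffic
    fallback = [t for t in tunnels if t.color in backup]
    return fallback or list(tunnels)  # backup colors, else all tunnels


tunnels = [Tunnel("mpls", 0.1, 40, 5),
           Tunnel("biz-internet", 2.0, 80, 20),
           Tunnel("lte", 0.5, 120, 30)]
voice = SlaClass(loss_pct=1, latency_ms=100, jitter_ms=10)
print([t.color for t in select_tunnels(tunnels, voice, preferred=["biz-internet"])])
# ['mpls']: biz-internet misses the SLA, so the compliant MPLS tunnel is used
```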

a-2-3- cFlowd Policy

First, write a cFlowd template that defines the NetFlow process configuration, and then use cflowd as an action in a centralized data policy.

b- Localized Policy (applied through the device template): defines the desired behavior on an individual WAN edge router.

b-1- Data Policy

b-1-1- Route Policy

You can define how the WAN edge router sends and receives routes through underlay routing protocols.

b-1-2- QoS

WAN edge routers have eight hardware queues, queue 0 through queue 7. By default, all control-plane traffic is mapped to queue 0, which uses LLQ (low-latency queuing) scheduling and tail-drop. The other seven queues use WRR (weighted round-robin) scheduling, and their drop method can be RED (random early detection) or tail-drop. User traffic is mapped to queue 2 by default.
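The queueing model above (strict priority for queue 0, WRR for the rest, user traffic in queue 2) can be illustrated with a toy scheduler. This is a sketch on a static snapshot of queued packets, not the router's hardware scheduler; the weights and the `schedule` and `default_queue` helpers are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List

LLQ = 0             # queue 0: control traffic, strict priority
DEFAULT_DATA_Q = 2  # user traffic lands in queue 2 by default


def default_queue(is_control: bool) -> int:
    return LLQ if is_control else DEFAULT_DATA_Q


def schedule(queues: Dict[int, Deque[str]], weights: Dict[int, int],
             rounds: int = 1) -> List[str]:
    """Drain queue 0 first in each round, then serve others by WRR weight.

    Static snapshot: no packets arrive while scheduling, so draining the
    LLQ at the start of each round is equivalent to checking it per packet.
    """
    sent: List[str] = []
    for _ in range(rounds):
        while queues.get(LLQ):
            sent.append(queues[LLQ].popleft())
        for qid in sorted(q for q in queues if q != LLQ):
            for _ in range(weights.get(qid, 1)):
                if not queues[qid]:
                    break
                sent.append(queues[qid].popleft())
    return sent


q = {0: deque(["bfd1"]), 1: deque(["v1", "v2", "v3"]), 2: deque(["d1", "d2"])}
print(schedule(q, weights={1: 2, 2: 1}, rounds=2))
# ['bfd1', 'v1', 'v2', 'd1', 'v3', 'd2']
```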

b-1-3- ACL

b-2- Security Policies (based on the Snort engine and supported on IOS XE SD-WAN)

b-2-1- Application Enterprise Firewall

With support for more than 1,400 applications, you can distribute firewall services across WAN edge routers using zone-based policies. The available actions are pass, drop, and inspect (stateful). The audit trail option is available only for the inspect action. To mitigate DoS attacks based on TCP SYN floods, you can enable the TCP SYN Flood Limit; its default value is 2,000 SYN packets. You can define up to 500 firewall rules in each policy.

b-2-2- Intrusion Prevention / Detection

You can use the Cisco Talos database to get the latest signature updates and so run IPS/IDS on WAN edge routers. You can also define your own signature list using the format generator-id:signature-id.
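A custom signature list in the generator-id:signature-id format is easy to validate before uploading it. In this sketch, accepting commas, spaces, or newlines as separators is an assumption about how you keep the list, and `parse_signature_list` and `is_listed` are hypothetical helpers, not a Cisco API.

```python
import re
from typing import Set, Tuple

SIG_RE = re.compile(r"^(\d+):(\d+)$")  # generator-id:signature-id


def parse_signature_list(text: str) -> Set[Tuple[int, int]]:
    """Parse gid:sid entries separated by commas, spaces, or newlines."""
    sigs: Set[Tuple[int, int]] = set()
    for token in re.split(r"[,\s]+", text.strip()):
        if not token:
            continue  # empty input
        m = SIG_RE.match(token)
        if not m:
            raise ValueError(f"bad signature entry: {token!r}")
        sigs.add((int(m.group(1)), int(m.group(2))))
    return sigs


def is_listed(gid: int, sid: int, sigs: Set[Tuple[int, int]]) -> bool:
    return (gid, sid) in sigs


sigs = parse_signature_list("1:2000, 1:2001\n3:15")
print(sorted(sigs))         # [(1, 2000), (1, 2001), (3, 15)]
print(is_listed(3, 15, sigs))  # True
```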

b-2-3- URL-Filtering

You can implement web filtering on WAN edge routers with URL filtering, defining a whitelist (allowed URLs) or blacklist (blocked URLs) according to your requirements.

b-2-4- Advanced Malware Protection with Threat Grid

You can run the AMP service on WAN edge routers. When a client behind a WAN edge router downloads a file, the Snort file processor calculates the file's SHA-256 hash and compares it with the database to classify the file as clean, malicious, or unknown. You can also use Threat Grid sandboxing for unknown files (if the license is purchased). For AMP policy in the Cisco SD-WAN solution, logging supports three severity levels: critical, warning, and info.
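The hash-and-lookup step can be sketched with Python's standard `hashlib`. The `reputation` dictionary is a hypothetical local stand-in for the AMP cloud lookup, and `amp_disposition` is an illustrative helper; it only shows how a SHA-256 digest maps to clean, malicious, or unknown, and when an unknown file would be sent for sandboxing.

```python
import hashlib
from typing import Dict, Tuple


def file_sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def amp_disposition(data: bytes, reputation: Dict[str, str],
                    threat_grid: bool = False) -> Tuple[str, bool]:
    """Return (verdict, submit_to_sandbox) for a downloaded file.

    `reputation` maps SHA-256 hex digests to "clean" or "malicious";
    digests not found are "unknown" (hypothetical stand-in for AMP).
    """
    verdict = reputation.get(file_sha256(data), "unknown")
    return verdict, (verdict == "unknown" and threat_grid)


db = {file_sha256(b"known bad payload"): "malicious"}
print(amp_disposition(b"known bad payload", db))         # ('malicious', False)
print(amp_disposition(b"new file", db, threat_grid=True))  # ('unknown', True)
```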

b-2-5- DNS Web Layer Security

Umbrella is Cisco's cloud security platform; it was originally the OpenDNS service, which Cisco acquired and renamed Umbrella. You can use the Umbrella cloud as your DNS resolver, so all DNS queries are relayed to the cloud to be checked and resolved. You can enable the DNSCrypt option to encrypt DNS queries between the WAN edge router and Umbrella. (This is highly recommended.)

For the order of operations of these policies, refer to this image:

Another advantage of the Cisco SD-WAN solution is its integration with cloud service providers such as Azure and AWS. For this, you can use the Cloud OnRamp feature of Cisco SD-WAN for SaaS, IaaS, or even colocation.

For SaaS, you can implement a DIA topology or a gateway/client topology to reach the SaaS applications offered by cloud service providers with lower latency and a better application experience. To use a WAN edge router as a DIA site, you need Release 16.3 on Viptela OS or 17.1 on IOS XE SD-WAN; for gateway/client sites, you need 17.2 on both. Cloud OnRamp for SaaS supports the following applications: AWS, Box, Concur, Dropbox, Google Apps, GoToMeeting, Intuit, Microsoft Office 365, Oracle, Salesforce, SugarCRM, Zendesk, and Zoho CRM. This feature selects the best path for sending and receiving SaaS application traffic based on the highest vQoE (viable Quality of Experience) score of each transport interface.

With Cloud OnRamp for IaaS, you can extend your overlay into the cloud service provider by deploying a Transit VPC (for AWS) or Transit VNet (for Azure) as the gateway for the mapped host VPCs launched in the provider's region. Each transit has two WAN Edge cloud routers, so you need a valid token for each of them, as well as feature and device templates to configure them.

With Cloud OnRamp for Colocation, you can reduce network costs and improve the visibility and manageability of cloud traffic by deploying colocation points, which lets you implement regional service chaining across your network. Each colocation point has a pair of Cisco CSP 5000 devices and a pair of Cisco Catalyst 9000 switches, and to implement service chaining you define different VNFs for different services (one VNF per service).

To protect packets against loss, Cisco SD-WAN offers two options: 1- FEC (fec-always or fec-adaptive) and 2- packet duplication. FEC works on FEC blocks of five packets: four data packets and one parity packet. If one data packet in a block is lost, the receiver reconstructs it from the parity packet. FEC can therefore recover one lost packet per block (up to 25% loss relative to the data packets) at a cost of 25% bandwidth overhead. With fec-always, FEC is applied all the time; with fec-adaptive, it is applied only when packet loss exceeds 2%. Packet duplication consumes a lot of resources, so it is highly recommended to use it only for the most important services, such as financial transactions.
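The 4+1 block arithmetic can be demonstrated with simple XOR parity. XOR is assumed here as the simplest code that matches the block structure described above; the text does not specify Cisco's exact coding, and `make_parity`, `recover`, and `fec_active` are hypothetical helpers. Packets are assumed to be padded to equal length.

```python
from functools import reduce
from typing import List, Optional

DATA_PER_BLOCK = 4  # four data packets plus one parity packet per block


def _xor(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("packets must be padded to equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(data: List[bytes]) -> bytes:
    if len(data) != DATA_PER_BLOCK:
        raise ValueError("a block has exactly four data packets")
    return reduce(_xor, data)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[Optional[bytes]]:
    """Rebuild at most one lost data packet (None) from the parity packet."""
    lost = [i for i, p in enumerate(received) if p is None]
    if len(lost) > 1:
        raise ValueError("XOR parity can rebuild only one packet per block")
    out = list(received)
    if lost:
        present = [p for p in received if p is not None]
        out[lost[0]] = reduce(_xor, present, parity)  # parity ^ the rest
    return out


def fec_active(mode: str, loss_pct: float, threshold_pct: float = 2.0) -> bool:
    """fec-always is always on; fec-adaptive turns on above the loss threshold."""
    return mode == "always" or (mode == "adaptive" and loss_pct > threshold_pct)


block = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = make_parity(block)
print(recover([b"AAAA", None, b"CCCC", b"DDDD"], parity) == block)  # True
print(f"overhead: {1 / DATA_PER_BLOCK:.0%}")                        # overhead: 25%
```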

For automatic provisioning, you first need two services: 1- DNS and 2- DHCP. You also need reachability from the VPN 0 transport interfaces to ztp.viptela.com (for Viptela OS devices, over UDP) or devicehelper.cisco.com (for IOS XE SD-WAN devices, over TCP). On Viptela hardware platforms, each vEdge model has a default interface dedicated to onboarding, as shown in this table:

For manual configuration, you must configure some mandatory parameters on WAN edge routers, such as organization-name, NTP server, transport IP address and tunnel encapsulation, site-id, system-ip, and vBond address. To join a vEdge router to the overlay, first install the root certificate with "request root-cert-chain install <path>", and then run "request vedge-cloud activate chassis-number <chassis number> token <OTP>" on the vEdge router. Because the vBond runs the same vedge-cloud image, you need to configure the "vbond <vbond-ip-address> local" command to change its personality from vEdge to vBond.

Bootstrapping is supported on IOS XE SD-WAN devices: during boot, the device looks for a ciscosdwan.cfg or ciscosdwan_cloud_init.cfg file, depending on the hardware model. If the file is available, the device loads that configuration and the Plug and Play process is aborted; if it is unavailable, the Plug and Play process continues and the device searches for a bootable USB flash drive.

By default, each WAN edge router tries to establish data-plane tunnels from every one of its TLOCs to every remote TLOC, whatever the color. To change this, you can use the "group" (tunnel-group ID) or "restrict" option on the colors: with "group", tunnels are established only between transport interfaces with the same tunnel-group ID; with "restrict", only between interfaces with the same color. If you use both, both conditions must be met for a tunnel to form. Another data-plane option is carrier, whose default value is "default". When both ends use private colors, a tunnel between two WAN edge routers with the same carrier value is built to the private addresses; if the carrier values differ, it is built to the public addresses.
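The tunnel-formation and addressing rules above can be sketched as two predicates. The private color list (mpls, metro-ethernet, private1 through private6) is the standard Cisco SD-WAN set; the `Tloc` class and both functions are hypothetical, and treating an unset group as its own value (so a grouped TLOC pairs only with the same group) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

PRIVATE_COLORS = {"mpls", "metro-ethernet"} | {f"private{i}" for i in range(1, 7)}


@dataclass
class Tloc:
    color: str
    private_ip: str
    public_ip: str
    restrict: bool = False
    group: Optional[int] = None
    carrier: str = "default"


def tunnel_allowed(a: Tloc, b: Tloc) -> bool:
    if (a.restrict or b.restrict) and a.color != b.color:
        return False  # restrict: same color only
    if (a.group is not None or b.group is not None) and a.group != b.group:
        return False  # group: same tunnel-group ID only (unset group assumed distinct)
    return True


def remote_address(local: Tloc, remote: Tloc) -> str:
    """Private address only between private colors with matching carriers."""
    both_private = local.color in PRIVATE_COLORS and remote.color in PRIVATE_COLORS
    if both_private and local.carrier == remote.carrier:
        return remote.private_ip
    return remote.public_ip


a = Tloc("mpls", "10.0.0.1", "192.0.2.1")
b = Tloc("mpls", "10.0.0.2", "198.51.100.2")
print(remote_address(a, b))  # 10.0.0.2
b.carrier = "carrier2"
print(remote_address(a, b))  # 198.51.100.2
inet = Tloc("biz-internet", "10.1.0.1", "203.0.113.1", restrict=True)
print(tunnel_allowed(inet, a))  # False
```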

Another important data-plane parameter is the IPsec rekey timer. (As you know, key exchange is handled by the OMP protocol.) Its default value is 86,400 seconds, and it is highly recommended to keep it at least twice the OMP graceful-restart timer, whose default is 43,200 seconds: because new keys are distributed through OMP, the existing keys must stay valid for the whole graceful-restart period while vSmart is unreachable. You can change these values with "security ipsec rekey" and "omp timers graceful-restart-timer".
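The timer relationship is simple arithmetic, shown here as a check you might run against a planned configuration. The sketch reads the recommendation as "rekey at least twice the graceful-restart timer"; `recommended_rekey` and `rekey_ok` are hypothetical helpers, and the defaults are the values quoted in the text.

```python
DEFAULT_IPSEC_REKEY_S = 86_400  # security ipsec rekey default
DEFAULT_OMP_GR_S = 43_200       # OMP graceful-restart timer default


def recommended_rekey(omp_gr_s: int = DEFAULT_OMP_GR_S) -> int:
    return 2 * omp_gr_s


def rekey_ok(rekey_s: int = DEFAULT_IPSEC_REKEY_S,
             omp_gr_s: int = DEFAULT_OMP_GR_S) -> bool:
    """True if the rekey timer is at least twice the graceful-restart timer."""
    return rekey_s >= recommended_rekey(omp_gr_s)


print(rekey_ok())               # True: 86400 >= 2 * 43200
print(rekey_ok(3_600, 43_200))  # False
```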

About OMP: by default, it advertises only the best path(s), and you can change this with the "send-backup-paths" command on vSmart. It also advertises at most four ECMP paths per prefix; to change this for design reasons, use the "send-path-limit" command. The maximum value of this parameter (the number of paths advertised per prefix) is 16.
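The effect of the two OMP knobs can be sketched with a single numeric metric standing in for OMP's full best-path selection, which is a deliberate simplification. `paths_to_advertise` is a hypothetical helper, and it assumes that `send-path-limit` caps the total number of advertised paths, backups included.

```python
from typing import List, Tuple


def paths_to_advertise(paths: List[Tuple[str, int]], send_path_limit: int = 4,
                       send_backup_paths: bool = False) -> List[str]:
    """paths: (path_id, metric) pairs; lower metric is better (simplified)."""
    if not 1 <= send_path_limit <= 16:
        raise ValueError("send-path-limit must be 1-16")
    if not paths:
        return []
    ranked = sorted(paths, key=lambda p: p[1])  # stable: keeps input order on ties
    best = ranked[0][1]
    chosen = ranked if send_backup_paths else [p for p in ranked if p[1] == best]
    return [pid for pid, _ in chosen[:send_path_limit]]


paths = [("a", 1), ("b", 1), ("c", 1), ("d", 1), ("e", 1), ("f", 2)]
print(paths_to_advertise(paths))                      # ['a', 'b', 'c', 'd']
print(paths_to_advertise(paths, send_path_limit=16))  # ['a', 'b', 'c', 'd', 'e']
```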

Finally, another benefit of the Cisco SD-WAN solution is TLOC extension. There are Layer 2 and Layer 3 TLOC extensions, and note that Layer 3 TLOC extension is supported only on IOS XE SD-WAN devices. TLOC extension also does not work on interfaces bound to loopback tunnel interfaces or on LTE interfaces.

Of course, I have written about the OMP protocol in two previous articles, so I will not cover it further in this article.

Finally, thank you for reading this article. I recommend reading the CVD (Cisco Validated Design) guide for the Cisco SD-WAN solution at this link carefully; it is very useful:

https://www.cisco.com/c/en/us/td/docs/solutions/CVD/SDWAN/cisco-sdwan-design-guide.html

(Also, for the onboarding process, you can use this guide: https://www.cisco.com/c/dam/en/us/td/docs/solutions/CVD/SDWAN/sdwan-wan-edge-onboarding-deploy-guide-2020nov.pdf)

Have a nice time and good luck.