On the Punditry of Network APIs

If you had to pick one thing for your networking vendors to give you to help you on your automation journey, what is it, and why is it a good API?

Cheeky question, but I suspect that if there were a poll of network engineers, network automation engineers, network reliability engineers, or network-dev-op-sec-autobot engineers (or whatever cool title people are after nowadays), a resounding majority would say “API” or “a good API” or something to that effect. But what does that really even mean, and why is it “the answer”?

At least for me, when I hear people say API I tend to think of an HTTP-based API – maybe it’s RESTful, maybe it isn’t, but I think of something you can poke over HTTP to get info from and send requests to. This is obviously a fairly narrow definition of an API, but I don’t think I’m alone in having this be my first thought.

Given that the term API is probably wildly overloaded as it is, maybe it’s a good idea to just start with an actual definition from the dictionary:

API, noun; a set of functions and procedures allowing the creation of applications that access the features or data of an operating system, application, or other service.

Ah yes, super helpful in clearing up the vagueness, eh? So… it’s some stuff that allows you to… do stuff…!

OK fine, that’s a dictionary definition, so it probably isn’t wildly helpful for us. Let’s think about the APIs that the lovely network vendor world provides to us. Sorry in advance for picking on Cisco here – I am most familiar with Cisco stuff… needless to say, they are not doing anything particularly unique, so the broad categories should apply to other vendors as well:

  • HTTP – like my initial thought, we of course have the HTTP-based APIs that vendors are providing – things like Cisco ACI/DNA/NX-API (the CLI-ish API thing)/NX-API REST/etc.
  • NETCONF without YANG (i.e. sending CLI commands over NETCONF) – as on some IOS-XE versions
  • RESTCONF/YANG – at least IOS-XE, maybe elsewhere?

And I’m not even touching on gRPC/gNMI as I’m not too savvy on that part of the world just yet, but I think you can just chuck those in the bucket of “APIs with YANG stuff” for the purposes of this post.

But… surely that list doesn’t represent every possible way to interact with a device, right? What about… the original network API – SNMP!

SNMP isn’t new/sexy so we don’t like to think of it as an API, yet it falls squarely within the definition of an API above. SNMP is in fact a set of functions or procedures allowing us to access data about our networking devices! By that definition I think SNMP is certainly a valid API.

There is one other API that it seems we as an industry are doomed to hate on despite its ubiquity: the CLI. What is the CLI if not an API? It is a set of functions/procedures that allow us to interact with networking devices, is it not?

Clearly we are not talking about the CLI as the API of the future here in the (chaotic) year 2020. Even so, nearly every “API-first” product out there also ships with a CLI (maybe minus Meraki?). So what is preventing folks in the industry from viewing the CLI as an API?

The first things folks almost always bring up when arguing that the CLI is an unsuitable mechanism for automation are that it provides no structured data, and that it is inherently unreliable. If you’ve found your way to this obscure blog post you have undoubtedly heard these statements before; but are they legit? Where do they come from?

Firstly, most CLIs clearly do not print out structured data by default; this much is obvious if you’ve ever logged into any network device ever. But! Many devices, especially those running relatively recent software, can spit out structured data in response to “show” commands via options like “| display xml” or “| json”, as we can see from this snippet from an NX-OS device here:

switch# show int e1/1 | json
{"TABLE_interface": {"ROW_interface": {"interface": "Ethernet1/1", "state": "up", "admin_state": "up",
"share_state": "Dedicated", "eth_hw_desc": "100/1000/10000 Ethernet", "eth_hw_addr": "5254.0007.9508",
"eth_bia_addr": "5254.0007.9508", "eth_mtu": "1500", "eth_bw": "1000000", "eth_dly": "10", "eth_reliability": "255",
"eth_txload": "1", "eth_rxload": "1", "encapsulation": "ARPA", "medium": "broadcast", "eth_mode": "access",
"eth_duplex": "full", "eth_speed": "auto-speed", "eth_beacon": "off", "eth_autoneg": "on", "eth_in_flowctrl": "off",
"eth_out_flowctrl": "off", "eth_mdix": "off", "eth_swt_monitor": "off", "eth_ethertype": "0x8100",
"eth_eee_state": "n/a", "eth_link_flapped": "1d02h", "eth_clear_counters": "never", "eth_reset_cntr": "1",
"eth_load_interval1_rx": "30", "eth_inrate1_bits": "0", "eth_inrate1_pkts": "0", "eth_load_interval1_tx": "30",
"eth_outrate1_bits": "0", "eth_outrate1_pkts": "0", "eth_inrate1_summary_bits": "0 bps",
"eth_inrate1_summary_pkts": "0 pps", "eth_outrate1_summary_bits": "0 bps", "eth_outrate1_summary_pkts": "0 pps",
"eth_load_interval2_rx": "300", "eth_inrate2_bits": "0", "eth_inrate2_pkts": "0", "eth_load_interval2_tx": "300",
"eth_outrate2_bits": "0", "eth_outrate2_pkts": "0", "eth_inrate2_summary_bits": "0 bps",
"eth_inrate2_summary_pkts": "0 pps", "eth_outrate2_summary_bits": "0 bps", "eth_outrate2_summary_pkts": "0 pps",
"eth_inucast": "0", "eth_inmcast": "0", "eth_inbcast": "0", "eth_inpkts": "0", "eth_inbytes": "0",
"eth_jumbo_inpkts": "0", "eth_storm_supp": "0", "eth_runts": "0", "eth_giants": "0", "eth_crc": "0",
"eth_nobuf": "0", "eth_inerr": "0", "eth_frame": "0", "eth_overrun": "0", "eth_underrun": "0", "eth_ignored": "0",
"eth_watchdog": "0", "eth_bad_eth": "0", "eth_bad_proto": "0", "eth_in_ifdown_drops": "0", "eth_dribble": "0",
"eth_indiscard": "0", "eth_inpause": "0", "eth_outucast": "0", "eth_outmcast": "0", "eth_outbcast": "0",
"eth_outpkts": "0", "eth_outbytes": "0", "eth_jumbo_outpkts": "0", "eth_outerr": "0", "eth_coll": "0",
"eth_deferred": "0", "eth_latecoll": "0", "eth_lostcarrier": "0", "eth_nocarrier": "0", "eth_babbles": "0",
"eth_outdiscard": "0", "eth_outpause": "0"}}}

Certainly this does not apply to all devices, or all commands, but it is an option in many places. For any devices or commands that do not have this type of functionality, there are several great tools (TextFSM + ntc_templates, Genie, TTP, and probably many more) available to convert “unstructured” output into structured data. Clearly this is not the same as the device natively returning structured data as in the JSON example above, but in the end… if it works and you get structured data, does it matter? One thing that can cause these “third party” parsing tools to fail or have issues leads us right back to the other claim: “SSH/CLIs are inherently unreliable”. I adamantly disagree with this notion.
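As a rough sketch of what those parsing tools are doing under the hood – hand-rolled and far less robust than TextFSM templates, and with a sample command output invented purely for illustration:

```python
import re

# Illustrative sample of "unstructured" IOS-style show output; the real
# ntc_templates project maintains TextFSM templates for many such commands.
RAW = """\
Interface              IP-Address      OK? Method Status                Protocol
GigabitEthernet0/0     10.0.0.1        YES NVRAM  up                    up
GigabitEthernet0/1     unassigned      YES NVRAM  administratively down down
"""

# One named-group regex per row of the human-oriented table
LINE_PATTERN = re.compile(
    r"^(?P<interface>\S+)\s+(?P<ip>\S+)\s+(?P<ok>\S+)\s+(?P<method>\S+)\s+"
    r"(?P<status>.+?)\s+(?P<protocol>\S+)\s*$"
)

def parse_show_ip_int_brief(output: str) -> list[dict]:
    """Turn the human-encoded table into a list of dicts."""
    records = []
    for line in output.splitlines()[1:]:  # skip the header row
        match = LINE_PATTERN.match(line)
        if match:
            records.append(match.groupdict())
    return records

print(parse_show_ip_int_brief(RAW))
```

The template-driven tools generalize exactly this idea: a per-command pattern that maps the human encoding back into structured records.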

What makes an API (of whatever flavor) NOT unreliable? Encoding. That’s it. For example, NETCONF simply runs over SSH, so why is it “better” than “screen scraping”? Because it has a reliable, standard encoding format, so there is never any ambiguity about when a “response” (to an RPC) is done. Messages are in XML (obviously a standard), and there is an agreed-upon message delimiter to indicate when a message or a response is finished sending. When “screen scraping” over SSH we don’t have these things, so we must somehow determine when it is acceptable to send a request and, perhaps more importantly, when the response to our request is done being printed to the screen.
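To make that delimiter idea concrete: NETCONF 1.0 over SSH frames messages with the ]]>]]> end-of-message marker, so a client can read with zero ambiguity. A minimal sketch (the function name and fake transport here are illustrative, not scrapli-netconf’s actual internals):

```python
# NETCONF 1.0 end-of-message delimiter, per RFC 4742
DELIMITER = b"]]>]]>"

def read_netconf_message(recv) -> bytes:
    """Accumulate bytes from `recv` until the delimiter says the reply is done."""
    buf = b""
    while DELIMITER not in buf:
        buf += recv()
    # everything before the delimiter is exactly one complete message
    message, _, _ = buf.partition(DELIMITER)
    return message

# Simulate a transport handing back the reply in arbitrary-sized chunks:
chunks = iter([b"<rpc-reply><ok/>", b"</rpc-reply>", b"]]>]]>"])
print(read_netconf_message(lambda: next(chunks)))
```

No prompt guessing, no timing heuristics – the encoding itself tells the reader when to stop.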

So… is there no encoding on a CLI? Of course there is – if there were no standardized/formatted message data we would just be looking at byte streams all day, and no human would be able to easily use a CLI! The critical difference between “vanilla” SSH and NETCONF is that the encoding for “vanilla” (CLI) SSH has a different intended audience. NETCONF encoding is meant to allow robots (computers/programming languages/whatever) to easily read and write NETCONF data; the CLI, however, is encoded for human use.

When I was initially building scrapli I didn’t really understand this – I was just focused on reading data from the channel and making sure that I was properly matching prompt patterns to use as “anchors” to know when I could write and when I should be done reading from the SSH channel. When I started getting serious about building scrapli-netconf, however, things started to click at a more fundamental level for me: there is always encoding; it’s just a question of who it is for and how to interpret it.
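A minimal sketch of that prompt-as-anchor idea – the prompt pattern here is a simplified illustration, not scrapli’s actual pattern:

```python
import re

# Illustrative prompt pattern: hostname, optional config-mode suffix,
# then "#" or ">" at the end of the output
PROMPT = re.compile(rb"^[\w.-]+(\(config[\w-]*\))?[#>]\s*$", re.MULTILINE)

def read_until_prompt(recv) -> bytes:
    """Read until the trailing prompt tells us the device is done talking."""
    buf = b""
    # the human-oriented encoding: the prompt IS the message delimiter
    while not PROMPT.search(buf):
        buf += recv()
    return buf

# Simulate a channel returning command output followed by the prompt:
chunks = iter([b"show version\nNXOS: version 9.3(5)\n", b"switch# "])
print(read_until_prompt(lambda: next(chunks)))
```

Swap the prompt regex for a standard delimiter and you essentially have the NETCONF reading loop – same mechanism, different audience for the encoding.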

With the understanding that there is in fact encoding, the claim that “SSH/CLIs are inherently unreliable” starts to seem a little odd… if there is encoding… and CLIs have been reliable for humans for… basically forever… then why would SSH/CLIs be unreliable? The encoding is there, and we have ways to get structured data, so we clearly need to pick this notion apart a bit more to figure out what the “problem” is…

  1. Historically, using things like expect was hard – we didn’t have good tooling to deal with the human-intended encoding; things were all very explicit and error-prone. I am obviously biased, but I believe this is a non-issue now that scrapli exists! Certainly there will be some platform/software that scrapli will not work well with, but even so, if you need to interact with a device over SSH, it is absolutely doable and reliable once the notion that the encoding exists settles in.

  2. Getting structured data is not as simple as via a “real” API. Even if you have a great SSH client (scrapli, of course) and your device supports displaying outputs in JSON or XML, there is still “stuff” surrounding the output that needs cleaning up: the command input, the trailing prompt, or whatever other bits are there to make things human-readable. In my opinion this is really a non-issue… basic string formatting (split!), or a simple regular expression, can snag a JSON or XML blob out of a larger string very easily; then it’s just a matter of json.loads or similar and you have your structured data.

  3. It seems to be “common knowledge” that CLIs are unreliable in the formatting/content of their output – meaning that from version 1.0 to version 1.1, things (command output/command syntax/etc.) are liable to change. Certainly this is possible… but how is that any different from an API changing? HTTP APIs can obviously change with any release, and YANG is no different – in fact, there is a specific note in the YANG GitHub repository for IOS-XR that explicitly says the models are generated from internal schema and that new features or bug fixes may sometimes cause backward incompatibility with the models! None of this is a bad thing for APIs, but the point is that, just like CLIs, APIs change – if you are not actively testing and validating your code (of whatever type) against the things you are interacting with, you are probably doing it wrong!
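As a concrete example of the cleanup described in point 2 – here with a hypothetical raw channel blob where the echoed command and trailing prompt surround a (heavily trimmed) JSON payload:

```python
import json

# Hypothetical raw channel output: echoed command, JSON payload, trailing prompt
raw = """switch# show int e1/1 | json
{"TABLE_interface": {"ROW_interface": {"interface": "Ethernet1/1", "state": "up"}}}
switch# """

# Basic string surgery: slice from the first "{" to the last "}" and hand
# the blob to json.loads -- structured data, no "real" API required
start = raw.index("{")
end = raw.rindex("}") + 1
data = json.loads(raw[start:end])

print(data["TABLE_interface"]["ROW_interface"]["state"])
```

Two lines of slicing and one json.loads – hardly the insurmountable obstacle it is sometimes made out to be.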

One final point on this “reliability”/“consistency” issue: if you have a modern IOS-XE device running, let’s say, IOS-XE 17.2.2 (the latest release at the time of writing) and also an old Cisco 2960 running legacy IOS 12.2, scrapli (for example) can absolutely be used to interact with both of those devices with zero modifications… Think about that! Over the past ten or so years, the CLI – or the “encoding” of the CLI – has been so consistent that a modern, recently developed SSH library can be used with both of these devices!

It’s around about this point that people begin to think I’m crazy (if they didn’t already). “So you’re telling me you’d rather do screen scraping than deal with an API?” – no, obviously not! My point is that screen scraping is not nearly as difficult, and nowhere near as unreliable, as it is made out to be. Moreover, the time to develop something that relies on commands you already know returning data you are already familiar with is much shorter than exploring a new API, or dealing with an API for new devices and falling back to screen scraping for others, etc. Perhaps most importantly, nearly every device has a CLI, but many do not have modern API interfaces.

So… with all of that said… what do I think about all these APIs? I think that we should not be running away from the CLI. CLIs should however adapt and be modernized.

The most obvious path forward for a better network automation experience is that CLIs should simply be “wrappers” around the API (of whatever flavor) – this would unify the developer and operator experiences. If an operator can show a developer a command or set of commands or even a script (“scraping” the CLI) that achieves XYZ outcome, a developer can then very easily migrate from the CLI to using the “API”.

In the end, while I don’t think I will be changing anybody’s opinion of screen scraping, or of what constitutes an API, I do think folks should take a step back and think about why an “API” is considered the only way forward, and why we all continually echo this sentiment – especially without any specificity about what that “API” should be. Again, of course a modern API is a good thing, but the CLI is, and will continue to be, the lowest common denominator when it comes to ways to interact with a device. There is value in the very short development cycle of automating what you already know (the CLI) – even if it is nothing more than a stopgap or a proof of concept before migrating to a more robust, more modern HTTP or NETCONF/YANG API. So, let’s all be a little more accepting of our old reliable friend the CLI, and let it be just another option in the tool belt of modern network automation.