Ginormous Scrapli Overhaul

It’s been about a year now that scrapli has been out there, and I have learned a tooooon from working on it (which was the original point anyway, so great job me!) and of course from continuing onward and upward with $dayjob things. It’s always been my intent to keep scrapli as nice and “clean” and tidy as possible… and now that I am one year smarter (allegedly), there was a lot of stuff in scrapli “internals” that I thought could and should be better. So I decided to try to improve what I could! This post is going to detail those things that have changed, why I thought they should be changed, and at least briefly how they have changed.

Before diving into all that good stuff, a few notes:

  1. The public API has changed very little. I have always been of the mind that I will iterate as quickly as possible on scrapli and that folks should pin their versions and check release notes; but, that said, of course if I monkey around with things too often that will probably piss people off! Even if I am building scrapli for my own educational purposes, part of the fun of it is hearing about people who are enjoying using it and are maybe themselves learning more about Python by getting into scrapli.
  2. This is still (at the time of writing, January 2021) not released on PyPI (or even merged into the main master or develop branches), so things may still change a bit.

With that out of the way, let’s get into it!

One of the core design attributes of scrapli has always been that it is made up of three primary components working together: the “driver” – what users interact with, the “connection” object if you will; the “transport” – the actual I/O; and lastly, the “channel” – the “smarts” that know when we can send and receive data via the transport. In general I think this separation of duties has been really great for the overall design/structure of scrapli; it has kept complexity and the actual count of lines of code relatively evenly distributed across the three main pillars, which is nice from a sanity perspective!
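
To make this concrete, here is a toy sketch of the three-pillar layout – the class names and method signatures below are purely illustrative, not scrapli’s actual API:

```python
class Transport:
    """Pure I/O: open/close the session, plus raw read/write -- nothing more."""

    def open(self) -> None: ...

    def read(self) -> bytes: ...

    def write(self, data: bytes) -> None: ...


class Channel:
    """The 'smarts': knows about prompts and when it is safe to send/fetch data."""

    def __init__(self, transport: Transport) -> None:
        self.transport = transport

    def send_input(self, command: str) -> bytes:
        # read until the prompt, write the command, read until the prompt
        # shows up again, and return everything captured in between
        ...


class Driver:
    """The 'connection' object that users actually interact with."""

    def __init__(self, transport: Transport) -> None:
        self.channel = Channel(transport)

    def send_command(self, command: str) -> bytes:
        return self.channel.send_input(command)
```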

This is all well and good; however, things got a little bit messy over time! In particular, the system and telnet transports had fairly significant chunks of code in them for handling “in channel” authentication – authentication that happens outside of an underlying transport driver. This “in channel” authentication process was all about reading the bytes on the wire, checking for login, password, or passphrase (for ssh keys) prompts, and responding to them appropriately. At first this made sense – some (most!) transports (ssh2/paramiko/asyncssh) handle all of this auth stuff for you, so it would make sense that the other transports (system/telnet/asynctelnet) should also handle this… right? Right! Wrong! This stopped being a good idea with the addition of the asynctelnet transport, when I realized that I effectively had to duplicate all of the “in channel” authentication code that lived in system/telnet into the asynctelnet transport. Moreover, I had the realization that this was actually a channel thing – it had to do with reading up until prompts, sending inputs, reading until the input was echoed back, and then sending a return… all channel jobs. The transport, at the end of the day, needs to do very little – simply provide a mechanism to read from and write to the device. So, all of this “in channel” authentication handling has been moved, you guessed it, to the channel!
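
As a rough illustration of what landed in the channel (this is just the shape of the idea, not scrapli’s actual implementation), “in channel” auth boils down to a read/match/respond loop driven entirely through the transport’s read/write primitives:

```python
import re

# illustrative prompt patterns -- real ones need to be more careful/robust
USERNAME_PROMPT = re.compile(rb"username:\s*$", flags=re.I | re.M)
PASSWORD_PROMPT = re.compile(rb"password:\s*$", flags=re.I | re.M)
PASSPHRASE_PROMPT = re.compile(rb"enter passphrase for key", flags=re.I)


def authenticate(transport, username: str, password: str, passphrase: str) -> None:
    """Read from the transport until an auth prompt appears, then answer it."""
    buf = b""
    while True:
        buf += transport.read()
        if USERNAME_PROMPT.search(buf):
            transport.write(username.encode() + b"\n")
            buf = b""  # keep reading; a password prompt usually follows
        elif PASSWORD_PROMPT.search(buf):
            transport.write(password.encode() + b"\n")
            return  # hand back to the channel to find the device prompt
        elif PASSPHRASE_PROMPT.search(buf):
            transport.write(passphrase.encode() + b"\n")
            return
```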

While we are on the subject of transports… early on I made the decision to have the transport system be “pluggable”. This, I believe, is and was a good call. Recently I added the “asynctelnet” transport – it took an afternoon, and zero parts of scrapli had to be modified; a big reason why this was so “easy” is that the transports are merely loosely coupled plugins. While I don’t think anyone other than me has built a scrapli transport, the flexibility of transports being pluggable is, I think, a net benefit to scrapli. One of the initial reasons for wanting to make the transports pluggable was simply to have “less stuff” in scrapli “core” – less stuff to maintain, to test, to deal with. One final reason for the transports being relegated to their own repositories was to ensure that scrapli “core” had as few dependencies as possible (ideally zero); perhaps an arbitrary goal, but a goal nonetheless!
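
The mechanics of “pluggable” are essentially just runtime imports – something along the lines of the sketch below (the module/attribute naming here is hypothetical, not scrapli’s exact convention):

```python
from importlib import import_module
from typing import Any


def load_transport_plugin(name: str) -> Any:
    """Resolve a transport name to its class at runtime.

    Because lookup happens by name, a new transport can live in its own
    package (e.g. a hypothetical scrapli-foo providing a 'scrapli_foo'
    module) and be used without modifying scrapli "core" at all.
    """
    module = import_module(f"scrapli_{name}.transport")
    return getattr(module, f"{name.capitalize()}Transport")
```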

Well… despite all of that, the transports have now all been moved back into scrapli core! Why?! Most obviously, I am the only one (and probably the only one for the foreseeable future) doing any kind of maintenance on the transports. This meant I had scrapli, scrapli-ssh2, scrapli-paramiko, scrapli-asyncssh, nornir-scrapli, and scrapli-netconf to worry about. I’m not complaining – this is my baby, so it’s all good – but it was a bit tedious. By collapsing the transports back into core I can more easily manage dev dependencies, keep things more consistent from a testing/linting perspective, and not have to do a big shebang release day when/if I need to make changes. This also gave me the opportunity to rectify some bad decision-making about how and where to import transport libraries. So, net/net: from a maintainer’s perspective there have been some internal improvements (how/where to import) and some logistical improvements (collapsing everything back into core), but from a user’s perspective nothing has changed here!
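
The import fix is the classic “defer the import until you actually need it” pattern – roughly like the sketch below (the extras name is an assumption), so core never imports a third-party transport library unless that transport is actually used:

```python
class ParamikoTransport:
    """Transport backed by paramiko; the dependency stays optional."""

    def open(self) -> None:
        try:
            # imported here, at open time, rather than at module import time;
            # users who never pick this transport never need paramiko installed
            from paramiko import SSHClient
        except ImportError as exc:
            raise RuntimeError(
                "paramiko transport requires paramiko; try "
                "`pip install scrapli[paramiko]`"  # extra name assumed
            ) from exc
        self.session = SSHClient()
        # ... connect/authenticate from here ...
```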

Next, and weirdly the thing that kicked off this huge overhaul… logging! Python logging is relatively approachable… until it isn’t anymore :) I have always endeavored to have useful logging, but I haven’t always succeeded. scrapli has never had a “channel log” – a log containing just the inputs/outputs going to and from the device. That data has existed, but it’s been intermingled with a bajillion other log messages… not ideal. Logs also have not always done the best job of containing extra contextual information about the connection that a log message relates to – making it hard to deal with logs if you are running lots of connections at once (via nornir, asyncio, or your own threading/multiprocessing solution). Lastly, basic examples of setting up some logging have proven extremely useful over time for troubleshooting issues; however, this always could have been better – both in terms of log output format and in how folks enable this extra logging.

To address these shortcomings, the first step was to use a LoggerAdapter – basically a “normal” Python logger instance, but with pre-assigned “extra” data (rather than having to add the “extra” argument to each log call). The “extras” in this case are simply the host and the port that the object represents. Additionally, there is a new driver argument, “logging_uid”, which adds a user-provided unique identifier to log messages – useful for those running lots of connections. Next up is the addition of an “enable_basic_logging” (name subject to change!) function in scrapli’s logging module. This function does exactly what you would expect based on the name: enable some basic, opinionated logging. You should generally not be using this outside of debugging/troubleshooting, as it takes a very opinionated approach to formatting all logging messages and will likely stomp on other libraries’ logging… however, I think that as a maintainer this option is a great easy button for folks to get some good logging captured for attaching to issues and things like that. This can also serve as a base for users to build their own logging formatters/handlers (for scrapli or otherwise).
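
For the curious, the stdlib LoggerAdapter pattern looks roughly like this – a minimal sketch, not scrapli’s exact field names or formatting:

```python
import logging


class ConnectionAdapter(logging.LoggerAdapter):
    """Prefix every record with host/port (and an optional user-supplied uid)."""

    def process(self, msg, kwargs):
        host = self.extra.get("host", "")
        port = self.extra.get("port", "")
        uid = self.extra.get("logging_uid", "")
        prefix = f"{host}:{port}" + (f":{uid}" if uid else "")
        return f"{prefix} | {msg}", kwargs


logging.basicConfig(level=logging.DEBUG)
logger = ConnectionAdapter(
    logging.getLogger("scrapli.driver"),
    extra={"host": "172.18.0.11", "port": 22, "logging_uid": "lab-sw1"},
)
# message body becomes: "172.18.0.11:22:lab-sw1 | opening connection"
logger.debug("opening connection")
```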

Continuing on… as I mentioned, one of the core reasons for scrapli’s existence was to have a project that I could learn from, and one of the things that I had basically near-zero experience with at scrapli’s onset was testing. So, testing has always been an important topic for me, and it has gone through quite a few complete overhauls over the past year! And… here we are again :)

scrapli has, from very early on, had two primary types of tests: “unit” and “functional”. Unit tests are what you would expect from a normal software project – tests that assert that chunks of code do the things they should do. The functional tests are a different beast, though! They require a “real” (virtual, but… real as in a real JunOS instance, for example) device to connect to. While I am sure that nobody has taken the time to do so, I have endeavored to make it so that my functional testing can be reproduced by anybody willing to spend a bit of time, energy, and CPU cycles (thanks to the use of vrnetlab). More recently, I also created a “mock” ssh server using asyncssh – this server acted just like a “normal” iosxe device, but was able to be run wherever – in GitHub Actions containers, for example. This final addition gave me even more confidence that scrapli is doing what I think it should be doing. Despite my affection for the mock ssh server setup, it has actually been removed from scrapli for the time being. The reason is twofold: firstly, I took the time, and the things I’ve learned over the past year, to greatly improve the existing unit test suite – covering far more of the code base in a meaningful way than before; and secondly, there are some hopefully big things coming soon in relation to this – that’s all I’ll write on this subject for now, but stay tuned :)
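
For anyone curious what the mock server approach looks like in general, here is a heavily simplified asyncssh sketch (the general technique only – not the implementation that was removed):

```python
import asyncio

import asyncssh


class NoAuthServer(asyncssh.SSHServer):
    def begin_auth(self, username: str) -> bool:
        return False  # returning False means no authentication is required


async def handle_session(process: asyncssh.SSHServerProcess) -> None:
    # pretend to be a device: show a prompt, return canned output per command
    process.stdout.write("csr1000v#")
    async for line in process.stdin:
        command = line.strip()
        if command == "exit":
            break
        process.stdout.write(f"{command}\r\n... canned output here ...\r\ncsr1000v#")
    process.exit(0)


async def main() -> None:
    await asyncssh.listen(
        "localhost",
        2222,
        server_host_keys=["ssh_host_key"],  # a pre-generated host key file
        server_factory=NoAuthServer,
        process_factory=handle_session,
    )
    await asyncio.Future()  # serve forever


asyncio.run(main())
```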

Lastly… I swear… docs! I generally am fairly… verbose… would you have guessed? In keeping with that theme, I have tried to have verbose documentation – documentation that explains the why behind things, so that folks can more clearly understand the thinking behind doing things in certain ways, which ultimately, I hope, helps them find scrapli more effective and more enjoyable to work with. Docs up to this point have come in two main flavors: 1) the README/manually written docs, and 2) the auto-generated docs based on docstrings. Personally, the auto-generated docs (thanks to pdoc3) have been the most useful for reference, as I, of course, am pretty familiar with scrapli! For most other folks, I suspect the opposite is true: the verbose documentation in the README is probably most important. The scrapli docs, while some folks have said nice things about them, are, in my opinion, not super sexy! Sure, there is a lot there, and as I said, I think/hope that them being fairly wordy helps to explain things, but at the end of the day it was really just a README on GitHub Pages plus the auto-generated API docs. We can do better!!

In keeping with the times, I have now moved the documentation to MkDocs and, of course, the Material theme! The README has been slimmed WAY down… it just contains the bare minimum now, and most importantly it contains a link to the new doc site (still hosted with GitHub Pages). The new doc site contains pretty much all the same information, albeit a bit tidied up, and importantly, broken out more so as to be, I think, a lot easier on the eyes.

One thing I tend to very much dislike about the MkDocs setup that many modern projects use is that they don’t tend to have good API docs – sure, they have the wordy part down, but to actually find the arguments to method XYZ I end up going to the source code a lot. Not a huge deal, but I do like decent docs that are searchable in a friendlier format, like MkDocs provides. So, I spent far too long obsessing over this, and settled on a reasonable solution: the “old” pdoc3 documentation is now embedded as pages in the MkDocs site! This gives us the best of both worlds, I think/hope!
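
The wiring for this is nothing exotic – the pdoc3 output just gets rendered to markdown and referenced from the MkDocs nav, something like the (illustrative, not verbatim) config below:

```yaml
# mkdocs.yml -- illustrative layout only, not scrapli's exact config
site_name: scrapli
theme:
  name: material
nav:
  - Home: index.md
  - User Guide:
      - Basic Usage: user_guide/basic_usage.md
  - API Docs:
      # pdoc3-generated markdown dropped in as regular pages
      - Driver: api_docs/driver.md
      - Channel: api_docs/channel.md
      - Transport: api_docs/transport.md
```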

There are even more changes in this fairly substantial overhaul, but for everyone’s sanity I’ll stop now. You can check out the changelog at the new site here. I really hope folks will chime in on Twitter/GitHub/Network to Code Slack and let me know what they think and what else could be improved – and, perhaps most importantly, take the “overhaul” branch for a spin and let me know if anything that worked before is not working now!