In this article I describe how easily you can use BCC utilities (eBPF) to inspect tools on your system, using ansible as an example.
After my first article about eBPF (Unlocking eBPF power), which was a just-for-fun type of article, I wanted to start using eBPF in my daily work. I remember a situation from one of my interviews, where I was asked: "How would you approach an unknown binary or script?" They wanted to know what tools I could use to get the most information about it and possibly perform some basic troubleshooting. I remember it quite vividly, mainly because I failed it so badly. After reading this article you should be in a much better position when dealing with a similar kind of problem.
If this is the first time you have heard of eBPF, I recommend going through (Unlocking eBPF power), where I describe my first contact with it. For a quick recap: "eBPF is a functionality of the Linux kernel that allows lightweight execution of user code in response to kernel events. The events can be hardware/software events, tracing events both static (compiled into code) and dynamic (attached at runtime), etc. The code itself is limited in the sense that it is guaranteed to finish (no loops) and is verified before being loaded into the kernel."
Do I need to be a badass kernel developer to use eBPF?
NO! And that is what I'll try to show in this article. Reading comments about eBPF, I see people falling into two groups: those who don't know eBPF yet, and those who use it for some serious stuff and describe it in complicated language. This can be very intimidating. The concept of building a bridge between user and kernel space is very powerful, and there are many complex projects around eBPF. The C language is also very powerful, but how often do you think about the internals of grep or curl? Guess what: eBPF also has a project called BCC (BPF Compiler Collection) that ships around 100 tools. To be honest, BCC is made for writing eBPF programs and the tools part is "just" an addition.
I will be using the BCC toolkit for inspecting ansible, which is what I'm currently focused on at work, but everything described here can be used for any other Python-based tool, and most of it for practically any executable.
Official packages for Ubuntu seem to be outdated, so I followed the installation from source as described here.
Since BCC has a Python front-end, most of the tools are essentially Python scripts and need a valid PYTHONPATH environment variable. You can export it once (in .bashrc/.zshrc) to avoid missing-module problems. Assuming that you've installed it in your home directory, you should be fine with the export below.
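Something like the following should do (the exact path depends on where your source build placed the Python bindings, so adjust it if yours differs):

```shell
# Make the bcc Python module importable; the path assumes a source build
# under ~/bcc - change it to wherever your build put the bindings.
export PYTHONPATH=$PYTHONPATH:$HOME/bcc/build/src/python/bcc-python3
```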
To be perfectly honest, the ansible execution in my case is wrapped in a script that does some extra steps. I don't know exactly which steps, but this can be checked using execsnoop, which traces all new processes.
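A minimal session could look like this (the tools path assumes the source install from above, and the wrapper script name is just a placeholder for whatever you run):

```shell
# Terminal 1: trace every new process and its full argument list, system-wide.
sudo ~/bcc/tools/execsnoop.py

# Terminal 2: run the wrapped ansible execution as usual.
./run-playbook.sh   # placeholder for your wrapper script
```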
You can easily get some info about what is happening, there is repository checkout, ansible roles download, playbook execution and it is using dynamic inventory in form of script. I’ve trimmed the output on purpose to hide some of the exact values, but with full output you would get repo names, executable versions, full paths to files and all other good stuff.
Don’t you have this uncomfortable thought for a second that whenever you install something new on your computer there is a chance that someone is gathering some info about you and sending it out?
Well, you can have a similar feeling whenever you are working on code that you don't know yet. Ansible is all about handling the configuration of remote systems, but the target inventory is not the only system it connects to. Playbooks can first gather input from a configuration store, a secret store, infrastructure, etc. You can track all of those using tcpconnect. I changed the IPs in the output to some meaningful names.
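In its simplest form you just run it system-wide; once you know the PID of the run you care about, you can narrow it down (the pgrep pattern is an assumption about how your playbook is invoked):

```shell
# Show every TCP connection attempt: PID, comm, source/destination and port.
sudo ~/bcc/tools/tcpconnect.py

# Or limit the output to a single ansible-playbook run by PID.
sudo ~/bcc/tools/tcpconnect.py -p "$(pgrep -f ansible-playbook | head -n1)"
```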
So we know what is going on under the hood of the tool and where it is connecting to; it would also be great to know which files are being used. This is exactly what opensnoop does. Not sure which ansible.cfg is effectively used? Use opensnoop. Looking for the source files of some variable? Use opensnoop. Or maybe you are just refactoring your ansible code and trying to get rid of some unused roles and plays - by seeing which files are read, you can easily tell what is left.
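The invocation mirrors the previous tools; filtering by process name or PID keeps the noise down:

```shell
# Trace open() calls from anything whose name starts with "ansible".
sudo ~/bcc/tools/opensnoop.py -n ansible

# Or trace everything a specific playbook run touches.
sudo ~/bcc/tools/opensnoop.py -p "$(pgrep -f ansible-playbook | head -n1)"
```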
It is worth noting that you might observe limitations of some of those tools, which manifest themselves in entries like "Possibly lost X samples". This is a sign that too much data is coming into the script from BPF and the user-space side is not keeping up. It usually happens when you trace too many calls; try limiting the tool to a specific PID or process name. If that is still too much, the only option is to filter events on the BPF side. The scripts have their BPF snippets defined as strings, so you can try editing them after some trial and error. I did that for opensnoop to limit the files to the specific directory that contained all the playbooks. See the commented lines below.
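I no longer have the exact diff, so the snippet below is a reconstruction of the kind of edit I mean, not the literal code: opensnoop.py keeps its BPF program in a Python string, and a byte-wise prefix check inserted there drops uninteresting files while still in kernel space (the "/opt" prefix is an example - use your playbooks directory):

```python
# Sketch of C lines to splice into opensnoop.py's BPF program string.
bpf_filter_snippet = r"""
    // Read the first bytes of the filename and bail out early unless it
    // starts with "/opt". The comparison is byte by byte because the
    // verifier rejects ordinary string functions on user pointers.
    char head[4] = {};
    bpf_probe_read_user(&head, sizeof(head), args->filename);
    if (head[0] != '/' || head[1] != 'o' || head[2] != 'p' || head[3] != 't')
        return 0;
"""
```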
Sometimes you want to get a little more out of the underlying code. BCC has the ucalls utility, which can track static tracing markers for different high-level languages, including Python (pythoncalls is a wrapper around ucalls). It tracks the number of calls to each function and their latency. The catch is that you need a Python built from source with the --with-dtrace flag to get the USDT (User Statically Defined Tracing) probes. Bear with me, it is not that complicated.
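The rough procedure I mean looks like this (the version number is only an example; pick whichever release matches your environment):

```shell
# sys/sdt.h is needed for the USDT probe macros.
sudo apt-get install -y systemtap-sdt-dev

# Fetch, configure with probes enabled, build, and install to /usr/local.
wget https://www.python.org/ftp/python/3.9.16/Python-3.9.16.tgz
tar xzf Python-3.9.16.tgz
cd Python-3.9.16
./configure --with-dtrace
make -j"$(nproc)"
sudo make install   # leaves the distro's /usr/bin/python3 untouched
```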
You can verify that everything is fine by using another BCC tool called tplist, which displays kernel tracepoints and the USDT probes we just enabled. ucalls uses the function__entry and function__return probes.
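Pointed at the freshly built interpreter, it should list the CPython markers (the binary path matches the /usr/local install from the previous step):

```shell
# List USDT probes compiled into the interpreter; expect entries like
# python:function__entry and python:function__return among them.
~/bcc/tools/tplist.py -l /usr/local/bin/python3.9 | grep function__
```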
Now, when you start your script again, you should see nice statistics about the functions used. There are various flags you can use with ucalls; in the following example I just used a simple call count and latency.
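That combination looks like this (again, the pgrep pattern is an assumption about how the playbook run is named):

```shell
# Per-method call counts plus latency (-L) for a running Python process;
# -l python selects the language, the positional argument is the PID.
sudo ~/bcc/tools/ucalls.py -l python -L "$(pgrep -f ansible-playbook | head -n1)"
```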
The last thing missing from the Python-script inspection perspective is the environment variables. This one is a bit tricky, because there are many possible ways for programs to get environment variables from the system. For C-based tools it is usually the getenv function from libc. So you could trace it using ltrace -e getenv <script>, or I should rather say ~/bcc/tools/trace.py 'r:c:getenv "%s=%s", arg1, retval' to trace return probes (r:) of the libc library (c:) for the getenv function. We want to print the first argument and the return value of the function, which correspond to the variable name and value - it might need some tweaking, as the output does not fit into the trace output columns, but you get the idea. Python handles this differently: it loads the environment variables into a dictionary kept in a special _Environ object.
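A small stdlib-only illustration of what that means in practice:

```python
import os

# os.environ is not a plain dict but an os._Environ mapping, filled once at
# interpreter start-up; later reads go through its __getitem__, not libc's
# getenv - which is why ltrace on getenv sees nothing from a Python script.
print(type(os.environ).__name__)   # _Environ
```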
Since we are already building Python with USDT probes, why not add another one dynamically to the os.py module? I will use libstapsdt and its Python wrapper, called python-stapsdt. This library dynamically creates a shared library that exposes USDT probes, so that tools like trace can see markers to attach to. The code that needs to be added to os.py (/usr/local/lib/python3.9/os.py in my case) can be found below.
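Here is my reconstruction of that snippet; it follows the python-stapsdt README, and the provider/probe names as well as the exact patch location inside _Environ are my own choices rather than anything canonical:

```python
# Added near the top of os.py, after the stdlib imports.
import stapsdt

provider = stapsdt.Provider("pythonenv")        # provider name is arbitrary
getenvProbe = provider.add_probe(
    "getenv", stapsdt.ArgTypes.uint64, stapsdt.ArgTypes.uint64)
provider.load()                                 # generates the probe .so

# ...and inside class _Environ, at the start of __getitem__(self, key):
#
#     if self.encodekey("PYTHON_INSPECT") in self._data:
#         while not getenvProbe.is_enabled:     # wait for the tracer to attach
#             pass
#     getenvProbe.fire(key, self._data.get(self.encodekey(key)))
```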
I'm importing the stapsdt module, creating a getenvProbe, and firing it as the first thing in the env-variable dictionary getter. So whenever Python code tries to read an environment variable, my probe fires. The extra while(not probe.is_enabled): loop waits for the tracer to attach to the probe whenever the PYTHON_INSPECT environment variable is set. This is needed because environment variables are usually read very early, and the BCC trace tool is often not quick enough, ending up attached only after all the variables have already been checked. Now you can see all the environment variables being looked up, and their values if set.
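With the patched interpreter in place, attaching looks roughly like this. libstapsdt materialises the probes in a generated shared library under /tmp, so the path varies per run (the XXXXXX part below is a placeholder, and "pythonenv"/"getenv" must match whatever provider/probe names you used in the os.py patch):

```shell
# Terminal 1: start the traced program with the opt-in variable set.
PYTHON_INSPECT=1 ansible-playbook site.yml

# Terminal 2: attach trace to the generated USDT probe by PID;
# check /tmp or /proc/<pid>/maps for the real library path.
sudo ~/bcc/tools/trace.py -p "$(pgrep -f ansible-playbook | head -n1)" \
    'u:/tmp/pythonenv-XXXXXX.so:getenv "%s", arg1'
```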
I hope I was able to show how easy it is to use eBPF in your daily work. Obviously there are many more tools out there, and writing your own script using BCC is probably the next step. Adding USDT probes to your code opens the door to many possibilities around observability, auditing and more - I definitely need to think of a little project around that.