Inspecting python tools using eBPF

In this article I describe how easily you can use BCC utilities (eBPF) to inspect tools on your system, using Ansible as an example.

After my first article about eBPF (Unlocking eBPF power), which was a just-for-fun type of article, I wanted to start using eBPF in my daily work. I remember a situation from one of my interviews where I was asked ‘How would you approach an unknown binary or script?’. They wanted to know what tools I could use to get the most information about it and possibly perform some basic troubleshooting. I remember it quite vividly, mainly because I failed it so badly. After reading this article you should be in a much better position when dealing with a similar type of problem.

Wait, what is eBPF again?

If you are hearing about eBPF for the first time, I would recommend going through (Unlocking eBPF power), where I describe my first contact with eBPF. For a quick recap: “eBPF is a functionality of the Linux kernel that allows lightweight execution of user code in response to kernel events. The events can be hardware/software events, tracing events both static (compiled into code) and dynamic (attached at runtime), etc. The code itself is limited in the sense that it is guaranteed to finish (no loops) and is verified before being loaded into the kernel.”

Back to table of contents

Do I need to be a badass kernel developer to use eBPF?

NO! And this is what I’ll try to show in this article. Reading comments about eBPF, I see people falling into two groups: those who don’t know eBPF yet, and those using it for serious stuff and describing it in complicated language. This can be very intimidating. The concept of building a bridge between user and kernel space is very powerful and there are many complex projects around eBPF. The C language is also very powerful, but how often do you think about grep or curl internals? Guess what, eBPF also has a project called BCC (BPF Compiler Collection) that ships around 100 ready-made tools. To be honest, BCC is made for writing eBPF programs and the tools are “just” an addition.

I will be using the BCC toolkit to inspect Ansible, which is what I’m focused on at work currently, but everything described here can be used for any other Python-based tool, and most of it for practically any executable.

Back to table of contents

Installation

The official packages for Ubuntu seem to be outdated, so I followed the installation from source as described here:

sudo apt-get -y install bison build-essential cmake flex git libedit-dev \
  libllvm6.0 llvm-6.0-dev libclang-6.0-dev python zlib1g-dev libelf-dev

git clone https://github.com/iovisor/bcc.git
mkdir bcc/build; cd bcc/build
cmake ..
make
sudo make install
cmake -DPYTHON_CMD=python3 .. # build python3 binding
pushd src/python/
make
sudo make install
popd

Since BCC has a Python front-end, most of the tools are essentially Python scripts and need a valid PYTHONPATH environment variable. You can export it once (in .bashrc/.zshrc) to avoid missing-module problems. Assuming that you’ve installed it in your home directory, the export below should be enough.

export PYTHONPATH=$PYTHONPATH:~/bcc/src/python/
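
To quickly check that the binding works, you can run a minimal hello-world style script (a sketch along the lines of the BCC tutorial) that attaches a kprobe to the clone syscall and prints a line whenever it fires:

from bcc import BPF

# minimal sanity check: print a line every time a process calls clone()
prog = r"""
int hello(void *ctx) {
    bpf_trace_printk("Hello from eBPF!\n");
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")
b.trace_print()  # Ctrl-C to stop

Run it as root and start any program in another terminal; lines should show up in the trace output.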

Back to table of contents

Inspection

execsnoop

To be perfectly honest, the Ansible execution in my case is wrapped in a script that does some extra steps. I don’t know exactly which steps, but this can be checked using execsnoop, which traces all new processes.

$ ./the_ansible_wrapper.sh & sudo ~/bcc/tools/execsnoop.py
PCOMM            PID    PPID   RET ARGS
git              2345   2333     0 /usr/bin/git checkout develop
(...)
ansible-galaxy   2333   2310     0 ansible-galaxy install -r ansible/playbooks/requirements.yml -f -v
(...)
ansible-playboo  2344   2331     0 ansible-playbook -i inventory.py some_playbook.yml
(...)
python3.9        2356   2351     0 /usr/local/bin/python3.9 inventory.py --list
(...)

You can easily get some information about what is happening: there is a repository checkout, an Ansible roles download, a playbook execution, and it is using a dynamic inventory in the form of a script. I’ve trimmed the output on purpose to hide some of the exact values, but with the full output you would get repo names, executable versions, full paths to files and all other good stuff.
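
By the way, the idea behind execsnoop is not magic. A stripped-down sketch (nowhere near the real tool, which also collects arguments, PIDs and return codes) fits in a dozen lines by hooking the execve tracepoint:

from bcc import BPF

# print the path of every binary passed to execve()
prog = r"""
TRACEPOINT_PROBE(syscalls, sys_enter_execve) {
    char fname[128] = {};
    // args->filename is a user-space pointer to the binary path
    bpf_probe_read_user_str(&fname, sizeof(fname), args->filename);
    bpf_trace_printk("exec: %s\n", fname);
    return 0;
}
"""

b = BPF(text=prog)
print("Tracing execve()... Ctrl-C to stop")
b.trace_print()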

Back to table of contents

tcpconnect

Don’t you sometimes have this uncomfortable thought that whenever you install something new on your computer, there is a chance that someone is gathering some info about you and sending it out?

Well, you can have a similar feeling whenever you are working with code that is not yet known to you. Ansible is all about handling the configuration of remote systems, but the target inventory is not the only set of systems it connects to. Playbooks can first gather input from a configuration store, a secret store, infrastructure, etc. You can track all of those connections using tcpconnect. I changed the IPs in the output to some meaningful names.

$ sudo ~/bcc/tools/tcpconnect.py &
$ ./the_ansible_wrapper.sh > /dev/null
PID    COMM         IP SADDR            DADDR            DPORT
6560   the_ansible_wrapper.sh   4  my_machine     vault    443 
6569   ssh          4  my_machine     github.com     22  
6586   ssh          4  my_machine     github.com     22
(...)
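
Under the hood tcpconnect attaches to the kernel's tcp_v4_connect and tcp_v6_connect functions. A bare-bones sketch that only reports which process attempts an IPv4 connection (the real tool additionally extracts addresses and ports) could look like this:

from bcc import BPF

# report every call to tcp_v4_connect; the trace pipe line already
# carries the calling process name and PID, so the BPF side stays trivial
prog = r"""
int trace_connect(void *ctx) {
    bpf_trace_printk("tcp_v4_connect\n");
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event="tcp_v4_connect", fn_name="trace_connect")
b.trace_print()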

Back to table of contents

opensnoop

So we know what is going on under the hood of the tool and where it is connecting to; it would also be great to know which files are being used. This is exactly what opensnoop does. Not sure which ansible.cfg is effectively used? Use opensnoop. Looking for the files some variables come from? Use opensnoop. Or maybe you are just refactoring your Ansible code and trying to get rid of some unused roles and plays: by seeing which files are read you can easily tell what is left.

$ ./the_ansible_wrapper.sh & sudo ~/bcc/tools/opensnoop.py -n ansible-pl
ID    COMM               FD ERR PATH
15638  ansible-playboo     3   0 /tmp/ansible_pdrczx2p/vault.json
(...)
15638  ansible-playboo     6   0 /tmp/ansible_pdrczx2p/ansible/group_vars/all.yaml
(...)
15638   ansible-playboo     4   0 /home/ansible/.ansible.cfg
(...)

It is worth noting that you might observe limitations of some of those tools, which manifest themselves with entries like “Possibly lost X samples”. This is a sign that too much data is coming into the script from BPF and the user-space script is not keeping up. Usually it happens when you start tracing too many calls; try limiting the tool to a specific PID or process name. If it is still too much, the only option is to filter items on the BPF side. The scripts have their BPF snippets defined as strings, so you can try editing them after some trial and error. I did that for opensnoop to limit the files to a specific directory that contained all the playbooks. See the commented lines below.

int trace_return(struct pt_regs *ctx)
{
    u64 id = bpf_get_current_pid_tgid();
    struct val_t *valp;
    struct data_t data = {};
    // defining target directory
    // const char searchString[] = "/tmp/ansible_";
    u64 tsp = bpf_ktime_get_ns();

    valp = infotmp.lookup(&id);
    if (valp == 0) {
        // missed entry
        return 0;
    }
    bpf_probe_read_kernel(&data.comm, sizeof(data.comm), valp->comm);
    bpf_probe_read_user(&data.fname, sizeof(data.fname), (void *)valp->fname);
    // compare the beginning of the path with the target prefix;
    // if it doesn't match, leave the function without reporting the event
    // if (memcmp(&data.fname, searchString, 8) != 0) {
    //    return 0;
    // }
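
If editing the BPF snippet feels too fiddly, a cheaper-to-write (but less efficient) alternative is to filter in user space, since every event crosses into Python anyway. Below is a small self-contained sketch in the spirit of opensnoop, not the real tool: it traces the openat syscall and keeps only paths under the assumed /tmp/ansible_ prefix on the Python side.

from bcc import BPF

# trace openat() and filter the reported paths in user space
prog = r"""
TRACEPOINT_PROBE(syscalls, sys_enter_openat) {
    char fname[256] = {};
    bpf_probe_read_user_str(&fname, sizeof(fname), args->filename);
    bpf_trace_printk("%s\n", fname);
    return 0;
}
"""

b = BPF(text=prog)
print("Tracing openat()... Ctrl-C to stop")
while True:
    try:
        # one parsed line from the kernel trace pipe
        task, pid, cpu, flags, ts, msg = b.trace_fields()
    except KeyboardInterrupt:
        break
    # user-space filtering: keep only files under the playbook directory
    if msg.startswith(b"/tmp/ansible_"):
        print(task.decode(), pid, msg.decode())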

Back to table of contents

pythoncalls

Sometimes you want to get a little more out of the underlying code. BCC has the ucalls utility, which can track static tracing markers for different high-level languages, including Python (pythoncalls is a wrapper around ucalls). It allows tracking the number of calls to a function and their latency. The catch is that you need Python built from source with the --with-dtrace flag to add the USDT (User Statically Defined Tracing) probes. Bear with me, it is not that complicated.

sudo apt-get install systemtap-sdt-dev
wget https://www.python.org/ftp/python/3.9.0/Python-3.9.0.tgz
tar -xf Python-3.9.0.tgz
cd Python-3.9.0/
./configure --enable-optimizations --with-dtrace
# we want to install it locally next to existing Python
sudo make altinstall

You can verify that everything is fine using another BCC tool called tplist, which displays kernel tracepoints and the USDT probes we just enabled. ucalls uses the function__entry and function__return probes.

$ ~/bcc/tools/tplist.py -l python3.9
/usr/local/bin/python3.9 python:line
/usr/local/bin/python3.9 python:function__entry
/usr/local/bin/python3.9 python:function__return
/usr/local/bin/python3.9 python:import__find__load__start
/usr/local/bin/python3.9 python:import__find__load__done
/usr/local/bin/python3.9 python:audit
/usr/local/bin/python3.9 python:gc__start
/usr/local/bin/python3.9 python:gc__done

Now, when you start your script again, you should see nice statistics of the called functions. There are different flags you can use with ucalls; in the following example I just used a simple call count and latency.

$ ./the_ansible_wrapper.sh & sudo ~/bcc/tools/pythoncalls.sh -Lm $!
METHOD                                              # CALLS TIME (ms)
(...)
/usr/local/lib/python3.9/shutil.py.rmtree                 1  65.35
/usr/local/lib/python3.9/logging/__init__.py.makeRecord     1089  66.35
/usr/local/lib/python3.9/ssl.py.read                      1  79.20
/usr/local/lib/python3.9/ssl.py.recv_into                 1  79.22
/usr/local/lib/python3.9/socket.py.readinto               1  79.24
/usr/local/lib/python3.9/http/client.py._read_status        1  79.27
/usr/local/lib/python3.9/http/client.py.begin             1  79.85
/usr/local/lib/python3.9/http/client.py.getresponse        1  79.94
(...)
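
If you are curious what ucalls actually attaches to, the same function__entry probe can be consumed directly with BCC's USDT support. A sketch that only prints the name of every Python function entered in an already running python3.9 process (pass its PID as the first argument) could look like this:

import sys
from bcc import BPF, USDT

pid = int(sys.argv[1])  # PID of a python3.9 process built with --with-dtrace

prog = r"""
#include <uapi/linux/ptrace.h>

int on_entry(struct pt_regs *ctx) {
    char func[64] = {};
    u64 addr = 0;
    // the second argument of python:function__entry is the function name
    bpf_usdt_readarg(2, ctx, &addr);
    bpf_probe_read_user_str(&func, sizeof(func), (void *)addr);
    bpf_trace_printk("%s\n", func);
    return 0;
}
"""

usdt = USDT(pid=pid)
usdt.enable_probe(probe="function__entry", fn_name="on_entry")
b = BPF(text=prog, usdt_contexts=[usdt])
b.trace_print()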

Back to table of contents

trace

The last thing missing from the Python script inspection perspective is the environment variables. This one is a bit tricky, because there are many possible ways for a program to get environment variables from the system. For C-based tools it is usually the getenv function from libc. So you could trace it using ltrace -e getenv <script>, or I should rather say ~/bcc/tools/trace.py 'r:c:getenv "%s=%s", arg1, retval' to trace return probes (r:) of the libc library (c:) for the getenv function. We want to print the first argument and the return value, which correspond to the env var name and value - it might need some tweaking, as the output does not fit into the trace output columns, but you get the idea. Python handles this differently: it loads the environment variables into a dictionary kept in a special _Environ object.
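
Before dealing with the Python side, the libc case is also easy to reproduce with the BCC Python API instead of the trace one-liner. Here is a minimal sketch that only logs the name of every variable a process asks getenv() for (reading the value reliably would need an entry/return probe pair, so it is left out):

from bcc import BPF

prog = r"""
#include <uapi/linux/ptrace.h>

int on_getenv(struct pt_regs *ctx) {
    char name[64] = {};
    // the first argument of getenv() is the variable name
    bpf_probe_read_user_str(&name, sizeof(name), (void *)PT_REGS_PARM1(ctx));
    bpf_trace_printk("getenv(%s)\n", name);
    return 0;
}
"""

b = BPF(text=prog)
# "c" resolves to the libc shared object
b.attach_uprobe(name="c", sym="getenv", fn_name="on_getenv")
b.trace_print()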

Since we are already building Python with USDT probes, why not add another one dynamically to the os.py module? I will use libstapsdt and its Python wrapper called python-stapsdt. This library dynamically creates a shared library that exposes USDT probes, so that tools like trace can see markers they can attach to. The code that needs to be added to os.py (/usr/local/lib/python3.9/os.py in my case) can be found below.

import importlib.util
spec = importlib.util.spec_from_file_location("stapsdt", "/home/ansible/.local/lib/python3.9/site-packages/stapsdt.py")
stapsdt = importlib.util.module_from_spec(spec)
spec.loader.exec_module(stapsdt)
provider = stapsdt.Provider("pythonapp")
probe = provider.add_probe("getenvProbe", stapsdt.ArgTypes.uint64, stapsdt.ArgTypes.uint64)
provider.load()


class _Environ(MutableMapping):
    def __init__(self, data, encodekey, decodekey, encodevalue, decodevalue):
        self.encodekey = encodekey
        self.decodekey = decodekey
        self.encodevalue = encodevalue
        self.decodevalue = decodevalue
        self._data = data

    def getenv_probe(self, key):
        # fire the probe only when PYTHON_INSPECT is set, and wait for a
        # tracer to attach so that early getenv() calls are not missed
        if self.encodekey("PYTHON_INSPECT") in self._data:
            while not probe.is_enabled:
                import time
                time.sleep(1)
            try:
                value = self._data[self.encodekey(key)]
                probe.fire(key, self.decodevalue(value))
            except KeyError:
                probe.fire(key, "")

    def __getitem__(self, key):
        self.getenv_probe(key)
        try:
            value = self._data[self.encodekey(key)]
        except KeyError:
            # raise KeyError with the original key value
            raise KeyError(key) from None
        return self.decodevalue(value)

I’m importing the stapsdt module, creating getenvProbe, and firing it as the first thing in the env variable dictionary getter. So whenever Python code tries to read an environment variable, my probe will be fired. The extra while not probe.is_enabled: loop waits for the tracer to attach to the probe whenever the PYTHON_INSPECT env variable is set. This is needed because environment variables are usually read very early, and the BCC trace tool is often not quick enough, ending up attached after all the variables have already been checked. Now you can see all the possible env variables and their values, if set.

$ ./the_ansible_wrapper.sh & sudo ~/bcc/tools/trace.py -p $! 'u::getenvProbe "%s=%s", arg1, arg2'
(...)
6686    6686    ansible-playboo getenvProbe      ANSIBLE_SKIP_TAGS=
6686    6686    ansible-playboo getenvProbe      ANSIBLE_USE_PERSISTENT_CONNECTIONS=
6686    6686    ansible-playboo getenvProbe      ANSIBLE_PRECEDENCE=
6686    6686    ansible-playboo getenvProbe      ANSIBLE_YAML_FILENAME_EXT=
6686    6686    ansible-playboo getenvProbe      ANSIBLE_NETCONF_SSH_CONFIG=
6686    6686    ansible-playboo getenvProbe      ANSIBLE_STRING_CONVERSION_ACTION=
6686    6686    ansible-playboo getenvProbe      ANSIBLE_VERBOSE_TO_STDERR=
6686    6686    ansible-playboo getenvProbe      LANG=C.UTF-8
6686    6686    ansible-playboo getenvProbe      PATH=/home/ansible/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin
(...)
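
If the probe stays silent, it is easier to debug it outside of Ansible first. A standalone smoke test using the same python-stapsdt calls as the os.py patch above (same provider and probe names) could look like this:

import time
import stapsdt

# same provider/probe names as in the os.py patch
provider = stapsdt.Provider("pythonapp")
probe = provider.add_probe("getenvProbe",
                           stapsdt.ArgTypes.uint64, stapsdt.ArgTypes.uint64)
provider.load()

print("Firing getenvProbe every second, attach with:")
print("  sudo ~/bcc/tools/trace.py -p <pid> 'u::getenvProbe \"%s=%s\", arg1, arg2'")
while True:
    probe.fire("SOME_VAR", "some_value")
    time.sleep(1)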

Back to table of contents

Conclusion

I hope I was able to show how easy it is to use eBPF in your daily work. Obviously there are a lot more tools out there, and writing your own script using BCC is probably the next step. Adding USDT probes to the code opens the door to many possibilities regarding observability, auditing and more - I definitely need to think of a little project around that.

Back to table of contents