Summary

I currently have a patch up for review that adds a new drop-in replacement for ovs_lib, one that uses Open vSwitch's Python bindings to make OVSDB calls directly instead of shelling out to ovs-vsctl. Here is the spec that was approved for Juno, which I will need to update for Kilo.

Both the current ovs_lib and ovs-vsctl seem to scale quadratically with the number of ports on a system, whereas ovs_lib2 scales linearly.

Please take a look at the review and make suggestions. There’s still some stuff to do, but it should be in a testable state.

Benchmarking

Test setup

Test setup is just a devstack VM with the dummy network kernel module loaded and set to create 1000 dummy devices. Create /etc/modprobe.d/dummy.conf with:

options dummy numdummies=1000

and then:

sudo modprobe dummy

Baseline - bash and ovs-vsctl

First, let’s remove the need for sudo by quickly doing:

sudo chmod a+rwx /var/run/openvswitch/db.sock

Now we can test adding 100 ports w/o sudo overhead:

[terry@localhost neutron]$ ovs-vsctl del-br testbr -- add-br testbr
[terry@localhost neutron]$ time (for ((i=0;i<100;i++));do ovs-vsctl add-port testbr dummy${i};done)

real    0m1.389s

So that isn’t too bad, actually. What happens if we use sudo?

[terry@localhost neutron]$ ovs-vsctl del-br testbr -- add-br testbr
[terry@localhost neutron]$ time (for ((i=0;i<100;i++));do sudo ovs-vsctl add-port testbr dummy${i};done)

real    0m6.513s

So we’re about 5x slower just having to use sudo from the CLI. What about rootwrap?

[terry@localhost neutron]$ ovs-vsctl del-br testbr -- add-br testbr
[terry@localhost neutron]$ time (for ((i=0;i<100;i++));do sudo neutron-rootwrap /etc/neutron/rootwrap.conf ovs-vsctl add-port testbr dummy${i};done)

real    0m26.869s

Using sudo rootwrap is around 20x slower than the baseline of using no privilege escalation tool at all.

Now, what about adding 1000 ports? Does it scale linearly? Do we get around 14 seconds (10x the 100-port time) for adding 1000 ports with no sudo?

[terry@localhost neutron]$ ovs-vsctl del-br testbr -- add-br testbr
[terry@localhost neutron]$ time (for ((i=0;i<1000;i++));do ovs-vsctl add-port testbr dummy${i};done)

real    1m11.138s

No, we do not. ovs-vsctl does a dump of most of the database each time it runs. The more ports in the DB, the slower each successive call will be, so the total time grows quadratically with the number of ports.

Testing ovs_lib1 against ovs_lib2

Here is a simple script to benchmark ovs_lib1 against ovs_lib2.

import logging
import time

from eventlet import greenpool
from neutron.agent.linux.ovs_lib import ovs_lib as ovs_lib1
from neutron.agent.linux.ovs_lib import ovs_lib2

logging.basicConfig()

def add_and_delete(br, iface):
    br.add_port(iface)
    br.delete_port(iface)

pool = greenpool.GreenPool()

# Time 100 concurrent add/delete pairs through each implementation.
for ovs_lib in (ovs_lib1, ovs_lib2):
    with ovs_lib.OVSBridge('test1', 'sudo') as br:
        start = time.time()
        for i in range(100):
            iface = "dummy%d" % i
            pool.spawn_n(add_and_delete, br, iface)
        pool.waitall()
        print ovs_lib.__name__, time.time() - start

which results in:

[terry@localhost neutron]$ python test.py
neutron.agent.linux.ovs_lib.ovs_lib 11.2306790352
neutron.agent.linux.ovs_lib.ovs_lib2 1.48559379578

So calling ovs-vsctl directly with sudo seems to be about 2x as fast as using ovs_lib1 to do the same thing, and ovs_lib2 is roughly as fast as calling ovs-vsctl directly without sudo.

What about with 1000 ports instead? Does ovs_lib2 scale better than ovs-vsctl? Let’s bump the range(100) to range(1000) and remove the greenpool stuff (1000 simultaneous spawns just cause ovs-vsctl timeouts and open-file-descriptor errors) and see:
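The serial version of the loop looks roughly like this. FakeBridge is a stand-in for the real OVSBridge classes so the sketch runs standalone; the real script keeps the imports and the with-statement from above.

```python
import time

class FakeBridge(object):
    """Stand-in for OVSBridge: records calls instead of touching OVS."""
    def __init__(self):
        self.calls = []
    def add_port(self, iface):
        self.calls.append(('add', iface))
    def delete_port(self, iface):
        self.calls.append(('del', iface))

br = FakeBridge()
start = time.time()
# Ports are added and deleted one at a time; no GreenPool spawns.
for i in range(1000):
    iface = "dummy%d" % i
    br.add_port(iface)
    br.delete_port(iface)
elapsed = time.time() - start
```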

[terry@localhost neutron]$ python test.py
neutron.agent.linux.ovs_lib.ovs_lib 169.324312925
neutron.agent.linux.ovs_lib.ovs_lib2 16.3264598846

Yes! ovs_lib2 does scale linearly. OVS’s Python IDL library does cache the database, but it pays that cost once, at connection time, and ovs_lib2 maintains that connection and reuses it across calls.
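The difference between the two approaches can be sketched with a toy cost comparison. The class names and "units" of work here are illustrative, not the actual ovs.db.idl API: one client re-dumps the database on every call (the ovs-vsctl pattern), the other dumps once at connect time and updates its cached view incrementally (the ovs_lib2 pattern).

```python
class ReconnectingClient(object):
    """Models the ovs-vsctl pattern: every invocation re-dumps the DB."""
    def __init__(self, db):
        self.db = db
        self.units = 0                  # accumulated work, arbitrary units
    def add_port(self, name):
        self.units += len(self.db)      # dump the current DB contents
        self.db.append(name)
        self.units += 1                 # do the actual insert

class PersistentClient(object):
    """Models the ovs_lib2 pattern: dump once, reuse the cached view."""
    def __init__(self, db):
        self.db = db
        self.units = len(db)            # one-time dump at connection
    def add_port(self, name):
        self.db.append(name)
        self.units += 1                 # cache updated incrementally

one_shot, persistent = ReconnectingClient([]), PersistentClient([])
for i in range(1000):
    one_shot.add_port("dummy%d" % i)
    persistent.add_port("dummy%d" % i)
```

After 1000 adds, the reconnecting client has done quadratically more work than the persistent one, which is the shape of the 169s vs 16s result above.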

Caveats

If we go this route, we’ll have to talk about the recommended way of handling privileges. This could be done by connecting to openvswitch over TCP/SSL and controlling access via firewall rules, and/or by having deployment tools/packaging modify the owner/permissions of the OVSDB unix socket.
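For example, both options might look something like this (a sketch only; the socket path, port, and the "ovsdb" group are assumptions that will vary by distro and packaging):

```shell
# Option 1: expose OVSDB over TCP and restrict access with firewall rules.
# 6640 is the registered OVSDB port; adjust the iptables rule as needed.
sudo ovs-vsctl set-manager ptcp:6640
sudo iptables -A INPUT -p tcp --dport 6640 ! -s 127.0.0.1 -j DROP

# Option 2: grant a dedicated group access to the unix socket, instead of
# the chmod a+rwx shortcut used above.
sudo groupadd -f ovsdb
sudo usermod -a -G ovsdb neutron
sudo chgrp ovsdb /var/run/openvswitch/db.sock
sudo chmod g+rw /var/run/openvswitch/db.sock
```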

Conclusion

Even without the overhead of sudo or rootwrap, there is room to dramatically improve the performance of OVSDB operations by moving away from calling ovs-vsctl. Please take a look at the review even though I have it marked as Work In Progress. I crave your feedback!