KSysGuard Network Plugin: Don't use std::regex to parse the network files.
Needs RevisionPublic

Authored by dkorth on May 17 2020, 5:15 PM.

Details

Reviewers
ahiemstra
Summary

std::regex is slow, so use fixed-field parsing instead.

This is an initial version that I've been using for a while with no major issues, though it might still need some improvements. CPU usage is significantly decreased compared to before when there is a lot of network activity.

Test Plan

Run programs that generate a lot of network activity.
Verify the network activity shows up correctly with this patch.

Diff Detail

Repository
R106 KSysguard
Lint
Lint Skipped
Unit
Unit Tests Skipped
dkorth created this revision.May 17 2020, 5:15 PM
Restricted Application added a project: Plasma. · View Herald TranscriptMay 17 2020, 5:15 PM
Restricted Application added a subscriber: plasma-devel. · View Herald Transcript
dkorth requested review of this revision.May 17 2020, 5:15 PM
ahiemstra requested changes to this revision.May 18 2020, 11:58 AM
ahiemstra added a subscriber: ahiemstra.

Hmm, this is good idea, I'm not really happy about the implementation though. It's all C code whereas the rest of the helper is C++. It also relies very heavily on magic numbers now.

I think a much simpler implementation would be to split each line on " ", select the fields we want and clean them up.

This revision now requires changes to proceed.May 18 2020, 11:58 AM

It's all C code whereas the rest of the helper is C++. It also relies very heavily on magic numbers now.

I think a much simpler implementation would be to split each line on " ", select the fields we want and clean them up.

I assume this is for performance reasons, but a tiny microbenchmark showing that it is actually faster would be nice.

sandsmark added inline comments.May 20 2020, 8:18 AM
plugins/process/network/helper/ConnectionMapping.cpp
157

if we're going for ultra optimized, const everything I guess.

167

instead of 87 + 1, maybe have a constexpr int lineLength = strlen("0: 00000000000000000000000001000000:0277 00000000000000000000000000000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 16741"); above with the comment?
or:

constexpr const char *exampleLine = "0: 00000000000000000000000001000000:0277 00000000000000000000000000000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 16741";
    constexpr int lineLength = strlen(exampleLine);

both more understandable and less prone to accidentally putting in the wrong number.

same with all other magic string offset numbers here.

176

strtoul stops at the first non-digit, so is this really necessary?

203

I prefer the more c++-ish cast of uint32_t(foo); (I think I got that from the Qt code style guidelines).

sandsmark added inline comments.May 20 2020, 8:22 AM
plugins/process/network/helper/ConnectionMapping.cpp
167

To see that it is actually optimized properly: https://godbolt.org/z/sL2_pK

It's all C code whereas the rest of the helper is C++. It also relies very heavily on magic numbers now.

I think a much simpler implementation would be to split each line on " ", select the fields we want and clean them up.

I assume this is for performance reasons, but a tiny microbenchmark showing that it is actually faster would be nice.

Hi! Just out of curiosity (I'm not the OP), I created a microbenchmark here: https://github.com/jpalecek/ksysguard-network-microbenchmark. As a fun project, I added a Boost.Sphinx implementation, which was remarkably easy. Not the most readable code ever, beware. Some bugs were found while doing this (in both old and new implementation), see the code.

plugins/process/network/helper/ConnectionMapping.cpp
167

Actually I don't agree with that view. I don't think lineLength = strlen("0: 00000000000000000000000...) is readable and less error prone than 87`, only longer by ~100 characters and a nightmare to check. With numbers, I can check with an editor that shows column numbers and simple arithmetic, but comparing such long strings made of spaces is hard.

If you want more robustness, I would suggest using the fact that the number columns are aligned with the header and using something like inode_col = header.find("inode"). Or counting the fields. That would also take care of the possibility that the columns may not be same width across architectures (?)

The original version I wrote a few months back as a quick fix to reduce CPU usage. I do agree that the magic numbers shouldn't be hard-coded, since that's not easily maintained, though I'm not sure of the best approach. constexpr strlen() seems like it'd work, or alternatively, splitting the line on spaces.

I'll look into rewriting it to be more robust and submit an update.