85 min listen
Episode 240: TCP Blackbox Recording | BSD Now 240
FromBSD Now
ratings:
Length:
99 minutes
Released:
Apr 7, 2018
Format:
Podcast episode
Description
New ZFS features landing in FreeBSD, MAP_STACK for OpenBSD, how to write safer C code with Clang’s address sanitizer, Michael W. Lucas on sponsor gifts, TCP blackbox recorder, and Dell disk system hacking.
Headlines
[A number of Upstream ZFS features landed in FreeBSD this week]
9188 increase size of dbuf cache to reduce indirect block decompression
With compressed ARC (6950) we use up to 25% of our CPU to decompress indirect blocks, under a workload of random cached reads. To reduce this decompression cost, we would like to increase the size of the dbuf cache so that more indirect blocks can be stored uncompressed.
If we are caching entire large files of recordsize=8K, the indirect blocks use 1/64th as much memory as the data blocks (assuming they have the same compression ratio). We suggest making the dbuf cache be 1/32nd of all memory, so that in this scenario we should be able to keep all the indirect blocks decompressed in the dbuf cache. (We want it to be more than the 1/64th that the indirect blocks would use because we need to cache other stuff in the dbuf cache as well.)
In real world workloads, this won't help as dramatically as the example above, but we think it's still worth it because the risk of decreasing performance is low. The potential negative performance impact is that we will be slightly reducing the size of the ARC (by ~3%).
9166 zfs storage pool checkpoint
The idea of Storage Pool Checkpoint (aka zpool checkpoint) deals with exactly that. It can be thought of as a “pool-wide snapshot” (or a variation of extreme rewind that doesn’t corrupt your data). It remembers the entire state of the pool at the point that it was taken and the user can revert back to it later or discard it. Its generic use case is an administrator that is about to perform a set of destructive actions to ZFS as part of a critical procedure. She takes a checkpoint of the pool before performing the actions, then rewinds back to it if one of them fails or puts the pool into an unexpected state. Otherwise, she discards it. With the assumption that no one else is making modifications to ZFS, she basically wraps all these actions into a “high-level transaction”.
More information
8484 Implement aggregate sum and use for arc counters
In pursuit of improving performance on multi-core systems, we should implements fanned out counters and use them to improve the performance of some of the arc statistics. These stats are updated extremely frequently, and can consume a significant amount of CPU time.
And a small bug fix authored by me:
9321 arcloancompressedbuf() can increment arcloanedbytes by the wrong value
arcloancompressedbuf() increments arcloanedbytes by psize unconditionally In the case of zfscompressedarcenabled=0, when the buf is returned via arcreturnbuf(), if ARCBUFCOMPRESSED(buf) is false, then arcloanedbytes is decremented by lsize, not psize.
Switch to using arcbufsize(buf), instead of psize, which will return psize or lsize, depending on the result of ARCBUF_COMPRESSED(buf).
MAP_STACK for OpenBSD
Almost 2 decades ago we started work on W^X. The concept was simple. Pages that are writable, should not be executable. We applied this concept object by object, trying to seperate objects with different qualities to different pages. The first one we handled was the signal trampoline at the top of the stack. We just kept making changes in the same vein. Eventually W^X came to some of our kernel address spaces also.
The fundamental concept is that an object should only have the
permissions necessary, and any other operation should fault. The only permission separations we have are kernel vs userland, and then read, write, and execute.
How about we add another new permission! This is not a hardware permission, but a software permission. It is opportunistically enforced by the kernel.
the permission is MAPSTACK. If you want to use memory as a stack, you must mmap it with that flag bit. The kernel does so au
Headlines
[A number of Upstream ZFS features landed in FreeBSD this week]
9188 increase size of dbuf cache to reduce indirect block decompression
With compressed ARC (6950) we use up to 25% of our CPU to decompress indirect blocks, under a workload of random cached reads. To reduce this decompression cost, we would like to increase the size of the dbuf cache so that more indirect blocks can be stored uncompressed.
If we are caching entire large files of recordsize=8K, the indirect blocks use 1/64th as much memory as the data blocks (assuming they have the same compression ratio). We suggest making the dbuf cache be 1/32nd of all memory, so that in this scenario we should be able to keep all the indirect blocks decompressed in the dbuf cache. (We want it to be more than the 1/64th that the indirect blocks would use because we need to cache other stuff in the dbuf cache as well.)
In real world workloads, this won't help as dramatically as the example above, but we think it's still worth it because the risk of decreasing performance is low. The potential negative performance impact is that we will be slightly reducing the size of the ARC (by ~3%).
9166 zfs storage pool checkpoint
The idea of Storage Pool Checkpoint (aka zpool checkpoint) deals with exactly that. It can be thought of as a “pool-wide snapshot” (or a variation of extreme rewind that doesn’t corrupt your data). It remembers the entire state of the pool at the point that it was taken and the user can revert back to it later or discard it. Its generic use case is an administrator that is about to perform a set of destructive actions to ZFS as part of a critical procedure. She takes a checkpoint of the pool before performing the actions, then rewinds back to it if one of them fails or puts the pool into an unexpected state. Otherwise, she discards it. With the assumption that no one else is making modifications to ZFS, she basically wraps all these actions into a “high-level transaction”.
More information
8484 Implement aggregate sum and use for arc counters
In pursuit of improving performance on multi-core systems, we should implements fanned out counters and use them to improve the performance of some of the arc statistics. These stats are updated extremely frequently, and can consume a significant amount of CPU time.
And a small bug fix authored by me:
9321 arcloancompressedbuf() can increment arcloanedbytes by the wrong value
arcloancompressedbuf() increments arcloanedbytes by psize unconditionally In the case of zfscompressedarcenabled=0, when the buf is returned via arcreturnbuf(), if ARCBUFCOMPRESSED(buf) is false, then arcloanedbytes is decremented by lsize, not psize.
Switch to using arcbufsize(buf), instead of psize, which will return psize or lsize, depending on the result of ARCBUF_COMPRESSED(buf).
MAP_STACK for OpenBSD
Almost 2 decades ago we started work on W^X. The concept was simple. Pages that are writable, should not be executable. We applied this concept object by object, trying to seperate objects with different qualities to different pages. The first one we handled was the signal trampoline at the top of the stack. We just kept making changes in the same vein. Eventually W^X came to some of our kernel address spaces also.
The fundamental concept is that an object should only have the
permissions necessary, and any other operation should fault. The only permission separations we have are kernel vs userland, and then read, write, and execute.
How about we add another new permission! This is not a hardware permission, but a software permission. It is opportunistically enforced by the kernel.
the permission is MAPSTACK. If you want to use memory as a stack, you must mmap it with that flag bit. The kernel does so au
Released:
Apr 7, 2018
Format:
Podcast episode
Titles in the series (100)
Episode 243: Understanding The Scheduler | BSD Now 243: OpenBSD 6.3 and DragonflyBSD 5.2 are released, bug fix for disappearing files in OpenZFS on Linux (and only Linux), understanding the FreeBSD CPU scheduler, NetBSD on RPI3, thoughts on being a committer for 20 years, and 5 reasons to use FreeBSD in 2018. by BSD Now