Virtualizing Time
Length:
66 minutes
Released:
Jun 12, 2023
Format:
Podcast episode
Description
Jordan Hendricks joined Bryan and Adam to talk about her work virtualizing time, which is particularly challenging when migrating virtual machines from one physical machine to another!
We've been hosting a live show weekly on Mondays at 5p for about an hour, and recording them all; here is the recording from June 12th, 2023.
In addition to Bryan Cantrill and Adam Leventhal, we were joined by Oxide colleague Jordan Hendricks.
The (lightly edited) live chat from the show:
DanCrossNYC: The TSC ticks at a fixed rate nowadays, regardless of voltage scaling on the CPU.
jbk: just x86 doesn't provide a consistent way to determine what the rate is
jbk: (I guess some chips will tell you via CPUID, but I've yet to actually encounter such chips)
jbk: some hypervisors will tell you via an MSR
zorg24: Looks like the Linux kernel docs have some documentation on the x86 TSC and PIT https://www.kernel.org/doc/html/next/virt/kvm/x86/timekeeping.html
DanCrossNYC: CPUID or an MSR, but yeah, most systems sample over a fixed interval (determined by another time source) to figure it out.
jbk: no, versus some other present component that allows you to measure the frequency
DanCrossNYC: No, the PIT or HPET or something.
jbk: https://src.illumos.org/source/xref/illumos-gate/usr/src/uts/i86pc/os/tscc_pit.c?r=236cb9a8
jbk: is how it uses the PIT
jbk: (the HPET code needs to improve its accuracy, so it's only used when the PIT isn't there at the moment)
jbk: some Intel NUCs have no PIT
jbk: so HPET is the only option
bcantrill: https://github.com/illumos/illumos-gate/commit/717646f7112314de3f464bc0b75f034f009c861e
DanCrossNYC: Two big ones: system maintenance without disturbing guest workloads, and also load balancing across a rack.
Sevan: ah, thanks. https://github.com/illumos/illumos-gate/blob/717646f7112314de3f464bc0b75f034f009c861e/usr/src/test/bhyve-tests/tests/common/common.c#L166
bcantrill: https://github.com/oxidecomputer/tsc-simulator/tree/master
DanCrossNYC: The guest may well be running NTP itself.
iangrunert: I assume you could also check that NTP is alive / has synced recently before doing a migration right?
aka_pugs: Do people use IEEE 1588/PTP in datacenters? Maybe finance wackos?
zorg24: also it might be tricky to check if NTP synced recently if it is happening in usermode
iangrunert: Might've missed this - is it just the hypervisor that has to run NTP recently or the VM as well?
saone: I believe it was just the hypervisor
DanCrossNYC: The host.
DanCrossNYC: A guest may or may not; that's up to the guest.
jbk: but IIUC, if the guest IS running NTP, then the host definitely needs it to avoid any time warps
DanCrossNYC: Yup.
DanCrossNYC: Fortunately, there's a bit of an out for the blackout window during migration: SMM mode can effectively pause a machine for an indefinite period of time.
DanCrossNYC: We don't USE SMM anywhere, but robust systems software kinda needs to handle the case where the machine goes out to lunch for a minute.
zorg24: hooray for hardware with no SMM use
DanCrossNYC: We have done everything we can to turn it off.
ahl: https://github.com/dtolnay/case-studies/blob/master/autoref-specialization/README.md
ahl: https://github.com/oxidecomputer/propolis
earltea: it worked so well I almost thought the VM didn't migrate
saone: It's easy to forget that there's a world outside the cloud, but edge deployments that have physical peripherals hooked up need to maintain those connections to peripherals; migrating those peripherals to cloud environments and managing that integration has been a big challenge for my group.
iangrunert: https://signalsandthreads.com/clock-synchronization/ Good listen about clock synchronization and PTP in the "finance weirdos" world. MiFID 2 time-sync rules require timestamping key trading event records to within 100 microseconds of UTC.
jhendricks: a bit belated, but the propolis side of these changes: https://github.com/oxidecomputer/propolis/commit/7ed480843d
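The calibration described in the chat (count TSC ticks over an interval timed by another source such as the PIT) and the migration problem the episode is about can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the illumos, bhyve, or propolis code linked above: every function name is hypothetical, PIT_HZ is the nominal 8254 PIT input clock, and the 32-bit fixed-point scaling ratio is an illustrative choice (real hardware defines its own format). It models the guest-visible TSC as a scaled host TSC plus a per-VM offset, and picks the destination host's offset so the guest TSC resumes where it left off, advanced by the time spent in the blackout window.

```python
# Minimal sketch, not the bhyve/propolis implementation: every name below is
# hypothetical, and the fixed-point format is chosen for illustration only.

PIT_HZ = 1_193_182  # nominal i8254 PIT input clock (about 1.193182 MHz)
FRAC_BITS = 32      # assumed number of fractional bits in the scaling ratio


def calibrate_tsc_hz(tsc_start: int, tsc_end: int, pit_ticks: int) -> int:
    """Estimate the TSC frequency from how far it advanced while the PIT
    counted pit_ticks ticks (elapsed seconds = pit_ticks / PIT_HZ)."""
    if pit_ticks <= 0:
        raise ValueError("pit_ticks must be positive")
    return (tsc_end - tsc_start) * PIT_HZ // pit_ticks


def scaling_ratio(guest_hz: int, host_hz: int) -> int:
    """guest_hz / host_hz as a fixed-point number with FRAC_BITS fraction bits."""
    return (guest_hz << FRAC_BITS) // host_hz


def guest_tsc(host_tsc: int, ratio: int, offset: int) -> int:
    """Guest-visible TSC: scale the host TSC, then add a per-VM offset.
    Python ints never wrap; a real 64-bit register would."""
    return ((host_tsc * ratio) >> FRAC_BITS) + offset


def offset_after_migration(guest_tsc_at_pause: int, guest_hz: int,
                           blackout_ns: int, dest_host_tsc: int,
                           dest_ratio: int) -> int:
    """Pick the destination offset so that, at resume, the guest TSC has
    advanced by exactly the wall-clock time spent in the blackout window.
    blackout_ns is only meaningful if both hosts agree on wall-clock time."""
    target = guest_tsc_at_pause + guest_hz * blackout_ns // 1_000_000_000
    return target - ((dest_host_tsc * dest_ratio) >> FRAC_BITS)


if __name__ == "__main__":
    # Hypothetical numbers: one second timed by the PIT on a 2.5 GHz TSC.
    print(calibrate_tsc_hz(0, 2_500_000_000, PIT_HZ))  # 2500000000

    # A 2 GHz guest lands on a 3 GHz destination host after a 250 ms blackout.
    ratio = scaling_ratio(2_000_000_000, 3_000_000_000)
    off = offset_after_migration(10_000_000_000, 2_000_000_000,
                                 250_000_000, 90_000_000_000, ratio)
    print(guest_tsc(90_000_000_000, ratio, off))  # 10500000000
```

In this sketch, guest_tsc_at_pause would be read on the source host with the same guest_tsc function, using that host's ratio and offset. The blackout duration comes from wall-clock readings on two different machines, so the result is only as good as their agreement, which is one way to see why the chat stresses that the host needs NTP.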