2018-08-29
In the last article, I write something about the software interrupt virtualization, the implementation of pic/ioapic/apic emulation in kvm. We know that in software emulation, for every guest-interrupt a VM-exit is needed, this is a very remarkable overhead for virtualization. As no surprise, the Intel has sollutions to solve this issue. This is called APIC virtualization.
Before we go to the APIC virtualization, we need first know something about local apic. The local APIC and IO APIC is for interrupt delivery in multi processors. Following picture shows the relations between IO APIC and local APIC.
In a word, every CPU has an accompanying local APIC (LAPIC) and the IOAPIC is used to dispatch interrupt to the LAPIC.
Software interacts with the local APIC by reading and writing its registers. APIC registers are memory-mapped to a 4-KByte region of the processor’s physical address space with an initial starting address of FEE00000H. For correct APIC operation, this address space must be mapped to an area of memory that has been designated as strong uncacheable (UC).
Here we should notice the FEE00000H is the physical address space, not the physical memory. What is the difference? I think physical address space is from CPU perspective. When the CPU reads/writes the APIC registers, it will process by the APIC just like intercept and will never go to the memory address. Though there is one LAPCI per CPU core, and they all map to the same address, when the CPU reads/writes this address, it will just access his own LAPIC and there is no conflicts.
So how to implement the feature in virtualization. I mean every VCPU can access their own physical address with the same address, but get the private data belong to the VCPU. Let’s first look at the qemu’s implementation. In LAPIC realize:
static void apic_realize(DeviceState *dev, Error **errp)
{
APICCommonState *s = APIC(dev);
if (s->id >= MAX_APICS) {
error_setg(errp, "%s initialization failed. APIC ID %d is invalid",
object_get_typename(OBJECT(dev)), s->id);
return;
}
memory_region_init_io(&s->io_memory, OBJECT(s), &apic_io_ops, s, "apic-msi",
APIC_SPACE_SIZE);
s->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, apic_timer, s);
local_apics[s->id] = s;
msi_nonbroken = true;
}
We can see, the creating apic is stored in a global variable ‘local_apics’. In the access function, the function first need to decide the VCPU which is accessing the registers.
static void apic_mem_writel(void *opaque, hwaddr addr, uint32_t val)
{
DeviceState *dev;
APICCommonState *s;
int index = (addr >> 4) & 0xff;
if (addr > 0xfff || !index) {
/* MSI and MMIO APIC are at the same memory location,
* but actually not on the global bus: MSI is on PCI bus
* APIC is connected directly to the CPU.
* Mapping them on the global bus happens to work because
* MSI registers are reserved in APIC MMIO and vice versa. */
MSIMessage msi = { .address = addr, .data = val };
apic_send_msi(&msi);
return;
}
dev = cpu_get_current_apic();
if (!dev) {
return;
}
s = APIC(dev);
}
The idea behind qemu is easy, first get the current VCPU and then access his lapic. But how can this be done in APIC virtualization. How can CPU implement this without VM-exit. The secret is APIC-access page and virtual-APIC page. Here I will not go to the complicated detail of these two VMCS field. Just treat the virtual-APIC page as the shadow page of APIC-access page. And APIC-access page is for a VM, virtual-APIC page is for a VCPU. In fully APIC virtualization, When the guest access the APIC-access page the CPU will return the corresponding data in the virtual-APIC page.
The APIC-access page is set in ‘kvm->kvm_arch->apic_access_page’ and allocated in ‘alloc_apic_access_page’:
static int alloc_apic_access_page(struct kvm *kvm)
{
struct page *page;
struct kvm_userspace_memory_region kvm_userspace_mem;
int r = 0;
mutex_lock(&kvm->slots_lock);
if (kvm->arch.apic_access_page)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
kvm_userspace_mem.guest_phys_addr = 0xfee00000ULL;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, &kvm_userspace_mem);
if (r)
goto out;
page = gfn_to_page(kvm, 0xfee00);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
}
kvm->arch.apic_access_page = page;
out:
mutex_unlock(&kvm->slots_lock);
return r;
}
Here we allocates the memslot of 0xfee00000 and set this to ‘apic_access_page’. The virtual-apic page is based in ‘kvm_lapic->regs’ and is allocated in :
int kvm_create_lapic(struct kvm_vcpu *vcpu)
{
struct kvm_lapic *apic;
ASSERT(vcpu != NULL);
apic_debug("apic_init %d\n", vcpu->vcpu_id);
apic = kzalloc(sizeof(*apic), GFP_KERNEL);
if (!apic)
goto nomem;
vcpu->arch.apic = apic;
apic->regs = (void *)get_zeroed_page(GFP_KERNEL);
if (!apic->regs) {
printk(KERN_ERR "malloc apic regs error for vcpu %x\n",
vcpu->vcpu_id);
goto nomem_free_apic;
}
apic->vcpu = vcpu;
...
}
Then in ‘vmx_vcpu_reset’, it writes the APIC-access page and virtual-apic page to VMCS.
static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
if (cpu_has_vmx_tpr_shadow()) {
vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, 0);
if (vm_need_tpr_shadow(vmx->vcpu.kvm))
vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
__pa(vmx->vcpu.arch.apic->regs));
vmcs_write32(TPR_THRESHOLD, 0);
}
if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm))
vmcs_write64(APIC_ACCESS_ADDR,
page_to_phys(vmx->vcpu.kvm->arch.apic_access_page));
if (vmx_vm_has_apicv(vcpu->kvm))
memset(&vmx->pi_desc, 0, sizeof(struct pi_desc));
if (vmx->vpid != 0)
vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid);
vpid_sync_context(vmx);
}
When the guest access the APIC register(from base 0xfee00000) it will then access the virtual-APIC page of the corresponding VCPU.
Later article will discuss the virtual interrupt delivery in APIC virtualization.
https://software.intel.com/en-us/forums/intel-moderncode-for-parallel-architectures/topic/296237