Sorry for the delay. I'm just back from holidays.
Currently K8s doesn't support over-commitment of extended resources. Once allocated, a device stops being available, so the scheduler simply can't place another pod requesting it on the same node (assuming there is only one device on the node). Semantically it would make more sense if a user could request a fraction of a device, but this contradicts the spec for K8s extended resources, which only allows integer quantities.
We could, however, advertise one physical device as many virtual GPUs, e.g. 10 of gpu.intel.com/i915-one-tenth. I think the exact number of such virtual devices per physical one could be configured with the plugin's command line options or node attributes.
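
To make the virtual GPU idea a bit more concrete, here is a rough sketch in Python of how one physical device could be expanded into several advertised device IDs. This is not the plugin's actual code: `VirtualDevice`, `expand_devices`, `SHARES_PER_DEVICE` and the `card0-3` ID format are all hypothetical names I made up for illustration, and the share count is assumed to come from something like a command line option.

```python
from dataclasses import dataclass

# Hypothetical setting: how many virtual devices to advertise per physical GPU.
# In a real plugin this might come from a command line option or a node attribute.
SHARES_PER_DEVICE = 10


@dataclass(frozen=True)
class VirtualDevice:
    id: str        # ID that would be advertised to the kubelet (made-up format)
    physical: str  # physical device backing this virtual one


def expand_devices(physical_devices, shares_per_device):
    """Return shares_per_device virtual devices for every physical device."""
    if shares_per_device < 1:
        raise ValueError("shares_per_device must be at least 1")
    return [
        VirtualDevice(id=f"{dev}-{i}", physical=dev)
        for dev in physical_devices
        for i in range(shares_per_device)
    ]


if __name__ == "__main__":
    for vdev in expand_devices(["card0"], SHARES_PER_DEVICE):
        print(vdev.id, "->", vdev.physical)
```

With something like this, each pod would still request a whole integer number of the advertised resource, so the extended resource spec isn't violated, and the scheduler could place up to ten such pods on a node with a single physical GPU. Note that the sketch only covers the bookkeeping. It does nothing to isolate or limit what each pod actually does with the shared device.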