So when I tried to install some packages on the gym server, I got this error:
    (base) tk@gym:/scratch/Downloads$ sudo apt-get -y install cuda
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    cuda is already the newest version (12.2.2-1).
    0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
    8 not fully installed or removed.
    After this operation, 0 B of additional disk space will be used.
    Setting up nvidia-compute-utils-535 (535.104.05-0ubuntu1) ...
    Warning: The home dir /nonexistent you specified can't be accessed: No such file or directory
    Adding system user `nvidia-persistenced' (UID 136) ...
    Adding new group `nvidia-persistenced' (GID -1) ...
    groupadd: invalid group ID '-1'
    adduser: `/sbin/groupadd -g -1 nvidia-persistenced' returned error code 3. Exiting.
    dpkg: error processing package nvidia-compute-utils-535 (--configure):
     installed nvidia-compute-utils-535 package post-installation script subprocess returned error exit status 1
    dpkg: dependency problems prevent configuration of cuda-drivers-535:
     cuda-drivers-535 depends on nvidia-compute-utils-535 (>= 535.104.05); however:
      Package nvidia-compute-utils-535 is not configured yet.
Thanks to Kosta for explaining. It seems the main issue is that Linux cannot get a free group ID for the new nvidia-persistenced system group that the package's post-installation script tries to create (hence the GID -1 in the log). This only happens on EECS servers, since the computers here connect to an LDAP server to manage network accounts. That server could not return an available group ID, since there are a ton of group entries within EECS.
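To check this explanation on a given machine, one option is to ask the name service which group IDs in the system range are still unused. The sketch below is only a diagnostic aid and not how adduser itself searches. The function name first_free_gid is made up for illustration, and the range 100 to 999 in the usage comment assumes the Debian/Ubuntu defaults (FIRST_SYSTEM_GID and LAST_SYSTEM_GID in /etc/adduser.conf), so adjust it if your configuration differs.

```shell
#!/bin/sh
# Hypothetical helper: print the first GID in [lo, hi] that the name
# service (files, sss, ...) does not resolve to a group.
# Returns 1 if every GID in the range is already taken.
first_free_gid() {
    lo=$1
    hi=$2
    gid=$lo
    while [ "$gid" -le "$hi" ]; do
        # "getent group <number>" looks the group up by GID through NSS,
        # so network groups served via sss are included.
        if ! getent group "$gid" >/dev/null 2>&1; then
            echo "$gid"
            return 0
        fi
        gid=$((gid + 1))
    done
    return 1
}

# Example (assumed default system range):
#   first_free_gid 100 999 || echo "no free system GID visible through NSS"
```

If this prints nothing for the system range while sss is enabled, that would be consistent with the explanation above. Each lookup may go to the network service, so the loop can be slow.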
To solve this, we need to disconnect the network group service temporarily.
This is configured in the /etc/nsswitch.conf file:
    GNU nano 4.8                  /etc/nsswitch.conf

    # /etc/nsswitch.conf
    #
    # Example configuration of GNU Name Service Switch functionality.
    # If you have the `glibc-doc-reference' and `info' packages installed, try:
    # `info libc "Name Service Switch"' for information about this file.

    passwd:         files sss systemd
    group:          files systemd
    shadow:         files sss
    gshadow:        files

    hosts:          files mdns4_minimal [NOTFOUND=return] dns
    networks:       files

    protocols:      db files
    services:       db files sss
    ethers:         db files
    rpc:            db files

    netgroup:       files sss
    sudoers:        files
    automount:      files sss
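sss is the SSSD module, which here provides the EECS network accounts. We need to remove it from the group entry so that group lookups only use local sources. In the listing above, the group line is shown with sss already removed.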
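The edit can also be scripted instead of done by hand in nano. This is a minimal sketch, assuming GNU sed and that sss appears as a separate word on the group line. The function name remove_sss_from_group and the .bak backup suffix are my own choices for illustration.

```shell
#!/bin/sh
# Hypothetical helper: drop the "sss" source from the "group:" line of an
# nsswitch.conf-style file, keeping a backup copy at <file>.bak.
# Only the line starting with "group:" is touched; passwd, shadow,
# netgroup, etc. are left as they are.
remove_sss_from_group() {
    file=$1
    sed -E -i.bak '/^group:/ s/[[:space:]]+sss([[:space:]]|$)/\1/' "$file"
}

# Typical use, run as root:
#   remove_sss_from_group /etc/nsswitch.conf
#   apt-get -y install cuda          # or: dpkg --configure -a
#   cp /etc/nsswitch.conf.bak /etc/nsswitch.conf   # restore afterwards
```

Trying the function on a copy of the file first is a cheap way to confirm it changes only the group line.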
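With sss removed from the group entry, the issue is fixed and apt can install packages correctly. Since the change is meant to be temporary, add sss back to the group entry once the installation finishes.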