qyliss changed the topic of #spectrum to: A compartmentalized operating system | https://spectrum-os.org/ | Logs: https://logs.spectrum-os.org/spectrum/
andi- has quit [Remote host closed the connection]
andi- has joined #spectrum
<qyliss> jpo_ had a very good argument for why using file systems instead of block devices would be a bad idea, which I would have liked to write down but I forgot what exactly it was
<qyliss> There's definitely value to be had from simplicity, though, and a block device is simpler than a filesystem.
<lejonet> qyliss: even more so from the fact that there are plenty of ways to "send" a block device over the network, from "simple" to "holy hell complex"
<lejonet> Sure, there are AFS, NFS, CIFS, etc. as networked filesystems, but they definitely have their problems, especially in how they handle temporary network outages and similar
<lejonet> (not to mention that they are quite complex, even NFS when you take into account the whole "requires that uid/gids be shared between all systems that mount the share")
<IdleBot_bf4161f7> I thought network outages are exactly the problem we do not care about?
<IdleBot_bf4161f7> Block devices have this drawback that it is hard to share a subtree
<IdleBot_bf4161f7> Sending block devices over a network is only simple when you presuppose reliable networking anyway…
<IdleBot_bf4161f7> But in general, block device sharing sounds like «let us make Android, only out of desktop apps and with more careful isolation, but with all the most annoying limitations intact»
<qyliss> You can easily move a subtree into a new block device
<qyliss> Network outages shouldn't be a problem, correct.
<IdleBot_bf4161f7> Cross-device move is not a completely free action, and it creates a need to pass more block devices to all the machines that need access to the larger tree, and block device space management becomes annoying quickly…
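The cost of a cross-device move is easy to see at the syscall level: rename(2) fails with EXDEV when source and destination live on different file systems (such as two block devices), so a "move" turns into a full copy followed by a delete. A minimal Python sketch, assuming a POSIX host; `move_subtree` is a hypothetical helper, not part of any Spectrum tooling:

```python
import errno
import os
import shutil


def move_subtree(src: str, dst: str) -> str:
    """Move a directory tree; report whether it was a cheap rename or a full copy.

    Returns "rename" when both paths share a file system, "copy" otherwise.
    """
    try:
        # Same file system: a single metadata operation, no file data is copied.
        os.rename(src, dst)
        return "rename"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
    # Different file systems (e.g. a freshly created block device):
    # every byte is copied, then the original tree is deleted.
    shutil.copytree(src, dst, symlinks=True)
    shutil.rmtree(src)
    return "copy"
```

Within one file system the move is a single metadata update; across devices it costs time proportional to the size of the tree.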
<lejonet> The thing that block devices are usually better at than network filesystems is caching and correctly handling read/write errors (as in making them hard errors, instead of soft errors that hang the process trying to read/write *glares at NFS*)
<lejonet> ZFS + ZVOLs might be something to consider (or at least the idea of abstract datasets that can be presented in different ways)
<lejonet> IdleBot_bf4161f7: you do have a point in that we should keep it as simple as we can; maybe it's easier to both secure it and ensure reliability with a unix-socket-like transfer, as everything is presumed to be local anyway
<lejonet> There is probably a way (that I'm just not aware of) to take the pipe part of veth pairs without needing the networking above it
<IdleBot_bf4161f7> What is obviously easier with jails, is that I can have just bind mounts for free (and for network access in both directions I can use unix domain sockets with proper naming, not juggle IPv4 stuff)
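To illustrate the jail-side alternative: with Unix domain sockets the filesystem path is the address and file permissions act as access control, so no IPv4 configuration is needed. A minimal sketch, assuming a hypothetical `<runtime dir>/<vm>/<service>.sock` naming scheme (not something Spectrum defines):

```python
import os
import socket
import threading


def socket_path(runtime_dir: str, vm: str, service: str) -> str:
    # Hypothetical naming scheme: one directory per VM/jail, one socket per service.
    return os.path.join(runtime_dir, vm, f"{service}.sock")


def serve_once(path: str, ready: threading.Event) -> None:
    """Accept one connection on `path` and echo the request back upper-cased."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(path)
        os.chmod(path, 0o600)  # other non-root UIDs can no longer connect
        srv.listen(1)
        ready.set()
        conn, _ = srv.accept()
        with conn:
            conn.sendall(conn.recv(1024).upper())


def request(path: str, payload: bytes) -> bytes:
    """Connect by path, send one request, and return the reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
        cli.connect(path)
        cli.sendall(payload)
        cli.shutdown(socket.SHUT_WR)
        return cli.recv(1024)
```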
<IdleBot_bf4161f7> But yes, for good security we want a unique-UID jailed VM process with a non-root process inside the VM, I guess
<IdleBot_bf4161f7> Speaking of block-device-based systems: there are also physical external devices, and giving access to a subdirectory there makes a lot of sense, and the flash drive should be usable in a printer afterwards
<lejonet> Indeed, because jails are in the same kernelspace as the host, therefore there are plenty of ways for them to interact without needing to resort to networking at all
<qyliss> External storage is a good point.
<lejonet> honestly tho, I would say the "pass around everything as block device" would be a good abstraction when external storage comes into question
<lejonet> But it doesn't come without strings attached as discussed earlier
<IdleBot_bf4161f7> Do assume cherry-picking in my arguing, but I can also notice that attack-surface-wise I would prefer to use FS passing for shared stuff, and block devices only for stuff ever mounted by a single VM and not even the host.
<IdleBot_bf4161f7> If we can trust VM kernel, it handles 9P benignly; but for crafted FS corruption on block devices it is enough to have a root elevation inside the VM
<lejonet> IdleBot_bf4161f7: Fully understandable, FS passing doesn't necessarily (on the host side) require any drivers or code with elevated privileges invoked, while block device passing certainly will require that
<IdleBot_bf4161f7> It is true that apparently crashing Qemu by exploiting one more race condition in its 9P implementation is not that hard (maybe even without gaining root), but I cannot find anything with a realistic risk of cross-VM attack via shared subdirectory
<lejonet> Yeah, iirc those just simply crash the VM, so sure, it would mean a local DoS, but that usually doesn't really give you anything meaningful unless you're out to annoy
<IdleBot_bf4161f7> For the record: block devices passed to qemu might be plain files just ACLed to the VM-running unique UID, so it is not host-side privilege escalation
<IdleBot_bf4161f7> In SpectrumOS context: misbehaving application isolated inside a VM can crash the VM where just this application runs!
<lejonet> Indeed true, coupled with some SELinux labeling and such similar to what rkt/docker can do, if we want to utilize such complex framework
<lejonet> Yeah, it would mitigate the effect of the DoS quite drastically
<IdleBot_bf4161f7> Also we might get lucky and find a 9P server implementation actually intended for life in a weakly-controlled LAN — this is very likely to have better security than FS mounting code
<qyliss> it does sound like 9p might be the way to go after all
<lejonet> I concur
<lejonet> From a security standpoint, it seems like it would be easier to audit a 9P codebase than an entire block device stack
<lejonet> (or code one from scratch if so inclined, but that might be a future plan instead of an immediate one)
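For a sense of how small the 9P wire format is, which matters for auditing or writing a server from scratch: in 9P2000 every message starts with size[4] type[1] tag[2], all little-endian, and strings are a 2-byte length followed by UTF-8 bytes. A minimal Python sketch of just the framing and the Tversion handshake (type 100, sent with NOTAG); it is not a server, and the helper names are hypothetical:

```python
import struct
from typing import Tuple

TVERSION = 100
RVERSION = 101
NOTAG = 0xFFFF  # version messages are sent with the special NOTAG tag


def encode_string(s: str) -> bytes:
    # 9P string: 2-byte little-endian length, then the UTF-8 bytes (no NUL).
    data = s.encode("utf-8")
    return struct.pack("<H", len(data)) + data


def encode_message(msg_type: int, tag: int, body: bytes) -> bytes:
    # size[4] counts the whole message, including the size field itself.
    size = 4 + 1 + 2 + len(body)
    return struct.pack("<IBH", size, msg_type, tag) + body


def tversion(msize: int, version: str = "9P2000") -> bytes:
    # Tversion body: msize[4] version[s]
    return encode_message(TVERSION, NOTAG, struct.pack("<I", msize) + encode_string(version))


def decode_header(buf: bytes) -> Tuple[int, int, int]:
    size, msg_type, tag = struct.unpack_from("<IBH", buf)
    if size != len(buf):
        raise ValueError("truncated or oversized 9P message")
    return size, msg_type, tag
```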
<IdleBot_bf4161f7> I do not have any arguments against sometimes providing a huge file as a block device when the device is not intended to be ever reused with a different VM. Like, a PostgreSQL VM could actually get passthrough to a partition to store the DB there
<lejonet> IdleBot_bf4161f7: you know of any good lists of 9P implementations off the top of your head, or is google our friend on that perhaps? :P
<lejonet> Yeah, sometimes when performance is of the essence, doing the block device dance might be warranted
<IdleBot_bf4161f7> It is actually cheap and easy — but limited, and even more limited if you want to stay on the safe side
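A sketch of how cheap the plain-file variant mentioned earlier can be (a disk image owned by the VM-running unique UID): a sparse file with owner-only permissions, assuming a Linux host with one UID per VM. The path, size, and `create_vm_disk` helper are hypothetical, and plain mode bits stand in for the ACLs; POSIX ACLs via setfacl(1) would be the finer-grained alternative:

```python
import os
import stat
from typing import Optional


def create_vm_disk(path: str, size_bytes: int, vm_uid: Optional[int] = None) -> None:
    """Create a sparse raw disk image that only the VM's UID (and root) can open."""
    # O_EXCL: refuse to clobber an existing image.
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.ftruncate(fd, size_bytes)  # sparse: no data blocks are allocated yet
        os.fchmod(fd, stat.S_IRUSR | stat.S_IWUSR)  # owner-only, regardless of umask
        if vm_uid is not None:
            # Requires CAP_CHOWN, i.e. done once by a privileged VM launcher.
            os.fchown(fd, vm_uid, -1)
    finally:
        os.close(fd)
```

The VMM process running as that UID can then open the file as a raw disk without any further host-side privilege.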
<lejonet> IdleBot_bf4161f7: I guess you could say, the implementation used to bring a resource into a VM would have to be context dependent
<lejonet> But maybe for a prototype implementation, One To Rule Them All might be the way to go
<qyliss> Nice to be able to talk all this out before spending time implementing the wrong thing.
<IdleBot_bf4161f7> I am here to convince people that anything that would impede reuse in my slightly different system is a wrong thing to implement!
<lejonet> qyliss: indeed, brainstorming is always a good idea now and then :)
<IdleBot_bf4161f7> I would hope that plan9port contains a reasonably good 9P server
<lejonet> IdleBot_bf4161f7: +1, in my mind any good system consists of a few highly reusable and simple components :)
<IdleBot_bf4161f7> And I actually use a lot of relatively similar functionality today on my day-to-day system, so I have learnt to notice a few catches
<IdleBot_bf4161f7> A minor point: using 9P means that the /nix/store part is basically free, and we can even manipulate it from the host side without too much risk or problems
* lejonet has hardly used 9p, but did some light testing of it yesterday
<lejonet> it was quite nice and simple to use
<IdleBot_bf4161f7> Frankly, I observe that a lot of things I want to have use files in a way that is trivial enough to run on GlusterFS. Well, if most things were not simple enough to survive GlusterFS, GlusterFS would not exist…
<IdleBot_bf4161f7> Which means that 9P, NFS, SSHFS, whatever is remotely similar to POSIX is OK-ish
<lejonet> And they come with the added benefit that they are quite well understood
<lejonet> NFSv3 doesn't support ACLs right? Because otherwise a generic NFS engine + ZFS autoshare capability might've been something for on-demand ACL'd shares
<IdleBot_bf4161f7> Aha, with CrosVM there is no 9P support built-in anyway
<lejonet> Yeah, someone mentioned that yesterday iirc
<lejonet> Depending on how complex the 9p protocol is, the effort to create an implementation of it for crosvm might be worth it?
<Shell> why 9p vs virtio-fs?
<lejonet> Hadn't really heard anything about virtio-fs being an actual thing, seems to be fairly new
<qyliss> would be nice if we could use virtio-fs, since it's built-in...
<lejonet> Indeed, and actually made for the actual problem that is at the table, and not "just" a retro-fitted networkfs
<qyliss> Yeah.
<lejonet> It doesn't seem like crosvm has support for it (yet?)
<lejonet> Firecracker (Amazon's derivative of crosvm) seems to have support for virtio-block, so it wouldn't be unthinkable that they'll implement virtio-fs too, perhaps
<IdleBot_bf4161f7> Well, today 9P is a protocol that some people have deployed on the live network for years, and virtio-fs first appeared in the Linux version that I have not yet installed on my laptop…
<lejonet> There is also that argument ofc IdleBot_bf4161f7 :)
<lejonet> Cutting edge code vs "tried and proven" a bit older code
<IdleBot_bf4161f7> If it is officially early benchmarking, their words not mine, I would prefer leaving an easy way to swap it in, but not make it default
<lejonet> And a prototype is a prototype: there's an argument that the point of a prototype is to use whatever works to prove the concept, and then base the actual implementation on it later
<IdleBot_bf4161f7> Re: Firecracker: no, they will not implement virtio-fs anytime soon, judging from their declared policy
<qyliss> It's early days for Spectrum too. I imagine virtio-fs will have matured a lot by the time Spectrum is ready for general use.
<IdleBot_bf4161f7> Look, I migrated my laptop to a different way of management in a week; given that your grant deliverables include running code, I expect to integrate them as they are written — and so my goal is to maximise the chance that whatever you have in three months is useful for me
<IdleBot_bf4161f7> I never deny imperfect goal alignment
<IdleBot_bf4161f7> But I do expect that you want to say that your notebook runs on SpectrumOS in six months, and given my experience with the amount of work for some similar things, six months full-time gives you a system with each app activity isolated just fine
<IdleBot_bf4161f7> Sure, it will not be Ubuntu-easy by that time. Slightly more involved to setup and use than NixOS.
<lejonet> qyliss: my point exactly
<IdleBot_bf4161f7> Eventually, CrosVM is likely to get virtio-fs.
<lejonet> Yeah
<lejonet> Especially if the trend goes towards it
<IdleBot_bf4161f7> By that time we want to have SpectrumOS fully usable by Linux-related programmers, though
<IdleBot_bf4161f7> By the way, does CrosVM have any support for online RAM amount change, be it hotplug or ballooning?
<IdleBot_bf4161f7> (I just imagined guessing the RAM allocations for multiple Firefox instances)
<pie__> yay more website
<qyliss> :)
<IdleBot_bf4161f7> «More website» is compared to a week ago, not one day ago, right? So nothing new for me to complain about?
<qyliss> Nothing new since yesterday
<qyliss> Although I'm preparing some updates based on your feedback
<IdleBot_bf4161f7> Thanks!
<hyperfekt> Uhm. crosvm has virtfs (what y'all probably mean by 9P) support if I'm not very confused.
<puck> virtio-fs seems to be more like FUSE over virtio, the 9p thing is based on ... virtio sockets, iirc?
<IdleBot_bf4161f7> hyperfekt: trying to find crosvm and virtfs mentioned on the same page leads generally to #spectrum logs
<hyperfekt> IdleBot_bf4161f7: idk where it's documented but here's the impl: https://chromium.googlesource.com/chromiumos/platform/crosvm/+/refs/heads/master/p9/
<qyliss> IdleBot_bf4161f7: lol
<puck> like, the kernel 9p client allows you to connect to a virtio socket? then the server/vm host just has a 9p server on the other end of that socket
<qyliss> Another reason to use brand new stuff -- just think of the SEO potential! :P
<puck> virtio-vsock, that's it
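For reference on the guest side: Linux's 9p client picks its transport with the `trans=` mount option. My understanding is that QEMU's VirtFS, and presumably crosvm's p9 device, use the dedicated virtio-9p transport (`trans=virtio`, with the share identified by a mount tag) rather than virtio-vsock; that is worth checking against the crosvm source. A small sketch that only builds the mount(8) command line; the tag, mount point, and msize value are placeholders:

```python
from typing import List


def v9fs_mount_argv(tag: str, mountpoint: str, msize: int = 262144,
                    read_only: bool = False) -> List[str]:
    """Build a mount(8) command line for a virtio-9p share inside the guest."""
    options = [
        "trans=virtio",      # virtio-9p transport; `tag` is the device's mount tag
        "version=9p2000.L",  # Linux dialect of 9P
        f"msize={msize}",    # max message size to negotiate with the server
    ]
    if read_only:
        options.append("ro")
    return ["mount", "-t", "9p", "-o", ",".join(options), tag, mountpoint]
```

For example, `v9fs_mount_argv("nixstore", "/nix/store", read_only=True)` would describe a read-only /nix/store share of the kind mentioned earlier.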
<IdleBot_bf4161f7> Hm indeed, crosvm main.rs does mention Argument::value("shared-dir", "PATH:TAG",
<puck> that's not virtio-fs, afaik
<qyliss> Glad everyone else is as confused as I am
<IdleBot_bf4161f7> puck: yes, seems to be 9p. Format also implies that
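The `PATH:TAG` argument pairs a host directory with the tag the guest passes to mount. A hypothetical parser for that shape (not crosvm's actual code), which splits on the last colon on the assumption that tags never contain one while paths might:

```python
from typing import NamedTuple


class SharedDir(NamedTuple):
    path: str  # host directory to export
    tag: str   # name the guest uses to find the share


def parse_shared_dir(value: str) -> SharedDir:
    # rpartition splits on the last ":", so "/a:b/c:tag" keeps the colon in the path.
    path, sep, tag = value.rpartition(":")
    if not sep or not path or not tag:
        raise ValueError(f"expected PATH:TAG, got {value!r}")
    return SharedDir(path, tag)
```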
<puck> the thing that crosvm etc use is most probably 9p over virtio-vsock
<qyliss> I found a search excerpt that says:
<qyliss> > In Section 3, we introduce the VirtFS design including an overview of the 9P protocol, ..... It is used by crosvm and the 9s daemon to share files and directories
<qyliss> But the web page is now a 404
<hyperfekt> yeah, something like that. virtio-fs is built for VMs instead of network so it doesn't do all the copying virtfs does
<puck> yeah
<qyliss> Which is a shame because that sounded so promising
<qyliss> Oh but that's virtfs, anyway, not virtio-fs
<qyliss> Found the paper, anyway
<hyperfekt> as discussed earlier, virtfs should still be good enough for us. just don't expect to run a webscale db instance on it
<IdleBot_bf4161f7> qyliss: but virtfs (9p) seems to be there in crosvm source, and even connected to the arguments
<qyliss> Yeah.
<IdleBot_bf4161f7> As far as I understand, virtio-fs should have the overhead doubling, as the FUSE protocol is still driven by a userspace application
<IdleBot_bf4161f7> And if we do not want to do likely-unsafe stuff when it appears, maybe 9p will be good enough for a long time
<puck> it's not actual FUSE, fwiw
<puck> oh it is
<puck> with a mmap extension
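Because virtio-fs carries FUSE requests, each guest operation begins with the standard FUSE request header from the Linux `fuse.h` ABI (len, opcode, unique, nodeid, uid, gid, pid, then 4 bytes of extension length and padding; 40 bytes total). A minimal packing sketch, assuming a little-endian guest; only two opcodes are listed and the DAX mapping requests are omitted:

```python
import struct

FUSE_LOOKUP = 1
FUSE_INIT = 26

# struct fuse_in_header: len, opcode, unique, nodeid, uid, gid, pid,
# total_extlen (u16), padding (u16); "<" means little-endian, no alignment padding.
FUSE_IN_HEADER = struct.Struct("<IIQQIIIHH")


def fuse_request(opcode: int, unique: int, nodeid: int, payload: bytes,
                 uid: int = 0, gid: int = 0, pid: int = 0) -> bytes:
    # `len` covers the header plus the opcode-specific payload.
    total = FUSE_IN_HEADER.size + len(payload)
    header = FUSE_IN_HEADER.pack(total, opcode, unique, nodeid, uid, gid, pid, 0, 0)
    return header + payload


def lookup(parent_nodeid: int, name: str, unique: int) -> bytes:
    # FUSE_LOOKUP's payload is the NUL-terminated name to resolve in the parent directory.
    return fuse_request(FUSE_LOOKUP, unique, parent_nodeid, name.encode() + b"\0")
```

On the host, a userspace daemon has to parse and answer each of these, which is where the extra per-request overhead discussed above comes from.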
<puck> it is hardened tho
<puck> doing an impl of that in rust would be fun, i bet
<IdleBot_bf4161f7> Well, FUSE has mmap…
<pie__> I can't remember if this is a repost, but are you guys familiar with https://trmm.net/Heads
<qyliss> pie__: I run it
<pie__> oh
<qyliss> Although I don't actually get any security benefit from it currently because I haven't set up an activation script to sign the kernels and stuff
<pie__> interesting, I haven't gotten around to the reading material for it yet
<qyliss> Getting it to work with NixOS at all was a huge pain
<qyliss> but I'd like to improve that
<pie__> *thumbs up*
<qyliss> virtio-fs's DAX stuff looks extremely promising for us
<lejonet> Yeah, from the first page you get when googling virtio-fs, it looks very promising; it's going to be interesting to see how it can be implemented (like whether we will need to make our own implementation for firecracker/crosvm/<whatever is chosen>)