posix_spawn syscall added (2012)

JdeBP1 pts0 comments

NetBSD Blog

" The Guide | Manual pages | Mailing lists and Archives | CVS repository | Report or query a bug | Software Packages

Home |<br>RSS |<br>Release engineering | Development | The NetBSD Foundation | Networking | General | Ports | Security | Events | Packages |<br>Login

Bookmarks

The NetBSD Project

NetBSD Wiki

Feeds

All

/Release engineering

/Development

/The NetBSD Foundation

/Networking

/General

/Ports

/Security

/Events

/Packages

Comments

posix_spawn syscall added

February 26, 2012 posted by Martin Husemann

Charles Zhang implemented the posix_spawn syscall during Google Summer of Code 2011.<br>After a lot of polishing and rework based on feedback during public discussion of the code, this has now been committed to NetBSD-current.

This caused some fallout and ended in a tight race with the imminent branch date for NetBSD 6.<br>Now that the dust has settled, it is time for a look back at the mistakes made and lessons learned.

What is posix_spawn?

Traditionally BSD systems used the vfork(2) hack to improve speed of process creation. However, this does (in general) not play well with multi-threaded applications. The posix_spawn call is a thread-safe way to create new processes and manipulate a tiny bit of state (like dup/close/open file descriptors) upfront.

Work continued after GSoC

The results Charles had at the end of his GSoC term were a working in-kernel implementation of posix_spawn and a few free-form test cases, one of which failed. The kernel code duplicated a lot of other code, which clearly was not acceptable for commit to the NetBSD source tree. The reason Charles solved it this way was the short time frame available - and that the best solution we could think of during the summer was very intrusive.

In preparation for a potential merge into the NetBSD code base, I reworked the code to avoid copying helper functions (like file descriptor manipulations for other processes), cleaned up and debugged a bit using a LOCKDEBUG kernel, which pointed out a few more issues. After solving those as well as intensively testing all error paths, I posted a patch for review.

At this point the integration was already prepared completely - a new syscall, new libc functions, new manual pages need a lot of set lists updates and test building a "release" at least once (preferably on an architecture providing 32bit compat libraries), furthermore the posix_spawn code needed (simple) machine-dependent code to be added to all architectures, which at least requires test-building a representative set of kernels.

Another complete rework

In response to the posted, very intrusive, patch, YAMAMOTO Takashi suggested a pretty elegant way to solve the problem without a lot of the intrusive changes. The idea was simple, and it actually worked after a few adjustments. This led to another public patch for review.

This version already included an atf version of the test programs, which all passed (both on amd64 and sparc64). I felt pretty confident with this state and expected a smooth integration.

Unexpected fallout

More for completeness I did a full test run (not only the posix_spawn related tests) - and found some unexpected test failures, all in rump based tests. I retried and got different failures. Suspicious - I did not touch rump, besides regenerating the syscall definitions. I rebooted a standard kernel (without posix_spawn), did a full test run and only got failures in the posix_spawn tests (of course). So something in the change must have broken something else.

Analysis was a painful process, so only a short summary of the results: the modified kernel exec path used a pointer to a kernel stack variable, which was later copied to a saved data structure - but the pointer was not adjusted accordingly. Later the pointer was referenced, and only a single bit checked. Depending on what was in memory at the stale old stack location at that time, a branch was taken or not. This caused the ELF auxiliary data vector to sometimes contain a different effective UID, and ld.elf_so switching into secure mode - in which case it ignores environment variables like LD_PRELOAD. This causes big failure in many test programs using rump (at least).

While I was debugging this, discussions continued. We were not sure if we should add complex code like this to the kernel, where a pure userland implementation clearly is possible (FreeBSD uses this, for example). I did a few benchmark runs, but was unable to show any clear performance benefit for either implementation - the differences were in the sub-promille ranges, with noise in the 2-3 percent range, clearly no usable result from a statistical point of view. Another topic under discussion was the near planned branch for NetBSD 6. According to our rules, we do not want to add a syscall post-branch to a release branch.

Go ahead, finally!

The discussions ended with the core team voting for a kernel version, and the release engineering team voting for a...

posix_spawn netbsd code kernel test syscall

Related Articles