If you attended AFUP Day 2022, you might know that I am currently working on a PHP module for web servers written in Go.
While testing my upcoming library, I encountered strange memory access issues related to the threads created by the Go runtime (you know, cgo…).
By default, the Go scheduler runs many goroutines on the same system threads. This behavior is incompatible with C code that relies on thread-local storage, such as PHP compiled with the ZTS option.
Fortunately, the Go standard library provides a utility to work around this problem: runtime.LockOSThread(). Go guarantees that the goroutine calling this function always runs in the same system thread, and that no other goroutine can run in that thread until runtime.UnlockOSThread() is called.
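As a minimal sketch of this guarantee in use, the hypothetical helper below (runLocked is a name made up for this illustration, it is not part of my library) wires a goroutine to its thread for the duration of a callback and waits for the result. In a real program, the callback would be the place to call C code that depends on thread-local storage.

```go
package main

import (
	"fmt"
	"runtime"
)

// runLocked is a hypothetical helper: it runs fn in a new goroutine that stays
// wired to its system thread for the whole duration of fn, then returns fn's result.
func runLocked(fn func() int) int {
	result := make(chan int)
	go func() {
		runtime.LockOSThread()
		// Unlocking allows the runtime to schedule other goroutines on this thread again.
		defer runtime.UnlockOSThread()
		result <- fn()
	}()
	return <-result
}

func main() {
	// Placeholder callback: real code would call into C here.
	fmt.Println(runLocked(func() int { return 42 }))
}
```

Everything executed inside the callback runs on a single system thread, which is what C code relying on thread-local storage needs.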
While debugging my library, I discovered that PHP uses the ID of the current thread as a key to access thread-local data. Can this be a problem? Are we sure that the system never reuses the same thread ID while the same process is running? The man page of pthread_self() provided by Apple on Mac OS X (from 1996!) doesn’t give any information about this. However, the Linux man page states this:
A thread ID may be reused after a terminated thread has been joined, or a detached thread has terminated.
Let’s write a test program to understand how runtime.LockOSThread() works, and whether the system really reuses thread IDs:
// This Go program demonstrates the behavior of runtime.LockOSThread() and runtime.UnlockOSThread().
// runtime.LockOSThread() wires the calling goroutine to its current system thread.
// No other goroutine can run in the same thread until runtime.UnlockOSThread() is called.
//
// According to the manual of pthread_self, when many threads are created, the system may reassign an ID that was used by a terminated thread to a new thread.
//
// This program shows that thread IDs are indeed reused (tested on Linux and Mac), but that the thread itself is actually destroyed.
// To prove this, we store used thread IDs in a global variable, and some data in each thread's local storage using pthread_setspecific().
//
// When we hint the Go runtime to reuse the threads by calling runtime.UnlockOSThread(), we can see that the local data is still available when a thread is reused.
package main
/*
#include <stdlib.h>
#include <pthread.h>

int setspecific(pthread_key_t key, int i) {
	int *ptr = calloc(1, sizeof(int)); // memory leak on purpose
	*ptr = i;
	return pthread_setspecific(key, ptr);
}
*/
import "C"
import (
	"bytes"
	"fmt"
	"runtime"
	"strconv"
	"sync"
)
const nbGoroutines = 1000

type goroutine struct {
	num int    // App-specific goroutine ID
	id  uint64 // Internal goroutine ID (debug only, do not rely on this in real programs)
}

var seenThreadIDs = make(map[C.pthread_t]goroutine, nbGoroutines+1)
var seenThreadIDsMutex sync.RWMutex

// getGID gets the current goroutine ID (copied from https://blog.sgmansfield.com/2015/12/goroutine-ids/)
func getGID() uint64 {
	b := make([]byte, 64)
	b = b[:runtime.Stack(b, false)]
	b = bytes.TrimPrefix(b, []byte("goroutine "))
	b = b[:bytes.IndexByte(b, ' ')]
	n, _ := strconv.ParseUint(string(b), 10, 64)
	return n
}

// isThreadIDReused checks if the passed thread ID has already been used before
func isThreadIDReused(t1 C.pthread_t, currentGoroutine goroutine) bool {
	seenThreadIDsMutex.RLock()
	defer seenThreadIDsMutex.RUnlock()

	for t2, previousGoroutine := range seenThreadIDs {
		if C.pthread_equal(t1, t2) != 0 {
			fmt.Printf("Thread ID reused (previous goroutine: %v, current goroutine: %v)\n", previousGoroutine, currentGoroutine)
			return true
		}
	}

	return false
}

func main() {
	runtime.LockOSThread()

	seenThreadIDsMutex.Lock()
	seenThreadIDs[C.pthread_self()] = goroutine{0, getGID()}
	seenThreadIDsMutex.Unlock()

	// It could be better to use C.calloc() to prevent the GC from destroying the key
	var tlsKey C.pthread_key_t
	if C.pthread_key_create(&tlsKey, nil) != 0 {
		panic("problem creating pthread key")
	}

	// Wait for every goroutine to finish, otherwise main may return before they have all run
	var wg sync.WaitGroup
	wg.Add(nbGoroutines)

	for i := 1; i <= nbGoroutines; i++ {
		go func(i int) {
			defer wg.Done()

			runtime.LockOSThread()
			// Uncomment the following line to see how the runtime behaves when threads can be reused
			//defer runtime.UnlockOSThread()

			// Check if data has already been associated with this thread
			oldI := C.pthread_getspecific(tlsKey)
			if oldI != nil {
				fmt.Printf("Thread reused, getspecific not empty (%d)\n", *(*C.int)(oldI))
			}

			g := goroutine{i, getGID()}

			// Get the current thread ID
			t := C.pthread_self()
			isThreadIDReused(t, g)

			// Associate some data with the current thread
			if C.setspecific(tlsKey, C.int(i)) != 0 {
				panic("problem setting specific")
			}

			// Add the current thread ID to the list of already used IDs
			seenThreadIDsMutex.Lock()
			defer seenThreadIDsMutex.Unlock()
			seenThreadIDs[C.pthread_self()] = g
		}(i)
	}

	wg.Wait()
}
Here is an example output when the runtime.UnlockOSThread() line is commented out:
Thread ID reused (previous goroutine: {6 9}, current goroutine: {1 4})
Thread ID reused (previous goroutine: {1 4}, current goroutine: {13 16})
Thread ID reused (previous goroutine: {3 6}, current goroutine: {32 52})
So yes, thread IDs are reused, while the threads themselves are properly destroyed (the Go runtime terminates the thread when a goroutine exits without unlocking it). We may have a problem with the PHP codebase!
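To make the risk concrete, here is a pure-Go simulation of a registry keyed by thread ID. No real threads are involved, the threadID and registry types are hypothetical names invented for this sketch, and it is not PHP's actual code. It only illustrates what could go wrong if an entry outlives its thread and the system then hands the same ID to a new thread.

```go
package main

import "fmt"

// threadID stands in for pthread_t in this simulation.
type threadID uint64

// registry maps a thread ID to that thread's private data.
type registry map[threadID]string

// lookup returns the data stored for id, storing fresh on first use.
func (r registry) lookup(id threadID, fresh string) string {
	if data, ok := r[id]; ok {
		return data
	}
	r[id] = fresh
	return fresh
}

func main() {
	r := registry{}

	// Thread A runs with ID 1 and stores its data.
	fmt.Println(r.lookup(1, "data of thread A"))

	// Thread A terminates without cleanup and a new thread B receives ID 1:
	// B is handed A's stale data.
	fmt.Println(r.lookup(1, "data of thread B"))

	// Removing the entry when a thread terminates avoids the stale read.
	delete(r, 1)
	fmt.Println(r.lookup(1, "data of thread C"))
}
```

The second lookup prints "data of thread A" even though it is made on behalf of thread B. Whether a real codebase is affected depends on whether it removes the entry when a thread terminates.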
And here is an example output when the runtime.UnlockOSThread() line is uncommented:
Thread reused, getspecific not empty (35)
Thread ID reused (previous goroutine: {35 53}, current goroutine: {1 19})
Thread reused, getspecific not empty (26)
Thread ID reused (previous goroutine: {26 44}, current goroutine: {19 37})
Thread reused, getspecific not empty (19)
Thread reused, getspecific not empty (84)
Thread ID reused (previous goroutine: {84 102}, current goroutine: {36 54})
Thread reused, getspecific not empty (36)
Thread ID reused (previous goroutine: {36 54}, current goroutine: {37 55})
Thread reused, getspecific not empty (37)
Thread ID reused (previous goroutine: {37 55}, current goroutine: {38 56})
Thread reused, getspecific not empty (38)
Thread ID reused (previous goroutine: {38 56}, current goroutine: {39 57})
Thread reused, getspecific not empty (18)
Thread reused, getspecific not empty (39)
Thread ID reused (previous goroutine: {39 57}, current goroutine: {40 58})
Thread reused, getspecific not empty (40)
Thread ID reused (previous goroutine: {18 36}, current goroutine: {2 20})
Thread reused, getspecific not empty (1)
Thread ID reused (previous goroutine: {40 58}, current goroutine: {41 59})
We can see that threads are reused, as documented!
If you like this kind of material, or if you want to give me more time to work on my free software projects like this one, consider sponsoring me on GitHub!
If you’re looking for Go, C or PHP experts to hire, contact Les-Tilleuls.coop!