Managing Linux Processes in Go

January 14, 2023 11-minute read

go • linux • processes

This article explores basic concepts behind Linux processes and how one can manage them using the Go programming language.

Linux Processes

As far as Linux is concerned, a process is just a running program. A special process called init is the first process to start during OS startup, and it keeps running until the system shuts down. Init has a PID (process ID) of 1 and is the “ultimate” parent of all other user processes: it starts other processes, which can in turn start children of their own. When a parent process exits before its children, the orphaned children are adopted by init, which becomes their new parent.

There are several popular init systems for Linux, such as SysV init, systemd, and Upstart, but they are outside the scope of this article.

Foreground vs Background Processes

Linux, like many other operating systems, distinguishes between two types of processes: foreground and background. A foreground process, also called an interactive process, is started by the user from a terminal and can read user input from it. A background process, by contrast, runs without requiring user input; it either does not read from the terminal it was started from or, as with daemons, is not attached to a terminal at all.

Process State

At any given time, a process is in one of five states:

Running/Runnable (‘R’)
Interruptible Sleep (‘S’)
Uninterruptible Sleep (‘D’)
Stopped (‘T’)
Zombie (‘Z’)

Running/Runnable indicates that the process is either currently running or is ready to be run.

Interruptible sleep indicates that the process is waiting for an event or an external resource (e.g., input from a terminal or data from a socket). The process can still react to signals while it waits.

Uninterruptible sleep also denotes a process that is waiting for resources, typically disk I/O, but such a process does not react to signals until the wait is over.

Stopped indicates that the process was put on hold by the SIGSTOP or SIGTSTP signals. Such a process can be brought back into a Runnable state by sending a SIGCONT signal.

Zombie indicates that the process has exited but its parent has not yet removed it from the process list. The parent needs to “reap” the child by waiting on it and reading its exit status. Until that happens, the child remains a zombie.
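To see these state letters in practice, here is a minimal sketch that reads the state of the current process from /proc/<pid>/stat, where the state is the third field. The helper name stateFromStat is made up for this illustration, and the sketch assumes a mounted /proc filesystem.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// stateFromStat (a hypothetical helper) extracts the one-letter process state
// from the contents of /proc/<pid>/stat. The command name (field 2) is wrapped
// in parentheses and may itself contain spaces or parentheses, so parsing
// starts after the last ")".
func stateFromStat(stat string) (string, error) {
	end := strings.LastIndex(stat, ")")
	if end < 0 {
		return "", fmt.Errorf("malformed stat line")
	}
	fields := strings.Fields(stat[end+1:])
	if len(fields) == 0 {
		return "", fmt.Errorf("malformed stat line")
	}
	return fields[0], nil
}

func main() {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/stat", os.Getpid()))
	if err != nil {
		fmt.Println("reading stat:", err)
		return
	}
	state, err := stateFromStat(string(data))
	if err != nil {
		fmt.Println("parsing stat:", err)
		return
	}
	fmt.Println("state of the current process:", state)
}
```

The same letter is shown in the STAT column of ps, possibly followed by additional flag characters.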

Signals

Linux supports various signals to aid in managing processes. A signal is a software interrupt sent to a program to announce some kind of event. It has a name and a number representation. A program will receive a signal and choose what to do with it.

A process can handle a signal in one of three ways:

React with a custom action. A program may install a handler that defines custom behavior for a signal (e.g., reread a configuration file or clean up and terminate itself).
Use the default action. Every signal has an associated default action, which for some signals is to ignore the signal.
Ignore the signal. Not all signals can be ignored, however: SIGKILL and SIGSTOP can be neither caught nor ignored.
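In Go, these three choices map onto the os/signal package. The sketch below is only an illustration: handleHangup is a made-up helper that sends SIGHUP to the current process to demonstrate a custom action, and the choice of SIGHUP and SIGUSR1 is arbitrary.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// handleHangup (a hypothetical helper) installs a custom action for SIGHUP,
// sends SIGHUP to the current process, and reports whether the signal was
// received within one second.
func handleHangup() bool {
	ch := make(chan os.Signal, 1) // buffered so the signal is not dropped
	signal.Notify(ch, syscall.SIGHUP)
	defer signal.Stop(ch) // stop relaying SIGHUP to ch when done

	if err := syscall.Kill(os.Getpid(), syscall.SIGHUP); err != nil {
		return false
	}
	select {
	case <-ch:
		return true // a real program might reread its configuration here
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	// Custom action: SIGHUP is relayed to a channel instead of
	// terminating the program.
	fmt.Println("SIGHUP handled by a custom action:", handleHangup())

	// Ignore: SIGUSR1 is discarded from now on.
	signal.Ignore(syscall.SIGUSR1)
	fmt.Println("SIGUSR1 ignored:", signal.Ignored(syscall.SIGUSR1))

	// Default action: a signal that was never passed to Notify or Ignore
	// keeps its default behavior, so no code is needed for that case.
}
```

signal.Notify never blocks when delivering to the channel, which is why the channel is buffered: with an unbuffered channel a signal that arrives while nobody is receiving would be lost.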

This table describes some of the common signals. The numbers shown are for x86 and ARM; a few of them differ on other architectures.

Number	Name	Default Action	Description
1	SIGHUP	Terminate	Hang-up detected on the controlling terminal, or death of the controlling process. Often used as a request to reread the program's configuration file.
2	SIGINT	Terminate	Interrupt from keyboard, Ctrl + C.
3	SIGQUIT	Core dump	Quit from keyboard, Ctrl + \.
9	SIGKILL	Terminate	Forced termination. Cannot be caught or ignored.
15	SIGTERM	Terminate	Graceful termination.
17	SIGCHLD	Ignore	Child process stopped or terminated.
18	SIGCONT	Continue	Resume process execution.
19	SIGSTOP	Stop	Stop process execution. Cannot be caught or ignored.
20	SIGTSTP	Stop	Stop typed at the terminal, Ctrl + Z.

You can read more about signals in the signal(7) manual page.

Managing Processes in Go

Relevant Go Packages

Packages os, os/exec, and syscall provide a lot of useful functionality for interacting with an OS from a Go application. os provides a platform-independent interface to operating system functionality. os/exec runs external commands; it executes the program directly rather than through a shell. syscall provides an interface to low-level OS primitives and allows executing system calls. Parts of the behavior and functionality of these packages are OS-specific, and this article focuses on Linux.

Start Process

Here is a simple example showing how to start an external process from a Go application:

package main

import (
  "log"
  "os/exec"
)

func main() {
  runtimeArgs := []string{"-v", "-f"}
  cmd := exec.Command("/usr/bin/myapp", runtimeArgs...)
  err := cmd.Start()
  if err != nil {
    log.Fatalln(err)
  }
  select {} // block forever for demo purposes
}

runtimeArgs := []string{"-v", "-f"} defines a list of optional parameters you may wish to pass to a program.
cmd := exec.Command("/usr/bin/myapp", runtimeArgs...) creates an instance of the *exec.Cmd struct, which represents an external command. The first parameter is the path to the executable you wish to invoke; the second is a variadic list of runtime arguments. No shell is involved, so the arguments are passed to the program exactly as given.
err := cmd.Start() starts the specified command but does not wait for it to complete, so this line of code is non-blocking. The caveat is that you should call the cmd.Wait() method at some point to release the associated system resources. Otherwise, the executed program will become a zombie process once it exits.

Note: there is also a cmd.Run() method that can be used instead of cmd.Start(). The difference is that Run() blocks until the command completes and then releases the associated resources.
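As an illustration of the blocking variant, the sketch below runs a command with cmd.Run() and extracts its exit code from the returned error. The helper name runAndGetExitCode and the sh -c "exit 3" command are assumptions made for the example, not part of the original program.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// runAndGetExitCode (a hypothetical helper) runs the command to completion
// and returns its exit code. A non-zero exit is reported through the code,
// not the error; the code is -1 if the process was terminated by a signal.
// The error is non-nil only if the command could not be started or waited on.
func runAndGetExitCode(name string, args ...string) (int, error) {
	err := exec.Command(name, args...).Run()
	if err == nil {
		return 0, nil
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return exitErr.ExitCode(), nil
	}
	return -1, err
}

func main() {
	// Assumes a POSIX shell is available as "sh" on the PATH.
	code, err := runAndGetExitCode("sh", "-c", "exit 3")
	if err != nil {
		fmt.Println("could not run command:", err)
		return
	}
	fmt.Println("exit code:", code)
}
```

Because Run() waits for the command internally, there is no separate Wait() call and no risk of leaving a zombie behind.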

Reap Child Process

Expanding on the previous example, here is how to properly wait for the child process in a non-blocking manner:

package main

import (
  "log"
  "os/exec"
)

func main() {
  runtimeArgs := []string{"-v", "-f"}
  cmd := exec.Command("/usr/bin/myapp", runtimeArgs...)

  err := cmd.Start()
  if err != nil {
    log.Fatalln(err)
  }
  log.Println("started the child process")

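  go func(cmd *exec.Cmd) {
    // Wait reaps the child even when it returns an error,
    // so a non-zero exit status should not abort the parent.
    if err := cmd.Wait(); err != nil {
      log.Printf("child process exited with error: %v\n", err)
    }
    log.Println("cleaned up the child process")
  }(cmd)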

  select {} // block forever for demo purposes
}

This example adds an additional function executed in a separate goroutine that takes cmd *exec.Cmd as a parameter and calls cmd.Wait() on it.
It is important to note that the command had to be started by cmd.Start() for cmd.Wait() to work.
cmd.Wait() is a blocking operation that returns nil if the process exited with status 0, otherwise it returns an error.
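The select {} in the example exists only to keep the demo alive. In a real program the goroutine would usually report the result of cmd.Wait() back to the caller. One possible sketch, using a buffered channel, is shown below; the helper name startAndWait and the sleep command are assumptions made for the illustration.

```go
package main

import (
	"fmt"
	"os/exec"
	"time"
)

// startAndWait (a hypothetical helper) starts the command and returns a
// channel that receives the result of cmd.Wait() once the child has exited
// and been reaped.
func startAndWait(name string, args ...string) (<-chan error, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	done := make(chan error, 1) // buffered so the goroutine never blocks
	go func() { done <- cmd.Wait() }()
	return done, nil
}

func main() {
	done, err := startAndWait("sleep", "1")
	if err != nil {
		fmt.Println("starting child:", err)
		return
	}
	fmt.Println("started the child process")

	// The parent is free to do other work here and collects the
	// child's result whenever it is convenient.
	select {
	case err := <-done:
		fmt.Println("child exited and was reaped, error:", err)
	case <-time.After(5 * time.Second):
		fmt.Println("child is still running after 5 seconds")
	}
}
```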

Detach Child Process on Parent Exit

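Another issue worth considering is what happens to the child process when the parent (the Go application) is interrupted with Ctrl+C in a terminal. The terminal sends SIGINT to every process in the foreground process group, and a child started with os/exec belongs to its parent's process group by default, so the child receives the signal and typically exits as well. However, in some cases, it may be necessary to keep the child process alive.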

This can be achieved by placing the child in a different process group:

package main

import (
  "log"
  "os/exec"
  "syscall"
)

func main() {
  runtimeArgs := []string{"-v", "-f"}
  cmd := exec.Command("/usr/bin/myapp", runtimeArgs...)

  cmd.SysProcAttr = &syscall.SysProcAttr{
    // Puts the child process in a different process group:
    Setpgid: true,
  }

  err := cmd.Start()
  if err != nil {
    log.Fatalln(err)
  }

  select {} // block forever for demo purposes
}

This example sets the SysProcAttr field on the cmd object before calling cmd.Start().
cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true} ensures that the child process will be put in a different process group.
As a result, hitting Ctrl+C on the Go application will not affect the child. It will continue running. The init process will become a new parent for the child and release its resources on exit. Therefore, it will not turn into a zombie.

Note: it is still necessary to wait on the child process in case it exits before the parent.
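To check that the child really ends up in its own process group, the group IDs can be compared with syscall.Getpgid. This is a minimal sketch: startInOwnGroup is a made-up helper and sleep stands in for a real child program.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// startInOwnGroup (a hypothetical helper) starts the command in a new
// process group and returns the child's pid and its process group ID.
func startInOwnGroup(name string, args ...string) (pid, pgid int, err error) {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	if err = cmd.Start(); err != nil {
		return 0, 0, err
	}
	pid = cmd.Process.Pid
	pgid, err = syscall.Getpgid(pid)
	go cmd.Wait() // reap the child if it exits before the parent
	return pid, pgid, err
}

func main() {
	pid, pgid, err := startInOwnGroup("sleep", "1")
	if err != nil {
		fmt.Println("starting child:", err)
		return
	}
	fmt.Println("parent process group:", syscall.Getpgrp())
	fmt.Printf("child pid %d, child process group %d\n", pid, pgid)
}
```

With Setpgid set and no explicit Pgid, the child becomes the leader of a new group, so its process group ID equals its own pid and differs from the parent's.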

Find Existing Process

This example shows how to find an existing process using its pid:

package main

import (
  "log"
  "os"
  "syscall"
)

func main() {
  pid := 5768 // Example pid
  process, _ := os.FindProcess(pid) // Always succeeds on Unix systems
  err := process.Signal(syscall.Signal(0))
  if err != nil {
    log.Fatalf("pid %d returned: %v\n", pid, err)
  }
  // Process does exist here
}

First, the example makes use of os.FindProcess(pid int) function. It returns an *os.Process instance that can be used to send signals to the requested process.
The caveat is that os.FindProcess(pid int) always successfully returns on Linux even if the process does not exist.
To check whether the process actually exists, it is necessary to send it an “empty” signal. Signal 0 on Linux does not actually send a signal, but the kernel still performs the existence and permission checks.
The err := process.Signal(syscall.Signal(0)) call returns nil if the process exists and the Go application has permission to signal it. Otherwise, it returns an error. Note that a permission error (EPERM) still means the process exists; it is just owned by another user.
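The example above treats every error as "process not found". A slightly more careful check can tell the individual cases apart. The sketch below is one possible way to do it; processExists is a hypothetical helper, and it relies on os.ErrProcessDone, which is available from Go 1.16.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// processExists (a hypothetical helper) reports whether a process with the
// given pid exists, regardless of whether the caller may signal it.
func processExists(pid int) (bool, error) {
	if pid <= 0 {
		// Zero and negative values address process groups in kill(2),
		// not a single process.
		return false, fmt.Errorf("invalid pid %d", pid)
	}
	process, err := os.FindProcess(pid)
	if err != nil {
		return false, err
	}
	err = process.Signal(syscall.Signal(0))
	switch {
	case err == nil:
		return true, nil
	case errors.Is(err, syscall.EPERM):
		return true, nil // exists, but the caller may not signal it
	case errors.Is(err, os.ErrProcessDone), errors.Is(err, syscall.ESRCH):
		return false, nil // no such process
	default:
		return false, err
	}
}

func main() {
	exists, err := processExists(os.Getpid())
	fmt.Println("current process exists:", exists, "error:", err)
}
```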

Kill Process

Killing a process is a fairly straightforward task but some details require special attention. For one, the process from which the signal is sent needs to have sufficient permissions to do so. It is also often a good idea to try terminating the process gracefully before killing it.

The example below attempts to gracefully terminate a process. If the process does not exit after 10 seconds, the program sends a SIGKILL signal. SIGKILL cannot be ignored and terminates a process immediately (the only exception is the init process which can ignore SIGKILL).

package main

import (
  "log"
  "os"
  "syscall"
  "time"
)

func main() {
  pid := 5768 // Example pid
  process, _ := os.FindProcess(pid) // Always succeeds on Unix systems

  err := process.Signal(syscall.SIGTERM) // Attempt graceful termination
  if err != nil {
    log.Fatalf("pid %d returned: %v\n", pid, err)
  }
  
  // Poll for 10 seconds to make sure the process has been terminated
  for i := 0; i < 10; i++ {
    time.Sleep(1 * time.Second)
    err = process.Signal(syscall.Signal(0))
    if err != nil {
      return // Terminated successfully, safe to exit
    }
  }

  log.Printf("failed to terminate pid %d gracefully, sending SIGKILL\n", pid)
  err = process.Signal(syscall.SIGKILL)
  if err != nil {
    log.Fatalf("pid %d returned: %v\n", pid, err)
  }
}

First, the example acquires an instance of *os.Process by calling os.FindProcess(pid). Again, this function call always succeeds on Linux and does not guarantee that the process exists.
err := process.Signal(syscall.SIGTERM) sends a SIGTERM signal to the specified process. If the Go application has sufficient permissions and the target process exists, the call will return nil.
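The for loop keeps polling the process with signal 0 once a second. As soon as the call returns an error, the process is treated as shut down and the program exits. Keep in mind that a terminated process that has not been reaped by its parent yet (a zombie) still accepts signal 0.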
If the process is still alive after 10 seconds, err = process.Signal(syscall.SIGKILL) will send a SIGKILL signal and terminate it immediately.

Note: the attempt to terminate a process gracefully is useful because many programs implement custom behavior for a SIGTERM signal. A program may do a self-cleanup routine, persist the state, close network connections, etc., and then terminate itself.
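When the target is a child that the Go application started itself, polling with signal 0 is not needed: the result of cmd.Wait() tells exactly when the child is gone. The following sketch applies the same "SIGTERM first, SIGKILL after a grace period" idea to a child process. The helper names stopChild and startAndStop, the two-second grace period, and the sleep command are all assumptions made for the illustration.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// stopChild (a hypothetical helper) asks a started child to terminate with
// SIGTERM and escalates to SIGKILL if it has not exited within the grace
// period. It reports whether the grace period expired.
func stopChild(cmd *exec.Cmd, grace time.Duration) (escalated bool, err error) {
	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }() // Wait also reaps the child

	if err := cmd.Process.Signal(syscall.SIGTERM); err != nil {
		return false, err
	}
	select {
	case <-done:
		return false, nil // exited within the grace period
	case <-time.After(grace):
		_ = cmd.Process.Kill() // sends SIGKILL
		<-done
		return true, nil
	}
}

// startAndStop starts the command and immediately stops it again.
func startAndStop(grace time.Duration, name string, args ...string) (bool, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return false, err
	}
	return stopChild(cmd, grace)
}

func main() {
	escalated, err := startAndStop(2*time.Second, "sleep", "30")
	if err != nil {
		fmt.Println("stopping child:", err)
		return
	}
	fmt.Println("had to send SIGKILL:", escalated)
}
```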

Wait for a Non-child Process

Waiting for a non-child process to exit is a tricky task on Linux systems. The issue is that the wait(2) family of system calls only works on child processes. As a result, calling process.Wait() on a non-child process in Go code will not work. There are, of course, other ways to achieve this.

This example shows a reasonably robust approach, but it relies on the pidfd_open(2) system call, which is available starting from Linux 5.3.

Note: the example below uses the “golang.org/x/sys/unix” package for its implementation of the poll(2) system call. Run go get golang.org/x/sys to add it as a dependency.

package main

import (
  "errors"
  "log"
  "syscall"

  "golang.org/x/sys/unix"
)

const syscallPidfdOpen = 434

type pidFD int // file descriptor that refers to a process

func pidfdOpen(pid int, flags uint) (pidFD, error) {
  fd, _, errno := syscall.Syscall(syscallPidfdOpen, uintptr(pid), uintptr(flags), 0)
  if errno != 0 {
    return 0, errno
  }
  return pidFD(fd), nil
}

func (fd pidFD) waitForExit() error {
  fds := []unix.PollFd{{Fd: int32(fd), Events: unix.POLLIN}}
  _, err := unix.Poll(fds, -1)
  if err != nil {
    return err
  }
  // Revents holds the events reported by the kernel; Events is only the request.
  if fds[0].Revents&unix.POLLIN == 0 {
    return errors.New("unexpected poll event")
  }
  // Process exited
  return nil
}

func main() {
  pid := 5768 // Example pid

  pidfd, err := pidfdOpen(pid, 0)
  if err != nil {
    log.Fatalf("opening pid fd: %v\n", err)
  }
  defer syscall.Close(int(pidfd))

  err = pidfd.waitForExit() // blocks until the process exits
  if err != nil {
    log.Fatalf("polling pid %d: %v\n", pid, err)
  }
  // Process exited
}

const syscallPidfdOpen = 434 defines the system call number of pidfd_open(2). This value is passed as the first argument to the syscall.Syscall() function to select the system call to execute.
type pidFD int defines a custom type that wraps a file descriptor for convenience.
The func pidfdOpen(pid int, flags uint) (pidFD, error) function invokes the pidfd_open(2) system call, which returns a pid file descriptor on success. The file descriptor can then be passed to the poll(2) system call.
The func (fd pidFD) waitForExit() error function executes the poll(2) system call, which blocks until the target process terminates. Notice how the unix.Poll(fds, -1) call takes -1 as its second parameter to block indefinitely, as opposed to waiting with a set timeout.
func main() makes use of these functions to first open a file descriptor for the target pid and then wait until the process exits.
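On kernels older than 5.3, where pidfd_open(2) is not available, one of the other ways is to fall back to polling with signal 0. The sketch below uses only the standard library; waitForExitPolling is a hypothetical helper, and the interval and timeout values are arbitrary. Unlike the pidfd approach, it can be fooled by pid reuse, and it keeps seeing a zombie as alive until the zombie's parent reaps it.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
	"time"
)

// waitForExitPolling (a hypothetical helper) polls pid with signal 0 until
// the kernel reports that no such process exists or the timeout expires.
// It reports whether the process disappeared in time.
func waitForExitPolling(pid int, interval, timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		err := syscall.Kill(pid, 0)
		if errors.Is(err, syscall.ESRCH) {
			return true // no such process
		}
		// A nil error or EPERM both mean the process still exists.
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(interval)
	}
}

func main() {
	// pid 1 (init) never exits while the system is up, so this call
	// is expected to give up after the timeout.
	exited := waitForExitPolling(1, 100*time.Millisecond, 300*time.Millisecond)
	fmt.Println("pid 1 exited:", exited)
}
```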