Skip to content

Fix waitForProcess logic #2

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

mitchellwrosen
Copy link
Contributor

@mitchellwrosen mitchellwrosen commented Jul 22, 2017

Hello! I think I identified a couple problems with the current process cleanup logic.

waitingThread is spawned with exceptions masked (when using withProcess), so cancel waitingThread does not actually do anything. I believe therefore we would always enter the Right branch of case eec of ...

Also, I don't think the general idea of killing the waiting thread and possibly re-waiting is correct, Since an async ThreadKilled might bring the thread down after waitpid returns, and thus the exit code is lost forever.

So, my fix is to leave the responsibility of filling pExitCode to the waiting thread. And because it does not expect to be sent a ThreadKilled during cleanup, I think a simple forkIO suffices to spawn it.

Finally, there is a small race condition fix in there involving calling terminateProcess on a process that just ended.

@snoyberg
Copy link
Member

Interesting, there are a few things intertwined here. Thanks for raising this, let's hash it out.

Firstly: you're right about cancel being useless. That could be fixed with an explicit unmask. So let's put that to the side and look at the rest.

I think both of our approaches in theory are open to a race condition. The race condition you're describing in the current code, IIUC, looks like:

  • Waiting thread calls waitpid
  • waitpid succeeds
  • After waitpid succeeds, an async exception is received, and the value is never put in the TMVar

However, I'm not convinced this is possible, since (as you pointed out) we have async exceptions masked. In fact, if we have them masked with mask and not uninterruptibleMask, we probably have exactly the behavior we want: cancel can interrupt the waitpid call, but not the putTMVar.

On the other hand, I see a potential issue with your approach: what if the waitpid call completes successfully but, before updating the MVar indicating that the process is complete, we call terminateProcess in the other thread? It's entirely possible that the OS will have already assigned that process ID to a new process (since the successful waitpid frees the PID to be reused).

@mitchellwrosen
Copy link
Contributor Author

However, I'm not convinced this is possible, since (as you pointed out) we have async exceptions masked. In fact, if we have them masked with mask and not uninterruptibleMask, we probably have exactly the behavior we want: cancel can interrupt the waitpid call, but not the putTMVar.

Just to clarify, you mean we have the behavior we want if waitForProcess was unmasked, correct? But if waitForProcess was unmasked, the problem is it's not an atomic operation that just wraps a C call. There's room for an async exception to fire in between the foreign call and returning from the Haskell waitForProcess, so it would still be possible for the three-bullet-point race condition that you described above to occur.

On the other hand, I see a potential issue with your approach: what if the waitpid call completes successfully but, before updating the MVar indicating that the process is complete, we call terminateProcess in the other thread? It's entirely possible that the OS will have already assigned that process ID to a new process (since the successful waitpid frees the PID to be reused).

Yeah, this is the race condition I attempted to handle with the eSRCH case, though you are right a new PID could have been assigned between checking that the process is still running and killing it. Hmm... tricky! But could that not happen with the current code as well? I'm not sure it's even possible in general to ascertain the identity of the PID you're referring to, and for this reason I imagine the operating system does its best never to assign a PID that was very recently terminated.

Thanks for looking!

@snoyberg
Copy link
Member

Just to clarify, you mean we have the behavior we want if waitForProcess was unmasked, correct?

No, I mean if it's masked, but not uninterruptibly masked.

But could that not happen with the current code as well?

The idea is that, with the current code, we never kill the process when there's a chance that waitpid has succeeded. As long as we haven't called waitpid, we know that our PID still exists in the process table, and we're sending signals to the correct process (even if it's already dead and is in zombie state before we reap it).

@mitchellwrosen
Copy link
Contributor Author

No, I mean if it's masked, but not uninterruptibly masked.

Then a cancel won't cancel it! :)

The idea is that, with the current code, we never kill the process when there's a chance that waitpid has succeeded. As long as we haven't called waitpid, we know that our PID still exists in the process table, and we're sending signals to the correct process (even if it's already dead and is in zombie state before we reap it).

But the mechanism to determine whether or not waitpid has succeeded is inaccurate. Specifically, if waitpid was called, we might not actually know since an async exception could fire after it but before the putTMVar!

Cheers!

@snoyberg
Copy link
Member

Are you familiar with interruptible actions? http://hackage.haskell.org/package/base-4.10.0.0/docs/Control-Exception.html#g:13

I'm having trouble following your argument at this point, I don't think it adds up at all with how masking works.

@mitchellwrosen
Copy link
Contributor Author

Yes I am! It's my understanding that an interruptible foreign call is not interruptible vis-a-vis mask. It's a name clash tragedy.

@snoyberg
Copy link
Member

The docs and my experience both imply the opposite:

most operations which perform some I/O with the outside world

Do you have any evidence to demonstrate that interruptible FFI calls are not, in fact, interruptible?

@snoyberg
Copy link
Member

https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/ffi-chap.html#interruptible-foreign-calls

This concerns the interaction of foreign calls with Control.Concurrent.throwTo. Normally when the target of a throwTo is involved in a foreign call, the exception is not raised until the call returns, and in the meantime the caller is blocked. This can result in unresponsiveness, which is particularly undesirable in the case of user interrupt (e.g. Control-C). The default behaviour when a Control-C signal is received (SIGINT on Unix) is to raise the UserInterrupt exception in the main thread; if the main thread is blocked in a foreign call at the time, then the program will not respond to the user interrupt.

And

./System/Process.hs:732:foreign import ccall interruptible "waitForProcess"

@mitchellwrosen
Copy link
Contributor Author

mitchellwrosen commented Jul 25, 2017

Huh, you are totally right! I just played around with some sleep function and it's indeed interrupted even inside a mask. I think I know the source of my confusion, though, which is that waitForProcess retries c_waitForProcess in the case of an interruption (via throwErrnoRetryIfMinus1).

Therefore, waitForProcess is effectively uninterruptible! No?

@mitchellwrosen
Copy link
Contributor Author

Ah, that doesn't explain anything, as once c_waitForProcess returns, the pending exception would be delivered. Ok, let me cook up a snippet that demonstrates the confusing behavior I'm seeing...

@mitchellwrosen
Copy link
Contributor Author

mitchellwrosen commented Jul 25, 2017

module Main where

import System.Process
import Control.Concurrent
import Control.Exception

main :: IO ()
main = do
  (_, _, _, p) <- createProcess (shell "sleep 2")

  tid <-
    forkIO $ mask_ $ do
      putStrLn "thread: waitForProcess"
      code <- waitForProcess p
      putStrLn ("thread: " ++ show code)

  threadDelay 1000000
  putStrLn "main: killThread"
  killThread tid
  putStrLn "main: bye"

This prints:

thread: waitForProcess
main: killThread
thread: ExitSuccess
main: bye

@mitchellwrosen
Copy link
Contributor Author

mitchellwrosen commented Jul 25, 2017

I think what's going on is:

  • An interruptible foreign call will indeed be sent SIGPIPE if the masking state is Unmasked or MaskedInterruptible (but not MaskedUninterruptible)
  • However, if not Unmasked, the corresponding Haskell exception won't actually be delivered after the foreign call returns!

@snoyberg
Copy link
Member

It sounds like the solution would be in the process package, to explicitly poll for masked exceptions before retrying the system call. Does that make sense?

@mitchellwrosen
Copy link
Contributor Author

I agree, c_waitForProcess should be wrapped in Control.Exception.interruptible. (In fact, maybe all calls to interruptible foreign calls should!)

@mitchellwrosen
Copy link
Contributor Author

I think this patch is no longer necessary, though there is another small issue. The Left ... branch doesn't fill the exit code TMVar, so withProcess_ (for example) would throw BlockedIndefinitelyOnSTM if the process didn't complete before the callback finished, as in:

withProcess_ config (\p -> {- something short -})

@snoyberg
Copy link
Member

snoyberg commented Jul 28, 2017 via email

@mitchellwrosen
Copy link
Contributor Author

Sure, #3. Does that sound like the right fix to process? I can make that PR as well if you'd like

@snoyberg
Copy link
Member

I'm not sure about the process change, as I'm not sure of the actual behavior of interruptible. In fact, I'm not entirely certain on the behavior of async exceptions for interruptible operations. The docs are vague, and I would have thought that once the FFI call returns, the exception would be rethrown. I'd like to get clarity on the actual behavior before smashing the code in ways I don't understand.

@mitchellwrosen
Copy link
Contributor Author

I agree the docs are vague! But without a fix to process, there is still a bug in this library:

withProcess config (\_ -> pure ()) -- this won't return until the process completes!

Control.Exception.interruptible allows for any pending async exceptions to be delivered when inside an interruptible mask. So if it was an async exception that caused c_waitForProcess to return with errno = EINTR, then once we're back in haskell, it can bring the thread down.

Here are some demonstrations of how safe/interruptible interacts with unmasked/masked/uninterruptible (file is called interrupt.hs):

#!/usr/bin/env stack
-- stack script --resolver lts-9.0 interrupt.hs -- +RTS -V0 -RTS

{-# LANGUAGE LambdaCase       #-}
{-# LANGUAGE InterruptibleFFI #-}

import Control.Concurrent
import Control.Exception
import System.Environment
import System.Timeout

foreign import ccall safe
  "sleep" sleep_safe :: Int -> IO Int

foreign import ccall interruptible
  "sleep" sleep_interruptible :: Int -> IO Int

main :: IO ()
main = do
  [_, safety, masking] <- getArgs

  let sleep =
        case safety of
          "safe" -> sleep_safe
          "interruptible" -> sleep_interruptible

      maskf =
        case masking of
          "unmasked" -> id
          "interruptible" -> mask_
          "uninterruptible" -> uninterruptibleMask_

  tid <- forkIO $ maskf $ do
           putStrLn "Worker: sleeping"
           _ <- sleep 1
           putStrLn "Worker: woke up"

  threadDelay 100000

  putStrLn "Main: killing worker"
  timeout 100000 (killThread tid) >>= \case
    Nothing -> putStrLn "Main: failed to kill worker"
    Just _  -> putStrLn "Main: killed worker"
  • Safe foreign call, unmasked masking state
$ ./interrupt.hs safe unmasked
Worker: sleeping
Main: killing worker
Main: failed to kill worker
  • Safe foreign call, interruptible masking state
$ ./interrupt.hs safe interruptible
Worker: sleeping
Main: killing worker
Main: failed to kill worker
  • Safe foreign call, uninterruptible masking state
$ ./interrupt.hs safe uninterruptible
Worker: sleeping
Main: killing worker
Main: failed to kill worker
  • Interruptible foreign call, unmasked masking state
$ ./interrupt.hs interruptible unmasked
Worker: sleeping
Main: killing worker
Main: killed worker
  • Interruptible foreign call, interruptible masking state
$ ./interrupt.hs interruptible interruptible
Worker: sleeping
Main: killing worker
Worker: woke up
Main: killed worker
  • Interruptible foreign call, uninterruptible masking state
$ ./interrupt.hs interruptible uninterruptible
Worker: sleeping
Main: killing worker
Main: failed to kill worker

Notice that in the interruptible/interruptible case, the worker's foreign call is interrupted, but it keeps executing!

With my proposed process change,

#!/usr/bin/env stack
-- stack script --resolver lts-9.0 interrupt2.hs -- +RTS -V0 -RTS

{-# LANGUAGE InterruptibleFFI #-}

import Control.Concurrent
import Control.Exception
import System.Environment
import System.Timeout

foreign import ccall interruptible
  "sleep" sleep :: Int -> IO Int

main :: IO ()
main = do
  tid <-
    forkIO $ mask_ $ do
      putStrLn "Worker: interruptible sleep"
      _ <- interruptible (sleep 1)
      putStrLn "Worker: woke up"

  threadDelay 100000

  putStrLn "Main: killing worker"
  killThread tid
$ ./interrupt2.hs
Worker: interruptible sleep
Main: killing worker

The async exception is delivered as intended (no Worker: woke up output)

@snoyberg
Copy link
Member

snoyberg commented Aug 3, 2017

To clarify my previous comment: I agree that something needs to be fixed in process, I just couldn't give a sign-off on your idea above due to the ambiguity I mentioned in the docs. Proving (as you have here) how the interrupts work is perfect. Even better would be including a test case on the process repo. If you open up a PR there, I'll be able to review/merge it in. Referencing this PR is a good idea.

@mitchellwrosen
Copy link
Contributor Author

This issue was resolved by #3 and haskell/process#101, so closing. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants