
Conversation

@sneaky-potato
Member

@sneaky-potato sneaky-potato commented Jan 17, 2026

This patch introduces better error handling and fixes opaque error messages by passing the method name as a second upvalue to lunatik_monitor.

Summary of my approach:

  • Preserve the method name in lunatik_monitorobject by duplicating it on the stack
  • Pass the method name when pushing the C closure (lunatik_monitor) with 2 upvalues:
    • the actual C function
    • the preserved method name
  • Retrieve the method name in lunatik_monitor so it can be passed to lunatik_error_handler
  • Pass lunatik_error_handler as the error handler to lua_pcall (this is what rewrites the error message)
  • Clean up the stack after this operation
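
Roughly, the shape of the change (a simplified sketch of the idea, not the exact code in the diff; the real monitor also handles the object locking, which is omitted here):

static int lunatik_error_handler(lua_State *L)
{
    /* upvalue 1: the preserved method name; argument 1: the original error object */
    const char *msg = lua_tostring(L, 1);
    if (msg == NULL)
        return 1;  /* non-string error: pass it through unchanged */
    lua_pushfstring(L, "%s: %s", lua_tostring(L, lua_upvalueindex(1)), msg);
    return 1;
}

static int lunatik_monitor(lua_State *L)
{
    int status;

    lua_pushvalue(L, lua_upvalueindex(2));          /* method name */
    lua_pushcclosure(L, lunatik_error_handler, 1);  /* error handler */
    lua_insert(L, 1);                               /* put it below the object and args */

    lua_pushvalue(L, lua_upvalueindex(1));          /* the actual C function */
    lua_insert(L, 2);                               /* put it below the object and args */

    status = lua_pcall(L, lua_gettop(L) - 2, LUA_MULTRET, 1);
    lua_remove(L, 1);                               /* drop the error handler */
    if (status != LUA_OK)
        return lua_error(L);                        /* re-raise the enriched message */
    return lua_gettop(L);
}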

Fixes #382

@sneaky-potato
Member Author

sneaky-potato commented Jan 18, 2026

Testing summary:
I tested my changes with the bind example mentioned in the issue, and the error message now comes out correctly:

$ sudo lunatik run bind_bug
bad argument #2 to 'bind' (out of bounds)
stack traceback:
	[C]: in ?
	[C]: in method 'bind'
	/lib/modules/lua/bind_bug.lua:10: in main chunk
	[C]: in ?

$ sudo lunatik run syscall_bug
stack overflow

EDIT: I realized syscall.address is a module-level function, NOT a method on a userdata object. I suggest we investigate the stack overflow issue separately?

This patch introduces better error handling and fixes
opaque error messages by passing method name as second
upvalue to lunatik_monitor.

Signed-off-by: Ashwani Kumar Kamal <ashwanikamal.im421@gmail.com>
@sneaky-potato sneaky-potato marked this pull request as ready for review January 18, 2026 16:14
@lneto
Contributor

lneto commented Jan 18, 2026

EDIT: I realized syscall.address is a module-level function, NOT a method on a userdata object. I suggest we investigate the stack overflow issue separately?

Yup, exactly. Also, it shouldn't be related to your patch, right? Did you try it on master? One thing we could do, though, to improve the error message in this case is to show the stack trace on the driver.

@sneaky-potato
Member Author

sneaky-potato commented Jan 18, 2026

Also, it shouldn't be related to your patch, right? Did you try it on master?

Yup this issue is unrelated to this patch. I got confused because of luaL_argcheck (still learning the embeddable nature of Lua :D)

Comment on lines +140 to +141
lua_pushvalue(L, lua_upvalueindex(2)); /* method name */
lua_pushcclosure(L, lunatik_error_handler, 1); /* stack: object, args..., errhandler */
Contributor


one concern I have is creating this other closure on every call.. perhaps it's a bit of premature optimization on my side, but it would be good if you can run some benchmarks for this change.. overall, I found it quite an elegant solution.. I'm mostly concerned about this on hot paths..

btw, one thing that I've left for later and never got back is to apply memoization to monitor; perhaps it's a good opportunity to do it now..

Member Author


btw, one thing that I've left for later and never got back is to apply memoization to monitor;

Could you give me some pointers on how to memoize it? I'm thinking of caching the wrapped monitor closures (monitorobject) in the registry to avoid recreating them on every call, but I want to make sure I'm following the right pattern for Lunatik.

I thought of doing something like this:

int lunatik_monitorobject(lua_State *L)
{
    lua_pushlightuserdata(L, (void *)lunatik_monitorobject);
    lua_rawget(L, LUA_REGISTRYINDEX);
    if (!lua_isnil(L, -1)) {
        /* found cached closure */
        return 1;
    }
    lua_pop(L, 1);
    /* ... function body: build the monitor closure, leaving it on top ... */
    lua_pushlightuserdata(L, (void *)lunatik_monitorobject);
    lua_pushvalue(L, -2); /* duplicate the new closure as the value */
    lua_rawset(L, LUA_REGISTRYINDEX); /* registry[lunatik_monitorobject] = closure */
    return 1;
}

Contributor


perhaps we should just update the object metatable, right?
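
something along these lines, maybe (untested sketch, assuming __index is lunatik_monitorobject with the object at index 1 and the key at index 2, and that the raw methods carry no upvalues; error checks omitted):

static int lunatik_monitorobject(lua_State *L)
{
    lua_getmetatable(L, 1);     /* stack: object, key, mt */
    lua_pushvalue(L, 2);
    lua_rawget(L, 3);           /* stack: object, key, mt, mt[key] */

    if (!lua_iscfunction(L, 4))
        return 1;               /* not a C method; return whatever was found */

    if (lua_getupvalue(L, 4, 2) != NULL) {  /* already a monitor closure? */
        lua_pop(L, 1);          /* discard the pushed upvalue */
        return 1;               /* return the cached monitor closure */
    }

    /* first access: wrap the raw C function and cache it back in the metatable */
    lua_pushvalue(L, 2);                      /* upvalue 2: method name */
    lua_pushcclosure(L, lunatik_monitor, 2);  /* pops the function and the name */
    lua_pushvalue(L, 2);                      /* key */
    lua_pushvalue(L, -2);                     /* the new monitor closure */
    lua_rawset(L, 3);                         /* mt[key] = monitor closure */
    return 1;
}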

Contributor


but I would measure the effect of this change before going to memoization; it's a good exercise anyway ;-).. you can measure the time of n-calls on a luadata method (e.g., getbyte).. and assess the difference.. it might be negligible.. then we can merge this as is and leave memoization for later.. (or not ;-)

@sneaky-potato
Member Author

sneaky-potato commented Jan 20, 2026

Hello @lneto
I've completed the benchmark comparing the current master against the new closure-based monitor implementation.
Results (1M iterations of getbyte):
Master: ~266 ns/call (avg)
This branch: ~370 ns/call (avg)
The closure creation adds roughly 104 ns of overhead per call; the monitoring logic introduces a ~35-45% performance hit.
This is the script I used:

local linux = require("linux")
local data = require("data")

local d = data.new(1024)
local iterations = 1000000

local start_ns = linux.time()
for i = 1, iterations do
    d:getbyte(0)
end
local end_ns = linux.time()
local total_ns = end_ns - start_ns

local ns_per_call = total_ns / iterations

print("Total ns: " .. total_ns)
print("Ns per call: " .. ns_per_call)

@lneto
Contributor

lneto commented Jan 20, 2026

Hello @lneto I've completed the benchmark comparing the current master against the new closure-based monitor implementation. Results (1M iterations of getbyte): Master: ~266 ns/call (avg). This branch: ~370 ns/call (avg). The closure creation adds roughly 104 ns of overhead per call.

Nice job! Perhaps we can add this to our tests? 35% is looking like too much for enriching error messages; what do you think? I think we should try memoization. Btw, can you disable lunatik_monitorobject on luadata (by commenting it out on its metatable) and run the same benchmark? Perhaps we are already imposing an unnecessary overhead by creating the monitor closure all the time.. memoization could help us there as well.. also the new "single" option that has been discussed for not sharing objects.. good work! Thanks!

@sneaky-potato
Member Author

what do you think? I think we should try memoization.

Yes I agree, we should try memoization.

Btw, can you disable lunatik_monitorobject on luadata (by commenting it out on its metatable) and run the same benchmark? Perhaps we are already imposing an unnecessary overhead by creating the monitor closure all the time..

disabling the monitorobject, I can see latency drops to just ~62 ns/call (avg).
How should we go about memoizing the monitor wrapper?
I tried caching the wrapped monitor closures lazily via __index by updating the object metatable (inside lunatik_monitorobject), but that turned out to be unsafe (kernel panic).

@lneto
Contributor

lneto commented Jan 20, 2026

what do you think? I think we should try memoization.

Yes I agree, we should try memoization.

perhaps, a better idea is to create the closures at object creation instead of memoizing them. What do you think? We could even leverage uservalues.

Btw, can you disable lunatik_monitorobject on luadata (by commenting it out on its metatable) and run the same benchmark? Perhaps we are already imposing an unnecessary overhead by creating the monitor closure all the time..

disabling the monitorobject, I can see latency drops to just ~62 ns/call (avg).

Wow; that would be a great optimization!

How should we go about memoizing the monitor wrapper? I tried caching the wrapped monitor closures lazily via __index by updating the object metatable (inside lunatik_monitorobject), but that turned out to be unsafe (kernel panic).

can you share your draft and the stack trace?

@sneaky-potato
Member Author

sneaky-potato commented Jan 21, 2026

perhaps, a better idea is to create the closures at object creation instead of memoizing them. What do you think? We could even leverage uservalues.

That will be better, yes. But where in the lifecycle will we create these closures? Won't this require too many changes to the current implementation, given that __index and lunatik_monitorobject handle the dynamic lookup during lunatik_newobject?

Wow; that would be a great optimization!

I have pushed a change which memoizes the monitor; the latency for the same script with the latest change is ~330 ns/call

can you share your draft and the stack trace?

I was trying to mess with the metatable and __index and got a kernel panic related to some reentrancy issue. With the latest push, I've been able to safely memoize it to some extent (330 ns/call, so a 21% performance hit).

@sneaky-potato
Member Author

This is what I could come up with for creating closures instead of memoization (taking the luadata class as an example):

C side:

  • luadata_lnew
  • lunatik_newobject
    • userdata allocated
    • metatable
    • create uservalue table
    • create monitored closures ONCE and store in uservalue
  • return object

Lua side:

  • obj:method()
    • __index
    • return uservalue[method]

and thus we get rid of lunatik_monitorobject
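
A rough sketch of the C side (untested; lunatik_wrapmethods and lunatik_index are just placeholder names, and it assumes the userdata is created with one user value, i.e. lua_newuserdatauv(L, size, 1)):

/* placeholder helper: build the per-object table of monitor closures and
 * attach it as the user value of the userdata at index `idx` */
static void lunatik_wrapmethods(lua_State *L, int idx, const luaL_Reg *methods)
{
    idx = lua_absindex(L, idx);
    lua_newtable(L);
    for (; methods->name != NULL; methods++) {
        lua_pushcfunction(L, methods->func);   /* upvalue 1: raw C function */
        lua_pushstring(L, methods->name);      /* upvalue 2: method name */
        lua_pushcclosure(L, lunatik_monitor, 2);
        lua_setfield(L, -2, methods->name);    /* uservalue[name] = monitor closure */
    }
    lua_setiuservalue(L, idx, 1);              /* attach the table to the userdata */
}

/* __index then becomes a plain lookup in the user value table */
static int lunatik_index(lua_State *L)
{
    lua_getiuservalue(L, 1, 1);
    lua_pushvalue(L, 2);
    lua_rawget(L, -2);
    return 1;
}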

what do you think @lneto ?

@lneto
Contributor

lneto commented Jan 21, 2026

This is what I could come up with for creating closures instead of memoization (taking the luadata class as an example): create the monitored closures once in lunatik_newobject, store them in the uservalue table, have __index return uservalue[method], and thus get rid of lunatik_monitorobject.

what do you think @lneto ?

I was thinking of something like this, but in lunatik_newclass (or setclass) itself; we could create our modified metatable there..
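
e.g., roughly (untested; the real lunatik_newclass obviously differs, this just shows wrapping each luaL_Reg entry once while filling the metatable, so it happens per class rather than per object):

static void lunatik_setmonitored(lua_State *L, const luaL_Reg *methods)
{
    /* assumes the class metatable is on top of the stack */
    for (; methods->name != NULL; methods++) {
        lua_pushcfunction(L, methods->func);   /* upvalue 1: raw C function */
        lua_pushstring(L, methods->name);      /* upvalue 2: method name */
        lua_pushcclosure(L, lunatik_monitor, 2);
        lua_setfield(L, -2, methods->name);    /* mt[name] = monitor closure */
    }
    lua_pushvalue(L, -1);
    lua_setfield(L, -2, "__index");            /* mt.__index = mt */
}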

Signed-off-by: Ashwani Kumar Kamal <ashwanikamal.im421@gmail.com>
@lneto
Contributor

lneto commented Jan 22, 2026

Hello @lneto I've completed the benchmark comparing the current master against the new closure-based monitor implementation. Results (1M iterations of getbyte): Master: ~266 ns/call (avg). This branch: ~370 ns/call (avg). The closure creation adds roughly 104 ns of overhead per call.

I was thinking here.. we could also measure the impact of memoization by doing local m_getbyte = d.getbyte before the loop and before the initialization of start_ns. So, we can estimate what's the impact of the closure creation and the impact of synchronization itself. What do you think? Can you run it as well?

@sneaky-potato
Member Author

sneaky-potato commented Jan 22, 2026

So, we can estimate what's the impact of the closure creation and the impact of synchronization itself.

Some metrics for the following script

local linux = require("linux")
local data = require("data")
local d = data.new(1024)
local iterations = 1000000

local m_getbyte = d.getbyte 

local start_ns = linux.time()
for i = 1, iterations do
    m_getbyte(d, 0)
end
local end_ns = linux.time()

local total_ns = end_ns - start_ns
print("Localized Method Total ns: " .. total_ns)
print("Localized Method Ns per call: " .. (total_ns / iterations))

master: Localized Method Ns per call: 117 ns
memoization approach (specifically this commit): Localized Method Ns per call: 227 ns
latest changes (tried something for creating closures on new_class): Localized Method Ns per call: 220 ns

and for the following script

local linux = require("linux")
local data = require("data")

local d = data.new(1024)
local iterations = 1000000

local start_ns = linux.time()
for i = 1, iterations do
    d:getbyte(0)
end
local end_ns = linux.time()
local total_ns = end_ns - start_ns

local ns_per_call = total_ns / iterations

print("Total ns: " .. total_ns)
print("Ns per call: " .. ns_per_call)

latest changes in this branch report ~240 ns/call (master reports ~266 ns/call).
It means the original Lunatik method lookup was actually quite expensive; by pre-creating the closures in lunatik_newclass, we can optimize the lookup phase.

continue;
}

// Skip metamethods (starting with __)
Contributor


please, mind our style; we don't do C++ style comments

@lneto
Contributor

lneto commented Jan 23, 2026

So, we can estimate what's the impact of the closure creation and the impact of synchronization itself.

Some metrics: master: 117 ns/call (localized method); memoization approach: 227 ns/call; latest changes (creating closures in new_class): 220 ns/call. With the regular method-call script, the latest changes in this branch report ~240 ns/call (master reports ~266 ns/call); by pre-creating the closures in lunatik_newclass, we can optimize the lookup phase.

I think it's getting very complex. I would take a step back and break this problem in two. First, I would try to implement the wrapper without the improved error handling; then, once that's rounded out, I would introduce the error handling improvement. What do you think?

@sneaky-potato
Member Author

I think it's getting very complex. I would take a step back and break this problem in two.

Yes, I have created a PR. Once that’s reviewed, we can continue iterating here on the remaining pieces.
