
Conversation

@sneaky-potato
Member

@sneaky-potato sneaky-potato commented Jan 17, 2026

This patch introduces better error handling and fixes opaque error messages by passing the method name as a second upvalue to lunatik_monitor.

Summary of my approach:

  • Preserve the method name in lunatik_monitorobject by duplicating it on the stack
  • Pass the method name when pushing the C closure (lunatik_monitor) with 2 upvalues:
    • the actual C function
    • the preserved method name
  • Retrieve the method name in lunatik_monitor so it can be passed to lunatik_error_handler
  • Pass lunatik_error_handler as the error handler to lua_pcall (this is what rewrites the error message)
  • Clean up the stack after this operation
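
Roughly, the shape of the change (a simplified sketch of the idea, not the exact code in the diff; the real monitor also handles the object locking, which is omitted here):

static int lunatik_error_handler(lua_State *L)
{
    /* upvalue 1: the preserved method name; argument 1: the original error object */
    const char *msg = lua_tostring(L, 1);
    if (msg == NULL)
        return 1;  /* non-string error: pass it through unchanged */
    lua_pushfstring(L, "%s: %s", lua_tostring(L, lua_upvalueindex(1)), msg);
    return 1;
}

static int lunatik_monitor(lua_State *L)
{
    int status;

    lua_pushvalue(L, lua_upvalueindex(2));          /* method name */
    lua_pushcclosure(L, lunatik_error_handler, 1);  /* error handler */
    lua_insert(L, 1);                               /* put it below the object and args */

    lua_pushvalue(L, lua_upvalueindex(1));          /* the actual C function */
    lua_insert(L, 2);                               /* put it below the object and args */

    status = lua_pcall(L, lua_gettop(L) - 2, LUA_MULTRET, 1);
    lua_remove(L, 1);                               /* drop the error handler */
    if (status != LUA_OK)
        return lua_error(L);                        /* re-raise the enriched message */
    return lua_gettop(L);
}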

Fixes #382

@sneaky-potato
Member Author

sneaky-potato commented Jan 18, 2026

Testing summary:
I tested my changes with the bind example mentioned in the issue, and the error message now comes out correctly:

$ sudo lunatik run bind_bug
bad argument #2 to 'bind' (out of bounds)
stack traceback:
	[C]: in ?
	[C]: in method 'bind'
	/lib/modules/lua/bind_bug.lua:10: in main chunk
	[C]: in ?

$ sudo lunatik run syscall_bug
stack overflow

EDIT: I realized syscall.address is a module-level function, NOT a method on a userdata object. I suggest we investigate the stack overflow issue separately?

This patch introduces better error handling and fixes
opaque error messages by passing method name as second
upvalue to lunatik_monitor.

Signed-off-by: Ashwani Kumar Kamal <ashwanikamal.im421@gmail.com>
@sneaky-potato sneaky-potato marked this pull request as ready for review January 18, 2026 16:14
@lneto
Contributor

lneto commented Jan 18, 2026

EDIT: I realized syscall.address is a module-level function, NOT a method on a userdata object. I suggest we investigate the stack overflow issue separately?

Yup, exactly. Also, it shouldn't be related to your patch, right? Did you try it on master? One thing we could do, though, to improve the error message in this case is to show the stack trace on the driver.

@sneaky-potato
Member Author

sneaky-potato commented Jan 18, 2026

Also, it shouldn't be related to your patch, right? Did you try it on master?

Yup this issue is unrelated to this patch. I got confused because of luaL_argcheck (still learning the embeddable nature of Lua :D)

Comment on lines +140 to +141
lua_pushvalue(L, lua_upvalueindex(2)); /* method name */
lua_pushcclosure(L, lunatik_error_handler, 1); /* stack: object, args..., errhandler */
Contributor


one concern I have is creating this other closure on every call.. perhaps it's a bit of premature optimization on my side, but it would be good if you can run some benchmarks for this change.. overall, I found it quite an elegant solution.. I'm mostly concerned about this on hot paths..

btw, one thing that I've left for later and never got back is to apply memoization to monitor; perhaps it's a good opportunity to do it now..

Member Author


btw, one thing that I've left for later and never got back is to apply memoization to monitor;

Could you give me some pointers on how to memoize it? I'm thinking of caching the wrapped monitor closures (monitorobject) in the registry to avoid recreating them on every call, but I want to make sure I'm following the right pattern for Lunatik.

I thought of doing something like this:

int lunatik_monitorobject(lua_State *L)
{
    lua_pushlightuserdata(L, (void *)lunatik_monitorobject);
    lua_rawget(L, LUA_REGISTRYINDEX);
    if (!lua_isnil(L, -1)) {
        /* found cached closure */
        return 1;
    }
    lua_pop(L, 1);
    /* ... function body: build the monitor closure, leaving it on top ... */
    lua_pushlightuserdata(L, (void *)lunatik_monitorobject);
    lua_pushvalue(L, -2); /* duplicate the new closure as the value */
    lua_rawset(L, LUA_REGISTRYINDEX); /* registry[lunatik_monitorobject] = closure */
    return 1;
}

Contributor


perhaps we should just update the object metatable, right?
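
something along these lines, maybe (untested sketch, assuming __index is lunatik_monitorobject with the object at index 1 and the key at index 2, and that the raw methods carry no upvalues; error checks omitted):

static int lunatik_monitorobject(lua_State *L)
{
    lua_getmetatable(L, 1);     /* stack: object, key, mt */
    lua_pushvalue(L, 2);
    lua_rawget(L, 3);           /* stack: object, key, mt, mt[key] */

    if (!lua_iscfunction(L, 4))
        return 1;               /* not a C method; return whatever was found */

    if (lua_getupvalue(L, 4, 2) != NULL) {  /* already a monitor closure? */
        lua_pop(L, 1);          /* discard the pushed upvalue */
        return 1;               /* return the cached monitor closure */
    }

    /* first access: wrap the raw C function and cache it back in the metatable */
    lua_pushvalue(L, 2);                      /* upvalue 2: method name */
    lua_pushcclosure(L, lunatik_monitor, 2);  /* pops the function and the name */
    lua_pushvalue(L, 2);                      /* key */
    lua_pushvalue(L, -2);                     /* the new monitor closure */
    lua_rawset(L, 3);                         /* mt[key] = monitor closure */
    return 1;
}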

Contributor


but I would measure the effect of this change before going to memoization; it's a good exercise anyway ;-).. you can measure the time of n-calls on a luadata method (e.g., getbyte).. and assess the difference.. it might be negligible.. then we can merge this as is and leave memoization for later.. (or not ;-)

@sneaky-potato
Member Author

sneaky-potato commented Jan 20, 2026

Hello @lneto
I've completed the benchmark comparing the current master against the new closure-based monitor implementation.
Results (1M iterations of getbyte):
Master: ~266 ns/call (avg)
This branch: ~370 ns/call (avg)
The closure creation adds roughly 104 ns of overhead per call; the monitoring logic introduces a ~35-45% performance hit.
This is the script I used:

local linux = require("linux")
local data = require("data")

local d = data.new(1024)
local iterations = 1000000

local start_ns = linux.time()
for i = 1, iterations do
    d:getbyte(0)
end
local end_ns = linux.time()
local total_ns = end_ns - start_ns

local ns_per_call = total_ns / iterations

print("Total ns: " .. total_ns)
print("Ns per call: " .. ns_per_call)

@lneto
Contributor

lneto commented Jan 20, 2026

Hello @lneto I've completed the benchmark comparing the current master against the new closure-based monitor implementation. Results (1M iterations of getbyte): Master: ~266 ns/call (avg). This branch: ~370 ns/call (avg). The closure creation adds roughly 104 ns of overhead per call.

Nice job! Perhaps we can add this to our tests? 35% is looking like too much for enriching error messages; what do you think? I think we should try memoization. Btw, can you disable lunatik_monitorobject on luadata (by commenting it out on its metatable) and run the same benchmark? Perhaps we are already imposing an unnecessary overhead by creating the monitor closure all the time.. memoization could help us there as well.. also the new "single" option that has been discussed for not sharing objects.. good work! Thanks!

@sneaky-potato
Member Author

what do you think? I think we should try memoization.

Yes I agree, we should try memoization.

Btw, can you disable lunatik_monitorobject on luadata (by commenting it out on its metatable) and run the same benchmark? Perhaps we are already imposing an unnecessary overhead by creating the monitor closure all the time..

disabling the monitorobject, I can see latency drops to just ~62 ns/call (avg).
How should we go about memoizing the monitor wrapper?
I tried caching the wrapped monitor closures lazily via __index by updating the object metatable (inside lunatik_monitorobject), but that turned out to be unsafe (kernel panic).

@lneto
Contributor

lneto commented Jan 20, 2026

what do you think? I think we should try memoization.

Yes I agree, we should try memoization.

perhaps, a better idea is to create the closures at object creation instead of memoizing them. What do you think? We could even leverage uservalues.

Btw, can you disable lunatik_monitorobject on luadata (by commenting it out on its metatable) and run the same benchmark? Perhaps we are already imposing an unnecessary overhead by creating the monitor closure all the time..

disabling the monitorobject, I can see latency drops to just ~62 ns/call (avg).

Wow; that would be a great optimization!

How should we go about memoizing the monitor wrapper? I tried caching the wrapped monitor closures lazily via __index by updating the object metatable (inside lunatik_monitorobject), but that turned out to be unsafe (kernel panic).

can you share your draft and the stack trace?

@sneaky-potato
Member Author

sneaky-potato commented Jan 21, 2026

perhaps, a better idea is to create the closures at object creation instead of memoizing them. What do you think? We could even leverage uservalues.

That will be better, yes. But where in the lifecycle will we create these closures? Won't this require too many changes to the current implementation, given that __index and lunatik_monitorobject handle the dynamic lookup during lunatik_newobject?

Wow; that would be a great optimization!

I have pushed a change which memoizes the monitor; the latency for the same script with the latest change is ~330 ns/call

can you share your draft and the stack trace?

I was trying to mess with the metatable and __index and got a kernel panic related to some reentrancy issue. With the latest push, I've been able to safely memoize it to some extent (330 ns/call, so a 21% performance hit).

@sneaky-potato
Member Author

This is what I could come up with for creating closures instead of memoization (taking the luadata class as an example):

C side:

  • luadata_lnew
  • lunatik_newobject
    • userdata allocated
    • metatable
    • create uservalue table
    • create monitored closures ONCE and store in uservalue
  • return object

Lua side:

  • obj:method()
    • __index
    • return uservalue[method]

and thus we get rid of lunatik_monitorobject
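
A rough sketch of the C side (untested; lunatik_wrapmethods and lunatik_index are just placeholder names, and it assumes the userdata is created with one user value, i.e. lua_newuserdatauv(L, size, 1)):

/* placeholder helper: build the per-object table of monitor closures and
 * attach it as the user value of the userdata at index `idx` */
static void lunatik_wrapmethods(lua_State *L, int idx, const luaL_Reg *methods)
{
    idx = lua_absindex(L, idx);
    lua_newtable(L);
    for (; methods->name != NULL; methods++) {
        lua_pushcfunction(L, methods->func);   /* upvalue 1: raw C function */
        lua_pushstring(L, methods->name);      /* upvalue 2: method name */
        lua_pushcclosure(L, lunatik_monitor, 2);
        lua_setfield(L, -2, methods->name);    /* uservalue[name] = monitor closure */
    }
    lua_setiuservalue(L, idx, 1);              /* attach the table to the userdata */
}

/* __index then becomes a plain lookup in the user value table */
static int lunatik_index(lua_State *L)
{
    lua_getiuservalue(L, 1, 1);
    lua_pushvalue(L, 2);
    lua_rawget(L, -2);
    return 1;
}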

what do you think @lneto ?

@lneto
Contributor

lneto commented Jan 21, 2026

This is what I could come up with for creating closures instead of memoization (taking the luadata class as an example): create the monitored closures once in lunatik_newobject, store them in the uservalue table, have __index return uservalue[method], and thus get rid of lunatik_monitorobject.

what do you think @lneto ?

I was thinking of something like this, but in lunatik_newclass (or setclass) itself; we could create our modified metatable there..
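
e.g., roughly (untested; the real lunatik_newclass obviously differs, this just shows wrapping each luaL_Reg entry once while filling the metatable, so it happens per class rather than per object):

static void lunatik_setmonitored(lua_State *L, const luaL_Reg *methods)
{
    /* assumes the class metatable is on top of the stack */
    for (; methods->name != NULL; methods++) {
        lua_pushcfunction(L, methods->func);   /* upvalue 1: raw C function */
        lua_pushstring(L, methods->name);      /* upvalue 2: method name */
        lua_pushcclosure(L, lunatik_monitor, 2);
        lua_setfield(L, -2, methods->name);    /* mt[name] = monitor closure */
    }
    lua_pushvalue(L, -1);
    lua_setfield(L, -2, "__index");            /* mt.__index = mt */
}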

Signed-off-by: Ashwani Kumar Kamal <ashwanikamal.im421@gmail.com>
@lneto
Contributor

lneto commented Jan 22, 2026

Hello @lneto I've completed the benchmark comparing the current master against the new closure-based monitor implementation. Results (1M iterations of getbyte): Master: ~266 ns/call (avg). This branch: ~370 ns/call (avg). The closure creation adds roughly 104 ns of overhead per call.

I was thinking here.. we could also measure the impact of memoization by doing local m_getbyte = d.getbyte before the loop and before the initialization of start_ns. So, we can estimate what's the impact of the closure creation and the impact of synchronization itself. What do you think? Can you run it as well?

@sneaky-potato
Member Author

sneaky-potato commented Jan 22, 2026

So, we can estimate what's the impact of the closure creation and the impact of synchronization itself.

Some metrics for the following script

local linux = require("linux")
local data = require("data")
local d = data.new(1024)
local iterations = 1000000

local m_getbyte = d.getbyte 

local start_ns = linux.time()
for i = 1, iterations do
    m_getbyte(d, 0)
end
local end_ns = linux.time()

local total_ns = end_ns - start_ns
print("Localized Method Total ns: " .. total_ns)
print("Localized Method Ns per call: " .. (total_ns / iterations))

master: Localized Method Ns per call: 117 ns
memoization approach (specifically this commit): Localized Method Ns per call: 227 ns
latest changes (tried something for creating closures on new_class): Localized Method Ns per call: 220 ns

and for the following script

local linux = require("linux")
local data = require("data")

local d = data.new(1024)
local iterations = 1000000

local start_ns = linux.time()
for i = 1, iterations do
    d:getbyte(0)
end
local end_ns = linux.time()
local total_ns = end_ns - start_ns

local ns_per_call = total_ns / iterations

print("Total ns: " .. total_ns)
print("Ns per call: " .. ns_per_call)

latest changes in this branch report ~240 ns/call (master reports ~266 ns/call).
It means the original Lunatik method lookup was actually quite expensive; by pre-creating the closures in lunatik_newclass, we can optimize the lookup phase.

continue;
}

// Skip metamethods (starting with __)
Contributor


please, mind our style; we don't do C++ style comments

@lneto
Contributor

lneto commented Jan 23, 2026

So, we can estimate what's the impact of the closure creation and the impact of synchronization itself.

Some metrics: master: 117 ns/call (localized method); memoization approach: 227 ns/call; latest changes (creating closures in new_class): 220 ns/call. With the regular method-call script, the latest changes in this branch report ~240 ns/call (master reports ~266 ns/call); by pre-creating the closures in lunatik_newclass, we can optimize the lookup phase.

I think it's getting very complex. I would take a step back and break this problem in two. First, I would try to implement the wrapper without the improved error handling; then, once that's rounded out, I would introduce the error handling improvement. What do you think?

@sneaky-potato
Member Author

I think it's getting very complex. I would take a step back and break this problem in two.

Yes, I have created a PR. Once that’s reviewed, we can continue iterating here on the remaining pieces.
