Skip to content

Conversation

@Spaceghost
Copy link

I was going through the readme, checking it out, and a few things kept distracting me. So I fixed 'em.

@bdonlan
Copy link
Contributor

bdonlan commented Sep 6, 2011

Please note that pull requests are not the proper procedure to submit patches to the Linux kernel (Linus put the kernel up here because kernel.org's master mirror is down; it seems that he doesn't like the pull request system[1], but github does not allow him to disable it). Please read Documentation/SubmittingPatches - you must write a proper commit message (actually describing what changed, not just 'tinkered with'), add a Signed-Off-By line, and submit to the linux kernel mailing list.

[1] - http://blueparen.com/node/12

@Spaceghost
Copy link
Author

That's a bit too much work for the usual github stuff. Perhaps I'll just leave it alone and let the usual kernel.org hackers help out.

@Spaceghost Spaceghost closed this Sep 6, 2011
jankeromnes pushed a commit to jankeromnes/linux that referenced this pull request Nov 5, 2011
The following command sequence triggers an oops.

# mount /dev/sdb1 /mnt
# echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
# umount /mnt

 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:

 Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ torvalds#8 Bochs Bochs
 RIP: 0010:[<ffffffff810d0879>]  [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60
...
 Call Trace:
  [<ffffffff810d2845>] lock_acquire+0x95/0x140
  [<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50
  [<ffffffff811573bc>] bdi_lock_two+0x5c/0x70
  [<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0
  [<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0
  [<ffffffff811c4010>] __blkdev_put+0x160/0x1d0
  [<ffffffff811c40df>] blkdev_put+0x5f/0x190
  [<ffffffff8118f18d>] kill_block_super+0x4d/0x80
  [<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70
  [<ffffffff8119003a>] deactivate_super+0x4a/0x70
  [<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130
  [<ffffffff811acf2e>] sys_umount+0x7e/0x3a0
  [<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b

This is because bdev holds on to disk but disk doesn't pin the
associated queue.  If a SCSI device is removed while the device is
still open, the sdev puts the base reference to the queue on release.
When the bdev is finally released, the associated queue is already
gone along with the bdi and bdev_inode_switch_bdi() ends up
dereferencing already freed bdi.

Even if it were not for this bug, disk not holding onto the associated
queue is very unusual and error-prone.

Fix it by making add_disk() take an extra reference to its queue and
put it on disk_release() and ensuring that disk and its fops owner are
put in that order after all accesses to the disk and queue are
complete.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: [email protected]
Signed-off-by: Jens Axboe <[email protected]>
iksaif pushed a commit to iksaif/platform-drivers-x86 that referenced this pull request Nov 6, 2011
This patch validates sdev pointer in scsi_dh_activate before proceeding further.

Without this check we might see the panic as below. I have seen this
panic multiple times..

Call trace:

 #0 [ffff88007d647b50] machine_kexec at ffffffff81020902
 #1 [ffff88007d647ba0] crash_kexec at ffffffff810875b0
 #2 [ffff88007d647c70] oops_end at ffffffff8139c650
 #3 [ffff88007d647c90] __bad_area_nosemaphore at ffffffff8102dd15
 #4 [ffff88007d647d50] page_fault at ffffffff8139b8cf
    [exception RIP: scsi_dh_activate+0x82]
    RIP: ffffffffa0041922  RSP: ffff88007d647e00  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 00000000000093c5
    RDX: 00000000000093c5  RSI: ffffffffa02e6640  RDI: ffff88007cc88988
    RBP: 000000000000000f   R8: ffff88007d646000   R9: 0000000000000000
    R10: ffff880082293790  R11: 00000000ffffffff  R12: ffff88007cc88988
    R13: 0000000000000000  R14: 0000000000000286  R15: ffff880037b845e0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #5 [ffff88007d647e38] run_workqueue at ffffffff81060268
 torvalds#6 [ffff88007d647e78] worker_thread at ffffffff81060386
 torvalds#7 [ffff88007d647ee8] kthread at ffffffff81064436
 torvalds#8 [ffff88007d647f48] kernel_thread at ffffffff81003fba

Signed-off-by: Babu Moger <[email protected]>
Cc: [email protected]
Signed-off-by: James Bottomley <[email protected]>
fengguang pushed a commit to fengguang/linux that referenced this pull request Nov 7, 2011
The following command sequence triggers an oops.

# mount /dev/sdb1 /mnt
# echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
# umount /mnt

 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:

 Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ torvalds#8 Bochs Bochs
 RIP: 0010:[<ffffffff810d0879>]  [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60
...
 Call Trace:
  [<ffffffff810d2845>] lock_acquire+0x95/0x140
  [<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50
  [<ffffffff811573bc>] bdi_lock_two+0x5c/0x70
  [<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0
  [<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0
  [<ffffffff811c4010>] __blkdev_put+0x160/0x1d0
  [<ffffffff811c40df>] blkdev_put+0x5f/0x190
  [<ffffffff8118f18d>] kill_block_super+0x4d/0x80
  [<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70
  [<ffffffff8119003a>] deactivate_super+0x4a/0x70
  [<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130
  [<ffffffff811acf2e>] sys_umount+0x7e/0x3a0
  [<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b

This is because bdev holds on to disk but disk doesn't pin the
associated queue.  If a SCSI device is removed while the device is
still open, the sdev puts the base reference to the queue on release.
When the bdev is finally released, the associated queue is already
gone along with the bdi and bdev_inode_switch_bdi() ends up
dereferencing already freed bdi.

Even if it were not for this bug, disk not holding onto the associated
queue is very unusual and error-prone.

Fix it by making add_disk() take an extra reference to its queue and
put it on disk_release() and ensuring that disk and its fops owner are
put in that order after all accesses to the disk and queue are
complete.

Signed-off-by: Tejun Heo <[email protected]>
Cc: [email protected]
Signed-off-by: Jens Axboe <[email protected]>
baerwolf pushed a commit to baerwolf/linux-stephan that referenced this pull request Nov 12, 2011
commit a18a920 upstream.

This patch validates sdev pointer in scsi_dh_activate before proceeding further.

Without this check we might see the panic as below. I have seen this
panic multiple times..

Call trace:

 #0 [ffff88007d647b50] machine_kexec at ffffffff81020902
 #1 [ffff88007d647ba0] crash_kexec at ffffffff810875b0
 #2 [ffff88007d647c70] oops_end at ffffffff8139c650
 #3 [ffff88007d647c90] __bad_area_nosemaphore at ffffffff8102dd15
 #4 [ffff88007d647d50] page_fault at ffffffff8139b8cf
    [exception RIP: scsi_dh_activate+0x82]
    RIP: ffffffffa0041922  RSP: ffff88007d647e00  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 00000000000093c5
    RDX: 00000000000093c5  RSI: ffffffffa02e6640  RDI: ffff88007cc88988
    RBP: 000000000000000f   R8: ffff88007d646000   R9: 0000000000000000
    R10: ffff880082293790  R11: 00000000ffffffff  R12: ffff88007cc88988
    R13: 0000000000000000  R14: 0000000000000286  R15: ffff880037b845e0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #5 [ffff88007d647e38] run_workqueue at ffffffff81060268
 torvalds#6 [ffff88007d647e78] worker_thread at ffffffff81060386
 torvalds#7 [ffff88007d647ee8] kthread at ffffffff81064436
 torvalds#8 [ffff88007d647f48] kernel_thread at ffffffff81003fba

Signed-off-by: Babu Moger <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
baerwolf pushed a commit to baerwolf/linux-stephan that referenced this pull request Nov 12, 2011
commit f992ae8 upstream.

The following command sequence triggers an oops.

# mount /dev/sdb1 /mnt
# echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
# umount /mnt

 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:

 Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ torvalds#8 Bochs Bochs
 RIP: 0010:[<ffffffff810d0879>]  [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60
...
 Call Trace:
  [<ffffffff810d2845>] lock_acquire+0x95/0x140
  [<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50
  [<ffffffff811573bc>] bdi_lock_two+0x5c/0x70
  [<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0
  [<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0
  [<ffffffff811c4010>] __blkdev_put+0x160/0x1d0
  [<ffffffff811c40df>] blkdev_put+0x5f/0x190
  [<ffffffff8118f18d>] kill_block_super+0x4d/0x80
  [<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70
  [<ffffffff8119003a>] deactivate_super+0x4a/0x70
  [<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130
  [<ffffffff811acf2e>] sys_umount+0x7e/0x3a0
  [<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b

This is because bdev holds on to disk but disk doesn't pin the
associated queue.  If a SCSI device is removed while the device is
still open, the sdev puts the base reference to the queue on release.
When the bdev is finally released, the associated queue is already
gone along with the bdi and bdev_inode_switch_bdi() ends up
dereferencing already freed bdi.

Even if it were not for this bug, disk not holding onto the associated
queue is very unusual and error-prone.

Fix it by making add_disk() take an extra reference to its queue and
put it on disk_release() and ensuring that disk and its fops owner are
put in that order after all accesses to the disk and queue are
complete.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
torvalds pushed a commit that referenced this pull request Dec 15, 2011
If the pte mapping in generic_perform_write() is unmapped between
iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the
"copied" parameter to ->end_write can be zero. ext4 couldn't cope with
it with delayed allocations enabled. This skips the i_disksize
enlargement logic if copied is zero and no new data was appeneded to
the inode.

 gdb> bt
 #0  0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\
 08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 #2  0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\
 ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440
 #3  generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\
 os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482
 #4  0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\
 xffff88001e26be40) at mm/filemap.c:2600
 #5  0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\
 zed out>, pos=<value optimized out>) at mm/filemap.c:2632
 #6  0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\
 t fs/ext4/file.c:136
 #7  0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \
 ppos=0xffff88001e26bf48) at fs/read_write.c:406
 #8  0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\
 000, pos=0xffff88001e26bf48) at fs/read_write.c:435
 #9  0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\
 4000) at fs/read_write.c:487
 #10 <signal handler called>
 #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
 #12 0x0000000000000000 in ?? ()
 gdb> print offset
 $22 = 0xffffffffffffffff
 gdb> print idx
 $23 = 0xffffffff
 gdb> print inode->i_blkbits
 $24 = 0xc
 gdb> up
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 2512                    if (ext4_da_should_update_i_disksize(page, end)) {
 gdb> print start
 $25 = 0x0
 gdb> print end
 $26 = 0xffffffffffffffff
 gdb> print pos
 $27 = 0x108000
 gdb> print new_i_size
 $28 = 0x108000
 gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize
 $29 = 0xd9000
 gdb> down
 2467            for (i = 0; i < idx; i++)
 gdb> print i
 $30 = 0xd44acbee

This is 100% reproducible with some autonuma development code tuned in
a very aggressive manner (not normal way even for knumad) which does
"exotic" changes to the ptes. It wouldn't normally trigger but I don't
see why it can't happen normally if the page is added to swap cache in
between the two faults leading to "copied" being zero (which then
hangs in ext4). So it should be fixed. Especially possible with lumpy
reclaim (albeit disabled if compaction is enabled) as that would
ignore the young bits in the ptes.

Signed-off-by: Andrea Arcangeli <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Cc: [email protected]
torvalds pushed a commit that referenced this pull request Jan 11, 2012
Nothing requires that we lock the filesystem until the root inode is
provided.

Also iget5_locked() triggers a warning because we are holding the
filesystem lock while allocating the inode, which result in a lockdep
suspicion that we have a lock inversion against the reclaim path:

[ 1986.896979] =================================
[ 1986.896990] [ INFO: inconsistent lock state ]
[ 1986.896997] 3.1.1-main #8
[ 1986.897001] ---------------------------------
[ 1986.897007] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
[ 1986.897016] kswapd0/16 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 1986.897023]  (&REISERFS_SB(s)->lock){+.+.?.}, at: [<c01f8bd4>] reiserfs_write_lock+0x20/0x2a
[ 1986.897044] {RECLAIM_FS-ON-W} state was registered at:
[ 1986.897050]   [<c014a5b9>] mark_held_locks+0xae/0xd0
[ 1986.897060]   [<c014aab3>] lockdep_trace_alloc+0x7d/0x91
[ 1986.897068]   [<c0190ee0>] kmem_cache_alloc+0x1a/0x93
[ 1986.897078]   [<c01e7728>] reiserfs_alloc_inode+0x13/0x3d
[ 1986.897088]   [<c01a5b06>] alloc_inode+0x14/0x5f
[ 1986.897097]   [<c01a5cb9>] iget5_locked+0x62/0x13a
[ 1986.897106]   [<c01e99e0>] reiserfs_fill_super+0x410/0x8b9
[ 1986.897114]   [<c01953da>] mount_bdev+0x10b/0x159
[ 1986.897123]   [<c01e764d>] get_super_block+0x10/0x12
[ 1986.897131]   [<c0195b38>] mount_fs+0x59/0x12d
[ 1986.897138]   [<c01a80d1>] vfs_kern_mount+0x45/0x7a
[ 1986.897147]   [<c01a83e3>] do_kern_mount+0x2f/0xb0
[ 1986.897155]   [<c01a987a>] do_mount+0x5c2/0x612
[ 1986.897163]   [<c01a9a72>] sys_mount+0x61/0x8f
[ 1986.897170]   [<c044060c>] sysenter_do_call+0x12/0x32
[ 1986.897181] irq event stamp: 7509691
[ 1986.897186] hardirqs last  enabled at (7509691): [<c0190f34>] kmem_cache_alloc+0x6e/0x93
[ 1986.897197] hardirqs last disabled at (7509690): [<c0190eea>] kmem_cache_alloc+0x24/0x93
[ 1986.897209] softirqs last  enabled at (7508896): [<c01294bd>] __do_softirq+0xee/0xfd
[ 1986.897222] softirqs last disabled at (7508859): [<c01030ed>] do_softirq+0x50/0x9d
[ 1986.897234]
[ 1986.897235] other info that might help us debug this:
[ 1986.897242]  Possible unsafe locking scenario:
[ 1986.897244]
[ 1986.897250]        CPU0
[ 1986.897254]        ----
[ 1986.897257]   lock(&REISERFS_SB(s)->lock);
[ 1986.897265] <Interrupt>
[ 1986.897269]     lock(&REISERFS_SB(s)->lock);
[ 1986.897276]
[ 1986.897277]  *** DEADLOCK ***
[ 1986.897278]
[ 1986.897286] no locks held by kswapd0/16.
[ 1986.897291]
[ 1986.897292] stack backtrace:
[ 1986.897299] Pid: 16, comm: kswapd0 Not tainted 3.1.1-main #8
[ 1986.897306] Call Trace:
[ 1986.897314]  [<c0439e76>] ? printk+0xf/0x11
[ 1986.897324]  [<c01482d1>] print_usage_bug+0x20e/0x21a
[ 1986.897332]  [<c01479b8>] ? print_irq_inversion_bug+0x172/0x172
[ 1986.897341]  [<c014855c>] mark_lock+0x27f/0x483
[ 1986.897349]  [<c0148d88>] __lock_acquire+0x628/0x1472
[ 1986.897358]  [<c0149fae>] lock_acquire+0x47/0x5e
[ 1986.897366]  [<c01f8bd4>] ? reiserfs_write_lock+0x20/0x2a
[ 1986.897384]  [<c01f8bd4>] ? reiserfs_write_lock+0x20/0x2a
[ 1986.897397]  [<c043b5ef>] mutex_lock_nested+0x35/0x26f
[ 1986.897409]  [<c01f8bd4>] ? reiserfs_write_lock+0x20/0x2a
[ 1986.897421]  [<c01f8bd4>] reiserfs_write_lock+0x20/0x2a
[ 1986.897433]  [<c01e2edd>] map_block_for_writepage+0xc9/0x590
[ 1986.897448]  [<c01b1706>] ? create_empty_buffers+0x33/0x8f
[ 1986.897461]  [<c0121124>] ? get_parent_ip+0xb/0x31
[ 1986.897472]  [<c043ef7f>] ? sub_preempt_count+0x81/0x8e
[ 1986.897485]  [<c043cae0>] ? _raw_spin_unlock+0x27/0x3d
[ 1986.897496]  [<c0121124>] ? get_parent_ip+0xb/0x31
[ 1986.897508]  [<c01e355d>] reiserfs_writepage+0x1b9/0x3e7
[ 1986.897521]  [<c0173b40>] ? clear_page_dirty_for_io+0xcb/0xde
[ 1986.897533]  [<c014a6e3>] ? trace_hardirqs_on_caller+0x108/0x138
[ 1986.897546]  [<c014a71e>] ? trace_hardirqs_on+0xb/0xd
[ 1986.897559]  [<c0177b38>] shrink_page_list+0x34f/0x5e2
[ 1986.897572]  [<c01780a7>] shrink_inactive_list+0x172/0x22c
[ 1986.897585]  [<c0178464>] shrink_zone+0x303/0x3b1
[ 1986.897597]  [<c043cae0>] ? _raw_spin_unlock+0x27/0x3d
[ 1986.897611]  [<c01788c9>] kswapd+0x3b7/0x5f2

The deadlock shouldn't happen since we are doing that allocation in the
mount path, the filesystem is not available for any reclaim.  Still the
warning is annoying.

To solve this, acquire the lock later only where we need it, right before
calling reiserfs_read_locked_inode() that wants to lock to walk the tree.

Reported-by: Knut Petersen <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Jeff Mahoney <[email protected]>
Cc: Jan Kara <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Pfiver pushed a commit to Pfiver/linux that referenced this pull request Jan 16, 2012
$ wget "http://pkgs.fedoraproject.org/gitweb/?p=kernel.git;a=blob_plain;f=mac80211_offchannel_rework_revert.patch;h=859799714cd85a58450ecde4a1dabc5adffd5100;hb=refs/heads/f16" -O mac80211_offchannel_rework_revert.patch
$ patch -p1 --dry-run < mac80211_offchannel_rework_revert.patch
patching file net/mac80211/ieee80211_i.h
Hunk #1 succeeded at 702 (offset 8 lines).
Hunk #2 succeeded at 712 (offset 8 lines).
Hunk #3 succeeded at 1143 (offset -57 lines).
patching file net/mac80211/main.c
patching file net/mac80211/offchannel.c
Hunk #1 succeeded at 18 (offset 1 line).
Hunk #2 succeeded at 42 (offset 1 line).
Hunk #3 succeeded at 78 (offset 1 line).
Hunk #4 succeeded at 96 (offset 1 line).
Hunk #5 succeeded at 162 (offset 1 line).
Hunk torvalds#6 succeeded at 182 (offset 1 line).
patching file net/mac80211/rx.c
Hunk #1 succeeded at 421 (offset 4 lines).
Hunk #2 succeeded at 2864 (offset 87 lines).
patching file net/mac80211/scan.c
Hunk #1 succeeded at 213 (offset 1 line).
Hunk #2 succeeded at 256 (offset 2 lines).
Hunk #3 succeeded at 288 (offset 2 lines).
Hunk #4 succeeded at 333 (offset 2 lines).
Hunk #5 succeeded at 482 (offset 2 lines).
Hunk torvalds#6 succeeded at 498 (offset 2 lines).
Hunk torvalds#7 succeeded at 516 (offset 2 lines).
Hunk torvalds#8 succeeded at 530 (offset 2 lines).
Hunk torvalds#9 succeeded at 555 (offset 2 lines).
patching file net/mac80211/tx.c
Hunk #1 succeeded at 259 (offset 1 line).
patching file net/mac80211/work.c
Hunk #1 succeeded at 899 (offset -2 lines).
Hunk #2 succeeded at 949 (offset -2 lines).
Hunk #3 succeeded at 1046 (offset -2 lines).
Hunk #4 succeeded at 1054 (offset -2 lines).
jkstrick pushed a commit to jkstrick/linux that referenced this pull request Feb 11, 2012
If the netdev is already in NETREG_UNREGISTERING/_UNREGISTERED state, do not
update the real num tx queues. netdev_queue_update_kobjects() is already
called via remove_queue_kobjects() at NETREG_UNREGISTERING time. So, when
upper layer driver, e.g., FCoE protocol stack is monitoring the netdev
event of NETDEV_UNREGISTER and calls back to LLD ndo_fcoe_disable() to remove
extra queues allocated for FCoE, the associated txq sysfs kobjects are already
removed, and trying to update the real num queues would cause something like
below:

...
PID: 25138  TASK: ffff88021e64c440  CPU: 3   COMMAND: "kworker/3:3"
 #0 [ffff88021f007760] machine_kexec at ffffffff810226d9
 #1 [ffff88021f0077d0] crash_kexec at ffffffff81089d2d
 #2 [ffff88021f0078a0] oops_end at ffffffff813bca78
 #3 [ffff88021f0078d0] no_context at ffffffff81029e72
 #4 [ffff88021f007920] __bad_area_nosemaphore at ffffffff8102a155
 #5 [ffff88021f0079f0] bad_area_nosemaphore at ffffffff8102a23e
 torvalds#6 [ffff88021f007a00] do_page_fault at ffffffff813bf32e
 torvalds#7 [ffff88021f007b10] page_fault at ffffffff813bc045
    [exception RIP: sysfs_find_dirent+17]
    RIP: ffffffff81178611  RSP: ffff88021f007bc0  RFLAGS: 00010246
    RAX: ffff88021e64c440  RBX: ffffffff8156cc63  RCX: 0000000000000004
    RDX: ffffffff8156cc63  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff88021f007be0   R8: 0000000000000004   R9: 0000000000000008
    R10: ffffffff816fed00  R11: 0000000000000004  R12: 0000000000000000
    R13: ffffffff8156cc63  R14: 0000000000000000  R15: ffff8802222a0000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 torvalds#8 [ffff88021f007be8] sysfs_get_dirent at ffffffff81178c07
 torvalds#9 [ffff88021f007c18] sysfs_remove_group at ffffffff8117ac27
torvalds#10 [ffff88021f007c48] netdev_queue_update_kobjects at ffffffff813178f9
torvalds#11 [ffff88021f007c88] netif_set_real_num_tx_queues at ffffffff81303e38
torvalds#12 [ffff88021f007cc8] ixgbe_set_num_queues at ffffffffa0249763 [ixgbe]
torvalds#13 [ffff88021f007cf8] ixgbe_init_interrupt_scheme at ffffffffa024ea89 [ixgbe]
torvalds#14 [ffff88021f007d48] ixgbe_fcoe_disable at ffffffffa0267113 [ixgbe]
torvalds#15 [ffff88021f007d68] vlan_dev_fcoe_disable at ffffffffa014fef5 [8021q]
torvalds#16 [ffff88021f007d78] fcoe_interface_cleanup at ffffffffa02b7dfd [fcoe]
torvalds#17 [ffff88021f007df8] fcoe_destroy_work at ffffffffa02b7f08 [fcoe]
torvalds#18 [ffff88021f007e18] process_one_work at ffffffff8105d7ca
torvalds#19 [ffff88021f007e68] worker_thread at ffffffff81060513
torvalds#20 [ffff88021f007ee8] kthread at ffffffff810648b6
torvalds#21 [ffff88021f007f48] kernel_thread_helper at ffffffff813c40f4

Signed-off-by: Yi Zou <[email protected]>
Tested-by: Ross Brattain <[email protected]>
Tested-by: Stephen Ko <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
zachariasmaladroit pushed a commit to galaxys-cm7miui-kernel/linux that referenced this pull request Feb 11, 2012
If the netdev is already in NETREG_UNREGISTERING/_UNREGISTERED state, do not
update the real num tx queues. netdev_queue_update_kobjects() is already
called via remove_queue_kobjects() at NETREG_UNREGISTERING time. So, when
upper layer driver, e.g., FCoE protocol stack is monitoring the netdev
event of NETDEV_UNREGISTER and calls back to LLD ndo_fcoe_disable() to remove
extra queues allocated for FCoE, the associated txq sysfs kobjects are already
removed, and trying to update the real num queues would cause something like
below:

...
PID: 25138  TASK: ffff88021e64c440  CPU: 3   COMMAND: "kworker/3:3"
 #0 [ffff88021f007760] machine_kexec at ffffffff810226d9
 #1 [ffff88021f0077d0] crash_kexec at ffffffff81089d2d
 #2 [ffff88021f0078a0] oops_end at ffffffff813bca78
 #3 [ffff88021f0078d0] no_context at ffffffff81029e72
 #4 [ffff88021f007920] __bad_area_nosemaphore at ffffffff8102a155
 #5 [ffff88021f0079f0] bad_area_nosemaphore at ffffffff8102a23e
 torvalds#6 [ffff88021f007a00] do_page_fault at ffffffff813bf32e
 torvalds#7 [ffff88021f007b10] page_fault at ffffffff813bc045
    [exception RIP: sysfs_find_dirent+17]
    RIP: ffffffff81178611  RSP: ffff88021f007bc0  RFLAGS: 00010246
    RAX: ffff88021e64c440  RBX: ffffffff8156cc63  RCX: 0000000000000004
    RDX: ffffffff8156cc63  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff88021f007be0   R8: 0000000000000004   R9: 0000000000000008
    R10: ffffffff816fed00  R11: 0000000000000004  R12: 0000000000000000
    R13: ffffffff8156cc63  R14: 0000000000000000  R15: ffff8802222a0000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 torvalds#8 [ffff88021f007be8] sysfs_get_dirent at ffffffff81178c07
 torvalds#9 [ffff88021f007c18] sysfs_remove_group at ffffffff8117ac27
torvalds#10 [ffff88021f007c48] netdev_queue_update_kobjects at ffffffff813178f9
torvalds#11 [ffff88021f007c88] netif_set_real_num_tx_queues at ffffffff81303e38
torvalds#12 [ffff88021f007cc8] ixgbe_set_num_queues at ffffffffa0249763 [ixgbe]
torvalds#13 [ffff88021f007cf8] ixgbe_init_interrupt_scheme at ffffffffa024ea89 [ixgbe]
torvalds#14 [ffff88021f007d48] ixgbe_fcoe_disable at ffffffffa0267113 [ixgbe]
torvalds#15 [ffff88021f007d68] vlan_dev_fcoe_disable at ffffffffa014fef5 [8021q]
torvalds#16 [ffff88021f007d78] fcoe_interface_cleanup at ffffffffa02b7dfd [fcoe]
torvalds#17 [ffff88021f007df8] fcoe_destroy_work at ffffffffa02b7f08 [fcoe]
torvalds#18 [ffff88021f007e18] process_one_work at ffffffff8105d7ca
torvalds#19 [ffff88021f007e68] worker_thread at ffffffff81060513
torvalds#20 [ffff88021f007ee8] kthread at ffffffff810648b6
torvalds#21 [ffff88021f007f48] kernel_thread_helper at ffffffff813c40f4

Signed-off-by: Yi Zou <[email protected]>
Tested-by: Ross Brattain <[email protected]>
Tested-by: Stephen Ko <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
tworaz pushed a commit to tworaz/linux that referenced this pull request Feb 13, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 torvalds#6 [d72d3cb4] isolate_migratepages at c030b15a
 torvalds#7 [d72d3d14] zone_watermark_ok at c02d26cb
 torvalds#8 [d72d3d2c] compact_zone at c030b8d
 torvalds#9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
xXorAa pushed a commit to xXorAa/linux that referenced this pull request Feb 17, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 torvalds#6 [d72d3cb4] isolate_migratepages at c030b15a
 torvalds#7 [d72d3d14] zone_watermark_ok at c02d26cb
 torvalds#8 [d72d3d2c] compact_zone at c030b8d
 torvalds#9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi referenced this pull request in koenkooi/linux Feb 23, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koct9i pushed a commit to koct9i/linux that referenced this pull request Mar 1, 2012
ata_port lifetime in libata follows the host.  In libsas it follows the
scsi_target.  Once scsi_remove_device() has caused all commands to be
completed it allows scsi_remove_target() to immediately proceed to
freeing the ata_port causing bug reports like:

[  848.393333] BUG: spinlock bad magic on CPU#4, kworker/u:2/5107
[  848.400262] general protection fault: 0000 [#1] SMP
[  848.406244] CPU 4
[  848.408310] Modules linked in: nls_utf8 ipv6 uinput i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma dca sg sd_mod sr_mod cdrom ahci libahci isci libsas libata scsi_transport_sas [last unloaded: scsi_wait_scan]
[  848.432060]
[  848.434137] Pid: 5107, comm: kworker/u:2 Not tainted 3.2.0-isci+ torvalds#8 Intel Corporation S2600CP/S2600CP
[  848.445310] RIP: 0010:[<ffffffff8126a68c>]  [<ffffffff8126a68c>] spin_dump+0x5e/0x8c
[  848.454787] RSP: 0018:ffff8807f868dca0  EFLAGS: 00010002
[  848.461137] RAX: 0000000000000048 RBX: ffff8807fe86a630 RCX: ffffffff817d0be0
[  848.469520] RDX: 0000000000000000 RSI: ffffffff814af1cf RDI: 0000000000000002
[  848.477959] RBP: ffff8807f868dcb0 R08: 00000000ffffffff R09: 000000006b6b6b6b
[  848.486327] R10: 000000000003fb8c R11: ffffffff81a19448 R12: 6b6b6b6b6b6b6b6b
[  848.494699] R13: ffff8808027dc520 R14: 0000000000000000 R15: 000000000000001e
[  848.503067] FS:  0000000000000000(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000000
[  848.512899] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  848.519710] CR2: 00007ff77d001000 CR3: 00000007f7a5d000 CR4: 00000000000406e0
[  848.528072] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  848.536446] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  848.544831] Process kworker/u:2 (pid: 5107, threadinfo ffff8807f868c000, task ffff8807ff348000)
[  848.555327] Stack:
[  848.557959]  ffff8807fe86a630 ffff8807fe86a630 ffff8807f868dcd0 ffffffff8126a6e0
[  848.567072]  ffffffff817c142f ffff8807fe86a630 ffff8807f868dcf0 ffffffff8126a703
[  848.576190]  ffff8808027dc520 0000000000000286 ffff8807f868dd10 ffffffff814af1bb
[  848.585281] Call Trace:
[  848.588409]  [<ffffffff8126a6e0>] spin_bug+0x26/0x28
[  848.594357]  [<ffffffff8126a703>] do_raw_spin_unlock+0x21/0x88
[  848.601283]  [<ffffffff814af1bb>] _raw_spin_unlock_irqrestore+0x2c/0x65
[  848.609089]  [<ffffffffa001c103>] ata_scsi_port_error_handler+0x548/0x557 [libata]
[  848.618331]  [<ffffffff81061813>] ? async_schedule+0x17/0x17
[  848.625060]  [<ffffffffa004f30f>] async_sas_ata_eh+0x45/0x69 [libsas]
[  848.632655]  [<ffffffff810618aa>] async_run_entry_fn+0x97/0x125
[  848.639670]  [<ffffffff81057439>] process_one_work+0x207/0x38d
[  848.646577]  [<ffffffff8105738c>] ? process_one_work+0x15a/0x38d
[  848.653681]  [<ffffffff810576f7>] worker_thread+0x138/0x21c
[  848.660305]  [<ffffffff810575bf>] ? process_one_work+0x38d/0x38d
[  848.667493]  [<ffffffff8105b098>] kthread+0x9d/0xa5
[  848.673382]  [<ffffffff8106e1bd>] ? trace_hardirqs_on_caller+0x12f/0x166
[  848.681304]  [<ffffffff814b7704>] kernel_thread_helper+0x4/0x10
[  848.688324]  [<ffffffff814af534>] ? retint_restore_args+0x13/0x13
[  848.695530]  [<ffffffff8105affb>] ? __init_kthread_worker+0x5b/0x5b
[  848.702929]  [<ffffffff814b7700>] ? gs_change+0x13/0x13
[  848.709155] Code: 00 00 48 8d 88 38 04 00 00 44 8b 80 84 02 00 00 31 c0 e8 cf 1b 24 00 41 83 c8 ff 44 8b 4b 08 48 c7 c1 e0 0b 7d 81 4d 85 e4 74 10 <45> 8b 84 24 84 02 00 00 49 8d 8c 24 38 04 00 00 8b 53 04 48 89
[  848.732467] RIP  [<ffffffff8126a68c>] spin_dump+0x5e/0x8c
[  848.738905]  RSP <ffff8807f868dca0>
[  848.743743] ---[ end trace 143161646eee8caa ]---

...so arrange for the ata_port to have the same end of life as the domain
device.

Reported-by: Marcin Tomczak <[email protected]>
Acked-by: Jeff Garzik <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
koenkooi referenced this pull request in koenkooi/linux Mar 1, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi referenced this pull request in koenkooi/linux Mar 19, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi referenced this pull request in koenkooi/linux Mar 22, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi referenced this pull request in koenkooi/linux Apr 2, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi referenced this pull request in koenkooi/linux Apr 9, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi referenced this pull request in koenkooi/linux Apr 11, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi referenced this pull request in koenkooi/linux Apr 12, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
psanford pushed a commit to retailnext/linux that referenced this pull request Apr 16, 2012
BugLink: http://bugs.launchpad.net/bugs/890952

commit a18a920 upstream.

This patch validates sdev pointer in scsi_dh_activate before proceeding further.

Without this check we might see the panic as below. I have seen this
panic multiple times..

Call trace:

 #0 [ffff88007d647b50] machine_kexec at ffffffff81020902
 #1 [ffff88007d647ba0] crash_kexec at ffffffff810875b0
 #2 [ffff88007d647c70] oops_end at ffffffff8139c650
 #3 [ffff88007d647c90] __bad_area_nosemaphore at ffffffff8102dd15
 #4 [ffff88007d647d50] page_fault at ffffffff8139b8cf
    [exception RIP: scsi_dh_activate+0x82]
    RIP: ffffffffa0041922  RSP: ffff88007d647e00  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 00000000000093c5
    RDX: 00000000000093c5  RSI: ffffffffa02e6640  RDI: ffff88007cc88988
    RBP: 000000000000000f   R8: ffff88007d646000   R9: 0000000000000000
    R10: ffff880082293790  R11: 00000000ffffffff  R12: ffff88007cc88988
    R13: 0000000000000000  R14: 0000000000000286  R15: ffff880037b845e0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #5 [ffff88007d647e38] run_workqueue at ffffffff81060268
 torvalds#6 [ffff88007d647e78] worker_thread at ffffffff81060386
 torvalds#7 [ffff88007d647ee8] kthread at ffffffff81064436
 torvalds#8 [ffff88007d647f48] kernel_thread at ffffffff81003fba

Signed-off-by: Babu Moger <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Tim Gardner <[email protected]>
psanford pushed a commit to retailnext/linux that referenced this pull request Apr 16, 2012
BugLink: http://bugs.launchpad.net/bugs/890952

commit f992ae8 upstream.

The following command sequence triggers an oops.

# mount /dev/sdb1 /mnt
# echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
# umount /mnt

 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:

 Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ torvalds#8 Bochs Bochs
 RIP: 0010:[<ffffffff810d0879>]  [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60
...
 Call Trace:
  [<ffffffff810d2845>] lock_acquire+0x95/0x140
  [<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50
  [<ffffffff811573bc>] bdi_lock_two+0x5c/0x70
  [<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0
  [<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0
  [<ffffffff811c4010>] __blkdev_put+0x160/0x1d0
  [<ffffffff811c40df>] blkdev_put+0x5f/0x190
  [<ffffffff8118f18d>] kill_block_super+0x4d/0x80
  [<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70
  [<ffffffff8119003a>] deactivate_super+0x4a/0x70
  [<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130
  [<ffffffff811acf2e>] sys_umount+0x7e/0x3a0
  [<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b

This is because bdev holds on to disk but disk doesn't pin the
associated queue.  If a SCSI device is removed while the device is
still open, the sdev puts the base reference to the queue on release.
When the bdev is finally released, the associated queue is already
gone along with the bdi and bdev_inode_switch_bdi() ends up
dereferencing already freed bdi.

Even if it were not for this bug, disk not holding onto the associated
queue is very unusual and error-prone.

Fix it by making add_disk() take an extra reference to its queue and
put it on disk_release() and ensuring that disk and its fops owner are
put in that order after all accesses to the disk and queue are
complete.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Tim Gardner <[email protected]>
psanford pushed a commit to retailnext/linux that referenced this pull request Apr 16, 2012
BugLink: http://bugs.launchpad.net/bugs/907778

commit ea51d13 upstream.

If the pte mapping in generic_perform_write() is unmapped between
iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the
"copied" parameter to ->end_write can be zero. ext4 couldn't cope with
it with delayed allocations enabled. This skips the i_disksize
enlargement logic if copied is zero and no new data was appeneded to
the inode.

 gdb> bt
 #0  0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\
 08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 #2  0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\
 ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440
 #3  generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\
 os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482
 #4  0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\
 xffff88001e26be40) at mm/filemap.c:2600
 #5  0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\
 zed out>, pos=<value optimized out>) at mm/filemap.c:2632
 torvalds#6  0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\
 t fs/ext4/file.c:136
 torvalds#7  0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \
 ppos=0xffff88001e26bf48) at fs/read_write.c:406
 torvalds#8  0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\
 000, pos=0xffff88001e26bf48) at fs/read_write.c:435
 torvalds#9  0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\
 4000) at fs/read_write.c:487
 torvalds#10 <signal handler called>
 torvalds#11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
 torvalds#12 0x0000000000000000 in ?? ()
 gdb> print offset
 $22 = 0xffffffffffffffff
 gdb> print idx
 $23 = 0xffffffff
 gdb> print inode->i_blkbits
 $24 = 0xc
 gdb> up
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 2512                    if (ext4_da_should_update_i_disksize(page, end)) {
 gdb> print start
 $25 = 0x0
 gdb> print end
 $26 = 0xffffffffffffffff
 gdb> print pos
 $27 = 0x108000
 gdb> print new_i_size
 $28 = 0x108000
 gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize
 $29 = 0xd9000
 gdb> down
 2467            for (i = 0; i < idx; i++)
 gdb> print i
 $30 = 0xd44acbee

This is 100% reproducible with some autonuma development code tuned in
a very aggressive manner (not normal way even for knumad) which does
"exotic" changes to the ptes. It wouldn't normally trigger but I don't
see why it can't happen normally if the page is added to swap cache in
between the two faults leading to "copied" being zero (which then
hangs in ext4). So it should be fixed. Especially possible with lumpy
reclaim (albeit disabled if compaction is enabled) as that would
ignore the young bits in the ptes.

Signed-off-by: Andrea Arcangeli <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Tim Gardner <[email protected]>
Signed-off-by: Brad Figg <[email protected]>
psanford pushed a commit to retailnext/linux that referenced this pull request Apr 16, 2012
…S block during isolation for migration

BugLink: http://bugs.launchpad.net/bugs/931719

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 torvalds#6 [d72d3cb4] isolate_migratepages at c030b15a
 torvalds#7 [d72d3d14] zone_watermark_ok at c02d26cb
 torvalds#8 [d72d3d2c] compact_zone at c030b8d
 torvalds#9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Tim Gardner <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Dec 29, 2025
[ Upstream commit 163e5f2 ]

When using perf record with the `--overwrite` option, a segmentation fault
occurs if an event fails to open. For example:

  perf record -e cycles-ct -F 1000 -a --overwrite
  Error:
  cycles-ct:H: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
  perf: Segmentation fault
      #0 0x6466b6 in dump_stack debug.c:366
      #1 0x646729 in sighandler_dump_stack debug.c:378
      #2 0x453fd1 in sigsegv_handler builtin-record.c:722
      #3 0x7f8454e65090 in __restore_rt libc-2.32.so[54090]
      #4 0x6c5671 in __perf_event__synthesize_id_index synthetic-events.c:1862
      #5 0x6c5ac0 in perf_event__synthesize_id_index synthetic-events.c:1943
      torvalds#6 0x458090 in record__synthesize builtin-record.c:2075
      torvalds#7 0x45a85a in __cmd_record builtin-record.c:2888
      torvalds#8 0x45deb6 in cmd_record builtin-record.c:4374
      torvalds#9 0x4e5e33 in run_builtin perf.c:349
      torvalds#10 0x4e60bf in handle_internal_command perf.c:401
      torvalds#11 0x4e6215 in run_argv perf.c:448
      torvalds#12 0x4e653a in main perf.c:555
      torvalds#13 0x7f8454e4fa72 in __libc_start_main libc-2.32.so[3ea72]
      torvalds#14 0x43a3ee in _start ??:0

The --overwrite option implies --tail-synthesize, which collects non-sample
events reflecting the system status when recording finishes. However, when
evsel opening fails (e.g., unsupported event 'cycles-ct'), session->evlist
is not initialized and remains NULL. The code unconditionally calls
record__synthesize() in the error path, which iterates through the NULL
evlist pointer and causes a segfault.

To fix it, move the record__synthesize() call inside the error check block, so
it's only called when there was no error during recording, ensuring that evlist
is properly initialized.

Fixes: 4ea648a ("perf record: Add --tail-synthesize option")
Signed-off-by: Shuai Xue <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Dec 30, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.

Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.

With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.

| Metric     | W/O this patch | With this patch | Change    |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
| usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
| ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |

Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy().  Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].

Before:
<syscall_get_arguments.constprop.0>:
       aa0103e2        mov     x2, x1
       91002003        add     x3, x0, #0x8
       f9408804        ldr     x4, [x0, torvalds#272]
       f8008444        str     x4, [x2], torvalds#8
       a9409404        ldp     x4, x5, [x0, torvalds#8]
       a9009424        stp     x4, x5, [x1, torvalds#8]
       a9418400        ldp     x0, x1, [x0, torvalds#24]
       a9010440        stp     x0, x1, [x2, torvalds#16]
       f9401060        ldr     x0, [x3, torvalds#32]
       f9001040        str     x0, [x2, torvalds#32]
       d65f03c0        ret
       d503201f        nop

After:
       a9408e82        ldp     x2, x3, [x20, torvalds#8]
       2a1603e0        mov     w0, w22
       f9400e84        ldr     x4, [x20, torvalds#24]
       f9408a81        ldr     x1, [x20, torvalds#272]
       9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>

This also aligns the implementation with x86 and RISC-V.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Charlie Jenkins <[email protected]>
Cc: Christian Zankel <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Dec 30, 2025
Patch series "kallsyms: Prevent invalid access when showing module
buildid", v3.

We have seen nested crashes in __sprint_symbol(), see below.  They seem to
be caused by an invalid pointer to "buildid".  This patchset cleans up
kallsyms code related to module buildid and fixes this invalid access when
printing backtraces.

I made an audit of __sprint_symbol() and found several situations
when the buildid might be wrong:

  + bpf_address_lookup() does not set @modbuildid

  + ftrace_mod_address_lookup() does not set @modbuildid

  + __sprint_symbol() does not take rcu_read_lock and
    the related struct module might get removed before
    mod->build_id is printed.

This patchset solves these problems:

  + 1st, 2nd patches are preparatory
  + 3rd, 4th, 6th patches fix the above problems
  + 5th patch cleans up a suspicious initialization code.

This is the backtrace, we have seen. But it is not really important.
The problems fixed by the patchset are obvious:

  crash64> bt [62/2029]
  PID: 136151 TASK: ffff9f6c981d4000 CPU: 367 COMMAND: "btrfs"
  #0 [ffffbdb687635c28] machine_kexec at ffffffffb4c845b3
  #1 [ffffbdb687635c80] __crash_kexec at ffffffffb4d86a6a
  #2 [ffffbdb687635d08] hex_string at ffffffffb51b3b61
  #3 [ffffbdb687635d40] crash_kexec at ffffffffb4d87964
  #4 [ffffbdb687635d50] oops_end at ffffffffb4c41fc8
  #5 [ffffbdb687635d70] do_trap at ffffffffb4c3e49a
  torvalds#6 [ffffbdb687635db8] do_error_trap at ffffffffb4c3e6a4
  torvalds#7 [ffffbdb687635df8] exc_stack_segment at ffffffffb5666b33
  torvalds#8 [ffffbdb687635e20] asm_exc_stack_segment at ffffffffb5800cf9
  ...


This patch (of 7)

The function kallsyms_lookup_buildid() initializes the given @namebuf by
clearing the first and the last byte.  It is not clear why.

The 1st byte makes sense because some callers ignore the return code and
expect that the buffer contains a valid string, for example:

  - function_stat_show()
    - kallsyms_lookup()
      - kallsyms_lookup_buildid()

The initialization of the last byte does not make much sense because it
can later be overwritten.  Fortunately, it seems that all called functions
behave correctly:

  -  kallsyms_expand_symbol() explicitly adds the trailing '\0'
     at the end of the function.

  - All *__address_lookup() functions either use the safe strscpy()
    or they do not touch the buffer at all.

Document the reason for clearing the first byte.  And remove the useless
initialization of the last byte.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Reviewed-by: Aaron Tomlin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkman <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Luis Chamberalin <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: "Masami Hiramatsu (Google)" <[email protected]>
Cc: Petr Pavlu <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Daniel Gomez <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Dec 30, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.

Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.

With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.

| Metric     | W/O this patch | With this patch | Change    |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
| usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
| ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |

Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy().  Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].

Before:
<syscall_get_arguments.constprop.0>:
       aa0103e2        mov     x2, x1
       91002003        add     x3, x0, #0x8
       f9408804        ldr     x4, [x0, torvalds#272]
       f8008444        str     x4, [x2], torvalds#8
       a9409404        ldp     x4, x5, [x0, torvalds#8]
       a9009424        stp     x4, x5, [x1, torvalds#8]
       a9418400        ldp     x0, x1, [x0, torvalds#24]
       a9010440        stp     x0, x1, [x2, torvalds#16]
       f9401060        ldr     x0, [x3, torvalds#32]
       f9001040        str     x0, [x2, torvalds#32]
       d65f03c0        ret
       d503201f        nop

After:
       a9408e82        ldp     x2, x3, [x20, torvalds#8]
       2a1603e0        mov     w0, w22
       f9400e84        ldr     x4, [x20, torvalds#24]
       f9408a81        ldr     x1, [x20, torvalds#272]
       9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>

This also aligns the implementation with x86 and RISC-V.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Charlie Jenkins <[email protected]>
Cc: Christian Zankel <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Dec 30, 2025
Patch series "kallsyms: Prevent invalid access when showing module
buildid", v3.

We have seen nested crashes in __sprint_symbol(), see below.  They seem to
be caused by an invalid pointer to "buildid".  This patchset cleans up
kallsyms code related to module buildid and fixes this invalid access when
printing backtraces.

I made an audit of __sprint_symbol() and found several situations
when the buildid might be wrong:

  + bpf_address_lookup() does not set @modbuildid

  + ftrace_mod_address_lookup() does not set @modbuildid

  + __sprint_symbol() does not take rcu_read_lock and
    the related struct module might get removed before
    mod->build_id is printed.

This patchset solves these problems:

  + 1st, 2nd patches are preparatory
  + 3rd, 4th, 6th patches fix the above problems
  + 5th patch cleans up a suspicious initialization code.

This is the backtrace, we have seen. But it is not really important.
The problems fixed by the patchset are obvious:

  crash64> bt [62/2029]
  PID: 136151 TASK: ffff9f6c981d4000 CPU: 367 COMMAND: "btrfs"
  #0 [ffffbdb687635c28] machine_kexec at ffffffffb4c845b3
  #1 [ffffbdb687635c80] __crash_kexec at ffffffffb4d86a6a
  #2 [ffffbdb687635d08] hex_string at ffffffffb51b3b61
  #3 [ffffbdb687635d40] crash_kexec at ffffffffb4d87964
  #4 [ffffbdb687635d50] oops_end at ffffffffb4c41fc8
  #5 [ffffbdb687635d70] do_trap at ffffffffb4c3e49a
  torvalds#6 [ffffbdb687635db8] do_error_trap at ffffffffb4c3e6a4
  torvalds#7 [ffffbdb687635df8] exc_stack_segment at ffffffffb5666b33
  torvalds#8 [ffffbdb687635e20] asm_exc_stack_segment at ffffffffb5800cf9
  ...


This patch (of 7)

The function kallsyms_lookup_buildid() initializes the given @namebuf by
clearing the first and the last byte.  It is not clear why.

The 1st byte makes sense because some callers ignore the return code and
expect that the buffer contains a valid string, for example:

  - function_stat_show()
    - kallsyms_lookup()
      - kallsyms_lookup_buildid()

The initialization of the last byte does not make much sense because it
can later be overwritten.  Fortunately, it seems that all called functions
behave correctly:

  -  kallsyms_expand_symbol() explicitly adds the trailing '\0'
     at the end of the function.

  - All *__address_lookup() functions either use the safe strscpy()
    or they do not touch the buffer at all.

Document the reason for clearing the first byte.  And remove the useless
initialization of the last byte.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Reviewed-by: Aaron Tomlin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkman <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Luis Chamberalin <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: "Masami Hiramatsu (Google)" <[email protected]>
Cc: Petr Pavlu <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Daniel Gomez <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Jan 1, 2026
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.

Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.

With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.

| Metric     | W/O this patch | With this patch | Change    |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
| usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
| ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |

Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy().  Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].

Before:
<syscall_get_arguments.constprop.0>:
       aa0103e2        mov     x2, x1
       91002003        add     x3, x0, #0x8
       f9408804        ldr     x4, [x0, torvalds#272]
       f8008444        str     x4, [x2], torvalds#8
       a9409404        ldp     x4, x5, [x0, torvalds#8]
       a9009424        stp     x4, x5, [x1, torvalds#8]
       a9418400        ldp     x0, x1, [x0, torvalds#24]
       a9010440        stp     x0, x1, [x2, torvalds#16]
       f9401060        ldr     x0, [x3, torvalds#32]
       f9001040        str     x0, [x2, torvalds#32]
       d65f03c0        ret
       d503201f        nop

After:
       a9408e82        ldp     x2, x3, [x20, torvalds#8]
       2a1603e0        mov     w0, w22
       f9400e84        ldr     x4, [x20, torvalds#24]
       f9408a81        ldr     x1, [x20, torvalds#272]
       9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>

This also aligns the implementation with x86 and RISC-V.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Charlie Jenkins <[email protected]>
Cc: Christian Zankel <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Jan 1, 2026
Patch series "kallsyms: Prevent invalid access when showing module
buildid", v3.

We have seen nested crashes in __sprint_symbol(), see below.  They seem to
be caused by an invalid pointer to "buildid".  This patchset cleans up
kallsyms code related to module buildid and fixes this invalid access when
printing backtraces.

I made an audit of __sprint_symbol() and found several situations
when the buildid might be wrong:

  + bpf_address_lookup() does not set @modbuildid

  + ftrace_mod_address_lookup() does not set @modbuildid

  + __sprint_symbol() does not take rcu_read_lock and
    the related struct module might get removed before
    mod->build_id is printed.

This patchset solves these problems:

  + 1st, 2nd patches are preparatory
  + 3rd, 4th, 6th patches fix the above problems
  + 5th patch cleans up a suspicious initialization code.

This is the backtrace, we have seen. But it is not really important.
The problems fixed by the patchset are obvious:

  crash64> bt [62/2029]
  PID: 136151 TASK: ffff9f6c981d4000 CPU: 367 COMMAND: "btrfs"
  #0 [ffffbdb687635c28] machine_kexec at ffffffffb4c845b3
  #1 [ffffbdb687635c80] __crash_kexec at ffffffffb4d86a6a
  #2 [ffffbdb687635d08] hex_string at ffffffffb51b3b61
  #3 [ffffbdb687635d40] crash_kexec at ffffffffb4d87964
  #4 [ffffbdb687635d50] oops_end at ffffffffb4c41fc8
  #5 [ffffbdb687635d70] do_trap at ffffffffb4c3e49a
  torvalds#6 [ffffbdb687635db8] do_error_trap at ffffffffb4c3e6a4
  torvalds#7 [ffffbdb687635df8] exc_stack_segment at ffffffffb5666b33
  torvalds#8 [ffffbdb687635e20] asm_exc_stack_segment at ffffffffb5800cf9
  ...


This patch (of 7)

The function kallsyms_lookup_buildid() initializes the given @namebuf by
clearing the first and the last byte.  It is not clear why.

The 1st byte makes sense because some callers ignore the return code and
expect that the buffer contains a valid string, for example:

  - function_stat_show()
    - kallsyms_lookup()
      - kallsyms_lookup_buildid()

The initialization of the last byte does not make much sense because it
can later be overwritten.  Fortunately, it seems that all called functions
behave correctly:

  -  kallsyms_expand_symbol() explicitly adds the trailing '\0'
     at the end of the function.

  - All *__address_lookup() functions either use the safe strscpy()
    or they do not touch the buffer at all.

Document the reason for clearing the first byte.  And remove the useless
initialization of the last byte.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Reviewed-by: Aaron Tomlin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkman <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Luis Chamberalin <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: "Masami Hiramatsu (Google)" <[email protected]>
Cc: Petr Pavlu <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Daniel Gomez <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Jan 2, 2026
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.

Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.

With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.

| Metric     | W/O this patch | With this patch | Change    |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
| usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
| ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |

Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy().  Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].

Before:
<syscall_get_arguments.constprop.0>:
       aa0103e2        mov     x2, x1
       91002003        add     x3, x0, #0x8
       f9408804        ldr     x4, [x0, torvalds#272]
       f8008444        str     x4, [x2], torvalds#8
       a9409404        ldp     x4, x5, [x0, torvalds#8]
       a9009424        stp     x4, x5, [x1, torvalds#8]
       a9418400        ldp     x0, x1, [x0, torvalds#24]
       a9010440        stp     x0, x1, [x2, torvalds#16]
       f9401060        ldr     x0, [x3, torvalds#32]
       f9001040        str     x0, [x2, torvalds#32]
       d65f03c0        ret
       d503201f        nop

After:
       a9408e82        ldp     x2, x3, [x20, torvalds#8]
       2a1603e0        mov     w0, w22
       f9400e84        ldr     x4, [x20, torvalds#24]
       f9408a81        ldr     x1, [x20, torvalds#272]
       9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>

This also aligns the implementation with x86 and RISC-V.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Charlie Jenkins <[email protected]>
Cc: Christian Zankel <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Jan 2, 2026
Patch series "kallsyms: Prevent invalid access when showing module
buildid", v3.

We have seen nested crashes in __sprint_symbol(), see below.  They seem to
be caused by an invalid pointer to "buildid".  This patchset cleans up
kallsyms code related to module buildid and fixes this invalid access when
printing backtraces.

I made an audit of __sprint_symbol() and found several situations
when the buildid might be wrong:

  + bpf_address_lookup() does not set @modbuildid

  + ftrace_mod_address_lookup() does not set @modbuildid

  + __sprint_symbol() does not take rcu_read_lock and
    the related struct module might get removed before
    mod->build_id is printed.

This patchset solves these problems:

  + 1st, 2nd patches are preparatory
  + 3rd, 4th, 6th patches fix the above problems
  + 5th patch cleans up a suspicious initialization code.

This is the backtrace, we have seen. But it is not really important.
The problems fixed by the patchset are obvious:

  crash64> bt [62/2029]
  PID: 136151 TASK: ffff9f6c981d4000 CPU: 367 COMMAND: "btrfs"
  #0 [ffffbdb687635c28] machine_kexec at ffffffffb4c845b3
  #1 [ffffbdb687635c80] __crash_kexec at ffffffffb4d86a6a
  #2 [ffffbdb687635d08] hex_string at ffffffffb51b3b61
  #3 [ffffbdb687635d40] crash_kexec at ffffffffb4d87964
  #4 [ffffbdb687635d50] oops_end at ffffffffb4c41fc8
  #5 [ffffbdb687635d70] do_trap at ffffffffb4c3e49a
  torvalds#6 [ffffbdb687635db8] do_error_trap at ffffffffb4c3e6a4
  torvalds#7 [ffffbdb687635df8] exc_stack_segment at ffffffffb5666b33
  torvalds#8 [ffffbdb687635e20] asm_exc_stack_segment at ffffffffb5800cf9
  ...


This patch (of 7)

The function kallsyms_lookup_buildid() initializes the given @namebuf by
clearing the first and the last byte.  It is not clear why.

The 1st byte makes sense because some callers ignore the return code and
expect that the buffer contains a valid string, for example:

  - function_stat_show()
    - kallsyms_lookup()
      - kallsyms_lookup_buildid()

The initialization of the last byte does not make much sense because it
can later be overwritten.  Fortunately, it seems that all called functions
behave correctly:

  -  kallsyms_expand_symbol() explicitly adds the trailing '\0'
     at the end of the function.

  - All *__address_lookup() functions either use the safe strscpy()
    or they do not touch the buffer at all.

Document the reason for clearing the first byte.  And remove the useless
initialization of the last byte.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Reviewed-by: Aaron Tomlin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkman <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Luis Chamberalin <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: "Masami Hiramatsu (Google)" <[email protected]>
Cc: Petr Pavlu <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Daniel Gomez <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Jan 4, 2026
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.

Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.

With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.

| Metric     | W/O this patch | With this patch | Change    |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
| usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
| ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |

Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy().  Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].

Before:
<syscall_get_arguments.constprop.0>:
       aa0103e2        mov     x2, x1
       91002003        add     x3, x0, #0x8
       f9408804        ldr     x4, [x0, torvalds#272]
       f8008444        str     x4, [x2], torvalds#8
       a9409404        ldp     x4, x5, [x0, torvalds#8]
       a9009424        stp     x4, x5, [x1, torvalds#8]
       a9418400        ldp     x0, x1, [x0, torvalds#24]
       a9010440        stp     x0, x1, [x2, torvalds#16]
       f9401060        ldr     x0, [x3, torvalds#32]
       f9001040        str     x0, [x2, torvalds#32]
       d65f03c0        ret
       d503201f        nop

After:
       a9408e82        ldp     x2, x3, [x20, torvalds#8]
       2a1603e0        mov     w0, w22
       f9400e84        ldr     x4, [x20, torvalds#24]
       f9408a81        ldr     x1, [x20, torvalds#272]
       9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>

This also aligns the implementation with x86 and RISC-V.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Charlie Jenkins <[email protected]>
Cc: Christian Zankel <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Jan 4, 2026
Patch series "kallsyms: Prevent invalid access when showing module
buildid", v3.

We have seen nested crashes in __sprint_symbol(), see below.  They seem to
be caused by an invalid pointer to "buildid".  This patchset cleans up
kallsyms code related to module buildid and fixes this invalid access when
printing backtraces.

I made an audit of __sprint_symbol() and found several situations
when the buildid might be wrong:

  + bpf_address_lookup() does not set @modbuildid

  + ftrace_mod_address_lookup() does not set @modbuildid

  + __sprint_symbol() does not take rcu_read_lock and
    the related struct module might get removed before
    mod->build_id is printed.

This patchset solves these problems:

  + 1st, 2nd patches are preparatory
  + 3rd, 4th, 6th patches fix the above problems
  + 5th patch cleans up a suspicious initialization code.

This is the backtrace, we have seen. But it is not really important.
The problems fixed by the patchset are obvious:

  crash64> bt [62/2029]
  PID: 136151 TASK: ffff9f6c981d4000 CPU: 367 COMMAND: "btrfs"
  #0 [ffffbdb687635c28] machine_kexec at ffffffffb4c845b3
  #1 [ffffbdb687635c80] __crash_kexec at ffffffffb4d86a6a
  #2 [ffffbdb687635d08] hex_string at ffffffffb51b3b61
  #3 [ffffbdb687635d40] crash_kexec at ffffffffb4d87964
  #4 [ffffbdb687635d50] oops_end at ffffffffb4c41fc8
  #5 [ffffbdb687635d70] do_trap at ffffffffb4c3e49a
  torvalds#6 [ffffbdb687635db8] do_error_trap at ffffffffb4c3e6a4
  torvalds#7 [ffffbdb687635df8] exc_stack_segment at ffffffffb5666b33
  torvalds#8 [ffffbdb687635e20] asm_exc_stack_segment at ffffffffb5800cf9
  ...


This patch (of 7)

The function kallsyms_lookup_buildid() initializes the given @namebuf by
clearing the first and the last byte.  It is not clear why.

The 1st byte makes sense because some callers ignore the return code and
expect that the buffer contains a valid string, for example:

  - function_stat_show()
    - kallsyms_lookup()
      - kallsyms_lookup_buildid()

The initialization of the last byte does not make much sense because it
can later be overwritten.  Fortunately, it seems that all called functions
behave correctly:

  -  kallsyms_expand_symbol() explicitly adds the trailing '\0'
     at the end of the function.

  - All *__address_lookup() functions either use the safe strscpy()
    or they do not touch the buffer at all.

Document the reason for clearing the first byte.  And remove the useless
initialization of the last byte.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Reviewed-by: Aaron Tomlin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkman <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Luis Chamberalin <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: "Masami Hiramatsu (Google)" <[email protected]>
Cc: Petr Pavlu <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Daniel Gomez <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Jan 5, 2026
[ Upstream commit 163e5f2 ]

When using perf record with the `--overwrite` option, a segmentation fault
occurs if an event fails to open. For example:

  perf record -e cycles-ct -F 1000 -a --overwrite
  Error:
  cycles-ct:H: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
  perf: Segmentation fault
      #0 0x6466b6 in dump_stack debug.c:366
      #1 0x646729 in sighandler_dump_stack debug.c:378
      #2 0x453fd1 in sigsegv_handler builtin-record.c:722
      #3 0x7f8454e65090 in __restore_rt libc-2.32.so[54090]
      #4 0x6c5671 in __perf_event__synthesize_id_index synthetic-events.c:1862
      #5 0x6c5ac0 in perf_event__synthesize_id_index synthetic-events.c:1943
      torvalds#6 0x458090 in record__synthesize builtin-record.c:2075
      torvalds#7 0x45a85a in __cmd_record builtin-record.c:2888
      torvalds#8 0x45deb6 in cmd_record builtin-record.c:4374
      torvalds#9 0x4e5e33 in run_builtin perf.c:349
      torvalds#10 0x4e60bf in handle_internal_command perf.c:401
      torvalds#11 0x4e6215 in run_argv perf.c:448
      torvalds#12 0x4e653a in main perf.c:555
      torvalds#13 0x7f8454e4fa72 in __libc_start_main libc-2.32.so[3ea72]
      torvalds#14 0x43a3ee in _start ??:0

The --overwrite option implies --tail-synthesize, which collects non-sample
events reflecting the system status when recording finishes. However, when
evsel opening fails (e.g., unsupported event 'cycles-ct'), session->evlist
is not initialized and remains NULL. The code unconditionally calls
record__synthesize() in the error path, which iterates through the NULL
evlist pointer and causes a segfault.

To fix it, move the record__synthesize() call inside the error check block, so
it's only called when there was no error during recording, ensuring that evlist
is properly initialized.

Fixes: 4ea648a ("perf record: Add --tail-synthesize option")
Signed-off-by: Shuai Xue <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Jan 5, 2026
Add a driver for the Unicam camera receiver block on BCM283x processors.
Compared to the bcm2835-camera driver present in staging, this driver
handles the Unicam block only (CSI-2 receiver), and doesn't depend on
the VC4 firmware running on the VPU.

The commit is made up of a series of changes cherry-picked from the
rpi-5.4.y branch of https://github.com/raspberrypi/linux/ with
additional enhancements, forward-ported to the mainline kernel.

Signed-off-by: Dave Stevenson <[email protected]>
Signed-off-by: Naushir Patuck <[email protected]>
Signed-off-by: Laurent Pinchart <[email protected]>
Reported-by: kbuild test robot <[email protected]>

media: bcm2835-unicam: Add support for get_mbus_config to set num lanes

Use the get_mbus_config pad subdev call to allow a source to use
fewer than the number of CSI2 lanes defined in device tree.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Avoid gcc warning over {0} on endpoint

Older gcc versions object to = { 0 } initialisation if the first
elemtn in the structure is a substructure.

Use = { } to avoid this compiler warning.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Reinstate V4L2_CAP_READWRITE in the caps

v4l2-compliance throws a failure if the device doesn't advertise
V4L2_CAP_READWRITE but allows read or write operations.
We do support read, so reinstate the flag.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Ensure type is VIDEO_CAPTURE in [g|s]_selection

[g|s]_selection pass in a buffer type that needs to be validated
before passing on to the sensor subdev.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835: unicam: Set VPU min clock freq to 250Mhz.

When streaming with Unicam, the VPU must have a clock frequency of at
least 250Mhz.  Otherwise, the input fifos could overrun, causing
image corruption.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Drop WARN on uing direct cache alias

Pi 0&1 pass all ARM accesses through the VPU L2 cache, therefore
the dma-ranges property sets the cache alias bits to other
than the direct alias, hence this WARN was firing.

It was overprotective coding, so assume that everything is OK
with the dma-ranges, and remove the WARN.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Always service interrupts

From when bringing up the driver, there was a check in the isr
to ignore interrupts (claiming them handled) should the driver
not be streaming.

The VPU now will not register a camera driver if it finds a
CSI2 node enabled in device tree, therefore this flawed check is
redundant.

raspberrypi/linux#3602

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835: unicam: Fix uninitialized warning

Signed-off-by: Jacko Dirks <[email protected]>

media: bcm2835-unicam: Fixup review comments from Hans.

Updates the driver based on the upstream review comments from
Hans Verkuil at https://patchwork.linuxtv.org/patch/63531/

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Retain packing information on G_FMT

The change to retrieve the pixel format always on g_fmt didn't
check whether the native or unpacked version of the format
had been requested, and always returned the packed one.
Correct this so that the packing setting is retained whereever
possible.

Fixes "9d59e89 media: bcm2835-unicam: Re-fetch mbus code from subdev
on a g_fmt call"

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: change minimum number of vb2_queue buffers to 1

Since the unicam driver was modified to write to a dummy buffer when no
user-supplied buffer is available, it can now write to and return a
buffer even when there's only a single one. Enable this by changing the
min_buffers_needed in the vb2_queue; it will be useful for enabling
still captures without allocating more memory than absolutely necessary.

Signed-off-by: David Plowman <[email protected]>

staging: vc04_services: ISP: Add a more complex ISP processing component

Driver for the BCM2835 ISP hardware block.  This driver uses the MMAL
component to program the ISP hardware through the VC firmware.

The ISP component can produce two video stream outputs, and Bayer
image statistics. This can't be encompassed in a simple V4L2
M2M device, so create a new device that registers 4 video nodes.

This patch squashes all the development patches from the earlier
rpi-5.4.y branch into one

Signed-off-by: Naushir Patuck <[email protected]>

staging/bcm2835-isp: Add the unpacked (16bpp) raw formats

Now that the firmware supports the unpacked (16bpp) variants
of the MIPI raw formats, add the mappings.

Signed-off-by: Dave Stevenson <[email protected]>

staging/bcm2835-isp: Log the number of excess supported formats

When logging that the firmware has provided more supported formats
than we had allocated storage for, log the number allocated and
returned.

Signed-off-by: Dave Stevenson <[email protected]>

staging: vc04_services: ISP: Add colour denoise control

Add colour denoise control to the bcm2835 driver through a new v4l2
control: V4L2_CID_USER_BCM2835_ISP_CDN.

Add the accompanying MMAL configuration structure definitions as well.

Signed-off-by: Naushir Patuck <[email protected]>

bcm2835-isp: Allow formats with different colour spaces.

Each supported format now includes a mask showing the allowed colour
spaces, as well as a default colour space for when one was not
specified.

Additionally we translate the colour space to mmal format and pass it
over to the VideoCore.

Signed-off-by: David Plowman <[email protected]>

media: i2c: add ov9281 driver.

Change-Id: I7b77250bbc56d2f861450cf77271ad15f9b88ab1
Signed-off-by: Zefa Chen <[email protected]>

media: i2c: ov9281: fix mclk issue when probe multiple camera.

Takes the ov9281 part only from the Rockchip's patch.

Change-Id: I30e833baf2c1bb07d6d87ddb3b00759ab45a90e4
Signed-off-by: Zefa Chen <[email protected]>

media: i2c: ov9281: add enum_frame_interval function for iq tool 2.2 and hal3

Adds the ov9281 parts of the Rockchip patch adding enum_frame_interval to
a large number of drivers.

Change-Id: I03344cd6cf278dd7c18fce8e97479089ef185a5c
Signed-off-by: Zefa Chen <[email protected]>

media: i2c: ov9281: Fixup for recent kernel releases, and remove custom code

The Rockchip driver was based on a 4.4 kernel, and had several custom
Rockchip parts.

Update to 5.4 kernel APIs, with the relevant controls required by
libcamera, and remove custom Rockchip parts.

Signed-off-by: Dave Stevenson <[email protected]>

media: i2c: ov9281: Read chip ID via 2 reads

Vision Components have made an OV9281 module which blocks reading
back the majority of registers to comply with NDAs, and in doing
so doesn't allow auto-increment register reading as used when
reading the chip ID.

Use two reads and manually combine the results.

Signed-off-by: Dave Stevenson <[email protected]>

media: i2c: ov9281: Add support for 8 bit readout

The sensor supports 8 bit mode as well as 10bit, so add the
relevant code to allow selection of this.

Signed-off-by: Dave Stevenson <[email protected]>

media: ov9281: Add 1280x720 and 640x480 modes

Breaks out common register set and adds the different registers
for 1280x720 (cropped) and 640x480 (skipped) modes

Signed-off-by: Dave Stevenson <[email protected]>

Fixed picture line bug in all ov9281 modes

Signed-off-by: Mathias Anhalt <[email protected]>

Added hflip and vflip controls to ov9281

Signed-off-by: Mathias Anhalt <[email protected]>

media: i2c: ov9281: Remove override of subdev name

From the original Rockchip driver, the subdev was renamed
from the default to being "mov9281 <dev_name>" whereas the
default would have been "ov9281 <dev_name>".

Remove the override to drop back to the default rather than
a vendor custom string.

Signed-off-by: Dave Stevenson <[email protected]>

media: v4l2-subdev: add subdev-wide state struct

Signed-off-by: Dom Cobley <[email protected]>

media: i2c: ov9281: Add fwnode properties controls

Add call to v4l2_ctrl_new_fwnode_properties to read and
create the fwnode based controls.

Signed-off-by: Dave Stevenson <[email protected]>

media: i2c: ov9281: Sensor should report RAW color space

Tested on Raspberry Pi running libcamera.

Signed-off-by: David Plowman <[email protected]>

Partial revert "media: i2c: add ov9281 driver."

This partially reverts commit 84e98e3a4f3eecb168ceb80231c3e8252929892e.

The commit had merged some changes to other drivers with adding the ov9281
driver. Only the ov9281 parts have been reverted.

staging/bcm2835-isp: Fix compiler warning

The result of dividing a u32 by a size_t is an unsigned int on arm32
and a long unsigned int on arm64. Use "%zu" (the size_t format) to
remove the build warning for 64-bit builds.

Signed-off-by: Phil Elwell <[email protected]>

staging: vc04_services: isp: Set the YUV420/YVU420 format stride to 64 bytes

The bcm2835 ISP requires the base address of all input/output planes to have 32
byte alignment. Using a Y stride of 32 bytes would not guarantee that the V
plane would fulfil this, e.g. a height of 650 lines would mean the V plane
buffer is not 32 byte aligned for YUV420 formats.

Having a Y stride of 64 bytes would ensure both U and V planes have a 32 byte
alignment, as the luma height will always be an even number of lines.

Signed-off-by: Naushir Patuck <[email protected]>

vc04_services: isp: Report input node as wanting full range RAW color space

RAW color spaces are more usually reported as having full range
quantization.

Tested using libcamera.

Signed-off-by: David Plowman <[email protected]>

drivers: bcm2835_isp: Allow multiple users for the ISP driver.

Add a second (identical) set of device nodes to allow concurrent use of the ISP
hardware by another user. This change effectively creates a second state
structure (struct bcm2835_isp_dev) to maintain independent state for the second
user. Node and media entity names are appened with the instance index
appropriately.

Further users can be added by changing the BCM2835_ISP_NUM_INSTANCES define.

Signed-off-by: Naushir Patuck <[email protected]>

drivers: bcm2835_isp: Fix div by 0 bug.

Fix a possible division by 0 bug when setting up the mmal port for the stats
port.

Signed-off-by: Naushir Patuck <[email protected]>

staging/bcm2835-isp: Fix cleanup after init fail

bcm2835_isp_remove is called on an initialisation failure, but at that
point the drvdata hasn't been set. This causes a crash when e.g. using
the cutdown firmware (gpu_mem=16).

Move platform_set_drvdata before the instance probing loop to avoid the
problem.

See: raspberrypi/linux#4774

Signed-off-by: Phil Elwell <[email protected]>

bcm2835-v4l2-isp: Add missing lock initialization

ISP device allocation is dynamic hence the locks too.
struct mutex queue_lock is not initialized which result in bug.

Fixing same by initializing it.

[   29.847138] INFO: trying to register non-static key.
[   29.847156] The code is fine but needs lockdep annotation, or maybe
[   29.847159] you didn't initialize this object before use?
[   29.847161] turning off the locking correctness validator.
[   29.847167] CPU: 1 PID: 343 Comm: v4l_id Tainted: G         C        5.15.11-rt24-v8+ torvalds#8
[   29.847187] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[   29.847194] Call trace:
[   29.847197]  dump_backtrace+0x0/0x1b8
[   29.847227]  show_stack+0x20/0x30
[   29.847240]  dump_stack_lvl+0x8c/0xb8
[   29.847254]  dump_stack+0x18/0x34
[   29.847263]  register_lock_class+0x494/0x4a0
[   29.847278]  __lock_acquire+0x80/0x1680
[   29.847289]  lock_acquire+0x214/0x3a0
[   29.847300]  mutex_lock_nested+0x70/0xc8
[   29.847312]  _vb2_fop_release+0x3c/0xa8 [videobuf2_v4l2]
[   29.847346]  vb2_fop_release+0x34/0x60 [videobuf2_v4l2]
[   29.847367]  v4l2_release+0xc8/0x108 [videodev]
[   29.847453]  __fput+0x8c/0x258
[   29.847476]  ____fput+0x18/0x28
[   29.847487]  task_work_run+0x98/0x180
[   29.847502]  do_notify_resume+0x228/0x3f8
[   29.847515]  el0_svc+0xec/0xf0
[   29.847523]  el0t_64_sync_handler+0x90/0xb8
[   29.847531]  el0t_64_sync+0x180/0x184

Signed-off-by: Padmanabha Srinivasaiah <[email protected]>

staging: vc04_services: isp: Permit all sRGB colour spaces on ISP outputs

ISP outputs actually support all colour spaces that are fundamentally
sRGB underneath, regardless of whether an RGB or YUV output format is
actually requested.

Signed-off-by: David Plowman <[email protected]>

drivers: staging: bcm2835-isp: Do not cleanup mmal vcsm buffer on stop_streaming

On stop_streaming() the vcsm buffer handle gets released by the buffer cleanup
code.  This will subsequently cause and error if userland re-queues the same
buffer on the next start_streaming() call.

Remove this cleanup code and rely on the vb2_ops->buf_cleanup() call to do the
cleanups instead.

Signed-off-by: Naushir Patuck <[email protected]>

drivers: staging: bcm2835-isp: Clear LS table handle in the firmware

When all nodes have stopped streaming, ensure the firmware has released its
handle on the LS table dmabuf. This is done by passing a null handle in the
LS params.

Signed-off-by: Naushir Patuck <[email protected]>

drivers: staging: bcm2835-isp: Respect caller's stride value

The stride value reported for output image buffers should be at least
as large as any value that was passed in by the caller (subject to
correct alignment for the pixel format). If the value is zero (meaning
no value was passed), or is too small, the minimum acceptable value
will be substituted.

Signed-off-by: David Plowman <[email protected]>

staging: vc04_services: bcm2835-isp: Drop include Makefile directive

Drop the include directive. They can break the build, when one only
wants to build a subdirectory. Replace with "../" for the includes in
the bcm2835-isp instead.

The fix is equivalent to the four patches between 29d49a7
("staging: vc04_services: bcm2835-audio: Drop include Makefile
directive")...2529ca2 ("staging: vc04_services: interface: Drop
include Makefile directive")

Fixes: c8f89c9551c1 ("staging: vc04_services: ISP: Add a more complex ISP processing component")
Suggested-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Kieran Bingham <[email protected]>

staging: vc04_services: bcm2835-v4l2-isp: Register with vchiq_bus_type

Register the bcm2835-v4l2-isp driver with the vchiq_bus_type instead of
using the platform driver/device.

Signed-off-by: Kieran Bingham <[email protected]>

staging: vc04_services: bcm2835-v4l2-isp: Explicitly set DMA mask

The platform model originally handled the DMA mask. Now that
we are on the vchiq_bus we need to explicitly set this.

Signed-off-by: Kieran Bingham <[email protected]>

drivers: media: bcm2835_isp: Cache LS table dmabuf

Clients such as libcamera do not change the LS table dmabuf on every
frame. In such cases instead of mapping/remapping the same dmabuf on
every frame to send to the firmware, cache the dmabuf once and only
update and remap if the dmabuf has been changed by the userland client.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Correctly handle error propagation for stream on

On a failure in start_streaming(), the error code would not propagate to
the calling function on all conditions. This would cause the userland
caller to not know of the failure.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Return early from stop_streaming() if stopped

clk_disable_unprepare() is called unconditionally in stop_streaming().
This is incorrect in the cases where start_streaming() fails, and
unprepares all clocks as part of the failure cleanup. To avoid this,
ensure that clk_disable_unprepare() is only called in stop_streaming()
if the clocks are in a prepared state.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Clear clock state when stopping streaming

Commit 65e08c465020d4c5b51afb452efc2246d80fd66f failed to clear the
clock state when the device stopped streaming. Fix this, as it might
again cause the same problems when doing an unprepare.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Fix bug in buffer swapping logic

If multiple sets of interrupts occur simultaneously, it may be unsafe
to swap buffers, as the hardware may already be re-using the current
buffers. In such cases, avoid swapping buffers, and wait for the next
opportunity at the Frame End interrupt to signal completion.

Additionally, check the packet compare status when watching for frame
end for buffers swaps, as this could also signify a frame end event.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Forward input status from subdevice

The vidioc_enum_input() v4l2 ioctl is capable of returning
sensor/input status as well. This is used in current
GStreamer HEAD for signal detection [1].

bcm2835-unicam does handle this syscall, but it didn't ask
the subdevice driver about the input status. The input then
appeared as always present.

This commit adds the necessary query. There is a precedent for
this - the R-Car VIN V4L2 driver does a similar call [2].

[1]: https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/-/blob/ce0be27caf69aa9d96b73bc2b50737451b6f6936/sys/v4l2/gstv4l2src.c#L553
[2]: https://github.com/raspberrypi/linux/blob/7fb9d006d3ff3baf2e205e0c85c4e4fd0a64fcd0/drivers/media/platform/rcar-vin/rcar-v4l2.c#L548

Signed-off-by: Jakub Vaněk <[email protected]>

media/bcm2835-unicam: Parse pad numbers correctly

The driver was making big assumptions about the source device
using pad 0 and 1, which doesn't follow for more complex
devices where Unicam's source device may be a sink device for
something else.

Read the pad numbers through media controller, and reference
them appropriately.

Signed-off-by: Dave Stevenson <[email protected]>

media/bcm2835-unicam: Add support for configuration via MC API

Adds Media Controller API support for more complex pipelines.
libcamera is about to switch to using this mechanism for configuring
sensors.

This can be enabled by either a module parameter, or device tree.

Various functions have been moved to group video-centric and
mc-centric functions together.

Based on a similar conversion done to ti-vpe.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Fixup for 5.18 and new get_mbus_config struct

The number of active CSI2 data lanes has moved within the struct
v4l2_mbus_config used by the get_mbus_config API call.
Update the driver to match the changes in mainline.

Signed-off-by: Dave Stevenson <[email protected]>

drivers: bcm2835_unicam: Add logging message when a frame is dropped.

If a dummy buffer is still active on a frame start, it indicates that this frame
will be dropped. The explicit logging helps users identify performance issues.

Signed-off-by: Naushir Patuck <[email protected]>

drivers: bcm2835_unicam: Disable trigger mode operation

On a Pi3 B/B+ platform the imx219 sensor frequently generates a single corrupt
frame when the sensor first starts. This can either be a missing line, or
invalid samples within the line. This only occurrs using the Unicam kernel
driver.

Disabling trigger mode elimiates this corruption. Since trigger mode is a
legacy feature copied from the firmware driver and not expected to be needed,
remove it. Tested on the Raspberry Pi cameras and shows no ill effects.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Set ret on error path in unicam_async_complete()

Clang warns:

  drivers/media/platform/bcm2835/bcm2835-unicam.c:3109:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
          if (!source_pads) {
              ^~~~~~~~~~~~
  drivers/media/platform/bcm2835/bcm2835-unicam.c:3152:9: note: uninitialized use occurs here
          return ret;
                 ^~~
  drivers/media/platform/bcm2835/bcm2835-unicam.c:3109:2: note: remove the 'if' if its condition is always false
          if (!source_pads) {
          ^~~~~~~~~~~~~~~~~~~
  drivers/media/platform/bcm2835/bcm2835-unicam.c:3091:9: note: initialize the variable 'ret' to silence this warning
          int ret;
                 ^
                  = 0
  1 warning generated.

When the if condition is true, ret will be used uninitialized, which
could result in undesirable behavior. Set ret to -ENODEV on the error
path, which is a standard error code for the ->complete() callback.

Fixes: d056e86eb35f ("media/bcm2835-unicam: Parse pad numbers correctly")
Signed-off-by: Nathan Chancellor <[email protected]>

media: bcm2835-unicam: Handle a repeated frame start with no end

In the case of 2 frame starts being received with no frame end
between, the queued buffer held in next_frm was lost as the
pointer was overwritten with the dummy buffer.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Correctly handle FS + FE ISR condtion

If we get a simultaneous FS + FE interrupt for the same frame, it cannot be
marked as completed and returned to userland as the framebuffer will be refilled
by Unicam on the next sensor frame. Additionally, the timestamp will be set to 0
as the FS interrupt handling code will not have run yet.

To avoid these problems, the frame is considered dropped in the FE handler,
and will be returned to userland on the subsequent sensor frame.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Fix for possible dummy buffer overrun

The Unicam hardware has been observed to cause a buffer overrun when using the
dummy buffer as a circular buffer. The conditions that cause the overrun are not
fully known, but it seems to occur when the memory bus is heavily loaded.

To avoid the overrun, program the hardware with a buffer size of 0 when using
the dummy buffer. This will cause overrun into the allocated dummy buffer, but
avoid out of bounds writes.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Fix up start/stop api change

Signed-off-by: Dom Cobley <[email protected]>

media: bcm2835-unicam: Use mipi-csi2.h header for data type values

The MIPI CSI2 data type ID values are now defined in the
mipi-csi2.h header, so use those defines instead of hard
coding them in the driver.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Add support for RAW16 formats

With the RAW16 formats now having a defined CSI2 data type ID,
they can be added to the driver.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Start and stop media_pipeline with same node

media_pipeline_start and media_pipeline_stop now validate that
the pipeline is being started and stopped with the same pipe
and pad handles.
When running with embedded metadata (eg imx477 and imx708), the
start typically happens from the metadata pad, whilst stop is
always from the image pad.

Always pass the image pad to media_pipeline_start to ensure
that the calls are balanced.

Signed-off-by: Dave Stevenson <[email protected]>

drivers: media: bcm2835_unicam: Improve frame sequence count handling

Ensure that the frame sequence counter is incremented only if a previous
frame start interrupt has occurred, or a frame start + frame end has
occurred simultaneously.

This corresponds the sequence number with the actual number of frames
produced by the sensor, not the number of frame buffers dequeued back
to userland.

Signed-off-by: Naushir Patuck <[email protected]>

bcm2835-unicam: hacks to allow it to build

media: bcm2835-unicam: Fix up async notifier usage

Fixes "8a090fc3e549 bcm2835-unicam: hacks to allow it to build"

media: bcm2835-unicam: Add option for a GPIO to reflect FS/FE timing

The legacy stack had an option to have a GPIO track frame start and
end events to give basic synchronisation to the incoming image stream.
https://forums.raspberrypi.com/viewtopic.php?t=190314

Replicate this in the kernel Unicam driver.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Add support for 12bit mono packed format

Now that V4L2_PIX_FMT_Y12P is defined, allow passing raw 12bit
mono packed data through the peripheral.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Add support for 14bit mono sources

Now that V4L2_PIX_FMT_Y14 and V4L2_PIX_FMT_Y14P are defined,
allow passing 14bit mono data through the peripheral.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Add support for unpacked 14bit Bayer formats

Now that the 14bit non-packed Bayer formats are defined, add them
into the supported formats lookup table.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Reinstate old downstream driver as legacy

Whilst the Unicam driver has now been upstreamed it only supports
configuration via Media Controller (not driven from the /dev/videoN
node), which makes life significantly harder for simple devices such
as mono sensors, and HDMI or analogue video to CSI2 bridge chips
(eg TC358743 and ADV7282M).

Fix up the downstream driver so that it builds, reinstate the links
from Kconfig and Makefile to it, and give it a new Kconfig name
(VIDEO_BCM2835_UNICAM_LEGACY).

Signed-off-by: Dave Stevenson <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Jan 5, 2026
Driver for the BCM2835 ISP hardware block.  This driver uses the MMAL
component to program the ISP hardware through the VC firmware.

The ISP component can produce two video stream outputs, and Bayer
image statistics. This can't be encompassed in a simple V4L2
M2M device, so create a new device that registers 4 video nodes.

This patch squashes all the development patches from the earlier
rpi-5.4.y branch into one

Signed-off-by: Naushir Patuck <[email protected]>

staging/bcm2835-isp: Add the unpacked (16bpp) raw formats

Now that the firmware supports the unpacked (16bpp) variants
of the MIPI raw formats, add the mappings.

Signed-off-by: Dave Stevenson <[email protected]>

staging/bcm2835-isp: Log the number of excess supported formats

When logging that the firmware has provided more supported formats
than we had allocated storage for, log the number allocated and
returned.

Signed-off-by: Dave Stevenson <[email protected]>

staging: vc04_services: ISP: Add colour denoise control

Add colour denoise control to the bcm2835 driver through a new v4l2
control: V4L2_CID_USER_BCM2835_ISP_CDN.

Add the accompanying MMAL configuration structure definitions as well.

Signed-off-by: Naushir Patuck <[email protected]>

bcm2835-isp: Allow formats with different colour spaces.

Each supported format now includes a mask showing the allowed colour
spaces, as well as a default colour space for when one was not
specified.

Additionally we translate the colour space to mmal format and pass it
over to the VideoCore.

Signed-off-by: David Plowman <[email protected]>

media: i2c: add ov9281 driver.

Change-Id: I7b77250bbc56d2f861450cf77271ad15f9b88ab1
Signed-off-by: Zefa Chen <[email protected]>

media: i2c: ov9281: fix mclk issue when probe multiple camera.

Takes the ov9281 part only from the Rockchip's patch.

Change-Id: I30e833baf2c1bb07d6d87ddb3b00759ab45a90e4
Signed-off-by: Zefa Chen <[email protected]>

media: i2c: ov9281: add enum_frame_interval function for iq tool 2.2 and hal3

Adds the ov9281 parts of the Rockchip patch adding enum_frame_interval to
a large number of drivers.

Change-Id: I03344cd6cf278dd7c18fce8e97479089ef185a5c
Signed-off-by: Zefa Chen <[email protected]>

media: i2c: ov9281: Fixup for recent kernel releases, and remove custom code

The Rockchip driver was based on a 4.4 kernel, and had several custom
Rockchip parts.

Update to 5.4 kernel APIs, with the relevant controls required by
libcamera, and remove custom Rockchip parts.

Signed-off-by: Dave Stevenson <[email protected]>

media: i2c: ov9281: Read chip ID via 2 reads

Vision Components have made an OV9281 module which blocks reading
back the majority of registers to comply with NDAs, and in doing
so doesn't allow auto-increment register reading as used when
reading the chip ID.

Use two reads and manually combine the results.

Signed-off-by: Dave Stevenson <[email protected]>

media: i2c: ov9281: Add support for 8 bit readout

The sensor supports 8 bit mode as well as 10bit, so add the
relevant code to allow selection of this.

Signed-off-by: Dave Stevenson <[email protected]>

media: ov9281: Add 1280x720 and 640x480 modes

Breaks out common register set and adds the different registers
for 1280x720 (cropped) and 640x480 (skipped) modes

Signed-off-by: Dave Stevenson <[email protected]>

Fixed picture line bug in all ov9281 modes

Signed-off-by: Mathias Anhalt <[email protected]>

Added hflip and vflip controls to ov9281

Signed-off-by: Mathias Anhalt <[email protected]>

media: i2c: ov9281: Remove override of subdev name

From the original Rockchip driver, the subdev was renamed
from the default to being "mov9281 <dev_name>" whereas the
default would have been "ov9281 <dev_name>".

Remove the override to drop back to the default rather than
a vendor custom string.

Signed-off-by: Dave Stevenson <[email protected]>

media: v4l2-subdev: add subdev-wide state struct

Signed-off-by: Dom Cobley <[email protected]>

media: i2c: ov9281: Add fwnode properties controls

Add call to v4l2_ctrl_new_fwnode_properties to read and
create the fwnode based controls.

Signed-off-by: Dave Stevenson <[email protected]>

media: i2c: ov9281: Sensor should report RAW color space

Tested on Raspberry Pi running libcamera.

Signed-off-by: David Plowman <[email protected]>

Partial revert "media: i2c: add ov9281 driver."

This partially reverts commit 84e98e3a4f3eecb168ceb80231c3e8252929892e.

The commit had merged some changes to other drivers with adding the ov9281
driver. Only the ov9281 parts have been reverted.

staging/bcm2835-isp: Fix compiler warning

The result of dividing a u32 by a size_t is an unsigned int on arm32
and a long unsigned int on arm64. Use "%zu" (the size_t format) to
remove the build warning for 64-bit builds.

Signed-off-by: Phil Elwell <[email protected]>

staging: vc04_services: isp: Set the YUV420/YVU420 format stride to 64 bytes

The bcm2835 ISP requires the base address of all input/output planes to have 32
byte alignment. Using a Y stride of 32 bytes would not guarantee that the V
plane would fulfil this, e.g. a height of 650 lines would mean the V plane
buffer is not 32 byte aligned for YUV420 formats.

Having a Y stride of 64 bytes would ensure both U and V planes have a 32 byte
alignment, as the luma height will always be an even number of lines.

Signed-off-by: Naushir Patuck <[email protected]>

vc04_services: isp: Report input node as wanting full range RAW color space

RAW color spaces are more usually reported as having full range
quantization.

Tested using libcamera.

Signed-off-by: David Plowman <[email protected]>

drivers: bcm2835_isp: Allow multiple users for the ISP driver.

Add a second (identical) set of device nodes to allow concurrent use of the ISP
hardware by another user. This change effectively creates a second state
structure (struct bcm2835_isp_dev) to maintain independent state for the second
user. Node and media entity names are appened with the instance index
appropriately.

Further users can be added by changing the BCM2835_ISP_NUM_INSTANCES define.

Signed-off-by: Naushir Patuck <[email protected]>

drivers: bcm2835_isp: Fix div by 0 bug.

Fix a possible division by 0 bug when setting up the mmal port for the stats
port.

Signed-off-by: Naushir Patuck <[email protected]>

staging/bcm2835-isp: Fix cleanup after init fail

bcm2835_isp_remove is called on an initialisation failure, but at that
point the drvdata hasn't been set. This causes a crash when e.g. using
the cutdown firmware (gpu_mem=16).

Move platform_set_drvdata before the instance probing loop to avoid the
problem.

See: raspberrypi/linux#4774

Signed-off-by: Phil Elwell <[email protected]>

bcm2835-v4l2-isp: Add missing lock initialization

ISP device allocation is dynamic hence the locks too.
struct mutex queue_lock is not initialized which result in bug.

Fixing same by initializing it.

[   29.847138] INFO: trying to register non-static key.
[   29.847156] The code is fine but needs lockdep annotation, or maybe
[   29.847159] you didn't initialize this object before use?
[   29.847161] turning off the locking correctness validator.
[   29.847167] CPU: 1 PID: 343 Comm: v4l_id Tainted: G         C        5.15.11-rt24-v8+ torvalds#8
[   29.847187] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[   29.847194] Call trace:
[   29.847197]  dump_backtrace+0x0/0x1b8
[   29.847227]  show_stack+0x20/0x30
[   29.847240]  dump_stack_lvl+0x8c/0xb8
[   29.847254]  dump_stack+0x18/0x34
[   29.847263]  register_lock_class+0x494/0x4a0
[   29.847278]  __lock_acquire+0x80/0x1680
[   29.847289]  lock_acquire+0x214/0x3a0
[   29.847300]  mutex_lock_nested+0x70/0xc8
[   29.847312]  _vb2_fop_release+0x3c/0xa8 [videobuf2_v4l2]
[   29.847346]  vb2_fop_release+0x34/0x60 [videobuf2_v4l2]
[   29.847367]  v4l2_release+0xc8/0x108 [videodev]
[   29.847453]  __fput+0x8c/0x258
[   29.847476]  ____fput+0x18/0x28
[   29.847487]  task_work_run+0x98/0x180
[   29.847502]  do_notify_resume+0x228/0x3f8
[   29.847515]  el0_svc+0xec/0xf0
[   29.847523]  el0t_64_sync_handler+0x90/0xb8
[   29.847531]  el0t_64_sync+0x180/0x184

Signed-off-by: Padmanabha Srinivasaiah <[email protected]>

staging: vc04_services: isp: Permit all sRGB colour spaces on ISP outputs

ISP outputs actually support all colour spaces that are fundamentally
sRGB underneath, regardless of whether an RGB or YUV output format is
actually requested.

Signed-off-by: David Plowman <[email protected]>

drivers: staging: bcm2835-isp: Do not cleanup mmal vcsm buffer on stop_streaming

On stop_streaming() the vcsm buffer handle gets released by the buffer cleanup
code.  This will subsequently cause and error if userland re-queues the same
buffer on the next start_streaming() call.

Remove this cleanup code and rely on the vb2_ops->buf_cleanup() call to do the
cleanups instead.

Signed-off-by: Naushir Patuck <[email protected]>

drivers: staging: bcm2835-isp: Clear LS table handle in the firmware

When all nodes have stopped streaming, ensure the firmware has released its
handle on the LS table dmabuf. This is done by passing a null handle in the
LS params.

Signed-off-by: Naushir Patuck <[email protected]>

drivers: staging: bcm2835-isp: Respect caller's stride value

The stride value reported for output image buffers should be at least
as large as any value that was passed in by the caller (subject to
correct alignment for the pixel format). If the value is zero (meaning
no value was passed), or is too small, the minimum acceptable value
will be substituted.

Signed-off-by: David Plowman <[email protected]>

staging: vc04_services: bcm2835-isp: Drop include Makefile directive

Drop the include directive. They can break the build, when one only
wants to build a subdirectory. Replace with "../" for the includes in
the bcm2835-isp instead.

The fix is equivalent to the four patches between 29d49a7
("staging: vc04_services: bcm2835-audio: Drop include Makefile
directive")...2529ca2 ("staging: vc04_services: interface: Drop
include Makefile directive")

Fixes: c8f89c9551c1 ("staging: vc04_services: ISP: Add a more complex ISP processing component")
Suggested-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Kieran Bingham <[email protected]>

staging: vc04_services: bcm2835-v4l2-isp: Register with vchiq_bus_type

Register the bcm2835-v4l2-isp driver with the vchiq_bus_type instead of
using the platform driver/device.

Signed-off-by: Kieran Bingham <[email protected]>

staging: vc04_services: bcm2835-v4l2-isp: Explicitly set DMA mask

The platform model originally handled the DMA mask. Now that
we are on the vchiq_bus we need to explicitly set this.

Signed-off-by: Kieran Bingham <[email protected]>

drivers: media: bcm2835_isp: Cache LS table dmabuf

Clients such as libcamera do not change the LS table dmabuf on every
frame. In such cases instead of mapping/remapping the same dmabuf on
every frame to send to the firmware, cache the dmabuf once and only
update and remap if the dmabuf has been changed by the userland client.

Signed-off-by: Naushir Patuck <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Jan 6, 2026
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.

Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.

With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.

| Metric     | W/O this patch | With this patch | Change    |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
| usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
| ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |

Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy().  Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].

Before:
<syscall_get_arguments.constprop.0>:
       aa0103e2        mov     x2, x1
       91002003        add     x3, x0, #0x8
       f9408804        ldr     x4, [x0, torvalds#272]
       f8008444        str     x4, [x2], torvalds#8
       a9409404        ldp     x4, x5, [x0, torvalds#8]
       a9009424        stp     x4, x5, [x1, torvalds#8]
       a9418400        ldp     x0, x1, [x0, torvalds#24]
       a9010440        stp     x0, x1, [x2, torvalds#16]
       f9401060        ldr     x0, [x3, torvalds#32]
       f9001040        str     x0, [x2, torvalds#32]
       d65f03c0        ret
       d503201f        nop

After:
       a9408e82        ldp     x2, x3, [x20, torvalds#8]
       2a1603e0        mov     w0, w22
       f9400e84        ldr     x4, [x20, torvalds#24]
       f9408a81        ldr     x1, [x20, torvalds#272]
       9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>

This also aligns the implementation with x86 and RISC-V.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Charlie Jenkins <[email protected]>
Cc: Christian Zankel <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Jan 6, 2026
Patch series "kallsyms: Prevent invalid access when showing module
buildid", v3.

We have seen nested crashes in __sprint_symbol(), see below.  They seem to
be caused by an invalid pointer to "buildid".  This patchset cleans up
kallsyms code related to module buildid and fixes this invalid access when
printing backtraces.

I made an audit of __sprint_symbol() and found several situations
when the buildid might be wrong:

  + bpf_address_lookup() does not set @modbuildid

  + ftrace_mod_address_lookup() does not set @modbuildid

  + __sprint_symbol() does not take rcu_read_lock and
    the related struct module might get removed before
    mod->build_id is printed.

This patchset solves these problems:

  + 1st, 2nd patches are preparatory
  + 3rd, 4th, 6th patches fix the above problems
  + 5th patch cleans up a suspicious initialization code.

This is the backtrace, we have seen. But it is not really important.
The problems fixed by the patchset are obvious:

  crash64> bt [62/2029]
  PID: 136151 TASK: ffff9f6c981d4000 CPU: 367 COMMAND: "btrfs"
  #0 [ffffbdb687635c28] machine_kexec at ffffffffb4c845b3
  #1 [ffffbdb687635c80] __crash_kexec at ffffffffb4d86a6a
  #2 [ffffbdb687635d08] hex_string at ffffffffb51b3b61
  #3 [ffffbdb687635d40] crash_kexec at ffffffffb4d87964
  #4 [ffffbdb687635d50] oops_end at ffffffffb4c41fc8
  #5 [ffffbdb687635d70] do_trap at ffffffffb4c3e49a
  torvalds#6 [ffffbdb687635db8] do_error_trap at ffffffffb4c3e6a4
  torvalds#7 [ffffbdb687635df8] exc_stack_segment at ffffffffb5666b33
  torvalds#8 [ffffbdb687635e20] asm_exc_stack_segment at ffffffffb5800cf9
  ...


This patch (of 7)

The function kallsyms_lookup_buildid() initializes the given @namebuf by
clearing the first and the last byte.  It is not clear why.

The 1st byte makes sense because some callers ignore the return code and
expect that the buffer contains a valid string, for example:

  - function_stat_show()
    - kallsyms_lookup()
      - kallsyms_lookup_buildid()

The initialization of the last byte does not make much sense because it
can later be overwritten.  Fortunately, it seems that all called functions
behave correctly:

  -  kallsyms_expand_symbol() explicitly adds the trailing '\0'
     at the end of the function.

  - All *__address_lookup() functions either use the safe strscpy()
    or they do not touch the buffer at all.

Document the reason for clearing the first byte.  And remove the useless
initialization of the last byte.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Reviewed-by: Aaron Tomlin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkman <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Luis Chamberalin <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: "Masami Hiramatsu (Google)" <[email protected]>
Cc: Petr Pavlu <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Daniel Gomez <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Jan 6, 2026
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.

Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.

With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.

| Metric     | W/O this patch | With this patch | Change    |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
| usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
| ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |

Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy().  Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].

Before:
<syscall_get_arguments.constprop.0>:
       aa0103e2        mov     x2, x1
       91002003        add     x3, x0, #0x8
       f9408804        ldr     x4, [x0, torvalds#272]
       f8008444        str     x4, [x2], torvalds#8
       a9409404        ldp     x4, x5, [x0, torvalds#8]
       a9009424        stp     x4, x5, [x1, torvalds#8]
       a9418400        ldp     x0, x1, [x0, torvalds#24]
       a9010440        stp     x0, x1, [x2, torvalds#16]
       f9401060        ldr     x0, [x3, torvalds#32]
       f9001040        str     x0, [x2, torvalds#32]
       d65f03c0        ret
       d503201f        nop

After:
       a9408e82        ldp     x2, x3, [x20, torvalds#8]
       2a1603e0        mov     w0, w22
       f9400e84        ldr     x4, [x20, torvalds#24]
       f9408a81        ldr     x1, [x20, torvalds#272]
       9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>

This also aligns the implementation with x86 and RISC-V.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Charlie Jenkins <[email protected]>
Cc: Christian Zankel <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Jan 6, 2026
Patch series "kallsyms: Prevent invalid access when showing module
buildid", v3.

We have seen nested crashes in __sprint_symbol(), see below.  They seem to
be caused by an invalid pointer to "buildid".  This patchset cleans up
kallsyms code related to module buildid and fixes this invalid access when
printing backtraces.

I made an audit of __sprint_symbol() and found several situations
when the buildid might be wrong:

  + bpf_address_lookup() does not set @modbuildid

  + ftrace_mod_address_lookup() does not set @modbuildid

  + __sprint_symbol() does not take rcu_read_lock and
    the related struct module might get removed before
    mod->build_id is printed.

This patchset solves these problems:

  + 1st, 2nd patches are preparatory
  + 3rd, 4th, 6th patches fix the above problems
  + 5th patch cleans up a suspicious initialization code.

This is the backtrace, we have seen. But it is not really important.
The problems fixed by the patchset are obvious:

  crash64> bt [62/2029]
  PID: 136151 TASK: ffff9f6c981d4000 CPU: 367 COMMAND: "btrfs"
  #0 [ffffbdb687635c28] machine_kexec at ffffffffb4c845b3
  #1 [ffffbdb687635c80] __crash_kexec at ffffffffb4d86a6a
  #2 [ffffbdb687635d08] hex_string at ffffffffb51b3b61
  #3 [ffffbdb687635d40] crash_kexec at ffffffffb4d87964
  #4 [ffffbdb687635d50] oops_end at ffffffffb4c41fc8
  #5 [ffffbdb687635d70] do_trap at ffffffffb4c3e49a
  torvalds#6 [ffffbdb687635db8] do_error_trap at ffffffffb4c3e6a4
  torvalds#7 [ffffbdb687635df8] exc_stack_segment at ffffffffb5666b33
  torvalds#8 [ffffbdb687635e20] asm_exc_stack_segment at ffffffffb5800cf9
  ...


This patch (of 7)

The function kallsyms_lookup_buildid() initializes the given @namebuf by
clearing the first and the last byte.  It is not clear why.

The 1st byte makes sense because some callers ignore the return code and
expect that the buffer contains a valid string, for example:

  - function_stat_show()
    - kallsyms_lookup()
      - kallsyms_lookup_buildid()

The initialization of the last byte does not make much sense because it
can later be overwritten.  Fortunately, it seems that all called functions
behave correctly:

  -  kallsyms_expand_symbol() explicitly adds the trailing '\0'
     at the end of the function.

  - All *__address_lookup() functions either use the safe strscpy()
    or they do not touch the buffer at all.

Document the reason for clearing the first byte.  And remove the useless
initialization of the last byte.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Reviewed-by: Aaron Tomlin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkman <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Luis Chamberalin <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: "Masami Hiramatsu (Google)" <[email protected]>
Cc: Petr Pavlu <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Daniel Gomez <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Jan 6, 2026
Do not use memcpy() to extract syscall arguments from struct pt_regs
but rather just perform direct assignments.

Update syscall_set_arguments() too to keep syscall_get_arguments()
and syscall_set_arguments() in sync.

With Generic Entry patch[1] and turn on audit, the performance
benchmarks from perf bench basic syscall on kunpeng920 gives roughly
a 1% performance uplift.

| Metric     | W/O this patch | With this patch | Change    |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
| usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
| ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |

Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].

Before:
<syscall_get_arguments.constprop.0>:
       aa0103e2        mov     x2, x1
       91002003        add     x3, x0, #0x8
       f9408804        ldr     x4, [x0, torvalds#272]
       f8008444        str     x4, [x2], torvalds#8
       a9409404        ldp     x4, x5, [x0, torvalds#8]
       a9009424        stp     x4, x5, [x1, torvalds#8]
       a9418400        ldp     x0, x1, [x0, torvalds#24]
       a9010440        stp     x0, x1, [x2, torvalds#16]
       f9401060        ldr     x0, [x3, torvalds#32]
       f9001040        str     x0, [x2, torvalds#32]
       d65f03c0        ret
       d503201f        nop

After:
       a9408e82        ldp     x2, x3, [x20, torvalds#8]
       2a1603e0        mov     w0, w22
       f9400e84        ldr     x4, [x20, torvalds#24]
       f9408a81        ldr     x1, [x20, torvalds#272]
       9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>

This also aligns the implementation with x86 and RISC-V.

[1]: https://lore.kernel.org/all/[email protected]/

Signed-off-by: Jinjie Ruan <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Jan 6, 2026
[ Upstream commit 163e5f2 ]

When using perf record with the `--overwrite` option, a segmentation fault
occurs if an event fails to open. For example:

  perf record -e cycles-ct -F 1000 -a --overwrite
  Error:
  cycles-ct:H: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
  perf: Segmentation fault
      #0 0x6466b6 in dump_stack debug.c:366
      #1 0x646729 in sighandler_dump_stack debug.c:378
      #2 0x453fd1 in sigsegv_handler builtin-record.c:722
      #3 0x7f8454e65090 in __restore_rt libc-2.32.so[54090]
      #4 0x6c5671 in __perf_event__synthesize_id_index synthetic-events.c:1862
      #5 0x6c5ac0 in perf_event__synthesize_id_index synthetic-events.c:1943
      torvalds#6 0x458090 in record__synthesize builtin-record.c:2075
      torvalds#7 0x45a85a in __cmd_record builtin-record.c:2888
      torvalds#8 0x45deb6 in cmd_record builtin-record.c:4374
      torvalds#9 0x4e5e33 in run_builtin perf.c:349
      torvalds#10 0x4e60bf in handle_internal_command perf.c:401
      torvalds#11 0x4e6215 in run_argv perf.c:448
      torvalds#12 0x4e653a in main perf.c:555
      torvalds#13 0x7f8454e4fa72 in __libc_start_main libc-2.32.so[3ea72]
      torvalds#14 0x43a3ee in _start ??:0

The --overwrite option implies --tail-synthesize, which collects non-sample
events reflecting the system status when recording finishes. However, when
evsel opening fails (e.g., unsupported event 'cycles-ct'), session->evlist
is not initialized and remains NULL. The code unconditionally calls
record__synthesize() in the error path, which iterates through the NULL
evlist pointer and causes a segfault.

To fix it, move the record__synthesize() call inside the error check block, so
it's only called when there was no error during recording, ensuring that evlist
is properly initialized.

Fixes: 4ea648a ("perf record: Add --tail-synthesize option")
Signed-off-by: Shuai Xue <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Jan 6, 2026
[ Upstream commit 9ab29ed ]

It is reported that on Acer Nitro V15 suspend only works properly if the
keyboard backlight is turned off. In looking through the issue Acer Nitro
V15 has a GPIO (torvalds#8) specified in _AEI but it has no matching notify device
in _EVT. The values for GPIO torvalds#8 change as keyboard backlight is turned on
and off.

This makes it seem that GPIO torvalds#8 is actually supposed to be solely for
keyboard backlight.  Turning off the interrupt for this GPIO fixes the issue.
Add a quirk that does just that.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4169
Signed-off-by: Mario Limonciello <[email protected]>
Acked-by: Mika Westerberg <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>
Stable-dep-of: 2d96731 ("gpiolib: acpi: Add quirk for Dell Precision 7780")
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Jan 6, 2026
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.

Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.

With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.

| Metric     | W/O this patch | With this patch | Change    |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
| usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
| ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |

Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy().  Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].

Before:
<syscall_get_arguments.constprop.0>:
       aa0103e2        mov     x2, x1
       91002003        add     x3, x0, #0x8
       f9408804        ldr     x4, [x0, torvalds#272]
       f8008444        str     x4, [x2], torvalds#8
       a9409404        ldp     x4, x5, [x0, torvalds#8]
       a9009424        stp     x4, x5, [x1, torvalds#8]
       a9418400        ldp     x0, x1, [x0, torvalds#24]
       a9010440        stp     x0, x1, [x2, torvalds#16]
       f9401060        ldr     x0, [x3, torvalds#32]
       f9001040        str     x0, [x2, torvalds#32]
       d65f03c0        ret
       d503201f        nop

After:
       a9408e82        ldp     x2, x3, [x20, torvalds#8]
       2a1603e0        mov     w0, w22
       f9400e84        ldr     x4, [x20, torvalds#24]
       f9408a81        ldr     x1, [x20, torvalds#272]
       9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>

This also aligns the implementation with x86 and RISC-V.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Charlie Jenkins <[email protected]>
Cc: Christian Zankel <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Jan 6, 2026
Patch series "kallsyms: Prevent invalid access when showing module
buildid", v3.

We have seen nested crashes in __sprint_symbol(), see below.  They seem to
be caused by an invalid pointer to "buildid".  This patchset cleans up
kallsyms code related to module buildid and fixes this invalid access when
printing backtraces.

I made an audit of __sprint_symbol() and found several situations
when the buildid might be wrong:

  + bpf_address_lookup() does not set @modbuildid

  + ftrace_mod_address_lookup() does not set @modbuildid

  + __sprint_symbol() does not take rcu_read_lock and
    the related struct module might get removed before
    mod->build_id is printed.

This patchset solves these problems:

  + 1st, 2nd patches are preparatory
  + 3rd, 4th, 6th patches fix the above problems
  + 5th patch cleans up a suspicious initialization code.

This is the backtrace, we have seen. But it is not really important.
The problems fixed by the patchset are obvious:

  crash64> bt [62/2029]
  PID: 136151 TASK: ffff9f6c981d4000 CPU: 367 COMMAND: "btrfs"
  #0 [ffffbdb687635c28] machine_kexec at ffffffffb4c845b3
  #1 [ffffbdb687635c80] __crash_kexec at ffffffffb4d86a6a
  #2 [ffffbdb687635d08] hex_string at ffffffffb51b3b61
  #3 [ffffbdb687635d40] crash_kexec at ffffffffb4d87964
  #4 [ffffbdb687635d50] oops_end at ffffffffb4c41fc8
  #5 [ffffbdb687635d70] do_trap at ffffffffb4c3e49a
  torvalds#6 [ffffbdb687635db8] do_error_trap at ffffffffb4c3e6a4
  torvalds#7 [ffffbdb687635df8] exc_stack_segment at ffffffffb5666b33
  torvalds#8 [ffffbdb687635e20] asm_exc_stack_segment at ffffffffb5800cf9
  ...


This patch (of 7)

The function kallsyms_lookup_buildid() initializes the given @namebuf by
clearing the first and the last byte.  It is not clear why.

The 1st byte makes sense because some callers ignore the return code and
expect that the buffer contains a valid string, for example:

  - function_stat_show()
    - kallsyms_lookup()
      - kallsyms_lookup_buildid()

The initialization of the last byte does not make much sense because it
can later be overwritten.  Fortunately, it seems that all called functions
behave correctly:

  -  kallsyms_expand_symbol() explicitly adds the trailing '\0'
     at the end of the function.

  - All *__address_lookup() functions either use the safe strscpy()
    or they do not touch the buffer at all.

Document the reason for clearing the first byte.  And remove the useless
initialization of the last byte.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Reviewed-by: Aaron Tomlin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkman <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Luis Chamberalin <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: "Masami Hiramatsu (Google)" <[email protected]>
Cc: Petr Pavlu <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Daniel Gomez <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Jan 8, 2026
Patch series "kallsyms: Prevent invalid access when showing module
buildid", v3.

We have seen nested crashes in __sprint_symbol(), see below.  They seem to
be caused by an invalid pointer to "buildid".  This patchset cleans up
kallsyms code related to module buildid and fixes this invalid access when
printing backtraces.

I made an audit of __sprint_symbol() and found several situations
when the buildid might be wrong:

  + bpf_address_lookup() does not set @modbuildid

  + ftrace_mod_address_lookup() does not set @modbuildid

  + __sprint_symbol() does not take rcu_read_lock and
    the related struct module might get removed before
    mod->build_id is printed.

This patchset solves these problems:

  + 1st, 2nd patches are preparatory
  + 3rd, 4th, 6th patches fix the above problems
  + 5th patch cleans up a suspicious initialization code.

This is the backtrace, we have seen. But it is not really important.
The problems fixed by the patchset are obvious:

  crash64> bt [62/2029]
  PID: 136151 TASK: ffff9f6c981d4000 CPU: 367 COMMAND: "btrfs"
  #0 [ffffbdb687635c28] machine_kexec at ffffffffb4c845b3
  #1 [ffffbdb687635c80] __crash_kexec at ffffffffb4d86a6a
  #2 [ffffbdb687635d08] hex_string at ffffffffb51b3b61
  #3 [ffffbdb687635d40] crash_kexec at ffffffffb4d87964
  #4 [ffffbdb687635d50] oops_end at ffffffffb4c41fc8
  #5 [ffffbdb687635d70] do_trap at ffffffffb4c3e49a
  torvalds#6 [ffffbdb687635db8] do_error_trap at ffffffffb4c3e6a4
  torvalds#7 [ffffbdb687635df8] exc_stack_segment at ffffffffb5666b33
  torvalds#8 [ffffbdb687635e20] asm_exc_stack_segment at ffffffffb5800cf9
  ...


This patch (of 7)

The function kallsyms_lookup_buildid() initializes the given @namebuf by
clearing the first and the last byte.  It is not clear why.

The 1st byte makes sense because some callers ignore the return code and
expect that the buffer contains a valid string, for example:

  - function_stat_show()
    - kallsyms_lookup()
      - kallsyms_lookup_buildid()

The initialization of the last byte does not make much sense because it
can later be overwritten.  Fortunately, it seems that all called functions
behave correctly:

  -  kallsyms_expand_symbol() explicitly adds the trailing '\0'
     at the end of the function.

  - All *__address_lookup() functions either use the safe strscpy()
    or they do not touch the buffer at all.

Document the reason for clearing the first byte.  And remove the useless
initialization of the last byte.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Reviewed-by: Aaron Tomlin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkman <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Luis Chamberalin <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: "Masami Hiramatsu (Google)" <[email protected]>
Cc: Petr Pavlu <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Daniel Gomez <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
1054009064 pushed a commit to 1054009064/linux that referenced this pull request Jan 8, 2026
[ Upstream commit 9ab29ed ]

It is reported that on Acer Nitro V15 suspend only works properly if the
keyboard backlight is turned off. In looking through the issue Acer Nitro
V15 has a GPIO (torvalds#8) specified in _AEI but it has no matching notify device
in _EVT. The values for GPIO torvalds#8 change as keyboard backlight is turned on
and off.

This makes it seem that GPIO torvalds#8 is actually supposed to be solely for
keyboard backlight.  Turning off the interrupt for this GPIO fixes the issue.
Add a quirk that does just that.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4169
Signed-off-by: Mario Limonciello <[email protected]>
Acked-by: Mika Westerberg <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>
Stable-dep-of: 2d96731 ("gpiolib: acpi: Add quirk for Dell Precision 7780")
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Jan 8, 2026
Patch series "kallsyms: Prevent invalid access when showing module
buildid", v3.

We have seen nested crashes in __sprint_symbol(), see below.  They seem to
be caused by an invalid pointer to "buildid".  This patchset cleans up
kallsyms code related to module buildid and fixes this invalid access when
printing backtraces.

I made an audit of __sprint_symbol() and found several situations
when the buildid might be wrong:

  + bpf_address_lookup() does not set @modbuildid

  + ftrace_mod_address_lookup() does not set @modbuildid

  + __sprint_symbol() does not take rcu_read_lock and
    the related struct module might get removed before
    mod->build_id is printed.

This patchset solves these problems:

  + 1st, 2nd patches are preparatory
  + 3rd, 4th, 6th patches fix the above problems
  + 5th patch cleans up a suspicious initialization code.

This is the backtrace, we have seen. But it is not really important.
The problems fixed by the patchset are obvious:

  crash64> bt [62/2029]
  PID: 136151 TASK: ffff9f6c981d4000 CPU: 367 COMMAND: "btrfs"
  #0 [ffffbdb687635c28] machine_kexec at ffffffffb4c845b3
  #1 [ffffbdb687635c80] __crash_kexec at ffffffffb4d86a6a
  #2 [ffffbdb687635d08] hex_string at ffffffffb51b3b61
  #3 [ffffbdb687635d40] crash_kexec at ffffffffb4d87964
  #4 [ffffbdb687635d50] oops_end at ffffffffb4c41fc8
  #5 [ffffbdb687635d70] do_trap at ffffffffb4c3e49a
  torvalds#6 [ffffbdb687635db8] do_error_trap at ffffffffb4c3e6a4
  torvalds#7 [ffffbdb687635df8] exc_stack_segment at ffffffffb5666b33
  torvalds#8 [ffffbdb687635e20] asm_exc_stack_segment at ffffffffb5800cf9
  ...


This patch (of 7)

The function kallsyms_lookup_buildid() initializes the given @namebuf by
clearing the first and the last byte.  It is not clear why.

The 1st byte makes sense because some callers ignore the return code and
expect that the buffer contains a valid string, for example:

  - function_stat_show()
    - kallsyms_lookup()
      - kallsyms_lookup_buildid()

The initialization of the last byte does not make much sense because it
can later be overwritten.  Fortunately, it seems that all called functions
behave correctly:

  -  kallsyms_expand_symbol() explicitly adds the trailing '\0'
     at the end of the function.

  - All *__address_lookup() functions either use the safe strscpy()
    or they do not touch the buffer at all.

Document the reason for clearing the first byte.  And remove the useless
initialization of the last byte.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Reviewed-by: Aaron Tomlin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkman <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Luis Chamberalin <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: "Masami Hiramatsu (Google)" <[email protected]>
Cc: Petr Pavlu <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Daniel Gomez <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Jan 9, 2026
[ Upstream commit 163e5f2 ]

When using perf record with the `--overwrite` option, a segmentation fault
occurs if an event fails to open. For example:

  perf record -e cycles-ct -F 1000 -a --overwrite
  Error:
  cycles-ct:H: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
  perf: Segmentation fault
      #0 0x6466b6 in dump_stack debug.c:366
      #1 0x646729 in sighandler_dump_stack debug.c:378
      #2 0x453fd1 in sigsegv_handler builtin-record.c:722
      #3 0x7f8454e65090 in __restore_rt libc-2.32.so[54090]
      #4 0x6c5671 in __perf_event__synthesize_id_index synthetic-events.c:1862
      #5 0x6c5ac0 in perf_event__synthesize_id_index synthetic-events.c:1943
      torvalds#6 0x458090 in record__synthesize builtin-record.c:2075
      torvalds#7 0x45a85a in __cmd_record builtin-record.c:2888
      torvalds#8 0x45deb6 in cmd_record builtin-record.c:4374
      torvalds#9 0x4e5e33 in run_builtin perf.c:349
      torvalds#10 0x4e60bf in handle_internal_command perf.c:401
      torvalds#11 0x4e6215 in run_argv perf.c:448
      torvalds#12 0x4e653a in main perf.c:555
      torvalds#13 0x7f8454e4fa72 in __libc_start_main libc-2.32.so[3ea72]
      torvalds#14 0x43a3ee in _start ??:0

The --overwrite option implies --tail-synthesize, which collects non-sample
events reflecting the system status when recording finishes. However, when
evsel opening fails (e.g., unsupported event 'cycles-ct'), session->evlist
is not initialized and remains NULL. The code unconditionally calls
record__synthesize() in the error path, which iterates through the NULL
evlist pointer and causes a segfault.

To fix it, move the record__synthesize() call inside the error check block, so
it's only called when there was no error during recording, ensuring that evlist
is properly initialized.

Fixes: 4ea648a ("perf record: Add --tail-synthesize option")
Signed-off-by: Shuai Xue <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Jan 9, 2026
Add a driver for the Unicam camera receiver block on BCM283x processors.
Compared to the bcm2835-camera driver present in staging, this driver
handles the Unicam block only (CSI-2 receiver), and doesn't depend on
the VC4 firmware running on the VPU.

The commit is made up of a series of changes cherry-picked from the
rpi-5.4.y branch of https://github.com/raspberrypi/linux/ with
additional enhancements, forward-ported to the mainline kernel.

Signed-off-by: Dave Stevenson <[email protected]>
Signed-off-by: Naushir Patuck <[email protected]>
Signed-off-by: Laurent Pinchart <[email protected]>
Reported-by: kbuild test robot <[email protected]>

media: bcm2835-unicam: Add support for get_mbus_config to set num lanes

Use the get_mbus_config pad subdev call to allow a source to use
fewer than the number of CSI2 lanes defined in device tree.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Avoid gcc warning over {0} on endpoint

Older gcc versions object to = { 0 } initialisation if the first
elemtn in the structure is a substructure.

Use = { } to avoid this compiler warning.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Reinstate V4L2_CAP_READWRITE in the caps

v4l2-compliance throws a failure if the device doesn't advertise
V4L2_CAP_READWRITE but allows read or write operations.
We do support read, so reinstate the flag.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Ensure type is VIDEO_CAPTURE in [g|s]_selection

[g|s]_selection pass in a buffer type that needs to be validated
before passing on to the sensor subdev.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835: unicam: Set VPU min clock freq to 250Mhz.

When streaming with Unicam, the VPU must have a clock frequency of at
least 250Mhz.  Otherwise, the input fifos could overrun, causing
image corruption.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Drop WARN on uing direct cache alias

Pi 0&1 pass all ARM accesses through the VPU L2 cache, therefore
the dma-ranges property sets the cache alias bits to other
than the direct alias, hence this WARN was firing.

It was overprotective coding, so assume that everything is OK
with the dma-ranges, and remove the WARN.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Always service interrupts

From when bringing up the driver, there was a check in the isr
to ignore interrupts (claiming them handled) should the driver
not be streaming.

The VPU now will not register a camera driver if it finds a
CSI2 node enabled in device tree, therefore this flawed check is
redundant.

raspberrypi/linux#3602

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835: unicam: Fix uninitialized warning

Signed-off-by: Jacko Dirks <[email protected]>

media: bcm2835-unicam: Fixup review comments from Hans.

Updates the driver based on the upstream review comments from
Hans Verkuil at https://patchwork.linuxtv.org/patch/63531/

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Retain packing information on G_FMT

The change to retrieve the pixel format always on g_fmt didn't
check whether the native or unpacked version of the format
had been requested, and always returned the packed one.
Correct this so that the packing setting is retained whereever
possible.

Fixes "9d59e89 media: bcm2835-unicam: Re-fetch mbus code from subdev
on a g_fmt call"

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: change minimum number of vb2_queue buffers to 1

Since the unicam driver was modified to write to a dummy buffer when no
user-supplied buffer is available, it can now write to and return a
buffer even when there's only a single one. Enable this by changing the
min_buffers_needed in the vb2_queue; it will be useful for enabling
still captures without allocating more memory than absolutely necessary.

Signed-off-by: David Plowman <[email protected]>

staging: vc04_services: ISP: Add a more complex ISP processing component

Driver for the BCM2835 ISP hardware block.  This driver uses the MMAL
component to program the ISP hardware through the VC firmware.

The ISP component can produce two video stream outputs, and Bayer
image statistics. This can't be encompassed in a simple V4L2
M2M device, so create a new device that registers 4 video nodes.

This patch squashes all the development patches from the earlier
rpi-5.4.y branch into one

Signed-off-by: Naushir Patuck <[email protected]>

staging/bcm2835-isp: Add the unpacked (16bpp) raw formats

Now that the firmware supports the unpacked (16bpp) variants
of the MIPI raw formats, add the mappings.

Signed-off-by: Dave Stevenson <[email protected]>

staging/bcm2835-isp: Log the number of excess supported formats

When logging that the firmware has provided more supported formats
than we had allocated storage for, log the number allocated and
returned.

Signed-off-by: Dave Stevenson <[email protected]>

staging: vc04_services: ISP: Add colour denoise control

Add colour denoise control to the bcm2835 driver through a new v4l2
control: V4L2_CID_USER_BCM2835_ISP_CDN.

Add the accompanying MMAL configuration structure definitions as well.

Signed-off-by: Naushir Patuck <[email protected]>

bcm2835-isp: Allow formats with different colour spaces.

Each supported format now includes a mask showing the allowed colour
spaces, as well as a default colour space for when one was not
specified.

Additionally we translate the colour space to mmal format and pass it
over to the VideoCore.

Signed-off-by: David Plowman <[email protected]>

media: i2c: add ov9281 driver.

Change-Id: I7b77250bbc56d2f861450cf77271ad15f9b88ab1
Signed-off-by: Zefa Chen <[email protected]>

media: i2c: ov9281: fix mclk issue when probe multiple camera.

Takes the ov9281 part only from the Rockchip's patch.

Change-Id: I30e833baf2c1bb07d6d87ddb3b00759ab45a90e4
Signed-off-by: Zefa Chen <[email protected]>

media: i2c: ov9281: add enum_frame_interval function for iq tool 2.2 and hal3

Adds the ov9281 parts of the Rockchip patch adding enum_frame_interval to
a large number of drivers.

Change-Id: I03344cd6cf278dd7c18fce8e97479089ef185a5c
Signed-off-by: Zefa Chen <[email protected]>

media: i2c: ov9281: Fixup for recent kernel releases, and remove custom code

The Rockchip driver was based on a 4.4 kernel, and had several custom
Rockchip parts.

Update to 5.4 kernel APIs, with the relevant controls required by
libcamera, and remove custom Rockchip parts.

Signed-off-by: Dave Stevenson <[email protected]>

media: i2c: ov9281: Read chip ID via 2 reads

Vision Components have made an OV9281 module which blocks reading
back the majority of registers to comply with NDAs, and in doing
so doesn't allow auto-increment register reading as used when
reading the chip ID.

Use two reads and manually combine the results.

Signed-off-by: Dave Stevenson <[email protected]>

media: i2c: ov9281: Add support for 8 bit readout

The sensor supports 8 bit mode as well as 10bit, so add the
relevant code to allow selection of this.

Signed-off-by: Dave Stevenson <[email protected]>

media: ov9281: Add 1280x720 and 640x480 modes

Breaks out common register set and adds the different registers
for 1280x720 (cropped) and 640x480 (skipped) modes

Signed-off-by: Dave Stevenson <[email protected]>

Fixed picture line bug in all ov9281 modes

Signed-off-by: Mathias Anhalt <[email protected]>

Added hflip and vflip controls to ov9281

Signed-off-by: Mathias Anhalt <[email protected]>

media: i2c: ov9281: Remove override of subdev name

From the original Rockchip driver, the subdev was renamed
from the default to being "mov9281 <dev_name>" whereas the
default would have been "ov9281 <dev_name>".

Remove the override to drop back to the default rather than
a vendor custom string.

Signed-off-by: Dave Stevenson <[email protected]>

media: v4l2-subdev: add subdev-wide state struct

Signed-off-by: Dom Cobley <[email protected]>

media: i2c: ov9281: Add fwnode properties controls

Add call to v4l2_ctrl_new_fwnode_properties to read and
create the fwnode based controls.

Signed-off-by: Dave Stevenson <[email protected]>

media: i2c: ov9281: Sensor should report RAW color space

Tested on Raspberry Pi running libcamera.

Signed-off-by: David Plowman <[email protected]>

Partial revert "media: i2c: add ov9281 driver."

This partially reverts commit 84e98e3a4f3eecb168ceb80231c3e8252929892e.

The commit had merged some changes to other drivers with adding the ov9281
driver. Only the ov9281 parts have been reverted.

staging/bcm2835-isp: Fix compiler warning

The result of dividing a u32 by a size_t is an unsigned int on arm32
and a long unsigned int on arm64. Use "%zu" (the size_t format) to
remove the build warning for 64-bit builds.

Signed-off-by: Phil Elwell <[email protected]>

staging: vc04_services: isp: Set the YUV420/YVU420 format stride to 64 bytes

The bcm2835 ISP requires the base address of all input/output planes to have 32
byte alignment. Using a Y stride of 32 bytes would not guarantee that the V
plane would fulfil this, e.g. a height of 650 lines would mean the V plane
buffer is not 32 byte aligned for YUV420 formats.

Having a Y stride of 64 bytes would ensure both U and V planes have a 32 byte
alignment, as the luma height will always be an even number of lines.

Signed-off-by: Naushir Patuck <[email protected]>

vc04_services: isp: Report input node as wanting full range RAW color space

RAW color spaces are more usually reported as having full range
quantization.

Tested using libcamera.

Signed-off-by: David Plowman <[email protected]>

drivers: bcm2835_isp: Allow multiple users for the ISP driver.

Add a second (identical) set of device nodes to allow concurrent use of the ISP
hardware by another user. This change effectively creates a second state
structure (struct bcm2835_isp_dev) to maintain independent state for the second
user. Node and media entity names are appened with the instance index
appropriately.

Further users can be added by changing the BCM2835_ISP_NUM_INSTANCES define.

Signed-off-by: Naushir Patuck <[email protected]>

drivers: bcm2835_isp: Fix div by 0 bug.

Fix a possible division by 0 bug when setting up the mmal port for the stats
port.

Signed-off-by: Naushir Patuck <[email protected]>

staging/bcm2835-isp: Fix cleanup after init fail

bcm2835_isp_remove is called on an initialisation failure, but at that
point the drvdata hasn't been set. This causes a crash when e.g. using
the cutdown firmware (gpu_mem=16).

Move platform_set_drvdata before the instance probing loop to avoid the
problem.

See: raspberrypi/linux#4774

Signed-off-by: Phil Elwell <[email protected]>

bcm2835-v4l2-isp: Add missing lock initialization

ISP device allocation is dynamic hence the locks too.
struct mutex queue_lock is not initialized which result in bug.

Fixing same by initializing it.

[   29.847138] INFO: trying to register non-static key.
[   29.847156] The code is fine but needs lockdep annotation, or maybe
[   29.847159] you didn't initialize this object before use?
[   29.847161] turning off the locking correctness validator.
[   29.847167] CPU: 1 PID: 343 Comm: v4l_id Tainted: G         C        5.15.11-rt24-v8+ torvalds#8
[   29.847187] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[   29.847194] Call trace:
[   29.847197]  dump_backtrace+0x0/0x1b8
[   29.847227]  show_stack+0x20/0x30
[   29.847240]  dump_stack_lvl+0x8c/0xb8
[   29.847254]  dump_stack+0x18/0x34
[   29.847263]  register_lock_class+0x494/0x4a0
[   29.847278]  __lock_acquire+0x80/0x1680
[   29.847289]  lock_acquire+0x214/0x3a0
[   29.847300]  mutex_lock_nested+0x70/0xc8
[   29.847312]  _vb2_fop_release+0x3c/0xa8 [videobuf2_v4l2]
[   29.847346]  vb2_fop_release+0x34/0x60 [videobuf2_v4l2]
[   29.847367]  v4l2_release+0xc8/0x108 [videodev]
[   29.847453]  __fput+0x8c/0x258
[   29.847476]  ____fput+0x18/0x28
[   29.847487]  task_work_run+0x98/0x180
[   29.847502]  do_notify_resume+0x228/0x3f8
[   29.847515]  el0_svc+0xec/0xf0
[   29.847523]  el0t_64_sync_handler+0x90/0xb8
[   29.847531]  el0t_64_sync+0x180/0x184

Signed-off-by: Padmanabha Srinivasaiah <[email protected]>

staging: vc04_services: isp: Permit all sRGB colour spaces on ISP outputs

ISP outputs actually support all colour spaces that are fundamentally
sRGB underneath, regardless of whether an RGB or YUV output format is
actually requested.

Signed-off-by: David Plowman <[email protected]>

drivers: staging: bcm2835-isp: Do not cleanup mmal vcsm buffer on stop_streaming

On stop_streaming() the vcsm buffer handle gets released by the buffer cleanup
code.  This will subsequently cause and error if userland re-queues the same
buffer on the next start_streaming() call.

Remove this cleanup code and rely on the vb2_ops->buf_cleanup() call to do the
cleanups instead.

Signed-off-by: Naushir Patuck <[email protected]>

drivers: staging: bcm2835-isp: Clear LS table handle in the firmware

When all nodes have stopped streaming, ensure the firmware has released its
handle on the LS table dmabuf. This is done by passing a null handle in the
LS params.

Signed-off-by: Naushir Patuck <[email protected]>

drivers: staging: bcm2835-isp: Respect caller's stride value

The stride value reported for output image buffers should be at least
as large as any value that was passed in by the caller (subject to
correct alignment for the pixel format). If the value is zero (meaning
no value was passed), or is too small, the minimum acceptable value
will be substituted.

Signed-off-by: David Plowman <[email protected]>

staging: vc04_services: bcm2835-isp: Drop include Makefile directive

Drop the include directive. They can break the build, when one only
wants to build a subdirectory. Replace with "../" for the includes in
the bcm2835-isp instead.

The fix is equivalent to the four patches between 29d49a7
("staging: vc04_services: bcm2835-audio: Drop include Makefile
directive")...2529ca2 ("staging: vc04_services: interface: Drop
include Makefile directive")

Fixes: c8f89c9551c1 ("staging: vc04_services: ISP: Add a more complex ISP processing component")
Suggested-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Kieran Bingham <[email protected]>

staging: vc04_services: bcm2835-v4l2-isp: Register with vchiq_bus_type

Register the bcm2835-v4l2-isp driver with the vchiq_bus_type instead of
using the platform driver/device.

Signed-off-by: Kieran Bingham <[email protected]>

staging: vc04_services: bcm2835-v4l2-isp: Explicitly set DMA mask

The platform model originally handled the DMA mask. Now that
we are on the vchiq_bus we need to explicitly set this.

Signed-off-by: Kieran Bingham <[email protected]>

drivers: media: bcm2835_isp: Cache LS table dmabuf

Clients such as libcamera do not change the LS table dmabuf on every
frame. In such cases instead of mapping/remapping the same dmabuf on
every frame to send to the firmware, cache the dmabuf once and only
update and remap if the dmabuf has been changed by the userland client.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Correctly handle error propagation for stream on

On a failure in start_streaming(), the error code would not propagate to
the calling function on all conditions. This would cause the userland
caller to not know of the failure.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Return early from stop_streaming() if stopped

clk_disable_unprepare() is called unconditionally in stop_streaming().
This is incorrect in the cases where start_streaming() fails, and
unprepares all clocks as part of the failure cleanup. To avoid this,
ensure that clk_disable_unprepare() is only called in stop_streaming()
if the clocks are in a prepared state.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Clear clock state when stopping streaming

Commit 65e08c465020d4c5b51afb452efc2246d80fd66f failed to clear the
clock state when the device stopped streaming. Fix this, as it might
again cause the same problems when doing an unprepare.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Fix bug in buffer swapping logic

If multiple sets of interrupts occur simultaneously, it may be unsafe
to swap buffers, as the hardware may already be re-using the current
buffers. In such cases, avoid swapping buffers, and wait for the next
opportunity at the Frame End interrupt to signal completion.

Additionally, check the packet compare status when watching for frame
end for buffers swaps, as this could also signify a frame end event.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Forward input status from subdevice

The vidioc_enum_input() v4l2 ioctl is capable of returning
sensor/input status as well. This is used in current
GStreamer HEAD for signal detection [1].

bcm2835-unicam does handle this syscall, but it didn't ask
the subdevice driver about the input status. The input then
appeared as always present.

This commit adds the necessary query. There is a precedent for
this - the R-Car VIN V4L2 driver does a similar call [2].

[1]: https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/-/blob/ce0be27caf69aa9d96b73bc2b50737451b6f6936/sys/v4l2/gstv4l2src.c#L553
[2]: https://github.com/raspberrypi/linux/blob/7fb9d006d3ff3baf2e205e0c85c4e4fd0a64fcd0/drivers/media/platform/rcar-vin/rcar-v4l2.c#L548

Signed-off-by: Jakub Vaněk <[email protected]>

media/bcm2835-unicam: Parse pad numbers correctly

The driver was making big assumptions about the source device
using pad 0 and 1, which doesn't follow for more complex
devices where Unicam's source device may be a sink device for
something else.

Read the pad numbers through media controller, and reference
them appropriately.

Signed-off-by: Dave Stevenson <[email protected]>

media/bcm2835-unicam: Add support for configuration via MC API

Adds Media Controller API support for more complex pipelines.
libcamera is about to switch to using this mechanism for configuring
sensors.

This can be enabled by either a module parameter, or device tree.

Various functions have been moved to group video-centric and
mc-centric functions together.

Based on a similar conversion done to ti-vpe.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Fixup for 5.18 and new get_mbus_config struct

The number of active CSI2 data lanes has moved within the struct
v4l2_mbus_config used by the get_mbus_config API call.
Update the driver to match the changes in mainline.

Signed-off-by: Dave Stevenson <[email protected]>

drivers: bcm2835_unicam: Add logging message when a frame is dropped.

If a dummy buffer is still active on a frame start, it indicates that this frame
will be dropped. The explicit logging helps users identify performance issues.

Signed-off-by: Naushir Patuck <[email protected]>

drivers: bcm2835_unicam: Disable trigger mode operation

On a Pi3 B/B+ platform the imx219 sensor frequently generates a single corrupt
frame when the sensor first starts. This can either be a missing line, or
invalid samples within the line. This only occurrs using the Unicam kernel
driver.

Disabling trigger mode elimiates this corruption. Since trigger mode is a
legacy feature copied from the firmware driver and not expected to be needed,
remove it. Tested on the Raspberry Pi cameras and shows no ill effects.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Set ret on error path in unicam_async_complete()

Clang warns:

  drivers/media/platform/bcm2835/bcm2835-unicam.c:3109:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
          if (!source_pads) {
              ^~~~~~~~~~~~
  drivers/media/platform/bcm2835/bcm2835-unicam.c:3152:9: note: uninitialized use occurs here
          return ret;
                 ^~~
  drivers/media/platform/bcm2835/bcm2835-unicam.c:3109:2: note: remove the 'if' if its condition is always false
          if (!source_pads) {
          ^~~~~~~~~~~~~~~~~~~
  drivers/media/platform/bcm2835/bcm2835-unicam.c:3091:9: note: initialize the variable 'ret' to silence this warning
          int ret;
                 ^
                  = 0
  1 warning generated.

When the if condition is true, ret will be used uninitialized, which
could result in undesirable behavior. Set ret to -ENODEV on the error
path, which is a standard error code for the ->complete() callback.

Fixes: d056e86eb35f ("media/bcm2835-unicam: Parse pad numbers correctly")
Signed-off-by: Nathan Chancellor <[email protected]>

media: bcm2835-unicam: Handle a repeated frame start with no end

In the case of 2 frame starts being received with no frame end
between, the queued buffer held in next_frm was lost as the
pointer was overwritten with the dummy buffer.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Correctly handle FS + FE ISR condtion

If we get a simultaneous FS + FE interrupt for the same frame, it cannot be
marked as completed and returned to userland as the framebuffer will be refilled
by Unicam on the next sensor frame. Additionally, the timestamp will be set to 0
as the FS interrupt handling code will not have run yet.

To avoid these problems, the frame is considered dropped in the FE handler,
and will be returned to userland on the subsequent sensor frame.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Fix for possible dummy buffer overrun

The Unicam hardware has been observed to cause a buffer overrun when using the
dummy buffer as a circular buffer. The conditions that cause the overrun are not
fully known, but it seems to occur when the memory bus is heavily loaded.

To avoid the overrun, program the hardware with a buffer size of 0 when using
the dummy buffer. This will cause overrun into the allocated dummy buffer, but
avoid out of bounds writes.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Fix up start/stop api change

Signed-off-by: Dom Cobley <[email protected]>

media: bcm2835-unicam: Use mipi-csi2.h header for data type values

The MIPI CSI2 data type ID values are now defined in the
mipi-csi2.h header, so use those defines instead of hard
coding them in the driver.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Add support for RAW16 formats

With the RAW16 formats now having a defined CSI2 data type ID,
they can be added to the driver.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Start and stop media_pipeline with same node

media_pipeline_start and media_pipeline_stop now validate that
the pipeline is being started and stopped with the same pipe
and pad handles.
When running with embedded metadata (eg imx477 and imx708), the
start typically happens from the metadata pad, whilst stop is
always from the image pad.

Always pass the image pad to media_pipeline_start to ensure
that the calls are balanced.

Signed-off-by: Dave Stevenson <[email protected]>

drivers: media: bcm2835_unicam: Improve frame sequence count handling

Ensure that the frame sequence counter is incremented only if a previous
frame start interrupt has occurred, or a frame start + frame end has
occurred simultaneously.

This corresponds the sequence number with the actual number of frames
produced by the sensor, not the number of frame buffers dequeued back
to userland.

Signed-off-by: Naushir Patuck <[email protected]>

bcm2835-unicam: hacks to allow it to build

media: bcm2835-unicam: Fix up async notifier usage

Fixes "8a090fc3e549 bcm2835-unicam: hacks to allow it to build"

media: bcm2835-unicam: Add option for a GPIO to reflect FS/FE timing

The legacy stack had an option to have a GPIO track frame start and
end events to give basic synchronisation to the incoming image stream.
https://forums.raspberrypi.com/viewtopic.php?t=190314

Replicate this in the kernel Unicam driver.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Add support for 12bit mono packed format

Now that V4L2_PIX_FMT_Y12P is defined, allow passing raw 12bit
mono packed data through the peripheral.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Add support for 14bit mono sources

Now that V4L2_PIX_FMT_Y14 and V4L2_PIX_FMT_Y14P are defined,
allow passing 14bit mono data through the peripheral.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Add support for unpacked 14bit Bayer formats

Now that the 14bit non-packed Bayer formats are defined, add them
into the supported formats lookup table.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Reinstate old downstream driver as legacy

Whilst the Unicam driver has now been upstreamed it only supports
configuration via Media Controller (not driven from the /dev/videoN
node), which makes life significantly harder for simple devices such
as mono sensors, and HDMI or analogue video to CSI2 bridge chips
(eg TC358743 and ADV7282M).

Fix up the downstream driver so that it builds, reinstate the links
from Kconfig and Makefile to it, and give it a new Kconfig name
(VIDEO_BCM2835_UNICAM_LEGACY).

Signed-off-by: Dave Stevenson <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Jan 9, 2026
Driver for the BCM2835 ISP hardware block.  This driver uses the MMAL
component to program the ISP hardware through the VC firmware.

The ISP component can produce two video stream outputs, and Bayer
image statistics. This can't be encompassed in a simple V4L2
M2M device, so create a new device that registers 4 video nodes.

This patch squashes all the development patches from the earlier
rpi-5.4.y branch into one

Signed-off-by: Naushir Patuck <[email protected]>

staging/bcm2835-isp: Add the unpacked (16bpp) raw formats

Now that the firmware supports the unpacked (16bpp) variants
of the MIPI raw formats, add the mappings.

Signed-off-by: Dave Stevenson <[email protected]>

staging/bcm2835-isp: Log the number of excess supported formats

When logging that the firmware has provided more supported formats
than we had allocated storage for, log the number allocated and
returned.

Signed-off-by: Dave Stevenson <[email protected]>

staging: vc04_services: ISP: Add colour denoise control

Add colour denoise control to the bcm2835 driver through a new v4l2
control: V4L2_CID_USER_BCM2835_ISP_CDN.

Add the accompanying MMAL configuration structure definitions as well.

Signed-off-by: Naushir Patuck <[email protected]>

bcm2835-isp: Allow formats with different colour spaces.

Each supported format now includes a mask showing the allowed colour
spaces, as well as a default colour space for when one was not
specified.

Additionally we translate the colour space to mmal format and pass it
over to the VideoCore.

Signed-off-by: David Plowman <[email protected]>

media: i2c: add ov9281 driver.

Change-Id: I7b77250bbc56d2f861450cf77271ad15f9b88ab1
Signed-off-by: Zefa Chen <[email protected]>

media: i2c: ov9281: fix mclk issue when probe multiple camera.

Takes the ov9281 part only from the Rockchip's patch.

Change-Id: I30e833baf2c1bb07d6d87ddb3b00759ab45a90e4
Signed-off-by: Zefa Chen <[email protected]>

media: i2c: ov9281: add enum_frame_interval function for iq tool 2.2 and hal3

Adds the ov9281 parts of the Rockchip patch adding enum_frame_interval to
a large number of drivers.

Change-Id: I03344cd6cf278dd7c18fce8e97479089ef185a5c
Signed-off-by: Zefa Chen <[email protected]>

media: i2c: ov9281: Fixup for recent kernel releases, and remove custom code

The Rockchip driver was based on a 4.4 kernel, and had several custom
Rockchip parts.

Update to 5.4 kernel APIs, with the relevant controls required by
libcamera, and remove custom Rockchip parts.

Signed-off-by: Dave Stevenson <[email protected]>

media: i2c: ov9281: Read chip ID via 2 reads

Vision Components have made an OV9281 module which blocks reading
back the majority of registers to comply with NDAs, and in doing
so doesn't allow auto-increment register reading as used when
reading the chip ID.

Use two reads and manually combine the results.

Signed-off-by: Dave Stevenson <[email protected]>

media: i2c: ov9281: Add support for 8 bit readout

The sensor supports 8 bit mode as well as 10bit, so add the
relevant code to allow selection of this.

Signed-off-by: Dave Stevenson <[email protected]>

media: ov9281: Add 1280x720 and 640x480 modes

Breaks out common register set and adds the different registers
for 1280x720 (cropped) and 640x480 (skipped) modes

Signed-off-by: Dave Stevenson <[email protected]>

Fixed picture line bug in all ov9281 modes

Signed-off-by: Mathias Anhalt <[email protected]>

Added hflip and vflip controls to ov9281

Signed-off-by: Mathias Anhalt <[email protected]>

media: i2c: ov9281: Remove override of subdev name

From the original Rockchip driver, the subdev was renamed
from the default to being "mov9281 <dev_name>" whereas the
default would have been "ov9281 <dev_name>".

Remove the override to drop back to the default rather than
a vendor custom string.

Signed-off-by: Dave Stevenson <[email protected]>

media: v4l2-subdev: add subdev-wide state struct

Signed-off-by: Dom Cobley <[email protected]>

media: i2c: ov9281: Add fwnode properties controls

Add call to v4l2_ctrl_new_fwnode_properties to read and
create the fwnode based controls.

Signed-off-by: Dave Stevenson <[email protected]>

media: i2c: ov9281: Sensor should report RAW color space

Tested on Raspberry Pi running libcamera.

Signed-off-by: David Plowman <[email protected]>

Partial revert "media: i2c: add ov9281 driver."

This partially reverts commit 84e98e3a4f3eecb168ceb80231c3e8252929892e.

The commit had merged some changes to other drivers with adding the ov9281
driver. Only the ov9281 parts have been reverted.

staging/bcm2835-isp: Fix compiler warning

The result of dividing a u32 by a size_t is an unsigned int on arm32
and a long unsigned int on arm64. Use "%zu" (the size_t format) to
remove the build warning for 64-bit builds.

Signed-off-by: Phil Elwell <[email protected]>

staging: vc04_services: isp: Set the YUV420/YVU420 format stride to 64 bytes

The bcm2835 ISP requires the base address of all input/output planes to have 32
byte alignment. Using a Y stride of 32 bytes would not guarantee that the V
plane would fulfil this, e.g. a height of 650 lines would mean the V plane
buffer is not 32 byte aligned for YUV420 formats.

Having a Y stride of 64 bytes would ensure both U and V planes have a 32 byte
alignment, as the luma height will always be an even number of lines.

Signed-off-by: Naushir Patuck <[email protected]>

vc04_services: isp: Report input node as wanting full range RAW color space

RAW color spaces are more usually reported as having full range
quantization.

Tested using libcamera.

Signed-off-by: David Plowman <[email protected]>

drivers: bcm2835_isp: Allow multiple users for the ISP driver.

Add a second (identical) set of device nodes to allow concurrent use of the ISP
hardware by another user. This change effectively creates a second state
structure (struct bcm2835_isp_dev) to maintain independent state for the second
user. Node and media entity names are appened with the instance index
appropriately.

Further users can be added by changing the BCM2835_ISP_NUM_INSTANCES define.

Signed-off-by: Naushir Patuck <[email protected]>

drivers: bcm2835_isp: Fix div by 0 bug.

Fix a possible division by 0 bug when setting up the mmal port for the stats
port.

Signed-off-by: Naushir Patuck <[email protected]>

staging/bcm2835-isp: Fix cleanup after init fail

bcm2835_isp_remove is called on an initialisation failure, but at that
point the drvdata hasn't been set. This causes a crash when e.g. using
the cutdown firmware (gpu_mem=16).

Move platform_set_drvdata before the instance probing loop to avoid the
problem.

See: raspberrypi/linux#4774

Signed-off-by: Phil Elwell <[email protected]>

bcm2835-v4l2-isp: Add missing lock initialization

ISP device allocation is dynamic hence the locks too.
struct mutex queue_lock is not initialized which result in bug.

Fixing same by initializing it.

[   29.847138] INFO: trying to register non-static key.
[   29.847156] The code is fine but needs lockdep annotation, or maybe
[   29.847159] you didn't initialize this object before use?
[   29.847161] turning off the locking correctness validator.
[   29.847167] CPU: 1 PID: 343 Comm: v4l_id Tainted: G         C        5.15.11-rt24-v8+ torvalds#8
[   29.847187] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[   29.847194] Call trace:
[   29.847197]  dump_backtrace+0x0/0x1b8
[   29.847227]  show_stack+0x20/0x30
[   29.847240]  dump_stack_lvl+0x8c/0xb8
[   29.847254]  dump_stack+0x18/0x34
[   29.847263]  register_lock_class+0x494/0x4a0
[   29.847278]  __lock_acquire+0x80/0x1680
[   29.847289]  lock_acquire+0x214/0x3a0
[   29.847300]  mutex_lock_nested+0x70/0xc8
[   29.847312]  _vb2_fop_release+0x3c/0xa8 [videobuf2_v4l2]
[   29.847346]  vb2_fop_release+0x34/0x60 [videobuf2_v4l2]
[   29.847367]  v4l2_release+0xc8/0x108 [videodev]
[   29.847453]  __fput+0x8c/0x258
[   29.847476]  ____fput+0x18/0x28
[   29.847487]  task_work_run+0x98/0x180
[   29.847502]  do_notify_resume+0x228/0x3f8
[   29.847515]  el0_svc+0xec/0xf0
[   29.847523]  el0t_64_sync_handler+0x90/0xb8
[   29.847531]  el0t_64_sync+0x180/0x184

Signed-off-by: Padmanabha Srinivasaiah <[email protected]>

staging: vc04_services: isp: Permit all sRGB colour spaces on ISP outputs

ISP outputs actually support all colour spaces that are fundamentally
sRGB underneath, regardless of whether an RGB or YUV output format is
actually requested.

Signed-off-by: David Plowman <[email protected]>

drivers: staging: bcm2835-isp: Do not cleanup mmal vcsm buffer on stop_streaming

On stop_streaming() the vcsm buffer handle gets released by the buffer cleanup
code.  This will subsequently cause and error if userland re-queues the same
buffer on the next start_streaming() call.

Remove this cleanup code and rely on the vb2_ops->buf_cleanup() call to do the
cleanups instead.

Signed-off-by: Naushir Patuck <[email protected]>

drivers: staging: bcm2835-isp: Clear LS table handle in the firmware

When all nodes have stopped streaming, ensure the firmware has released its
handle on the LS table dmabuf. This is done by passing a null handle in the
LS params.

Signed-off-by: Naushir Patuck <[email protected]>

drivers: staging: bcm2835-isp: Respect caller's stride value

The stride value reported for output image buffers should be at least
as large as any value that was passed in by the caller (subject to
correct alignment for the pixel format). If the value is zero (meaning
no value was passed), or is too small, the minimum acceptable value
will be substituted.

Signed-off-by: David Plowman <[email protected]>

staging: vc04_services: bcm2835-isp: Drop include Makefile directive

Drop the include directive. They can break the build, when one only
wants to build a subdirectory. Replace with "../" for the includes in
the bcm2835-isp instead.

The fix is equivalent to the four patches between 29d49a7
("staging: vc04_services: bcm2835-audio: Drop include Makefile
directive")...2529ca2 ("staging: vc04_services: interface: Drop
include Makefile directive")

Fixes: c8f89c9551c1 ("staging: vc04_services: ISP: Add a more complex ISP processing component")
Suggested-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Kieran Bingham <[email protected]>

staging: vc04_services: bcm2835-v4l2-isp: Register with vchiq_bus_type

Register the bcm2835-v4l2-isp driver with the vchiq_bus_type instead of
using the platform driver/device.

Signed-off-by: Kieran Bingham <[email protected]>

staging: vc04_services: bcm2835-v4l2-isp: Explicitly set DMA mask

The platform model originally handled the DMA mask. Now that
we are on the vchiq_bus we need to explicitly set this.

Signed-off-by: Kieran Bingham <[email protected]>

drivers: media: bcm2835_isp: Cache LS table dmabuf

Clients such as libcamera do not change the LS table dmabuf on every
frame. In such cases instead of mapping/remapping the same dmabuf on
every frame to send to the firmware, cache the dmabuf once and only
update and remap if the dmabuf has been changed by the userland client.

Signed-off-by: Naushir Patuck <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Jan 9, 2026
When run on a kernel without BTF info, I get:

    libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled?
    libbpf: failed to find valid kernel BTF

    Program received signal SIGSEGV, Segmentation fault.
    0x00005555556915b7 in btf.type_cnt ()
    (gdb) bt
    #0  0x00005555556915b7 in btf.type_cnt ()
    #1  0x0000555555691fbc in btf_find_by_name_kind ()
    #2  0x00005555556920d0 in btf.find_by_name_kind ()
    #3  0x00005555558a1b7c in init_numa_data (con=0x7fffffffd0a0) at util/bpf_lock_contention.c:125
    #4  0x00005555558a264b in lock_contention_prepare (con=0x7fffffffd0a0) at util/bpf_lock_contention.c:313
    #5  0x0000555555620702 in __cmd_contention (argc=0, argv=0x7fffffffea10) at builtin-lock.c:2084
    torvalds#6  0x0000555555622c8d in cmd_lock (argc=0, argv=0x7fffffffea10) at builtin-lock.c:2755
    torvalds#7  0x0000555555651451 in run_builtin (p=0x555556104f00 <commands+576>, argc=3, argv=0x7fffffffea10)
        at perf.c:349
    torvalds#8  0x00005555556516ed in handle_internal_command (argc=3, argv=0x7fffffffea10) at perf.c:401
    torvalds#9  0x000055555565184e in run_argv (argcp=0x7fffffffe7fc, argv=0x7fffffffe7f0) at perf.c:445
    torvalds#10 0x0000555555651b9f in main (argc=3, argv=0x7fffffffea10) at perf.c:553

If we really are running -b without BTF info, the error is fatal, so let's
propagate it and exit accordingly.

Signed-off-by: Tycho Andersen (AMD) <[email protected]>
CC: Ravi Bangoria <[email protected]>
CC: K Prateek Nayak <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants