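Problem description
Symptom: the filesystem intermittently becomes read-only; the command df -h / stops working, and mount shows the root partition mounted as ro.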
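The symptom can also be confirmed without relying on df or mount. The following is a minimal sketch that reads the mount table directly; the function name root_mount_mode and its optional file argument are illustrative assumptions, not part of the original write-up.

```shell
#!/bin/sh
# Print "ro" or "rw" for the root filesystem, based on a mounts table.
# The optional argument (default: /proc/mounts) exists only so the
# function can be exercised against a sample file.
root_mount_mode() {
    mounts_file="${1:-/proc/mounts}"
    # Fields: device mountpoint fstype options dump pass.
    # The last entry for "/" wins, because later mounts shadow earlier ones.
    awk '$2 == "/" { opts = $4 }
         END {
             if (opts == "") exit 1
             n = split(opts, a, ",")
             mode = "rw"
             for (i = 1; i <= n; i++) if (a[i] == "ro") mode = "ro"
             print mode
         }' "$mounts_file"
}
```

Calling root_mount_mode with no argument inspects the running system. Only the exact option ro is matched, so an option such as errors=remount-ro does not count as read-only.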
Error log:
Jul 11 14:00:41 master3 kernel: [72689.827120][40] ata1.00: exception Emask 0x0 SAct 0x300000 SErr 0x0 action 0x6 frozen
Jul 11 14:00:41 master3 kernel: [72689.827129][40] ata1.00: failed command: WRITE FPDMA QUEUED
Jul 11 14:00:41 master3 kernel: [72689.827136][40] ata1.00: cmd 61/10:a0:b0:48:17/00:00:48:00:00/40 tag 20 ncq 8192 out
Jul 11 14:00:41 master3 kernel: [72689.827136][40] res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 11 14:00:41 master3 kernel: [72689.827140][40] ata1.00: status: { DRDY }
Jul 11 14:00:41 master3 kernel: [72689.827143][40] ata1.00: failed command: WRITE FPDMA QUEUED
Jul 11 14:00:41 master3 kernel: [72689.827148][40] ata1.00: cmd 61/08:a8:40:02:55/00:00:6d:00:00/40 tag 21 ncq 4096 out
Jul 11 14:00:41 master3 kernel: [72689.827148][40] res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 11 14:00:41 master3 kernel: [72689.827151][40] ata1.00: status: { DRDY }
Jul 11 14:00:41 master3 kernel: [72689.827156][40] ata1: hard resetting link
Jul 11 14:00:41 master3 kernel: [72690.147117][40] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jul 11 14:00:46 master3 kernel: [72695.147106][40] ata1.00: qc timeout (cmd 0xec)
Jul 11 14:00:46 master3 kernel: [72695.147120][40] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jul 11 14:00:46 master3 kernel: [72695.147123][40] ata1.00: revalidation failed (errno=-5)
Jul 11 14:00:46 master3 kernel: [72695.147131][40] ata1: hard resetting link
Jul 11 14:00:47 master3 kernel: [72695.463118][26] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jul 11 14:00:57 master3 kernel: [72705.463103][26] ata1.00: qc timeout (cmd 0xec)
Jul 11 14:00:57 master3 kernel: [72705.463117][26] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jul 11 14:00:57 master3 kernel: [72705.463121][26] ata1.00: revalidation failed (errno=-5)
Jul 11 14:00:57 master3 kernel: [72705.463127][26] ata1: limiting SATA link speed to 3.0 Gbps
Jul 11 14:00:57 master3 kernel: [72705.463133][26] ata1: hard resetting link
Jul 11 14:00:57 master3 kernel: [72705.783117][26] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Jul 11 14:01:07 master3 kernel: [72715.947084][26] INFO: rcu_sched self-detected stall on CPU
Jul 11 14:01:07 master3 kernel: [72715.947093][26] 1-...: (14999 ticks this GP) idle=e59/140000000000001/0 softirq=3079900/3079900 fqs=14755
Jul 11 14:01:07 master3 kernel: [72715.947096][26] (t=15000 jiffies g=3402378 c=3402377 q=148748)
Jul 11 14:01:07 master3 kernel: [72715.947105][26] Task dump for CPU 1:
Jul 11 14:01:07 master3 kernel: [72715.947108][26] x.x.12.3-man R running task 0 35157 2 0x00000002
Jul 11 14:01:07 master3 kernel: [72715.947113][26] Call trace:
Jul 11 14:01:07 master3 kernel: [72715.947121][26] [<ffff800000089dc8>] dump_backtrace+0x0/0x190
Jul 11 14:01:07 master3 kernel: [72715.947124][26] [<ffff800000089f7c>] show_stack+0x24/0x30
Jul 11 14:01:07 master3 kernel: [72715.947129][26] [<ffff8000000f7418>] sched_show_task+0xa0/0x100
Jul 11 14:01:07 master3 kernel: [72715.947132][26] [<ffff8000000f9f80>] dump_cpu_task+0x48/0x58
Jul 11 14:01:07 master3 kernel: [72715.947135][26] [<ffff80000012b428>] rcu_dump_cpu_stacks+0xa8/0xf8
Jul 11 14:01:07 master3 kernel: [72715.947138][26] [<ffff80000012f54c>] rcu_check_callbacks+0x56c/0x850
Jul 11 14:01:07 master3 kernel: [72715.947142][26] [<ffff80000013529c>] update_process_times+0x44/0x70
Jul 11 14:01:07 master3 kernel: [72715.947145][26] [<ffff8000001461d0>] tick_sched_handle.isra.6+0x38/0x78
Jul 11 14:01:07 master3 kernel: [72715.947147][26] [<ffff80000014625c>] tick_sched_timer+0x4c/0x98
Jul 11 14:01:07 master3 kernel: [72715.947150][26] [<ffff800000135bfc>] __hrtimer_run_queues+0xcc/0x2b8
Jul 11 14:01:07 master3 kernel: [72715.947154][26] [<ffff8000001364d8>] hrtimer_interrupt+0xa0/0x1d0
Jul 11 14:01:07 master3 kernel: [72715.947158][26] [<ffff80000080dba4>] arch_timer_handler_phys+0x3c/0x50
Jul 11 14:01:07 master3 kernel: [72715.947161][26] [<ffff800000122e1c>] handle_percpu_devid_irq+0x84/0x178
Jul 11 14:01:07 master3 kernel: [72715.947166][26] [<ffff80000011de7c>] generic_handle_irq+0x34/0x50
Jul 11 14:01:07 master3 kernel: [72715.947168][26] [<ffff80000011e1e8>] __handle_domain_irq+0x68/0xc0
Jul 11 14:01:07 master3 kernel: [72715.947171][26] [<ffff800000081df4>] gic_handle_irq+0xc4/0x170
Jul 11 14:01:07 master3 kernel: [72715.947174][26] Exception stack(0xffff868345783c30 to 0xffff868345783d50)
Jul 11 14:01:07 master3 kernel: [72715.947176][26] 3c20: ffff8080c9c94200 ffff80834ab48000
Jul 11 14:01:07 master3 kernel: [72715.947179][26] 3c40: ffff868345783d80 ffff8000004522a8 0000000080000145 ffff848343514000
Jul 11 14:01:07 master3 kernel: [72715.947182][26] 3c60: 0000000000000000 ffff8080c9c94248 0000000000000001 0000000000000002
Jul 11 14:01:07 master3 kernel: [72715.947184][26] 3c80: 0000000000000004 0000000000000000 0000000000000000 e393000000000376
Jul 11 14:01:07 master3 kernel: [72715.947186][26] 3ca0: 788e184cae09745b 0000000000000000 0000000000000940 0000000000000002
Jul 11 14:01:07 master3 kernel: [72715.947189][26] 3cc0: 0000000000000002 00000000000fd84d 00000000000c8000 000047d998000000
Jul 11 14:01:07 master3 kernel: [72715.947191][26] 3ce0: ffff80000014acc8 000000400450bf48 0000000000000000 ffff8080c9c94200
Jul 11 14:01:07 master3 kernel: [72715.947194][26] 3d00: ffff80834ab48000 ffff80834ab48698 ffff8080c9c94248 ffff8483435140c8
Jul 11 14:01:07 master3 kernel: [72715.947196][26] 3d20: ffff848343514000 0000000000000000 0000000000000000 0000000000000000
Jul 11 14:01:07 master3 kernel: [72715.947198][26] 3d40: 0000000000000000 ffff868345783d80
Jul 11 14:01:07 master3 kernel: [72715.947201][26] [<ffff800000084da8>] el1_irq+0x68/0xc0
Jul 11 14:01:07 master3 kernel: [72715.947241][26] [<ffff7ffffd45ed54>] nfs4_state_manager+0x414/0x7c8 [nfsv4]
Jul 11 14:01:07 master3 kernel: [72715.947276][26] [<ffff7ffffd45f134>] nfs4_run_state_manager+0x2c/0x48 [nfsv4]
Jul 11 14:01:07 master3 kernel: [72715.947280][26] [<ffff8000000e7eb8>] kthread+0xe8/0x100
Jul 11 14:01:07 master3 kernel: [72715.947282][26] [<ffff800000085420>] ret_from_fork+0x10/0x30
Jul 11 14:01:27 master3 kernel: [72735.783103][26] ata1.00: qc timeout (cmd 0xec)
Jul 11 14:01:27 master3 kernel: [72735.783117][26] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jul 11 14:01:27 master3 kernel: [72735.783120][26] ata1.00: revalidation failed (errno=-5)
Jul 11 14:01:27 master3 kernel: [72735.783125][26] ata1.00: disabled
Jul 11 14:01:27 master3 kernel: [72735.783144][26] ata1: hard resetting link
Jul 11 14:01:27 master3 kernel: [72736.103110][26] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Jul 11 14:01:27 master3 kernel: [72736.103132][26] ata1: EH complete
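To see how often this pattern occurs on a machine, the failed NCQ commands can be counted in a saved kernel log. This is only a sketch: the function name count_ncq_failures is hypothetical, and the log path is whatever file the kernel messages were saved to.

```shell
#!/bin/sh
# Count failed NCQ commands (READ/WRITE FPDMA QUEUED) in a kernel log file.
# Usage: count_ncq_failures <logfile>
count_ncq_failures() {
    # -o prints every match on its own line, so several matches on one
    # physical line are each counted; tr strips the padding some wc
    # implementations add.
    grep -oE 'failed command: (READ|WRITE) FPDMA QUEUED' "$1" | wc -l | tr -d ' '
}
```

A count that keeps growing over time would be consistent with the NCQ timeouts shown in the log above.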
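Root cause analysis:
The problem is caused by the disk running with NCQ (Native Command Queuing) enabled. The commands that time out in the log are NCQ writes (WRITE FPDMA QUEUED).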
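Whether NCQ is currently in use can be read from the device's queue depth in sysfs: a depth greater than 1 means commands are being queued. The sketch below assumes the affected disk is sda, as in the solution that follows; the function name ncq_state and its optional file argument are illustrative only.

```shell
#!/bin/sh
# Report whether NCQ is effectively enabled, judged by the queue depth.
# The optional argument (default: sda's queue_depth) allows checking
# another disk or a sample file.
ncq_state() {
    depth_file="${1:-/sys/block/sda/device/queue_depth}"
    depth=$(cat "$depth_file") || return 1
    if [ "$depth" -gt 1 ]; then
        echo "enabled (queue_depth=$depth)"
    else
        echo "disabled (queue_depth=$depth)"
    fi
}
```

Running ncq_state before and after applying the fix shows whether the change took effect.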
Solution:
# Limit the queue depth to 1, which effectively disables NCQ for sda (run as root; the setting is lost on reboot)
echo 1 > /sys/block/sda/device/queue_depth
# Turn off SATA link power management on every host (a glob cannot be used directly as a redirect target when it matches several files)
for policy in /sys/class/scsi_host/host*/link_power_management_policy; do echo max_performance > "$policy"; done
Alternatively, try adding libata.force=noncq,noncqtrim to the kernel command line in GRUB, which disables NCQ and queued TRIM at boot.
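After rebooting with the new kernel command line, it is worth checking that the parameter was actually picked up. The following is a rough sketch; the function name cmdline_has_noncq is hypothetical, and how the parameter is added to GRUB (for example via GRUB_CMDLINE_LINUX in /etc/default/grub) varies by distribution and is assumed here, not taken from the original post.

```shell
#!/bin/sh
# Succeed (exit 0) if the kernel command line carries a libata.force
# setting that includes noncq. The optional argument (default:
# /proc/cmdline) allows checking a sample file.
cmdline_has_noncq() {
    cmdline_file="${1:-/proc/cmdline}"
    # One parameter per line, keep libata.force=..., split its comma
    # separated values, then look for "noncq" either alone or after a
    # port prefix such as "1.00:".
    tr ' ' '\n' < "$cmdline_file" \
        | grep '^libata\.force=' \
        | sed 's/^libata\.force=//' \
        | tr ',' '\n' \
        | grep -qE '(^|:)noncq$'
}
```

For example, cmdline_has_noncq && echo "NCQ disabled at boot" prints the message only when the parameter is present.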
Original article: https://blog.csdn.net/mintafang5881/article/details/141497259