My question is about a CVM that won't boot. This is on a semi-retired production cluster (not CE) with no workloads running.

I found the console output in /tmp/ntnx.serial.out.0. I can see it bring up the RAID devices, scan for UUID markers and find 2 of them, then abort, unload the mpt3sas kernel module, and retry 5 seconds later. This repeats a few times until the hypervisor resets the VM and the boot starts over.
The most relevant part of the log (with a lot of kernel noise removed) is:
```
[    9.543553] sd 2:0:3:0: [sdd] Attached SCSI disk
svmboot: === svmboot
mdadm: failed to get exclusive lock on mapfile
[    9.790075] md: md127 stopped.
mdadm: ignoring /dev/sdb3 as it reports /dev/sda3 as failed
[    9.794087] md/raid1:md127: active with 1 out of 2 mirrors
[    9.796034] md127: detected capacity change from 0 to 42915069952
mdadm: /dev/md/phoenix:2 has been started with 1 drive (out of 2).
[    9.808602] md: md126 stopped.
[    9.813330] md/raid1:md126: active with 2 out of 2 mirrors
[    9.815279] md126: detected capacity change from 0 to 10727981056
mdadm: /dev/md/phoenix:1 has been started with 2 drives.
[    9.832111] md: md125 stopped.
mdadm: ignoring /dev/sdb1 as it reports /dev/sda1 as failed
[    9.840436] md/raid1:md125: active with 1 out of 2 mirrors
[    9.842341] md125: detected capacity change from 0 to 10727981056
mdadm: /dev/md/phoenix:0 has been started with 1 drive (out of 2).
mdadm: /dev/md/phoenix:2 exists - ignoring
[    9.887613] md: md124 stopped.
[    9.896418] md/raid1:md124: active with 1 out of 2 mirrors
[    9.898373] md124: detected capacity change from 0 to 42915069952
mdadm: /dev/md124 has been started with 1 drive (out of 2).
mdadm: /dev/md/phoenix:0 exists - ignoring
[    9.926863] md: md123 stopped.
[    9.937962] md/raid1:md123: active with 1 out of 2 mirrors
[    9.939950] md123: detected capacity change from 0 to 10727981056
mdadm: /dev/md123 has been started with 1 drive (out of 2).
svmboot: checking /dev/md for /nutanix_active_svm_partition
svmboot: checking /dev/md123 for /nutanix_active_svm_partition
[    9.994541] EXT4-fs (md123): mounted filesystem with ordered data mode. Opts: (null)
svmboot: suitable boot partition with /.cvm_uuid at /dev/md123
[   10.009251] EXT4-fs (md125): mounted filesystem with ordered data mode. Opts: (null)
svmboot: suitable boot partition with /.cvm_uuid at /dev/md125
svmboot: checking /dev/nvme?*p?* for /nutanix_active_svm_partition
svmboot: ERROR: too many partitions with a valid cvm_uuid: /dev/md123 /dev/md125
sh: missing ]
svmboot: retrying in 5 seconds.
[   10.430316] md123: detected capacity change from 10727981056 to 0
[   10.432058] md: md123 stopped.
mdadm: stopped /dev/md123
[   10.467498] md124: detected capacity change from 42915069952 to 0
[   10.469245] md: md124 stopped.
mdadm: stopped /dev/md124
[   10.507492] md125: detected capacity change from 10727981056 to 0
[   10.509276] md: md125 stopped.
mdadm: stopped /dev/md125
[   10.547497] md126: detected capacity change from 10727981056 to 0
[   10.549243] md: md126 stopped.
mdadm: stopped /dev/md126
[   10.577498] md127: detected capacity change from 42915069952 to 0
[   10.579245] md: md127 stopped.
mdadm: stopped /dev/md127
[   10.586750] ata2.00: disabled
modprobe: remove 'virtio_pci': No such file or directory
[   10.673882] mpt3sas version 14.101.00.00 unloading
```
Since this happens before networking comes up, and the hypervisor then resets the VM, I have no way to interact with it.

How can I fix this?
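For context, the fatal condition is svmboot finding more than one partition carrying the `/.cvm_uuid` marker file. The check is effectively something like the following sketch (`find_uuid_marker` is a hypothetical helper name, not the actual svmboot script):

```shell
# find_uuid_marker DIR...
# Print each mounted filesystem root (passed as a directory) that
# contains the .cvm_uuid marker file svmboot looks for. The log above
# shows svmboot aborting when it finds more than one candidate.
find_uuid_marker() {
    for d in "$@"; do
        [ -f "$d/.cvm_uuid" ] && printf '%s\n' "$d"
    done
    return 0
}
```

On a live system each `/dev/md*` device would be mounted read-only and its mount point passed in; in the log above both md123 and md125 carried the marker, which triggers the "too many partitions with a valid cvm_uuid" error.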
Best answer:
FYI - the hypervisor boots from the SATADOM, but it has no device driver for the SAS HBA, so it cannot normally see the storage disks. The hypervisor boots the CVM, which does have a SAS driver (mpt3sas), so all disk access goes through the CVM. The CVM itself boots off software RAID devices built on the first 3 partitions of the SSDs.

In my case, 2 of the software RAID devices had lost sync.
```
[root@sysresccd ~]# lsscsi
[0:0:0:0]    disk    ATA      INTEL SSDSC2BX80  0140  /dev/sdb
[0:0:1:0]    disk    ATA      ST2000NX0253      SN05  /dev/sda
[0:0:2:0]    disk    ATA      ST2000NX0253      SN05  /dev/sdc
[0:0:3:0]    disk    ATA      ST2000NX0253      SN05  /dev/sde
[0:0:4:0]    disk    ATA      ST2000NX0253      SN05  /dev/sdd
[0:0:5:0]    disk    ATA      INTEL SSDSC2BX80  0140  /dev/sdg
[4:0:0:0]    disk    ATA      SATADOM-SL 3ME    119   /dev/sdf
[11:0:0:0]   cd/dvd  ATEN     Virtual CDROM     YS0J  /dev/sr0
[root@sysresccd ~]# lsblk
NAME      MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
loop0       7:0    0 632.2M  1 loop  /run/archiso/sfs/airootfs
sda         8:0    0   1.8T  0 disk
└─sda1      8:1    0   1.8T  0 part
sdb         8:16   0 745.2G  0 disk
├─sdb1      8:17   0    10G  0 part
│ └─md127   9:127  0    10G  0 raid1
├─sdb2      8:18   0    10G  0 part
│ └─md125   9:125  0    10G  0 raid1
├─sdb3      8:19   0    40G  0 part
│ └─md126   9:126  0    40G  0 raid1
└─sdb4      8:20   0 610.6G  0 part
sdc         8:32   0   1.8T  0 disk
└─sdc1      8:33   0   1.8T  0 part
sdd         8:48   0   1.8T  0 disk
└─sdd1      8:49   0   1.8T  0 part
sde         8:64   0   1.8T  0 disk
└─sde1      8:65   0   1.8T  0 part
sdf         8:80   0  59.6G  0 disk
└─sdf1      8:81   0  59.6G  0 part
sdg         8:96   0 745.2G  0 disk
├─sdg1      8:97   0    10G  0 part
├─sdg2      8:98   0    10G  0 part
│ └─md125   9:125  0    10G  0 raid1
├─sdg3      8:99   0    40G  0 part
└─sdg4      8:100  0 610.6G  0 part
sr0        11:0    1   693M  0 rom   /run/archiso/bootmnt
[root@sysresccd ~]# cat /proc/mdstat
Personalities : [raid1]
md125 : active (auto-read-only) raid1 sdg2[1] sdb2[2]
      10476544 blocks super 1.1 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active (auto-read-only) raid1 sdb3[2]
      41909248 blocks super 1.1 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md127 : active (auto-read-only) raid1 sdb1[2]
      10476544 blocks super 1.1 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>
```

I could see the two SSDs probed as sdb and sdg, with partitions 1, 2 and 3 configured as RAID members, but only partition 2 correctly in sync. The 4th partition is used for NFS in the CVM (i.e. fast storage for the cluster).
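A quick way to spot which arrays are degraded is to look for a `_` in the mirror-status column (`[UU]` vs `[U_]`) of /proc/mdstat. A small sketch (`degraded_arrays` is a hypothetical helper; it reads mdstat-format text from a file so it can also be tried against a saved copy):

```shell
# degraded_arrays FILE
# Print the names of md arrays whose mirror-status column (e.g. [U_])
# shows a missing member. Pass /proc/mdstat on a live system.
degraded_arrays() {
    awk '/^md[0-9]+ :/ { name = $1 }
         $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print name }' "$1"
}
```

Against the output above it would report md126 and md127, matching what I found.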
So my solution was:

1. Set the devices I needed to modify back to read-write mode:
```
[root@sysresccd ~]# mdadm --readwrite md126
[root@sysresccd ~]# mdadm --readwrite md127
[root@sysresccd ~]# cat /proc/mdstat
Personalities : [raid1]
md125 : active (auto-read-only) raid1 sdg2[1] sdb2[2]
      10476544 blocks super 1.1 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active raid1 sdb3[2]
      41909248 blocks super 1.1 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md127 : active raid1 sdb1[2]
      10476544 blocks super 1.1 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>
```
2. Rejoin the devices into the RAID1 mirrors and let them resync:
```
[root@sysresccd ~]# mdadm /dev/md126 -a /dev/sdg3
mdadm: re-added /dev/sdg3
[root@sysresccd ~]# mdadm /dev/md127 -a /dev/sdg1
mdadm: re-added /dev/sdg1
[root@sysresccd ~]# cat /proc/mdstat
Personalities : [raid1]
md125 : active (auto-read-only) raid1 sdg2[1] sdb2[2]
      10476544 blocks super 1.1 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active raid1 sdg3[1] sdb3[2]
      41909248 blocks super 1.1 [2/1] [U_]
      [=========>...........]  recovery = 48.5% (20361856/41909248) finish=1.7min speed=200123K/sec
      bitmap: 1/1 pages [4KB], 65536KB chunk

md127 : active raid1 sdg1[1] sdb1[2]
      10476544 blocks super 1.1 [2/1] [U_]
        resync=DELAYED
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>
```

A couple of minutes later both mirrors were back in sync:

```
[root@sysresccd ~]# cat /proc/mdstat
Personalities : [raid1]
md125 : active (auto-read-only) raid1 sdg2[1] sdb2[2]
      10476544 blocks super 1.1 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active raid1 sdg3[1] sdb3[2]
      41909248 blocks super 1.1 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md127 : active raid1 sdg1[1] sdb1[2]
      10476544 blocks super 1.1 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>
```

3. As an added check, run fsck on the volumes:
```
[root@sysresccd ~]# fsck /dev/md125
fsck from util-linux 2.36
e2fsck 1.45.6 (20-Mar-2020)
/dev/md125 has gone 230 days without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/md125: 62842/655360 files (0.2% non-contiguous), 1912185/2619136 blocks
[root@sysresccd ~]# fsck /dev/md126
fsck from util-linux 2.36
e2fsck 1.45.6 (20-Mar-2020)
/dev/md126: clean, 20006/2621440 files, 5177194/10477312 blocks
[root@sysresccd ~]# fsck /dev/md127
fsck from util-linux 2.36
e2fsck 1.45.6 (20-Mar-2020)
/dev/md127: clean, 66951/655360 files, 1866042/2619136 blocks
```

After rebooting back into the hypervisor, the CVM came up normally.