Solved

CVM fails to boot due to disk errors

15 October 2020
3 replies
2467 views

Posted by waddles

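I have a problem with a CVM that won't boot. This is on a semi-retired production cluster (not CE) that has no workloads running on it.

I found the console output in /tmp/NTNX.serial.out.0. It shows the CVM enabling the RAID devices, scanning for a UUID marker and finding two of them, then aborting and unloading the mpt3sas kernel module before trying again 5 seconds later. This repeats a few times before the hypervisor resets the VM and the boot starts over.

The most relevant sections of the log (copious kernel taint messages removed) are:

```
[    9.543553] sd 2:0:3:0: [sdd] Attached SCSI disk
svmboot: === SVMBOOT
mdadm main: failed to get exclusive lock on mapfile
[    9.790075] md: md127 stopped.
mdadm: ignoring /dev/sdb3 as it reports /dev/sda3 as failed
[    9.794087] md/raid1:md127: active with 1 out of 2 mirrors
[    9.796034] md127: detected capacity change from 0 to 42915069952
mdadm: /dev/md/phoenix:2 has been started with 1 drive (out of 2).
[    9.808602] md: md126 stopped.
[    9.813330] md/raid1:md126: active with 2 out of 2 mirrors
[    9.815279] md126: detected capacity change from 0 to 10727981056
mdadm: /dev/md/phoenix:1 has been started with 2 drives.
[    9.832111] md: md125 stopped.
mdadm: ignoring /dev/sdb1 as it reports /dev/sda1 as failed
[    9.840436] md/raid1:md125: active with 1 out of 2 mirrors
[    9.842341] md125: detected capacity change from 0 to 10727981056
mdadm: /dev/md/phoenix:0 has been started with 1 drive (out of 2).
mdadm: /dev/md/phoenix:2 exists - ignoring
[    9.887613] md: md124 stopped.
[    9.896418] md/raid1:md124: active with 1 out of 2 mirrors
[    9.898373] md124: detected capacity change from 0 to 42915069952
mdadm: /dev/md124 has been started with 1 drive (out of 2).
mdadm: /dev/md/phoenix:0 exists - ignoring
[    9.926863] md: md123 stopped.
[    9.937962] md/raid1:md123: active with 1 out of 2 mirrors
[    9.939950] md123: detected capacity change from 0 to 10727981056
mdadm: /dev/md123 has been started with 1 drive (out of 2).
svmboot: Checking /dev/md for /.nutanix_active_svm_partition
svmboot: Checking /dev/md123 for /.nutanix_active_svm_partition

[    9.994541] EXT4-fs (md123): mounted filesystem with ordered data mode. Opts: (null)
svmboot: Appropriate boot partition with /.cvm_uuid at /dev/md123

[   10.009251] EXT4-fs (md125): mounted filesystem with ordered data mode. Opts: (null)
svmboot: Appropriate boot partition with /.cvm_uuid at /dev/md125

svmboot: Checking /dev/nvme?*p?* for /.nutanix_active_svm_partition
svmboot: error: too many partitions with valid cvm_uuid:  /dev/md123 /dev/md125
sh: missing ]
svmboot: Trying again in 5 seconds.

[   10.430316] md123: detected capacity change from 10727981056 to 0
[   10.432058] md: md123 stopped.
mdadm: stopped /dev/md123
[   10.467498] md124: detected capacity change from 42915069952 to 0
[   10.469245] md: md124 stopped.
mdadm: stopped /dev/md124
[   10.507492] md125: detected capacity change from 10727981056 to 0
[   10.509276] md: md125 stopped.
mdadm: stopped /dev/md125
[   10.547497] md126: detected capacity change from 10727981056 to 0
[   10.549243] md: md126 stopped.
mdadm: stopped /dev/md126
[   10.577498] md127: detected capacity change from 42915069952 to 0
[   10.579245] md: md127 stopped.
mdadm: stopped /dev/md127
[   10.586750] ata2.00: disabled
modprobe: remove 'virtio_pci': No such file or directory
[   10.673882] mpt3sas version 14.101.00.00 unloading
```

This happens before networking has started, and the hypervisor then resets the VM, so I have no way of interacting with it.

How can this be resolved?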
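Because the failure happens before networking, the serial console log on the host is the main evidence. The following is a minimal sketch for pulling out the lines that matter. It assumes bash on the host and messages shaped like the excerpt above. It shows how each RAID1 array assembled, which arrays svmboot accepted as boot partitions, and how often it retried. The function name is illustrative. The grep patterns are copied from this log, not taken from any documented Nutanix format.

```bash
#!/usr/bin/env bash
# Sketch: summarise the svmboot/md lines that explain a CVM boot loop.
# Assumption: messages look like the excerpt in this post; the function
# name summarize_svmboot_log is hypothetical, not a Nutanix tool.
summarize_svmboot_log() {
  local log="${1:-/tmp/NTNX.serial.out.0}"
  # How each RAID1 array came up, e.g. "active with 1 out of 2 mirrors".
  grep -E 'md/raid1:md[0-9]+: active with' "$log"
  # Which arrays svmboot accepted as boot partitions, plus any errors.
  grep -E '^svmboot: (Appropriate boot partition|error:)' "$log"
  # How many times svmboot gave up and retried.
  printf 'retries: %s\n' "$(grep -c 'svmboot: Trying again' "$log")"
}
```

In the excerpt above, md123 and md125 are both 10 GB arrays started with only 1 of 2 mirrors, and both carry /.cvm_uuid. This is consistent with the two halves of one out-of-sync mirror being assembled as separate arrays, which would trigger svmboot's "too many partitions" error.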


Best answer by waddles, 16 October 2020, 05:03

After much mucking around, I was finally able to boot a System Rescue CD that had access to the RAID disks, so I could fix it.

FYI: the hypervisor boots from the SATADOM, but it does not have a device driver for the SAS HBA, so it cannot normally see the storage disks. The hypervisor boots the CVM, which does have a SAS device driver (mpt3sas), so all disk access goes through the CVM. The CVM boots off software RAID devices built from the first three partitions of the SSDs.

In my case, two of the software RAID devices had lost sync.

```
[root@sysresccd ~]# lsscsi
[0:0:0:0]    disk    ATA      INTEL SSDSC2BX80 0140  /dev/sdb
[0:0:1:0]    disk    ATA      ST2000NX0253     SN05  /dev/sda
[0:0:2:0]    disk    ATA      ST2000NX0253     SN05  /dev/sdc
[0:0:3:0]    disk    ATA      ST2000NX0253     SN05  /dev/sde
[0:0:4:0]    disk    ATA      ST2000NX0253     SN05  /dev/sdd
[0:0:5:0]    disk    ATA      INTEL SSDSC2BX80 0140  /dev/sdg
[4:0:0:0]    disk    ATA      SATADOM-SL 3ME   119   /dev/sdf
[11:0:0:0]   cd/dvd  ATEN     Virtual CDROM    YS0J  /dev/sr0
[root@sysresccd ~]# lsblk
NAME      MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
loop0       7:0    0 632.2M  1 loop  /run/archiso/sfs/airootfs
sda         8:0    0   1.8T  0 disk
└─sda1      8:1    0   1.8T  0 part
sdb         8:16   0 745.2G  0 disk
├─sdb1      8:17   0    10G  0 part
│ └─md127   9:127  0    10G  0 raid1
├─sdb2      8:18   0    10G  0 part
│ └─md125   9:125  0    10G  0 raid1
├─sdb3      8:19   0    40G  0 part
│ └─md126   9:126  0    40G  0 raid1
└─sdb4      8:20   0 610.6G  0 part
sdc         8:32   0   1.8T  0 disk
└─sdc1      8:33   0   1.8T  0 part
sdd         8:48   0   1.8T  0 disk
└─sdd1      8:49   0   1.8T  0 part
sde         8:64   0   1.8T  0 disk
└─sde1      8:65   0   1.8T  0 part
sdf         8:80   0  59.6G  0 disk
└─sdf1      8:81   0  59.6G  0 part
sdg         8:96   0 745.2G  0 disk
├─sdg1      8:97   0    10G  0 part
├─sdg2      8:98   0    10G  0 part
│ └─md125   9:125  0    10G  0 raid1
├─sdg3      8:99   0    40G  0 part
└─sdg4      8:100  0 610.6G  0 part
sr0        11:0    1   693M  0 rom   /run/archiso/bootmnt
[root@sysresccd ~]# cat /proc/mdstat
Personalities : [raid1]
md125 : active (auto-read-only) raid1 sdg2[1] sdb2[2]
      10476544 blocks super 1.1 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active (auto-read-only) raid1 sdb3[2]
      41909248 blocks super 1.1 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md127 : active (auto-read-only) raid1 sdb1[2]
      10476544 blocks super 1.1 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>
```

I could see the RAID devices probed as sdb and sdg, with partitions 1, 2 and 3 configured, but only partition 2 was correctly in sync. The 4th partition is used for NFS in the CVM (i.e. fast storage for the cluster).
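The diagnosis above can be reproduced from /proc/mdstat text alone. Below is a minimal sketch that assumes the layout in this post: each boot array is a two-way RAID1 built from the same partition number on two SSDs (sdb and sdg on the rescue CD). PEER_A and PEER_B are hypothetical overrides for other device names. The sketch lists arrays whose [total/active] count is short and prints, without running it, the mdadm --add command for the missing mirror.

```bash
#!/usr/bin/env bash
# Sketch: find degraded md arrays and print (not run) the re-add commands.
# Assumptions: two-way RAID1 on matching partition numbers of two disks;
# PEER_A/PEER_B are hypothetical knobs defaulting to this post's sdb/sdg.
PEER_A=${PEER_A:-sdb}
PEER_B=${PEER_B:-sdg}

degraded_arrays() {
  # Prints "mdX member..." for each array whose [total/active] count is short.
  awk '
    /^md[0-9]+ :/ {
      name = $1; members = ""
      for (i = 2; i <= NF; i++)
        if ($i ~ /^[a-z0-9]+\[[0-9]+\]/) {
          dev = $i; sub(/\[.*/, "", dev); members = members " " dev
        }
      next
    }
    name != "" && match($0, /\[[0-9]+\/[0-9]+\]/) {
      split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
      if (n[2] + 0 < n[1] + 0) print name members
      name = ""
    }
  ' "${1:-/proc/mdstat}"
}

print_readd_commands() {
  # Derive the missing partner from the surviving member's partition number.
  degraded_arrays "${1:-/proc/mdstat}" | while read -r md member _; do
    case "$member" in
      "$PEER_A"*) part=${member#"$PEER_A"}; echo "mdadm /dev/$md --add /dev/$PEER_B$part" ;;
      "$PEER_B"*) part=${member#"$PEER_B"}; echo "mdadm /dev/$md --add /dev/$PEER_A$part" ;;
      *) echo "# $md: cannot pair surviving member '$member'" >&2 ;;
    esac
  done
}
```

Against the mdstat output above, this should print `mdadm /dev/md126 --add /dev/sdg3` and `mdadm /dev/md127 --add /dev/sdg1`. These match the commands used in step 2. The sketch prints rather than executes on purpose: the pairing is an assumption, and adding the wrong partition would start a recovery that overwrites it.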
So my solution was:

1. Set the devices I needed to modify back to writable mode:

```
[root@sysresccd ~]# mdadm --readwrite md126
[root@sysresccd ~]# mdadm --readwrite md127
[root@sysresccd ~]# cat /proc/mdstat
Personalities : [raid1]
md125 : active (auto-read-only) raid1 sdg2[1] sdb2[2]
      10476544 blocks super 1.1 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active raid1 sdb3[2]
      41909248 blocks super 1.1 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md127 : active raid1 sdb1[2]
      10476544 blocks super 1.1 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>
```
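One possible refinement to step 1: call --readwrite only on arrays that /proc/mdstat still flags as "(auto-read-only)". On some rescue images the arrays may already be read-write, in which case the command is unnecessary. This is a minimal sketch that assumes the mdstat format shown above. DRY_RUN is a hypothetical switch for this sketch. By default it only prints the mdadm command.

```bash
#!/usr/bin/env bash
# Sketch: switch an md array to read-write only when mdstat says it is
# still auto-read-only. DRY_RUN (hypothetical, default 1) only prints.
is_auto_read_only() {
  # $1 = array name such as md126, $2 = mdstat file (default /proc/mdstat)
  grep -Eq "^$1 : active \(auto-read-only\)" "${2:-/proc/mdstat}"
}

make_writable() {
  local md="$1" mdstat="${2:-/proc/mdstat}"
  if ! grep -q "^$md : " "$mdstat"; then
    echo "$md not found in $mdstat" >&2
    return 1
  fi
  if ! is_auto_read_only "$md" "$mdstat"; then
    echo "$md is already read-write; skipping"
    return 0
  fi
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "mdadm --readwrite /dev/$md"
  else
    mdadm --readwrite "/dev/$md"
  fi
}
```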
2. Rejoin the devices to the RAID1 mirrors and let them resync:

```
[root@sysresccd ~]# mdadm /dev/md126 -a /dev/sdg3
mdadm: re-added /dev/sdg3
[root@sysresccd ~]# mdadm /dev/md127 -a /dev/sdg1
mdadm: re-added /dev/sdg1
[root@sysresccd ~]# cat /proc/mdstat
Personalities : [raid1]
md125 : active (auto-read-only) raid1 sdg2[1] sdb2[2]
      10476544 blocks super 1.1 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active raid1 sdg3[1] sdb3[2]
      41909248 blocks super 1.1 [2/1] [U_]
      [=========>...........]  recovery = 48.5% (20361856/41909248) finish=1.7min speed=200123K/sec
      bitmap: 1/1 pages [4KB], 65536KB chunk

md127 : active raid1 sdg1[1] sdb1[2]
      10476544 blocks super 1.1 [2/1] [U_]
      resync=DELAYED
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>
```

Once the resync had finished, all three arrays showed both mirrors active:

```
[root@sysresccd ~]# cat /proc/mdstat
Personalities : [raid1]
md125 : active (auto-read-only) raid1 sdg2[1] sdb2[2]
      10476544 blocks super 1.1 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active raid1 sdg3[1] sdb3[2]
      41909248 blocks super 1.1 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md127 : active raid1 sdg1[1] sdb1[2]
      10476544 blocks super 1.1 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>
```

3. As an added check, run fsck on the volumes:

```
[root@sysresccd ~]# fsck /dev/md125
fsck from util-linux 2.36
e2fsck 1.45.6 (20-Mar-2020)
/dev/md125 has gone 230 days without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/md125: 62842/655360 files (0.2% non-contiguous), 1912185/2619136 blocks

[root@sysresccd ~]# fsck /dev/md126
fsck from util-linux 2.36
e2fsck 1.45.6 (20-Mar-2020)
/dev/md126: clean, 20006/2621440 files, 5177194/10477312 blocks

[root@sysresccd ~]# fsck /dev/md127
fsck from util-linux 2.36
e2fsck 1.45.6 (20-Mar-2020)
/dev/md127: clean, 66951/655360 files, 1866042/2619136 blocks
```

After rebooting back into the hypervisor, the CVM came up normally.
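The tail of steps 2 and 3 (let the mirrors resync, then check the filesystems) can be scripted as a wait-then-check sketch. It assumes the filesystems are not mounted, as on the rescue CD. It polls /proc/mdstat until no recovery or resync is listed; `mdadm --wait /dev/mdX` is a built-in alternative. It then runs `fsck -n`, a report-only variant of the plain fsck used above. POLL_SECS and FSCK_CMD are hypothetical knobs added for this sketch.

```bash
#!/usr/bin/env bash
# Sketch: wait for md recovery/resync to finish, then run a read-only fsck.
# Assumptions: filesystems are unmounted; POLL_SECS and FSCK_CMD are
# hypothetical knobs, not options of any existing tool.

wait_for_resync() {
  # Poll until no array reports recovery or resync; this also covers
  # "resync=DELAYED" entries that have not started yet.
  local mdstat="${1:-/proc/mdstat}"
  while grep -Eq 'recovery|resync' "$mdstat"; do
    sleep "${POLL_SECS:-10}"
  done
}

check_filesystems() {
  # Run "fsck -n" (report only, answer "no" to every repair prompt) on each
  # device given; return 1 if any check exits non-zero.
  local dev failed=0
  for dev in "$@"; do
    if ! ${FSCK_CMD:-fsck} -n "$dev"; then
      echo "fsck -n reported problems on $dev" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Example on the rescue CD, after the re-add step:
#   wait_for_resync && check_filesystems /dev/md125 /dev/md126 /dev/md127
```

The sketch uses a read-only check so that a problem is reported rather than silently repaired. If the check flags errors, a normal fsck run, as in step 3, is still the next step.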


Tags: AHV, CVM
           
           
This topic has been closed for comments
        
        
         
waddles (author), 16 October 2020: this reply is the best answer reproduced above.
           
          
          
B.Moussa, 4 July 2021
          
          
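Hello,

Thank you for sharing this; it was very helpful for me. Just for information: I used the Phoenix image downloaded from Prism (boot repair section), and the commands below didn't work:

```
[root@sysresccd ~]# mdadm --readwrite md126
[root@sysresccd ~]# mdadm --readwrite md127
```

I went ahead and applied the changes anyway, and that worked (the arrays were already in read-write state).

Best regards, and thank you.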
           
          
          
           
Sergiy Lozovsky (Nutanix Employee), 31 August 2021
           
          
          
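If there is access to the CVM console:

1. Reboot the CVM.
2. At the GRUB menu, select "Debug Shell".
3. At the shell prompt, run `. modules.sh` (this loads the required drivers).
4. Assemble the RAID arrays with `mdadm --assemble --scan --run`.
5. Check that each RAID array has two drives (`cat /proc/mdstat`).

If the RAID arrays are assembled incorrectly, reassemble them as described in the previous comments (for example, `mdadm /dev/md127 -a /dev/sdg1`).

Reboot the CVM (from the hypervisor).

There are some external links on rebuilding mdadm arrays, such as https://www.thomas-krenn.com/en/wiki/Mdadm_recovery_and_resync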
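Step 5 can be checked mechanically. The following is a minimal sketch that assumes the /proc/mdstat format shown earlier in this thread. It counts the working members listed on each array's header line, skipping members flagged "(F)". It exits non-zero if any count differs from EXPECTED, a hypothetical knob that defaults to 2 for these two-SSD RAID1 boot arrays.

```bash
#!/usr/bin/env bash
# Sketch of step 5: report working member counts per md array.
# Assumptions: EXPECTED is a hypothetical knob (default 2); members are read
# from "mdX : active ... raid1 sdb1[2] sdg1[1]" lines, excluding "(F)".
check_member_counts() {
  awk -v want="${EXPECTED:-2}" '
    /^md[0-9]+ :/ {
      n = 0
      for (i = 2; i <= NF; i++)
        if ($i ~ /\[[0-9]+\]/ && $i !~ /\(F\)/) n++
      printf "%s %d/%d %s\n", $1, n, want, (n == want ? "ok" : "MISSING")
      if (n != want) bad = 1
    }
    END { exit bad }
  ' "${1:-/proc/mdstat}"
}
```

For example, md127 in the first mdstat listing of the best answer would be reported as `md127 1/2 MISSING`, and the function would return 1.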
          
          
           