NOTE: I am not a member of VMware Global Support Services, nor should you use this as any sort of replacement for a service ticket or vmWare published documentation. This is just a process that I had to piece together during my own support case with them wanted to document for home lab use. Attempt the below steps at your own risk.
Problem Statement
While trying to expand storage per the VMware documentation within 2 separate vRLI 4.5 clusters, I had 1 of 3 nodes in each cluster fail to expand storage. After a few hours with support via email and phone, and several attempts to salvage the nodes by trying to expand the storage manually, it was decided that the nodes needed to be replaced.
TLDR
Unjoin bad node from cluster, backup the log data off the node, Delete node, build new node, join new node, restore data to appliance, load data into vRLI.
Procedure Used
- Remove Node from cluster via UI by following the documentation for your version of vRLI
- Find a secondary location you can temporarily store the data. I had a utility server that i added an extra 2.3 TB drive to for this.
- We used PSFTP to connect and move data
- login to the ‘bad’ node as root
open root@<applianceFQDN>
- use
lcd <destionation loc>
to set your local directory to the destination for the data you found in step 2 - navigate to the log ‘blob’ on the appliance -
cd /storage/core/loginsight/cidata
- pull that data -
get -r store store
- Wait. this is probably going to take a while. I had to leave hours running overnight.
- When the file copy is done, validate you have the right amount in your backup location
- I had our first copy fail to copy it all so I had to try again
- Power down the node being replaced
- Delete the VM
- Redeploy a new node with the same host-name and IP address
- Join the new node
- via the vRLI console, place the new node into Maintenance Mode
- Open PSFTP
- login to the new node as root
open root@<applianceFQDN>
lcd
back to your location for step 2. DO NOT lcd INTO the ‘store’ folder that was createdcd /storage/core/loginsight/cidata
- load the data back into the appliance using
put -r store store
- Waiting again
- Putty into the appliance as root
- run
cd /storage/core/loginsight/cidata
- Set the correct permission son the store folder with
chmod 755 store
- Stop services on the node with
service LogInsight stop
- run the following script to load the log data into the LogInsight application:
for bucket in $(ls /storage/core/loginsight/cidata/store | grep -v 'generation\|buckets\|strata_write.lock'); do echo y | /usr/lib/loginsight/application/sbin/bucket-index add $bucket --statuses archived; done
- more waiting
- start up services -
service loginsight start
- via the vRLI UI, bring the cluster out of maintenance mode