Let me paint you a picture: it’s 10pm on a Saturday night and I’m sitting at my file server trying to pull a couple of large files off it, and it’s giving me some weird problems. Slow speeds, files coming through corrupted and whatnot.
I give the big file server a reboot and let it start up, and this is where I start to notice something is not right. As it turned out, one drive was failing in the RAID 10 array. *sigh* After my final testing days later, it actually turned out that 3 drives had failed together. But at this moment I thought it was only one drive.
So I was not too worried, because I had set up a couple of sync jobs to replicate the data to other servers around the place. As it turned out, these were not working either and had actually been failing for the last month or so. My fault for not scripting in any error handling. An email alert or similar would have been good. *Kicking myself*
So it’s now 2am and I have started the whole drive recovery and rebuild of the RAID, only to find that all my attempts continue to fail. WTF. This is when I started to suspect that other drives were failing as well. SHIT!
All my data is either old or no longer accessible because my poor old hard drives have kicked the bucket. Lucky for CrashPlan! Over the next week I restored over 4TB of data to an external HDD. Then came the slow process of placing all the data back onto my new file server. At this stage I had set up a VM to handle the services.
1. Where do I start?
Okay, so first we need to organise the data we have. For example, my structure is as follows:
- Home Drives
- Deployment and Imaging
This covers essentially all the types of data that I have on hand. Yours may be different, but reducing it down to only a couple of folders initially will help you out later on when you get to the backup scheduling.
I also classify each folder on a level from 1 to 5: 1 being must-have and 5 being it’s okay if I don’t have it.
- 3 - Applications
- 1 - Home Drives
- 4 - ISOs
- 1 - Photos
- 1 - Videos
- 2 - Backups
- 5 - Deployment and Imaging
- 5 - Games
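A minimal sketch of how this classification could be written down in a script, so that later steps can read it instead of relying on memory. It assumes Python; `FOLDER_LEVELS` and `folders_by_priority` are illustrative names, not part of any tool mentioned in this post.

```python
# Folder names and levels copied from the list above.
# The dictionary and helper names are hypothetical, chosen for illustration.
FOLDER_LEVELS = {
    "Applications": 3,
    "Home Drives": 1,
    "ISOs": 4,
    "Photos": 1,
    "Videos": 1,
    "Backups": 2,
    "Deployment and Imaging": 5,
    "Games": 5,
}


def folders_by_priority(levels):
    """Return folder names ordered from most important (1) to least (5).

    Folders on the same level are ordered alphabetically.
    """
    return sorted(levels, key=lambda name: (levels[name], name))


if __name__ == "__main__":
    for name in folders_by_priority(FOLDER_LEVELS):
        print(FOLDER_LEVELS[name], name)
```

Keeping the levels in one place like this means the backup selection and the schedule can both be driven from the same list.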
2. What’s important and what’s not?
This step is probably one of the most critical ones here: we need to determine what’s important and what’s not. Working from the example above, I would choose to back up everything except the level 5 folders, Deployment and Imaging and Games, primarily because they are fairly large and I can reproduce them in time, either by re-downloading or by re-installing.
I also work to a 4TB limit. If I cannot fit everything I need in under 4TB, I go through everything and delete what I don’t need. The stuff that matters most to me, levels 1 to 3, I need to keep; levels 4 and 5 I don’t mind shrinking a little.
If 4TB of space does not work for you, you can look at something a bit bigger, but remember that the more space you have, the more it costs to keep that data, and the more time and effort it takes to move that data around, whether by cloud or other replication. Not to mention the amount of orchestration work you will need to do. But again, if you need the space, you will have justified this already.
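The selection rule and the 4TB limit can be sketched as a small check. This is only an illustration under stated assumptions: folder sizes have to be supplied by you (none are given in this post), sizes are in decimal GB, and `plan_backup` and `trim_candidates` are made-up helper names.

```python
BUDGET_GB = 4000  # the 4TB limit, expressed in decimal GB (an assumption)


def plan_backup(folders, budget_gb=BUDGET_GB, skip_level=5):
    """Pick the folders to back up and report how far over budget they are.

    folders: list of (name, level, size_gb) tuples.
    Returns (selected, total_gb, over_by_gb); over_by_gb is 0 if it all fits.
    """
    # Everything below the skipped level is backed up (levels 1 to 4 by default).
    selected = [f for f in folders if f[1] < skip_level]
    total_gb = sum(size for _, _, size in selected)
    over_by_gb = max(0, total_gb - budget_gb)
    return selected, total_gb, over_by_gb


def trim_candidates(selected, keep_up_to_level=3):
    """Folders that may be shrunk: anything above the must-keep levels.

    Least important level first, then largest folder first.
    """
    shrinkable = [f for f in selected if f[1] > keep_up_to_level]
    return sorted(shrinkable, key=lambda f: (-f[1], -f[2]))
```

If `over_by_gb` comes back greater than zero, the folders returned by `trim_candidates` are the first places to look for data to delete.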
3. The schedule…
Okay, so here comes the fun bit. We now have an idea of what data we are backing up and the space requirements, but what are we going to do with it?
Well, this is what I do. Again, this is just my own opinion based on my own testing, and it seems to be the best way for my current situation.
Main File Server --> Syncs data to Synology NAS 4TB
--> Syncs data to an external hard drive 4TB connected to Server A
--> Syncs CrashPlan data to Server A
--> Syncs CrashPlan data to CrashPlan cloud server
Server A --> Receives CrashPlan data
--> Veeam backup point
--> Syncs data on D drive to CrashPlan cloud server
Server B --> Runs Veeam Backup and Replication services
--> Creates VM snapshot backups to the D drive, to restore from in case of emergency
--> Syncs data to Server A’s CrashPlan archive
--> Syncs data to CrashPlan’s cloud server
What’s also important to note is that each of the steps described above is timed so that it does not clash with any of the other running tasks. The complete backup process usually takes about 2-3 days each time it runs.
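One way to check that timed jobs do not clash is to list each job with a start time and an expected duration and look for overlapping windows. The sketch below assumes Python; the job names, start times and durations you feed it are your own estimates, and `find_clashes` is a hypothetical helper, not a feature of CrashPlan, Veeam or Synology.

```python
from datetime import datetime, timedelta
from itertools import combinations


def find_clashes(jobs):
    """Return pairs of job names whose time windows overlap.

    jobs: list of (name, start, duration) with start as a datetime and
    duration as a timedelta. A job that starts exactly when another one
    ends is not counted as a clash.
    """
    clashes = []
    for (a, a_start, a_len), (b, b_start, b_len) in combinations(jobs, 2):
        # Two windows overlap when each starts before the other ends.
        if a_start < b_start + b_len and b_start < a_start + a_len:
            clashes.append((a, b))
    return clashes
```

Running this against the planned schedule before enabling the jobs gives an early warning when a duration estimate has grown enough to run into the next task.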
4. Testing Failover, does what I have in place actually work?
Now this is a really scary one to complete, especially if you are not very confident in what you have in place. However, it is in turn such an important step to complete. You really need to know: if everything shits the bed, will what I have in place actually help me restore what I had?
If you answer no to this question, go back to step 1 and start again. You need to know the exact steps involved in restoring a VM or restoring data, and from which replicated point.
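For the data side of a restore test, one simple check is to restore a folder to a scratch location and compare file checksums against the live copy. This is a minimal sketch assuming Python and SHA-256; the function names are illustrative, and it does not cover VM restores, which need to be tested through the backup product itself.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in root.rglob("*")
        if path.is_file()
    }


def compare_restore(original_root, restored_root):
    """Compare a restored tree against the original.

    Returns (missing, corrupted): files absent from the restore, and files
    present in both trees whose contents differ.
    """
    original = tree_digests(original_root)
    restored = tree_digests(restored_root)
    missing = sorted(set(original) - set(restored))
    corrupted = sorted(
        name for name in original.keys() & restored.keys()
        if original[name] != restored[name]
    )
    return missing, corrupted
```

Both lists coming back empty is a reasonable sign that the restored copy matches, as long as the original has not changed since the backup ran.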