Virtual Home Server Part 2

Back in November last year, I posted to this blog with the initial setup of my “home” server (https://www.wardnet.co.uk/virtual-home-server/), which in its former life was a Veeam backup server for an SME. That previous life means it is very high spec for its age, with a few TB of storage to boot! Storage is an interesting point here: whilst there is around 8TB in total (after RAID), it is not SSD or even high-end SAS, so that capacity comes at the cost of performance. The good thing is that performance is not critical to what I am using the server for; there is more than enough capacity and throughput to deliver media via Plex for local and remote users, and to give reasonable DB performance for testing installation processes, which currently relates to my day job. Below I have included an image of the VMware ESXi dashboard from today. Not too much difference to the one I posted back in November, other than the number of VMs has increased from 8 to 13 and available storage has therefore gone down by around half a terabyte:

But that’s boring… what about the VM setup, anything exciting there?

Well let’s take a look:

As you can see, I now have a lab domain set up (wardnet.local) which I am using to test out various ERP install/config scenarios, right down to the client layer, with the last one in the list being a Windows 10 VM.

Four ERP servers, I hear you scream? Well, yes, rather… ERP2 is now solely a SQL server delivering the DBs for ERP3 and ERP4, with ERP1 being a self-contained SQL and app server for the latest and greatest versions. APPS is actually a SharePoint 2013 Foundation server (its DB is on ERP2) which currently acts as document storage for the ERP servers.

I am also utilising my DC as a mail server with hMail and a .NET-based webmail service (with a MySQL backend), so I do not need a mail client anywhere! My hMail implementation will get an extra post on here in the not too distant future, I hope.

Extrasphere will get a blog post of its own (TBC), but in short it is a free cloning utility for ESXi implementations and works quite nicely.

In summary, this server has allowed me to get my geek on with virtualisation, networking, server hardware and operating system deployment from an infrastructure point of view, but it has also allowed me to delve back into the sysadmin side of things: domain creation, GPO deployments (Windows Updates, shared folders, BGInfo and more), mail server management, SharePoint admin etc. In addition, it has strengthened my expertise in my more recent transition into the application side of things, with many ERP deployment scenarios now tested (and scripted), as well as some real-world simulations of config and usage of the ERP systems themselves. So now I have a platform for testing anything, from hardware tweaks through to Accounts Receivable invoicing!

Tip of the Week 5 – When sfc fails, DISM prevails

In the last month or so, I have come across a number of servers where the only remaining possible cause of the “issues” is file system/OS corruption. This has been across various scenarios (cloud hosted, on-premises physical, on-site virtualised etc.) and each time the obvious, Googleable thing to try is an sfc /scannow.

For those who do not know this command, it is an old, old, old built-in Windows tool designed to scan and repair corruption within Windows itself, notably directories such as System32. This corruption can occur for many reasons: a dodgy build of the OS to start with, Windows Update issues, or even viruses (or the remnants of them). The cool thing about running sfc /scannow (from an elevated Command Prompt) is that it has been around since Windows 98, which incidentally was my first OS on a PC that was solely mine!
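As a quick reference, here is a minimal sketch of running it from an elevated PowerShell window; the second part simply pulls the SFC-specific [SR] entries out of the CBS log and dumps them to the desktop so they are easier to review:

# Run the scan (elevated prompt required)
sfc /scannow

# Pull just the SFC ([SR]) entries out of the CBS log for easier reading
Select-String -Path "$env:windir\Logs\CBS\CBS.log" -Pattern '\[SR\]' |
    Out-File "$env:USERPROFILE\Desktop\sfcdetails.txt"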

So sure, I have run this many times, probably more than a hundred, but recently on newer operating systems such as Windows Server 2012 R2 and 2016 I have seen it fail a little more often. Usually the result of the scan states:

“Windows Resource Protection found corrupt files but was unable to fix some of them. Details are included in the CBS.Log %windir%\Logs\CBS\CBS.log.”

So if you see this, does it mean your system is totally broken…

All is not lost

If you have seen my previous posts on DISM you will know it is great for keeping a system tidy, especially when it comes to Windows Updates and the bits they leave behind. However, did you know that DISM can also be used to repair the Windows image itself, the component store that sfc relies on for its repairs? Well no, neither did I until this year!

There are a few commands that are very useful to try when an sfc fails:

  • DISM /Online /Cleanup-Image /CheckHealth

    This checks whether the image has already been flagged as corrupted; it is quick and does not attempt a repair

  • DISM /Online /Cleanup-Image /ScanHealth

    This performs a full scan of the Windows image for corruption, so it takes significantly longer than a CheckHealth

  • DISM /Online /Cleanup-Image /RestoreHealth

    BINGO! This one will actually attempt the repair of a corrupted image, and from experience it means an sfc /scannow will then also complete without errors (see the sketch below).
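Put together, the sequence I tend to follow looks something like this (all from an elevated prompt; note that RestoreHealth pulls its replacement files from Windows Update by default, so an offline server may need a /Source specifying):

# Quick check of the corruption flag, then the full scan, then the repair
DISM /Online /Cleanup-Image /CheckHealth
DISM /Online /Cleanup-Image /ScanHealth
DISM /Online /Cleanup-Image /RestoreHealth

# Once RestoreHealth completes, re-run sfc to confirm it now finishes cleanly
sfc /scannow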

Please note, I am not saying this is the solution to all file system/OS corruption; what I am saying is that in the last few weeks it has saved three servers from being binned!

So yet again, system maintenance via the DISM tool is a winner, whether keeping it tidy or just in one piece.

SQL Server Tip of the Week – AlwaysOn Introduction

Here are some notes I recently wrote to help explain the basics of AlwaysOn clustering in SQL Server, originally written for fairly technical people, i.e. they know how to install SQL and use SSMS!

What is Always On?
SQL Server’s high availability solution, introduced in 2012 and built into SQL Server on top of Windows failover clustering, which allows automatic failover between SQL servers with minimal interruption to the applications using a database.

What is required?
• 2 fully licensed SQL servers at the same version, with the replication module installed
• Windows Failover Clustering feature installed
• IP addresses for the cluster (1 for the cluster, min. 1 per node)
• Windows domain – the cluster is a domain object

How to install/configure
Microsoft’s documentation and Brent Ozar’s site are the best sources,
e.g. https://www.brentozar.com/archive/2015/06/how-to-set-up-standard-edition-alwayson-availability-groups-in-sql-server-2016/ for 2016
To install Failover Clustering you can add the feature via PowerShell, e.g.:

# Install the failover clustering feature plus the management tools and PowerShell module
Install-WindowsFeature Failover-Clustering, RSAT-Clustering-Mgmt, RSAT-Clustering-PowerShell


What to know in advance

For an installer there are a number of things you need to know in advance:
• What type of synchronization is to be used?
• Synchronous or Asynchronous – https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/availability-modes-always-on-availability-groups
• Cluster Name – this is the name that sits above all the servers in the cluster and is therefore where Epicor applications point
• Cluster IP address – ideally decided in advance so it can be reserved/static; it is the IP for the cluster, not for an individual server
• Node IP address – each server in the cluster will have a secondary IP used purely for clustering; this is not the IP address used for management of that server
• Secondary node IP address – only needed if the servers sit in different subnets, in which case they need a node IP address in each range
• Witness server file location – this is a share on another server, available to all the SQL servers, where a witness file is stored to maintain quorum, e.g. \\FILESERVER\SQLWitness (see the sketch after this list)
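To give a rough idea of where those details get used, here is a minimal PowerShell sketch of building the cluster, setting the witness and enabling AlwaysOn. The server names and IP are placeholders rather than a real build, and the last two lines assume the SQLPS/SqlServer module is available:

# Build the Windows failover cluster across the two SQL nodes (placeholder names and IP)
New-Cluster -Name "SQLCLUSTER" -Node "SQLNODE1","SQLNODE2" -StaticAddress 192.168.1.50

# Point quorum at a file share witness available to both nodes
Set-ClusterQuorum -Cluster "SQLCLUSTER" -FileShareWitness "\\FILESERVER\SQLWitness"

# Enable the AlwaysOn Availability Groups feature on each SQL instance (this restarts the SQL service)
Enable-SqlAlwaysOn -ServerInstance "SQLNODE1" -Force
Enable-SqlAlwaysOn -ServerInstance "SQLNODE2" -Force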

Things to Note

• Asynchronous synchronization can mean data loss (review the link above for more info) and does not provide automatic failover; it is designed for offsite replication over a large distance.
• Synchronous is a more accurate (live replica) sync, but “could” be a source of lag on the SQL side, i.e. increased overhead.
• In SQL Server 2016 Standard you can only have one DB in each availability group (a Basic Availability Group), so from a config side it is worth considering which DBs need to be available should the system fail over.
• When restoring (overwriting) a database that is in an availability group you must take it out of the availability group first; an example of doing this within existing restore scripts is below (the ALTER AVAILABILITY GROUP lines are the additions):

--Backup Source DB code goes here
--Safety backup of Destination DB code goes here
USE [master]
ALTER AVAILABILITY GROUP MyDatabase_AG REMOVE DATABASE MyDatabase;
--Restore script goes here, i.e. grab the backup of the source and overwrite the destination...
ALTER AVAILABILITY GROUP MyDatabase_AG ADD DATABASE MyDatabase;
GO
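Once the database has been added back in, it is worth confirming it is synchronising again. A quick way of checking from PowerShell, assuming the SqlServer module is installed and using a placeholder instance/listener name, is something like:

# List every availability group database and its current synchronisation state (placeholder instance name)
Invoke-Sqlcmd -ServerInstance "SQLCLUSTER" -Query "
SELECT ag.name AS availability_group,
       DB_NAME(drs.database_id) AS database_name,
       drs.synchronization_state_desc
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_groups ag ON ag.group_id = drs.group_id;"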

Virtual Home Server

As one of those people who loves to be running the latest tech in both my home and professional lives, it is critical that I build the right infrastructure to achieve that.

At home I recently obtained a 2014 spec Dell server, which had a fair bit of memory and storage, certainly for what was to become the hub of my home operations!

In the last 2 months I have been building the server up, utilising all the latest platforms I can get my hands on: VMware ESXi 6.5, Windows 10/Server 2016, Ubuntu 16.04 etc.

I now have 8 VMs across 2 datastores, having upgraded all the firmware I could and played around with various settings to balance performance and noise (it lives in the spare room).

Here’s the outcome of that work:

This first image shows my ESXi 6.5 HTML5-based landing page (one of the easiest to use web admin tools I’ve seen). You’ll note the 128GB RAM, dual 2.9GHz CPUs and 8.5TB of storage: perfect for running media servers as well as testing platforms for my crazy ideas!

Drilling down into the VMs I have built, you’ll see a mixture of OSes and things I’m testing:

I was clever enough (somehow) to make my FTP server web facing. It’s where I store all the freebie utility-style programs that I use across many systems, and it means I no longer have to carry a USB stick around all the time!

Plex is the big one, with over 3TB assigned to it for all the media we have at home, which we can play across all our devices: the smart TV, Amazon Fire Stick, Xbox One etc.

What I’ve not yet got to grips with is the VM Network side of things. Eventually I’d like to VLAN off some of the VMs to do some sandbox-style testing with various OSes, maybe get back into Linux and re-learn hardening techniques etc. I just need the time!

Spam Blocking & Operation Gemstone

Those of you who manage email servers at a similar level to myself will have noticed a huge increase in malware-infected spam during 2013.

In fact, it got to such a level that it was becoming unmanageable without a recognised spam filtering application. With no plans to venture into the realms of SpamFighter, GFI MailEssentials, Barracuda or others, I decided it was time to declare all-out spam war.

Part 1 started earlier this year with a number of emails being received relating to stock market purchases and “upcoming targets”. These were obviously spam, and so started “Operation Gemstone”, so called because the first set related to a gemstone mining company. This was becoming a nuisance for all staff, so we started blocking emails by familiar keywords, i.e. once we had 3 or 4 of a similar nature we were able to deduce a keyword that we could block without blocking (too much) valid email. We started with a transport rule “Gemstone” and we now have 5 of these! The Gemstone rule set blocks keywords found in either the subject or body, it excludes emails sent to the boss (who manages his own spam) or from an internal address, and rather than deleting, it redirects the message to a holding account as a quarantine, from which we can forward on false positives if necessary.

Part 2 came about after analysing the hundreds, if not thousands, of spam emails collected over a number of months and actually looking at the message headers to find more similarities between emails of seemingly different subject matter. Naturally, the first thought was the source IP, and in some cases we found multiple occurrences of the same IPs, but on the whole they were different every time (where we did find repeat entries we blocked them at firewall level). The one area where we found real similarities was the “Return-Path” message header, with a huge number coming from addresses pretending to be American Express related (aexp.com etc.). So came our second rule set, “Return Path Block”. This was again a transport rule with a redirect to a holding account; the difference this time was setting the rule to read the message headers and look for a “Return-Path” containing various phrases. This rule was so successful that we could turn off Gemstone rules 1-3, meaning less load on the Exchange transport servers.

But then another realisation hit: as the months have gone on this year, the spam has become more and more convincing, apart from one thing… zip attachments! On instructions from above I blocked all incoming zip attachments (by redirect, again). Since 9:05 on Monday 18th November 2013 (7 days), 1,208 emails have contained zip files and have been redirected to the quarantine account. Of these, only 4 have been genuine files meant for our staff! So our “Zippy” rule sits at the top of our transport rule set and does its job admirably.
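For anyone wanting to do something similar, rules like these can be created from the Exchange Management Shell as well as the console. The below is only a rough sketch of our “Return Path Block” and “Gemstone” style rules; the keywords and quarantine address are placeholders rather than our live config, and the exact predicate names vary a little between Exchange versions:

# Redirect anything whose Return-Path header matches known bad senders, tagging the subject (placeholder values)
New-TransportRule -Name "Return Path Block" `
    -HeaderContainsMessageHeader "Return-Path" `
    -HeaderContainsWords "aexp.com" `
    -PrependSubject "<BLOCKED - Return Path Rule> " `
    -RedirectMessageTo "spamhold@example.com"

# A keyword rule in the style of the Gemstone set, skipping mail from internal senders
New-TransportRule -Name "Gemstone" `
    -SubjectOrBodyContainsWords "gemstone mining","stock alert" `
    -ExceptIfFromScope "InOrganization" `
    -PrependSubject "<BLOCKED - Gemstone Rule> " `
    -RedirectMessageTo "spamhold@example.com"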

Sure, these methods may seem a little archaic, but I see a few advantages:

  • By redirecting rather than deleting at source it gives a chance to filter through emails to ensure nothing is missed
  • Using built-in Exchange rules instead of a 3rd party tool adds less overall load to the Exchange servers (from our experience)
  • We can add new keywords, return-path sources or attachment types instantly
  • The transport rules allow us to TAG the emails by pre-pending the subject with, for example, <BLOCKED – Gemstone Rule> or <.zip file attachment>, allowing us to filter emails within Outlook

If anyone has any comments about all this I would love to see them, please feel free to contact me.


Skills Development

Here’s an update of the skills I’ve been developing lately:

  • WordPress Development
  • PHP Development
  • Exchange 2010 Anti-Spam and Transport Servers
  • MySQL DB Management
  • Linux Anti-Virus
  • Linux Shell Scripting
  • Linux Anti-Spam
  • ISPConfig web hosting environments
  • Oracle VirtualBox hosting environments
  • Oracle Enterprise Linux 6 administration
  • Site-to-Site VPNs

For more of my skills and abilities, check out my LinkedIn profile here