A new Toolbox…

Update: May 2022 – The toolbox.com domain now redirects to spiceworks.com.

Almost a decade ago, I started a professional blog on [the now defunct] it.toolbox.com called “IT Champloo”. At the time, IT Toolbox was a thriving community of professionals sharing advice, experiences, and wisdom.

Over the years, the platform’s performance and usability slowly declined (as did my interest in creating content for a site where banner ads were given more real estate than my articles).

When the domain yousefalahmad.com became available, I decided to snatch it up and migrate my content to it. This blog will serve as a repository for tips, tricks, commentary, and observations in the hopes that they might be of use to someone later down the road.

How To: Resolving a System Hang During Patching, Remotely!

Routine patching of systems and software is a crucial piece of any business’ information security strategy. Even so, many systems go unnoticed and unpatched for months, even years, until an external threat forces the organization into action (e.g. the recent WannaCry ransomware outbreak).

When that happens, server administrators need to be prepared for irregularities they’re likely to encounter, such as a hang prior to reboot.

In this scenario, we’re going to assume that you’ve just finished patching and clicked the “Restart Now” button. You begin a continuous ping (ping -t [hostname/IP address]) and wait for the server to restart.

Let’s assume a normal reboot takes 5-10 minutes for this machine, and that 25+ minutes have passed.

You check the console, and are greeted by the “Preparing to Configure Windows. Do not turn off your computer” message. Time continues to pass while your maintenance window dwindles like falling grains in an hourglass… pressure is mounting, the business won’t wait. Time for action!

Logged in as an Administrator from your workstation, check the Windows Modules Installer service on the remote system…

  1. Run services.msc
  2. Right-click “Services (Local)” and select “Connect to another computer …”
  3. Make sure the “Another computer” radio button is selected, enter the hostname of the stuck server, and click “OK”
  4. Find the “Windows Modules Installer” service and check its status. If it’s “Stopping,” then you will need to force it to stop. This can’t be done here, so we’ll need to query its PID and use our old friend TaskKill to manually kill the service

Query the Process ID (PID) of the Windows Modules Installer (TrustedInstaller) service…

  1. Open Command Prompt as an Administrator
  2. Run the following command:
sc \\[hostname of the server] queryex trustedinstaller

This will return (among other information) the PID of the stuck service. Write it down, as you’ll need it for the next step.

Kill the hung service remotely using TaskKill…

  1. From the Command Prompt already opened, run the following command:
taskkill /s [hostname of the server] /pid [PIDFromAbove] /f
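
If you find yourself doing this on more than one server, the query and kill steps can be chained together. Below is a rough Python sketch of that idea, not a polished tool. It assumes Python is installed on your workstation and that you run it from an elevated prompt. The function names (parse_pid, query_command, kill_command, kill_hung_installer) are my own inventions for illustration; the only real commands involved are the same sc and taskkill calls shown above.

```python
import re
import subprocess


def parse_pid(sc_output):
    """Pull the PID out of the text that `sc queryex` prints."""
    match = re.search(r"^\s*PID\s*:\s*(\d+)\s*$", sc_output, re.MULTILINE)
    if match is None:
        raise ValueError("no PID line found in the sc output")
    return int(match.group(1))


def query_command(host):
    # Argument list for the manual "sc ... queryex trustedinstaller" step.
    return ["sc", "\\\\" + host, "queryex", "trustedinstaller"]


def kill_command(host, pid):
    # Argument list for the manual "taskkill /s ... /pid ... /f" step.
    return ["taskkill", "/s", host, "/pid", str(pid), "/f"]


def kill_hung_installer(host):
    """Query the PID remotely, then force-kill that process."""
    result = subprocess.run(
        query_command(host), capture_output=True, text=True, check=True
    )
    pid = parse_pid(result.stdout)
    if pid == 0:
        # sc reports PID 0 when the service has no running process.
        raise RuntimeError("TrustedInstaller is not running on " + host)
    subprocess.run(kill_command(host, pid), check=True)
    return pid
```

Calling kill_hung_installer("SERVER01") (SERVER01 being a placeholder hostname) would run both steps in order and hand back the PID it killed. Nothing touches the network until you call that function, so the parsing can be tried safely against saved sc output first.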

Congratulations, your system should now be unhung! Check your console or continuous ping to verify that the system is restarting and proceed to the next round of updates.

Windows 10 Woes

Like many others, when Microsoft told me I was entitled to a free upgrade from Windows 8.1 to 10, I decided to take them up on the offer.

I downloaded the installation media, and the upgrade went relatively smoothly. I had a few driver issues at first, but was eventually able to iron everything out.

After a week or two of using the new OS, I started to encounter strange UI bugs…

At first, my start menu tiles stopped accepting mouse clicks. I could still launch applications that were pinned to the taskbar, but could not click on any Windows UI menu elements.

This got progressively worse until no windows or applications would accept mouse input. I could still see the cursor and move it around, and I could right-click and drag on the desktop, but I couldn’t interact with anything else using the mouse.

I could still use keyboard shortcuts as a workaround, but it made things awfully inconvenient. I searched for a fix, but the only thing I came up with was a PowerShell script that purportedly fixed “Start menu” issues:

Get-AppXPackage -AllUsers | Foreach {Add-AppxPackage -DisableDevelopmentMode -Register "$($_.InstallLocation)\AppXManifest.xml"}

This worked for me (after a reboot), but sure enough, the problem came back within a few days. In addition to everything else, the system started crashing with a BSOD “Memory_Management” error.

I wasn’t thrilled about disabling all of my devices and enabling them back one-by-one until I found the faulting driver, so a clean install started to look more appealing.

I did this, but discovered that my newly installed Windows 10 wasn’t activated, nor could I activate it, as Microsoft’s brilliant new system doesn’t give you an activation key on the free upgrade!

In theory, your PC is supposed to Automagically ™ activate itself as soon as it’s connected to the internet. That is, unless there’s a problem with the activation server (as many have encountered) or some other issue…

Microsoft recommends doing a fresh re-installation of whatever previous OS you were on, then running the upgrade again, but that’s nonsense!

All you have to do to fix the activation issue (assuming, like me, you started with the upgrade) is reboot from Windows 10 installation media and select “Reset your PC” with the “Keep my files” option selected.

After several reboots, my Windows 10 installation was repaired – no more UI issues, and successfully activated WITHOUT having to reinstall Windows 8…

Don’t be an IT Order Taker!

Last year, the old Maytag Man (as portrayed by Gordon Jump) who sat bored in his dispatch office, waiting for a repair call that never came, was replaced with a younger, more versatile model. Sad really, as I’m going to miss Ol’ Lonely, but I’d never emulate him, and you shouldn’t either!

Early on in my career, I worked in sales. I’ve carried that experience with me all through my career as it taught me the value of proactivity. There were those who sat under a tree of pre-qualified prospects, waiting for low hanging fruit to drop off the branch. These people rarely made quota, and eventually moved on or were let go.

Amazingly, many IT Managers (particularly in the Middle East) are the same way; they perceive themselves as mere order takers, putting out fires as they appear, happy to go on maintaining the status quo.

Perhaps these IT Managers aren’t taking time to understand the business and its needs. It could also suggest that they lack confidence and/or initiative.

Whatever the case, salespeople and IT Managers alike who take initiative will always have an advantage over those who don’t, and are less likely to be caught off-guard by issues that will [inevitably] arise.

So what can you do to be more proactive? Here are some suggestions that may help:

  1. “Don’t wait until it’s raining to mend your roof – do it now, while the sun is shining.” Nothing is ever an IT emergency until it becomes one. The more time you spend preventing fires, the less time you’ll spend putting them out! (Backups, DRPs, documentation, etc.)
  2. “Make your rounds!” Oftentimes, easily correctable issues go unreported because the employee(s) suffering from them can’t be bothered to submit a ticket, or don’t know how to articulate the problem. You can save a lot of time by visiting with people face-to-face to understand their pains. Do this at every level of the organization! Be friendly, be approachable!
  3. “The map is NOT the territory! Get out there once in a while and see it for yourself!” Don’t rely solely on documentation! It could be outdated, there may be human error, or other factors introduced in the course of maintaining your Asset Register/CMDB.
  4. “Focus on what’s important.” This cliché has been beaten to death, so I’ll try to make it simple and relevant – anything the business depends on to make money should be your highest priority, followed by the systems which support them and so on and so forth.
  5. “Those who ignore history have no past – and no future!” Acknowledge those who have come before you. Study their mistakes, learn from them! It’s much less painful to side-step an avoidable pitfall than to climb out of it after-the-fact.

4G Blues (Errr Black, Teal and Purples): Zain Internet Service in Riyadh

Disclaimer: The facts and opinions expressed below represent my attempt to provide an accurate account of my personal experience with the Zain Internet Service Provider, its employees and Speed 4G Unlimited internet service.

The feedback and conclusions I’ve drawn in this article are based solely on my experience, and not intended to influence Zain customers (or potential customers), defame or slander.

Given that Zain operates in 9 countries, and provides a wide variety of services (including the mobile phone and 3G service I presently subscribe to, and am satisfied with), it is conceivable that the reader’s experience might vary.

The Zain logo, screenshots, links and other material are property of Zain Corporation, and are reproduced without permission for information and entertainment purposes only as allowed under Section 107 of the Copyright Act of 1976.

Thanks for reading!

Update, November 5, 2013: When I left town on Thursday, October 10, my service was still out. I can’t attest as to precisely when my service was restored (as Zain has not made any attempt to follow up with me). Nevertheless, my service was up when I returned, and I’m presently looking for a new provider.

 

The Issue

Back in May, I started looking around for a 4G ISP here in Riyadh. My choices were STC, Mobily and Zain. Having looked at each of the offerings, Zain (at the time) seemed to offer the most bang for my buck, short of fiber optic service (which wasn’t available in my neighborhood at the time).

The offer advertised was speeds of up to 100 Mbps down and 50 Mbps up. I figured if I could get 25% of that consistently, I would be satisfied. Sure enough, I got about 33-45% of the “up to” speed consistently, which more than met my needs.

I was happy enough with this service to go ahead and prepay for 6 months of service to take advantage of a promotion advertising 3 months + 3 months free: a total of 6 months’ worth of service for 949 Saudi Riyals (~$253.07 USD or about $42.18/month).
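
For anyone who wants to check my math on the dollar figures, here is a quick Python sketch. It assumes the riyal's fixed peg of 3.75 SAR to the US dollar; the function name prepaid_cost_usd is just something I made up for this example.

```python
SAR_PER_USD = 3.75  # assumption: the riyal's fixed peg to the US dollar


def prepaid_cost_usd(price_sar, months):
    """Return (total in USD, cost per month in USD), rounded to cents."""
    total = price_sar / SAR_PER_USD
    return round(total, 2), round(total / months, 2)


total, monthly = prepaid_cost_usd(949, 6)
print(total, monthly)  # 253.07 42.18
```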

This service took effect on June 22, 2013 and the package expiry is December 22, 2013.

The package I selected was the “Unlimited” service, meaning that there was not a preset cap on the amount of data I was allowed to use, but I was still subject to their “Fair Use Policy,” which states that if I exceed 40GB of data (upstream and downstream combined) in one billing cycle, my speed would be capped at 512 Kbps until the end of that billing cycle, after which it would return to normal speed.
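
Since this policy comes up repeatedly later on, it may help to spell it out as logic. The Python sketch below is only my reading of the policy as described above, not anything taken from Zain's systems; the constant and function names are hypothetical.

```python
FUP_THRESHOLD_GB = 40   # combined upstream + downstream per billing cycle
THROTTLED_KBPS = 512    # capped speed once the threshold is exceeded


def effective_speed_kbps(used_gb, normal_kbps):
    """Speed the policy allows, given usage so far in the current cycle."""
    if used_gb > FUP_THRESHOLD_GB:
        # Over the threshold: throttled, but never cut off entirely.
        return min(normal_kbps, THROTTLED_KBPS)
    return normal_kbps


print(effective_speed_kbps(39.5, 13000))  # 13000
print(effective_speed_kbps(41.0, 13000))  # 512
```

Usage starts again from zero at the beginning of each billing cycle, so the cap lifts on its own. The key point is that there is no input for which the function returns zero.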

When that happened (and it did every month), I would receive an SMS to my 4G router explaining that I had tripped the threshold and that my bandwidth would be throttled. 512 Kbps isn’t great, but it’s good enough to limp by on for a week or two while I waited for the current cycle to end.

On September 19, 2013, I received an SMS from Zain’s system stating that my package was about to expire on September 22 (only 3 months into my 6-month subscription) and advising me to renew my package.

This seemed odd, so I went to the account portal, and sure enough, my package expiry still showed December 22, 2013, 3 months hence (pictured below):

Out of curiosity, I wanted to see if it would let me purchase an additional package, but when I clicked the “Packages” tab, and hovered over the “Buy” link, it displayed the message, “Your package has to expire for you to buy this package.” thus affirming that my package was still valid.

Reassured by the account page, I disregarded the SMS, and assumed that the 3 months free I was supposed to get would kick in after the 22nd of September.

On October 1 at about 22:00 (GMT+03:00), all URL requests began redirecting to http://broadband.sa.zain.com (the account portal). I tried a few different URLs, checked my DNS settings, checked for proxy settings, power cycled my router, re-seated my SIM card, and did a factory reset on my router, all to no avail, so I broke down and called their support line.

Based on the evidence I was seeing, it seemed to me that my account had failed to kick over into the 3 free months and was prompting me to pay for a new package. Since the account page was still displaying the same figures as above, I assumed it was some kind of glitch in my account. I explained this to the agent who took the call and asked him to check the system and verify that my package was set up correctly.

For the purposes of this post, I will refer to the Agents, Supervisors, Operations Managers and other personnel by numbers in order of appearance.

 

Agent 1 advised me that his system was slow, and asked if I could call back in a couple of hours. I wasn’t going anywhere, so I told him I didn’t mind waiting for it to load so he could tell me what was wrong with my account. Instead of asking me to stand by, he advised me that his system was down, and insisted that I call back in 2 hours. I questioned this, but ultimately relented and called back 15 minutes later.

Agent 2 took the next call, and sure enough, his system was working. He seemed to want to treat the issue like a new service request, and advised me to wait 24-48 hours for the service to activate. I told him that didn’t make sense as I’d prepaid for 6 months, and at no point was I advised that my service would be interrupted between the first 3 and last 3 months. After going through the motions of troubleshooting my service, I asked to speak to his supervisor.

Supervisor 1 went through the troubleshooting steps again, then suggested that the reason my service cut out was because I tripped the Fair Use Policy. I checked the bandwidth meter on my router, which stated that I still had another 12GB worth of service for the month, which should have been plenty. He stated that his count showed 7GB remaining as of midnight the previous day. This didn’t seem right, but regardless, I had come home late, and had been watching YouTube videos at 360Kbps for about an hour before my service cut out. No other devices were connected (just my laptop). Assuming I was using my full bandwidth (at last count, I was getting about 13Mbps (~1.58MBps) down, and about 7Mbps (~0.85MBps) up), I would only be burning about 2.4MB/second.

At that speed, it would have taken me almost 11 hours to burn through that much data, which would have been physically impossible as I took my laptop with me that day. Supervisor 1 did not let this physical impossibility trouble him, and refused to give me the benefit of the doubt, so I pointed out the actual behavior of the fair use policy, then linked him to Zain’s 4G FAQ page explaining the same.
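
Another way to sanity-check the supervisor's theory is to look only at what I was actually doing that evening: about an hour of video at 360 Kbps. The Python sketch below is a back-of-the-envelope estimate. It assumes decimal units (1 Kbps = 1,000 bits per second, 1 MB = 1,000,000 bytes), a constant bit rate, and no other traffic; the function name data_used_mb is made up for this example.

```python
def data_used_mb(rate_kbps, seconds):
    """Megabytes transferred at a constant bit rate (decimal units)."""
    bits = rate_kbps * 1000 * seconds
    return bits / 8 / 1_000_000


hour_of_video = data_used_mb(360, 3600)
remaining_mb = 7 * 1000  # the 7GB the supervisor said I had left

print(hour_of_video)                 # 162.0
print(hour_of_video < remaining_mb)  # True
```

Under those assumptions, an hour of that stream comes to roughly 162MB, a small fraction of even the lower 7GB figure.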

At long last, he conceded my point, but was forced to admit that he was unable to do anything to resolve my issue as the “technical support” department would have to call me back.

Call back? I couldn’t just be transferred to their technical support or be placed in a hold queue to wait for the next available representative? No sir, I had to wait for a call back.

“When can I expect a call back?” I asked. Supervisor 1 advised me that it would be ‘soon’, but could not give me any kind of estimate beyond that.

Surprisingly, I did get a call back within about 30 minutes, but the agent didn’t speak English, and I had to wait another 20 minutes or so for another call back. English Speaking Technical Support Agent 2 seemed to understand the issue, and told me he would get it resolved shortly, and sure enough, within about another 10 or 15 minutes, my service was back up…

Unfortunately, my luck didn’t last, and the service was back down again the following evening around the same time. It’s important to note that no one had followed up to verify that my service had been restored and ask permission to close the ticket, but I did receive an SMS with my ticket number indicating that the ticket would auto-close in 4 days if I didn’t respond – I responded via SMS, then followed up with a phone call to verify.

Agent 3 verified that my ticket was still open, and assured me he would make sure that it remained open until solved. He also advised me that a “technical support” agent would call shortly after looking into the issue.

About 4 hours passed with no call back. Unable to call technical support directly, I called the general support line and spoke with Agent 4, who closed my previous ticket (without permission) and opened a new ticket (presumably to reset the RTO).

I called back in again that evening, spoke with Agent 5, then spoke with Supervisor 2, was escalated to Operations Manager 1 (who eventually hung up on me), each promising that the “concerned department” would contact me soon, but unable to provide an estimate as to when.

The following day, I tried contacting support via chat and got no further, but Agent 6 decided to close my open, unresolved ticket that I’d tried to get an update on, and opened a brand new ticket (3rd so far) for my issue, resetting the RTO… again.

Again I called in, and again I was told that it was impossible to transfer me to technical support, and no, they didn’t have inter-office communications tools (Microsoft Office Communicator/Lync, etc.) except for email. They assured me that the “concerned department” had been notified by email and promised me a call back shortly.

Day 3 concluded with no call back, as did day 4. On day 5, I was no longer being redirected to the account page, as my SIM card had deactivated – zero bars (previously 5). I called back in again, spoke to Agent 7, who transferred me back to Operations Manager 1. Operations Manager 1 claimed that he hadn’t hung up on me, that his outbound phone service went down, which is why he was unable to call me back.

“Outbound calls down for 3 days? No DR site? No business continuity plan?” I asked.

This didn’t seem to disturb him as much as it did me. Although I found his story hard to believe, I didn’t argue the matter, but advised him that I was going on 5 days with no service. Again he promised me someone from technical support would call me back, but again, I told him I found that difficult to believe.

Nevertheless, I gave him an opportunity to earn back my trust by calling me back personally in 3 hours to follow up. I verified that this was within his shift, and he agreed. After about 2 hours, I received a call back from Agent 7 begging for more time. I asked where Operations Manager 1 had gone and said that I was not about to let him off the hook, but Agent 7 assured me that Operations Manager 1 would call back as promised – he didn’t.

Day 6, I concluded that this issue probably wasn’t going to get solved over the phone, and decided to take my 4G Router, SIM card, and original receipt to the Zain store I bought them from and ask for a new SIM card.

The clerk didn’t speak much English, but I fumbled my way through Arabic and managed to explain that my service ended after only 3 months. He examined my receipt and wrote 6/22-12/22 on it (the 6 months I paid for), circling the last date to indicate that he understood I was supposed to have 6 months (not 3).

He gave me a new SIM card. I asked him to try it while I was there in the store, but he indicated that it would take at least an hour for the card to activate.

I went back to my office, worked for about an hour and a half, then received a call from Technical Support Agent 3. I explained my issue to him, and after checking with his supervisor, he came back with a very clever piece of misdirection.

The picture he painted was that I was entitled to 100GB of bandwidth every month, and when that was reached, my service would be cut off. He cited this as the cause of my issue. I asked if the Fair Use Policy had changed, but he assured me that’s the way it always was.

I told him that this wasn’t the behavior I experienced, and asked him to look at the SMS message I had received on September 19, indicating that my 3 months were up and that I needed to renew. He dismissed this as a “normal message from the system” that didn’t mean anything.

At this point, I pointed him to the Zain FAQ page that describes the Fair Use Policy (FUP) for the Unlimited service I have, and challenged him to show me something from Zain to support his claim.

He agreed, and after 5 minutes of futility, was forced to admit that he was unable to substantiate his conclusion. I asked him to add up the facts: the SMS messages I received, the previous behavior of the FUP’s threshold being tripped, and even his own website’s explanation of that policy.

Although he was ultimately forced to agree with me, there was regretfully nothing he could do at his level of support, and again, I would have to wait for a call back.

About this time, close to 3 hours had passed since I was given the new SIM card, and it had yet to activate, so I went back to the store on the way home to have it looked at.

The agent I spoke with did a factory reset on my router, which gave me bars again (only 1 inside the store), but the account page redirect issue was still there. During my second visit to the store, I saw a missed call from Zain.

Today (day 7), I received a call from a non-English-speaking technical support agent, so I handed my phone over to one of my co-workers to help me translate. Technical Support Agent 4 tried to give us the same 100GB song and dance, and failing that, handed me over to English-speaking Technical Support Agent 5. I was on my way to an important meeting and didn’t have time to argue with him about the FUP, so I referred him immediately to his website and, once he conceded, advised him to escalate this and get it fixed.

I followed up via chat, referring to my previous ticket number, and was advised by Agent 8 that this ticket had been closed, and a 4th ticket had been opened… At this point, I asked Agent 8 to notate my account with links to the FUP so that the next person to call me back (whenever that might be) would not waste my time (or his) arguing with me about the previous agent’s misinformed notes. He agreed, confirmed that I was in fact correct, and claimed that Technical Support Agent 3 had noted in my account that my interpretation of the FUP was correct, and promised some kind of compensation for my trouble, though no specific action was documented. I let it go at that, but copied the chat transcript for record purposes.

After 7 days, 4 tickets, and having spoken to 15 individuals, my issue is still unresolved. I will be heading back to the US for 3 weeks in a couple of days, and hope that this issue will be worked out before then. Hopefully, I’ll be able to update this post later with a resolution.

Lessons Zain Could Learn from This Incident

I’ve worked in many different call centers in my career including sales, customer service and technical support. I have first-hand knowledge of their policies, processes and procedures, performance metrics and day to day operations.

Having said as much, I’d like to compare and contrast my experience on the client-side of Zain with my experience on the service-side in some of the roles I’ve held:

1. Zain’s Support Structure: Routing clients through a tiered support structure, then escalating based on complexity is pretty standard. Where Zain drops the ball is the disconnect between the customer and “technical support”.

In most healthy organizations, an Operator opens the ticket, summarizes the customer’s complaint, attempts some basic troubleshooting, and if unable to resolve the issue, passes the customer on to the next tier on the same call.

In Zain’s case, the customer is denied timely issue resolution by forcing him to wait for a call back. As such, there’s no accountability from the operator-side to attempt to solve the issue, nor are they truly empowered to do so.

If what the Operations Manager explained to me was accurate, no one in his call center, not even the Operations Manager himself, could call the technical support team directly. His only alternative was sending an email.

While there might be a legitimate reason for this, it seems like a serious logistical oversight.

2. Zain’s Employee Product Knowledge: Product change is a fact of life for all call centers, and managers typically employ a number of resources to keep their employees briefed on the latest changes. This could include memos, emails, handouts, intranet sites and knowledgebases.

Despite the details of my package being readily available on Zain’s website, almost every employee I spoke with seemed to lack fundamental knowledge of the product I purchased and its terms and conditions.

Furthermore, almost all 15 of the employees I spoke to misquoted their Fair Use Policy. In fairness (pardon the pun), the policy differs depending on whether one refers to the limited or unlimited package, and the policy they referred to seemed to be the limited one.

Nevertheless, a simple check of my account details should have confirmed my package and its terms and conditions (which I told each employee at the start of the conversation).

3. Zain’s Support Philosophy: After over a dozen phone calls and chats, the one recurring theme was the assumption by the agent that I was wrong, and it was his job to set me straight. Eventually, the Zain employee relented, but the burden of proof was always on me.

This “customer is always wrong” philosophy is counterproductive. At best, the client will persist (as I did) and pursue the issue to resolution in spite of the service provider’s lack of, well…service.

At worst, the customer “learns his lesson”, accepts that he’s been cheated and takes his business elsewhere. Based on conversations I’ve had with many Saudis, the latter tends to be the prevailing outcome.

4. Zain’s Approach to Ownership and Empowerment: This last issue ties it all together, and represents my biggest issue with Zain. The best any employee could do was empathize with my situation, whether he was an agent operator or the Operations Manager himself. Not one of them was empowered to do anything beyond conveying their apology and regret for my inconvenience.

Even the “technical support” agents I spoke with have thus far been powerless to resolve my issue. Further, not one of them had made any attempt to solve this issue creatively – for instance, canceling my old service and refunding the balance, then starting a new, working subscription to get me back online.

Not one employee had the power to credit back the week I’ve lost in downtime, nor did this request seem reasonable to any of them. This tells me that either most customers don’t expect to be reimbursed for downtime due to a Zain accounting error, or that Zain’s policy forbids its employees from accepting blame and responsibility for the mistakes of their systems.

It’s almost 1:00 am here now, and into day 8, so I think I’m going to wrap up this post. Hopefully I’ll be able to add something positive tomorrow.

If any of my readers are Zain employees, or have worked for an ISP in Saudi Arabia or the Middle East, I highly encourage you to share your thoughts and insight!