Take a Walk through your SharePoint Farm

When tasked with upgrading our long-neglected ‘intranet’ last year, my first job was to work out just how much data was out there and what needed to be upgraded.

The masterpage had been butchered some time in the past, so most of the pages were missing navigation, making it hard to follow sites down the hierarchy. And what a hierarchy! The architects of the original instance apparently never worked out that you could have more than one document library per site, or that you could create folders. The result is the typical site sprawl. To add to the fun, some sites were created using a custom template that no longer works and others didn’t have any files in them at all.

In order to create a list of all the sites and how they relate, you can use a PowerShell script:

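[code lang="PowerShell"]
# Load the SharePoint object model (run this on a SharePoint server)
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SharePoint") > $null

# Returns a handle to the site collection at the given URL
function global:Get-SPSite($url){
    return New-Object Microsoft.SharePoint.SPSite($url)
}

# Outputs "ID;ParentID;Title;URL" for every subsite, depth first
function Enumerate-Sites($website){
    foreach($web in $website.GetSubwebsForCurrentUser()){
        [string]$web.ID+";"+[string]$web.ParentWeb.ID+";"+$web.Title+";"+$web.Url
        Enumerate-Sites $web
        # Each subsite is disposed exactly once, after its own subtree is done
        $web.Dispose()
    }
}

# Change this variable to your site collection URL
$siteCollection = Get-SPSite "http://example.org"

$start = $siteCollection.RootWeb

Enumerate-Sites $start

# Disposing the site collection also releases its root web
$siteCollection.Dispose()
[/code]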

It’s actually pretty simple since we take advantage of recursion. It’s pretty much a matter of getting a handle to the site collection’s root site and then, for each subsite, outputting its GUID and its parent site’s GUID followed by the human-readable title and URL. Then you do the same for that subsite, and so on down the branch.

The reason we’re outputting the GUIDs is that we can use them to relate each site to the rest. The script outputs straight to the console, but if you pipe the output to a text file you can use it as the input to an org-chart diagram in Visio. The results are terrifying:
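As a rough sketch of that step, the semicolon-separated lines can be turned into a CSV with named columns, which Visio's org-chart import should be able to read as delimited text. The helper name, the column names and the file names are all my own inventions, and a site title containing a semicolon would need extra handling.

```powershell
# Hypothetical helper: turns "ID;ParentID;Title;URL" lines into objects
function ConvertTo-SiteRecord([string[]]$lines){
    $lines | Where-Object { $_ -match ';' } |
        ConvertFrom-Csv -Delimiter ';' -Header 'SiteId','ParentId','Title','Url'
}

# Sample lines standing in for the real script output
$sample = @(
    'aaa;root;Finance;http://example.org/finance',
    'bbb;aaa;Budgets;http://example.org/finance/budgets'
)
$records = ConvertTo-SiteRecord $sample
$records | Format-Table -AutoSize

# With real output saved to sites.txt, something like:
# ConvertTo-SiteRecord (Get-Content .\sites.txt) | Export-Csv .\sites.csv -NoTypeInformation
```

The SiteId and ParentId columns are what tie each box in the chart to its parent.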

[Image: Visio org-chart diagram of the site hierarchy]

Each node on that diagram is a site that may have one, or thousands of documents.  Or nothing.  Or the site just may not work.  As it turned out, when asked to prioritize material for migration, the stakeholders decided it would be easier just to move the day-to-day stuff and leave the old farm going as an ‘archive’.  Nothing like asking a client to do work to get your scope reduced!

As a final note on this script, it is recursive so could (according to my first-year Comp. Sci. lecturer) theoretically balloon out of control and consume all the resources in the visible universe, before collapsing into an ultradense black hole and crashing your server in the process. You’d have to have a very elaborate tree structure for that to happen, though, in which case you’d probably want to partition it off into separate site collections anyway.
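If the recursion ever did worry you, one alternative is to walk the tree with an explicit stack instead. The sketch below is generic rather than SharePoint-specific: Get-TreeNode and the toy hashtable tree are made up for illustration. For a real farm the script block would return something like $w.GetSubwebsForCurrentUser(), and you would still need to dispose each web once you had finished with it.

```powershell
# Generic depth-first walk using an explicit stack instead of recursion.
# $GetChildren is a script block that returns the child nodes of a node.
function Get-TreeNode($root, [scriptblock]$GetChildren){
    $stack = New-Object System.Collections.Stack
    $stack.Push($root)
    while($stack.Count -gt 0){
        $node = $stack.Pop()
        $node   # emit the current node
        foreach($child in @(& $GetChildren $node)){
            $stack.Push($child)
        }
    }
}

# Toy tree built from hashtables, standing in for a site hierarchy
$tree = @{ Name = 'root'; Children = @(
    @{ Name = 'a'; Children = @( @{ Name = 'a1'; Children = @() } ) },
    @{ Name = 'b'; Children = @() }
) }
$names = Get-TreeNode $tree { param($n) $n.Children } | ForEach-Object { $_.Name }
$names -join ','
```

The stack lives on the heap, so the depth of the hierarchy no longer eats into the call stack.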

More on Web Log Analysis

In my previous post on web log analysis, I described a PowerShell wrapper script for LogParser.exe, which lets you run SQL-style queries against text logfiles. Today I have another script which wraps that script and runs as a scheduled task to send the filtered logs to the client each month.

[sourcecode language="powershell"]
#GenerateLogAnalysis will query IIS logfiles and output logs for PDF downloads from the first until
#the last day of the previous month

#function that performs the actual analysis
function RunLogAnalysis(){

$command = "c:\users\daniel.cooper\desktop\scripts\queryLogs.ps1 -inputFolder {0} -outputFile {1} -startDate {2} -endDate {3} -keyword {4}" -f $inputFolder, ($outputPath+$outputFile), (ConvertDateToW3C($startDate)), (ConvertDateToW3C($endDate)), "elibrary"
$command
invoke-expression $command

$emailBody = "<div style=""font-family:Trebuchet MS, Arial, sans-serif;""><img src=""http://www.undp.org/images/cms/global/undp_logo.gif"" border=""0"" align=""right""/><h3 style=""color:#003399;"">Log Analysis</h3>A log analysis has been run on the eLibrary for PDF files for "+$monthNames[$startDate.month-1]+" "+$startDate.Year+"<br/>Please find it attached.</div>"

sendEmail "recipient@example.org" "sender@example.org" "eLibrary Log Analysis: $outputFile" ($outputPath+$outputFile) $emailBody
}

function ConvertDateToW3C($dateToBeConverted){

return "{0:D4}-{1:D2}-{2:d2}" -f $dateToBeConverted.year, $dateToBeConverted.month, $dateToBeConverted.day;

}

function sendEmail($toAddress, $fromAddress, $subject, $attachmentPath, $body){

$SMTPServer = "yourMailServer"

$mailmessage = New-Object system.net.mail.mailmessage
$mailmessage.from = ($fromAddress)
$mailmessage.To.add($toAddress)
$mailmessage.Subject = $subject
$mailmessage.Body = $body

$attachment = New-Object System.Net.Mail.Attachment($attachmentPath, 'text/plain')
$mailmessage.Attachments.Add($attachment)

$mailmessage.IsBodyHTML = $true
$SMTPClient = New-Object Net.Mail.SmtpClient($SmtpServer, 25)
$SMTPClient.Send($mailmessage)
$attachment.dispose()
}

#Current Month
$currentDate = Get-Date
#Month names for the email body and the output filename
$localDateFormats = new-object system.globalization.datetimeformatinfo
$monthNames = $localDateFormats.monthnames
#Generate first day of last month as a date
$startDate = $currentDate.AddMonths(-1).addDays(-$currentDate.AddMonths(-1).day+1)

#Generate last day of last month as a date
$endDate = $currentDate.AddDays(-$currentDate.day)

#Set the initial parameters
$inputFolder = "c:\temp\www.snap"
$logName = "SNAP"
$outputFile = "LogAnalysis_"+$logName+"_"+$startDate.year+$monthNames[$startDate.month-1]+".csv"
$outputPath = "C:\Users\daniel.cooper\Desktop\"

#RunLogAnalysis takes no parameters; it reads the variables set above
RunLogAnalysis
[/sourcecode]

What’s happening here is that RunLogAnalysis() is the main controller function. What it does is set up the command to run the queryLogs.ps1 script mentioned in the previous post, wait until it has run and then email the result off. We have another function, ConvertDateToW3C, which takes a date and formats it as yyyy-MM-dd, the W3C format that LogParser.exe likes. sendEmail() is pretty straightforward: it’s a generic email-sending function.

After the functions we have a little code to set up parameters. My task was to email the client the last month’s logs for PDF downloads on the first of each month. To do this we get last month’s name (for the output filename), the date of the first day of last month and the date of the last day of last month.
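The date arithmetic is the fiddly bit, so here is a small sketch of one way to isolate it. Get-PreviousMonthRange is a hypothetical helper, not part of the script above; it works from the first day of the current month, which sidesteps any worry about month lengths.

```powershell
# Hypothetical helper: first and last day of the month before $reference
function Get-PreviousMonthRange([datetime]$reference){
    $firstOfThisMonth = New-Object DateTime($reference.Year, $reference.Month, 1)
    New-Object PSObject -Property @{
        Start = $firstOfThisMonth.AddMonths(-1)
        End   = $firstOfThisMonth.AddDays(-1)
    }
}

# A fixed reference date makes the result easy to check by hand
$range = Get-PreviousMonthRange (Get-Date '2012-03-15')
"{0:yyyy-MM-dd} to {1:yyyy-MM-dd}" -f $range.Start, $range.End
```

In the scheduled script you would pass Get-Date with no arguments instead of a fixed date.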

After parameter generation is done, we perform the log analysis and email the result.  This is created as a scheduled task on the webserver and we’re done.
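For what it’s worth, the scheduled task can be created from the command line too. This is only a sketch: the task name, script path and start time are placeholders, and it assumes PowerShell 2.0 or later for the -File switch.

```powershell
# Assumed task name, script path and start time; adjust for your server.
# Runs at 06:00 on the first day of every month.
schtasks.exe /Create /TN "MonthlyLogAnalysis" /SC MONTHLY /D 1 /ST 06:00 `
    /TR "powershell.exe -NoProfile -File C:\scripts\GenerateLogAnalysis.ps1"
```

The account the task runs under needs read access to the log folder and permission to relay mail through your SMTP server.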

Weblog Madness!

In these days of Google Analytics it’s a bit passé to talk about boring old web server logs. But there are still good reasons to go diving into the big text files generated by IIS or Apache. In my poorly paid and humiliating day job I was recently asked to find out how popular our PDF publications were. The trouble is that, being a traditional-style organisation, most staff members think of the internet as an email medium and send links to PDFs via email ‘blasts’. Downloads this way can’t be picked up via the standard Google Analytics JavaScript-based tags. We have a central library of publications and I made landing pages, but that’s a bit like closing the gate after the horse has bolted. What about last year’s traffic?

The only true answer is to go look at the actual logs of what files were served to whom and when. The data’s all in there! There are only two problems:

  1. Those are some big-ass files to filter
  2. Lots of downloads are by spiders, rather than people.

Problem #1 is pretty easy to fix. Microsoft provides a command-line tool, LogParser.exe, as part of its IIS5 administrator’s toolkit (you can Google that), which will let you do SQL-like queries against W3C format logfiles (and lots of other log formats). Problem #2 is a bit more work. Using the user-agent header of an HTTP request we can spot the spiders and filter them out, but there are a great many of them! Building the WHERE clauses for the query by hand is a major effort and you risk missing a bracket or comma somewhere.

The solution, as it is to all life’s big problems, is automation or, more specifically, scripts. PowerShell this time…

[sourcecode language="powershell"]
param([string]$inputFolder = "none", [string]$outputFile = "none", [string]$startDate = "none", [string]$endDate = "none", [string]$keyword = "none")

#Query Logs is a wrapper for LogParser.exe which allows SQL-like queries to logfiles
#It is set to query IIS logfiles for PDF downloads, to filter out web spiders and output in a useful format
#With parameters it can output a date range and filter on a keyword within the PDF filename.

function output-help(){
    "USAGE: .\queryLogs.ps1 -inputfolder xxx -outputfile xxx [-startdate xxx] [-enddate xxx] [-keyword xxx] "
}

#Bail out if either mandatory parameter still has its default value
switch("none"){
    $outputFile {
        "No output file specified!"
        output-help
        exit
    }
    $inputFolder {
        "No input folder specified!"
        output-help
        exit
    }
}

#Builds a clause that excludes user agents containing the given string
function buildRobotExcludeStatement([string]$botname){
    "INDEX_OF(TO_LOWERCASE(cs(User-Agent)), TO_LOWERCASE('$botname')) = null"
}

$logparserLocation = "LogParser.exe"

$selectStatement = "SELECT date, cs-uri-stem, cs-uri-query, c-ip, cs(User-Agent)"
$fromStatement = "FROM $inputFolder\*.log TO $outputFile"

$whereStatement = "WHERE sc-status = 200 AND cs-method = 'GET' AND INDEX_OF(cs-uri-stem, '.pdf') > 0"

if($startDate -ne "none"){
    $whereStatement = "$whereStatement AND date >= '$startDate'"
}

if($endDate -ne "none"){
    $whereStatement = "$whereStatement AND date <= '$endDate'"
}

#Keyword filter reconstructed from the -keyword parameter; assumes a substring match on the request path
if($keyword -ne "none"){
    $whereStatement = "$whereStatement AND INDEX_OF(cs-uri-stem, '$keyword') > 0"
}

$whereStatement = "{0} AND {1}" -f $whereStatement, (buildRobotExcludeStatement "http:")
$whereStatement = "{0} AND {1}" -f $whereStatement, (buildRobotExcludeStatement("robot"))
$whereStatement = "{0} AND {1}" -f $whereStatement, (buildRobotExcludeStatement("xenu"))
$whereStatement = "{0} AND {1}" -f $whereStatement, (buildRobotExcludeStatement("vse"))
$whereStatement = "{0} AND {1}" -f $whereStatement, (buildRobotExcludeStatement("urlchecker"))
$whereStatement = "{0} AND {1}" -f $whereStatement, (buildRobotExcludeStatement("TimKimSearch"))
$whereStatement = "{0} AND {1}" -f $whereStatement, (buildRobotExcludeStatement("Jakarta+Commons"))
$whereStatement = "{0} AND {1}" -f $whereStatement, (buildRobotExcludeStatement("bot"))
$whereStatement = "{0} AND {1}" -f $whereStatement, (buildRobotExcludeStatement("spider"))
$whereStatement = "{0} AND {1}" -f $whereStatement, (buildRobotExcludeStatement("yandex"))
$whereStatement = "{0} AND {1}" -f $whereStatement, (buildRobotExcludeStatement("Xerka"))
$whereStatement = "{0} AND {1}" -f $whereStatement, (buildRobotExcludeStatement("www"))
$whereStatement = "{0} AND {1}" -f $whereStatement, (buildRobotExcludeStatement("crawler"))
$whereStatement = "{0} AND {1}" -f $whereStatement, (buildRobotExcludeStatement("HttpComponents"))
$whereStatement = "{0} AND {1}" -f $whereStatement, (buildRobotExcludeStatement("leecher"))

#Assemble the LogParser command line, echo it and run it
$parameters = "-i:IISW3C `"$selectStatement $fromStatement $whereStatement`" -o:csv"
$command = "$logparserLocation $parameters"
$command
"(running)"
iex $command
[/sourcecode]

You can cut that code out and take it to the bank. What it does is construct the command line for LogParser.exe and set it on its way. You can give it the start and end date, the folder your *.log files are in, the name of an output file (it outputs *.csv) and even a keyword in the request URI path. Your final output is a CSV file of all the PDFs (that’s hardcoded for the moment) that have been downloaded for, say, 2011. In Excel you can perform some easy data analysis (pivot tables are a must) and find the true number of downloads of PDF documents from your site.

If you run this, check the user-agents and add any robots you find to the WHERE clause builder. The signatures included are just the ones we get. It’s handy to compare this with your Google Analytics traffic to see what’s not getting recorded.
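If you only want a quick count per file and can’t be bothered opening Excel, something like the following would do it. It’s a sketch that assumes the CSV column headers match the field names in the SELECT statement; the sample rows are invented.

```powershell
# Sample rows standing in for the CSV written by queryLogs.ps1
$csv = @'
date,cs-uri-stem,cs-uri-query,c-ip,cs(User-Agent)
2011-03-01,/elibrary/report.pdf,-,10.0.0.1,Mozilla/5.0
2011-03-02,/elibrary/report.pdf,-,10.0.0.2,Mozilla/5.0
2011-03-02,/elibrary/guide.pdf,-,10.0.0.1,Mozilla/5.0
'@
# With a real file you would use Import-Csv on the output file instead
$rows = ($csv -split "`r?`n") | ConvertFrom-Csv

# Count downloads per file, most popular first
$counts = $rows | Group-Object -Property 'cs-uri-stem' |
    Sort-Object -Property Count -Descending |
    Select-Object Name, Count
$counts
```

Grouping on c-ip as well would give a rough idea of unique downloaders rather than raw hits.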
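Adding a robot means adding yet another near-identical line, so one possible tidy-up is to keep the signatures in a list and loop over them. This is a sketch of that idea, reusing the clause builder from the script; the shortened signature list and base WHERE clause are just for illustration.

```powershell
# Same clause builder as in the script above
function buildRobotExcludeStatement([string]$botname){
    "INDEX_OF(TO_LOWERCASE(cs(User-Agent)), TO_LOWERCASE('$botname')) = null"
}

# Robot signatures live in one list; add new ones here
$botSignatures = @("http:", "robot", "xenu", "bot", "spider", "crawler")

$whereStatement = "WHERE sc-status = 200 AND cs-method = 'GET'"
foreach($bot in $botSignatures){
    $whereStatement = "{0} AND {1}" -f $whereStatement, (buildRobotExcludeStatement $bot)
}
$whereStatement
```

The list could just as easily be read from a text file with Get-Content, so that non-scripters can maintain it.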

Of course a smarter way to do this would be to build an ISAPI filter that listened for PDF requests and fired off an event to Google Analytics recording each download but that seems like a lot of work given that my colleagues don’t have access to GA, wouldn’t know what to do with it if they did and would probably prefer the data in Excel format anyway.

Your client does not support opening this list with Windows Explorer

You’re not supposed to see the above error message unless you’re browsing your SharePoint site from a server, using XP SP2, Vista or IE6.

If you’re on Windows 7 and SharePoint 2010 like me and you get this then you have a problem with your connection to the server, probably at the authentication layer.

The easiest way to fix it is:

  • Click Start
  • Right-Click ‘Computer’
  • Select ‘Manage’
  • Expand ‘Services & Applications’
  • Click ‘Services’
  • Find WebClient and restart.
If it works after that then it’s just a hiccup.  If it still won’t work you need to start googling for WebDAV ports and suchlike.
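The same restart can be done from PowerShell, which is handy if you end up doing it often. A minimal sketch, assuming you run it from an elevated prompt on the client machine:

```powershell
# Restart the WebClient (WebDAV) service, then check its status
Restart-Service -Name WebClient
Get-Service -Name WebClient
```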

Migrating to SharePoint 2010

Upgrades can be a titanic pain and a platform with as many moving parts as SharePoint means you’re in for a lot of headaches.

If you’re upgrading Office or even Windows, it’s usually just a matter of sticking a DVD in your machine, hitting OK a few times and going for a coffee.

The first problem with upgrading to SharePoint 2010 is its requirements: it has to be on Windows Server 2008 64-bit, so you may find yourself upgrading the OS in the first place.

I’ve got a good idea!

Because of this, the upgrade task seems like a good opportunity to upgrade your hardware as well.  For example, we moved our two farms onto a single, virtualised farm.

The trouble starts at the planning stage. If you’re moving from an old farm to a new one, you’re not upgrading, you’re migrating, and pretty much all the support out there is for upgrades.

Two Paths to Follow

SharePoint 2010 gives you two options for upgrading, a database attach upgrade or an in-place upgrade.  We’re doing a database attach because upgrading production servers (which are a mess) into the unknown sounds like a lot of weekends spent in the office.

With a database attach upgrade you back up a content database from your 2007 farm, restore it on your new database server, create a receiver web application on your target farm and then go into PowerShell to mount the new database into the app you just made. This grinds away for a while, after which you have your old website on your new server. Great! It can even look the same, as you can pull the 2007 masterpages along, or you can elect to upgrade the ‘user experience’ during the database attach.
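Roughly, the PowerShell end of a database attach looks like the sketch below. The database, server and web application names are placeholders for your own, and it assumes the backup has already been restored on the new SQL Server.

```powershell
# Run in the SharePoint 2010 Management Shell on the new farm.
# Database, server and URL names below are placeholders.
$dbName   = "WSS_Content_Intranet"
$dbServer = "SQL01"
$webApp   = "http://intranet.example.org"

# Report missing features, templates and web parts before attaching
Test-SPContentDatabase -Name $dbName -ServerInstance $dbServer -WebApplication $webApp

# Attach and upgrade; add -UpdateUserExperience to switch to the 2010 look
Mount-SPContentDatabase -Name $dbName -DatabaseServer $dbServer -WebApplication $webApp
```

Running the test step first saves a lot of grief, since it lists the custom bits the new farm is missing before anything is changed.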

And then the Trouble Began

So this is all good, provided you don’t have any custom code, solutions or files on the old farm that aren’t on the new farm, and provided you’re happy with the new farm where it is, as it is.

We have two 2007 farms, one for intranet and one for extranet.  We’re merging them and redesigning the main site, along with lots of other changes, so we’re moving sites and lists out of the upgraded web application into a new application/site collection.

Now, within a single site collection you can use the Content and Structure tool to move sites, lists and items about.  But if you want to move something between site collections, let alone web applications, it’s a bit trickier.

PowerShell to the Rescue

If you can’t do PowerShell, you can’t manage SharePoint 2010. It just can’t be done. Now, there should be a command to move a site or list, right? Something like Move-SPWeb or some such?

I’m afraid not. You can get some fancy-pants software to do that but it costs an arm and a leg, one testicle and a handful of teeth, particularly if you have a lot of material to move or lots of servers in your farm. Plus you have to install proprietary APIs.

So you have to use PowerShell, specifically the Export-SPWeb and Import-SPWeb cmdlets.

The Import/Export Obstacle Course

Here’s the first problem: you can’t just import into an empty URL.  You have to create a site on your target site collection, then perform the import.  OK, that’s not a problem, I’ll just create a blank site and import into that.

Experienced SharePoint administrators will immediately see a problem with that.  The site you create to import your material into has to be the same site template as the one you’re exporting from.  As a bonus, you can’t tell beforehand what template a site was made with (can anyone correct me on this?)  Luckily the import function will tell you what template you need in the error message.  God help you if your original site was some crazy template that doesn’t exist on your new farm.
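For what it’s worth, I believe the template can be read off the source site before exporting, via the WebTemplate and Configuration properties of the web. The sketch below strings the whole move together under that assumption; every URL and the export path are placeholders.

```powershell
# Run in the SharePoint 2010 Management Shell. URLs and paths are placeholders.
$source = Get-SPWeb "http://oldapp.example.org/sites/team"
# WebTemplate plus Configuration gives the template ID, in the form STS#1
$template = "{0}#{1}" -f $source.WebTemplate, $source.Configuration
$source.Dispose()

Export-SPWeb -Identity "http://oldapp.example.org/sites/team" -Path "C:\export\team.cmp"

# Create the empty receiver site with the matching template, then import
New-SPWeb -Url "http://newapp.example.org/team" -Template $template
Import-SPWeb -Identity "http://newapp.example.org/team" -Path "C:\export\team.cmp"
```

If the template doesn’t exist on the new farm, New-SPWeb is where it will fall over, which at least happens before the import starts.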

The other irritant is that you end up with a bunch of blank and duplicate lists.  I hate clutter in SharePoint so it’s a fair bit of work to clean this up.

If my only Tool was a Hammer

Being a lazy soul, if I find myself doing something more than once I’ll look for a way to automate it. It’s lucky I work with computers, hey? Since I’m punching all these commands into the PowerShell console, I may as well just save them to a file. That’s what I’ve done and it seems to work OK now. I shall post the completed script in my next post.

 

 

SharePoint 2010 Scripted Install

I’m really getting into the scripted SharePoint 2010 install hosted at CodePlex. It’s great because whenever something goes wrong with the config and install process, I can roll the machine back and start clean, instead of carrying forward every bug. This is very important to me as the 2007 instances that we support at the moment are what we developers call “a bloody mess”.

The project started on SharePoint 2003, got upgraded to 2007 and a bunch of custom web parts were made to duplicate the existing functionality. It wouldn’t do to have an application that didn’t cost more, do less and have more bugs than the off-the-shelf product.

So this 2010 project is a fresh start and I wanted to make sure that this instance was rock-solid and not as spotty as some of my earlier attempts.

The trouble is that there’s more than one path to walk when running up a new instance. At its most basic, SharePoint can be installed on your workstation, with the cut-down, embedded SQL Server Express. At its most complex you have a least-privileges install, which is best practice and nice and secure. Actually, if you’re installing SharePoint to host lots of isolated customers (tenants) it’s more complex, but we don’t have to worry about that.

Once your binaries are installed, you’re given the option of running the config wizard. The wizard is fine for your dev machine but will really mess up a production environment. The trouble is that the wizard starts a bunch of services and so on that you can’t access via Central Admin, so to make a proper instance you need to get into PowerShell.

Since we’re in PowerShell and we’re probably going to fluff the first few goes at installing the instance, one may as well script the steps. Luckily, the folks over at AutoSPInstaller have already done all the hard work.

What the script will do is create a fairly typical SharePoint 2010 instance and, most importantly for me, configure and start the thorny User Profile Synchronisation (UPS) service.

One of the best things is that you can run the script many times without it breaking your existing instance.  Got an error when installing a service?  Fix it and run the script again.

I’ve made some extensions to help automate the config — the script will get you to the point where everything exists but you need to configure it to make it work in your environment.  What I’d like, ideally, is a script that would do the install and all the config, so that I have a known, properly configured instance that I can reproduce exactly and quickly.

I’ll write a bit more and give more detail in a future post.

 

Learn from my stupid mistakes

I’ve been playing around with scripted installs for SharePoint 2010. Given the install process is lengthy, tedious and very easy to bugger up, having a script that performs a least-privilege install automatically is pretty attractive.

Over at CodePlex they’ve got AutoSPInstaller, which is on v2 of the 2010 script. This takes an XML config file with your service account passwords and so on and goes through all the punishment of a manual install on your behalf.

Great! Except the PowerShell script kept getting access denied errors. That’s odd. I opened up regedit and had a look at the permissions. OK, the script is running as me, I’m in the local domain users group, admins have full rights. WTF?

Now, it’s been a while since I’ve done anything with scripts other than fool around with the filesystem, AD or a compliant SharePoint install, and I’ve only worked on this machine once before.  Luckily I’ve banged my head against this particular wall before.

I’m an administrator… but I still need to run the script with ‘Run as administrator’.

Much better now.
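To save the next person the head-banging, a script can check for elevation itself and bail out with a sensible message. This is a small sketch of that check using the standard .NET security classes; it isn’t part of AutoSPInstaller as far as I know.

```powershell
# Stop early unless the current session is elevated
$identity  = [Security.Principal.WindowsIdentity]::GetCurrent()
$principal = New-Object Security.Principal.WindowsPrincipal($identity)
if(-not $principal.IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator)){
    throw "Please re-run this script from a PowerShell prompt started with 'Run as administrator'."
}
```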

Of course, on the first run the script found the development database server is full and punched out.  Oh well.

 

MOSS/SharePoint Upload fails with file error 0x80070021

More upload nonsense!  Now we can upload large files, and all is well until a user turns up trying to upload a 173MB video file off a DVD-R.  It keeps failing and I assume it’s the DVD but it copies to my machine OK, but fails to upload to the document library.

OK, I open up the library in explorer view and drag and drop. That gives a file locking error with the code 0x80070021. Googling didn’t do much good as this error apparently pops up most frequently with Outlook *.pst files.

Luckily my colleague is a SharePoint veteran and I asked him if he’d seen this error before. Within a few minutes he’d concluded that the database was full. As it turned out, truncating the log file did the trick.

Here’s the script to do that:

[code lang="sql"]
BACKUP LOG wss_content_go_snap_undp_org WITH TRUNCATE_ONLY
GO
DBCC SHRINKFILE (2,1, TRUNCATEONLY)
GO
[/code]

Fixing SharePoint/MOSS uploads

Sometimes our users want to upload video to document libraries, which is understandable as there’s no fileshare for them to use. The trouble has been in the past that the upload limit is set to 50MB. This being the 21st century I thought I’d make life a little more rational and change it to 200MB or so. So! Into Central Admin, find the max upload and we’re away.

Except, of course, we’re not because it didn’t work.

Consulting the Google oracle I found this:
http://www.sharepointboris.net/2009/02/upload-bug-in-moss-2007-windows-server-2008/

Which led me to:
http://support.microsoft.com/kb/925083

Now, with all the changes made to web.config, I’m still getting an error saying the file is larger than allowed. Uh oh. Back to Central Admin, back to the setting, which is under Application Management->Web Application General Settings. Turns out I had the wrong web application selected.
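A quick way to avoid that particular oversight is to list every web application with its current limit before touching anything. The sketch below assumes it is run on a SharePoint server and reads the MaximumFileSize property (in MB) from the object model.

```powershell
# Run on a SharePoint server; lists each web application with its upload limit in MB
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SharePoint") > $null
$service = [Microsoft.SharePoint.Administration.SPWebService]::ContentService
foreach($webApp in $service.WebApplications){
    "{0}: {1} MB" -f $webApp.Name, $webApp.MaximumFileSize
}
```

If the number next to the web application you care about hasn’t changed, you were editing a different one.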

Oh well. Stupid errors caused by stupid oversights again.