VS SSRS Project Gotchas

I now present to you a recent adventure in building & configuring an SSRS 2016 / VS2015 project to replace our aging SSRS 2008R2 / VS2008 setup.  In trying to make things 'better', I wanted to centralize the storage of the Data Sources.  I found a tip on StackOverflow with a neat (albeit hacky) idea on how to do this.  So I started diving into it.  Here's how that went.

The Setup

I have a bunch of published reports on a ‘real’ report server (SSRS instance), which were created off-the-cuff using Report Builder 3.0 (no source-control connection or anything).  So the first thing I had to do was go and download each RDL (thanks, Microsoft… ugh!), to my local VS2015 project folder, so I could bring them into my SSRS project.  I didn’t bother trying to download/copy the Data Sources (they’re basically just connection-strings, and anyway you can’t download them from Report Manager), so I settled for re-creating those in VS.

Then it was time to set up the solution & projects.  I prefer to organize my reports in context-relevant folders, like "Operations", "Marketing", etc.  In order to do this sensibly, within the VS solution, you need to create a Project for each major folder you'll have.  Makes sense; this is how the old solution was constructed too.  Here's where I tried to improve things:  I created a "Datasources" project, which would house just the shared Data Sources.  In it, I created my new data-sources, setting them up with saved SQL auth (login & pwd).  Sure, maybe not the "best practice" but it keeps things simple — my SSRS user only ever has read-only db access, and the pwd is managed well enough to keep auditors happy.

Thus, my plan (in following the SO tip) was to have all the other projects' Data Sources be pointers to these shared Data Source files (RDS's).  Sounds good in theory, right?

[image: what could possibly go wrong / ooh, sparkly!]

Starting to Implement

Well.  I started small with just one project, one report, and one data source.  I went into my Nate test project, did “add existing item” into the Reports folder, and browsed to pick the rdl that I had just downloaded from my live SSRS server.  I then went to my Datasources project, copied the rds that I needed, pasted it into Nate test project’s Shared Data Sources folder.

Now at this point there are 2 copies of this file in my solution folder.  We don’t want that.  So I opened up trusty ol’ Notepad++ on the Nate test.rptproj file, and edited the XML node under <DataSources><ProjectItem><Name>My-Source.rds</Name><FullPath>My-Source.rds</FullPath></ProjectItem></DataSources> , changing the FullPath node value to <FullPath>..\Datasources\My-Source.rds</FullPath>.  I then deleted the physical copy of the file at \MySolution\Nate test\My-Source.rds , so that only the shared one physically remains in the filesystem (\MySolution\Datasources\My-Source.rds).

Another way to accomplish this, you may read, is to right-click Shared Data Sources within your project, e.g. Nate test, and say "Add existing item…", and go select the global shared RDS from your master Datasources project-folder.  However, this still results in a physical copy of the file within the Nate test folder!  So in either case you'll end up doing some editing & deletion.

With me so far?  Okay, here’s where it gets interesting.  I could successfully build and deploy this report to my SSRS server, verifying that it showed my “new thing” (I added a text-box that simply said “This is the new hotness!”).  But I could not preview the report in VS!  Sad panda.  And of course, the most vague generic error message in the world:

An error occurred during local report processing
An error occurred during report processing

[image: sarcasm-smiley / How specific and informative!]

More Googling led to a somewhat random tip in the vein of "Go to the report properties, open the report's Data Source, and re-select the Shared Data Source from the dropdown".  Combine that with "Edit the Shared Data Source, ensure your saved login/pwd is filled in, and Save it".  Presto!  Previewing now worked.

But why is this?  Well, as it turns out, it's not all that surprising.  The RDS files are stored without the saved login/pwd, because of course they're just plain-text XML files.  So to "protect you", MSFT doesn't store your SQL auth credentials, even if (and no matter how many times) you click "Save my password" on the connection-properties window (connection-string builder).

Great.  Well, I did that once, for the Nate test project.  Let's see what happens with another project (aka report folder), say Test again.  So I add an existing RDL, I copy-paste the Data Source from the global shared Datasources project, manually edit the rptproj in Notepad++, delete the copied rds, etc.  I then try to preview the report… and behold, the same maddeningly vague error message!

"Well duh!" you may think to yourself, "the credentials still aren't stored anywhere un-encrypted!".  Yes, dear reader, you are correct.  But do I want to right-click on this project's Data Source, edit it, fill in the credentials, save, rinse, repeat, for every single project?!  A resounding NO!

[image: nobody-got-time-old-fashioned / ANGTFT]

So what should I do?  I could go type in the credentials to the actual connection-strings contained within the ‘master’ rds files.  Of course then they’re in plain-text for all to see… but wait, do I care?  Another no.  Let’s go edit some connection-strings!

As a reminder, here’s what they will look like:

<ConnectString>Data Source=MyServer;Initial Catalog=MyReportDB;User ID=MyReportUser;Password=ThisIs@nAw3s0meP@ssw0rd!;</ConnectString>

Why did MSFT choose to name the XML node ConnectString in their rds schema instead of ConnectionString?  Don’t ask me!

The Gotchas

Here's the first gotcha:  my reports (RDLs) need to be re-pointed at the correct Shared Data Source, because currently, as they were downloaded from my 'real' SSRS server — which had my desired top-level folder layout of "Datasources", "Nate test", "Marketing", etc. — their Data Sources XML node points at \Datasources\MyDatasource.  This was the correct path on the SSRS server and still will be the correct path when all is said & done.  BUT, to support local debugging/previewing, the build output has to make a copy of the rds in the Project's bin\Debug folder, which it will happily do, and then just as happily proclaim Could not find a part of the path 'C:\Users\Nate\Documents\Visual Studio 2015\Projects\MyReportSolution\Nate test\bin\Debug\Datasources\MyDatasource.rdl'. — because DUH, the build system didn't make you a 'Datasources' sub-folder under the Debug folder, why would it!?  You can fix this either by manually editing the rdl file to remove the \Datasources\ path from the node, or by clicking on the Report's Data Sources node in the Report Data pane and re-pointing it at the local Shared Data Source (which, again, is itself a pointer to the globally shared Datasources!).

In the rdl XML, that looks like this (the \Datasources\ prefix in the DataSourceReference is the part you remove):

<DataSources>
  <DataSource Name="MyReport_DataSource">
    <DataSourceReference>\Datasources\MyDatasource</DataSourceReference>
    <rd:SecurityType>None</rd:SecurityType>
    <rd:DataSourceID>gobbledegook-guid</rd:DataSourceID>
  </DataSource>
</DataSources>

And there’s another gotcha:  After we do all this, if we open the rds from within VS solution-explorer, it still won’t “think” it saved the authentication!  But the connection-string will be clearly visible with the plain password.  Here’s a couple pictures of what that looks like.

[image: shared-datasource-properties-ssrs-vs2015 / Data Source properties]
[image: datasource-connection-properties-ssrs-vs2015 / Connection properties of the Data Source]

Oh, by the way, did I mention how many times VS crashed on me while doing these gymnastics?  At least a few.

Lessons Learned

So, what have we learned?  Well, for one, this is a crappy situation born of poor in-product support.  I should be able to configure Solution-level shared Data Sources, use them in as many Projects (within said Solution) as I want, and have VS configuration management support them; bonus points for doing so with saved & encrypted credentials.  Ideally, when we check this into source-control, we’d check in the “DEV” environment flavor connection-configs.  Then, when the reports get deployed to the “PROD” SSRS server, the same globally shared Data Sources are already present (and they don’t get over-written, thankfully by default!), configured by the DBA with prod credentials, and nobody in the development pipeline needs to know said credentials.  Yay?

But alas.  We live in an imperfect world.  I also live in a world where I'm the Report Developer, the Report Manager, and the Production DBA, all rolled into one.  So guess what I get to do?  That's right, not care about the connection-string security!

Okay, yes, I care a little.  Depending on my mood.  But not enough to turn my whole dang world upside-down trying to design and implement a robust multi-tiered-environment solution for one measly SSRS repository.  Kudos to you if you do — you’re a better man than I.

Or “better woman”.  But I’m not a woman.  So I guess the correct phrasing would be “You’re a better woman than I am a man.”  No, that’s not right.  “You’re a better person than I am?”  Sure, I guess that works.

In Review

To recap:

  1. Create the solution & projects
  2. Create the global shared Datasources project, and create your Data Sources (RDS's)
  3. Import your Reports (RDL's) to each requisite report-project
  4. Manually edit your rptproj to point the data-sources at the central shared path
  5. Manually edit your RDL's to remove previously established paths to the data-sources
  6. Configure your rptproj's to output to the correct folders on the report server — this should be done by default, i.e. VS has the "intelligence" to guess these for you, but just double-check.
  7. Build, Preview, and Deploy!
  8. Commit it all to source-control and let the collaboration begin.

And now to go automate all this with PowerShell… right after this beer.
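
In case you want a head start on that automation, here is a rough sketch of what it might look like. It's untested, and it assumes the node names shown above (ProjectItem/FullPath in the rptproj, DataSourceReference in the RDLs), so treat it as a starting point rather than gospel:

# Sketch: re-point each report project at the shared Datasources folder, then clean up.
# Untested; assumes the layout & node names described above. Run against a copy first!
$solutionRoot = 'C:\Users\Nate\Documents\Visual Studio 2015\Projects\MyReportSolution'

# 1. In every report project (not the Datasources project itself),
#    point each .rds ProjectItem at ..\Datasources\<name> and delete the local copy.
Get-ChildItem -Path $solutionRoot -Recurse -Filter *.rptproj |
    Where-Object { $_.DirectoryName -notlike '*\Datasources' } |
    ForEach-Object {
        $proj = [xml](Get-Content -Path $_.FullName -Raw)
        foreach ($item in $proj.GetElementsByTagName('ProjectItem')) {
            $name = $item['Name'].InnerText
            if ($name -like '*.rds') {
                $item['FullPath'].InnerText = "..\Datasources\$name"
                Remove-Item -Path (Join-Path $_.DirectoryName $name) -ErrorAction SilentlyContinue
            }
        }
        $proj.Save($_.FullName)
    }

# 2. Strip the leading \Datasources\ from each report's DataSourceReference node.
Get-ChildItem -Path $solutionRoot -Recurse -Filter *.rdl | ForEach-Object {
    $rdl = [xml](Get-Content -Path $_.FullName -Raw)
    foreach ($ref in $rdl.GetElementsByTagName('DataSourceReference')) {
        $ref.InnerText = $ref.InnerText -replace '^\\Datasources\\', ''
    }
    $rdl.Save($_.FullName)
}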

Anyway.  Until next time, folks!  If I've made any mistakes or missteps, or otherwise offended your technical sensibilities, and/or if you know of a better way to accomplish what I'm after, I'd love to hear from you in the comments!  =)


TSQL Tuesday 93: Interviews

This month's event is hosted by the fabulous DBA/SQL-consultant (& part-time cartoonist) Kendra Little! Go check out her blog, podcast, and training at sqlworkbooks.com – really great stuff.

DBA interviews are tricky.

The problem isn’t so much that the role is vaguely defined. Although, depending on the size of the IT org and the tech stack, it can vary widely from a jack-of-all (DB Dev, report writer, production ops, the works) to a highly specialized performance tuner who works with 7 other teammates, each of whom has a unique surgical specialty in the data platform. But that’s not the problem — well, not the main problem. It is a problem in the sense that the business folks, especially HR, are notoriously and astonishingly ignorant of what a DBA or related role actually involves. But you get past that once you start talking to the tech leads and IT directors.

No, the problem is that, like most higher-level technical roles, you don't really know how a candidate is going to function in it (the role) without actually seeing him or her.. IN it. Do they keep a cool head when production goes down? Do they have a solid plan of attack for the dreaded "everything is slow!" complaint-storm? Do they have a good handle on HA & DR architecture & implementation? Can you rely on them to actually practice and follow thru with those strategies? Can they be a continuous learner and keep abreast of new developments while still tempering that with wisdom & maturity, applying the correct tools to the proper problems? Do they try to add value to the team and organization by both teaching and learning from others?

These are truly difficult, complex questions that are nearly impossible to deeply assess and fully answer during an interview process. Largely because the only real evidence of their answers lies in actual experience. Sure, a cert here or an MVP there definitely helps your case. But at any rate, we try our best to chip away at the boulder.

[image: aerosmith chip away at the stone record label / it was a record. from the 70s. it's the best i could come up with. =P]

Pivoting to a more positive note, I’ll share some of the better questions that I’ve experienced during my career so far.

Some good examples.

How would you design and build a data copy/sync process across/between tiered environments, say DEV-QA-PROD?

Really great question.  This is a common problem in small-to-medium enterprises with legacy systems where DevOps hasn't quite reached down to the depths of the internal application stacks and people are still dealing with "refresh cycles" on the order of months, quarters, or even years.  You can approach it purely from a tooling perspective, but that's not the whole picture.  Thus, it calls for some thought and team-culture ideas as well as "knowing the nerd-knobs".

We have a complex process flow that involves a lot of stored procedures, say 50 total.  Some of these are non-sequential, meaning they can be executed in arbitrary order, while others need to be sequenced with each other in “blocks”.  This is a vendor product, so ultimately, the customer gets to decide the schedule and order of execution of this flow.  The solution needs to be maintainable by field engineers.  How would you handle this?

Woah.  Talk about diving down a rabbit-hole.  This is interesting in the sense that it exposes a bit of the architecture and some of the potential pain-points that the team is hoping to solve, while leaving enough room for improvement and experimentation by the hopeful candidate.  More to the point, it’s just an example of a more general technique, which to me is very effective: taking an architectural problem that actually comes from the “real world” (the company/team that’s interviewing) and asking for the candidate’s ideas on how to solve it.  You don’t need to get in-the-weeds super-detailed about it, but outlining your ideas helps indicate how you think about complex challenges and shows what kind of value-add you would bring to the team.

And finally, a perennial favorite:

Tell me about a time you broke production, and more importantly, how you addressed and resolved it.

So many stories from the trenches involve downtime and mistakes, it’s good to ‘bond’ over them.  It helps bring the egos back down to earth, and reminds us that we’re all just meatbags, making technology to do our bidding, occasionally to our own regret.  It shows the candidate’s “pressure cooker” mentality, or at least, what they tell you about it.

[image: server did what you instructed it to...you don't say? / Don't ya hate it when…]

In conclusion.

If you’re a DBA, Dev, or IT pro, help your managers better understand your team’s needs when it comes to hiring.  Get involved in the job description write-ups and screening process questionnaires.  Barge your way into those ivory towers, if you have to — or you’ll regret the time you waste on candidates who really belong in a different role than the one you’re after.

If you’re a manager, PLEASE LISTEN to your reports and tech leads.  They know what makes a good team member, they’ve been doing it for a long time.  Don’t dismiss their advice or block them from being part of the hiring process — yes, they are busy, and yes, they can be crotchety, but their input is highly valuable toward bringing in effective & productive talent.

That’s all folks!

PS: I know I missed the “deadline” by about an hour..ish.  I blame DST.  Heck, it’s still Tuesday for the majority of the Western hemisphere.  I’m not biased, but I write in English, so… ya know.  Take it as you will.  Now excuse me while I go hide from the blog-police in my ASCII-bunker.

Automating SQL Installation

..while there are likely even better ways to do this in the long-run, this quick & easy approach was sufficient to save me time and effort..

At this point in my career, I'm not managing massive environments, so I don't feel the need (nor have the expertise) to use a large-scale solution like DSC or SCCM.  But I've had to install SQL Server a few times, so I figured it's worth at least scripting out a standard pre-configured installation, so that A) I don't need to click through a GUI 'wizard' hearkening back to the '90s, and B) the SysAdmins can "fire and forget" (read: stop bugging me about it).

[image: keep it simple stupid / the patented one-eyebrow-raise..]

The Disclaimer

Thus, I’m attempting to K.I.S.S., while making it configurable & repeatable.  There are some limitations of this approach, as alluded above.  It’s not “massively scalable” (scaleable? scale-able?) because:

  1. The PoSh script still needs to be deployed locally to the server in question
  2. The installer config (.ini) also lives locally (though it probably could be a UNC path, it’s just a file after all)
  3. The script prompts you for the service account (SQL engine, Agent) credentials and the sa password using the Read-Host -AsSecureString cmdlet, so some meatbag still has to type those in.  This is because we don't have an enterprise pwd/secret-management system where I could, say, ask it for a service account credential set and tell it to embed that securely in a script without it actually being visible to me.  So, while yes, they're kept in a "vault", it's not query-able by anything else, so an admin still needs to copy & paste them into whatever configuration screen he's working with at the time.  Not ideal, I know, but we work with what we've got.

PS:  Yeah, yeah, "don't use sa, rename it or disable it; or use Windows Auth only!".  Rage, howl, fire & brimstone.  I'm not going to argue about it; we can save that for another post.  This environment dictates that it's used during setup and then disabled later, so that's beyond the scope of the installer config.

So yes, while there are likely even better ways to do this in the long-run, this quick & easy approach was sufficient to save me time and effort for the occasions when a new SQL box/VM needs to be spun-up.

Useful links

  1. A primer on SQL cmd-prompt installation & its arguments
  2. A couple community articles on the subject (the latter about slipstreaming updates)
  3. A technet article & couple Q&A threads (technet, stackoverflow) that helped me figure out how to securely get & put the credentials
  4. An example for mounting an ISO in PowerShell
  5. And finally, two things that I attempted to understand but ultimately failed to implement, because (apparently, at least to me), PowerShell remote-ing is a P.I.T.A.
[image: config.ini, to command prompt, to PowerShell / 3 steps toward a better workflow]

The Outline

First we need an .ini file to work with.  You could either create it from scratch, or take it from an existing SQL box's "Setup Bootstrap" folder.  Example path: C:\Program Files\Microsoft SQL Server\130\Setup Bootstrap\Log\20170801_073414\ConfigurationFile.ini — indicating this was an install done on 8/1/2017 at 7:34am.  Right above that, at simply C:\Program Files\Microsoft SQL Server\130\Setup Bootstrap\Log\, you'll see a Summary.txt file, which can actually come in handy while you're testing these unattended installs and wanting to see why it failed.

The first link above, from MSFT Docs, does a pretty nice job of telling you all the things that make up this config file.  You get to bypass the TOS prompt, enter service account details, specify drive letters (paths) for default data/log file locations & tempdb, slipstream update packages (UpdateSource), and even more advanced stuff like AG settings and whatnot.  My example will be a simple standalone instance using the default name, so I'll be sticking with the basics.

We can then use this file in the ConfigurationFile argument of setup.exe from the SQL Server install media.  To put a little more color on that: the .ini file is really just a collection of command-line arguments to setup.exe; you could also list them all out in-line, but that would be tedious and silly.  Here's a couple major selling points of creating your own config file:

  1. Slipstream updates (SP’s, CU’s), instead of having it go out to MSFT update servers (or *aghast* sticking with the original RTM bits, you heathen you!)
  2. Specify drive letters / default file locations: sure, this may be considered old-hat if you’re running super slick storage, but I still find it makes management a bit easier if I know where my MDFs, LDFs, TempDB, & backups will always be.
  3. Take advantage of 2016’s better TempDB setup options (# files, size & growth)
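
For reference, here is roughly what a bare-bones [OPTIONS] section might contain. The parameter names come from the MSFT doc linked above, the values are placeholders for my environment, and it's wrapped in a PowerShell here-string in case you'd rather generate the file from a script than copy one out of Setup Bootstrap:

# A bare-bones ConfigurationFile.ini, generated from PowerShell.
# Parameter names are from the official setup docs; values are placeholders.
# Note: no passwords in here; those get passed on the command line (see below).
$configIni = @"
[OPTIONS]
ACTION="Install"
QUIETSIMPLE="True"
IACCEPTSQLSERVERLICENSETERMS="True"
FEATURES=SQLENGINE
INSTANCENAME="MSSQLSERVER"
SQLSVCACCOUNT="MYDOMAIN\svc_sqlengine"
AGTSVCACCOUNT="MYDOMAIN\svc_sqlagent"
AGTSVCSTARTUPTYPE="Automatic"
SECURITYMODE="SQL"
SQLSYSADMINACCOUNTS="MYDOMAIN\DBA-Team"
SQLUSERDBDIR="E:\SQLData"
SQLUSERDBLOGDIR="F:\SQLLogs"
SQLTEMPDBDIR="T:\TempDB"
SQLTEMPDBFILECOUNT="4"
UpdateEnabled="True"
UpdateSource="\\FileShare\Installers\SQL2016\Updates"
"@
Set-Content -Path 'C:\Temp\ConfigurationFile.ini' -Value $configIni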

We will, however, keep a couple arguments out of the .ini file and instead throw them into the ArgumentList from the calling PowerShell script.  Speaking of, here's what the PowerShell script needs to do (a rough sketch follows the list):

  1. Prompt the operator (SysAdmin or DBA) for the SQL & Agent service account credentials, and (optionally) the sa pwd (if using it).
  2. Fetch our install media from the central network share where we store such things (server & office ISOs, for example).
  3. Mount said ISO to our virtual disc drive.
  4. Run its setup.exe with the following arguments:
    1. The config .ini file
    2. The service & sa accounts
  5. After it’s done, un-mount (dismount) the ISO.
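
The full script lives in the Gist below, but boiled down, a sketch of it looks something like this (paths, share names, and the ISO file name are placeholders; the setup.exe switches are the documented ones):

# Rough sketch of the whole flow; paths, share names, & the ISO file name are placeholders.

# 1. Prompt the operator for the sensitive bits.
$sqlSvcPwd = Read-Host 'SQL Engine service account password' -AsSecureString
$agtSvcPwd = Read-Host 'SQL Agent service account password' -AsSecureString
$saPwd     = Read-Host 'sa password' -AsSecureString

function ConvertTo-PlainText([securestring]$Secure) {
    # setup.exe wants these in plain text on its command line, unfortunately.
    ([System.Net.NetworkCredential]::new('', $Secure)).Password
}

# 2. Fetch the install media from the central share.
$isoSource = '\\FileShare\Installers\SQL2016\en_sql_server_2016_standard_x64.iso'
$isoLocal  = Copy-Item -Path $isoSource -Destination 'C:\Temp' -PassThru

# 3. Mount the ISO and grab its drive letter.
$mount = Mount-DiskImage -ImagePath $isoLocal.FullName -PassThru
$drive = ($mount | Get-Volume).DriveLetter

# 4. Run setup.exe with the config file plus the service & sa credentials.
$setupArgs = @(
    '/ConfigurationFile=C:\Temp\ConfigurationFile.ini'
    "/SQLSVCPASSWORD=`"$(ConvertTo-PlainText $sqlSvcPwd)`""
    "/AGTSVCPASSWORD=`"$(ConvertTo-PlainText $agtSvcPwd)`""
    "/SAPWD=`"$(ConvertTo-PlainText $saPwd)`""
)
Start-Process -FilePath "$($drive):\setup.exe" -ArgumentList $setupArgs -Wait -NoNewWindow

# 5. Clean up: dismount the ISO.
Dismount-DiskImage -ImagePath $isoLocal.FullName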

Then the DBA can connect to the brand-spankin'-new running SQL instance and do other post-setup configurations as desired (i.e. set max-memory, maxDOP/CTFP, etc).  And sure, those could also be done in PowerShell (thanks in no small part to the awesome team at DbaTools), but I chose not to do so in this case.

As the bloggers say, “that’s left as an exercise to the reader”.

Plus, they’re never quite as deterministic as we’d like them to be — they depend on the server’s compute resources, i.e. memory size & CPU cores, as well as estimated workload & environment tier, so it’s often a gamble in “how correct” your initial settings will be anyway.  Still, anything is better than the defaults, so configure-away!

[image: husky puppies sharing / because sharing is caring!]

The Code

Here are the Gists I’ve created to go along with this post.  If I’ve made a mistake, or if you, dear reader, have a suggestion, we can incorporate them into the gist without me having to go back and edit the blog post!

Yay technology!

I’d love to get feedback on how you would improve this, what you might do differently, etc.  Drop me a comment or a tweet!

Config/INI file:

PowerShell install script:

Dates, Date-pickers, and the Devil

When a date range, or time period, is specified in SQL, it’s easiest, clearest, and most concise, to use a “greater-than-or-equal-to Period-Start, and less-than Next-Period-Start” logic. Mathematically speaking, we are defining the range as closed on the left, open on the right.

This is a bit rant-y, but… indulge me.  I've been writing/refactoring a lot of old reporting queries.  And, like most reports, they deal with dates and datetimes — as parameters, boundaries, or where/join predicates.  I also got way too intense with a recent SSC (SQL Server Central) post, which fueled the fire even more.

[image: fluffy-angry-puppy / I'm so cute and ANGRY!]

SQL Server is very good at handling temporal datatypes and calculations against them.  We've got functions like dateadd, datediff, and datepart, and datatypes like datetime2, datetimeoffset, and plain datetime.  It supports all sorts of format conversions if you need to display them in various ways.

..even though that should be left to the presentation layer!

Here’s the issue.  Well, there are several issues, but we only have time for a few.

Here’s the first problem

Report users don’t understand the “end of a time period” problem.  I don’t have a good name for it; others might call it the “Day plus one” problem or the “Less than date” problem.  What do I mean by this?  Well, let’s back up a bit, to DBA Commandment #6, “Thou shalt not use between with datetimes.”  In order to first understand the issue, we have to understand why this is a commandment.

When a date range, or time period, is specified in SQL, it’s easiest, clearest, and most concise, to specify it like so: @TheDate >= @StartOfPeriod and @TheDate < @StartOfNextPeriod.  Mathematically speaking, we’re defining the range as “closed on the left, open on the right”.  In other words, Min <= X < Max.

The reason we do this with datetimes is found right there in the name of the datatype — it has (or can have) a time component!

[image: stone-tablets-with-roman-numerals-to-10 / There are probably more than 10, but it's a good starting point…]

Let’s talk examples

Say you’d like to report on the month of March 2017.  How do you determine if your data-points (stored as datetime or, hopefully, datetime2) are within that period, that month?  Well sure, you could write where month(MyDateColumn) = 3 and year(myDateColumn) = 2017 

NO.  That is horrible, don’t do that.

It's not SARGable and renders your index on that column useless.  (You do have an index on it, don't you? No? Make one!)  Okay, let's stick with something SARGable.  How about MyDateColumn between '20170301' and '2017-03-31T23:59:59.999'?  (You did read this post about using culture-neutral datetime literals, right?)  But wait!  If your data is a datetime, it's not actually that precise — your literal gets rounded up to 20170401 and you're now including dates from April 1st (at midnight)!

Oh that’ll never happen… until it does.

Second problem

Many developers and report-writers assume that the values in their data will never be within the typical “1 second before midnight” or “1/300th of a second before midnight” escape window of your “3/31/2017 23:59:59.997” bounding value.  But can you guarantee that?  Didn’t think so.  Worse, if you use the .999 fraction as given in the 2nd example, you’re either “more” or “less” correct, and nobody can actually tell you which way that pendulum swings because it depends on the statistical likelihood of your data having actual literal “midnight” values vs. realistic (millisecond-y, aka “continuous”) values.  Sure, if you’re storing just a date, these things become a lot less complicated and more predictable.

But then why aren’t you storing it as an actual date, not a datetime!?

So what’s the right answer?

As I said, “greater than or equal to  ‘Start’, and less than ‘End'”, where ‘End’ is the day after the end of the period, at midnight (no later!).  Hence, MyDateColumn >= '20170301' and MyDateColumn < '20170401'.  Simple, yes?

[image: keep calm and keep it simple / KCKS]

But wait, there’s more!

I mentioned “date-pickers” in the title.  When it comes to UX, date-pickers are a sore subject, and rightly so — it’s difficult to truly “get it right”.  On a “desktop-ish” device (i.e. something with a keyboard), it may be easiest on the user to give them a simple text-box which can handle various formats and interpret them intelligently — this is what SSRS does.  But on mobile devices, you often see those “spinner” controls, which is a pain in the arse when you have to select, say, your birth date and the “Year” spinner starts at 2017.  #StopIt

I mean, I’m not that old, but spinning thru a few decades is still slower than just typing 4 digits on my keyboard — especially if your input-box is smart enough to flip my keyboard into “numeric only” mode.

Another seemingly popular date-picker UX is the “calendar control”.  Oh gawd.  It’s horrible!  Clicking thru pages and pages of months to find and click (tap?) on an itty bitty day box, only to realize “Oh crap, that was the wrong year… ok let me go back.. click, click, tap..” ad-nauseum.

[image: stop-it-sign / #StopIt again]

The point here is, use the type of date-picker that’s right for the context.  If it’s meant to be a date within a few days/weeks of today, past/future — OK, spinner or calendar is probably fine.  If it’s a birth date or something that could reasonably be several years in the past or future, just give me a damn box.  (Heck, I’ll take a series of 3 boxes, M/D/Y or Y/M/D, as long as they’re labeled and don’t break when I omit the leading-zero from a single-digit month #!)  If there’s extra pre-validation logic that “blocks out” certain dates (think bill-payer calendars or Disneyland annual-pass blackout-days), that probably needs to be a calendar too.

..just make sure it’s responsive on a mobile device.

And in all cases, pass that “ending date” to your SQL queries in a consistent, logical, sensible manner.  For reporting, where the smallest increment of a period is 1 day, that probably means automagically “adding 1 day” to their given end-date, because the end-user tends to think in those terms.  I.e. if I say “show me my bank activity from 1/1/2017 to 1/31/2017”, I really mean “through the end of the month“, i.e. the end of the day of 1/31.  So your query is going to end up wanting the end-date parameter to be 2/1/2017, because it’s using the correct & consistent “greater than or equal to start, and less than start-of-next” logic.
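
If your front-end happens to be PowerShell (or anything else that parameterizes its SQL), that translation is a one-liner. Here's a hypothetical example with made-up table and column names; in SSRS proper you'd do the equivalent in the dataset query or a calculated parameter default:

# What the user typed into the date-pickers (inclusive, in their mind):
$startDate = [datetime]'2017-01-01'
$endDate   = [datetime]'2017-01-31'

# Made-up table & column names, purely for illustration.
$query = @"
select sum(Amount) as TotalActivity
from dbo.BankActivity
where ActivityDate >= @Start
  and ActivityDate <  @EndExclusive;  -- closed on the left, open on the right
"@

$conn = New-Object System.Data.SqlClient.SqlConnection('Server=MyServer;Database=MyReportDB;Integrated Security=SSPI;')
$cmd  = $conn.CreateCommand()
$cmd.CommandText = $query
[void]$cmd.Parameters.AddWithValue('@Start', $startDate)
[void]$cmd.Parameters.AddWithValue('@EndExclusive', $endDate.AddDays(1))  # i.e. 2/1/2017 00:00
$conn.Open()
$total = $cmd.ExecuteScalar()
$conn.Close()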

[image: context-consistency-clarity / The 3 C's]

Final thoughts

I know it’s not easy to explain to business folks, and it’s not easy to implement correctly.  But it’s important.  The >= & < logic is clear, concise, and can be used consistently regardless of underlying datatype.  You just need to adjust your presentation layer (whether that’s SSRS parameters or a .NET date-picker) to convey their intent to the user, whether that’s “show/enter the last day of the month, but translate to the next day to feed to the query/proc.”, or “make them enter the next-day (day after the end of the month/period) and understand the ‘less than’ logic.”  I’m more inclined to the first, but it depends on your audience.

Thanks for reading, and happy date-ing!

An Open Letter To Management

We need to talk about this stuff – candidly, openly, broadly, deeply.

More accurately, to legacy enterprise management.

Let’s say the following directive comes down from on-high: “Hey, our CEO wants us to provide better financial metrics reports and a dashboard that management can see to show real-time stats about the company.”

*groan*

I mean… Sure!  Yay, digital transformation, modernization, mobile friendly, all that good stuff!!

So, I have some thoughts on this, because I've seen the current state of things in small-medium enterprise, and am anxious to help improve that state to provide better value to the business.  To misquote Dennis Miller, I don't mean to go off on a rant here, but…

First topic: Reality Check

It starts at the top, with a couple realizations:

  1. Data is ever-growing.
    1. We need to get smarter about managing its growth, including archiving/retention schemes, data warehousing, etc.
    2. This involves compliance regulations and operational resources.
      1. We need to ensure compliance with biz standards and data shelf-life.
      2. We need to automate as much as possible to avoid over-burdening our human resources (and to some extent our servers too).

For example, you can’t expect the same response-time for a query into 10-year-old financial data as you do for 1-year-old data.

  2. Traditional SSRS (SQL Server Reporting Services) is an operational time-sink.
    1. We spend way too much time assigning access, creating redundant “on demand” reports, and making seldom-used email subscriptions.
    2. We’re probably running on an old version, say 2008R2
      1. Vast improvements have come to the MS Data/BI platforms in the last decade and we need to take advantage of them.
      2. It's not mobile-friendly at all; it's not even modern-browser friendly, as some of its UX elements only work properly in Internet Explorer
    3. We tacked-on some 3rd-party application to attempt to bring some data-warehouse functionality into the environment, but only 1 person “knows it intimately” and is comfortable developing new reports with it.
  3. Our ERP system, in its current state/version, is a tangled mess, to the eyes of a DBA & query-writer/report-writer.
    1. We’ve bolted-on so much customization and special-configuration that it’s not suitable for stock/canned reports from the vendor, even if we upgraded to a version of the app that had a decent reporting engine.
    2. We can't even decide on very basic things like "What is a 'unit' of production?", or "What are the different areas/groupings we break-out for revenue metrics?"

Ok sure, maybe we can agree on what those groupings are, but we can’t even get a consensus on what we call them!

Second topic: Single Source of Truth

We need to agree on a standard, documented, official set of business rules that answer such questions as “how do we measure revenue?”, “what are our different sub-orgs/departments/groupings for how we report on revenue?”, “what is ‘production output’ and how do we measure it?”, “how do we calculate bonuses for this group of employees?”, etc. More than that, we need to agree on naming things – we need a common, consistent nomenclature and understanding of what it means when someone says “N# Units”, “Department X” or “Order Aging” or “Membership Level” or “Bonus Type Y”.

And even more than that, we need to map those concepts to concrete, documented rule-sets that are manifested in the data somehow (from the simplest example, a “look-up table” or “reference table”, to the complex examples like a “data mart” or “analysis cube” or “ETL process”). This concept is sometimes called a “data dictionary”, which kinda belies its complexity, because it’s really more of a “data encyclopedia” – it needs to document what, how, why, & when.

What our concepts/terms/data-points mean, how they’re used, why they’re useful, & when they should be used.

Third topic: KISS and KPI’s

Management reports need to be simple. Yes, there are power-users who want the detail, and there are auditors who in fact require the detail. But your average C-level (or even P/VP-level) exec doesn't care about that stuff – they want very simple answers to deceptively simple (i.e. can be very complex under-the-hood) questions, like "How much money did we make this quarter for department X?", or "What kind of productivity bonus do I give to group Y?". But that's just the beginning – that's descriptive analytics. What they really want, but are sometimes too afraid to ask, are more powerful questions, like "How much money can I expect to make in market Z or state XX?", "What are our expected new loyalty program memberships, and how much will they profit us?" — predictive analytics.  (And we're not even going to touch prescriptive analytics yet, because you're not ready for that.)

KISS means we need to try our best to hide the nitty-gritty details and “under the hood” logic/calculations from the end-user or report audience. But, that means fully knowing and understanding those details and rules and logic flows so that we can implement them!

KPI stands for Key Performance Indicator. That's the golden nugget, the one piece of information that the manager/report-viewer ultimately is after, the thing that makes them go "Got it! That's the answer I was looking for!", so they can make their business-decision and move on with their day. These aren't necessarily just single numbers (like an overall revenue figure); they can be pie-charts, bar-graphs, a clear & concise grid, or whatever makes the most sense for the business-problem/business-decision at hand.

This all sounds fantastic, right? So what’s the catch?

Fourth topic: Time & Effort

Time is money, which is resources, which is people, learning, training, developing, implementing, testing, validating… rinse, repeat.  You don't put that all on the shoulders of a lone DBA; that life-cycle touches many different disciplines and team members – managers, business users, accounting folks, marketing people, analysts, developers, testers, operational leads, and yes, of course, all of IT infrastructure (helpdesk, engineering, DBA).  And you don't just buy a box off the shelf at your local software retailer and say "look, we're gonna implement Tableau!", wave a magical IT wand, and call it a day.

Now we, as technologists, are more than willing to learn and educate ourselves, but…

There needs to be a matching dedication from the business to that effort, and to the platform(s) that is/are chosen.

That means, in concrete terms, a few things:

  1. Training budget & resources
    • Conferences, courses thru online training providers, cross-team collaboration.
  2. Product & technology investment
    • Upgrades, net-new products, whatever is needed.
  3. Time allowances & agreements
    • Dedicated scheduling where the “daily grind” operations take a back-seat and we can focus on the new stuff.
  4. Support from SME’s
    • The ability to call-out to a qualified expert when critical questions or roadblocks arise.
    • Can be contractors, consultants, service-providers, or platform-providers. The point is, you only use them if you need them, so you keep the cost relatively low.

That’s if you’re dedicated to in-house team/ability build-up. If you want to outsource, you have a different set of challenges:

  1. Contractors are expensive!
    1. Their requirements are exceedingly rigid.
    2. They’re likely to scoff (yes, even outright laugh) at the quagmire of data & logic & rules that we’ve created and/or want to build into our “magical reporting stack”.
  2. They’ll still require that same product/tech investment.
    1. No contractor is going to accept your old legacy SSRS instance as a baseline for building a modern, responsive, effective reporting system. The first thing they'll say is "upgrade that, & come back to us."
    2. Likewise for your legacy ERP system – sure, it’s a little less obsolete, and there are probably plenty of shops running it & developing on it, but good luck getting new-hire contractors to embrace it; at best, they’ll begrudge it; at worst, they’ll charge exorbitant fees for having to work on such an old platform.
  3. Technical debt is their worst enemy.
    1. Like it or not, like most decades-old enterprises, we have technical debt up the wazoo.
    2. Contractors won’t work in a debt-heavy environment; they’ll insist you “fix the debt” and come back to them in a few months/years when it’s all happy & pretty & green.

Technical debt is our enemy, too, but at least we “own” it – i.e. we’re aware of it and we have ideas on how to fix it, if/when we ever get the time.

It’s like our city roads: at least we know where the potholes are, and how to avoid them.

Executive Summary

My point, from this rambling and probably way too lengthy post, is this: We need to talk about this stuff. Yes, Mr. Manager, I know you already said that. Let me embellish:

We need to talk about this stuff, candidly, openly, broadly, deeply, cross-functionally (made-up phrase #2), even company-wide.

Because, while the end-goal is deceptively simple (“We want report dashboards!”), the underlying systems are complex, with lots of moving parts, requiring lots of knowledge (both domain/biz and tech), and lots of management (compliance, governance, automation, visibility/monitoring).

It’s not just a technology challenge. It’s a people challenge. It’s a cultural challenge. It’s an organizational challenge.

It’s a challenge that, when faced, met, and overcome, can lead to spectacular growth and success for all involved!

(And that’s my attempt to end this rant on a positive note. Enjoy!)


PS: No, I’m not happy about WordPress’s inability to understand the ‘style’ attribute of a simple <ol> tag, but I tried… so apologies if the outlines are not intuitive because each level is just another set of numbers, instead of Word-style outlining like 1.. a.. i.. etc.  Grr arg!

Mongos, anybody?

You’ve heard of this thing called MongoDB…

Unless you’ve been living under/in the proverbial rock/hole-in-the-ground, you’ve heard of this thing called MongoDB. It’s one of the biggest flavors of NoSQL database systems out there – specifically, it’s a “document DB”, which means it stores semi-structured data in the form of JSON documents, grouped into collections, grouped into (of course) DBs.

Now, I'm a Windows guy. SQL Server & Windows Server are my comfort zones. I like GUIs, but I'm ok with a command line too. PowerShell has become my friend… okay, good acquaintance. But every once in a while, we have to step outside our comfort zones, no? So when I was told "hey, you've got to get a handle on managing these MongoDB instances that we've got running on servers so-and-so, as part of this 'Imaging system' application"… I thought "Hahaha…", but what came out was "Sure thing!"  Something a bit like this:

[image: hahahayeah / courtesy of DBA Reactions]

So the first order of business was to decide on a MongoDB “GUI” – a Windows app that could at least connect-to and give me a visual overview of the running MongoDB instances (properly referred to as a mongod, stemming from the Linux term “daemon”, which on the Windows side is basically like a process or service). I tried both the “official” desktop app from the big org, MongoDB Compass, and a neat open-source tool called Robomongo.

And I actually like them both, for different reasons; most of which can probably be attributed to my lack of depth with the technology, but hey. Anyway, Compass is really nice in that it gives you this kind of statistical overview of the collections in your DBs, performing some basic aggregates on a 10% or 1k sample of the collection documents to give you a graphical 40-thousand-foot view of the data. But where it breaks down for me is that little “query” bar, which is just an empty pair of curly-braces. It’s only for “selecting” (finding, querying); no other operations to see here. So we can’t manipulate our data with it, but it’s definitely great for viewing!

 

Whereas with Robomongo, I can right-click on the DBs/collections and do very simple things like “show documents”, “show statistics”, etc. And it actually writes the equivalent mongo shell command for me to poke at; say, to inject more logic to the find to get something specific or to write a new command or two as I read thru the docs and learn things like aggregates, indexes, and whatnot. Being a shell, it allows us to write or update data as well as read it.

But alas, GUIs will only take you so far. This is a tech born & bred on the command-line, so to really dig into it, I needed to let go of the mouse. And the mongo shell was actually pretty straightforward! A couple visits to their docs pages & some visual verification by using the GUI tools to check my work, and things were starting to come together. And even for a complete Javascript noob like me, the command syntax was straightforward enough. Soon I was configuring a replset and mongodump/mongorestore-ing and re-syncing a replica after purging some old stale collections.
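
For reference, the dump/restore part is just a pair of command-line tools, which you can drive from PowerShell like anything else. A rough example, with placeholder host and database names:

# Dump one database from the primary, restore it to a replica/test box (placeholder names).
$dumpDir = "D:\MongoBackups\$(Get-Date -Format yyyyMMdd)"
mongodump --host mongoprimary01 --port 27017 --db ImagingDB --out $dumpDir

# --drop tells mongorestore to drop each collection on the target before restoring it.
mongorestore --host mongotest01 --port 27017 --db ImagingDB --drop "$dumpDir\ImagingDB"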

Even though it's a CLI/Linux-flavored technology, it works perfectly fine in Windows… except for one thing.  So, you install it as a service, and you typically start & stop services using net start & net stop, and as long as your service's CLI arguments are all correct, you should be good — in theory!  Trouble is, the service tends not to stop gracefully.  So I found that, instead, the following command was more useful: mongo admin --eval "db.shutdownServer()".  This uses the actual mongo shell to send the native shutdown command to the mongod, instead of relying on the Windows services gymnastics to do it correctly.
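
Wrapped up in PowerShell, with a service-state sanity check tacked on (the service name 'MongoDB' is an assumption; use whatever yours was registered as):

# Ask the mongod to shut itself down gracefully, then confirm the Windows service agrees.
mongo admin --eval "db.shutdownServer()"
Start-Sleep -Seconds 5
Get-Service -Name 'MongoDB' | Select-Object Name, Status   # service name is an assumption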

It just goes to show, dear reader, that you’ve got to get your hands dirty and try out new technology before dismissing it as “not my job” or “somebody else’s problem”.

PS: Nope, that’s not Compass’s logo or anybody else’s; I made it meself, with good old Paint.NET!

Drafted with StackEdit, finished with WordPress.

No Regrets

Continuing from my previous post…

I loved my first job, no doubt. But slowly over the years, I started to feel pigeon-holed. I was always on-deck for problems that happened in the system due to what I’ll call “bad data”. If you’ve been in the industry, you know what that means – something in your datastore doesn’t “follow the rules”, or “breaks something that normally doesn’t break”, or “is just a little off-kilter”. (Otherwise known as “edge cases”, which I’ll talk about in a later post!) And often, it’s because the application allowed that data, which breaks the rules, into the system – Devs hate to hear this. But just as often, it’s because the data-guy, the DB-dev or analyst, ingested or transformed the data in a way that breaks the rules – so there’s plenty of blame to go around. And we knew that, and we respected that. We didn’t pass the blame, we just buckled down and tried to get it fixed.

But, too often, I was the one called upon to fix it, because I was the closest to the data – and yes, I knew it intimately, no doubt. Because of that, I started to feel left behind the curve. The other devs were pushing new tech stacks, learning new things, while I kept pluggin’ away at good ol’ TSQL (with the occasional smattering of C#). And yes, of course that’s partially my fault; I could have self-asserted, pushed for more learning opportunities, etc. I don’t regret my first job. I learned a ton – basically everything I know about databases, technology, and IT. But it was time for a change.

And that’s how I ended up where I am now!

Written with StackEdit.