Skip to content

A developer, a sysadmin, and a DBA walk into a conference room…

Programming

…and that's not a joke, it's how your design meetings should be starting.

A few days ago on Twitter someone asked a seemingly innocent question: "I'm writing a post on the failure of Stored Procedures as a platform.  What, in your view, were the reasons they didn't work out?"

A lot of reasons were given: "They're hard to test" (They're not - your unit tests should be testing your database.),  "They're not in git" (They should be - If they arent your revision control process is fucked because your database isn't controlled), "They're fundamentally unreadable and require exponentially more tacit knowledge aka are awful for new devs to understand" (They're not "fundamentally" anything, and if they're documented well any competent developer should be able to understand them), "They encourage silos where DBAs say no." (This is a people problem: Your process doesn't facilitate understanding between your DBAs and the rest of your team).
Some folks even came up with what I would argue are good reasons, like "Badly written stored procedures don't scale well" (which is true: If your stored procedures involve lots of processing overhead the DB server becomes a processing bottleneck, which is a Bad Thing), and "It's an additional moving part in the system" (generally something to avoid, unless that moving part is the simplest solution to a problem).

I was all set to have a friendly difference of opinion on this issue until I saw this blog post, which starts out great and quickly goes off the rails into the weeds and starts eating slugs with the DevOps "we don't need no stinkin' sysadmin/DBA" children.

So now you get a rant about why you still need a Developer, a Sysadmin, and a DBA.

Rant, Part The First: Why Stored Procedures are a Good Thing.

I'm a fan of the sensible use of stored procedures. I'll even give an example of how we use them at my day job:  

Our software analyzes medical tests. One of the factors in our analysis is the age of the patient. Age is naturally a derived value (Date of Test minus patient's Date of Birth gives Age When Tested), but we use the age A LOT. Doing the math every time becomes computationally expensive at scale, and even reducing the problem by creating a database view is still a significant amount of CPU Hours over a large number of tests, so our database architecture breaks from the rules of normalization: We store the derived value (age) in the tables that are queried by the things that care about it.

This is all well and good until the humans doing the data entry mistype a patient's date of birth. Now the ages are wrong everywhere. The solution is a stored procedure, triggered when a patient's date of birth is updated. The stored procedure calculates the new age (using the database's built-in date math), updates the tables where age is cached, and queues the tests to be re-analyzed in case the age change affects the patient's results.  The stored procedure all happens within the active transaction, and if my develoeprs are bad at SQL and don't start a transaction it's OK because all the changes are part of the update: If any part of the stored procedure fails the DB will reject the whole change with a meaningful exception.

Could we do all of that with application-level logic? Sure.
But now instead of the application making one logical update (changing the patient's date of birth - exactly what the user did on their screen)  and all of the data taking care of itself we now have at least five updates in the application code (changing the date of birth, updating the age on the two tables where it's cached, invalidating the old test results, and queueing the tests for re-analysis).

As a developer doing all this work in the applicationis undesirable: It means more moving parts in the application code, and it forces new developers to conceptualize beyond the immediate action. If the person we just hired doesn't know about all the places where Age is cached and forgets one they can make a mess out of our data.
Putting this work in the application layer also means that the developers are responsible for maintaining the sanity of the data, and that shouldn't be their job (or more properly the database shouldn't trust that they're doing it right, just like how referential integrity constraints won't let you insert something that references ID #12345 if that ID doesn't exist). The database should be enforcing data-level constraints.

As a DBA, having the application send more round-trip traffic to the databse is undesirable: It means a longer transaction with more back-and-forth communication - more chances for the work to get interrupted & rolled back, and more traffic clogging up the DB server's network link (this can be a Bad Thing if you're moving a lot of data and close to saturating that link). If the developers are really bad at database stuff it may mean multiple transactions where a failure leaves the system in an inconsistent state: Some ages updated while others aren't, or the old results invalidated but the tests not re-queued so the doctor never gets their results, etc.

One stored procedure and one AFTER UPDATE ON patient trigger reduces the application developer's workload ("Just update the age and the data takes care of itself"),  centralizes all the data housekeeping (the DBA will be the first perosn to know if a new table that caches age gets added, so they can update the stored procedure), and is the simplest solution to the problem.

 

Rant, Part The Second: You need a specialist!

The blog post that started me ranting started off really well, saying what I've been saying for well over a decade:

In general, the DBA role is shifting from the lower layers of the technology stack up into the higher layers, where its concerns overlap more and more with technical operations and even development.

This is great! I've long argued that DBAs should be involved in development early and often, because the "A" doesn't just stand for "Administrator" - a good DBA is also a Database Architect, someone who understands how to design a system to efficiently organize, store and retrieve data. They know when something should be indexed (and when to use different types of indexing to optimize performance), they know when you should insist on Boyce-Codd Normal Form and when you should swallow your bile and store that derived value in a table for the sake of speed, and they know what questions to ask during the design of a system to ensure that the data model completely covers the problem domain.

Can you muddle through with your application developer just throwing stuff into a data swamp? Sure. You can even go "schemaless" with NoSQL if you want. It will work fine… right up until it stops working (performance tanks, an integrity constraint you missed creates an error that costs your company real money, etc.)

When a patient develops a heart condition their family doctor doesn't slice them open and start hacking on the heart: They refer them to a specialist who understands that subsystem intimately and is qualified to work on it. The family doctor may be able to do some basic work in this area - A pill for blood pressure, an exercise regimen for overall health, etc. - but they're not experts on the heart, and they recognize that any significant work needs to be performed by - or at least done in consultation with - an expert.
The same thing is true for database work: A devloper can sketch out a data model, and if they're competent they can do a really good job of it. They can even turn that model into a SQL schema that works at small scales. Optimizing maintaining, and scaling that design is the role of a specialist who intimately understands databases and data management: The DBA.

Your DBA will tell you when it makes sense to have read-only slaves, or to implement a multi-master architecture. They will help you design a process to back up your data for disaster recovery, and to have redundancy in the face of inevitable hardware (or cloud provider) failures. They'll work with you to ensure the security and integrity of your data.

Rant, Part The Third: You know nothing, young DevOps monkey…

I'm not a DBA - at least not a real one. I understand data modeling, relational design, indexing, and the like well enough to fill the role in my current company, but when we scale up I'll be turning that role over to someone better qualified. A more senior specialist.

So why do I care what people think of DBAs? Because before the industry turned on the DBA they turned on the sysadmin and a piss-poor understanding of "DevOps Culture" that had developers taking control of operations and trying to run systems they didn't understand.
This leads to Bad Things!

What will happen to companies that pretend they don't need data/databse specialists? I predict Bad Things!

Companies, and in particular the developers working at today's new startups, need to remember that developers are not operations people.
Developers and Operations staff should work closely together, but their goals are different: Developers want to create something awesome, and Ops wants to make sure this awesome thing is stable, secure, and scalable for the users. They focus on separate aspects of a common problem.

Developers are similarly not DBAs. Good DBAs dance on the line between development and operations: comfortable working in both worlds, but not really a part of either.
Much like the how Operations folks should be involved in designing the final platform Database folks should also be involved in the  design process. Ultimately they'll write code that is a component of your application: Views, triggers, and stored procedures are all logic that supports the application. They'll also work with the operations staff to ensure that the equipment (or cloud service) running your database meets all the necessary performance requirements, and they will monitor the performance and stability of the database in production.

It is not possible for one person to be an expert in every domain, and DevOps should not be about trying to shove exprtise aside in the name of expediency. Instead it should be about bringing the domain experts together so the right solution can be designed, developed, and delivered in a timely manner rather than building something half-assed ahd having to fix it later when its flaws become apparent. 

  • Developers know what can be done at the application level, and who further subdivide (UI/UX experts, processing experts, etc.)
  • Operations know how the supporting infrastructure will behave, and can guide the developers in ensuring that their design will scale and tolerate failures
  • Data Specialists (DB Admins / DB Architects) know how to select the right database tools for the job and will guide developers in making the best use of them

Oh yeah, you need one more expert in the room: The User who knows what the final product actually has to do.

If these groups all communicate effectively good software is the natural result. If they don't, or if one group anoints themselves god-king of the product without appropriate respect for the expertise of the others, there is nothing but pain and misery at the end of the journey.

Trackbacks

No Trackbacks

Comments

Display comments as Linear | Threaded

No comments

Add Comment

E-Mail addresses will not be displayed and will only be used for E-Mail notifications.
To leave a comment you must approve it via e-mail, which will be sent to your address after submission.

To prevent automated Bots from commentspamming, please enter the string you see in the image below in the appropriate input box. Your comment will only be submitted if the strings match. Please ensure that your browser supports and accepts cookies, or your comment cannot be verified correctly.
CAPTCHA

Twitter, Pavatar, Gravatar author images supported.
Form options