In this final installment of my 3-part series on the Brennan Election Security Report, let's consider voter registration systems. We are delighted to see another organization talking about the importance of, and security risks associated with, voter registration systems...
Our commentary in the last segment on the Brennan Election Security Report might seem nit-picky, but the OSET Institute believes that in order to properly design a solution for a specific problem you need to be able to identify it precisely. We’re going to make a bold assertion: Many of the solutions in the Brennan Report, while undoubtedly helpful, fail to do enough to increase the security of our elections. The recommendations fall short of properly addressing the problem as we described it in Part 1. These solutions are actually incremental steps that try to improve a system that needs to be fundamentally re-invented. ...
Even philanthropic efforts to produce public benefits in the form of civic technology have real costs associated with software development. The open source model, however, means the costs are significantly less than current proprietary commercial alternatives, while the innovative benefits, unconstrained by commercial mandates, can be significantly greater. More importantly, there is some reality distortion over the real costs of building civic engagement IT, such as election administration and voting systems. They are markedly different than many other civic engagement tools that require only APIs and interactive web services leveraging government data stores to better engage and serve citizens. Tuesday's post by Ms. Voting Matters on our Voter Services Portal ignited comments and questions about the real cost to build the Voter Services Portal. The VSP is not "yet another simple web site," but a collection of software to provide services to voters that integrate with back-end legacy systems, and set the foundation to drive a series of voter service innovations as well as other election management tools in the near future. We break down the cost model and actual costs here...
Wow. How time flies. It's our birthday this week! (Monday the 17th to be precise; it was a Friday in November 2006.) We are 8 years old! You know, that's a long life by the measure of most commercial technology ventures. But a bit different as a non-profit technology venture. So, we wanted to post something today in honor of our birthday and the progress we've made. Please read on...
In a public speech yesterday, U.S. Attorney General Eric Holder called for universal, automatic voter registration, and stated that current technology can accomplish that, despite the fact that the current system is complex and error-prone. As Reuters reported on Holder's remarks:
By coordinating existing databases, the government could register "every eligible voter in America" and ensure that registration did not lapse during a move.
That's easy to say, but it requires some careful thought to make it easy to do. After some discussion with election officials recently, I've concluded that it is in fact easy to do in tech terms, but not in a way that you might think. To explain, let me first say that one thing that's not going to happen anytime soon is a "federal government takeover" of voter registration. VR will remain a state responsibility for the medium term, I predict.
Second, something that might happen, but would be a bad idea, is the combination of inter-state record matching and automatic registration. Why? Because we've already seen that in some states, recent practice includes automatic de-registration: if a computer's matching algorithm says that you moved from one state to another, you get un-registered in the first state. (Though not registered in the second!) Of course that's a problem if the match is incorrect -- and we've seen plenty of examples of dodgy databases yielding false positive matches -- but it also can be a problem even if it is correct.
Ironically, the most recent instance of that story I've heard personally was from a Yale political science professor who specializes in election observation in other countries, and is keenly aware of voter registration issues as a bar to voting. While retaining her residence in CT, the prof did something that looked to some computer like taking up residence at another address -- result: CT's VR system de-registered her. Not the right way of doing universal, automatic, permanent.
One state election official explained the higher-level issue to me recently with two main points.
1. The current system places responsibility on the citizen to keep the appropriate government agency informed. So when it appears that there has been a change of address, the state VR operators should reach out to the voter in question to get the real story from them. That includes making it easier for voters to quickly find out their VR status and get help on what they can do next. (Which is what we're doing with online VR technology this year.)
2. When deciding what to do about a reported VR change, the responsibility is the election official's, not some computer's. Technology can help suggest to an election official that a voter's record may be out of date, but that should not mean that the voter record should be invalidated, either automatically or with a pro-forma confirmation by a person who has no more information than the computer did. What should the election official do instead? See point #1 above!
In other words, a simple interpretation of Holder's words about database co-ordination can lead to data-mining and matching that is error-prone not just because the databases have imperfect information, but also because some of the most important information -- the voter's intent -- is not in the database. For example: "I did a postal address forwarding from my CT home to a DC address not because I moved but because I'm visiting for several weeks and don't want to miss my mail."
So that got me thinking about functional requirements - surprise, techie thinks about requirements not policies! - and we came up with a way to use those two principles to deliver many of the benefits of universal automatic permanent registration, without actually changing election laws and overhauling existing voter database systems. What's required is an inter-government information sharing system:
- that can notify state VR system operators about events that are possibly relevant to VR, without having to be authoritative about the event or even the person involved;
- that can enable state VR system operators to take further steps to determine whether there's been a change in voter eligibility;
- that is sufficiently flexible for a wide variety and number of government organizations to participate with ease.
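To make those three requirements a little more concrete, here is a minimal sketch in Python. It is not a design from the post: every class, field, and agency name below is a hypothetical placeholder. The only thing it tries to capture is the shape of the idea: an agency sends a non-authoritative notice, the notice waits in a review queue, and a named election official (not the software) decides what happens, with "contact the voter" as the expected next step.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    CONTACT_VOTER = "contact_voter"  # principle 1: ask the voter for the real story
    NO_ACTION = "no_action"


@dataclass(frozen=True)
class EventNotice:
    """A non-authoritative hint from some agency (all field names are hypothetical)."""
    source_agency: str   # any participating government organization
    event_type: str      # e.g. "mail_forwarding"
    person_hint: str     # whatever identifying detail the source happens to have
    details: str = ""


@dataclass
class ReviewQueue:
    """Holds notices for an election official; it never touches the voter roll."""
    pending: list = field(default_factory=list)

    def submit(self, notice: EventNotice) -> None:
        self.pending.append(notice)

    def decide(self, notice: EventNotice, official: str, action: Action) -> dict:
        # principle 2: a named official, not the matching software, makes the call
        self.pending.remove(notice)
        return {"notice": notice, "decided_by": official, "action": action.value}


queue = ReviewQueue()
notice = EventNotice(
    source_agency="postal_service",
    event_type="mail_forwarding",
    person_hint="J. Doe, New Haven CT",
    details="forwarding to a DC address",
)
queue.submit(notice)
decision = queue.decide(notice, official="registrar_01", action=Action.CONTACT_VOTER)
print(decision["action"])
```

Note what is deliberately missing: there is no code path that de-registers anyone. The notice carries a hint, not a verdict, which is what lets a wide range of agencies participate without having to be authoritative.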
In addition, not required, but darned useful to residents of the 21st century, this system would be complemented by online assistance to members of the public to help them quickly and accurately respond to inquiries from election officials.
The latter we are, as I have said, already working on, and well into it. But that inter-government information sharing system, what is that? It would clearly have to be not complicated, not expensive, and not requiring changes in election law or policy. Is that possible?
I think so. Stay tuned, we may be on to something.
So, we have a phrase we like to use around here, borrowed from the legal academic world. The phrase is "frolic and detour," and it is used to describe an action or conduct when analyzing a nuance of tort negligence. I am taking a bit of a detour and frolicking in an increasingly noisy element of explaining the complexity of our work here. (The detour comes from the fact that as "Development Officer" my charge is ensuring the Foundation and projects are financed, backed, supported, and succeed in adoption. The frolic is in the form of commentary below about software development methodologies, although I am not currently engaged in or responsible for technical development outside of my contributions in UX/UI design.) Yet, I won't attempt to deny that this post is also a bit of promotion for our stakeholders -- elections IT officials who expect us to address their needs for formal requirements, specifications, benchmarks, and certification, while embracing the agility and speed of modern development methodologies. This post was catalyzed by chit-chat at dinner last evening with an energetic technical talent who is jacked-up about the notion of elections technology being an open source infrastructure. Frankly, in 5 years we haven't met anyone who wasn't jacked-up about our cause, and their energy is typically around "damn, we can do this quick; let's git 'er done!" But it is about at this point where the discussion always seems to get a bit sideways. Let me explain.
I guess I am exposing a bit of old school here, but having had the formal training in computer systems science and engineering (years ago) I believe data modeling -- especially for database-backed enterprise apps -- is an absolute imperative priority. And the stuff of elections systems is serious technology, containing a significant degree of fault tolerance, integrity and verification assurance, and perhaps most important a sound data model. And modeling takes time and requires documentation, both of which are nearly antithetical in today's pop culture of agile development.
Bear in mind, the TTV Project embraces agile methods for UX/UI development efforts. And there are a number of components in the TTV elections technology framework that do not require extensive up-front data modeling and can be developed purely in an iterative environment.
However, we claim that data modeling is critical for certain enterprise-grade elections applications because (as many seasoned architects have observed): [a] the data itself has meaning and value outside of the app that manipulates it, and [b] scalability requires a good DB design because you cannot just add in scalability later. The data model or DB design defines the structure of the database and the relationships between the data sets; it is, in essence, the foundation on which the application(s) are built. A solid DB design is essential to achieve a scalable application. Which leads to my lingering question: How do agile development shops design a database?
I've heard the "Well, we start with a story..." approach. And when I ask those whom I really respect as enterprise software architects with real DB design chops, who also respect and embrace agile methodologies, they tend to express reservations about the agile mindset being boorishly applied to truly scalable, enterprise-grade relational DB design that results in a well-performing application and sound data integrity.
Friends, I have no intention of hating on agile principles of lightweight development methods -- they have an important role in today's application software development space and an important role here at the Foundation, but at the same time, I want to try to explain why we cannot simply just "bang out" new elections apps for ballot marking, tabulation, or ballot design and generation in a series of sprints and scrums.
First, in all candor, I fear this confusion rests in the reality that fewer and fewer developers today have had a complete computer science education, and cannot really claim to be disciplined software engineers or architects. Many (not all) have just "hacked" with, and taught themselves, development tools because they built a web site or implemented a digital shopping bag for a friend (much like the well-intentioned developer my wife and I met last evening).
Add in the fact that the formality and discipline of compiled code has given way to the rapid prototyping benefits of interpreted code. And in the process of this new modern training in software development (almost exclusively for the sandbox of the web browser as the UX/UI vehicle) what has been forgotten is that data modeling exists not because it creates overhead and delays, but because it removes such impediments.
Look at this another way. I like to use building analogies -- perhaps because I began my collegiate studies long ago in architectural engineering before realizing that computer graphics would replace drafting. There is a reason we spend weeks, sometimes months traveling by large holes in the ground with towers of re-bar, forms, and concrete pouring without any clue of what really will stand there once finished. And yet, later as the skyscraper takes form, the speed with which it comes together seems to accelerate almost weekly. Without that foundation carefully laid, the building cannot stand for any extended period of time, let alone bear the dynamic and static weights of its appointments, systems, and occupants. So too is this the case with complex, highly scalable, fault-tolerant enterprise software -- without the foundation of a solid data model, the application(s) will never be sustainable.
I admit that I have been out of production-grade software development (i.e., in the trenches coding, compiling; link, load, dealing with lint and running in debug mode) for years, but I can still climb on the bike and turn the pedals. The fact is, data flow and data model could not be more different, and the former cannot exist without the latter. It is well understood, and data modeling practice has demonstrated many times, that one cannot create a data flow out of nothing. There has to be a base model as a foundation of one or more data flows, each mapping to its application. Yet in our discussion punctuated by a really nice wine and great food, this developer seemed to want to dismiss modeling as something that can be done later... perhaps like refactoring (!?)
I am beginning to believe this fixation of modern developers with "rapid" non-data-model development is misguided, if not dangerous for its latent time shifted costs.
Recently, a colleague at another company was involved with the development of a system where no time whatsoever was spent on data model design. Indeed, the screens started appearing in record time. The UX/UI was far from complete, but usable. And the team was cheered as having achieved great "savings" in the development process. However, when it came time to expand and extend the app with additional requirements, the developers waffled and explained they would have to recode the app in order to meet the new process requirements. The data was unchanged, but processes were evolving. The balance of the project ground to a halt: the first team was dismissed amid arguments about why requirements planning should have been done up front, and the company was left figuring out whom to hire to solve it.
I read somewhere of another development project where the work was getting done in 2-week cycles. They were about 4 cycles away from finishing when on the tracker schedule a task called "concurrency" appeared for the next-to-last (penultimate) cycle. The project subsequently imploded because all of the code had to be refactored (a core entity actually was determined to be two entities). Turns out that a lack of upfront modeling led to this sequence of events, but unbelievably, the (agile) development firm working on the project spun this as a "positive outcome"; that is, they explained, "Hey, it's a good thing we caught this a month before go-live." Really? Why wasn't that caught before that pungent smell of freshly cut code started wafting through the lab?
Spin doctoring notwithstanding, the scary thing to me is that performance and concurrency problems caused by a failure to understand the data are being caught far too late in the Agile development process, which makes it difficult if not impossible to make real improvements. In fact, I fear that many agile developers have the misguided principle that all data models should be:
create table DATA (key INTEGER, stuff BLOB);
Actually, we shouldn't joke about this. That idea comes from a scary reality: a DBA (database architect) friend tells about a development team he is interacting with on an outsourced State I.T. project that has decided to migrate a legacy non-Oracle application to Oracle using precisely this approach. Data that had been stored as records in old ISAM-type files will be stored in Oracle as byte sequences in BLOBs, with an added surrogate generated unique primary key. When he asked what the point of that approach was, no one at the development shop could give him a reasonable answer other than "in the time frame we have, it works." It raises the question: What do you call an Oracle database where all the data in it is invisible to Oracle itself and cannot be accessed and manipulated directly using SQL? Or said differently, would you call a set of numbered binary records a "database," or just "a collection of numbered binary records?"
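The difference is easy to see in a small experiment. The sketch below uses Python's built-in `sqlite3` module rather than Oracle, and JSON-encoded bytes as a stand-in for the legacy binary records; the `voter` table, its columns, and the sample rows are all made-up assumptions for illustration. The same three records go into a blob table and into a modeled table, and then the same question ("how many records are in Kent county?") is asked of each.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# The "everything is a blob" design from the text (column names adapted).
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, stuff BLOB)")

# A modeled alternative; table and column names are illustrative assumptions.
conn.execute(
    "CREATE TABLE voter ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, county TEXT NOT NULL)"
)

records = [("Ada", "Kent"), ("Ben", "Kent"), ("Cy", "Polk")]
for name, county in records:
    payload = json.dumps({"name": name, "county": county}).encode("utf-8")
    conn.execute("INSERT INTO data (stuff) VALUES (?)", (payload,))
    conn.execute("INSERT INTO voter (name, county) VALUES (?, ?)", (name, county))

# Modeled table: the database itself can answer the question in SQL.
modeled_count = conn.execute(
    "SELECT COUNT(*) FROM voter WHERE county = ?", ("Kent",)
).fetchone()[0]

# Blob table: every row has to be fetched and decoded by application code,
# because the database only sees opaque bytes.
blob_count = sum(
    1
    for (stuff,) in conn.execute("SELECT stuff FROM data")
    if json.loads(stuff.decode("utf-8"))["county"] == "Kent"
)

print(modeled_count, blob_count)
```

Both approaches arrive at the same answer here, but only the modeled table lets the database filter, index, constrain, or join on `county`. With the blob table, any program that wants the data has to know the private encoding, which is exactly the "data has meaning outside the app" problem.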
In another example of the challenges of agile development in a database-driven app world, a DBA colleague describes being brought in on an emergency contract basis to an Agile project under development on top of Oracle, to deal with "performance problems" in the database. Turns out the developers were using Hibernate and apparently relied on it to create their tables on an as-needed basis, simply adding a table or a column in response to incoming user requirements and not worrying about the data model until it crawled out of the code and attacked them.
This sort of approach to app development is what I am beginning to see as "hit and run." Sure, it has worked so far in the web app world of start-ups: get it up and running as fast as possible, then exit quickly and quietly before they can identify you as triggering the meltdown when scale and performance start to matter.
After chatting with this developer last evening (and listening to many others over recent months lament that we're simply moving too slowly) I am starting to think of Agile development as a methodology of "do anything rather than nothing, regardless of whether it's right." And this may be to support the perception of rapid progress: "Look, we developed X components/screens/modules in the past week." Whether any of this code will stand up to production performance environments is to be determined later.
Another Agile principle is that of incremental development and delivery. It's easy for a developer to strip out a piece of poorly performing code and replace it with a chunk that offers better or different capabilities. Unfortunately, you just cannot do this in a database. For example: you cannot throw away old data in old tables and simply create new empty tables.
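As a rough illustration of why a late schema change costs more than swapping out code, here is a small sketch using Python's built-in `sqlite3` module. It imagines the "one core entity turned out to be two" situation mentioned earlier: a hypothetical `voter_old` table that embeds precinct names gets split into `precinct` and `voter` tables. All table names, column names, and rows are invented for the example. The point is that the existing rows have to be carried across and re-linked; starting over with empty tables is not an option.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Old design: one table that turned out to hold two entities (hypothetical names).
conn.execute(
    "CREATE TABLE voter_old (id INTEGER PRIMARY KEY, name TEXT, precinct_name TEXT)"
)
conn.executemany(
    "INSERT INTO voter_old (name, precinct_name) VALUES (?, ?)",
    [("Ada", "North"), ("Ben", "North"), ("Cy", "South")],
)

# New design: precinct becomes its own entity, referenced by voter.
conn.execute("CREATE TABLE precinct (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)")
conn.execute(
    "CREATE TABLE voter ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
    "precinct_id INTEGER NOT NULL REFERENCES precinct(id))"
)

# Carry the existing data across instead of starting with empty tables.
conn.execute("INSERT INTO precinct (name) SELECT DISTINCT precinct_name FROM voter_old")
conn.execute(
    "INSERT INTO voter (id, name, precinct_id) "
    "SELECT o.id, o.name, p.id "
    "FROM voter_old o JOIN precinct p ON p.name = o.precinct_name"
)
conn.execute("DROP TABLE voter_old")

voters = conn.execute("SELECT COUNT(*) FROM voter").fetchone()[0]
precincts = conn.execute("SELECT COUNT(*) FROM precinct").fetchone()[0]
ada_precinct = conn.execute(
    "SELECT p.name FROM voter v JOIN precinct p ON p.id = v.precinct_id "
    "WHERE v.name = ?",
    ("Ada",),
).fetchone()[0]

print(voters, precincts, ada_precinct)
```

Even in this toy case the migration is more than a code swap: every query, report, and screen that read `precinct_name` from the old table now has to change too. On a production system with real data volumes, that work grows accordingly.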
The TrustTheVote Project continues to need the kind of talent this person exhibited last evening at dinner. But her zeal aside (and obvious passion for the cause of open source in elections), and at the risk of running off the (Ruby) rails here, we simply cannot afford to have these problems happen with the TrustTheVote Project.
Agile methodologies will continue to have their place in our work, but we need to be guided by some emerging realities, and appreciate that for as fast as someone wants to crank out a poll book app or a ballot marking device, we cannot afford to short-cut simply for the sake of speed. Some may accuse me of being a waterfall Luddite in an agile world; however, I believe there has to be some way to mesh these things, even if it means requirements scrums, data modeling sprints, or animated data models.