Today, members of the Core Team are in Vail, Colorado at the IACREOT Conference to unveil the next phase of VoteStream, the elections results and reporting subsystem of our Open Source Election Technology Framework. This is an awesome day, and we owe a great deal of thanks to the Knight Foundation for continuing to support this important part of the Framework.
Elections data standards are essential to delivering real innovation. The annual Election Data Standards meeting opened today in Los Angeles, CA. We thought we'd give you an overview of just what in the heck this is about and why it's essential to creating a voting experience that's easy, convenient, and dare we say delightful. Dry? Kinda. But a peek at the real in-the-trenches work we're doing. Yep.
OSET Foundation Board Member Chris Kelly, a Silicon Valley venture investor and philanthropist, and former Facebook exec, pens an op-ed for TechCrunch on Election Day 2014.
BusyBooth, an app being developed by the TrustTheVote Project, is the public-service, polling-place app voters have been waiting for.
The TrustTheVote Project Core Team has been hard at work on the Alpha version of VoteStream, our election results reporting technology. They recently wrapped up a prototype phase funded by the Knight Foundation, and then forged ahead a bit, to incorporate data from additional counties, provided by participating state or local election officials after the official wrap-up.
Along the way, there have been a series of postings here that together tell a story about the VoteStream prototype project. They start with a basic description of the project in Towards Standardized Election Results Data Reporting and Election Results Reload: the Time is Right. Then there was a series of posts about the project’s assumptions about data, about software (part one and part two), and about standards and converters (part one and part two).
Of course, the information wouldn’t be complete without a description of the open-source software prototype itself, provided in Not Just Election Night: VoteStream.
Actually the project was as much about data, standards, and tools, as software. On the data front, there is a general introduction to a major part of the project’s work in “data wrangling” in VoteStream: Data-Wrangling of Election Results Data. After that were more posts on data wrangling, quite deep in the data-head shed -- but still important, because each one is about the work required to take real election data and real election result data from disparate counties across the country, and fit it into a common data format and common online user experience. The deep data-heads can find quite a bit of detail in three postings about data wrangling, in Ramsey County MN, in Travis County TX, and in Los Angeles County CA.
Today, there is a VoteStream project web site with VoteStream itself and the latest set of multi-county election results, but also with some additional explanatory material, including the election results data for each of these counties. Of course, you can get that from the VoteStream API or data feed, but there may be some interest in the actual source data. For more on those developments, stay tuned!
If you've read some of the ongoing thread about our VoteStream effort, it's been a lot about data and standards. Today is more of the same, but first with a nod that the software development is going fine, as well. We've come up with a preliminary data model, gotten real results data from Ramsey County, Minnesota, and developed most of the key features in the VoteStream prototype, using the TrustTheVote Project's Election Results Reporting Platform. I'll have plenty to say about the data-wrangling as we move through several different counties' data. But today I want to focus on a key structuring principle that works both for data and for the work that real local election officials (LEOs) do, before an election, during election night, and thereafter.
Put simply, the basic structuring principle is that the election definition comes first, and the election results come later and refer to the election definition. This principle matches the work that LEOs do, using their election management system to define each contest in an upcoming election, define each candidate, and so on. The result of that work is a data set that both serves as an election definition, and also provides the context for the election by defining the jurisdiction in which the election will be held. The jurisdiction is typically a set of electoral districts (e.g. a congressional district, or a city council seat), and a county divided into precincts, each of which votes on a specific set of contests in the election.
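As a rough illustration of "definition first, results later", here is a minimal Python sketch. The class and field names are hypothetical, invented for this example; they are not taken from the VIP format or from the VoteStream code. The only point is that each result row carries identifiers that must resolve against an election definition that was loaded beforehand.

```python
from dataclasses import dataclass, field


@dataclass
class Contest:
    contest_id: str
    title: str
    candidate_ids: list[str]


@dataclass
class ElectionDefinition:
    """Hypothetical, simplified election definition (not the VIP schema)."""
    precinct_ids: set[str]
    contests: dict[str, Contest] = field(default_factory=dict)


@dataclass
class ResultRow:
    precinct_id: str
    contest_id: str
    candidate_id: str
    votes: int


def check_result(definition: ElectionDefinition, row: ResultRow) -> list[str]:
    """Return a list of problems; an empty list means the row resolves cleanly."""
    problems = []
    if row.precinct_id not in definition.precinct_ids:
        problems.append(f"unknown precinct {row.precinct_id}")
    contest = definition.contests.get(row.contest_id)
    if contest is None:
        problems.append(f"unknown contest {row.contest_id}")
    elif row.candidate_id not in contest.candidate_ids:
        problems.append(f"unknown candidate {row.candidate_id}")
    if row.votes < 0:
        problems.append("negative vote count")
    return problems


# Invented sample data: the definition exists before any results arrive.
definition = ElectionDefinition(
    precinct_ids={"P1", "P2", "P3"},
    contests={"C1": Contest("C1", "City Council Seat 1", ["A", "B"])},
)
good = ResultRow("P1", "C1", "A", 120)
bad = ResultRow("P9", "C1", "Z", 5)
```

Here `check_result(definition, good)` comes back empty, while the `bad` row is flagged because its precinct and candidate do not appear in the definition.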
Our shorthand term for this dataset is JEDI (jurisdiction election data interchange), which is all the data about an election that an independent system would need to know. Most current voting system products have an Election Management System (EMS) product that can produce a JEDI in a proprietary format, for use in reporting, or ballot counting devices. Several states and localities have already adopted the VIP standard for publishing a similar set of information.
We've adopted the VIP format as the standard that we'll be using on the TrustTheVote Project. And we're developing a few modest extensions to it, that are needed to represent a full JEDI that meets the needs of VoteStream, or really any system that consumes and displays election results. All extensions are optional and backwards compatible, and we'll be submitting them as suggestions, when we think we've got a full set. So far, it's pretty basic: the inclusion of geographic data that describes a precinct's boundaries; a use of existing meta-data to note whether a district is a federal, state, or local district.
So far, this is working well, and we expect to be able to construct a VIP-standard JEDI for each county in our VoteStream project, based on the extant source data that we have. The next step, which may be a bit more hairy, is a similar standard for election results with the detailed information that we want to present via VoteStream.
PS: If you want to look at a small artificial JEDI, it's right here: Arden County, a fictional county that has just 3 precincts, about a dozen districts, and a Nov/2012 election. It's short enough that you can page through it and get a feel for what kinds of data are required.
Last time, I explained how our VoteStream work depends on the 3rd of 3 assumptions: loosely, that there might be a good way to get election results data (and other related data) out of their current hiding places, and into some useful software, connected by an election data standard that encompasses results data. But what are we actually doing about it? Answer: we are building prototypes of that connection, and the lynchpin is an election data standard that can express everything about the information that VoteStream needs. We've found that the VIP format is an existing, widely adopted standard that provides a good starting point. More details on that later, but for now the key words are "converters" and "connectors". We're developing technology that proves the concept that anyone with basic data modeling and software development skills can create a connector, or data converter, that transforms election data (including but most certainly not limited to vote counts) from one of a variety of existing formats, to the format of the election data standard.
And this is the central concept to prove -- because as we've been saying in various ways for some time, the data exists but is locked up in a variety of legacy and/or proprietary formats. These existing formats differ from one another quite a bit, and contain varying amounts of information beyond basic vote counts. There is good reason to be skeptical, to suppose that it is a hard problem to take these different shapes and sizes of square data pegs (and pentagonal, octahedral, and many other shaped pegs!) and put them in a single round hole.
But what we're learning -- and the jury is still out, promising as our experience is so far -- is that all these existing data sets have basically similar elements, that correspond to a single standard, and that it's not hard to develop prototype software that uses those correspondences to convert to a single format. We'll get a better understanding of the tricky bits, as we go along making 3 or 4 prototype converters.
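To make the converter idea concrete, here is a minimal Python sketch under stated assumptions. The legacy CSV layout, its column names, and the target record fields are all invented for illustration; they are not any vendor's export format, and the output is not the VIP format. The sketch shows only the general shape: a per-source table of field correspondences, applied to each row.

```python
import csv
import io

# Hypothetical column correspondences for one imaginary county export.
# Keys are common-format field names; values are that county's column names.
COUNTY_A_FIELDS = {
    "precinct_id": "Pct",
    "contest": "Race Name",
    "candidate": "Cand",
    "votes": "Total",
}


def convert(csv_text, field_map):
    """Convert rows of a legacy CSV export into common-format dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        record = {common: row[legacy].strip() for common, legacy in field_map.items()}
        record["votes"] = int(record["votes"])  # vote counts become integers
        records.append(record)
    return records


# Invented sample export, standing in for a real county's file.
sample = (
    "Pct,Race Name,Cand,Total\n"
    "0101,Mayor,Jane Doe, 412\n"
    "0101,Mayor,John Roe, 388\n"
)
records = convert(sample, COUNTY_A_FIELDS)
```

In this simplified picture, a second county would need only its own mapping dictionary. Real exports would likely need more than column renaming, which is where the tricky bits come in.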
Much of this feasibility rests on a structuring principle that we've adopted, which runs parallel to the existing data standard that we've adopted. Much more on that principle, the standard, its evolution, and so on … yet to come. As we get more experience with data-wrangling and converter-creation, there will certainly be a lot more to say.
It's time to finish -- in two parts -- the long-ish explanation of the assumptions behind our current "VoteStream" prototype stage of the TrustTheVote Project's Election Result Reporting Platform (ENRS) project. As I said before, it is an exercise in validating some key assumptions, and discovering their limits. Previously, I've described our assumptions about election results data, and the software that can present it. Today, I'll explain the 3rd of three basic assumptions, which in a nutshell is this:
- If the data has the characteristics that we assumed, and
- if the software (to present that data) is as feasible and useful as we assumed;
- then there is a method for getting the data from its source to the reporting software, and
- that method is practical for real-world elections organizations, scalable, and feasible to be adopted widely.
So, where are we today? Well, as previous postings have described, we made a good start on validating the first 2 assumptions during the previous design phase. And since starting this prototype phase, we've improved the designs and put them into action. So far so good: the data is richer than we assumed; the software is actually significantly more flexible than before, and effectively presents the data. We're pretty confident that our assumptions were valid on those two points.
But where did the 2012 election results data come from, and how did it get into the ENRS prototype? Invented elections, or small transcribed subsets of real results, were fine for design; but in this phase it needs to be real data, complete data, from real election officials, used in a regular and repeated way. That's the kind of connection between data source and ENRS software that we've been assuming.
Having stated this third of three assumptions, the next point is about what we're doing to prove that assumption, and assess its limits. That will be part two of two, of this last segment of my account of our assumptions and progress to date.
A rose by any other name would smell as sweet, but if you want people to understand what a software package does, it needs a good name. In our Election Night Reporting System project, we've learned that it's not just about election night, and it's not just about reporting either. Even before election night, a system can convey a great deal of information about an upcoming election and the places and people that will be voting in it. To take a simple example: we've learned that in some jurisdictions, a wealth of voter registration information is available and ready to be reported alongside election results data that will start streaming in on election night from precincts and counties all over.
It's not a "system" either. The technology that we've been building can be used to build a variety of useful systems. It's better perhaps to think of it as a platform for "Election Result Reporting" systems of various kinds. Perhaps the simplest and most useful system to build on this platform is a system that election officials can load with data in a standard format, and which then publishes the aggregated data as an "election results and participation data feed". No web pages, no API, just a data feed, like the widely used (in election land) data feed technique using the Voting Information Project and their data format.
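A minimal Python sketch of the "just a data feed" idea follows. The record layout and the JSON shape are assumptions made for illustration; they are not the VIP feed format or the platform's actual output.

```python
import json
from collections import defaultdict


def build_feed(precinct_records):
    """Aggregate precinct-level vote counts into per-contest totals."""
    totals = defaultdict(lambda: defaultdict(int))
    for rec in precinct_records:
        totals[rec["contest"]][rec["candidate"]] += rec["votes"]
    # Convert to plain dicts with sorted keys, so the output is stable.
    return {
        contest: dict(sorted(by_candidate.items()))
        for contest, by_candidate in sorted(totals.items())
    }


# Invented sample records in a hypothetical common format.
precinct_records = [
    {"precinct_id": "P1", "contest": "Mayor", "candidate": "Doe", "votes": 100},
    {"precinct_id": "P2", "contest": "Mayor", "candidate": "Doe", "votes": 50},
    {"precinct_id": "P1", "contest": "Mayor", "candidate": "Roe", "votes": 80},
]
feed = build_feed(precinct_records)
feed_json = json.dumps(feed, sort_keys=True)  # the text a feed would publish
```

The aggregation step is deliberately separate from the serialization step, so the same totals could later back a web page or an API as well as a feed.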
In fact, one of the recent lessons learned, is that the VIP data standard is a really good candidate for an election data standard as well, including:
- election definitions (it is that already),
- election results that reference an election definition (needs a little work to get there), and
- election participation data (a modest extension to election results).
As a result (no pun intended) we're starting work on defining requirements for how to use VIP format in our prototype of the "Election Results Reporting Platform" (ERRP).
But the prototype needs to be a lot more than the ERRP software packaged into a data feed. It needs to also provide a web services API to the data, and it needs to have a web user interface for ordinary people to use. So we've decided to give the prototype a better name, which for now is "VoteStream".
Our VoteStream prototype shows how ERRP technology can be packaged to create a system that's operated by local election officials (LEOs) to publish election results -- including but not limited to publishing unofficial results data on election night, as the precincts report in. Then, later, the LEOs can expand the data beyond vote counts that say who won or lost. That timely access on election night is important, but just as important is the additional information that can be added during the work in which the total story on election results is put together -- and even more added data after the completion of that "canvass" process.
That's VoteStream. Some other simpler ERRP-based system might be different: perhaps VoteFeed, operated by a state elections organization to collate LEOs' data and publish to data hounds, but not to the general public and their browsers. Who knows? We don't, not yet anyhow. We're building the platform (ERRP), and building a prototype (VoteStream) of an LEO-oriented system on the platform.
The obvious next question is: what is all that additional data beyond the winner/loser numbers on election night? We're still learning the answers to that question, and will share more as we go along.
Today, I'll be concluding my description of one area of assumptions in our Election Night Reporting System project -- our assumptions about software. In my last post, I said that our assumptions about software were based on two things: our assumptions about election results data (which I described previously), and the results of the previous, design-centric phase of our ENRS work. Those results consist of two seemingly disparate parts:
- the UX design itself, that enables people to ask ENRS questions, and
- a web service interface definition, that enables software to ask ENRS questions.
In case (1), the answer is web pages delivered by a web app. In case (2) the answers are data delivered via an application programming interface (API).
Exhibit A is our ENRS design website http://design.enrs.trustthevote.org which shows a preliminary UX design for a map-based visualization and navigation of the election results data for the November 2010 election in Travis County, Texas. The basic idea was to present a modest but useful variety of ways to slice and dice the data, that would be meaningful to ordinary voters and observers of elections. The options include slicing the data at the county level, or the individual precinct level, or in-between, and filtering by one of various different kinds of election results or contests or referenda. Though preliminary, the UX design was well received, and it's the basis for current work to do a more complete UX that also provides features for power users (data-heads) without impacting the view of ordinary observers.
Exhibit B is the application programming interface (API), or for now just one example of it:
That does not look like a very exciting web page (click it now if you don't believe me!), and a full answer of "what's an API" can wait for another day.
But the point here is that the URL is a way for software to request a very specific slice through a large set of data, and get it in a software-centric digestible way. The URL (which you can see above in the address bar) is the question, and the answer is what you see above as the page view. Now, imagine something like your favorite NBA or NFL scoreboard app for your phone, periodically getting updates on how your favorite candidate is doing, and alerting you in a similar way that you get alerts about your favorite sports team. That app asks questions of ENRS, and gets answers, in exactly the way you see above, but of course it is all "under the hood" of the app's user interface.
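The "URL is the question, data is the answer" idea can be sketched in Python as follows. The host name, path, and query parameter names are hypothetical placeholders, not the actual ENRS API, and the response is a hard-coded sample rather than real results.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL; the real ENRS endpoint is not reproduced here.
BASE_URL = "https://enrs.example.org/api/results"


def build_question(precinct_id, contest_id):
    """Build a request URL asking for one precinct/contest slice of the data."""
    query = urlencode({"precinct": precinct_id, "contest": contest_id})
    return f"{BASE_URL}?{query}"


def read_answer(body):
    """Parse a JSON answer and return the leading candidate's name."""
    data = json.loads(body)
    leader = max(data["candidates"], key=lambda c: c["votes"])
    return leader["name"]


url = build_question("0101", "mayor")

# Invented sample response, standing in for what a server might send back.
sample_body = '{"candidates": [{"name": "Doe", "votes": 412}, {"name": "Roe", "votes": 388}]}'
leader = read_answer(sample_body)
```

A scoreboard-style app would repeat this question periodically and raise an alert when the answer changes; the network call itself is left out of the sketch.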
So, finally, we can re-state the software assumption of our ENRS project:
- if one can get sufficiently rich election data, unlocked from the source, in a standard format,
- then one can feasibly develop a lightweight modern cloud-oriented web app, including a web service, that enables election officials to both:
- help ordinary people understand complex election results data, and
- help independent software navigate that data, and present it to the public in many ways, far beyond the responsibilities of election officials.
We're trying to prove that assumption, by developing the software -- in our usual open source methodology of course -- in a way that (we hope) provides a model for any tech organization to similarly leverage the same data formats and APIs.
Today I'm continuing with the second of a 3-part series about what we at the TrustTheVote Project are hoping to prove in our Election Night Reporting System project. As I wrote earlier, we have assumptions in three areas, one of which is software. I'll try to put into a nutshell a question that we're working on an answer to:
If you were able to get the raw election results data available in a wonderful format, what types of useful Apps and services could you develop?
OK, that was not exactly the shortest question, and in order to understand what "wonderful format" means, you'd have to read my previous post on Assumptions About Data. But instead, maybe you'd like to take a minute to look at some of the work from our previous phase of ENRS work, where we focused on two seemingly unrelated aspects of ENRS technology:
- The user experience (UX) of a Web application that local election officials could provide to help ordinary folks visualize and navigate complex election results information.
- A web services API that would enable other folks' systems (not elections officials) to receive and use the data in a manner that's sufficiently flexible for a variety of other services ranging from professional data mining to handy mobile apps.
They're related because the end results embodied a set of assumptions about available data.
Now we're seeing that this type of data is available, and we're trying to prove with software prototyping that many people (not just elections organizations, and not just the TrustTheVote Project) could do cool things with that data.
There's a bit more to say -- or rather, to show and tell -- than should fit in one post, so I'll conclude next time.
PS: Oh there is one more small thing: we've had a bit of an "Ah-ha" here in the Core Team, prodded by our peeps on the Project Outreach team. This data, and the apps and services that can leverage that data for all kinds of purposes, have uses far beyond the night of an election. And we mentioned that once before, but the ah-ha is that what we're working on is not just about election night results... it's about all kinds of election results reporting, any time, anywhere. And that means ENRS is really not that good of a code name or acronym. Watch as "ENRS" morphs into "E2RP" for our internal project name -- Election Results Reporting Platform.
In a previous post I said that our ENRS project is basically an effort to investigate a set of assumptions about how the reporting of election results can be transformed with innovations right at the source -- in the hands of the local election officials who manage the elections that create the data. One of those assumptions is that we -- and I am talking about election technologists in a broad community, not only the TrustTheVote Project -- can make election data standards that are important in five ways:
- Flexible to encompass data coming from a variety of elections organizations nationwide.
- Structured to accommodate the raw source data from a variety of legacy and/or proprietary systems, feasibly translated or converted into a standard, common data format.
- Able to simply express the most basic results data: how many votes each candidate received.
- Able to express more than just winners and losers data, but nearly all of the relevant information that election officials currently have but don't widely publish (i.e., data on participation and performance).
- Flexible to express detailed breakdowns of raw data, into precinct-level data views, including all the relevant information beyond winners and losers.
Hmm. It took a bunch of words to spell that out, and for everyone but election geeks it may look daunting. To simplify, here are three important things we're doing to prove out those assumptions to some extent.
- We're collecting real election results data from a single election (November, 2012) from a number of different jurisdictions across the country, together with supporting information about election jurisdictions' structure, geospatial data, registration, participation, and more.
- We're learning about the underlying structure of this data in its native form, by collaborating with the local elections organizations that know it best.
- We're normalizing the data, rendering it in a standard data format, and using software to crunch that data, in order to present it in a digestible way to regular folks who aren't "data geeks."
And all of that comprises one set of assumptions we're working on; that is, we're assuming all of these activities are feasible and can bear fruit in an exploratory project. Steady as she goes; so far, so good.
In my last post, I said that the time is right for breaking the logjam in election results reporting, enabling a big reload on technology for reporting, and a big increase in public transparency. Now, let me explain why, starting with the biggest of several reasons. Elections data standards are needed to define common data formats into which a variety of results data can be converted.
Those standards are emerging now, and previously the lack of them was a real problem.
- We can't reasonably expect a local elections office to take additional efforts to publish the data, or otherwise serve the public with election results services, if the result will be just one voice in a Babel of dozens of different data languages and dialects.
- We can't reasonably expect a 3rd party organization to make use of the data from many sources, unless it's available in a single standard format, or they have the wherewithal to do huge amounts of work on data conversion, repeatedly.
The good news is that election data standards have come a long way in the last couple of years, due to:
- Significant support from the U.S. Government's standards body -- the National Institute of Standards and Technology (NIST);
- Sustained effort from the volunteers working in standards committees in the international standards body -- the IEEE 1622 Working Group; and
- Practical experience with evolving de facto standards, particularly with the data formats and services of the Pew Voting Information Project (VIP), and the several elections organizations that participate in providing VIP data.
There are other reasons why the time is right, but they are more widely understood:
- We now have technologies that perennially understaffed and underfunded elections organizations can feasibly adopt quickly and cheaply, including powerful web application frameworks, supported by cloud hosting operations, within a growing ecosystem of web services that enable many organizations to access a variety of data and apps.
- "Open government," "open data," and even "big data" are buzz phrases now commonly understood, which describe a powerful and maturing set of technologies and IT practices. This new language of government IT innovation facilitates actionable conversations about the opportunity to provide the public with far more robust information on elections and their participation and performance.
It's a "promised land" of government IT and the so-called Gov 2.0 movement (arguably, we think, more like Gov 3.0, if you consider that 2.0 was all about collaboration and 3.0 is becoming all about the "utility web" -- real apps available on demand -- a direction some of these services will inevitably take). However, for election technology in the near term, we first have to cross the river by learning how to "get the data out" (and that is more like Gov 2.0). More next time on our assumptions about how that river can be crossed, and our experiences to date on doing that crossing.
Now that we are a ways into our "Election Night Reporting System" project, we want to start sharing some of what we are learning. We had talked about a dedicated Wiki or some such, but our time was better spent digging into the assignment graciously supported by the Knight Foundation Prototype Fund. Perhaps the best place to start is a summary of what we've been saying within the ENRS team, about what we're trying to accomplish. First, we're toying with this silly internal project code name, "ENRS" and we don't expect it to hang around forever. Our biggest gripe is that what we're trying to do extends way beyond the night of elections, but more about that later.
Our ENRS project is based on a few assumptions, or perhaps one could say some hypotheses that we hope to prove. "Prove" is probably a strong word. It might be better to say that we expect that our assumptions will be valid, but with practical limitations that we'll discover.
The assumptions are fundamentally about three related topics:
- The nature and detail of election results data;
- The types of software and services that one could build to leverage that data for public transparency; and
- Perhaps most critically, the ability for data and software to interact in a standard way that could be adopted broadly.
As we go along in the project, we hope to say more about the assumptions in each of these areas.
But it is the goal of feasible broad adoption of standards that is really the most important part. There's a huge amount of latent value (in terms of transparency and accountability) to be had from aggregating and analyzing a huge amount of election results data. But most of that data is effectively locked up, at present, in thousands of little lockboxes of proprietary and/or legacy data formats.
It's not as though most local election officials -- the folks who are the source of election results data, as they conduct elections and the process of tallying ballots -- want to keep the data locked up, nor to impede others' activities in aggregating results data across counties and states, and analyzing it. Rather, most local election officials just don't have the means to "get the data out" in a way that supports such activities.
We believe that the time is right to create the technology to do just that, and enable election officials to use the technology quickly and easily. And this prototype phase of ENRS is the beginning.
Lastly, we have many people to thank, starting with Chris Barr and the Knight Foundation for its grant to support this prototype project. Further, the current work is based on a previous design phase. Our thanks to our interactive design team led by DDO, and the Travis County, TX Elections Team who provided valuable input and feedback during that earlier phase of work, without which the current project wouldn't be possible.