Some Proposed Goals
- Design a common non-proprietary data specification that is inclusive enough to allow data interchange.
- It would be highly useful to have a very lightweight and simple data format, simple enough that many systems could easily output data in it (or that data in other formats could easily be transformed into it). The format should be independent of which application(s) will be used to analyze the data, and even of which assumptions about the network as a whole the analysis will apply. Ideally it should also accommodate a wide range of theoretical positions and questions, even about seemingly "fundamental" aspects (for example, flexibility about what does or does not count as a node from a given analytical perspective).
- multiple (dozens ... hundreds) node attribute variables
- multiple (dozens ... hundreds) link attribute variables
- extremely multiplex links
- missing data
- time data
- value labels
- really big (*huge* <-- PLEASE DEFINE) networks
- discrete or continuous transformations of single variables
- egocentric or complete networks
- inclusion of useful kitchen appliances
- can be translated into / represented in a relational database
- has a well-defined process for extension to meet future needs, so it doesn't become obsolete immediately
- a clear group or team responsible for maintaining and updating the specification
- Easy for a parser to ignore information it doesn't know how to read
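As a rough illustration of how several of these goals can coexist in a very simple format, the sketch below reads a plain tab-separated file with node attributes, multiplex links carrying a relation type and a time value, missing data, and value labels, while skipping a section it does not recognize. It then loads the result into a relational database. Everything about the file layout is a hypothetical assumption made for this example (the `*nodes` / `*links` / `*labels` section markers, the tab delimiter, the `NA` missing-value code, and the column names); only Python's standard `csv` and `sqlite3` modules are real.

```python
import csv
import io
import sqlite3

# Hypothetical sample file. The section markers (*nodes, *links, *labels),
# the tab delimiter and the "NA" missing-value code are illustrative
# assumptions, not part of any agreed specification.
SAMPLE = """\
*nodes
id\tsector\tsize
a\t1\t120
b\t2\tNA
c\t1\t45
*links
source\ttarget\trelation\ttime\tweight
a\tb\ttrade\t2001\t3.5
a\tb\tadvice\t2001\tNA
b\tc\ttrade\t2002\t1.0
*labels
variable\tvalue\tlabel
sector\t1\tmanufacturing
sector\t2\tservices
*layout
a\t0.1\t0.2
"""

MISSING = "NA"
KNOWN_SECTIONS = {"nodes", "links", "labels"}


def parse(text):
    """Return {section: [row dicts]} for known sections; skip everything else."""
    sections, current, buffer = {}, None, []
    # A sentinel marker at the end flushes the last section.
    for line in text.splitlines() + ["*end"]:
        if not line.startswith("*"):
            buffer.append(line)
            continue
        if current in KNOWN_SECTIONS and buffer:
            reader = csv.DictReader(io.StringIO("\n".join(buffer)), delimiter="\t")
            # Missing-value codes become None, which SQLite stores as NULL.
            sections[current] = [
                {k: (None if v == MISSING else v) for k, v in row.items()}
                for row in reader
            ]
        # Unknown sections (here *layout) are read past and never stored.
        current, buffer = line[1:].strip().lower(), []
    return sections


def to_sqlite(sections):
    """Load each parsed section into a table of an in-memory SQLite database."""
    con = sqlite3.connect(":memory:")
    for name, rows in sections.items():
        if not rows:
            continue
        cols = list(rows[0])
        col_sql = ", ".join(f'"{c}"' for c in cols)
        marks = ", ".join("?" for _ in cols)
        con.execute(f'CREATE TABLE "{name}" ({col_sql})')
        con.executemany(
            f'INSERT INTO "{name}" VALUES ({marks})',
            [tuple(r[c] for c in cols) for r in rows],
        )
    return con


sections = parse(SAMPLE)
con = to_sqlite(sections)

# Multiplex links: several relation types between the same pair of nodes.
multiplex = con.execute(
    "SELECT source, target, COUNT(DISTINCT relation) FROM links "
    "GROUP BY source, target ORDER BY source, target"
).fetchall()

# Missing data survives as SQL NULL.
n_missing = con.execute("SELECT COUNT(*) FROM nodes WHERE size IS NULL").fetchone()[0]

# Value labels are just another table to join against.
labelled = con.execute(
    "SELECT n.id, l.label FROM nodes AS n JOIN labels AS l "
    "ON l.variable = 'sector' AND l.value = n.sector ORDER BY n.id"
).fetchall()

print(multiplex)  # [('a', 'b', 2), ('b', 'c', 1)]
print(n_missing)  # 1
print(labelled)   # [('a', 'manufacturing'), ('b', 'services'), ('c', 'manufacturing')]
```

Because the parser only acts on sections it knows, a file carrying extra information (the `*layout` section here) can still be read by an older parser, which is one simple way to meet the "ignore what you can't read" goal.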
Data Version Control
- Version control and merging of datasets need to be central to the effort. If researchers A and B both take data from a public source, say the USPTO database, and each repair its bad data, they essentially already have at least 4 datasets. The easier we make comparing, merging, and rolling back datasets, the more we will be able to build on each other's work, and there is vast room for improvement there.
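One established way to reconcile two independently repaired copies of a shared source is a three-way merge against that common base, the same idea version-control systems apply to text files. The sketch below assumes records are keyed by a stable identifier (for instance a patent number); the function name `merge3`, the field names, and the sample records are all hypothetical and are not real USPTO data.

```python
def merge3(base, ours, theirs):
    """Three-way merge of two independently repaired copies of `base`.

    Each argument maps a record id to a dict of field values. A record
    missing from a version counts as deleted in that version. Returns the
    merged dataset and a list of (record_id, field) conflicts; field is
    None when one side deleted a record the other side edited.
    """
    merged, conflicts = {}, []
    for rid in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(rid), ours.get(rid), theirs.get(rid)
        if o == t or t == b:
            chosen = o  # both agree, or only "ours" changed the record
        elif o == b:
            chosen = t  # only "theirs" changed the record
        elif o is None or t is None:
            conflicts.append((rid, None))  # delete vs. edit: needs a human
            continue
        else:
            chosen = {}
            # Both sides edited the record: reconcile field by field.
            for field in sorted(set(b or {}) | set(o) | set(t)):
                bv, ov, tv = (b or {}).get(field), o.get(field), t.get(field)
                if ov == tv or tv == bv:
                    value = ov
                elif ov == bv:
                    value = tv
                else:
                    value = ov  # keep "ours" provisionally and flag it
                    conflicts.append((rid, field))
                if value is not None:
                    chosen[field] = value
        if chosen is not None:
            merged[rid] = dict(chosen)
    return merged, conflicts


# Hypothetical records, for illustration only.
base = {
    "p1": {"assignee": "IBM Corp", "year": 1999},
    "p2": {"assignee": "Intl Business Machines", "year": 2001},
    "p3": {"assignee": "Xerox", "year": 2000},
}
ours = {  # researcher A normalizes assignee names
    "p1": {"assignee": "IBM Corp", "year": 1999},
    "p2": {"assignee": "IBM Corp", "year": 2001},
    "p3": {"assignee": "Xerox Corp", "year": 2000},
}
theirs = {  # researcher B corrects a year and renames an assignee differently
    "p1": {"assignee": "IBM Corp", "year": 1999},
    "p2": {"assignee": "Intl Business Machines", "year": 2002},
    "p3": {"assignee": "Xerox Corporation", "year": 2000},
}

merged, conflicts = merge3(base, ours, theirs)
print(merged["p2"])  # {'assignee': 'IBM Corp', 'year': 2002}
print(conflicts)     # [('p3', 'assignee')]
```

Non-overlapping repairs (A's name fix and B's year fix on `p2`) combine automatically, and only genuinely competing edits are flagged. Keeping `base` untouched is also what makes rolling back possible: either repaired copy can be discarded or re-derived without losing the original.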