Goldfish Bowl idea being suggested here – two facilitators in the centre of the circle and, just as in tag team wrestling, you tag in to chat.
Firstly: frustration around how data is collected, how data does not match up and cannot be consistently accessed. In this case about criminal prosecutions. Three sets of data about the same people are not matchable for statistical analysis. Been trying for years to get the Scottish Government to align the three data sets. Kate tags in… you started by saying that you think it’s about how they collect the data. Do you see a solution?
No, don’t see a solution to collecting but as a user it’s essential to have a clear path through the data – you want to find out what the effective interventions are, my particular interest is domestic abuse, it’s very hard to track cases through the system to see what does or does not work. Doesn’t need to be about individuals (and thus hitting data protection issues) but about statistically tracking processes and seeing global stats that do not infringe individual rights. Sentencing is about people, reporting is about incidents. It’s about how to get over the apples and oranges approach.
Kate’s tagged in again. In theory, linked data gets you over various privacy issues technically, but in practice it is tricky to keep the path through the data and know that we are both talking about the same thing whilst retaining privacy. Tag in from Norman: surely the issue is you have to name a person to link up that data. Kate: Stamp people at birth with a URI. Context is crucial here but perhaps you have multiple URIs for different contexts.
Gilly: I’m talking about tagging – just tick a box to indicate domestic abuse. It can be charged as lots of different things so you don’t necessarily know that it is a domestic abuse case. Could then do a global count of tags. Agree re: issue of individual identifiers. Is there not a way to tag this data? Norman: perhaps talking tagging takes us away from linked data. Tagging is fairly cheap. Matter of political will rather than technical things.
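A minimal sketch of the tick-box idea, assuming each case record carries only a charge and a yes/no tag and no personal identifiers. The records, field names and the `count_tagged` helper are invented for illustration; they are not from any real justice data set.

```python
from collections import Counter

# Hypothetical case records: no names or personal identifiers, only the
# charge and a tick-box flag recorded when the incident was reported.
cases = [
    {"charge": "assault", "domestic_abuse": True},
    {"charge": "breach of the peace", "domestic_abuse": True},
    {"charge": "assault", "domestic_abuse": False},
    {"charge": "vandalism", "domestic_abuse": True},
    {"charge": "theft", "domestic_abuse": False},
]

def count_tagged(records, tag):
    """Count the records carrying the tag, broken down by charge."""
    return Counter(r["charge"] for r in records if r.get(tag))

tagged = count_tagged(cases, "domestic_abuse")
total_tagged = sum(tagged.values())
print(total_tagged, dict(tagged))
```

The count works across every charge type because it reads the tag rather than the charge, which is the point Gilly is making about cases that are prosecuted under many different headings.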
Tag in from… an unknown chap! Raising questions of data specs. Norman: any data set has a schema; it just needs tagging up in the right way.
Tag in from Keith: modelling statistical data can be problematic – identifiers needed to follow links. Looking at Chris’s OpenlyLocal.com site, he’s created identifiers for various pieces of data including each council. Unknown chap: you can mash up and track data from these items.
Tag in from Norman: not all developers need to know about linked data to make this work. Tag in from Chris: problem with some URIs is that they reflect other identifiers. Needs to be recognition that there is value in opening up data in a linked way. Often at the moment data is released as standalone data sets. They are not being launched as linked sets but there is real benefit to releasing data in that linked way. Those benefits have to be conveyed to those sharing data though.
Tag in from Roy: real differences between data and personal data. IT side of sharing is easy but culture of sharing, especially in, say, social service departments, is the tough problem to fix. Norman: formats and standards matter here too. CSV is good but linked data and RDF much better.
Tag in from Keith: What is the easy end of the problem? Tag in from Janet: the cultural heritage end of public data. We have lots of fairly well structured data that could be made available. Recently created database of parks, open spaces etc. including Scotland. Deliberately tried to link to existing heritage and museum systems by using matching terms and data fields so that crosswalks of metadata could take place. You could publish the way the taxonomies match up in the various databases (though not sure if this has taken place). This is documented and recorded on the relevant websites. Essentially an XHTML database – you can scrape from the web or request the data to download and use.
Tag in from Paola: how do you do the crosswalk? Janet: it’s fairly straightforward. Defining our terms was the key thing. For instance historical periods are referred to in different terms depending on whether you are a historian, an archaeologist, an artist etc. So we have a controlled vocabulary describing each term clearly. Paola: and how is this queried? SQL? Janet: yes. And could migrate to RDF.
Norman tags in: sounds like a good human user interface. How does that work for machines? Janet: has to sort of work for both. Data is in Dublin Core to do this. Norman: if I built an app do I have to scrape the XHTML? Janet: you can request what you like but you have to be approved to do this as there is some sensitive data in the database. Norman: so that’s back to the open versus closed debate. So the benefit of RDF is that we could just request the data mechanically without having to download CSV or receive approval to access that sensitive data. Janet: more semantic data in cultural heritage databases in general. Whole information systems have, for years, been based on the concept of sharing with others. For instance local authorities use essentially the same data model for this sort of data. Norman: there are quite complex data structures in that local authority data?
Janet: are you thinking of the eGov standards brought in 8 years ago? Those are essentially based on Dublin Core. Not finely graded. Norman: are there fixed ontologies for this stuff? Janet: In theory yes, but in practice more complicated than that.
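A rough sketch of the kind of crosswalk Janet describes: map each source's own field names onto shared Dublin Core style elements, then normalise period labels through a controlled vocabulary. The vocabulary entries, field names, sample records and the `crosswalk` helper are all assumptions made up for illustration, not the actual heritage database schema.

```python
# Hypothetical controlled vocabulary: variant labels mapped to one agreed term.
PERIOD_VOCAB = {
    "medieval": "medieval",
    "mediaeval": "medieval",
    "middle ages": "medieval",
    "victorian": "victorian",
    "victorian era": "victorian",
}

# Hypothetical field maps from each source's column names to shared elements.
PARKS_FIELDS = {"site_name": "dc:title", "period": "dc:coverage"}
MUSEUM_FIELDS = {"object": "dc:title", "era": "dc:coverage"}

def crosswalk(record, field_map):
    """Rename fields to the shared elements and normalise period terms."""
    out = {}
    for source_field, value in record.items():
        target = field_map.get(source_field)
        if target is None:
            continue  # no agreed equivalent for this field, so leave it out
        if target == "dc:coverage":
            # Unknown labels pass through unchanged rather than being guessed.
            value = PERIOD_VOCAB.get(value.strip().lower(), value)
        out[target] = value
    return out

park = crosswalk({"site_name": "Abbey Gardens", "period": "Middle Ages"}, PARKS_FIELDS)
exhibit = crosswalk({"object": "Seal matrix", "era": "Mediaeval"}, MUSEUM_FIELDS)
print(park["dc:coverage"] == exhibit["dc:coverage"])
```

Once both records use the same element names and the same period term, they can be compared or queried together, which is why defining the terms was the key step.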
Tag in from Gavin: technical nature of this a bit beyond me but as a layman I want to know why RDF is useful, so you could show me what you could do with the data so that I can see why it’s worth making available. Norman: as a human all sites are accessible but if you are a machine they are not all readable, screen scraping is complex and tailored. For well structured sites then that’s easier but it’s still a huge amount of work. Gavin: is it about geo data and mapping information? My data interest is about provision of youth services in Edinburgh. Norman: well if you release that data you could do it as CSV but that’s tricky. Machine readable formats, so RDF, that’s much easier to understand – can mechanically interpret that data.
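To make Norman's CSV versus RDF point a little more concrete, here is a small sketch that turns CSV rows into subject, predicate, object statements written in N-Triples syntax. The `example.org` namespaces, the column names and the two service rows are invented for illustration; a real release would use agreed vocabularies and stable URIs.

```python
import csv
import io

BASE = "http://example.org/youth-services/"  # hypothetical namespace for records
VOCAB = "http://example.org/vocab/"          # hypothetical vocabulary for columns

# Made-up sample rows standing in for a youth services spreadsheet.
CSV_TEXT = """id,name,area
1,Drop-in Club,Leith
2,Music Workshop,Gorgie
"""

def literal(value):
    """Quote a string as an N-Triples literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def rows_to_ntriples(csv_text):
    """Emit one triple per non-id column of each CSV row."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row['id']}>"
        for column, value in row.items():
            if column == "id":
                continue  # the id becomes part of the subject URI instead
            lines.append(f"{subject} <{VOCAB}{column}> {literal(value)} .")
    return lines

triples = rows_to_ntriples(CSV_TEXT)
for line in triples:
    print(line)
```

In the CSV, "area" is just a column heading that a person has to interpret. In the triples, every statement names what it is about and which property it describes, so a program can combine it with statements from elsewhere.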
Tag in from Chris: You should absolutely be defining your data clearly – to understand a database can be terribly complex. You have to understand your own data and then look to helping others to understand that data and align it to others’ data for comparison and understanding. Tag in from Norman: BBC had lots of different databases on programme names; to combine them they used the Wikipedia URL in the RDF – easy to query and avoids turf wars. Could do the same for, e.g. data on schools. Helps make it humanly understandable. Also other folk will unproblematically use the same URI to refer to schools.
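A small sketch of why a shared URI helps, assuming two separately published data sets that happen to use the same URI for each school. The `example.org` URIs, the property names and the `merge_by_uri` helper are made up for illustration and do not describe the BBC's actual setup.

```python
# Two hypothetical data sets about schools, published by different bodies.
inspections = {
    "http://example.org/school/1": {"inspection_grade": "good"},
    "http://example.org/school/2": {"inspection_grade": "excellent"},
}
rolls = {
    "http://example.org/school/1": {"pupils": 420},
    "http://example.org/school/3": {"pupils": 180},
}

def merge_by_uri(*datasets):
    """Combine the properties from every data set under each shared URI."""
    merged = {}
    for dataset in datasets:
        for uri, properties in dataset.items():
            merged.setdefault(uri, {}).update(properties)
    return merged

combined = merge_by_uri(inspections, rolls)
print(combined["http://example.org/school/1"])
```

Neither publisher needs to know about the other or agree on a common database; agreeing on the identifier is enough for a third party to join the two.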
Chris: part of UK legislation has now gone online. Why is this important? So if you are stopped in the street you can look up that law but also discussion that is more understandable to you around that law. Actually it’s enabling individuals to understand our world and the laws that govern us. Most folk using it will have no idea what technology underpins that service but it is hugely important. URI not only refers to legislation but also to a paragraph and to changes over time. So when you pull out your phone and access an app – you can view what’s there, what has changed and how that affects you. Easy to understand and access.
Tag in from Gavin: so I have data and see the value in sharing it but how do I do that? Norman: know that there are people who can do this for you.
And with that… we all run away for food.