HMLR Inspire Polygon Index Map

Comments on the Ordnance Survey PowerPoint presentation to OPSI presented to demonstrate high levels of coincidence with OS MasterMap

Introduction

In the course of investigating a complaint by 77M, OPSI was presented with a set of 21 MS PowerPoint slides (attached) which, I understand, were shown and explained in Ordnance Survey’s evidence to demonstrate that there is a high level of coincidence between HMLR Inspire polygons and features on OS MasterMap. I have been asked to comment on Ordnance Survey’s methodology and the apparent conclusion that the level of coincidence between the HMLR data and OS’s product is high.

I also understand that OS introduced the additional assertion that, because the level of coincidence between the HMLR data and their own MasterMap product is high, Ordnance Survey is exposed to the commercial risk of potential customers choosing to use the HMLR data as a substitute for the Ordnance Survey data from which they were derived.

As I was not present at OS’s PowerPoint presentation and have seen no documentation to explain how their sample areas were chosen, all comments in this note relate only to what can be concluded from the PowerPoint presentation and background knowledge.

HMLR use of Ordnance Survey data

There is no dispute over whether HMLR data is informed by Ordnance Survey mapping. The Land Transfer Act of 1897 introduced the requirement that the Ordnance Survey map would become the basis of registered mapping (Andrew Edwards, 2001, Quinquennial Review of HM Land Registry). What is in contention is whether, and how much of, Ordnance Survey’s Intellectual Property (IP) is embedded in the HMLR Index map Polygons (in particular the Inspire edition) and what should be the fair royalty due to Ordnance Survey for the use of that IP by third parties.
Coincidence

The Ordnance Survey presentation implies that “coincidence” with the HMLR map is demonstrated because it is possible to duplicate a large proportion of the HMLR polygons by following lines which appear in Ordnance Survey’s MasterMap product. Indeed, it would be very surprising if this were not the case: HMLR have used OS paper maps, the OS digital LandLine product and most recently OS MasterMap as the base on which to record their mapping since 1897, because they are legally obliged to do so.

Had HMLR surveyors independently surveyed property boundaries on the ground (unnecessary because of the availability of Ordnance Survey mapping), or had independent analysts inferred parcel boundaries from aerial photography, there would still be a high level of “coincidence” between the three versions, because the same real-world features are being recorded and, culturally, most property boundaries are marked by hedges, fences or walls, or are implicit in physical aspects of building outlines. So coincidence between anyone recording real-world features independently and the recording of those features on Ordnance Survey maps is entirely unsurprising. The argument about coincidence is, in fact, an argument about intellectual property, discussed more fully below, and not about the recording of facts about the real world that are independently observable.

If an independent expert were asked how much “coincidence” there is between the HMLR Inspire polygons and Ordnance Survey MasterMap, they would be likely to develop a quite different methodology and set of metrics from those offered by Ordnance Survey. Ordnance Survey appear to be attempting to calculate what proportion of the HMLR line work exists on OS MasterMap. In order to do so they appear to have chosen sample areas specifically selected to maximise that ratio, rather than using any objective sampling methodology.
If one were cynical, one might believe that the exercise was carried out intentionally to make the case that, because most HMLR line-work coincides with lines that appear on OS MasterMap, it is, in effect, largely a simple sub-set of OS MasterMap and that, in those circumstances, OS has the right to dictate the terms on which that sub-set is passed to third parties.

An alternative and, I would suggest, more objective view is that a different question should be asked: not what proportion of HMLR line features “coincide” with Ordnance Survey lines, but what proportion of the lines that appear on OS MasterMap also appear in the HMLR product. Because of the level of detail and the multiplicity of features on MasterMap, that proportion is very small indeed. If one were to take that far more comprehensive view of the coincidence question, then methodologies quite different from that adopted and demonstrated by OS might be used.

Coincidence at the atomic level

OS MasterMap, as with any geographic data set, is structured hierarchically. At the atomic level the data comprises a large collection of coordinate pairs (Easting and Northing, or x,y). These are joined in pairs as line segments, line segments are chained into lines, and lines are connected to form polygons.

The OS demonstration is based on the comparison of polygons. OS have constructed BLPU (Basic Land and Property Unit) polygons to compare with HMLR’s land parcels. The BLPU is a compound feature usually comprising the boundary fence lines and, in the case of semi-detached or terraced housing, lines representing party walls within the buildings. The BLPU is an addressable unit which is defined in BS7666, the BSI Addressing standard. In the past BLPUs, when defined, were captured from OS mapping by local authorities as part of their Local Land and Property Gazetteer (LLPG) custodian role.
When local authorities captured BLPUs as a spatial and visual representation of addresses in their LLPG, they encountered derived data issues similar to those now faced by HMLR. Some authorities resorted to capturing BLPUs from aerial photography to avoid that risk.

The OS PowerPoint presentation implies that the polygons shown exist as discrete polygons in OS MasterMap; in fact they do not. The BLPU has not, historically, been a MasterMap feature and BLPUs cannot have been copied from MasterMap by HMLR. By constructing BLPUs Ordnance Survey have, in effect, replicated HMLR’s normal work practice, which is informed by HMLR’s independently gathered intelligence on where the property boundaries actually lie. In those circumstances, and given that highly structured areas of simple property boundaries were chosen, it is not at all surprising that Ordnance Survey demonstrated high levels of “coincidence”.

Sampling

A more objective approach to measuring real coincidence would be to sample different environments. An approach used by Ordnance Survey before involved selecting a stratified random sample of kilometre grid squares in areas with different densities of properties and different property types. The stratification was done by using the ONS Output Area Classification and ensuring that each of the types of area in the classification was represented in the sample in proportion to the number of such areas in the country. A range of similar sampling strategies could be devised, and each would be fairer than the choice of areas made by Ordnance Survey for their presentation.

Points

If the HMLR line-work had been directly copied from OS data, each coordinate pair (x,y) in HMLR’s data would also appear in the OS data. This could be easily checked for the sample areas.
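A minimal sketch of such a check follows. It assumes (hypothetically) that both datasets have already been reduced to lists of (easting, northing) tuples in metres for the same sample area, and that two coordinates count as identical when they agree to the nearest centimetre. The function name and all coordinates are invented for illustration and are not drawn from either product.

```python
def exact_match_share(hmlr_points, os_points, decimals=2):
    """Fraction of HMLR vertices whose coordinates also occur in the OS data.

    Coordinates are rounded to `decimals` places (an assumed precision)
    so that trivially different float representations still compare equal.
    """
    def key(point):
        return (round(point[0], decimals), round(point[1], decimals))

    os_keys = {key(p) for p in os_points}
    if not hmlr_points:
        return 0.0
    hits = sum(1 for p in hmlr_points if key(p) in os_keys)
    return hits / len(hmlr_points)


# Invented sample data: five OS vertices and three HMLR vertices.
os_points = [
    (371250.10, 387420.55),
    (371262.40, 387420.55),
    (371262.40, 387431.00),
    (371250.10, 387431.00),
    (371256.00, 387425.00),
]
hmlr_points = [
    (371250.10, 387420.55),  # identical to an OS vertex
    (371262.40, 387420.55),  # identical to an OS vertex
    (371262.90, 387431.30),  # re-traced, so not identical
]

share = exact_match_share(hmlr_points, os_points)
print(f"{share:.0%} of HMLR vertices are exact copies of OS vertices")
```

If the line-work had been copied directly, the share would be at or near 100 per cent; a much lower figure would suggest re-tracing rather than copying.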
It could then be argued that the true coincidence as a percentage, at point level, between the HMLR data and the OS data was:

$$\frac{n}{m} \times 100$$

where n is the number of points that make up the HMLR polygons and m the number of points in the equivalent area of MasterMap.

In practice it is extremely unlikely that any HMLR points coincide perfectly with MasterMap coordinates, because they were not copied from the underlying data but re-traced with error. In this case “coincidence” should be treated as points in the HMLR data sufficiently close to an OS point, bearing in mind digitising error or intentional displacement, to be treated as “most probably” derived from and informed by the location of an OS point. Such a margin of error would inflate n but, given what a tiny subset of all features and their constituent points is re-used by HMLR, the level of coincidence would remain small.

Lines

It may be argued that, though atomic, points are too abstract a geometric feature to be analysed. However, precisely the same logic could be used to compare line features in HMLR’s product with those in MasterMap. A similar calculation could be made by taking line segments (the straight lines joining pairs of points) in HMLR’s product, placing a “bounding box” around those line segments based on the same average error used for points, and measuring the length of line segments from the equivalent area of OS MasterMap that fall within the HMLR bounding boxes as a proportion of the total line length of all the line segments in the sample area. While relatively easy to compute algorithmically, this exercise would again give a result which was small.

Areas

OS MasterMap is structured as a set of non-overlapping areas represented by polygons made up of line segments, in turn made up of points. As pointed out above, the BLPU is made up from a set of adjacent polygons, not usually an individual polygon, because Land Parcels have not, historically, been a MasterMap feature type.
However, it would be possible to carry out an analysis where every HMLR polygon was taken and its area compared to the sum of the areas of all OS MasterMap polygons fully contained within it, or within the HMLR polygon bounded by an error buffer. HMLR polygons which could be made up entirely of OS polygons would be deemed coincident; those whose areas were significantly different from the best fit of OS polygons overlapping them would be deemed non-coincident. Again, this is an exercise that could be carried out algorithmically, and it most closely corresponds to what OS appear to have done by hand.

Other examples

Appendix 1 shows HMLR Inspire polygons superimposed on third party mapping and aerial imagery. These do not purport to be an objective sample; however, they are probably as objective as the demonstration areas chosen by Ordnance Survey. The three examples show the area around Manchester University, including a significant estate of local authority and former local authority housing in the East, and two outer suburban areas: part of Lymm in Cheshire and part of New Mills in Derbyshire. All three were chosen because the author has either worked or lived in the locality. They were not chosen to show either high or low levels of potential coincidence between HMLR data and OS data. What they do demonstrate, when examined closely, is how varied, spatially inconsistent and incomplete the HMLR data is compared to the detailed topographic mapping in OS’s examples chosen to demonstrate coincidence.

An objective exercise

The issue of coincidence between the HMLR Inspire Polygon index map, which is an indicative representation of registered property ownership, and OS MasterMap data could only be resolved objectively by defining and agreeing a methodology similar to those suggested in the “thought experiment” above.
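To indicate how little machinery such an agreed methodology would need, the point and line measures from the thought experiment can be sketched in Python. This is a sketch under stated assumptions, not the method used by OS or HMLR. The function names, the 0.5 m tolerance and every coordinate are invented. Nearest-point search is brute force. The “bounding box” is taken to be the axis-aligned box around an HMLR segment expanded by the tolerance, and an OS segment is counted only when both of its endpoints fall inside the box of a single HMLR segment, which is cruder than clipping partial lengths.

```python
import math


def point_coincidence_pct(hmlr_points, os_points, tol):
    """100 * n / m: n = HMLR points within `tol` of some OS point, m = OS points."""
    if not os_points:
        return 0.0
    n = sum(
        1
        for hx, hy in hmlr_points
        if any(math.hypot(hx - ox, hy - oy) <= tol for ox, oy in os_points)
    )
    return 100.0 * n / len(os_points)


def seg_length(seg):
    (x1, y1), (x2, y2) = seg
    return math.hypot(x2 - x1, y2 - y1)


def in_expanded_box(pt, seg, tol):
    """True if `pt` lies in the axis-aligned box of `seg` grown by `tol`."""
    (x1, y1), (x2, y2) = seg
    return (min(x1, x2) - tol <= pt[0] <= max(x1, x2) + tol
            and min(y1, y2) - tol <= pt[1] <= max(y1, y2) + tol)


def line_coincidence_pct(hmlr_segs, os_segs, tol):
    """Length of OS segments inside HMLR boxes as a % of all OS line length."""
    total = sum(seg_length(s) for s in os_segs)
    if total == 0:
        return 0.0
    matched = sum(
        seg_length(s)
        for s in os_segs
        if any(in_expanded_box(s[0], h, tol) and in_expanded_box(s[1], h, tol)
               for h in hmlr_segs)
    )
    return 100.0 * matched / total


def ring_segments(points):
    """Join consecutive points (and last to first) into line segments."""
    return [(points[i], points[(i + 1) % len(points)]) for i in range(len(points))]


# Invented toy data: an OS plot boundary with a building inside it,
# and an HMLR parcel re-traced around the boundary with small errors.
os_fence = [(0, 0), (10, 0), (10, 10), (0, 10)]
os_building = [(3, 3), (7, 3), (7, 7), (3, 7)]
hmlr_parcel = [(0.2, 0.1), (10.1, -0.2), (9.8, 10.1), (0.0, 9.9)]

os_points = os_fence + os_building
os_segs = ring_segments(os_fence) + ring_segments(os_building)
hmlr_segs = ring_segments(hmlr_parcel)

point_pct = point_coincidence_pct(hmlr_parcel, os_points, tol=0.5)
line_pct = line_coincidence_pct(hmlr_segs, os_segs, tol=0.5)
print(f"point-level coincidence: {point_pct:.1f}%")
print(f"line-level coincidence:  {line_pct:.1f}%")
```

The figures produced by this toy data carry no evidential weight. The sketch only shows that both measures take the whole of the OS data in the sample area as the denominator, so each additional OS feature with no HMLR counterpart lowers the result. A real exercise would need an agreed tolerance, an agreed sample, and a spatial index in place of the brute-force search.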
What is certain is that the OS PowerPoint presentation (attached) presented to OPSI as evidence of coincidence is weak, inconclusive and partial, apparently designed to prove the point that Ordnance Survey were asserting. It should not be considered to be an objective assessment of the coincidence between the two products.

Intellectual Property

This argument is not one about the coincidence of points, lines or areas, because it is unlikely that any reasonable lay person, or juror, would conclude that there is either a high level of coincidence or a significant risk of substitution of HMLR data for OS MasterMap in its entirety. The argument being pursued using the “evidence” supplied by OS concerns the legitimacy of their business model whereby, by the application of the IPR protection provided by the European Database Directive (Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases), Ordnance Survey may extract royalties up to the total value of MasterMap licences from any organization that has captured data which has in any way been informed by Ordnance Survey mapping. This approach was known jovially within Ordnance Survey as the “hint of IP argument”.

Probably unintentionally, the EU directive tore up centuries of copyright law, whereby copyright expired, and the legal tradition that “facts” may not be copyrighted, and passed to those maintaining databases a potential income stream and an ability to seriously impede competition, in perpetuity. This situation is greatly aggravated when the bulk of the database involved (the OS MasterMap database) would conventionally have been deemed out of copyright, an unprotected collection of facts, or Crown Copyright data that should be released, in line with the “normal” provision of European Public Sector Information regulations, at the marginal cost of duplication.
Instead our National Mapping Agency is obliged to defend its business model by making spurious assertions about “coincidence” or “substitution”.

There is also an inherent implication in Ordnance Survey’s stance that not all points, lines and areas in its MasterMap database are of equal value and equally worth fighting over. In practice the cadastral subset of property data (land ownership parcels and building footprints), now made available by many countries as Open Data, is deemed to be of particular value; hence the extreme sensitivity over the release of the HMLR Polygons, which could be useful to competitors wishing to produce more stripped-down mapping products. Ordnance Survey was previously equally sensitive over its other “crown jewels”: administrative and statistical boundaries, and street data. Both those battles have been lost, as logic argued that boundaries needed to become open data, and duplication by commercial competitors and the OpenStreetMap community has destroyed Ordnance Survey’s street mapping monopoly.

Conclusion

I believe Ordnance Survey’s claim to have demonstrated a significant degree of coincidence between HMLR’s Inspire polygon dataset and their own MasterMap data set to be largely spurious and unsupported by any objective exercise to measure that coincidence. If the OS methodology had been documented and it could be demonstrated that there is independent support for the definition of “coincidence” that it measures, the “evidence” could be taken more seriously. As it stands, the exercise appears simply to have been designed by Ordnance Survey to demonstrate that some HMLR Inspire polygons in areas of high registration and simple parcel and building form do, unsurprisingly, coincide with some features on OS MasterMap. What their exercise does not demonstrate is what proportion of the OS MasterMap data set is replicated, even approximately, in HMLR data or what the risk of substitution is.
It should also be borne in mind that Ordnance Survey introduced the “substitution” risk concept into the conversation only when they might have felt that “coincidence” alone was not sufficiently high to justify the level of royalty they wished to levy against those wishing to use the HMLR data. An objective and independent exercise to measure “coincidence”, properly defined and agreed, could be carried out easily and cheaply.

Professor Robert Barr OBE
Lymm, Cheshire
September 2014

Note

This article is an expression of my own opinion and does not necessarily represent the views of any organization with which I am associated.

Appendix 1 (images not reproduced): Manchester University; Part of Lymm, Cheshire; Part of New Mills, Derbyshire