HMLR Inspire Polygon Index Map Comments on

HMLR Inspire Polygon Index Map
Comments on the Ordnance Survey PowerPoint presentation to OPSI presented to demonstrate
high levels of coincidence with OS MasterMap
Introduction
In the course of investigating a complaint by 77M, OPSI was presented with a set of 21 MS
PowerPoint slides (attached) which I understand were shown and explained in Ordnance Survey’s
evidence to demonstrate that there is a high level of coincidence between HMLR Inspire polygons
and features on OS MasterMap.
I have been asked to comment on Ordnance Survey’s methodology and the apparent conclusion that
the level of coincidence between the HMLR data and OS’s product is high. I also understand that OS
introduced the additional assertion that because the level of coincidence between the HMLR data
and their own MasterMap product is high, this exposes Ordnance Survey to the commercial risk of
potential customers choosing to use the HMLR data as a substitute for the Ordnance Survey data
from which they were derived.
As I was not present at OS’s PowerPoint presentation and have seen no documentation explaining
how their sample areas were chosen, all comments in this note relate only to what can be concluded
from the PowerPoint presentation and background knowledge.
HMLR use of Ordnance Survey data
There is no dispute over whether HMLR data is informed by Ordnance Survey mapping. The Land
Transfer Act of 1897 introduced the requirement that the Ordnance Survey map would become the
basis of registered mapping (Andrew Edwards, 2001, Quinquennial Review of HM Land Registry).
What is in contention is whether, and how much of, Ordnance Survey’s Intellectual Property (IP) is
embedded in the HMLR Index map Polygons (in particular the Inspire edition) and what should be
the fair royalty due to Ordnance Survey for the use of that IP by third parties.
Coincidence
The Ordnance Survey presentation implies that the “coincidence” with the HMLR map is
demonstrated because it is possible to duplicate a large proportion of the HMLR polygons by
following lines which appear in Ordnance Survey’s MasterMap product.
Indeed, it would be very surprising if this were not the case: HMLR have used OS paper maps,
the OS digital LandLine product and, most recently, OS MasterMap as the base on which to record
their mapping since 1897, because they are legally obliged to do so.
Had HMLR surveyors independently surveyed property boundaries on the ground (unnecessary
because of the availability of Ordnance Survey mapping), or had independent analysts inferred
parcel boundaries from aerial photography, there would still be a high level of
“coincidence” between the three versions, because the same real-world features are being recorded
and, culturally, most property boundaries are marked by hedges, fences or walls, or are implicit in
physical aspects of building outlines.
So coincidence between anyone recording real-world features independently and the recording of
those features on Ordnance Survey maps is entirely unsurprising. The argument about coincidence
is, in fact, an argument about intellectual property – discussed more fully below, and not about the
recording of the facts that are independently observable about the real world.
If an independent expert were asked how much “coincidence” there is between the
HMLR Inspire polygons and Ordnance Survey MasterMap, they would be likely to develop a quite
different methodology and set of metrics from those offered by Ordnance Survey.
Ordnance Survey appear to be attempting to calculate what proportion of the HMLR line work exists
on OS MasterMap. In order to do so they appear to have chosen sample areas specifically selected
to maximise that ratio, rather than using any objective sampling methodology.
If one were cynical, one might believe that the exercise was intentionally carried out in a way that makes
the case that, because most HMLR line-work coincides with lines that appear on OS MasterMap, it is
in effect largely a simple sub-set of OS MasterMap and, in those circumstances, OS has the right to
dictate the terms on which that sub-set is passed to third parties.
An alternative and, I would suggest, more objective view is that a different question should be asked:
not what proportion of HMLR line features “coincide” with Ordnance Survey lines, but what
proportion of the lines that appear on OS MasterMap also appear in the HMLR product. Because of
the level of detail and the multiplicity of features on MasterMap, that proportion is very small
indeed.
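The difference between the two questions can be made concrete with a minimal sketch. The function name, the feature identifiers and the counts below are invented purely for illustration; they are not measurements of either product.

```python
def coincidence_ratios(hmlr_lines, os_lines):
    """Return (percent of HMLR lines found in OS, percent of OS lines found in HMLR).

    Both arguments are collections of hashable line identifiers. How two
    lines are judged to be 'the same' is assumed to be decided elsewhere.
    """
    hmlr, os_set = set(hmlr_lines), set(os_lines)
    if not hmlr or not os_set:
        raise ValueError("both line collections must be non-empty")
    shared = hmlr & os_set
    return 100.0 * len(shared) / len(hmlr), 100.0 * len(shared) / len(os_set)


# Hypothetical sample area: 200 OS lines, 10 HMLR lines, 9 of them shared.
os_lines = range(200)
hmlr_lines = list(range(9)) + ["hmlr-only"]
hmlr_in_os, os_in_hmlr = coincidence_ratios(hmlr_lines, os_lines)
print(hmlr_in_os, os_in_hmlr)
```

With these made-up counts the first ratio is high and the second is low, although both describe exactly the same overlap. Which denominator is chosen determines the apparent level of "coincidence".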
If one were to take that far more comprehensive view of the coincidence question, then
methodologies quite different from that adopted and demonstrated by OS might be
adopted.
Coincidence at the atomic level
OS MasterMap, as with any geographic data set, is structured hierarchically. At the atomic level the
data comprises a large collection of coordinate pairs (Easting and Northing, or x,y). These are joined
in pairs as line segments, line segments are chained into lines, lines are connected to form polygons.
The OS demonstration is based on the comparison of polygons. OS have constructed BLPU (Basic
Land and Property Unit) polygons to compare with HMLR’s land parcels. The BLPU is a compound
feature usually comprising the boundary fence lines and, in the case of semi-detached or terraced
housing, lines representing party walls within the buildings. The BLPU is an addressable unit which is
defined in BS7666, the BSI Addressing standard. In the past BLPUs, when defined, were captured
from OS mapping by local authorities as part of their Local Land and Property Gazetteer (LLPG)
custodian role.
When local authorities captured BLPUs as a spatial and visual representation of addresses in their
LLPG they encountered similar derived data issues now faced by HMLR. Some authorities resorted to
capturing BLPUs from aerial photography to avoid that risk.
The OS PowerPoint presentation implies that the polygons it shows exist as discrete polygons in OS
MasterMap; in fact they do not. The BLPU has not, historically, been a MasterMap feature and
BLPUs cannot have been copied from MasterMap by HMLR.
By constructing BLPUs Ordnance Survey have, in effect, replicated HMLR’s normal work practice,
which is informed by HMLR’s independently gathered intelligence on where the property boundaries
actually lie. In those circumstances, and given that highly structured areas of simple property
boundaries were chosen, it is not at all surprising that Ordnance Survey demonstrated high levels of
“coincidence”.
Sampling
A more objective approach to measuring real coincidence would be to sample different
environments. An approach used by Ordnance Survey before involved selecting a stratified random
sample of kilometre grid squares in areas with different densities of properties and different
property types. The stratification was done by using the ONS Output Area Classification and ensuring
that each type of area in the classification was represented in the sample in proportion to the
number of such areas in the country. A range of similar sampling strategies could be devised and
each would be fairer than the choice of areas made by Ordnance Survey for their presentation.
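A proportionally stratified random sample of this kind might be sketched as follows. This is an illustration, not Ordnance Survey's actual procedure: the function name, the input format (pairs of grid-square identifier and classification label) and the rounding rule are all assumptions.

```python
import random
from collections import defaultdict


def stratified_sample(squares, sample_size, seed=0):
    """Pick grid squares so each class is represented in proportion to its share.

    squares: iterable of (square_id, classification) pairs (assumed format).
    Returns a dict mapping each classification to its sampled square ids.
    """
    rng = random.Random(seed)  # fixed seed so the selection can be reproduced
    by_class = defaultdict(list)
    for square_id, classification in squares:
        by_class[classification].append(square_id)
    total = sum(len(ids) for ids in by_class.values())
    sample = {}
    for classification, ids in sorted(by_class.items()):
        # Proportional allocation, with at least one square per class.
        k = max(1, round(sample_size * len(ids) / total))
        sample[classification] = rng.sample(ids, min(k, len(ids)))
    return sample
```

Because each stratum's allocation is rounded separately, the total number of squares drawn may differ slightly from `sample_size`. Publishing the seed and the classification used would allow a third party to reproduce the selection.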
Points
If the HMLR line-work had been directly copied from OS data, each coordinate pair (x,y) in HMLR’s
data would also appear in the OS data. This could be easily checked for the sample areas. It could
then be argued that the true coincidence as a percentage, at point level, between the HMLR data
and the OS data was:
$$\frac{n}{m} \times 100$$
Where n is the number of points that make up the HMLR polygons and m the number of points in
the equivalent area of MasterMap.
In practice it is extremely unlikely that any HMLR points actually coincide with MasterMap
coordinates perfectly because they were not copied from the underlying data, but re-traced with
error.
In this case “coincidence” should be treated as points in the HMLR data sufficiently close to an OS
point, bearing in mind digitising error or intentional displacement, to be treated as “most probably”
derived from and informed by the location of an OS point. Such a margin of error would inflate n
but, given what a tiny subset of all features and their constituent points is re-used by HMLR, the
level of coincidence would remain small.
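The point-level measure with a margin of error could be computed along the following lines. This is a minimal sketch: the function name and the default tolerance are assumptions, and the brute-force search would need a spatial index for real sample areas.

```python
import math


def point_coincidence(hmlr_points, os_points, tolerance=0.5):
    """Percentage (n / m) * 100 at point level.

    n counts HMLR points lying within `tolerance` (in map units, an assumed
    value) of at least one OS point; m is the number of OS points in the
    equivalent area. A tolerance of 0 demands exact coordinate matches.
    """
    if not os_points:
        raise ValueError("os_points must not be empty")
    n = sum(
        1
        for hx, hy in hmlr_points
        if any(math.hypot(hx - ox, hy - oy) <= tolerance for ox, oy in os_points)
    )
    m = len(os_points)
    return 100.0 * n / m
```

Running the same calculation with a tolerance of zero and then with a realistic digitising tolerance would show how much of the apparent coincidence depends on the margin of error allowed.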
Lines
It may be argued that, though atomic, points are too abstract a geometric feature to be analysed.
However, precisely the same logic could be used to compare line features in HMLR’s product with
those in MasterMap. A similar calculation could be made by taking line segments (the straight lines
joining pairs of points) in HMLR’s product and placing a “bounding box” around each of them,
based on the same average error used for points. One would then measure the length of the line
segments from the equivalent area of OS MasterMap that fall within the HMLR bounding boxes, as a
proportion of the total length of all the line segments in the sample area. While relatively easy to
compute algorithmically, this exercise would again give a result which was small.
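A rough sketch of the line-level calculation follows. It assumes that the denominator is the total length of OS line segments in the sample area, uses axis-aligned boxes expanded by the tolerance, and approximates "length inside a box" by testing the midpoints of short pieces of each OS segment. All names and default values are hypothetical.

```python
import math


def expanded_box(segment, tolerance):
    """Axis-aligned bounding box of a segment, grown by `tolerance` on all sides."""
    (x1, y1), (x2, y2) = segment
    return (min(x1, x2) - tolerance, min(y1, y2) - tolerance,
            max(x1, x2) + tolerance, max(y1, y2) + tolerance)


def line_coincidence(hmlr_segments, os_segments, tolerance=0.5, step=0.1):
    """Approximate percentage of OS line length lying inside any HMLR box."""
    boxes = [expanded_box(s, tolerance) for s in hmlr_segments]
    total = 0.0
    inside = 0.0
    for (x1, y1), (x2, y2) in os_segments:
        length = math.hypot(x2 - x1, y2 - y1)
        if length == 0:
            continue
        pieces = max(1, math.ceil(length / step))
        piece_length = length / pieces
        for i in range(pieces):
            # Test the midpoint of each short piece of the OS segment.
            t = (i + 0.5) / pieces
            px, py = x1 + t * (x2 - x1), y1 + t * (y2 - y1)
            if any(bx0 <= px <= bx1 and by0 <= py <= by1
                   for bx0, by0, bx1, by1 in boxes):
                inside += piece_length
        total += length
    if total == 0:
        raise ValueError("os_segments contain no measurable length")
    return 100.0 * inside / total
```

An axis-aligned box around a long diagonal segment covers far more ground than the segment itself, so this sketch will overstate coincidence for diagonal boundaries. A fixed-width buffer around each HMLR segment would be a stricter alternative.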
Areas
OS MasterMap is structured as a set of non-overlapping areas represented by polygons made up of
line segments, in turn made up of points. As pointed out above, the BLPU is made up from a set of
adjacent polygons, not usually an individual polygon, because Land Parcels have not, historically,
been a MasterMap feature type. However it would be possible to carry out an analysis where every
HMLR polygon was taken and its area compared to the sum of the areas of all OS MasterMap
polygons fully contained within it, or within the HMLR Polygon bounded by an error buffer.
HMLR polygons which could be made up entirely of OS polygons would be deemed coincident; those
whose areas were significantly different from the best fit of OS polygons overlapping them, non-coincident.
Again this is an exercise that could be carried out algorithmically, and it most closely corresponds to
what OS appear to have done by hand.
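The area test might be sketched as below. Several simplifications are assumed: an OS polygon counts as "contained" when every one of its vertices lies inside the HMLR polygon or within a tolerance of its boundary (adequate for simple parcel shapes, not for arbitrary concave ones), and the 5% area threshold is an arbitrary illustrative value. None of the names come from either organisation's tooling.

```python
import math


def polygon_area(poly):
    """Shoelace formula; poly is a list of (x, y) vertices, not repeated at the end."""
    twice = sum(x1 * y2 - x2 * y1
                for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))
    return abs(twice) / 2.0


def distance_to_segment(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def inside_or_near(p, poly, tolerance):
    """True if p is inside poly (ray casting) or within tolerance of its boundary."""
    x, y = p
    inside = False
    for a, b in zip(poly, poly[1:] + poly[:1]):
        if distance_to_segment(p, a, b) <= tolerance:
            return True
        (x1, y1), (x2, y2) = a, b
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def is_coincident(hmlr_poly, os_polys, tolerance=0.5, max_area_diff=0.05):
    """Compare an HMLR polygon's area with the OS polygons contained in it."""
    contained = [p for p in os_polys
                 if all(inside_or_near(v, hmlr_poly, tolerance) for v in p)]
    covered = sum(polygon_area(p) for p in contained)
    hmlr_area = polygon_area(hmlr_poly)
    return abs(hmlr_area - covered) / hmlr_area <= max_area_diff
```

Applied to every HMLR polygon in an objectively chosen sample, the proportion of polygons returning `True` would give an area-level coincidence figure comparable with the one OS appear to have produced manually.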
Other examples
Appendix 1 shows HMLR Inspire polygons superimposed on third-party mapping and aerial imagery.
These do not purport to be an objective sample; however, they are probably as objective as the
demonstration areas chosen by Ordnance Survey.
The three examples show the area around Manchester University, including a significant estate of
local authority and former local authority housing in the east, and two outer suburban areas: part of
Lymm in Cheshire and part of New Mills in Derbyshire. All three were chosen because the author has
either worked or lived in the locality. They were not chosen to show either high or low levels of
potential coincidence between HMLR data and OS data. What they do demonstrate when examined
closely is how varied, spatially inconsistent and incomplete the HMLR data is compared to the
detailed topographic mapping in OS’s examples chosen to demonstrate coincidence.
An objective exercise
The issue of coincidence between the HMLR Inspire Polygon index map, which is an indicative
representation of registered property ownership, and OS MasterMap data could only be resolved
objectively by defining and agreeing a methodology similar to those suggested in the “thought
experiment” above. What is certain is that the OS PowerPoint presentation (attached) presented to
OPSI as evidence of coincidence is weak, inconclusive and partial, apparently designed to prove the
point that Ordnance Survey were asserting. It should not be considered to be an objective
assessment of the coincidence between the two products.
Intellectual Property
This argument is not one about the coincidence of points, lines or areas, because it is unlikely that
any reasonable lay person, or juror, would conclude that there is either a high level of coincidence or
a significant risk of substitution of HMLR data for OS MasterMap in its entirety.
The argument being pursued using the “evidence” supplied by OS is one of the legitimacy of their
business model whereby, through the IPR protection provided by the European
Database Directive (Directive 96/9/EC of the European Parliament and of the Council of 11 March
1996 on the legal protection of databases), Ordnance Survey may extract royalties up to the total
value of MasterMap licences from any organization that has captured data which has in any way
been informed by Ordnance Survey mapping. This approach was known jovially within Ordnance
Survey as the “hint of IP argument”.
Probably unintentionally, the EU directive tore up centuries of copyright law, under which copyright
expired, and the legal tradition that “facts” may not be copyrighted. It passed to those maintaining
databases a potential income stream and an ability to seriously impede competition, in perpetuity.
This situation is greatly aggravated when the bulk of the database involved (The OS MasterMap
database) would conventionally have been deemed out of copyright, an unprotected collection of
facts, or Crown Copyright data that should be released in line with the “normal” provision of
European Public Sector Information regulations, at the marginal cost of duplication.
Instead our National Mapping Agency is obliged to defend its business model by making spurious
assertions about “coincidence” or “substitution”.
There is also an inherent implication in Ordnance Survey’s stance that not all points, lines and areas
in its MasterMap database are of equal value and equally worth fighting over. In practice the
cadastral subset of property data (land ownership parcels and building footprints), now made
available by many countries as Open Data, is deemed to be of particular value; hence the extreme
sensitivity over the release of the HMLR Polygons, which could be useful to competitors wishing to
produce more stripped-down mapping products.
Ordnance Survey was previously equally sensitive over its other “crown jewels”: administrative and
statistical boundaries, and street data. Both those battles have been lost, as logic argued that
boundaries needed to become open data, and duplication by commercial competitors and the
OpenStreetMap community has destroyed Ordnance Survey’s street mapping monopoly.
Conclusion
I believe Ordnance Survey’s claim to have demonstrated a significant degree of coincidence between
HMLR’s Inspire polygon dataset and their own MasterMap data set to be largely spurious and
unsupported by any objective exercise to measure that coincidence. If the OS methodology had/has
been documented and it could be demonstrated that there is independent support for the definition
of “coincidence” that it measures, the “evidence” could be taken more seriously.
As it stands the exercise appears simply to have been designed by Ordnance Survey to demonstrate
that some HMLR Inspire polygons in areas of high registration and simple parcel and building form
do, unsurprisingly, coincide with some features on OS MasterMap.
What their exercise does not demonstrate is what proportion of the OS MasterMap data set is
replicated, even approximately, in HMLR data or what the risk of substitution is. It should also be
borne in mind that Ordnance Survey introduced the “substitution” risk concept into the conversation
only when they might have felt that “coincidence” alone was not sufficiently high to justify the level
of royalty they wished to levy against those wishing to use the HMLR data.
An objective and independent exercise to measure "coincidence", properly defined and agreed, could
be carried out easily and cheaply.
Professor Robert Barr OBE
Lymm, Cheshire
September 2014
Note
This article is an expression of my own opinion and does not necessarily represent the views of any
organization with which I am associated.
Appendix 1 (images not reproduced): Manchester University; part of Lymm, Cheshire; part of New Mills, Derbyshire