Spatial Data Science across Languages

2024

Thank you for being here

Faculty of Science

Map and Data Centre

Program committee

Yomna and Edzer

Danka and the team

Practical info

  • Toilets
  • Breaks
  • Dinner

Hybrid setting

  • Zoom
  • Discord
  • Notes

Ask questions on Discord, where we can have a persistent discussion, rather than on Zoom.

Scope, Goals, and Outcomes

Bridge the language barriers

Bring the communities together

Discuss the differences and commonalities

Find ways to discuss, cooperate,
and synchronise the efforts

Primarily developer-oriented audience

In-person discussion as the main goal

Recommendations

Connections

Cross-pollination

Expectations

  • What do you expect from this workshop?

Program

  • Each session to be kickstarted by a short talk.

Looking back at 2023

Personal reflections

Different maturity of ecosystems

Omnipresence of Arrow

Freedom vs. security mindset

Working paper

Point vs block support

Geodetic coordinates

Data cubes

File formats, data connectivity, and in-memory representation

  • GeoParquet
  • GeoArrow
  • Zarr, GeoTIFF

Inter-package dependencies and shared upstream libraries

Recommendations

Open Standards

It is encouraging to see that involvement in the development of open geospatial standards is no longer restricted to members of the Open Geospatial Consortium (OGC), and takes place in issues of public OGC GitHub repositories (for instance for GeoZarr and GeoParquet), or even completely outside OGC communication channels (e.g. STAC, GDAL, and openEO).

It would be convenient to have routines to verify polygons form a coverage (i.e. are not overlapping), or to create non-overlapping polygons from overlapping ones, in a way that all spatial data science languages can profit from.
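
As a rough illustration of what such routines might do, here is a minimal Python sketch assuming the shapely package is available. The function names (overlapping_pairs, flatten_overlaps) are hypothetical and are not taken from the working paper or from any library. The check covers only the "not overlapping" part of a coverage (gaps are not detected), and the pairwise loop is quadratic, so it is meant for small inputs.

```python
from itertools import combinations

from shapely.geometry import Polygon
from shapely.ops import polygonize, unary_union


def overlapping_pairs(polygons, tol=0.0):
    """Return index pairs (i, j) whose shared area exceeds tol.

    Polygons that only touch along an edge or at a point share zero area
    and are therefore not reported.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(polygons), 2)
        if a.intersection(b).area > tol
    ]


def flatten_overlaps(polygons):
    """Cut overlapping polygons into non-overlapping faces.

    All boundaries are noded together and polygonized; faces that lie in
    none of the inputs (enclosed gaps, holes) are dropped.
    """
    noded = unary_union([p.boundary for p in polygons])
    lines = list(getattr(noded, "geoms", [noded]))
    faces = polygonize(lines)
    return [
        f for f in faces
        if any(p.contains(f.representative_point()) for p in polygons)
    ]


a = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
b = Polygon([(1, 1), (3, 1), (3, 3), (1, 3)])  # overlaps a
c = Polygon([(2, 0), (4, 0), (4, 2), (2, 2)])  # only shares an edge with a

pairs_ab = overlapping_pairs([a, b])
pairs_ac = overlapping_pairs([a, c])
faces_ab = flatten_overlaps([a, b])
```

Here a and b are reported as overlapping while a and c are not, and flattening a and b yields faces that no longer overlap each other. A routine shared across languages would more plausibly live in a common upstream library than in per-language code like this.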

Wider adoption of GeoParquet as a more efficient file format for whole-file read/write and wider adoption of GeoArrow as a common metadata standard and memory model may help address […] challenges.

Field domains

Splitting spatially extensive block support variables to point geometries should at all times be avoided.

[…] split and merge policies follow from a variable being spatially extensive or intensive, and hence one of them is obsolete.

A less ambiguous approach would be to have a single field domain called is_spatially_extensive with a boolean value.
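
To make the link between the flag and the policies concrete, the following Python sketch derives both split and merge behaviour from a single boolean. It is only an illustration under stated assumptions: the Region class and the split and merge functions are hypothetical, and the area-proportional split assumes the extensive quantity is spread uniformly over the region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    area: float   # size of the block support
    value: float  # attribute value attached to the region


def split(region, area_fractions, is_spatially_extensive):
    """Split a region into parts given as fractions of its area.

    Extensive values (e.g. population) are divided in proportion to area,
    assuming a uniform distribution; intensive values (e.g. population
    density) are copied unchanged to every part.
    """
    if abs(sum(area_fractions) - 1.0) > 1e-9:
        raise ValueError("area fractions must sum to 1")
    return [
        Region(
            area=region.area * f,
            value=region.value * f if is_spatially_extensive else region.value,
        )
        for f in area_fractions
    ]


def merge(regions, is_spatially_extensive):
    """Merge regions: sum extensive values, area-weight intensive ones."""
    total_area = sum(r.area for r in regions)
    if is_spatially_extensive:
        value = sum(r.value for r in regions)
    else:
        value = sum(r.value * r.area for r in regions) / total_area
    return Region(area=total_area, value=value)


population = Region(area=10.0, value=1000.0)  # extensive
density = Region(area=10.0, value=100.0)      # intensive

pop_parts = split(population, [0.25, 0.75], is_spatially_extensive=True)
den_parts = split(density, [0.25, 0.75], is_spatially_extensive=False)
pop_merged = merge(pop_parts, is_spatially_extensive=True)
den_merged = merge(den_parts, is_spatially_extensive=False)
```

Splitting and then merging returns the original value in both cases, which is the sense in which one boolean is enough to determine both policies.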

Geodetic coordinates, spherical geometry

#sdsl2024