Commons:Spacemedia
Spacemedia is a tool written by Don-vip that continuously harvests media libraries of various space agencies in order to find free media not yet uploaded to Wikimedia Commons.
The tool runs on Toolforge (Kubernetes/jdk17 cluster) and uses the OptimusPrimeBot account to automatically upload newly discovered media.
The discussion to authorize this tool takes place here.
As of 2020-04-22, the tool is importing its first images for the test run.
Links
- Health-check
- Tool page on Toolsadmin
- Project on Phabricator (go there for bug reports!)
- Diffusion repository on Phabricator (go there for source code!)
- Platform monitoring
Useful discussions / projects
- User:Fæ/Imagehash
- User_talk:Fæ/2017#Image_hashes_-_tests_on_PD_US_collections
- User_talk:Fæ/2019#ESA_duplicate_pictures
- Commons:Bots/Requests/OptimusPrimeBot
Monitored repositories and status
Agency / repository | Licence(s) | Remark | Monitor status | Upload status | Total | Free | Missing | Priority |
---|---|---|---|---|---|---|---|---|
US Air Force / Space Force | ||||||||
https://www.flickr.com/photos/airforcespacecommand | Cc-by-2.0, Cc-zero | Should be PD-USGov-Military-Air Force | Green | Red | 1319 | 65 | 65 | P1 |
https://www.flickr.com/people/129133022@N07/ | PD | Should be PD-USGov-Military-Air Force | Green | Red | 180 | 128 | 121 | P1 |
https://www.spaceforce.mil/Multimedia/Photos/ | PD | Should be PD-USGov-Military-Space Force | Red | Red | P1 | |||
DLR (Deutsches Zentrum für Luft- und Raumfahrt, German Aerospace Center) | ||||||||
https://www.flickr.com/photos/dlr_de | Cc-by-2.0, Cc-by-sa-2.0 | Green | Red | 7668 | 2954 | 2114 | P1 | |
https://www.dlr.de/EN/organisation-dlr/news/all-news.html | Cc-by-sa-3.0 | Many news images in cc-by-sa, need to check individually though | Red | Red | P2 | |||
ESA (European Space Agency) | ||||||||
http://www.esa.int/spaceinimages/Images | Cc-by-sa-3.0-IGO | Green | Red | 24686 | 3889 | 545 | P1 | |
https://www.flickr.com/photos/europeanspaceagency/ | Cc-by-2.0, Cc-by-sa-2.0, CC-PD-Mark | Green | Red | 8279 | 1207 | 642 | P2 | |
https://www.flickr.com/photos/esa_events | Cc-by-2.0, Cc-by-sa-2.0, CC-PD-Mark | Green | Red | 20827 | 1088 | 1060 | P3 | |
https://www.youtube.com/channel/UClB8L8TJEQfZ41Ii0gJRTSQ | Cc-by-sa-3.0-IGO | Red | Red | ~50 | ~50 | ~50 | P3 | |
EU (European Union) | ||||||||
https://emergency.copernicus.eu/mapping/list-of-activations-rapid - https://emergency.copernicus.eu/mapping/list-of-activations-risk-and-recovery | Attribution-Copernicus | required to inform about the source of information and/or data by using provided reference | Red | Red | ? | ? | 0 | P3 |
EUMETSAT (European Organisation for the Exploitation of Meteorological Satellites) | ||||||||
https://l-zone.info | CC-BY-SA 3.0 IGO | Unless otherwise noted, content available on EUMETSAT’s Learning Zone is available under CC BY-SA 3.0 IGO | Red | Red | ? | ? | 0 | P2 |
https://www.flickr.com/people/eumetsat/ | ? | Red | Red | ? | ? | 0 | P2 | |
ESO (European Southern Observatory) / IAU (International Astronomical Union) / Hubble / Webb | ||||||||
https://www.eso.org/public/images/ | ESO (CC-by-4.0) | Green | 13802 | P1 | ||||
https://www.iau.org/public/images/ | CC-by-4.0 | Green | P1 | |||||
https://www.spacetelescope.org/images/ | ESA-Hubble | Green | P1 | |||||
https://hubblesite.org/resource-gallery/images | PD-Hubble | Green | P1 | |||||
https://webbtelescope.org/ | PD-Webb | Green | P1 | |||||
https://esawebb.org/ | ESA-Webb | Red | P1 | |||||
ISRO (Indian Space Research Organisation) | ||||||||
http://pib.gov.in/PhotoCategories.aspx?MenuId=8 | GODL-India (through PIB) | https://gitlab.com/Gazoth/pib-upload/issues/1 | Red | P2 | ||||
KARI (Korea Aerospace Research Institute) | ||||||||
https://www.kari.re.kr/kor/kariimg/list.do?img_gbn=PHO | KOGL | Green | Red | 1053 | 1018 | 913 | P1 | |
Vera C. Rubin Observatory - LSST (Large Synoptic Survey Telescope) | ||||||||
https://gallery.lsst.org | Cc-by-4.0 | To do | Red | 1572 | P1 | |||
ALMA (Atacama Large Millimeter/submillimeter Array) | ||||||||
https://www.almaobservatory.org/en/copyright-notice/ | Cc-by-4.0 | To do | Red | P1 | ||||
NASA (National Aeronautics and Space Administration) | ||||||||
https://images.nasa.gov/ | PD-USGov-NASA | Green | Red | 172548 | 172548 | 167939 | P1 | |
https://photojournal.jpl.nasa.gov/ | PD-USGov-NASA | Use Template:NASA Photojournal | Red | P2 | ||||
https://earthobservatory.nasa.gov/images | PD-USGov-NASA | Red | P2 | |||||
https://eol.jsc.nasa.gov/Collections/ | PD-USGov-NASA | To coordinate with User_talk:Askeuhd/Archive_2#Usefulness_of_ISS_image_dumps | Red | P2 | ||||
https://www.nasa.gov/multimedia/imagegallery/iotd.html | PD-USGov-NASA | To do. https://kaijento.github.io/2017/05/06/web-scraping-nasa-image-of-the-day/ | Red | P2 | ||||
https://www.sti.nasa.gov/harvesting-data-from-ntrs/ | PD-USGov-NASA | To do. Run requests between 8PM-8AM U.S. ET. No more than one request every 3 seconds. | Red | 466225 | P2 | |||
https://svs.gsfc.nasa.gov/ | PD-USGov-NASA | Red | P2 | |||||
https://www.uahirise.org/ | PD-NASA-HiRISE | https://www.uahirise.org/media/usage.php | Red | ? | P5 | |||
https://visibleearth.nasa.gov/ | PD-USGov-NASA | Red | Red | |||||
https://modis.gsfc.nasa.gov/gallery/showall.php | PD-USGov-NASA | Red | Red | |||||
https://epic.gsfc.nasa.gov/galleries | PD-USGov-NASA | Red | Red | |||||
https://ails.arc.nasa.gov/ | PD-USGov-NASA | Red | Red | |||||
https://www.flickr.com/photos/atmospheric-infrared-sounder | Cc-by-2.0 | Should be PD-USGov-NASA | Green | Red | 112 | 112 | 98 | P5 |
https://www.flickr.com/photos/earthrightnow | Cc-by-2.0 | Should be PD-USGov-NASA | Green | Red | 798 | 613 | 608 | P5 |
https://www.flickr.com/photos/gsfc | Cc-by-2.0, Cc-zero | Should be PD-USGov-NASA | Green | Red | 6332 | 5486 | 2029 | P4 |
https://www.flickr.com/photos/morpheuslander | Cc-by-2.0 | Should be PD-USGov-NASA | Green | Red | 258 | 258 | 256 | P5 |
https://www.flickr.com/photos/nasa2explore | Cc-by-2.0, CC-PD-Mark | Should be PD-USGov-NASA | Green | Red | 53287 | 71 | 68 | P5 |
https://www.flickr.com/photos/nasa_appel | Cc-by-2.0 | Should be PD-USGov-NASA | Green | Red | 875 | 56 | 56 | P5 |
https://www.flickr.com/photos/nasa_goddard | Cc-by-2.0 | Should be PD-USGov-NASA | Green | Red | 7061 | 6983 | 6708 | P3 |
https://www.flickr.com/photos/nasa_ice | Cc-by-2.0 | Should be PD-USGov-NASA | Green | Red | 688 | 686 | 399 | P5 |
https://www.flickr.com/photos/nasablueshift | Cc-by-2.0, Cc-by-sa-2.0 | Should be PD-USGov-NASA | Green | Red | 1134 | 1123 | 1034 | P4 |
https://www.flickr.com/photos/nasacommons | Flickr-no known copyright restrictions | Should be PD-USGov-NASA | Green | Red | 3276 | 3276 | 1578 | P4 |
https://www.flickr.com/photos/nasaearthobservatory | Cc-by-2.0 | Should be PD-USGov-NASA | Green | Red | 316 | 315 | 137 | P5 |
https://www.flickr.com/photos/nasafo | CC-PD-Mark | Should be PD-USGov-NASA | Green | Red | 2620 | 130 | 130 | P5 |
https://www.flickr.com/photos/nasahubble | Cc-by-2.0, CC-PD-Mark | Should be PD-USGov-NASA | Green | Red | 2160 | 1496 | 639 | P4 |
https://www.flickr.com/photos/nasakennedy | Cc-by-2.0, Cc-by-sa-2.0, CC-PD-Mark | Should be PD-USGov-NASA | Green | Red | 13827 | 2949 | 1994 | P4 |
https://www.flickr.com/photos/nasarobonaut | Cc-by-2.0 | Should be PD-USGov-NASA | Green | Red | 164 | 164 | 163 | P5 |
https://www.flickr.com/photos/nasawebbtelescope | Cc-by-2.0 | Should be PD-USGov-NASA | Green | Red | 2268 | 2259 | 2144 | P4 |
https://www.flickr.com/photos/uahirise-mars | Cc-zero, CC-PD-Mark | Should be PD-USGov-NASA | Green | Red | 2677 | 1109 | 30 | P5 |
https://video.ibm.com/channel/nasa-tv-wallops | TBC | Should be PD-USGov-NASA | Green | Red | ? | ? | ? | P5 |
https://www.flickr.com/people/kevinmgill | Cc-by-2.0 | Red | Red | ? | ? | ? | P5 | |
NSO (National Solar Observatory) - USA | ||||||||
https://nso.edu/about/image-use-policy/ | Cc-By 4.0 | Red | Red | ? | ? | ? | P5 | |
University of Bern - Switzerland | ||||||||
https://www.cassis.unibe.ch/ | Cc-By-Sa 3.0 IGO | Red | Red | ? | ? | ? | P5 | |
INPE (Instituto Nacional de Pesquisas Espaciais) - Brazil | ||||||||
http://www.dpi.inpe.br/galeria/ | INPE, Cc-By-Sa 4.0 | Red | Red | ? | ? | ? | P5 | |
https://www.flickr.com/photos/observacao-da-terra/ | INPE, Cc-By-Sa 2.0 | Red | Red | ? | ? | ? | P5 | |
Planet | ||||||||
https://www.planet.com/gallery/ | Formerly Cc-by-4.0. No longer :( | Red | 61+ | 61+ | P4 | |||
SpaceX | ||||||||
https://www.flickr.com/photos/spacex | Formerly Cc-zero-SpaceX. No longer :( | Green | Red | 783 | 766 | 0 | P1 |
Technical description
The tool is developed using Maven, Spring Boot, Spring Data JPA, Spring Cache, Jsoup, Dozer, Hibernate Search, Flickr4Java and Apache Commons libraries.
The tool connects to the Commons database replica to search for existing media files (by their base-36 SHA-1) in these three tables:
- image: current version of images
- oldimage: old versions of images
- filearchive: deleted images (when another version with a different SHA-1 exists)
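The lookup key described above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the class name, method name and query constants are hypothetical. It relies on MediaWiki storing file hashes as base-36 SHA-1 strings left-padded with zeros to 31 characters, in the img_sha1, oi_sha1 and fa_sha1 columns of the three tables.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, for illustration only.
final class Base36Sha1 {

    // Illustrative queries against the replica; the tool's real queries may differ.
    static final String IMAGE_QUERY = "SELECT img_name FROM image WHERE img_sha1 = ?";
    static final String OLDIMAGE_QUERY = "SELECT oi_name FROM oldimage WHERE oi_sha1 = ?";
    static final String FILEARCHIVE_QUERY = "SELECT fa_name FROM filearchive WHERE fa_sha1 = ?";

    // Computes the SHA-1 of the content and encodes it in base 36,
    // left-padded with zeros to 31 characters as MediaWiki does.
    static String base36Sha1(byte[] content) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-1").digest(content);
        // Signum 1 makes BigInteger treat the digest as an unsigned number.
        String base36 = new BigInteger(1, digest).toString(36);
        StringBuilder padded = new StringBuilder();
        for (int i = base36.length(); i < 31; i++) {
            padded.append('0');
        }
        return padded.append(base36).toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] sample = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(base36Sha1(sample));
    }
}
```

The resulting string would be bound as the single parameter of each query, so a file counts as already known if any of the three tables returns a row.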
The result of the analysis is persisted in a dedicated user database.
Most of the configuration is in application.properties. Sensitive information is not stored in the git repository and must be provided at runtime on the command line.
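Values passed with -D on the java command line become JVM system properties, which Spring Boot also exposes as configuration properties. As a rough sketch of how missing secrets could be detected early, the following hypothetical check (not part of the tool) looks up the property names used in the launch commands on this page and reports any that are absent or blank.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical startup check, for illustration only.
final class RuntimeSecretsCheck {

    // Property names taken from the launch commands on this page.
    static final List<String> REQUIRED_KEYS = List.of(
            "commons.datasource.username",
            "commons.datasource.password",
            "flickr.api.key",
            "flickr.secret");

    // Returns the required keys for which the lookup yields no usable value.
    static List<String> missingKeys(Function<String, String> lookup) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = lookup.apply(key);
            if (value == null || value.isBlank()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // System properties are where -Dname=value arguments end up.
        List<String> missing = missingKeys(System::getProperty);
        if (!missing.isEmpty()) {
            System.err.println("Missing runtime properties: " + missing);
            System.exit(1);
        }
        System.out.println("All required runtime properties are set.");
    }
}
```

Taking the lookup as a function keeps the check testable without touching real system properties.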
Launch tool on local machine
java -Dcommons.datasource.username=$user -Dcommons.datasource.password=$password -Dflickr.api.key=$flickr_api_key -Dflickr.secret=$flickr_secret -jar spacemedia.jar --spring.profiles.active=dev
Launch tool on Toolforge
java -Xmx4G -Ddomain.datasource.url=jdbc:mariadb://tools.db.svc.eqiad.wmflabs:3306/${user}__spacemedia -Ddomain.datasource.username=$user -Ddomain.datasource.password=$password -Dcommons.datasource.username=$user -Dcommons.datasource.password=$password -Dflickr.api.key=$flickr_api_key -Dflickr.secret=$flickr_secret -jar spacemedia.jar --spring.profiles.active=toolforge
Resource usage
The production database takes around 300 MB.
The tool can be quite memory-hungry when checking whether ESA files are valid (to avoid uploading corrupted files to Wikimedia Commons), especially for large TIFF files.