Commons:Spacemedia


Spacemedia is a tool written by Don-vip that continuously harvests media libraries of various space agencies in order to find free media not yet uploaded to Wikimedia Commons.

The tool runs on Toolforge (Kubernetes/jdk17 cluster) and uses the OptimusPrimeBot account to automatically upload newly discovered media.

The discussion to authorize this tool takes place here.

As of 2020-04-22, the tool is importing its first images for the test run.

Links

Useful discussions / projects

Monitored repositories and status

Repositories providing free media under compatible licences
Agency / repository | Licence(s) | Remark | Monitor status | Upload status | Total | Free | Missing | Priority
US Air Force / Space Force
https://www.flickr.com/photos/airforcespacecommand Cc-by-2.0, Cc-zero Should be PD-USGov-Military-Air Force Green Red 1319 65 65 P1
https://www.flickr.com/people/129133022@N07/ PD Should be PD-USGov-Military-Air Force Green Red 180 128 121 P1
https://www.spaceforce.mil/Multimedia/Photos/ PD Should be PD-USGov-Military-Space Force Red Red P1
DLR (Deutsches Zentrum für Luft- und Raumfahrt, German Aerospace Center)
https://www.flickr.com/photos/dlr_de Cc-by-2.0, Cc-by-sa-2.0 Green Red 7668 2954 2114 P1
https://www.dlr.de/EN/organisation-dlr/news/all-news.html Cc-by-sa-3.0 Many news images in cc-by-sa, need to check individually though Red Red P2
ESA (European Space Agency)
http://www.esa.int/spaceinimages/Images Cc-by-sa-3.0-IGO Green Red 24686 3889 545 P1
https://www.flickr.com/photos/europeanspaceagency/ Cc-by-2.0, Cc-by-sa-2.0, CC-PD-Mark Green Red 8279 1207 642 P2
https://www.flickr.com/photos/esa_events Cc-by-2.0, Cc-by-sa-2.0, CC-PD-Mark Green Red 20827 1088 1060 P3
https://www.youtube.com/channel/UClB8L8TJEQfZ41Ii0gJRTSQ Cc-by-sa-3.0-IGO Red Red ~50 ~50 ~50 P3
EU (European Union)
https://emergency.copernicus.eu/mapping/list-of-activations-rapid - https://emergency.copernicus.eu/mapping/list-of-activations-risk-and-recovery Attribution-Copernicus required to inform about the source of information and/or data by using provided reference Red Red ? ? 0 P3
EUMETSAT (European Organisation for the Exploitation of Meteorological Satellites)
https://l-zone.info CC-BY-SA 3.0 IGO Unless otherwise noted, content available on EUMETSAT’s Learning Zone is available under CC BY-SA 3.0 IGO Red Red ? ? 0 P2
https://www.flickr.com/people/eumetsat/ ? Red Red ? ? 0 P2
ESO (European Southern Observatory) / IAU (International Astronomical Union) / Hubble / Webb
https://www.eso.org/public/images/ ESO (CC-by-4.0) Green 13802 P1
https://www.iau.org/public/images/ CC-by-4.0 Green P1
https://www.spacetelescope.org/images/ ESA-Hubble Green P1
https://hubblesite.org/resource-gallery/images PD-Hubble Green P1
https://webbtelescope.org/ PD-Webb Green P1
https://esawebb.org/ ESA-Webb Red P1
ISRO (Indian Space Research Organisation)
http://pib.gov.in/PhotoCategories.aspx?MenuId=8 GODL-India (through PIB) https://gitlab.com/Gazoth/pib-upload/issues/1 Red P2
KARI (Korea Aerospace Research Institute)
https://www.kari.re.kr/kor/kariimg/list.do?img_gbn=PHO KOGL Green Red 1053 1018 913 P1
Vera C. Rubin Observatory - LSST (Large Synoptic Survey Telescope)
https://gallery.lsst.org Cc-by-4.0 To do Red 1572 P1
ALMA (Atacama Large Millimeter/submillimeter Array)
https://www.almaobservatory.org/en/copyright-notice/ Cc-by-4.0 To do Red P1
NASA (National Aeronautics and Space Administration)
https://images.nasa.gov/ PD-USGov-NASA Green Red 172548 172548 167939 P1
https://photojournal.jpl.nasa.gov/ PD-USGov-NASA Use Template:NASA Photojournal Red P2
https://earthobservatory.nasa.gov/images PD-USGov-NASA Red P2
https://eol.jsc.nasa.gov/Collections/ PD-USGov-NASA To coordinate with User_talk:Askeuhd/Archive_2#Usefulness_of_ISS_image_dumps Red P2
https://www.nasa.gov/multimedia/imagegallery/iotd.html PD-USGov-NASA To do. https://kaijento.github.io/2017/05/06/web-scraping-nasa-image-of-the-day/ Red P2
https://www.sti.nasa.gov/harvesting-data-from-ntrs/ PD-USGov-NASA To do. Run requests between 8PM-8AM U.S. ET. No more than one request every 3 seconds. Red 466225 P2
https://svs.gsfc.nasa.gov/ PD-USGov-NASA Red P2
https://www.uahirise.org/ PD-NASA-HiRISE https://www.uahirise.org/media/usage.php Red ? P5
https://visibleearth.nasa.gov/ PD-USGov-NASA Red Red
https://modis.gsfc.nasa.gov/gallery/showall.php PD-USGov-NASA Red Red
https://epic.gsfc.nasa.gov/galleries PD-USGov-NASA Red Red
https://ails.arc.nasa.gov/ PD-USGov-NASA Red Red
https://www.flickr.com/photos/atmospheric-infrared-sounder Cc-by-2.0 Should be PD-USGov-NASA Green Red 112 112 98 P5
https://www.flickr.com/photos/earthrightnow Cc-by-2.0 Should be PD-USGov-NASA Green Red 798 613 608 P5
https://www.flickr.com/photos/gsfc Cc-by-2.0, Cc-zero Should be PD-USGov-NASA Green Red 6332 5486 2029 P4
https://www.flickr.com/photos/morpheuslander Cc-by-2.0 Should be PD-USGov-NASA Green Red 258 258 256 P5
https://www.flickr.com/photos/nasa2explore Cc-by-2.0, CC-PD-Mark Should be PD-USGov-NASA Green Red 53287 71 68 P5
https://www.flickr.com/photos/nasa_appel Cc-by-2.0 Should be PD-USGov-NASA Green Red 875 56 56 P5
https://www.flickr.com/photos/nasa_goddard Cc-by-2.0 Should be PD-USGov-NASA Green Red 7061 6983 6708 P3
https://www.flickr.com/photos/nasa_ice Cc-by-2.0 Should be PD-USGov-NASA Green Red 688 686 399 P5
https://www.flickr.com/photos/nasablueshift Cc-by-2.0, Cc-by-sa-2.0 Should be PD-USGov-NASA Green Red 1134 1123 1034 P4
https://www.flickr.com/photos/nasacommons Flickr-no known copyright restrictions Should be PD-USGov-NASA Green Red 3276 3276 1578 P4
https://www.flickr.com/photos/nasaearthobservatory Cc-by-2.0 Should be PD-USGov-NASA Green Red 316 315 137 P5
https://www.flickr.com/photos/nasafo CC-PD-Mark Should be PD-USGov-NASA Green Red 2620 130 130 P5
https://www.flickr.com/photos/nasahubble Cc-by-2.0, CC-PD-Mark Should be PD-USGov-NASA Green Red 2160 1496 639 P4
https://www.flickr.com/photos/nasakennedy Cc-by-2.0, Cc-by-sa-2.0, CC-PD-Mark Should be PD-USGov-NASA Green Red 13827 2949 1994 P4
https://www.flickr.com/photos/nasarobonaut Cc-by-2.0 Should be PD-USGov-NASA Green Red 164 164 163 P5
https://www.flickr.com/photos/nasawebbtelescope Cc-by-2.0 Should be PD-USGov-NASA Green Red 2268 2259 2144 P4
https://www.flickr.com/photos/uahirise-mars Cc-zero, CC-PD-Mark Should be PD-USGov-NASA Green Red 2677 1109 30 P5
https://video.ibm.com/channel/nasa-tv-wallops TBC Should be PD-USGov-NASA Green Red ? ? ? P5
https://www.flickr.com/people/kevinmgill Cc-by-2.0 Red Red ? ? ? P5
NSO (National Solar Observatory) - USA
https://nso.edu/about/image-use-policy/ Cc-By 4.0 Red Red ? ? ? P5
University of Bern - Switzerland
https://www.cassis.unibe.ch/ Cc-By-Sa 3.0 IGO Red Red ? ? ? P5
INPE (Instituto Nacional de Pesquisas Espaciais) - Brazil
http://www.dpi.inpe.br/galeria/ INPE, Cc-By-Sa 4.0 Red Red ? ? ? P5
https://www.flickr.com/photos/observacao-da-terra/ INPE, Cc-By-Sa 2.0 Red Red ? ? ? P5
Planet
https://www.planet.com/gallery/ Formerly Cc-by-4.0. No longer :( Red 61+ 61+ P4
SpaceX
https://www.flickr.com/photos/spacex Formerly Cc-zero-SpaceX. No longer :( Green Red 783 766 0 P1
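
The NTRS row in the table above carries two access constraints: requests should run between 8 PM and 8 AM U.S. Eastern Time, and no more often than once every 3 seconds. A minimal sketch of how a harvester could enforce both is given below. The class name `NtrsRequestGate` and its methods are hypothetical and are not taken from the Spacemedia source; treating 8 PM as inclusive and 8 AM as exclusive is also an assumption.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneId;

/** Hypothetical helper (not part of Spacemedia) that gates requests to NTRS. */
final class NtrsRequestGate {
    private static final ZoneId US_EASTERN = ZoneId.of("America/New_York");
    private static final LocalTime WINDOW_START = LocalTime.of(20, 0); // 8 PM ET
    private static final LocalTime WINDOW_END = LocalTime.of(8, 0);    // 8 AM ET
    private static final Duration MIN_INTERVAL = Duration.ofSeconds(3);

    private Instant lastRequest; // null until the first request is accepted

    /** True if the instant falls in the overnight window (assumed 20:00 inclusive to 08:00 exclusive, ET). */
    static boolean isWithinWindow(Instant when) {
        LocalTime t = when.atZone(US_EASTERN).toLocalTime();
        return !t.isBefore(WINDOW_START) || t.isBefore(WINDOW_END);
    }

    /** Returns true and records the request if it is allowed at 'now'; otherwise returns false. */
    synchronized boolean tryAcquire(Instant now) {
        if (!isWithinWindow(now)) {
            return false;
        }
        if (lastRequest != null
                && Duration.between(lastRequest, now).compareTo(MIN_INTERVAL) < 0) {
            return false;
        }
        lastRequest = now;
        return true;
    }
}
```

A caller would invoke `tryAcquire(Instant.now())` before each request and sleep or reschedule when it returns false.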

Technical description

The tool is developed using Maven, Spring Boot, Spring Data JPA, Spring Cache, Jsoup, Dozer, Hibernate Search, Flickr4Java and Apache Commons libraries.

The tool connects to the Commons database replica to search for existing media (by their base-36 SHA-1 hash) in these three tables:

  • image: current version of images
  • oldimage: old versions of images
  • filearchive: deleted images (when another version with a different SHA-1 exists)
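
As an illustration of the lookup described above, the following sketch computes a file's SHA-1 in the base-36 form used by MediaWiki and prepares one query across the three tables. It assumes the standard MediaWiki schema, where the hash is zero-padded to 31 characters and stored in `img_sha1`, `oi_sha1` and `fa_sha1`; the class and constant names are hypothetical, and the actual Spacemedia queries may differ.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper (not part of Spacemedia) for SHA-1 lookups on the Commons replica. */
final class CommonsSha1 {

    /** Parameterised lookup over the three tables; bind the same hash to all three placeholders. */
    static final String LOOKUP_SQL =
          "SELECT 'image' AS src, img_name AS name FROM image WHERE img_sha1 = ? "
        + "UNION ALL SELECT 'oldimage', oi_name FROM oldimage WHERE oi_sha1 = ? "
        + "UNION ALL SELECT 'filearchive', fa_name FROM filearchive WHERE fa_sha1 = ?";

    /** SHA-1 of the content in base 36, left-padded with zeros to 31 characters. */
    static String base36(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(content);
            // Interpret the 160-bit digest as an unsigned integer before converting.
            String s = new BigInteger(1, digest).toString(36);
            return "0".repeat(31 - s.length()) + s;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 should be available on every Java platform", e);
        }
    }
}
```

The padding matters because the comparison in SQL is a plain string match: a hash whose base-36 form is shorter than 31 characters would otherwise never match the stored value.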

The result of the analysis is persisted in a dedicated user database.

Most of the configuration is in application.properties. Sensitive information is not stored in the Git repository and must be provided at runtime on the command line.
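
One way to picture this split between committed configuration and runtime secrets is the sketch below, which prefers a JVM system property (as set with `-D` on the command line) over a value loaded from a properties file. Spring Boot applies a comparable precedence on its own; the `RuntimeSecrets` class is a hypothetical plain-Java stand-in, not code from the tool.

```java
import java.util.Properties;

/** Hypothetical illustration (not part of Spacemedia) of "-D overrides the properties file". */
final class RuntimeSecrets {

    /**
     * Returns the JVM system property for the key if one is set,
     * otherwise the value from the file; fails fast if neither is present.
     */
    static String resolve(Properties fileProps, String key) {
        String value = System.getProperty(key, fileProps.getProperty(key));
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing required property: " + key);
        }
        return value;
    }
}
```

Failing fast on a missing key is a design choice of this sketch: it surfaces a forgotten `-D` argument at startup instead of at the first database or Flickr call.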

Launch tool on local machine

java -Dcommons.datasource.username=$user -Dcommons.datasource.password=$password -Dflickr.api.key=$flickr_api_key -Dflickr.secret=$flickr_secret -jar spacemedia.jar --spring.profiles.active=dev

Launch tool on Toolforge

java -Xmx4G -Ddomain.datasource.url=jdbc:mariadb://tools.db.svc.eqiad.wmflabs:3306/${user}__spacemedia -Ddomain.datasource.username=$user -Ddomain.datasource.password=$password -Dcommons.datasource.username=$user -Dcommons.datasource.password=$password -Dflickr.api.key=$flickr_api_key -Dflickr.secret=$flickr_secret -jar spacemedia.jar --spring.profiles.active=toolforge

Resource usage

The production database takes around 300 MB.

The tool can be quite memory-hungry when checking whether ESA files are valid (to avoid uploading corrupted files to Wikimedia Commons), especially for large TIFF files.
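
A validity check of this kind can be sketched with the JDK's `ImageIO`, which has bundled a TIFF reader since Java 9. The example fully decodes the file, so memory use grows with the uncompressed raster size (roughly width × height × bytes per pixel), which is why large TIFF files are costly. `ImageSanityCheck` is a hypothetical name, and the tool's real check may work differently.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;
import javax.imageio.stream.MemoryCacheImageInputStream;

/** Hypothetical helper (not part of Spacemedia) that rejects files ImageIO cannot decode. */
final class ImageSanityCheck {

    /** True if ImageIO finds a reader for the bytes and decodes them fully without error. */
    static boolean isDecodable(byte[] data) {
        try {
            // ImageIO.read closes the stream itself once a reader has been used,
            // so it is deliberately not wrapped in try-with-resources here.
            MemoryCacheImageInputStream in =
                    new MemoryCacheImageInputStream(new ByteArrayInputStream(data));
            BufferedImage img = ImageIO.read(in); // null when no reader recognises the format
            return img != null && img.getWidth() > 0 && img.getHeight() > 0;
        } catch (IOException | RuntimeException e) {
            return false; // unreadable or corrupted content
        }
    }
}
```

Because the whole raster is held in memory during decoding, a 20000 × 20000 RGB image needs on the order of 1.2 GB, which is consistent with running the JVM with a generous `-Xmx` setting.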