LAADS Data Download Scripts
- Downloading Files Using LAADS DAAC App Keys
- Deprecation of FTP
- Downloading via HTTP
- Earthdata Profile
- Authorization
- App Keys
- Automation
- Code Samples
Downloading Files Using LAADS DAAC App Keys
ESDIS, our parent organization, requires that we track who downloads files. In addition, MODIS and VIIRS Science teams require that we protect some data from download by the general public but permit it for authorized users.
To meet these requirements, ESDIS has implemented Earthdata Profile (URS), a profile manager that helps us keep a record of anyone using our services. In addition to URS, LAADS manages authorizations that restrict access to certain resources by user and by type of access.
To access restricted resources (e.g. SENTINEL-3), a user must first obtain authorization from the owner of that resource and log in. Users who are already logged in to LAADS should not need to log out and back in to see the change.
Deprecation of FTP
File Transfer Protocol (FTP) has been around for decades and has been used successfully for transferring files via FTP clients and numerous scripting languages. Unfortunately, FTP is also considered a security risk by many cybersecurity experts.
NASA’s Goddard Space Flight Center will be blocking all requests to public facing FTP servers—including LAADS DAAC and LANCE NRT—as of 20 April 2018.
Downloading via HTTP
Hypertext Transfer Protocol (HTTP) is the protocol that drives most web site internet traffic today. A variant of the protocol, called “HTTPS”, “S” for “Secure”, has been chosen to replace FTP.
HTTPS encrypts all transactions between client and server. This makes it extremely difficult for third parties to intercept what is being transferred.
All HTTPS downloads will require either an active login session or an app key (i.e. token) passed in via an Authorization HTTP header. LAADS DAAC currently requires HTTPS downloading of all data.
Earthdata Profile
Each user must create an Earthdata URS profile in order to download files. Each URS profile is tracked in LAADS by email address, not URS usernames.
Authorization
After creating an Earthdata profile, you may request authorization to access a resource from the owner of that resource. The following table describes the process for each type of resource:
Resource | Instructions |
---|---|
MERIS or SENTINEL-3 | Any user can agree to the corresponding license agreements when attempting to download a MERIS or SENTINEL-3 file. Each user will be prompted to agree to the terms of the license at that time. |
Other | Have your PI contact us with the details. We need to know the following: If you are not directly associated with a MODIS, VIIRS, or GOES-R Science Team then you will need to contact us directly. Please tell us |
App Keys
Users wishing to download via a browser simply need to log in and click links within the LAADS Archive. Scripted downloads will need to use LAADS app keys in order to be properly authorized.
LAADS app keys are string tokens that identify who you are. App keys get passed in the Authorization header of each HTTP GET request. See code samples below.
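As a rough illustration of how an app key travels with a request, the following Python sketch builds a GET request that carries the key in the Authorization header. The function names `build_request` and `download` are hypothetical, and `PATH_TO_MY_FILE` and `MY_APP_KEY` are placeholders to be replaced with real values. This is a minimal sketch using only the Python 3 standard library, not an official LAADS client.

```python
import shutil
from urllib.request import Request, urlopen

# Host name taken from the URLs shown in this document.
LAADS_BASE = "https://ladsweb.modaps.eosdis.nasa.gov"


def build_request(path, app_key):
    """Return a GET request for `path` with the app key as a Bearer token."""
    url = LAADS_BASE + "/" + path.lstrip("/")
    return Request(url, headers={"Authorization": "Bearer " + app_key})


def download(path, app_key, out_file):
    """Fetch `path` and stream the response body into `out_file` (uses the network)."""
    with urlopen(build_request(path, app_key)) as response, open(out_file, "wb") as fh:
        shutil.copyfileobj(response, fh)


if __name__ == "__main__":
    # Placeholders only: nothing is downloaded here, the request is just inspected.
    req = build_request("PATH_TO_MY_FILE", "MY_APP_KEY")
    print(req.get_method(), req.full_url)
    print("Authorization:", req.get_header("Authorization"))
```

Keeping the request construction separate from the download makes it possible to check the header without contacting the server.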
Requesting an App Key
Any user that does not already have an app key for LAADS DAAC can perform the following steps:
- log in by going to Profile -> Earthdata Login
- select Profile -> App Keys from the top menu
- create your new app key by entering a description and clicking the "Create New App Key" button
Retrieving an App Key
Any forgotten or lost app keys can be retrieved via the following steps:
- log in to the LAADS DAAC web site using the Profile -> Earthdata Login menu
- select Profile -> App Keys from the menu
- highlight and copy any existing app key from the list
How to use an App Key
This example uses the curl command to make a request for the file on our web server at the URL https://ladsweb.modaps.eosdis.nasa.gov/PATH_TO_MY_FILE:
curl -v -H 'Authorization: Bearer MY_APP_KEY' 'https://ladsweb.modaps.eosdis.nasa.gov/PATH_TO_MY_FILE' > result
The example above passes your app key via the Authorization HTTP header using the Bearer scheme. When finished, the resulting download will be written to a file called “result” in whatever directory (folder) you run the command from.
The app key in the header is how LAADS identifies users. If the app key is invalid or missing, or if -h is used instead of -H, the curl command will not work. The -v parameter tells curl to be verbose, so extra information about the request and response will be printed. If this extra information is not needed, the -v parameter can be left off.
Curl is available for all current operating systems, including Linux, MacOS, and Microsoft Windows.
Note
- all characters in the command are important, including dashes, colons, and quotation marks
- copy your app key in place of the string MY_APP_KEY
- copy the path to the file you need in place of the string PATH_TO_MY_FILE
- most LAADS DAAC file paths look like archive/allData/COLLECTION/PRODUCT/YEAR/DAY_OF_YEAR/FILENAME
An example of an existing file is below.
https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/6/MOD02QKM/2007/018/MOD02QKM.A2007018.0105.006.2014227230926.hdf
This path should return a MODIS Terra quarter kilometer (250 m) top of atmosphere reflectance product for year 2007, day-of-year 018 (i.e. January 18), from collection 6.
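The path layout described above can be assembled programmatically. The Python sketch below is one possible way to do it; the function name `laads_file_url` is hypothetical, and the layout is only the common pattern described here, so some products may not follow it.

```python
from datetime import date

# Archive root taken from the example URL in this document.
ARCHIVE_ROOT = "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData"


def laads_file_url(collection, product, day, filename):
    """Build ARCHIVE_ROOT/COLLECTION/PRODUCT/YEAR/DAY_OF_YEAR/FILENAME."""
    day_of_year = day.timetuple().tm_yday  # 1 for January 1
    return "{}/{}/{}/{:04d}/{:03d}/{}".format(
        ARCHIVE_ROOT, collection, product, day.year, day_of_year, filename)


if __name__ == "__main__":
    # January 18, 2007 is day-of-year 018, matching the example file above.
    print(laads_file_url(
        "6", "MOD02QKM", date(2007, 1, 18),
        "MOD02QKM.A2007018.0105.006.2014227230926.hdf"))
```

The day-of-year is zero-padded to three digits because the archive directories use names such as 018.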
Automation
If all you need is one file and you know which file it is, it is much easier to go to the LAADS archive and click to download as needed. If you need many files (e.g. all of last month's MOD09 data) you might prefer to rely on scripts. We have samples for Shell Script, Perl, and Python.
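When a whole month of data is needed, a script first has to work out which day-of-year directories cover that month. The Python sketch below lists them; the function name `month_directories` is hypothetical, and the collection "6" and product "MOD09" are illustrative values only. It assumes the usual archive/allData/COLLECTION/PRODUCT/YEAR/DAY_OF_YEAR layout.

```python
import calendar
from datetime import date

# Archive root taken from the URLs shown in this document.
ARCHIVE_ROOT = "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData"


def month_directories(collection, product, year, month):
    """Return one archive directory URL per calendar day of the given month."""
    days_in_month = calendar.monthrange(year, month)[1]
    urls = []
    for day in range(1, days_in_month + 1):
        day_of_year = date(year, month, day).timetuple().tm_yday
        urls.append("{}/{}/{}/{:04d}/{:03d}".format(
            ARCHIVE_ROOT, collection, product, year, day_of_year))
    return urls


if __name__ == "__main__":
    # Illustrative values: collection 6, product MOD09, February 2008.
    for url in month_directories("6", "MOD09", 2008, 2):
        print(url)
```

Each URL in the result could then be handed to one of the download scripts as its source directory.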
Code Samples
Most current programming languages support HTTPS communication or can call on applications that support HTTPS communication. See sample scripts below. We provide samples for wget, Linux shell script, Perl, and Python. When recursively downloading entire directories of files, wget will likely require the least amount of code to run.
To use these, click "Download source" to download or copy and paste the code into a file with an extension reflecting the programming language (.sh for Shell Script, .pl for Perl, .py for Python). Be sure the Unix execute permissions are set for the file. Lastly, open a terminal or shell and execute the file. Command-line examples are also included below.
wget
wget is an open source utility that can download entire directories of files with just one command. The only path that is required is the root directory. wget will automatically traverse the directory and download any files it locates.
wget is free and available for Linux, macOS, and Windows.
Installation
- Linux
  - Launch a command-line terminal
  - Type yum install wget -y (on distributions that use yum; other distributions provide wget through their own package manager)
- macOS
  - Install Homebrew (admin privileges required)
  - Launch Applications > Utilities > Terminal
  - Type brew install wget
- Windows
  - Download the latest 32-bit or 64-bit binary (.exe) for your system
  - Move it to C:\Windows\System32 (admin privileges will be required)
  - Click the Windows Menu > Run
  - Type cmd and hit Enter
  - In the Windows Command Prompt, type wget -h to verify the binary is being executed successfully
  - If any errors appear, the wget.exe binary may not be located in the correct directory, or you may need to switch between the 32-bit and 64-bit versions
Command-Line/Terminal Usage:
wget -e robots=off -m -np -R .html,.tmp -nH --cut-dirs=3 "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/PATH_TO_DATA_DIRECTORY" --header "Authorization: Bearer APP_KEY" -P TARGET_DIRECTORY_ON_YOUR_FILE_SYSTEM
Be sure to replace the following:
- PATH_TO_DATA_DIRECTORY: location of source directory in LAADS Archive
- APP_KEY: Your app key
- TARGET_DIRECTORY_ON_YOUR_FILE_SYSTEM: Where you would like to download the files. Examples include /Users/jdoe/data for macOS and Linux or C:\Users\jdoe\data for Windows
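If the wget command is launched from another program, passing the arguments as a list avoids shell quoting problems with the header value. The Python sketch below assembles the same options shown above; the function names `wget_command` and `mirror` are hypothetical, and it assumes wget is installed and on the PATH.

```python
import subprocess

# Archive root taken from the wget command shown in this document.
ARCHIVE_ROOT = "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData"


def wget_command(data_path, app_key, target_dir):
    """Return the wget argument list that mirrors one archive directory."""
    return [
        "wget", "-e", "robots=off", "-m", "-np",
        "-R", ".html,.tmp", "-nH", "--cut-dirs=3",
        ARCHIVE_ROOT + "/" + data_path.lstrip("/"),
        "--header", "Authorization: Bearer " + app_key,
        "-P", target_dir,
    ]


def mirror(data_path, app_key, target_dir):
    """Run wget; raises subprocess.CalledProcessError on a non-zero exit status."""
    subprocess.run(wget_command(data_path, app_key, target_dir), check=True)


if __name__ == "__main__":
    # Placeholders only: the command is printed, not executed.
    print(wget_command("PATH_TO_DATA_DIRECTORY", "APP_KEY",
                       "TARGET_DIRECTORY_ON_YOUR_FILE_SYSTEM"))
```

Calling `mirror` with real values performs the download; printing the list first is a cheap way to confirm the options before any files are transferred.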
Linux Shell Script
Download source (remove .txt extension when downloaded)
Command-Line/Terminal Usage:
% laads-data-download.sh
#!/bin/bash

function usage {
  echo "Usage:"
  echo " $0 [options]"
  echo ""
  echo "Description:"
  echo " This script will recursively download all files if they don't exist"
  echo " from a LAADS URL and stores them to the specified path"
  echo ""
  echo "Options:"
  echo " -s|--source [URL] Recursively download files at [URL]"
  echo " -d|--destination [path] Store directory structure to [path]"
  echo " -t|--token [token] Use app token [token] to authenticate"
  echo ""
  echo "Dependencies:"
  echo " Requires 'jq' which is available as a standalone executable from"
  echo " https://stedolan.github.io/jq/download/"
}

function recurse {
  local src=$1
  local dest=$2
  local token=$3

  echo "Querying ${src}.json"

  # entries with a size of 0 are treated as directories
  for dir in $(curl -s -H "Authorization: Bearer ${token}" "${src}.json" | jq '.[] | select(.size==0) | .name' | tr -d '"')
  do
    echo "Creating ${dest}/${dir}"
    mkdir -p "${dest}/${dir}"
    echo "Recursing ${src}/${dir}/ for ${dest}/${dir}"
    # pass the token along so that nested requests stay authorized
    recurse "${src}/${dir}/" "${dest}/${dir}" "${token}"
  done

  # entries with a non-zero size are files
  for file in $(curl -s -H "Authorization: Bearer ${token}" "${src}.json" | jq '.[] | select(.size!=0) | .name' | tr -d '"')
  do
    if [ ! -f "${dest}/${file}" ]
    then
      echo "Downloading $file to ${dest}"
      # replace '-s' with '-#' below for download progress bars
      curl -s -H "Authorization: Bearer ${token}" "${src}/${file}" -o "${dest}/${file}"
    else
      echo "Skipping $file ..."
    fi
  done
}

POSITIONAL=()
while [[ $# -gt 0 ]]
do
  key="$1"

  case $key in
    -s|--source)
      src="$2"
      shift # past argument
      shift # past value
      ;;
    -d|--destination)
      dest="$2"
      shift # past argument
      shift # past value
      ;;
    -t|--token)
      token="$2"
      shift # past argument
      shift # past value
      ;;
    *) # unknown option
      POSITIONAL+=("$1") # save it in an array for later
      shift # past argument
      ;;
  esac
done

if [ -z ${src+x} ]
then
  echo "Source is not specified"
  usage
  exit 1
fi

if [ -z ${dest+x} ]
then
  echo "Destination is not specified"
  usage
  exit 1
fi

if [ -z ${token+x} ]
then
  echo "Token is not specified"
  usage
  exit 1
fi

recurse "$src" "$dest" "$token"
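The script above separates directories from files by looking at the size field of each entry in the .json directory listing, where a size of 0 marks a directory. The Python sketch below shows the same split on a small sample. The function name `split_listing` is hypothetical, and the sample listing is invented for illustration; only the name and size fields are inferred from the scripts in this document.

```python
import json


def split_listing(listing_json):
    """Split a directory listing into (directories, files) by the size field."""
    entries = json.loads(listing_json)
    # A size of 0 is treated as a directory, anything else as a file.
    dirs = [entry["name"] for entry in entries if int(entry["size"]) == 0]
    files = [entry["name"] for entry in entries if int(entry["size"]) != 0]
    return dirs, files


# Hypothetical listing; real listings may carry additional fields.
SAMPLE_LISTING = """
[
  {"name": "018", "size": 0},
  {"name": "MOD02QKM.A2007018.0105.006.2014227230926.hdf", "size": 1234}
]
"""

if __name__ == "__main__":
    directories, files = split_listing(SAMPLE_LISTING)
    print("directories:", directories)
    print("files:", files)
```

Converting size with `int` lets the same function accept sizes delivered either as numbers or as strings.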
Perl
Download source (remove .txt extension when downloaded)
Command-Line/Terminal Usage:
% perl laads-data-download.pl
#!/usr/bin/env perl

use strict;
use warnings;
use Getopt::Long qw( :config posix_default bundling no_ignore_case no_auto_abbrev);
use LWP::UserAgent;
use LWP::Simple;
use JSON;

my $source = undef;
my $destination = undef;
my $token = undef;

GetOptions(
  's|source=s' => \$source,
  'd|destination=s' => \$destination,
  't|token=s' => \$token) or die usage();

sub usage {
  print "Usage:\n";
  print " $0 [options]\n\n";
  print "Description:\n";
  print " This script will recursively download all files if they don't exist\n";
  print " from a LAADS URL and stores them to the specified path\n\n";
  print "Options:\n";
  print " -s|--source [URL] Recursively download files at [URL]\n";
  print " -d|--destination [path] Store directory structure to [path]\n";
  print " -t|--token [token] Use app token [token] to authenticate\n";
}

sub recurse {
  my $src = $_[0];
  my $dest = $_[1];
  my $token = $_[2];

  my $ua = LWP::UserAgent->new;
  print "Recursing $dest\n";
  my $req = HTTP::Request->new(GET => $src.".json");
  $req->header('Authorization' => 'Bearer '.$token);
  my $resp = $ua->request($req);
  if ($resp->is_success) {
    my $message = $resp->decoded_content;
    my $listing = decode_json($message);

    # entries with a size of 0 are treated as directories
    for my $entry (@$listing){
      if($entry->{size} == 0){
        mkdir($dest."/".$entry->{name});
        recurse($src.'/'.$entry->{name}, $dest.'/'.$entry->{name}, $token);
      }
    }

    for my $entry (@$listing){
      # Set below to 1 for download progress, or consider LWP::UserAgent::ProgressBar
      $ua->show_progress(0);
      if($entry->{size} != 0 and ! -e $dest.'/'.$entry->{name}){
        print "Downloading $dest/$entry->{name}\n";
        my $req = HTTP::Request->new(GET => $src.'/'.$entry->{name});
        $req->header('Authorization' => 'Bearer '.$token);
        my $resp = $ua->request($req, $dest.'/'.$entry->{name});
      }
      else {
        print "Skipping $entry->{name} ...\n";
      }
    }
  }
  else {
    print "HTTP GET error code: ", $resp->code, "\n";
    print "HTTP GET error message: ", $resp->message, "\n";
  }
}

if(!defined($source)){
  print "Source not set\n";
  usage();
  die;
}

if(!defined($destination)){
  print "Destination not set\n";
  usage();
  die;
}

if(!defined($token)){
  print "Token not set\n";
  usage();
  die
}

recurse($source, $destination, $token);
Interactive Perl simulation of gftp using http:
% perl gftp
#!/usr/bin/perl

# This script simulates the interactive behavior of the ftp tool that is
# available on linux machines, but uses HTTP instead of FTP to communicate
# with the server. Since it uses only core perl modules, it should run
# anywhere that perl is available.
#
# NOTE: this script does use the "curl" command for downloading
#       resources from the server. You must have curl installed on
#       your system.
#
#       curl is available from https://curl.haxx.se/

use strict;
use warnings;
use Cwd;
use File::Basename;
use File::Path qw(make_path);
use JSON::PP;
use Term::ReadLine;
use Term::ANSIColor qw(:constants :pushpop);

my $http_url = "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData";
my $pwd = '/';

# check earthdata token
my $TOKEN = undef;
if(0 != loadToken()){
  saveToken();
}

help();

my $TERM = Term::ReadLine->new('GFTP');
while(1){
  # print "$pwd>";
  # my $input = <>;
  $TERM->ornaments(0);
  my $input = $TERM->readline("$pwd> ");
  last unless defined $input;   # stop at end of input
  chomp $input;
  my @parms = split(/\s+/, $input);
  my $cmd = scalar @parms ? lc(shift @parms) : '';
  next unless $cmd;

  if("$cmd" eq "q" or "$cmd" eq "exit" or "$cmd" eq "bye"){
    last;
  }

  if("$cmd" eq "ls"){
    my $dir = $parms[0];
    $dir = '.' unless $dir;
    my $pattern = '';
    if($dir =~ /\*$/){
      my $dir2 = dirname($dir);
      $pattern = basename($dir);
      $pattern =~ s/\*//g;
      $dir = $dir2;
    }
    my $src = normalize_path($pwd, $dir);
    my ($files, $dirs) = check($src, [$pattern]);
    if (! defined $files && ! defined $dirs) {
      print "no such directory: $src\n";
      next;
    }
    else {
      foreach (@$dirs){
        print BLUE, "$_\n", RESET;
      }
      foreach (@$files){
        print "$_\n";
      }
    }
  } # ls
  elsif("$cmd" eq "cd"){
    $pwd = normalize_path($pwd, $parms[0]);
  } # cd
  elsif("$cmd" eq "lcd"){
    my $dir = $parms[0];
    if (! -r $dir ){
      print RED, "Error: local dir [$dir] not exists.\n", RESET;
    }
    else{
      chdir $dir;
    }
  } # lcd
  elsif("$cmd" eq "pwd"){
    print "[$pwd]\n";
  } #pwd
  elsif("$cmd" eq "lpwd"){
    print "Now in local dir: [", cwd(), "]\n";
  } #lpwd
  elsif("$cmd" eq "get"){
    print "[get] : no file specified." if scalar @parms < 1;
    foreach my $item (@parms){
      my $src = $pwd;
      if($item =~ m|^/|){
        my @parts = split(m|/+|, normalize_path($pwd, $item));
        $item = pop @parts;
        $src = join('/', @parts);
        $src = '/' unless $src;
      }
      http_get($src, [$item]);
    }
  } # get
  elsif("$cmd" eq "mget"){
    @parms = ('.*') unless @parms and scalar @parms > 0;
    http_get($pwd, [@parms], 1);
  } # mget
  elsif("$cmd" eq "token"){
    saveToken();
  } # token
  elsif("$cmd" eq "?"){
    help();
  } #?
  print "\n";
}

exit;

# remove .. and . directory pieces from the path so that it is in normalized form.
sub normalize_path
{
  my ($current_working_dir, $path) = @_;
  $path = '.' unless $path;
  my $nocheck = 0;

  #remove trailing '/'
  $current_working_dir =~ s|/+$||;
  $path =~ s|/+$||;

  my $_pwd = $current_working_dir;
  if($path =~ m|^/|){
    $_pwd = $path;
  }
  elsif($path eq '.'){
    $nocheck=1;
  }
  else{
    $_pwd = join('/', $_pwd, $path);
  }

  # handle requests for parent directories
  if($_pwd =~ /\.\./){
    while ((my $pos = index($_pwd, '..')) >= 0) {
      my $start = rindex($_pwd, '/', $pos-2);
      $start = 0 unless $start >= 0;
      my $new_dir = substr($_pwd, 0, $start);
      $pos += 2;
      my $end_str = substr($_pwd, $pos);
      $new_dir = join('', $new_dir, $end_str) if $end_str;
      $_pwd = $new_dir;
    }
  }

  # handle requests for current directories
  $_pwd =~ s|^\./||;
  while ($_pwd =~ m|/\./|) {
    $_pwd =~ s|/\./|/|g;
  }
  $_pwd =~ s|/\.$||;

  $_pwd = '/' unless $_pwd;

  if($nocheck || defined check($_pwd)) {
    return $_pwd;
  }

  print RED, "Error: [$_pwd] not exists.\n", RESET;
  return $current_working_dir;
}

# get the contents of a directory
sub check
{
  my ($from, $patterns) = @_;
  die "no from" unless $from;
  $patterns = [''] unless $patterns && scalar @$patterns;

  my $json_str=`curl -s -H "Authorization: Bearer $TOKEN" "${http_url}/${from}.json"`;
  # NOTE: the exact pattern is an assumption; it only needs to detect an HTML response
  if ($json_str =~ /<html/i) {
    return undef; # got html, probably an error page
  }
  my $json = decode_json($json_str);
  my $files = [];
  my $dirs = [];
  foreach my $row (@{$json}) {
    # entries with a size of 0 are treated as directories
    if ($row->{size} == 0) {
      foreach my $regex (@$patterns) {
        chomp $regex;
        $regex = '.*' unless $regex;
        push @$dirs, $row->{name} if $row->{name} =~ /$regex/;
      }
    }
    else {
      foreach my $regex (@$patterns) {
        chomp $regex;
        $regex = '.*' unless $regex;
        push @$files, $row->{name} if $row->{name} =~ /$regex/;
      }
    }
  }
  return ($files, $dirs);
}

# get the specified file(s) from the specified directory
sub http_get
{
  my ($from, $patterns, $recursive) = @_;
  die "no from" unless $from;

  my ($files, $dirs) = check($from, $patterns);
  foreach my $file (@$files){
    print("fetching $from/$file\n");
    my $out_location = "$from";
    $out_location =~ s|^/||;
    if (-d $out_location) {
      $out_location = "$out_location/$file";
    }
    else {
      $out_location = $file;
    }
    my $cmd = join(' ',
      'curl',
      qq{-o "$out_location"},
      '-s',
      qq{-H "Authorization: Bearer $TOKEN"},
      qq{"$http_url/$from/$file"},
    );
    my $result = system($cmd);
    if ($result != 0) {
      print RED, "FAIL: $cmd\n", RESET;
    }
  }
  if ($recursive) {
    foreach my $dir (@$dirs){
      # this is recursive and can get a LOT of files, so ask user and make sure
      # it's what they want.
      print GREEN, " $dir is a directory. Download all matching files from it?[ynq]> ", RESET;
      my $input = $TERM->readline();
      chomp $input;
      if ($input =~ /^[yY]/) {
        my $path = "$from/$dir";
        $path =~ s|^/||;
        make_path($path) unless -d $path;
        http_get("/$path", $patterns, $recursive);
      }
      last if $input =~ /^[qQ]/;
    }
  }
}

# load the URS authentication token from special file if there is one
sub loadToken{
  my $home = glob('~/');
  my $tokenFile = "$home/.earthdatatoken";
  if(-r $tokenFile){
    open (IN, '<', $tokenFile)||die "Can't open $tokenFile: $!\n";
    $TOKEN = <IN>;
    chomp $TOKEN;
    close(IN);
    print "Token loaded: [$TOKEN].\n";
    return 0;
  }
  else{
    return 9;
  }
}

# prompt user for token, and save it in special file
sub saveToken{
  my $home = glob('~/');
  my $tokenFile = "$home/.earthdatatoken";
  print "Input token:\n";
  $TOKEN = <>;
  chomp($TOKEN);
  open (OUT, '>', $tokenFile)||die "Can't open $tokenFile: $!\n";
  print OUT "$TOKEN\n";
  close(OUT);
  print "Token saved.\n";
}

# print out command menu
sub help{
  print "Supported cmd: [ls] [cd] [lcd] [pwd] [lpwd] [get] [mget] [token] [q]\n";
  print "[ls]: list dirs and files in remote dir\n";
  print "[cd]: go to remote dir\n";
  print "[lcd]: go to local dir\n";
  print "[pwd]: print the current remote dir\n";
  print "[lpwd]: print the current local dir\n";
  print "[get]: download one or more specified files to current local dir\n";
  print "[mget]: download files that match a pattern to current local dir.\n";
  print " Don't use *; e.g.: mget h12v04; mget hdf\n";
  print " Will also recursively download files from matching subdirectories.\n";
  print "[token]: change token\n";
  print "[q]: quit\n";
  print "[?]: show help message\n";
}

0;
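The gftp script above keeps the app key in a file named .earthdatatoken in the user's home directory, so it does not have to be typed on every run. A script written in another language could reuse the same file. The Python sketch below is one possible approach; the function names `load_token` and `save_token` are hypothetical, and creating the file with owner-only permissions is an added precaution that the Perl script does not take.

```python
import os
import tempfile


def default_token_file():
    """Path used by the gftp script: ~/.earthdatatoken."""
    return os.path.join(os.path.expanduser("~"), ".earthdatatoken")


def load_token(token_file):
    """Return the token on the first line of `token_file`, or None if unavailable."""
    try:
        with open(token_file, "r") as fh:
            token = fh.readline().strip()
    except OSError:
        return None
    return token or None


def save_token(token, token_file):
    """Write the token, requesting owner-only permissions (0o600) for a new file."""
    fd = os.open(token_file, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(token + "\n")


if __name__ == "__main__":
    # Demonstrate in a temporary directory so that a real token file is not touched.
    demo_file = os.path.join(tempfile.mkdtemp(), ".earthdatatoken")
    save_token("MY_APP_KEY", demo_file)
    print(load_token(demo_file))
```

In real use, `default_token_file()` would supply the path, and the token returned by `load_token` would go into the Authorization header of each request.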
Python
Download source (remove .txt extension when downloaded)
Command-Line/Terminal Usage:
% python laads-data-download.py
#!/usr/bin/env python

# script supports either python2 or python3
#
# Attempts to do HTTP Gets with urllib2(py2) urllib.request(py3) or subprocess
# if tlsv1.1+ isn't supported by the python ssl module
#
# Will download csv or json depending on which python module is available
#

from __future__ import (division, print_function, absolute_import, unicode_literals)

import argparse
import os
import os.path
import shutil
import sys

try:
    from StringIO import StringIO   # python2
except ImportError:
    from io import StringIO         # python3


################################################################################


USERAGENT = 'tis/download.py_1.0--' + sys.version.replace('\n','').replace('\r','')


def geturl(url, token=None, out=None):
    headers = { 'user-agent' : USERAGENT }
    if not token is None:
        headers['Authorization'] = 'Bearer ' + token
    try:
        import ssl
        CTX = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
        if sys.version_info.major == 2:
            import urllib2
            try:
                fh = urllib2.urlopen(urllib2.Request(url, headers=headers), context=CTX)
                if out is None:
                    return fh.read()
                else:
                    shutil.copyfileobj(fh, out)
            except urllib2.HTTPError as e:
                # code and msg are attributes of HTTPError, not methods
                print('HTTP GET error code: %d' % e.code, file=sys.stderr)
                print('HTTP GET error message: %s' % e.msg, file=sys.stderr)
            except urllib2.URLError as e:
                print('Failed to make request: %s' % e.reason, file=sys.stderr)
            return None

        else:
            from urllib.request import urlopen, Request
            from urllib.error import URLError, HTTPError
            try:
                fh = urlopen(Request(url, headers=headers), context=CTX)
                if out is None:
                    return fh.read().decode('utf-8')
                else:
                    shutil.copyfileobj(fh, out)
            except HTTPError as e:
                print('HTTP GET error code: %d' % e.code, file=sys.stderr)
                print('HTTP GET error message: %s' % e.msg, file=sys.stderr)
            except URLError as e:
                print('Failed to make request: %s' % e.reason, file=sys.stderr)
            return None

    except AttributeError:
        # OS X Python 2 and 3 don't support tlsv1.1+ therefore... curl
        import subprocess
        try:
            args = ['curl', '--fail', '-sS', '-L', '--get', url]
            for (k,v) in headers.items():
                args.extend(['-H', ': '.join([k, v])])
            if out is None:
                # python3's subprocess.check_output returns stdout as a byte string
                result = subprocess.check_output(args)
                return result.decode('utf-8') if isinstance(result, bytes) else result
            else:
                subprocess.call(args, stdout=out)
        except subprocess.CalledProcessError as e:
            print('curl GET error message: %s' % (e.message if hasattr(e, 'message') else e.output), file=sys.stderr)
        return None


################################################################################


DESC = "This script will recursively download all files if they don't exist from a LAADS URL and stores them to the specified path"


def sync(src, dest, tok):
    '''synchronize src url with dest directory'''
    try:
        import csv
        files = [ f for f in csv.DictReader(StringIO(geturl('%s.csv' % src, tok)), skipinitialspace=True) ]
    except ImportError:
        import json
        files = json.loads(geturl(src + '.json', tok))

    # use os.path since python 2/3 both support it while pathlib is 3.4+
    for f in files:
        # currently we use filesize of 0 to indicate directory
        filesize = int(f['size'])
        path = os.path.join(dest, f['name'])
        url = src + '/' + f['name']
        if filesize == 0:
            try:
                print('creating dir:', path)
                os.mkdir(path)
                sync(src + '/' + f['name'], path, tok)
            except IOError as e:
                print("mkdir `%s': %s" % (e.filename, e.strerror), file=sys.stderr)
                sys.exit(-1)
        else:
            try:
                if not os.path.exists(path):
                    print('downloading: ' , path)
                    with open(path, 'w+b') as fh:
                        geturl(url, tok, fh)
                else:
                    print('skipping: ', path)
            except IOError as e:
                print("open `%s': %s" % (e.filename, e.strerror), file=sys.stderr)
                sys.exit(-1)
    return 0


def _main(argv):
    parser = argparse.ArgumentParser(prog=argv[0], description=DESC)
    parser.add_argument('-s', '--source', dest='source', metavar='URL', help='Recursively download files at URL', required=True)
    parser.add_argument('-d', '--destination', dest='destination', metavar='DIR', help='Store directory structure in DIR', required=True)
    parser.add_argument('-t', '--token', dest='token', metavar='TOK', help='Use app token TOK to authenticate', required=True)
    args = parser.parse_args(argv[1:])
    if not os.path.exists(args.destination):
        os.makedirs(args.destination)
    return sync(args.source, args.destination, args.token)


if __name__ == '__main__':
    try:
        sys.exit(_main(sys.argv))
    except KeyboardInterrupt:
        sys.exit(-1)
Last updated: December 2, 2019