LAADS Data Download Scripts

For information on how to work with download tokens, see How to Access LAADS Data.

Automation

If all you need is one file and you know which file it is, it is much easier to go to the LAADS archive and click to download as needed. If you need many files (e.g. all of last month's MOD09 data) you might prefer to rely on scripts. We have samples for Shell Script, Perl, and Python.

Code Samples

Most current programming languages support HTTPS communication or can call on applications that support HTTPS communication. See the sample scripts below. We provide samples for wget, Linux shell script, Perl, and Python. When recursively downloading entire directories of files, wget will likely require the least amount of code to run.

To use these, click "Download source" to download or copy and paste the code into a file with an extension reflecting the programming language (.sh for Shell Script, .pl for Perl, .py for Python). Be sure the Unix execute permissions are set for the file. Lastly, open a terminal or shell and execute the file. Command-line examples are also included below.
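
In a shell, the execute permission is usually set with chmod +x. As a minimal sketch of the same step in Python, assuming the script was saved as laads-data-download.py (the helper name make_executable is hypothetical and not part of the LAADS samples):

```python
import os
import stat


def make_executable(path):
    """Add the user/group/other execute bits to a file, like `chmod +x`."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    # Return the new mode so the caller can confirm the change.
    return os.stat(path).st_mode


if __name__ == "__main__":
    # Assumed file name; substitute whichever sample you saved.
    script = "laads-data-download.py"
    if os.path.exists(script):
        make_executable(script)
        print("execute permission set on", script)
    else:
        print(script, "not found in the current directory")
```

This applies to Linux and macOS; on Windows the execute bits are not used, and the script is run through its interpreter instead.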

wget

wget is an open source utility that can download entire directories of files with just one command. The only path that is required is the root directory. wget will automatically traverse the directory and download any files it locates.

wget is free and available for Linux, macOS, and Windows.

Installation

  1. Linux
    1. Launch a command-line terminal
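    2. Type yum install wget -y (on Debian or Ubuntu, type apt-get install wget -y instead; either command may need to be run with sudo)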
  2. macOS
    1. Install Homebrew (admin privileges required)
    2. Launch Applications > Utilities > Terminal
    3. Type brew install wget
  3. Windows
    1. Download the latest 32-bit or 64-bit binary (.exe) for your system
    2. Move it to C:\Windows\System32 (admin privileges will be required)
    3. Click the Windows Menu > Run
    4. Type cmd and hit Enter
    5. In the Windows Command Prompt, type wget -h to verify the binary is being executed successfully
    6. If any errors appear, the wget.exe binary may not be located in the correct directory, or you may need to switch between the 32-bit and 64-bit binaries
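
Whichever platform you installed on, you can confirm that wget is reachable from your PATH before running the commands below. This is a small sketch using Python 3's standard library; the function name find_tool is hypothetical:

```python
import shutil


def find_tool(name="wget"):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_tool("wget")
    if location is None:
        print("wget was not found on PATH; revisit the installation steps")
    else:
        print("wget found at", location)
```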

Command-Line/Terminal Usage:

wget -e robots=off -m -np -R .html,.tmp -nH --cut-dirs=3 "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/PATH_TO_DATA_DIRECTORY" --header "Authorization: Bearer MY_TOKEN" -P TARGET_DIRECTORY_ON_YOUR_FILE_SYSTEM

Be sure to replace the following:

  • PATH_TO_DATA_DIRECTORY: location of source directory in LAADS Archive
  • MY_TOKEN: Your token
  • TARGET_DIRECTORY_ON_YOUR_FILE_SYSTEM: Where you would like to download the files. Examples include /Users/jdoe/data for macOS and Linux or C:\Users\jdoe\data for Windows
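
If you would rather assemble the wget command from a script than edit it by hand, the substitution can be sketched as follows. The function name build_wget_command is hypothetical; the flags are copied from the command above, and the three arguments correspond to the three placeholders.

```python
ARCHIVE_ROOT = "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData"


def build_wget_command(data_path, token, target_dir):
    """Return the wget command shown above as an argument list."""
    url = ARCHIVE_ROOT + "/" + data_path.strip("/")
    return [
        "wget", "-e", "robots=off", "-m", "-np",
        "-R", ".html,.tmp", "-nH", "--cut-dirs=3",
        url,
        "--header", "Authorization: Bearer " + token,
        "-P", target_dir,
    ]


if __name__ == "__main__":
    # The placeholder values are printed as-is; replace them with real ones.
    cmd = build_wget_command(
        "PATH_TO_DATA_DIRECTORY", "MY_TOKEN", "TARGET_DIRECTORY_ON_YOUR_FILE_SYSTEM"
    )
    print(" ".join(cmd))
```

Passing the list to subprocess.call would run the download. Keeping the command as a list rather than a single string avoids quoting problems when the target directory contains spaces.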

Linux Shell Script

Download source (remove .txt extension when downloaded)

Command-Line/Terminal Usage:

% laads-data-download.sh -s [URL] -d [path] -t [token]
#!/bin/bash

function usage {
  echo "Usage:"
  echo "  $0 [options]"
  echo ""
  echo "Description:"
  echo "  This script recursively downloads all files from a LAADS URL that"
  echo "  do not already exist locally and stores them in the specified path"
  echo ""
  echo "Options:"
  echo "    -s|--source [URL]         Recursively download files at [URL]"
  echo "    -d|--destination [path]   Store directory structure to [path]"
  echo "    -t|--token [token]        Use app token [token] to authenticate"
  echo ""
  echo "Dependencies:"
  echo "  Requires 'jq' which is available as a standalone executable from"
  echo "  https://stedolan.github.io/jq/download/"
}

function recurse {
  local src=$1
  local dest=$2
  local token=$3
  
  echo "Querying ${src}.json"

  for dir in $(curl -s -H "Authorization: Bearer ${token}" "${src}.json" | jq '.[] | select(.size==0) | .name' | tr -d '"')
  do
    echo "Creating ${dest}/${dir}"
    mkdir -p "${dest}/${dir}"
    echo "Recursing ${src}/${dir}/ for ${dest}/${dir}"
    # pass the token along, otherwise nested requests are sent without it
    recurse "${src}/${dir}/" "${dest}/${dir}" "${token}"
  done

  for file in $(curl -s -H "Authorization: Bearer ${token}" "${src}.json" | jq '.[] | select(.size!=0) | .name' | tr -d '"')
  do
    if [ ! -f "${dest}/${file}" ]
    then
      echo "Downloading $file to ${dest}"
      # replace '-s' with '-#' below for download progress bars
      curl -s -H "Authorization: Bearer ${token}" "${src}/${file}" -o "${dest}/${file}"
    else
      echo "Skipping $file ..."
    fi
  done
}

POSITIONAL=()
while [[ $# -gt 0 ]]
do
  key="$1"

  case $key in
    -s|--source)
    src="$2"
    shift # past argument
    shift # past value
    ;;
    -d|--destination)
    dest="$2"
    shift # past argument
    shift # past value
    ;;
    -t|--token)
    token="$2"
    shift # past argument
    shift # past value
    ;;
    *)    # unknown option
    POSITIONAL+=("$1") # save it in an array for later
    shift # past argument
    ;;
  esac
done

if [ -z ${src+x} ]
then 
  echo "Source is not specified"
  usage
  exit 1
fi

if [ -z ${dest+x} ]
then 
  echo "Destination is not specified"
  usage
  exit 1
fi

if [ -z ${token+x} ]
then 
  echo "Token is not specified"
  usage
  exit 1
fi

recurse "$src" "$dest" "$token"
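
The core decision in the script above is made per listing entry: a size of 0 marks a directory to recurse into, and any other entry is a file that is downloaded only if it is missing locally. That logic can be sketched on its own, without any network access. The function name plan_directory and the sample entries are made up for illustration; the only assumption taken from the script is that each listing entry has a name and a size field.

```python
import json
import os


def plan_directory(listing_json, dest):
    """Split a JSON listing into directories, files to fetch, and files to skip."""
    dirs, to_download, to_skip = [], [], []
    for entry in json.loads(listing_json):
        name = entry["name"]
        if int(entry["size"]) == 0:
            # The scripts treat a size of 0 as "this is a directory".
            dirs.append(name)
        elif os.path.isfile(os.path.join(dest, name)):
            # Already present locally, so it would be skipped.
            to_skip.append(name)
        else:
            to_download.append(name)
    return dirs, to_download, to_skip


if __name__ == "__main__":
    # Hypothetical listing, for illustration only.
    sample = '[{"name": "2020", "size": 0}, {"name": "a.hdf", "size": 10}]'
    print(plan_directory(sample, "."))
```

One consequence of this convention is that a genuinely empty file on the server would be treated as a directory.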

Perl

Download source (remove .txt extension when downloaded)

Command-Line/Terminal Usage:

% perl laads-data-download.pl -s [URL] -d [path] -t [token]
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Long qw( :config posix_default bundling no_ignore_case no_auto_abbrev);
use LWP::UserAgent;
use LWP::Simple;
use JSON;

my $source      = undef;
my $destination = undef;
my $token       = undef;

GetOptions( 's|source=s' => \$source, 'd|destination=s' => \$destination, 't|token=s' => \$token) or die usage();

sub usage {
  print "Usage:\n";
  print "  $0 [options]\n\n";
  print "Description:\n";
  print "  This script recursively downloads all files from a LAADS URL that\n";
  print "  do not already exist locally and stores them in the specified path\n\n";
  print "Options:\n";
  print "  -s|--source [URL]         Recursively download files at [URL]\n";
  print "  -d|--destination [path]   Store directory structure to [path]\n";
  print "  -t|--token [token]        Use app token [token] to authenticate\n";
}

sub recurse {
  my $src   = $_[0];
  my $dest  = $_[1];
  my $token = $_[2];
  my $ua = LWP::UserAgent->new;
  print "Recursing $dest\n";
  my $req = HTTP::Request->new(GET => $src.".json");
  $req->header('Authorization' => 'Bearer '.$token);
  my $resp = $ua->request($req);
  if ($resp->is_success) {
    my $message = $resp->decoded_content;
    my $listing = decode_json($message);
    for my $entry (@$listing){
      if($entry->{size} == 0){
        mkdir($dest."/".$entry->{name});
        recurse($src.'/'.$entry->{name}, $dest.'/'.$entry->{name}, $token);
      }
    }

    for my $entry (@$listing){
      # Set below to 1 for download progress, or consider LWP::UserAgent::ProgressBar
      $ua->show_progress(0);
      if($entry->{size} != 0 and ! -e $dest.'/'.$entry->{name}){
        print "Downloading $dest/$entry->{name}\n";
        my $req = HTTP::Request->new(GET => $src.'/'.$entry->{name});
        $req->header('Authorization' => 'Bearer '.$token);
        my $resp = $ua->request($req, $dest.'/'.$entry->{name});
      } else {
        print "Skipping $entry->{name} ...\n";
      }
    }
  }
  else {
    print "HTTP GET error code: ", $resp->code, "\n";
    print "HTTP GET error message: ", $resp->message, "\n";
  }
}


if(!defined($source)){
  print "Source not set\n";
  usage();
  die;
}

if(!defined($destination)){
  print "Destination not set\n";
  usage();
  die;
}

if(!defined($token)){
  print "Token not set\n";
  usage();
  die;
}

recurse($source, $destination, $token);
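
Every request in the script above follows the same pattern: append .json to the directory URL to get its listing, and send the token in an Authorization: Bearer header. A sketch of that request construction in Python 3 is shown here; the function name build_listing_request is hypothetical, and nothing is sent over the network.

```python
from urllib.request import Request


def build_listing_request(src, token):
    """Build (but do not send) the GET request for a directory listing."""
    req = Request(src + ".json")
    req.add_header("Authorization", "Bearer " + token)
    return req


if __name__ == "__main__":
    req = build_listing_request(
        "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/PATH_TO_DATA_DIRECTORY",
        "MY_TOKEN",
    )
    print(req.get_method(), req.full_url)
    print("Authorization:", req.get_header("Authorization"))
```

Passing the request to urllib.request.urlopen would perform the download; file requests use the same header with the file URL instead of the .json listing URL.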

Interactive Perl script (gftp) that simulates the ftp tool over HTTP:

% perl gftp 
#!/usr/bin/perl
# This script simulates the interactive behavior of the ftp tool that is
# available on linux machines, but uses HTTP instead of FTP to communicate
# with the server. Since it uses only core perl modules, it should run
# anywhere that perl is available.
#
# NOTE: this script does use the "curl" command for downloading
#       resources from the server. You must have curl installed on
#       your system.
#
#       curl is available from https://curl.haxx.se/
use strict;
use warnings;

use Cwd;
use File::Basename;
use File::Path qw(make_path);
use JSON::PP;
use Term::ReadLine;
use Term::ANSIColor qw(:constants :pushpop);

my $http_url = "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData";

my $pwd = '/';

# check earthdata token
my $TOKEN = undef;
if(0 != loadToken()){
  saveToken();
}

help();

my $TERM = Term::ReadLine->new('GFTP');

while(1){
#  print "$pwd>";
#  my $input = <>;
  $TERM->ornaments(0);
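  my $input = $TERM->readline("$pwd> ");
  last unless defined $input;   # end of input (e.g. Ctrl-D)
  chomp $input;
  # split(' ', ...) also discards leading whitespace
  my @parms = split(' ', $input);
  # avoid "my ... if ...", whose behavior is undefined in Perl
  my $cmd = @parms ? lc(shift @parms) : '';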
  next unless $cmd;

  if("$cmd" eq "q" or "$cmd" eq "exit" or "$cmd" eq "bye"){
    last;
  }

  if("$cmd" eq "ls"){
    my $dir = $parms[0];
    $dir = '.' unless $dir;

    my $pattern = '';
    if($dir =~ /\*$/){
      my $dir2 = dirname($dir);
      $pattern = basename($dir);

      $pattern =~ s/\*//g;
      $dir = $dir2;
    }

    my $src = normalize_path($pwd, $dir);
    my ($files, $dirs) = check($src, [$pattern]);
    if (! defined $files && ! defined $dirs) {
      print "no such directory: $src\n";
      next;
    }
    else {
      foreach (@$dirs){
        print BLUE, "$_\n", RESET;
      }

      foreach (@$files){
        print "$_\n";
      }
    }
  } # ls

  elsif("$cmd" eq "cd"){
    $pwd = normalize_path($pwd, $parms[0]);
  } # cd

  elsif("$cmd" eq "lcd"){
    my $dir = defined $parms[0] ? $parms[0] : '';
    if (! -r $dir ){
      print RED, "Error: local dir [$dir] does not exist.\n", RESET;
    }
    else{
      chdir $dir;
    }
  } # lcd

  elsif("$cmd" eq "pwd"){
    print "[$pwd]\n";
  } #pwd

  elsif("$cmd" eq "lpwd"){
    print "Now in local dir: [", cwd(), "]\n";
  } #lpwd

  elsif("$cmd" eq "get"){
    print "[get] : no file specified." if scalar @parms < 1;
    foreach my $item (@parms){
      my $src = $pwd;

      if($item =~ m|^/|){
        my @parts = split(m|/+|, normalize_path($pwd, $item));
        $item = pop @parts;
        $src = join('/', @parts);
        $src = '/' unless $src;
      }
      http_get($src, [$item]);
    }
  } # get
  
  elsif("$cmd" eq "mget"){
    @parms = ('.*') unless @parms and scalar @parms > 0;
    http_get($pwd, [@parms], 1);
  } # mget

  elsif("$cmd" eq "token"){
    saveToken();
  } # token
  
  elsif("$cmd" eq "?"){
    help();
  } #?

  print "\n";
}

exit;

# remove .. and . directory pieces from the path so that it is in normalized form.
sub normalize_path
{
  my ($current_working_dir, $path) = @_;
  $path = '.' unless $path;
  my $nocheck = 0;
  
  #remove trailing '/'
  $current_working_dir =~ s|/+$||;
  $path =~ s|/+$||;
  
  my $_pwd = $current_working_dir;
  if($path =~ m|^/|){
    $_pwd = $path;
  }
  elsif($path eq '.'){
    $nocheck=1;
  }
  else{
    $_pwd = join('/', $_pwd, $path);
  }

  # handle requests for parent directories
  if($_pwd =~ /\.\./){
    while ((my $pos = index($_pwd, '..')) >= 0) {
      my $start = rindex($_pwd, '/', $pos-2);
      $start = 0 unless $start >= 0;
      my $new_dir = substr($_pwd, 0, $start);
      $pos += 2;
      my $end_str = substr($_pwd, $pos);
      $new_dir = join('', $new_dir, $end_str) if $end_str;
      $_pwd = $new_dir;
    }
  }
  # handle requests for current directories
  $_pwd =~ s|^\./||;
  while ($_pwd =~ m|/\./|) {
    $_pwd =~ s|/\./|/|g;
  }
  $_pwd =~ s|/\.$||;
  $_pwd = '/' unless $_pwd;

  if($nocheck || defined check($_pwd)) {
    return $_pwd;
  }
  
  print RED, "Error: [$_pwd] does not exist.\n", RESET;
  return $current_working_dir;
}

# get the contents of a directory
sub check
{
  my ($from, $patterns) = @_;
  die "no from" unless $from;
  $patterns = [''] unless $patterns && scalar @$patterns;
  
  my $json_str=`curl -s -H "Authorization: Bearer $TOKEN" "${http_url}/${from}.json"`;
  if ($json_str =~ /</)
  {
    return undef;  # got html, probably an error page
  }
  my $json = decode_json($json_str);
    
  my $files = [];
  my $dirs = [];
  foreach my $row (@{$json}) {
    if ($row->{size} == 0) {
      foreach my $regex (@$patterns) {
        chomp $regex;
        $regex = '.*' unless $regex;
        push @$dirs, $row->{name} if $row->{name} =~ /$regex/;
      }
    }
    else {
      foreach my $regex (@$patterns) {
        chomp $regex;
        $regex = '.*' unless $regex;
        push @$files, $row->{name} if $row->{name} =~ /$regex/;
      }
    }
  }

  return ($files, $dirs);
}

# get the specified file(s) from the specified directory
sub http_get
{
  my ($from, $patterns, $recursive) = @_;
  die "no from" unless $from;
  my ($files, $dirs) = check($from, $patterns);
  foreach my $file (@$files){
    print("fetching $from/$file\n");
    my $out_location = "$from";
    $out_location =~ s|^/||;
    if (-d $out_location) {
      $out_location = "$out_location/$file";
    }
    else {
      $out_location = $file;
    }
    my $cmd = join(' ',
      'curl',
      qq{-o "$out_location"},
      '-s',
      qq{-H "Authorization: Bearer $TOKEN"},
      qq{"$http_url/$from/$file"},
    );
    my $result = system($cmd);
    if ($result != 0) {
      print RED, "FAIL: $cmd\n", RESET;
    }
  }
  if ($recursive) {
    foreach my $dir (@$dirs){
      # this is recursive and can get a LOT of files, so ask user and make sure
      # it's what they want.
      print GREEN, "    $dir is a directory. Download all matching files from it?[ynq]> ", RESET;
      my $input = $TERM->readline();
      chomp $input;
      if ($input =~ /^[yY]/) {
        my $path = "$from/$dir";
        $path =~ s|^/||;
        make_path($path) unless -d $path;
        http_get("/$path", $patterns, $recursive);
      }
      last if $input =~ /^[qQ]/;
    }
  }
}

# load the URS authentication token from special file if there is one
sub loadToken{
  my $home = glob('~/');
  my $tokenFile = "$home/.earthdatatoken";

  if(-r $tokenFile){
    open (IN, '<', $tokenFile)||die "Can't open $tokenFile: $!\n";
    $TOKEN = <IN>;
    chomp $TOKEN;
    close(IN);
    print "Token loaded: [$TOKEN].\n";
    return 0;
  }
  else{
    return 9;
  }
}

# prompt user for token, and save it in special file
sub saveToken{
  my $home = glob('~/');
  my $tokenFile = "$home/.earthdatatoken";

  print "Input token:\n";
  $TOKEN = <>;
  chomp($TOKEN);
  open (OUT, '>', $tokenFile)||die "Can't open $tokenFile: $!\n";
  print OUT "$TOKEN\n";
  close(OUT);
  print "Token saved.\n";
}

# print out command menu
sub help{
  print "Supported cmd: [ls] [cd] [lcd] [pwd] [lpwd] [get] [mget] [token] [q]\n";
  print "[ls]: list dirs and files in remote dir\n";
  print "[cd]: go to remote dir\n";
  print "[lcd]: go to local dir\n";
  print "[pwd]: print the current remote dir\n";
  print "[lpwd]: print the current local dir\n";
  print "[get]: download one or more specified files to current local dir\n";
  print "[mget]: download files that match a pattern to current local dir.\n";
  print "        Don't use *; e.g.: mget h12v04; mget hdf\n";
  print "        Will also recursively download files from matching subdirectories.\n";
  print "[token]: change token\n";
  print "[q]: quit\n";
  print "[?]: show help message\n";
}

0;
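
The normalize_path routine above resolves a cd or ls argument against the current remote directory and removes . and .. segments. The same idea can be sketched in Python with the standard posixpath module. The function name normalize_remote_path is hypothetical. Unlike the Perl routine, this sketch does not ask the server whether the resulting directory exists.

```python
import posixpath


def normalize_remote_path(current_dir, path):
    """Resolve path against current_dir and collapse '.' and '..' segments."""
    if not path:
        path = "."
    # Absolute paths replace the current directory; relative ones are appended.
    joined = path if path.startswith("/") else posixpath.join(current_dir, path)
    normalized = posixpath.normpath(joined)
    # normpath preserves exactly two leading slashes, so collapse them to one.
    if normalized.startswith("//"):
        normalized = "/" + normalized.lstrip("/")
    return normalized


if __name__ == "__main__":
    # "61/MOD09" is a made-up remote path used only for illustration.
    print(normalize_remote_path("/", "61/MOD09"))
    print(normalize_remote_path("/61/MOD09", ".."))
```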

Python

Download source (remove .txt extension when downloaded)

Command-Line/Terminal Usage:

% python laads-data-download.py -s URL -d DIR -t TOK
#!/usr/bin/env python

# script supports either python2 or python3
#
# Attempts to do HTTP GETs with urllib2 (py2), urllib.request (py3), or subprocess
# if tlsv1.1+ isn't supported by the python ssl module
#
# Will download csv or json depending on which python module is available
#

from __future__ import (division, print_function, absolute_import, unicode_literals)

import argparse
import os
import os.path
import shutil
import sys

try:
    from StringIO import StringIO   # python2
except ImportError:
    from io import StringIO         # python3


################################################################################


USERAGENT = 'tis/download.py_1.0--' + sys.version.replace('\n','').replace('\r','')


def geturl(url, token=None, out=None):
    headers = { 'user-agent' : USERAGENT }
    if token is not None:
        headers['Authorization'] = 'Bearer ' + token
    try:
        import ssl
        CTX = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
        if sys.version_info.major == 2:
            import urllib2
            try:
                fh = urllib2.urlopen(urllib2.Request(url, headers=headers), context=CTX)
                if out is None:
                    return fh.read()
                else:
                    shutil.copyfileobj(fh, out)
            except urllib2.HTTPError as e:
                # HTTPError.code is an integer attribute, not a method
                print('HTTP GET error code: %d' % e.code, file=sys.stderr)
                print('HTTP GET error message: %s' % e.reason, file=sys.stderr)
            except urllib2.URLError as e:
                print('Failed to make request: %s' % e.reason, file=sys.stderr)
            return None

        else:
            from urllib.request import urlopen, Request, URLError, HTTPError
            try:
                fh = urlopen(Request(url, headers=headers), context=CTX)
                if out is None:
                    return fh.read().decode('utf-8')
                else:
                    shutil.copyfileobj(fh, out)
            except HTTPError as e:
                # HTTPError.code is an integer attribute, not a method
                print('HTTP GET error code: %d' % e.code, file=sys.stderr)
                print('HTTP GET error message: %s' % e.reason, file=sys.stderr)
            except URLError as e:
                print('Failed to make request: %s' % e.reason, file=sys.stderr)
            return None

    except AttributeError:
        # OS X Python 2 and 3 don't support tlsv1.1+ therefore... curl
        import subprocess
        try:
            args = ['curl', '--fail', '-sS', '-L', '--get', url]
            for (k,v) in headers.items():
                args.extend(['-H', ': '.join([k, v])])
            if out is None:
                # python3's subprocess.check_output returns stdout as a byte string
                result = subprocess.check_output(args)
                return result.decode('utf-8') if isinstance(result, bytes) else result
            else:
                subprocess.call(args, stdout=out)
        except subprocess.CalledProcessError as e:
            print('curl GET error message: %s' % (e.message if hasattr(e, 'message') else e.output), file=sys.stderr)
        return None



################################################################################


DESC = "This script recursively downloads all files from a LAADS URL that do not already exist locally and stores them in the specified path"


def sync(src, dest, tok):
    '''synchronize src url with dest directory'''
    try:
        import csv
        files = [ f for f in csv.DictReader(StringIO(geturl('%s.csv' % src, tok)), skipinitialspace=True) ]
    except ImportError:
        import json
        files = json.loads(geturl(src + '.json', tok))

    # use os.path since python 2/3 both support it while pathlib is 3.4+
    for f in files:
        # currently we use filesize of 0 to indicate directory
        filesize = int(f['size'])
        path = os.path.join(dest, f['name'])
        url = src + '/' + f['name']
        if filesize == 0:
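            try:
                # create the directory only if it is missing, so the script can be re-run
                if not os.path.isdir(path):
                    print('creating dir:', path)
                    os.mkdir(path)
                sync(src + '/' + f['name'], path, tok)
            except (IOError, OSError) as e:
                print("mkdir `%s': %s" % (e.filename, e.strerror), file=sys.stderr)
                sys.exit(-1)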
        else:
            try:
                if not os.path.exists(path):
                    print('downloading: ' , path)
                    with open(path, 'w+b') as fh:
                        geturl(url, tok, fh)
                else:
                    print('skipping: ', path)
            except IOError as e:
                print("open `%s': %s" % (e.filename, e.strerror), file=sys.stderr)
                sys.exit(-1)
    return 0


def _main(argv):
    parser = argparse.ArgumentParser(prog=argv[0], description=DESC)
    parser.add_argument('-s', '--source', dest='source', metavar='URL', help='Recursively download files at URL', required=True)
    parser.add_argument('-d', '--destination', dest='destination', metavar='DIR', help='Store directory structure in DIR', required=True)
    parser.add_argument('-t', '--token', dest='token', metavar='TOK', help='Use app token TOK to authenticate', required=True)
    args = parser.parse_args(argv[1:])
    if not os.path.exists(args.destination):
        os.makedirs(args.destination)
    return sync(args.source, args.destination, args.token)


if __name__ == '__main__':
    try:
        sys.exit(_main(sys.argv))
    except KeyboardInterrupt:
        sys.exit(-1)
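
The sync function above reads the directory listing as CSV, using skipinitialspace=True so that a space after each comma does not end up in the field values. A small sketch of that parsing step follows. The sample text is made up for illustration and assumes only the two columns the script actually reads, name and size; real listings may contain more.

```python
import csv
from io import StringIO

# Hypothetical listing text; the real columns may differ beyond name and size.
SAMPLE_CSV = "name, size\n2020, 0\nREADME.txt, 1024\n"


def parse_listing(csv_text):
    """Return (name, size) pairs from a CSV directory listing."""
    rows = csv.DictReader(StringIO(csv_text), skipinitialspace=True)
    return [(row["name"], int(row["size"])) for row in rows]


if __name__ == "__main__":
    for name, size in parse_listing(SAMPLE_CSV):
        kind = "directory" if size == 0 else "file"
        print(name, kind)
```

Without skipinitialspace, the second column header would be read as " size" with a leading space, and the lookup by "size" would fail.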

Last updated: November 13, 2020