TAP interface

The Table Access Protocol (TAP), developed by the IVOA, is a second-generation DAL (Data Access Layer) interface that provides a general access mechanism for tabular data, including but not limited to astronomical catalogs.

The URL for Gaia@AIP’s TAP service is:
https://gaia.aip.de/tap/

TAP can be used in several ways, which are explained below: from Python, with TOPCAT, and on the command line.

You can find the full TAP documentation on the IVOA documentation website.

How to use TAP in Python

To run the code examples below, you will need Python 3 and pyvo version 1.0 or later.

Install pyvo using pip:

pip install "pyvo>=1.0"

To use the TAP interface with your account and have the results stored in your private database, provide pyvo with your API token. The token is available under the URL /accounts/token/ of the particular Gaia@AIP site. For this service, replace the example token in the scripts below with the token from gaia.aip.de/accounts/token/.
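To avoid pasting the token into every script, one option is to read it from an environment variable. The following is only a minimal sketch: the variable name GAIA_AIP_TOKEN and the helper auth_header are hypothetical and not part of pyvo or Gaia@AIP. Only the "Token ..." header format is taken from the examples on this page.

```python
import os


def auth_header(env_var="GAIA_AIP_TOKEN"):
    """Return the Authorization header value built from an environment variable.

    GAIA_AIP_TOKEN is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError("Set %s to the token from /accounts/token/" % env_var)
    # The examples on this page send the literal prefix "Token " plus the token.
    return "Token " + token


if __name__ == "__main__":
    # Placeholder value so that the demo runs without a real token.
    os.environ.setdefault("GAIA_AIP_TOKEN", "YOURTOKEN")
    print(auth_header())
```

The returned string can be assigned to tap_session.headers['Authorization'] in place of the hard-coded token used in the scripts below.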

TAP has two types of jobs. A synchronous job returns the query result immediately and is intended for very fast searches. An asynchronous job is submitted to the service, and the results are retrieved after the job has completed. You can also divide a large search into chunks and run multiple jobs. The scripts below show how to do each of these.
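The asynchronous life cycle can be pictured as a polling loop over the job phase. The sketch below is a simplified, self-contained illustration that does not contact the service: wait_for_phase and the simulated phase sequence are hypothetical, and in practice pyvo's own job.wait() (used in the scripts below) is the better choice.

```python
import time

TERMINAL_PHASES = ("COMPLETED", "ERROR", "ABORTED")


def wait_for_phase(get_phase, targets=TERMINAL_PHASES, interval=0.0, max_polls=100):
    """Poll get_phase() until it returns one of the target phases.

    get_phase is any zero-argument callable returning the current phase string.
    Raises TimeoutError if no target phase is seen within max_polls polls.
    """
    for _ in range(max_polls):
        phase = get_phase()
        if phase in targets:
            return phase
        time.sleep(interval)
    raise TimeoutError("job did not reach %s after %d polls" % (targets, max_polls))


# Simulated job: queued, executing twice, then completed.
simulated = iter(["QUEUED", "EXECUTING", "EXECUTING", "COMPLETED"])
final_phase = wait_for_phase(lambda: next(simulated))
print(final_phase)
```

A synchronous job needs no such loop, because the result is returned in the response to the query itself.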

Run a very short query as a sync job

This code will fetch results for a very simple query that will run very fast.

import requests
import pyvo as vo

name = 'GAIA@AIP'
url = 'https://gaia.aip.de/tap'
token = 'Token YOURTOKEN'
qstr = 'select top 10 * from gaiadr2.gaia_source'

print('\npyvo version %s \n' % vo.__version__)
print('TAP service %s \n' % name)

# Setup authorization
tap_session = requests.Session()
tap_session.headers['Authorization'] = token

tap_service = vo.dal.TAPService(url, session=tap_session)

# Run the query synchronously and print the result as an astropy table
tap_result = tap_service.run_sync(qstr)
print(tap_result.to_table())

Run a larger query as an async job

This code will fetch the results of a single query.

from packaging.version import parse as parse_version
import requests
import pyvo as vo
import pandas as pd

#
# Verify the version of pyvo
#
if parse_version(vo.__version__) < parse_version('1.0'):
    raise ImportError('pyvo version must be 1.0 or later')

print('\npyvo version {version} \n'.format(version=vo.__version__))

#
# Setup tap_service
#
name = 'Gaia@AIP'
url = "https://gaia.aip.de/tap"
token = 'Token YOUR_TOKEN'
query = "select source_id,xgal,ygal,zgal,rgal,ruwe,mg0,bprp0 from gaiadr2_contrib.starhorse as s where s.SH_OUTFLAG LIKE '00000' AND s.SH_GAIAFLAG LIKE '000' LIMIT 100000 OFFSET 0"

print('TAP service %s \n' % name)

# Setup authorization
tap_session = requests.Session()
tap_session.headers['Authorization'] = token

tap_service = vo.dal.TAPService(url, session=tap_session)

#
# Submit the query as an async job
#
lang = 'PostgreSQL'
job = tap_service.submit_job(query, language=lang)
job.run()

#
# Follow the phases of the job.
#
# Possible phases:
# "QUEUED", "EXECUTING", "COMPLETED", "ERROR", "ABORTED"
#
print('JOB ' + str(job.phase))

# Wait for Executing (COMPLETED is included so that a job which
# finishes quickly does not keep this call waiting)
job.wait(phases=["EXECUTING", "COMPLETED", "ERROR", "ABORTED"], timeout=10.)
print('JOB ' + str(job.phase))

# Wait for Completed
job.wait(phases=["COMPLETED", "ERROR", "ABORTED"], timeout=10.)
print('JOB ' + str(job.phase))

#
# Fetch the results
#
job.raise_if_error()
print('\nfetching the results...')
results = job.fetch_result()
print('...DONE\n')

#
# Convert to a pandas.DataFrame
#
results = results.to_table().to_pandas()
results.head()

Divide your query into chunks and run multiple jobs in a script

The following example will fetch results for a query divided into chunks.

from packaging.version import parse as parse_version
import requests
import pyvo as vo
import pandas as pd

#
# Verify the version of pyvo
#
if parse_version(vo.__version__) < parse_version('1.0'):
    raise ImportError('pyvo version must be 1.0 or later')

print('\npyvo version {version} \n'.format(version=vo.__version__))

#
# Setup tap_service
#
name = 'Gaia@AIP'
url = "https://gaia.aip.de/tap"
token = 'Token YOUR_TOKEN'

print('TAP service %s \n' % name)

#
# Setup authorisation
#
tap_session = requests.Session()
tap_session.headers['Authorization'] = token

tap_service = vo.dal.TAPService(url, session=tap_session)

#
# Submit queries
#
lang = 'PostgreSQL'
jobs = []
limit = 1000
total = 10000
# Note: without an ORDER BY clause PostgreSQL does not guarantee a stable
# row order, so chunks selected with LIMIT/OFFSET may overlap.
base_query = "select source_id,xgal,ygal,zgal,rgal,ruwe,mg0,bprp0 from gaiadr2_contrib.starhorse as s where s.SH_OUTFLAG LIKE '00000' AND s.SH_GAIAFLAG LIKE '000' LIMIT {limit:d} OFFSET {offset:d}"

# One job per chunk, labelled batch_0, batch_1, ...
for i, offset in enumerate(range(0, total, limit)):

    query = base_query.format(limit=limit, offset=offset)
    print(query)
    job = tap_service.submit_job(query, language=lang, runid='batch_' + str(i))
    job.run()
    jobs.append(job)

#
# Collect the results
#
frames = []
for job in jobs:

    print('getting results from ' + str(job.job.runid))

    # Wait until the job has finished, then check how it ended
    job.wait(phases=["COMPLETED", "ERROR", "ABORTED"], timeout=10.)
    print(str(job.job.runid) + ' ' + str(job.phase))

    if job.phase in ("ERROR", "ABORTED"):
        # Skip failed jobs; their chunk will be missing from the result
        continue

    tap_result = job.fetch_result()
    frames.append(tap_result.to_table().to_pandas())

#
# Concatenate into a pandas.DataFrame
#
df_results = pd.concat(frames, ignore_index=True)
df_results.head()
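The chunking step of the script above can be isolated into a small helper, which makes it easy to inspect the generated queries before any job is submitted. This is a minimal sketch under stated assumptions: chunk_queries is a hypothetical helper, and the shortened query template is only an example. The template adds ORDER BY source_id so that the chunks are taken from a stable row order; whether that is affordable depends on the table, since sorting a large table can be slow.

```python
def chunk_queries(template, total, limit):
    """Yield (runid, query) pairs covering rows [0, total) in steps of limit.

    template must contain the {limit} and {offset} placeholders.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    for i, offset in enumerate(range(0, total, limit)):
        # The last chunk is shortened so that at most total rows are requested.
        size = min(limit, total - offset)
        yield "batch_%d" % i, template.format(limit=size, offset=offset)


# Example template (an assumption for this sketch, not the full query above).
template = (
    "SELECT source_id, xgal, ygal, zgal FROM gaiadr2_contrib.starhorse "
    "ORDER BY source_id LIMIT {limit:d} OFFSET {offset:d}"
)

for runid, query in chunk_queries(template, total=2500, limit=1000):
    print(runid, query)
```

Each pair can then be passed to tap_service.submit_job(query, language=lang, runid=runid) as in the script above.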

How to use TAP with TOPCAT

TOPCAT is a Tool for OPerations on Catalogues And Tables.

To use TAP with Gaia@AIP in TOPCAT, select Gaia@AIP as the service provider:
https://gaia.aip.de/tap

Figure: TAP service in TOPCAT

Now you can select the table:

Figure: TAP interface in TOPCAT

See the TOPCAT documentation for details on how to use TAP in TOPCAT.

How to use TAP on the command line

TAP can also be used with an HTTP command line client. Here we use HTTPie, but there are many similar clients (e.g. curl).

As with the Python interface, you can use your personal token to authenticate and work with your personal account (including your personal job list, quota, etc.). To do so, send the token in the Authorization header of every HTTP request.

# send the token in the Authorization header
http https://gaia.aip.de/tap/async Authorization:"Token YOURTOKEN"

# retrieve the job list
http https://gaia.aip.de/tap/async

# submit an asynchronous job (using PostgreSQL and the 5 minutes queue)
http -f --follow POST https://gaia.aip.de/tap/async \
    QUERY="SELECT ra, dec FROM gaiadr2.gaia_source WHERE random_index < 100" \
    LANG="postgresql-9.6" QUEUE="5m" PHASE="RUN"

# get all the information about a job
http https://gaia.aip.de/tap/async/78d9c528-8cf0-46e3-8a5b-ec151229a30b

# check the status of a job
http https://gaia.aip.de/tap/async/78d9c528-8cf0-46e3-8a5b-ec151229a30b/phase

# get the results of a job as csv or votable
http https://gaia.aip.de/tap/async/78d9c528-8cf0-46e3-8a5b-ec151229a30b/results/csv
http https://gaia.aip.de/tap/async/78d9c528-8cf0-46e3-8a5b-ec151229a30b/results/votable

# archive the job (this deletes the database table and frees up space)
http --follow DELETE https://gaia.aip.de/tap/async/78d9c528-8cf0-46e3-8a5b-ec151229a30b
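
For scripts that should not depend on an extra command line client, the same endpoints can be addressed with Python's standard library. The sketch below only builds the request objects and does not send them. job_request is a hypothetical helper, and the job ID is the example ID from the commands above.

```python
from urllib.request import Request

BASE_URL = "https://gaia.aip.de/tap/async"


def job_request(token, job_id=None, suffix=""):
    """Build (but do not send) an authenticated GET request.

    Without job_id the request targets the job list; suffix may be
    "phase", "results/csv" or "results/votable" as in the commands above.
    """
    parts = [BASE_URL]
    if job_id:
        parts.append(job_id)
        if suffix:
            parts.append(suffix.strip("/"))
    req = Request("/".join(parts), method="GET")
    # The token is sent in the Authorization header of every request.
    req.add_header("Authorization", "Token " + token)
    return req


req = job_request("YOURTOKEN", "78d9c528-8cf0-46e3-8a5b-ec151229a30b", "phase")
print(req.get_method(), req.full_url)
```

Passing such a request to urllib.request.urlopen would send it to the service.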