Introduction
Welcome to the Catalog Archive Server Jobs System, or CasJobs. This guide
assumes you have experience with basic SQL syntax. If you'd like to learn more
about SQL there are plenty of other
tutorials
on the web. This guide will cover the slightly simpler varient of SQL that
CasJobs uses.
Registration
Before you can query this system you'll need to
register
and then
log in. Your queries and their resulting data
are associated with your account, so don't forget your login! But if lightning
strikes, skies fall and you've consequently forgot your password it can be
emailed to you from
here.
Your email may be used to contact you about your account, as well as optionally
notifing about query completion. Your UserID will be used to identify you to
the group world; should you choose to participate in this please pick a 'nice'
UserID. A personal database, or MyDB is also created and assigned to your
account on registration.
MyDB
Your MyDB is a personal database created just for you upon registration. You
have full privileges in this context; you can create/alter/drop tables,
functions and stored procedures. It is designed to be a sort of staging area
where you refine your requested data before eventually extracting it to a local
copy in CSV, FITS, etc. format. Please note that you can only extract data from
MyDB, so any data you will eventually want a local copy of must first be put
into MyDB.
Getting Data Into MyDB
Getting data into MyDB is easy. Just write your query, then click submit and it
will automatically create a table using the name from the 'Table' field
(directly above the query box) in MyDB containing the results of your query.
Alternatively, you use an 'into' statement in your query; an example of this is
below.
Select top 1000 objid,ra,dec into mydb.MyFirstTable
from photoobjall
This will create a table called 'MyFirstTable' in MyDB containing the columns
specified by the select clause. 'Into' statements are generally placed directly
before the 'from' keyword of a query. If your query doesn't have a 'from', then
'into' just goes at the end. An into statement in your query always takes
precedence over whatever is in the 'Table' field, so if you have both, only the
one in the query is used. Please note that the 'Table' field is only used when
the submit button is clicked; queries submitted using the quick method will
display results by default instead of automatically creating a table.
Importing Data Into MyDB
You can also import your own, custom data into MyDB. This is done from the
'Import' page, link located at the top. More info on importing tables is on
that page.
Downloading Data From MyDB
Once you've got a table in MyDB, you can then extract and download that table
in CSV, VOTABLE or FITS file format. This is done from the MyDB page. To do
this, click MyDB (at the top), then click the table on the left, choose your
output type then click extract. You can only extract tables in MyDB.
Context and Queues
'Context' is the database on which your query will be run. It is determined by
the Context drop down list on the query page. So if something like 'GALEXGR1' is
selected, then your query will be run on GALEXGR1, if 'MyDB' then MyDB, etc. The
'Queue' selection, also a drop down list, will determine the maximum time a
query will be permitted to run. Every context has at least one queue. If your
chosen context has more than one queue it is best to choose the queue that is
closest to the amount of time you think your query will take. Queues are
optimized for queries of their time limit, so chosing an appropriate queue is
generally faster.
Referencing Objects Outside of Context
For those situations where you'd like to reference data outside of your
selected context you can prefix the foreign object. If you've been following
this guide, you've already done this. This query uses a 'MyDB' prefix:
Select top 1000 objid,ra,dec into mydb.MyFirstTable
from photoobjall
So assuming the context of this query was 'GALEXGR1', this means that 'MyFirstTable'
is to be created at 'MyDB', not 'GALEXGR1'. Using prefixes, you could rewrite this
query so it runs in 'MyDB' context, like:
Select top 1000 objid,ra,dec into MyFirstTable from GALEXGR1.photoobjall
Since the context has changed to 'MyDB', this time the prefix 'GALEXGR1' indicates
that the photoobjall table is not 'mydb', but 'GALEXGR1'. Context prefix can be used on
tables and stored procedures, but not functions (see function limitations).
Valid context prefixes are any that appear in the context drop down list in the
query page.
Quick Execution
If you head over to the
query
page, you'll notice there are two buttons at the bottom of the query space, one
labelled 'submit', the other 'quick'. These are different types of query
execution. 'Quick' is a synchronous execution action; it will run your query
right away at the currently selected context and then return the results right
below the query window. It is the quickest way to run something and is ideal
for quick things like creating functions, procedures, top 10's, etc. However
there are restrictions on this form of execution; quick queries are limited to
one minute of execution and can only return about one meg of data before they
will be cancelled.
Submit Execution
For queries that excede 'quick' limitations use the 'Submit' button. 'Submit'
is an asynchronous execution action; it does not return the results right away.
It will submit your query to the selected context and queue and will run it as
soon as a slot in that queue is available. You can check the status of your
queries from the History page, link located at the top of the page. (must be
logged in)
Sharing Data, Using Groups
Groups are a mechanism for allowing other people to view specified tables in
your mydb. They are managed from the 'Groups' page, link located at the top.
From there you can join a group, create a group, or invite other users to join
your group. Users within a group can see and reference tables that other group
members have published. You can publish any table in your mydb from the mydb
page by clicking on the desired table and then clicking publish. You can remove
a table from publication at any time, but you cannot drop or rename that table
until it has been depublished. Group tables that you can see are listed at the
mydb page under tables. You can reference that table in your own query by using
the same name you see at the mydb page, specifically group.[user].[tablename].
Other group members have read only access to your published tables; they cannot
change or drop them.
CasJobs Limitations due to Distributed Execution
One of the main purposes of the CasJobs service is to provide load-balancing,
i.e. distribute the queries among different servers that serve the GALEXGR1 data.
This is accomplished using the SQL Server linked server functionality. However,
this does introduce certain limitations and differences between standard SQL
capabilities and CasJobs which may be confusing for users. The most prominent
example of this is the inability to run
DROP TABLE and
CREATE TABLE
commands outside of the MyDB context. For example, the following commands will
not work in the GALEXGR1 context:
DROP TABLE mydb.mytable_21
CREATE TABLE mydb.mytable_21 (objid int, ra float, dec float)
CasJobs will return an error message indicating that there are too many prefixes
in front of the table name. This error is in fact coming from SQL Server which
does not allow tables to be dropped or created across linked servers
(remotely). The difference between INSERT and SELECT ... INTO commands
and DROP/CREATE TABLE commands is that SQL Server actually permits
inserting into a table remotely, but not dropping or creating it
.
The same applies to calling functions remotely - SQL server does not allow
calling functions remotely
. Hence you cannot call functions outside your current context, e.g., you
cannot run the command
SELECT GALEXGR1.fGetNearestObjEqICM(180,-0.5,1.0)
in the MyDB context.
CasJobs Syntax vs. SkyServer/sdssQA Syntax
If you're used to joining against the target db through the skyserver interface
with something like GALEXGR1..photoobjall, CasJobs works very much the same way.
The difference is the prefix you use for CasJobs must be a valid context
(listed in the context drop down menu). So, if 'GALEXGR1' was your current context
and you were interested in joining with a context called 'GALEXGR1', you could
say 'GALEXGR1.photoobjall', assuming you were running from within 'GALEXGR1' context.
CasJobs will also accept the '..' construct. See 'Referencing Objects Outside
of Context' for more info.
CasJobs vs. SQL
CasJobs uses a virtual prefix system rather than the three part prefix system
used by sql server. A virtual prefix is simply a context name, such as "GALEXGR1" or
"MyDB". So instead of using a multi-part prefix, like:
Select x,y,z from GALEXGR1.dbo.sometable
or
Select x,y,z from GALEXGR1..sometable
You would simply use the context as the prefix, like this:
Select x,y,z from GALEXGR1.sometable
Just like SQL, you would use prefixes whenever you are referencing something
that is outside of your current selected context.
Contact MAST
CASJobs is made possible by the Sloan Digital Sky Survey Collaboration
$Name: v2_8_3 $
,$Revision: 1.14 $, Last modified: Thursday, January 19, 2006 at 12:40:50 PM