Overview

The Kellogg Data Cloud (KDC) is an Amazon Athena platform that hosts various Kellogg datasets. Some of these were previously stored in our on-premises Microsoft SQL Server (the Kellogg Data Center). To check whether a dataset is available on this platform, see the “Data” tab.

For access, please contact us at rs@kellogg.northwestern.edu.  Note that access is granted to each specific database and not to the platform as a whole.

Once you have access, you can use the data through one of the following methods:

  • On the AWS Console
  • On KLC through an ODBC connection

On the AWS Console:

Gaining access

Request access to a specific dataset by contacting rs@kellogg.northwestern.edu.

Logging into AWS

Use the NUIT-provided link, https://www.it.northwestern.edu/support/login/aws.html, to log in to AWS with your NetID credentials. Then:

  • Select “ksm-rch-data” and then find the database you’d like to query.
  • Select “Management console.”

Access restrictions 

Once logged in, you have access to Athena only. This account does not grant access to any other AWS tool.

AWS Athena setup

  • From the “Search” field, navigate to Athena.
  • Set your workgroup to the one that matches the database name.
  • Check the Region in the upper right corner of the page and make sure it is US East (Ohio), us-east-2.

Explore your dataset and submit queries

After setup, you can submit SQL queries directly from the Athena console. 

Query Limits

Note that most AWS databases have a daily query limit of 2 TB. Please contact rs@kellogg.northwestern.edu if you need to increase this limit.

Download Results

After your query is complete, there are multiple ways to download the results:

  • Download directly by clicking the “Download results” button on the screen
  • Go to the “Recent queries” tab to download results for a specific query

On KLC:

Gaining access

Request access to a specific dataset by contacting rs@kellogg.northwestern.edu.

Logging into KLC

Follow the instructions at https://www.kellogg.northwestern.edu/research-support/computing/kellogg-linux-cluster/connect.aspx to log in to KLC using whichever method you prefer.

Locate and copy AWS credentials

Navigate to the AWS login page here: https://www.it.northwestern.edu/support/login/aws.html

  • Select “ksm-rch-data” and then find the database you’d like to query.
  • Select “Command line and programmatic access.” 
  • Copy your temporary “AWS credentials file” from Option 2. 

    Note that these credentials are temporary and will need to be refreshed every few hours.

Create a credentials file on KLC

After copying your credentials, create a hidden “.aws” folder in your KLC home directory. Within that folder, create a file named “credentials” (that is, ~/.aws/credentials) and paste the copied contents into it.
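
Because the temporary credentials have to be replaced every few hours, it can be convenient to script this step. The following is a minimal Python sketch, not an official Kellogg tool: the function name write_credentials is hypothetical, and it assumes the text copied from the AWS login page is passed in as a string. It creates the “.aws” folder if needed and restricts the file so that only you can read it.

```python
import os
from pathlib import Path


def write_credentials(copied_text, home=None):
    """Save copied AWS credentials text to <home>/.aws/credentials.

    `home` defaults to the current user's home directory; it is a
    parameter only so the function can be tried out in a scratch folder.
    """
    home_dir = Path(home) if home is not None else Path.home()
    aws_dir = home_dir / ".aws"
    aws_dir.mkdir(mode=0o700, parents=True, exist_ok=True)

    cred_path = aws_dir / "credentials"
    # Overwrite any earlier file, since old temporary credentials expire.
    cred_path.write_text(copied_text.strip() + "\n")
    # Owner read/write only: the file contains secret keys.
    os.chmod(cred_path, 0o600)
    return cred_path
```

Calling write_credentials(copied_text) with no second argument writes to the “credentials” file in your own home directory.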

Use the AWS command line interface (CLI)

Load the AWS command line tool with:

module load awscli/2

Check that your credentials work by displaying the S3 buckets you can access:

aws s3 ls --profile <account profile>

Please replace <account profile> with the name of your database account profile, which is the name shown in square brackets at the top of your credentials file.
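
If you are unsure which name to pass to --profile, note that an AWS credentials file is an INI-style file in which each profile name appears in square brackets. The short Python sketch below lists those names using only the standard library. The helper name list_profiles is hypothetical, and the default path assumes the file is at ~/.aws/credentials.

```python
import configparser
from pathlib import Path


def list_profiles(cred_path=None):
    """Return the profile names (section headers) in an AWS credentials file."""
    if cred_path is not None:
        path = Path(cred_path)
    else:
        path = Path.home() / ".aws" / "credentials"
    # ConfigParser.read() silently skips a missing file, so check first.
    if not path.is_file():
        raise FileNotFoundError(f"No credentials file at {path}")
    # Interpolation is disabled because secret keys may contain '%'.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)
    return parser.sections()
```

Any name returned by list_profiles() can be used as the value of --profile in the command above.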

Set up the ODBC environment

Set the two paths to the ODBC initialization files with:

export ODBCSYSINI=/kellogg/software/.odbc/<database_name>

export ODBCINI=/kellogg/software/.odbc/<database_name>

Please replace <database_name> with the workgroup name you are provided. Note that there must be no spaces around the “=” sign.
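
When connecting from Python rather than from the shell, the same two variables can be set inside the script, provided this happens before the first ODBC connection is opened. The sketch below works under that assumption. The helpers odbc_env and apply_odbc_env are hypothetical names, and my_workgroup in the usage note is a placeholder, not a real workgroup.

```python
import os

# Directory given in the instructions above.
ODBC_ROOT = "/kellogg/software/.odbc"


def odbc_env(database_name):
    """Build the two ODBC environment variables for one database."""
    if not database_name or "/" in database_name:
        raise ValueError("database_name must be a single folder name")
    path = f"{ODBC_ROOT}/{database_name}"
    return {"ODBCSYSINI": path, "ODBCINI": path}


def apply_odbc_env(database_name):
    """Set the variables for the current process; call before connecting."""
    os.environ.update(odbc_env(database_name))
```

For example, apply_odbc_env("my_workgroup") has the same effect within the Python process as the two export commands have in the shell.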

Query limits

Note that most AWS databases have a daily query limit of 2 TB. Please contact rs@kellogg.northwestern.edu if you need to increase this limit.
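
To stay under the daily limit, it can help to keep a running total of how much data your queries have scanned. The arithmetic is sketched below in Python. It assumes that the limit refers to data scanned and that 2 TB means 2 × 10^12 bytes; this page does not state either, so treat both as assumptions. The helper name remaining_budget is hypothetical.

```python
# Assumption: 2 TB in decimal units (10**12 bytes per TB).
DAILY_LIMIT_BYTES = 2 * 10**12


def remaining_budget(scanned_bytes_today, limit_bytes=DAILY_LIMIT_BYTES):
    """Return how many bytes can still be scanned today (never negative)."""
    used = sum(scanned_bytes_today)
    return max(limit_bytes - used, 0)


# Example: three queries that scanned 500 GB, 750 GB and 300 GB.
scans = [500 * 10**9, 750 * 10**9, 300 * 10**9]
print(remaining_budget(scans) / 10**12, "TB left")
```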

Connect from code

Connect to your Athena database from whichever software you prefer. Sample files for writing queries are provided here:

/kellogg/software/aws_odbc_samples

Python

Load the version of Python you would like to use. Then, edit the athena_odbc.py file to use your preferred database name and table name. Run the file with:

python athena_odbc.py 
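
The contents of the provided athena_odbc.py are not reproduced here. As a rough illustration of what an ODBC query from Python can look like, the sketch below uses the third-party pyodbc package. Whether the sample file uses pyodbc is an assumption, as are the DSN name athena and the placeholder database and table names; check the sample file and the ODBC configuration for your workgroup for the real values.

```python
import re

# Simple identifiers only: letters, digits and underscores.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_query(database, table, limit=10):
    """Return a SELECT that previews the first `limit` rows of a table."""
    for name in (database, table):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected characters in identifier: {name!r}")
    return f'SELECT * FROM "{database}"."{table}" LIMIT {int(limit)}'


def run_query(sql, dsn="athena"):
    """Run `sql` through ODBC and return (column_names, rows).

    The DSN name is an assumption; use the one defined for your workgroup.
    """
    import pyodbc  # imported here so build_query works without the package

    conn = pyodbc.connect(f"DSN={dsn}", autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        columns = [col[0] for col in cursor.description]
        return columns, cursor.fetchall()
    finally:
        conn.close()


# Usage (placeholder names):
#   columns, rows = run_query(build_query("my_database", "my_table"))
```

Keeping the LIMIT clause small while exploring a table also helps keep each test query short.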

R

Load the version of R you would like to use.  Then modify the athena_odbc.R file with your preferred database name and table name. Run the file with:

Rscript athena_odbc.R

Stata

Load the version of Stata you would like to use.  Then modify the athena_odbc.do file with your preferred database name and table name. Run the file with:

stata-mp -b athena_odbc.do
