First Steps at Coffea-Casa Analysis Facility @ UChicago

Prerequisites

The primary mode of analysis with coffea-casa is coffea. Coffea provides plenty of examples to users in its documentation. A good starting point may be this tutorial on columnar analysis in coffea.

Knowledge of Python is also assumed. Standard coffea analyses are contained within Jupyter Notebooks, which allow for dynamic, block-by-block execution of code. Coffea-casa employs the JupyterLab interface. JupyterLab is designed for hosting Jupyter Notebooks on the web, and permits the usage of additional features within its environment, including Git access, compatibility with cluster computing tools, and much, much more.

If you aren’t familiar with any of these tools, please click on the links above for additional resources, and get acquainted with how they work. If you want examples of how coffea-casa merges these tools, refer to the gallery of coffea-casa examples.

Access

Important

Please use https://coffea.af.uchicago.edu as an access point to the Coffea-Casa Analysis Facility @ UChicago.

Access to Coffea-casa Analysis Facility @ UChicago

ATLAS AuthZ Authentication Instance

Currently Coffea-Casa Analysis Facility @ UChicago can support any member of ATLAS.

Sign in with your ATLAS CERN credential:

ATLAS Authz authentification to Coffea-casa Analysis Facility @ UChicago Approval required for CMS Authz authentification to Coffea-casa Analysis Facility @ UChicago

Docker Image Selection

For high efficient analysis using coffea package, powered with Dask and HTCondor please select:

Coffea Casa analysis image available at Coffea-casa Analysis Facility @ UChicago

After you will be forwarded to your personal Jupyterhub instance running at Analysis Facility @ UChicago:

Jupyterhub instance together with Dask Labextention powered cluster available at Coffea-casa Analysis Facility @ UChicago

Cluster Resources in Coffea-Casa Analysis Facility @ UChicago

By default, the Coffea-casa Dask cluster should provide you with a scheduler and workers, which you can see by clicking on the colored Dask icon in the left sidebar.

Default Dask Labextention powered cluster available Coffea-casa Analysis Facility @ UChicago

As soon as you will start your computations, you will notice that available resources at the Coffea-Casa Analysis Facility @ UChicago can easily autoscale depending on available resources in the HTCondor pool at AF UChicago.

Autoscaling with Dask Labextention powered cluster available at Coffea-casa Analysis Facility @ UChicago

Opening a New Console or File

There are three ways by which you can open a new tab within coffea-casa. Two are located within the File menu at the very top of the JupyterLab interface: New and New Launcher.

The File menu of the coffea-casa JupyterLab interface.

The New dropdown menu allows you to open the console or a file of a specified format directly. The New Launcher option creates a new tab with buttons that permit you to launch a console or a new file, exactly like the interface you are shown when you first open coffea-casa.

The final way is specific to the File Browser tab of the sidebar.

The File Browser tab of the coffea-casa JupyterLab interface, showcasing the New Launcher button.

This behaves exactly like the New Launcher option above.

Note

Regardless of the method you use to open a new file, the file will be saved to the current directory of your File Browser.

Using Git

Cloning a repository in the Coffea-casa Analysis Facility @ UChicago is simple, though it can be a little confusing because it is spread across two tabs in the sidebar: the File Browser and the Git tabs.

In order to clone a repository, first go to the Git tab. It should look like this:

The Git tab at Coffea-casa Analysis Facility @ UChicago

Simply click the appropriate button (initialize a repository, or clone a repository) and you’ll be hooked up to GitHub. This should then take you to the File Browser tab, which is where you can see all of the repositories you have cloned in your JupyterLab instance. The File Browser should look like this:

The File Browser tab at Coffea-casa Analysis Facility @ UChicago

If you wish to change repositories, simply click the folder button to enter the root directory. If you are in the root directory, the Git tab will reset and allow you to clone another repository.

If you wish to commit, push, or pull from the repository you currently have active in the File Browser, then you can return to the Git tab. It should change to look like this, so long as you have a repository open in the File Browser:

The Git tab at Coffea-casa Analysis Facility @ UChicago, after a repository is activated

The buttons in the top right allow for pulling and pushing respectively. When you have edited files in a directory, they will show up under the Changed category, at which point you can hit the + to add them to a commit (at which point they will show up under Staged). Filling out the box at the bottom of the sidebar will file your commit, and prepare it for you to push.

ServiceX

When dealing with very large datasets it is often better to do initial data filtering, augmentation using ServiceX. ServiceX transformation produces output as an Awkward Array. The array can then be used in a regular Coffea processing. Here a scheme explaining the workflow:

ServiceX and Coffea-Casa workflow schema.

There are two different, UC AF deployed ServiceX instances. The only difference between them is the type of input data they are capable of processing. Uproot processes any kind of “flat” ROOT files, while xAOD processes only Rucio registered xAOD files.

To use them one has to register and get approved. Sign in will lead you to a Globus registration page, where you may choose to use account connected to your institution:

ServiceX registration.

Once approved, you will be able to see status of your requests in the dashboard:

ServiceX dashboard.

For your code to be able to authenticate your requests, you need to download a servicex.yaml file, that should be placed in your working directory. The file is downloaded from your profile page:

ServiceX profile.

For an example analysis using ServiceX and Coffea look here.

More Examples

Columnar data analysis with DAOD_PHYSLITE here.