Getting Started

:mod:`pystackql` allows you to run StackQL queries against cloud and SaaS providers within a native Python environment. The :class:`pystackql.StackQL` class can be used with Pandas, Matplotlib, Jupyter and more.

pystackql can be installed from PyPI using pip:

$ pip install pystackql

or you can use the setup.py script:

$ git clone https://github.com/stackql/pystackql && cd pystackql
$ python setup.py install

To confirm that the installation was successful, you can run the following:

from pystackql import StackQL
stackql = StackQL()

print(stackql.version)

You should see a result similar to:

v0.5.396

StackQL providers will have different authentication methods. To see the available authentication methods for a provider, consult the StackQL provider docs. In general, most providers will use API keys or service account files, which can be generated and revoked from the provider's console.

StackQL will use the designated environment variable or variables for each respective provider for authentication. For instance, if the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables are set on the machine you are running pystackql on, these will be used to authenticate requests to the aws provider.
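For example, the default AWS variables could be set from Python before constructing the client. The values below are placeholders for illustration only; real credentials are issued from the AWS console and should be kept out of source code:

```python
import os

# Placeholder credentials, for illustration only.
os.environ["AWS_ACCESS_KEY_ID"] = "AKIAEXAMPLEKEYID"
os.environ["AWS_SECRET_ACCESS_KEY"] = "example-secret-access-key"

# With the provider's default environment variables set, no auth
# argument is required:
# stackql = StackQL()
```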

If you wish to use custom variables for a provider, you can override the defaults by supplying the auth keyword argument to the :class:`pystackql.StackQL` class constructor. The auth argument accepts either a dictionary or a string: if a dictionary, the keys are provider names and the values are the authentication configuration; if a string, it must be a stringified JSON object with the same structure as the dictionary.
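As a minimal sketch of the two equivalent forms (the provider name and value schema below are placeholders; consult the StackQL provider docs for the exact keys each provider expects):

```python
import json

# Hypothetical auth mapping: keys are provider names, values are the
# provider-specific authentication configuration.
auth_dict = {"okta": {"type": "api_key", "credentialsenvvar": "OKTA_SECRET_KEY"}}

# The string form is simply the same structure serialized as JSON:
auth_str = json.dumps(auth_dict)

# Either form could then be passed to the constructor, e.g.:
# stackql = StackQL(auth=auth_dict)  # or StackQL(auth=auth_str)
print(auth_str)
```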

Note

Keyword arguments to the :class:`pystackql.StackQL` class constructor are passed through as command-line arguments to the stackql exec command.

The :meth:`pystackql.StackQL.execute` method runs StackQL queries and returns results in json, csv, text or table format.

The following example demonstrates how to run a query and return the results as a pandas.DataFrame:

from pystackql import StackQL
region = "ap-southeast-2"
stackql = StackQL(output='pandas')

query = """
SELECT instance_type, COUNT(*) as num_instances
FROM aws.ec2.instances
WHERE region = '%s'
GROUP BY instance_type
""" % (region)

df = stackql.execute(query)
print(df)

StackQL is a fully functional SQL programming environment, supporting the full set of SQL relational algebra operations, including UNION and JOIN. Here is an example of a simple UNION query:

...
regions = ["ap-southeast-2", "us-east-1"]
query = """
SELECT '%s' as region, instance_type, COUNT(*) as num_instances
FROM aws.ec2.instances
WHERE region = '%s'
GROUP BY instance_type
UNION
SELECT  '%s' as region, instance_type, COUNT(*) as num_instances
FROM aws.ec2.instances
WHERE region = '%s'
GROUP BY instance_type
""" % (regions[0], regions[0], regions[1], regions[1])

df = stackql.execute(query)
print(df)

The preceding example prints a pandas.DataFrame similar to the following:

 instance_type  num_instances          region
0    t2.medium              2  ap-southeast-2
1     t2.micro              7  ap-southeast-2
2     t2.small              4  ap-southeast-2
3     t2.micro              6       us-east-1

In addition to UNION queries, you can also run a batch (list) of queries asynchronously using the :meth:`pystackql.StackQL.executeQueriesAsync` method. The results of all queries are combined and returned as a single result set.

...
regions = ["ap-southeast-2", "us-east-1"]

queries = [
    f"""
    SELECT '{region}' as region, instance_type, COUNT(*) as num_instances
    FROM aws.ec2.instances
    WHERE region = '{region}'
    GROUP BY instance_type
    """
    for region in regions
]

df = stackql.executeQueriesAsync(queries)
print(df)

StackQL has a complete library of built-in functions and operators for manipulating scalar and complex fields (JSON objects); for more information on the available functions and operators, see the StackQL docs. Here is an example of using the json_extract function to extract a field from a JSON object, as well as the split_part function to extract a field from a string:

from pystackql import StackQL
subscriptionId = "273769f6-545f-45b2-8ab8-2f14ec5768dc"
resourceGroupName = "stackql-ops-cicd-dev-01"
stackql = StackQL() # output format defaults to 'dict'

query = """
SELECT name,
split_part(id, '/', 3) as subscription,
split_part(id, '/', 5) as resource_group,
json_extract(properties, '$.hardwareProfile.vmSize') as vm_size
FROM azure.compute.virtual_machines
WHERE resourceGroupName = '%s'
AND subscriptionId = '%s';
""" % (resourceGroupName, subscriptionId)

res = stackql.execute(query)
print(res)
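To illustrate what these two functions compute, here is a plain-Python analogue. These helpers are not part of pystackql, and the resource id and vmSize value below are examples:

```python
import json

def split_part(value, delimiter, index):
    # Return the index-th field of value split on delimiter. Fields are
    # 1-indexed, matching how the query above addresses the id segments.
    return value.split(delimiter)[index - 1]

# An Azure resource id has the form
# /subscriptions/<subscriptionId>/resourceGroups/<resourceGroupName>/...
resource_id = ("/subscriptions/273769f6-545f-45b2-8ab8-2f14ec5768dc"
               "/resourceGroups/stackql-ops-cicd-dev-01")

print(split_part(resource_id, "/", 3))  # the subscription id
print(split_part(resource_id, "/", 5))  # the resource group name

# json_extract(properties, '$.hardwareProfile.vmSize') walks a JSON
# document by path; the plain-Python equivalent:
properties = '{"hardwareProfile": {"vmSize": "Standard_B1s"}}'
print(json.loads(properties)["hardwareProfile"]["vmSize"])  # Standard_B1s
```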

For those using Jupyter Notebook or Jupyter Lab, pystackql offers a Jupyter magic extension that makes it even simpler to execute StackQL commands directly within your Jupyter cells.

To get started with the magic extension, first load it into your Jupyter environment:

%load_ext pystackql.magic

After loading the magic extension, you can use the %%stackql magic to execute StackQL commands in a dedicated Jupyter cell. The output will be displayed directly below the cell, just like any other Jupyter command output.

Example:

%%stackql
SHOW SERVICES in aws

This Jupyter magic extension provides a seamless integration of pystackql into your Jupyter workflows, allowing you to explore cloud and SaaS provider data interactively within your notebooks.

To use the magic extension to run queries against a StackQL server rather than a local stackql binary, load the server-mode variant of the extension instead (note the plural module name):

%load_ext pystackql.magics