lookups

Lookups for a PostgreSQL database with genomic data.

Lookup functions for the variant browser.

autocomplete(dataset: str, query: str, ds_version: str = None)[source]

Provide autocomplete suggestions based on the query.

Parameters:
  • dataset (str) – short name of dataset
  • query (str) – the query to compare to the available gene names
  • ds_version (str) – the dataset version
Returns:

A list of gene names whose beginning matches the query

Return type:

list
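
The prefix matching described above can be illustrated with a minimal sketch. It runs on an in-memory list rather than the database, and both the helper name autocomplete_sketch and the case-insensitive comparison are assumptions made for illustration, not the module's actual behaviour.

```python
def autocomplete_sketch(gene_names, query):
    """Return the gene names whose beginning matches the query.

    Hypothetical helper: case-insensitive matching and sorted output
    are assumptions, not documented behaviour of autocomplete().
    """
    prefix = query.upper()
    return sorted(name for name in gene_names if name.upper().startswith(prefix))


genes = ["BRCA1", "BRCA2", "BRAF", "TP53"]
suggestions = autocomplete_sketch(genes, "brc")
```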

get_awesomebar_result(dataset: str, query: str, ds_version: str = None)[source]

Parse the search input.

Datatype is one of:

  • gene
  • transcript
  • variant
  • dbsnp_variant_set
  • region

Identifier is one of:

  • ensembl ID for gene
  • variant ID string for variant (e.g. 1-1000-A-T)
  • region ID string for region (e.g. 1-1000-2000)

Follow these steps:

  • if query is an ensembl ID, return it
  • if a gene symbol, return that gene’s ensembl ID
  • if an RSID, return that variant’s string
Parameters:
  • dataset (str) – short name of dataset
  • query (str) – the search query
  • ds_version (str) – the dataset version
Returns:

(datatype, identifier)

Return type:

tuple
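
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the module's implementation: the regular expressions, the helper name classify_query, and the two lookup dicts (standing in for database queries) are hypothetical. Returning None for an unrecognised query and reporting a resolved rsID with datatype "variant" are also assumptions.

```python
import re

# Assumed identifier shapes; the real parser may accept other forms.
ENSEMBL_GENE = re.compile(r"^ENSG\d+$")
ENSEMBL_TRANSCRIPT = re.compile(r"^ENST\d+$")
VARIANT = re.compile(r"^(\w+)-(\d+)-([ACGT]+)-([ACGT]+)$")
REGION = re.compile(r"^(\w+)-(\d+)-(\d+)$")


def classify_query(query, gene_symbols, rsid_to_variant):
    """Return (datatype, identifier) for a search string, or None.

    gene_symbols maps upper-case gene symbols to Ensembl IDs and
    rsid_to_variant maps lower-case rsIDs to variant ID strings;
    both are hypothetical stand-ins for database lookups.
    """
    text = query.strip()
    upper = text.upper()
    if ENSEMBL_GENE.match(upper):
        return ("gene", upper)
    if ENSEMBL_TRANSCRIPT.match(upper):
        return ("transcript", upper)
    if upper in gene_symbols:
        return ("gene", gene_symbols[upper])
    if text.lower() in rsid_to_variant:
        return ("variant", rsid_to_variant[text.lower()])
    if VARIANT.match(upper):
        return ("variant", upper)
    if REGION.match(upper):
        return ("region", upper)
    return None


symbols = {"BRCA2": "ENSG00000139618"}
rsids = {"rs123": "1-1000-A-T"}  # made-up rsID and variant for the example
result = classify_query("brca2", symbols, rsids)
```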

get_coverage_for_bases(dataset: str, chrom: str, start_pos: int, end_pos: int = None, ds_version: str = None)[source]

Get the coverage for the list of bases given by start_pos->end_pos, inclusive.

Parameters:
  • dataset (str) – short name for the dataset
  • chrom (str) – chromosome
  • start_pos (int) – first position of interest
  • end_pos (int) – last position of interest; if None it will be set to start_pos
  • ds_version (str) – version of the dataset
Returns:

coverage dicts for the region of interest; None if the lookup failed

Return type:

list
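
As a rough illustration of the inclusive range and the end_pos default described above, the following sketch filters in-memory rows instead of querying the database. The helper name coverage_in_range and the row keys "chrom", "pos" and "mean" are assumptions made for the example.

```python
def coverage_in_range(rows, chrom, start_pos, end_pos=None):
    """Select coverage rows with start_pos <= pos <= end_pos on chrom.

    Hypothetical helper; the row layout is assumed, not taken from
    the real database schema.
    """
    if end_pos is None:
        # A single position is requested: the range collapses to start_pos.
        end_pos = start_pos
    return [
        row for row in rows
        if row["chrom"] == chrom and start_pos <= row["pos"] <= end_pos
    ]


rows = [
    {"chrom": "1", "pos": 1000, "mean": 31.2},
    {"chrom": "1", "pos": 1010, "mean": 29.8},
    {"chrom": "1", "pos": 1020, "mean": 30.5},
    {"chrom": "2", "pos": 1000, "mean": 12.0},
]
single = coverage_in_range(rows, "1", 1010)
span = coverage_in_range(rows, "1", 1000, 1010)
```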

get_coverage_for_transcript(dataset: str, chrom: str, start_pos: int, end_pos: int = None, ds_version: str = None)[source]

Get the coverage for the list of bases given by start_pos->end_pos, inclusive.

Parameters:
  • dataset (str) – short name for the dataset
  • chrom (str) – chromosome
  • start_pos (int) – first position of interest
  • end_pos (int) – last position of interest; if None it will be set to start_pos
  • ds_version (str) – version of the dataset
Returns:

coverage dicts for the region of interest

Return type:

list

get_exons_in_transcript(dataset: str, transcript_id: str, ds_version=None)[source]

Retrieve exons associated with the given transcript id.

Parameters:
  • dataset (str) – short name of the dataset
  • transcript_id (str) – the id of the transcript
  • ds_version (str) – dataset version
Returns:

dicts with values for each exon sorted by start position

Return type:

list

get_gene(dataset: str, gene_id: str, ds_version: str = None)[source]

Retrieve gene by gene id.

Parameters:
  • dataset (str) – short name of the dataset
  • gene_id (str) – the id of the gene
  • ds_version (str) – dataset version
Returns:

values for the gene; None if not found

Return type:

dict

get_gene_by_dbid(gene_dbid: str)[source]

Retrieve gene by gene database id.

Parameters:
  • gene_dbid (str) – the database id of the gene
Returns:

values for the gene; empty if not found

Return type:

dict

get_gene_by_name(dataset: str, gene_name: str, ds_version=None)[source]

Retrieve gene by gene_name.

Parameters:
  • dataset (str) – short name of the dataset
  • gene_name (str) – the name of the gene
  • ds_version (str) – dataset version
Returns:

values for the gene; empty if not found

Return type:

dict

get_genes_in_region(dataset: str, chrom: str, start_pos: int, stop_pos: int, ds_version: str = None)[source]

Retrieve genes located within a region.

Parameters:
  • dataset (str) – short name of the dataset
  • chrom (str) – chromosome name
  • start_pos (int) – start of region
  • stop_pos (int) – end of region
  • ds_version (str) – dataset version
Returns:

values for the gene; empty if not found

Return type:

dict
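
The region test can be sketched as a standard interval-overlap check. Whether "located within" means overlap or full containment is not stated here, so this sketch assumes overlap; the helper name genes_overlapping and the keys "chrom", "start" and "stop" are hypothetical.

```python
def genes_overlapping(genes, chrom, start_pos, stop_pos):
    """Return genes on chrom whose interval overlaps [start_pos, stop_pos].

    Two closed intervals overlap when each one starts before the other ends.
    """
    return [
        gene for gene in genes
        if gene["chrom"] == chrom
        and gene["start"] <= stop_pos
        and gene["stop"] >= start_pos
    ]


# Made-up genes for the example.
genes = [
    {"name": "GENE_A", "chrom": "1", "start": 100, "stop": 500},
    {"name": "GENE_B", "chrom": "1", "start": 900, "stop": 1500},
    {"name": "GENE_C", "chrom": "2", "start": 100, "stop": 500},
]
hits = genes_overlapping(genes, "1", 400, 1000)
```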

get_raw_variant(dataset: str, pos: int, chrom: str, ref: str, alt: str, ds_version: str = None)[source]

Retrieve variant by position and change.

Parameters:
  • dataset (str) – short name of the dataset
  • pos (int) – position of the variant
  • chrom (str) – name of the chromosome
  • ref (str) – reference sequence
  • alt (str) – variant sequence
  • ds_version (str) – dataset version
Returns:

values for the variant; None if not found

Return type:

dict

get_transcript(dataset: str, transcript_id: str, ds_version: str = None)[source]

Retrieve transcript by transcript id.

The returned dict also includes the transcript's exons under the key 'exons'.

Parameters:
  • dataset (str) – short name of the dataset
  • transcript_id (str) – the id of the transcript
  • ds_version (str) – dataset version
Returns:

values for the transcript, including exons; None if not found

Return type:

dict
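
A minimal sketch of the returned shape, assuming the exons are attached as a list sorted by start position (as get_exons_in_transcript returns them). The helper name attach_exons and every key other than 'exons' are hypothetical.

```python
def attach_exons(transcript, exons):
    """Return a copy of the transcript dict with its exons under 'exons'."""
    result = dict(transcript)  # copy so the caller's dict is left untouched
    result["exons"] = sorted(exons, key=lambda exon: exon["start"])
    return result


# Made-up values for the example.
transcript = {"transcript_id": "ENST00000000001", "chrom": "1"}
exons = [
    {"start": 2000, "stop": 2100},
    {"start": 1000, "stop": 1200},
]
full = attach_exons(transcript, exons)
```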

get_transcripts_in_gene(dataset: str, gene_id: str, ds_version: str = None)[source]

Get the transcripts associated with a gene.

Parameters:
  • dataset (str) – short name of the dataset
  • gene_id (str) – id of the gene
  • ds_version (str) – dataset version
Returns:

transcripts (dict) associated with the gene; empty if no hits

Return type:

list

get_transcripts_in_gene_by_dbid(gene_dbid: int)[source]

Get the transcripts associated with a gene.

Parameters:
  • gene_dbid (int) – database id of the gene
Returns:

transcripts (dict) associated with the gene; empty if no hits

Return type:

list

get_variant(dataset: str, pos: int, chrom: str, ref: str, alt: str, ds_version: str = None)[source]

Retrieve variant by position and change.

Parameters:
  • dataset (str) – short name of the dataset
  • pos (int) – position of the variant
  • chrom (str) – name of the chromosome
  • ref (str) – reference sequence
  • alt (str) – variant sequence
  • ds_version (str) – version of the dataset
Returns:

values for the variant; None if not found

Return type:

dict
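
Variant ID strings such as 1-1000-A-T list the chromosome first, while get_variant takes pos before chrom, so passing keyword arguments avoids mixing the two up. The following sketch assumes the ID format chrom-pos-ref-alt shown for get_awesomebar_result; the helper name parse_variant_id is hypothetical.

```python
def parse_variant_id(variant_id):
    """Split 'chrom-pos-ref-alt' into keyword arguments for get_variant.

    Hypothetical helper; raises ValueError if the string does not
    have exactly four dash-separated fields or pos is not an integer.
    """
    chrom, pos, ref, alt = variant_id.split("-")
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt}


args = parse_variant_id("1-1000-A-T")
# Possible usage, with a placeholder dataset name:
# variant = get_variant("my_dataset", **args)
```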

get_variants_by_rsid(dataset: str, rsid: str, ds_version: str = None)[source]

Retrieve variants by their associated rsid.

Parameters:
  • dataset (str) – short name of dataset
  • rsid (str) – rsid of the variant (starting with rs)
  • ds_version (str) – version of the dataset
Returns:

variants as dicts; None if no hits

Return type:

list

get_variants_in_gene(dataset: str, gene_id: str, ds_version: str = None)[source]

Retrieve variants present inside a gene.

Parameters:
  • dataset (str) – short name of the dataset
  • gene_id (str) – id of the gene
  • ds_version (str) – version of the dataset
Returns:

values for the variants

Return type:

list

get_variants_in_region(dataset: str, chrom: str, start_pos: int, end_pos: int, ds_version: str = None)[source]

Variants that overlap a region.

Parameters:
  • dataset (str) – short name of the dataset
  • chrom (str) – name of the chromosome
  • start_pos (int) – start of the region
  • end_pos (int) – end of the region
  • ds_version (str) – version of the dataset
Returns:

variant dicts, None if no hits

Return type:

list

get_variants_in_transcript(dataset: str, transcript_id: str, ds_version: str = None)[source]

Retrieve variants inside a transcript.

Parameters:
  • dataset (str) – short name of the dataset
  • transcript_id (str) – id of the transcript (ENST)
  • ds_version (str) – version of the dataset
Returns:

values for the variant; None if not found

Return type:

dict