Return values from a Python Entrez dictionary of dictionaries

I want to scrape the Interactions table from the Entrez Gene page.

The Interactions table is populated dynamically by a web server. When I tried the XML package in R, I could retrieve the Entrez Gene page, but the body of the Interactions table was empty (it had not yet been populated by the web server).

The web server issue may be solvable in R (and I'd love to see how), but Biopython seemed like an easier path.

I put together the following, which gives me what I want for an example gene:


# Pull the Entrez gene page for MAP1B using Biopython

from Bio import Entrez
Entrez.email = "me@x"
handle = Entrez.efetch(db="gene", id="4131", retmode="xml")
record = Entrez.read(handle)
handle.close()

# Find the dictionary that contains the Interactions table
# (check every comment, including the first one)
for comment in record[0]["Entrezgene_comments"]:
    if 'Interactions' in comment.values():
        Interactions = comment

# Return the desired values: I want the Entrez ID and gene symbol for each interacting protein
for x in range(len(Interactions['Gene-commentary_comment'])):
    print(Interactions['Gene-commentary_comment'][x]['Gene-commentary_comment'][1]['Gene-commentary_source'][0]['Other-source_src']['Dbtag']['Dbtag_tag']['Object-id']['Object-id_id'])     # print the Entrez ID
    print(Interactions['Gene-commentary_comment'][x]['Gene-commentary_comment'][1]['Gene-commentary_source'][0]['Other-source_anchor'])     # print the gene symbol

This code works and gives me what I want. But I think it's ugly, and I am concerned that if the format of the Entrez Gene page changes slightly, the code will break. In particular, there must be a better way to extract the desired information than specifying the full path, as I do with:


Interactions['Gene-commentary_comment'][x]['Gene-commentary_comment'][1]['Gene-commentary_source'][0]['Other-source_src']['Dbtag']['Dbtag_tag']['Object-id']['Object-id_id']

But I cannot figure out how to search through a dictionary of dictionaries without specifying each level I want to descend. When I try functions like find(), they operate on the next level down, but not all the way to the bottom.

Is there a wildcard symbol, a Python equivalent of "//", or a function I can use to get to ['Object-id_id'] without naming the full path? Other suggestions for cleaner code are also appreciated.
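
To make the kind of thing I am after concrete, here is a minimal sketch of a recursive search that behaves roughly like "//": it walks nested dictionaries and lists and yields every value stored under a given key, at any depth. The name find_key and the toy sample data are hypothetical (the values are made up, not real Entrez output), and the sketch assumes the record returned by Entrez.read behaves like ordinary nested dicts and lists.

```python
def find_key(node, key):
    """Yield every value stored under `key` anywhere in nested dicts/lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending: the same key may also appear deeper down.
            for found in find_key(v, key):
                yield found
    elif isinstance(node, (list, tuple)):
        for item in node:
            for found in find_key(item, key):
                yield found


# Toy stand-in for one interaction entry (hypothetical values)
sample = {
    "Gene-commentary_comment": [
        {"Gene-commentary_source": [
            {"Other-source_src": {"Dbtag": {"Dbtag_tag": {
                "Object-id": {"Object-id_id": "1234"}}}},
             "Other-source_anchor": "GENE_A"}
        ]}
    ]
}

ids = list(find_key(sample, "Object-id_id"))
symbols = list(find_key(sample, "Other-source_anchor"))
```

One caveat with this approach: it returns every match below the starting node, so calling it on a whole interaction entry would also pick up IDs from sources other than the one at index [1]. Starting the search from a narrower sub-dictionary would limit that.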
