ML Levels 2&3 - Technical Questions for Colin

Please reply to this post with any questions for Colin.

[Question for the video of dask_tutorial.py]

Hi Colin, I have a few questions about this implementation. Could you take a look if it is not too much trouble?

  1. For the lines "df = dd.read_csv('/data/pubmed21*.csv', sample=25000000)" and "df.head()", I got an error that said "AttributeError: 'tuple' object has no attribute 'head'" after I ran the cell. Do you happen to know why? I am also confused about what sample=25000000 means.

  2. For the line "get(graph, 'store')  # executes in parallel, check activity monitor", I also got an error that says "concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending." The system I am using is iOS.

Thanks in advance. @ddas @Sarahrp Hi Debaleena & Sarah, I couldn’t find Colin in the name directory. Would you mind forwarding this to him? Thanks very much!

To answer completely I’d need slightly more context, but here are my initial responses:

  1. Please ensure the xml files are already downloaded, converted to csv, and stored in a directory the Python process can access. This error has arisen because the csv files were not actually parsed into pandas data frames. If everything ran up until that point, verify that your csv files contain structured data that pandas can parse. The sample argument specifies the number of bytes to read when inferring column types; I included it because omitting it results in wrong type inference and a broken parse. With an adequate byte sample size, the parser sees enough data to set up your data frames correctly. See the Dask documentation for dask.dataframe.read_csv for further information.
  2. I am unsure why your processes are breaking. You say you are using ‘iOS’; do you mean macOS (OS X)? If so, did the other examples in my tutorial execute in parallel? You can verify this by checking how many Python processes spawn in Activity Monitor. To my knowledge, there is no way to natively run a Python process on iOS.
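As a rough illustration of the first point, the sketch below checks that the glob pattern actually matches files before handing it to dask. The helper names find_csv_files and load_frames are hypothetical and not part of the tutorial; the path and sample size in the usage comment are the ones quoted in the question. It assumes dask is installed, and it is only one way to surface a missing-file problem early.

```python
import glob


def find_csv_files(pattern):
    """Return the sorted list of files matching `pattern`, or raise if none exist."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"No csv files match {pattern!r}")
    return paths


def load_frames(pattern, sample_bytes=25_000_000):
    """Hypothetical wrapper: confirm files exist, then read them lazily with dask.

    `sample` is the number of bytes dask reads up front to infer column dtypes.
    """
    import dask.dataframe as dd  # imported here so the file check works without dask

    find_csv_files(pattern)
    return dd.read_csv(pattern, sample=sample_bytes)


# Example usage (path taken from the question; adjust to your own setup):
# df = load_frames('/data/pubmed21*.csv')
# print(type(df))   # expected to be a dask DataFrame, not a tuple
# print(df.head())
```

If find_csv_files raises, the problem is with the download or conversion step rather than with the sample argument.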

We can discuss these issues during our next call. If you’d still like to talk through them, please remind me of both and have your Python session running so we can debug together.
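Before the call, a standalone check along these lines might help narrow down the second issue. It is a minimal sketch that uses only the standard library's ProcessPoolExecutor as a stand-in for the tutorial's dask graph, so it tests whether worker processes can start at all on your machine. One possible cause, though not necessarily the one here, is that recent Python versions on macOS start workers with the "spawn" method, which requires pool-creating code to sit under an `if __name__ == "__main__":` guard when run as a script. The names square and run_pool_check are made up for this example.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor


def square(x):
    """Trivial task; also returns the worker's PID so distinct processes are visible."""
    return x * x, os.getpid()


def run_pool_check(n_tasks=8, n_workers=2):
    """Run `square` over a small range in a process pool and collect the results."""
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(square, range(n_tasks)))
    values = [value for value, _ in results]
    pids = {pid for _, pid in results}
    return values, pids


if __name__ == "__main__":
    # The guard keeps spawned workers from re-running this block on import.
    print("start method:", multiprocessing.get_start_method())
    values, pids = run_pool_check()
    print("values:", values)
    print("worker PIDs:", sorted(pids))
```

Save this as a .py file and run it from a terminal. If it also fails with BrokenProcessPool, the problem likely lies in the Python environment rather than in the dask graph.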

Best,

Colin

Thanks, Colin, I really appreciate it. I was able to solve the first problem, and you were right about it! I am working on the second one and will let you know if I have further questions. Have a great weekend!