3 docs tagged with "pyarrow"

Chunk a DataFrame to Parquet

I have a large pandas DataFrame in memory. How can I chunk it into Parquet files using Metaflow?

How to quickly load tabular data from cloud storage?

I have a set of Parquet files in the cloud and want to read them into memory on my remote workers quickly. How can I do this with Metaflow?

Load Parquet Data from S3 to Arrow Table

I have a Parquet dataset stored in AWS S3 and want to access it in a Metaflow flow. How can I read one or several Parquet files at once from a flow and use them in an Arrow table?