Chunk a Dataframe to Parquet
I have a large pandas dataframe in memory. How can I chunk it into Parquet files using Metaflow?
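One way to do this, sketched below, is to slice the dataframe by row ranges inside a step and write each slice with `DataFrame.to_parquet`; the example frame, chunk size, and output file names are illustrative assumptions, and the written paths are kept as a flow artifact so downstream steps know what was produced.

```python
from metaflow import FlowSpec, step

class ChunkToParquetFlow(FlowSpec):

    @step
    def start(self):
        import pandas as pd

        # Illustrative in-memory frame; in practice this is your large dataframe.
        df = pd.DataFrame({"x": range(100_000), "y": range(100_000)})

        chunk_size = 10_000  # assumed number of rows per chunk
        self.paths = []
        for i, offset in enumerate(range(0, len(df), chunk_size)):
            path = f"chunk_{i}.parquet"
            # to_parquet requires pyarrow or fastparquet to be installed
            df.iloc[offset:offset + chunk_size].to_parquet(path, index=False)
            self.paths.append(path)
        self.next(self.end)

    @step
    def end(self):
        print(f"wrote {len(self.paths)} Parquet files")

if __name__ == "__main__":
    ChunkToParquetFlow()
```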
I have a CSV and want to access it in a Metaflow flow. How can I read this data into tasks and write it to disk?
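A minimal sketch using Metaflow's `IncludeFile`, which reads the file when the run starts and versions its contents with the flow; the `./input.csv` path and the output file name are hypothetical.

```python
from io import StringIO
from metaflow import FlowSpec, IncludeFile, step

class CSVFlow(FlowSpec):

    # The file is read at run time and stored with the flow as an artifact.
    data = IncludeFile("data", default="./input.csv", help="Path to a local CSV")

    @step
    def start(self):
        import pandas as pd

        # self.data holds the file contents as a string.
        df = pd.read_csv(StringIO(self.data))
        # Write a copy to the task's local disk.
        df.to_csv("copy_on_task_disk.csv", index=False)
        self.num_rows = len(df)
        self.next(self.end)

    @step
    def end(self):
        print(f"{self.num_rows} rows read from the included CSV")

if __name__ == "__main__":
    CSVFlow()
```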
How do I load data from a local directory structure on AWS Batch using Metaflow's IncludeFile?
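`IncludeFile` carries a single file, so one workaround, sketched below with hypothetical paths, is to archive the directory up front (for example `tar czf data.tar.gz my_data_dir/`) and unpack it inside the Batch task, where the archive bytes travel with the flow.

```python
import io
import tarfile
from metaflow import FlowSpec, IncludeFile, batch, step

class LocalDirFlow(FlowSpec):

    # Archive created beforehand, e.g.: tar czf data.tar.gz my_data_dir/
    archive = IncludeFile("archive", default="data.tar.gz", is_text=False)

    @batch(cpu=1, memory=4000)
    @step
    def start(self):
        # self.archive holds the raw bytes, available inside the Batch container.
        with tarfile.open(fileobj=io.BytesIO(self.archive)) as tar:
            tar.extractall(path=".")
        self.next(self.end)

    @step
    def end(self):
        print("directory unpacked on the Batch task")

if __name__ == "__main__":
    LocalDirFlow()
```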
I have a Parquet dataset stored in AWS S3 and want to access it in a Metaflow flow. How can I read one or several Parquet files at once from a flow and load them into an Arrow table?
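A sketch of one approach: download the objects with Metaflow's S3 client and read them with pyarrow inside the `with` block (the temporary files are cleaned up when it exits). The bucket prefix is a placeholder.

```python
from metaflow import FlowSpec, S3, step

class ParquetToArrowFlow(FlowSpec):

    @step
    def start(self):
        import pyarrow as pa
        import pyarrow.parquet as pq

        # Placeholder prefix; point s3root at your own dataset.
        with S3(s3root="s3://my-bucket/my-dataset/") as s3:
            objs = s3.get_all()  # download every object under the prefix
            tables = [pq.read_table(obj.path) for obj in objs
                      if obj.key.endswith(".parquet")]
        self.table = pa.concat_tables(tables)
        print(self.table.num_rows, "rows")
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    ParquetToArrowFlow()
```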
I have a Parquet dataset stored in AWS S3 and want to access it in a Metaflow flow. How can I read one or several Parquet files at once from a flow and load them into a pandas dataframe?
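The same pattern works for pandas: read each downloaded file with `pd.read_parquet` and concatenate the results. Again, the bucket prefix below is a placeholder.

```python
from metaflow import FlowSpec, S3, step

class ParquetToPandasFlow(FlowSpec):

    @step
    def start(self):
        import pandas as pd

        # Placeholder prefix; swap in your own bucket and key layout.
        with S3(s3root="s3://my-bucket/my-dataset/") as s3:
            frames = [pd.read_parquet(obj.path) for obj in s3.get_all()
                      if obj.key.endswith(".parquet")]
        self.df = pd.concat(frames, ignore_index=True)
        print(len(self.df), "rows")
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    ParquetToPandasFlow()
```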
Python is one of the main programming languages of data science and machine learning practitioners. This makes it important for programmers and scientists to understand the most common data structures used in Python’s machine learning ecosystem.
How can I access data in S3 with a SQL query from my Metaflow flow?
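One option, sketched below, is to download the objects with Metaflow's S3 client, load them into a pandas dataframe, and run SQL over it with DuckDB, which can query an in-memory dataframe by its variable name; the bucket prefix and the query itself are illustrative.

```python
from metaflow import FlowSpec, S3, step

class SQLOnS3Flow(FlowSpec):

    @step
    def start(self):
        import duckdb
        import pandas as pd

        # Hypothetical dataset prefix; point it at your own data.
        with S3(s3root="s3://my-bucket/my-dataset/") as s3:
            frames = [pd.read_parquet(obj.path) for obj in s3.get_all()
                      if obj.key.endswith(".parquet")]
        df = pd.concat(frames, ignore_index=True)

        # DuckDB resolves 'df' to the local dataframe and runs SQL against it.
        self.result = duckdb.query("SELECT COUNT(*) AS n FROM df").to_df()
        print(self.result)
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    SQLOnS3Flow()
```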
How do I query a database with SQL and load the results into Pandas?
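A minimal sketch using the standard-library `sqlite3` driver and `pandas.read_sql_query`; the `example.db` file and `users` table are hypothetical, and for other databases you would pass a SQLAlchemy engine or the appropriate DBAPI connection instead.

```python
import sqlite3
import pandas as pd

# Hypothetical database file and table; replace with your own connection.
with sqlite3.connect("example.db") as conn:
    df = pd.read_sql_query("SELECT * FROM users WHERE age > 30", conn)

print(df.head())
```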
How do I load data from a local directory structure on AWS Batch using Metaflow's S3 client?
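One pattern, sketched below with a placeholder `my_data_dir`, is to upload the directory from a locally running step with `put_files` and pull it back down inside the Batch step with `get_all`; `S3(run=self)` scopes the objects to the current run and assumes the flow uses an S3-backed datastore.

```python
import os
from metaflow import FlowSpec, S3, batch, step

class DirToBatchFlow(FlowSpec):

    @step
    def start(self):
        # Runs locally: walk the directory and upload every file,
        # keyed by its path relative to the directory root.
        uploads = []
        for root, _, files in os.walk("my_data_dir"):
            for name in files:
                path = os.path.join(root, name)
                uploads.append((os.path.relpath(path, "my_data_dir"), path))
        with S3(run=self) as s3:
            s3.put_files(uploads)
        self.next(self.process)

    @batch(cpu=1, memory=4000)
    @step
    def process(self):
        # Runs on AWS Batch: download the same files into the container.
        with S3(run=self) as s3:
            for obj in s3.get_all():
                print("downloaded", obj.key, "->", obj.path)
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    DirToBatchFlow()
```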