utils

aws_sdk_polars.utils.df_to_ascii(df: DataFrame) → str

Convert a Polars DataFrame to an ASCII table and return it as a string.

aws_sdk_polars.utils.pprint_df(df: DataFrame)

Pretty-print a Polars DataFrame.

aws_sdk_polars.utils.get_merged_schema(dfs: List[DataFrame], raise_on_conflict: bool = True) → Dict[str, DataType]

Merge the schemas of multiple Polars DataFrames into a single schema.

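This function takes a list of Polars DataFrames and combines their schemas. If multiple DataFrames have columns with the same name but different types, the outcome depends on raise_on_conflict: when it is True (the default), an error is raised; when it is False, the type from the last DataFrame in the list takes precedence.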

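Parameters:
  • dfs – A list of Polars DataFrames whose schemas are to be merged.

  • raise_on_conflict – If True (the default), raise an error when the same column name appears with different data types. If False, the type from the last DataFrame in the list is used.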

Returns:

A dictionary representing the merged schema. Keys are column names, and values are Polars DataTypes.

Example

>>> import polars as pl
>>> from aws_sdk_polars.utils import get_merged_schema
>>> df1 = pl.DataFrame({'A': [1, 2], 'B': ['a', 'b']})
>>> df2 = pl.DataFrame({'B': [1.0, 2.0], 'C': [True, False]})
>>> merged_schema = get_merged_schema([df1, df2], raise_on_conflict=False)
>>> print(merged_schema)
{'A': Int64, 'B': Float64, 'C': Boolean}

Note

This function does not reconcile conflicting data types. If the same column name appears with different types in multiple DataFrames, it either raises an error (raise_on_conflict=True, the default) or uses the type from the last DataFrame in the list (raise_on_conflict=False).
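The merge rule described above can be sketched in plain Python. This is an illustration, not the library's implementation: `merge_schema_dicts` is a hypothetical name, schemas are modeled as ordinary dicts that map column names to type labels, and raising ValueError on conflict is an assumption based on the merge_dataframes documentation.

```python
from typing import Dict, Hashable, Iterable, Mapping


def merge_schema_dicts(
    schemas: Iterable[Mapping[str, Hashable]],
    raise_on_conflict: bool = True,
) -> Dict[str, Hashable]:
    """Hypothetical sketch of the schema-merge rule."""
    merged: Dict[str, Hashable] = {}
    for schema in schemas:
        for name, dtype in schema.items():
            if name in merged and merged[name] != dtype and raise_on_conflict:
                raise ValueError(
                    f"column {name!r}: {merged[name]} conflicts with {dtype}"
                )
            # Later schemas overwrite earlier ones; a key that already
            # exists keeps its original position in the dict.
            merged[name] = dtype
    return merged


schema1 = {"A": "Int64", "B": "Utf8"}
schema2 = {"B": "Float64", "C": "Boolean"}

merged = merge_schema_dicts([schema1, schema2], raise_on_conflict=False)
print(merged)  # {'A': 'Int64', 'B': 'Float64', 'C': 'Boolean'}

try:
    merge_schema_dicts([schema1, schema2])  # default: raise on conflict
    conflict_raised = False
except ValueError:
    conflict_raised = True
```

With real DataFrames, the same function could be fed `df.schema` for each frame, since a Polars schema behaves like a mapping from column name to data type.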

aws_sdk_polars.utils.harmonize_schemas(dfs: List[DataFrame], schema: Dict[str, DataType]) → List[DataFrame]

Harmonize the schemas of multiple Polars DataFrames to match a given schema.

This function takes a list of DataFrames and a target schema, then returns DataFrames that contain every column in that schema. Each missing column is added as an all-null column with the data type specified in the schema. Existing columns are left unchanged, even if their data type differs from the schema.

Parameters:
  • dfs – A list of Polars DataFrames whose schemas are to be harmonized.

  • schema – A dictionary representing the target schema. Keys are column names, and values are Polars DataTypes.

Returns:

A new list of DataFrames with harmonized schemas. Each DataFrame in this list corresponds to a DataFrame in the input list, but with added columns to match the target schema.

Example

>>> import polars as pl
>>> from aws_sdk_polars.utils import harmonize_schemas
>>> df1 = pl.DataFrame({'A': [1, 2], 'B': ['a', 'b']})
>>> df2 = pl.DataFrame({'B': [1.0, 2.0], 'C': [True, False]})
>>> target_schema = {'A': pl.Int64, 'B': pl.Utf8, 'C': pl.Boolean, 'D': pl.Float64}
>>> harmonized_dfs = harmonize_schemas([df1, df2], target_schema)
>>> print(harmonized_dfs[0].schema)
{'A': Int64, 'B': Utf8, 'C': Boolean, 'D': Float64}
>>> print(harmonized_dfs[1].schema)
{'B': Float64, 'C': Boolean, 'A': Int64, 'D': Float64}

Note

  1. This function only adds missing columns; it does not modify or remove existing columns in the input DataFrames.

  2. The data type of existing columns is not changed, even if it differs from the type specified in the target schema.

  3. Added columns are filled with null values of the appropriate type.

aws_sdk_polars.utils.merge_dataframes(dfs: List[DataFrame]) → DataFrame

Merge multiple Polars DataFrames into a single DataFrame with a unified schema.

This function performs the following steps:

  1. Merges the schemas of all input DataFrames, raising an error if there are conflicts.

  2. Harmonizes the schemas of all DataFrames to match the merged schema.

  3. Concatenates all harmonized DataFrames into a single DataFrame.

Parameters:

dfs – A list of Polars DataFrames to be merged.

Returns:

A single Polars DataFrame containing all data from the input DataFrames, with a schema that includes all columns from all input DataFrames.

Raises:

ValueError – If there are schema conflicts between the input DataFrames (e.g., same column name with different data types).

Example

>>> import polars as pl
>>> from aws_sdk_polars.utils import merge_dataframes
>>> df1 = pl.DataFrame({'A': [1, 2], 'B': ['a', 'b']})
>>> df2 = pl.DataFrame({'B': ['c', 'd'], 'C': [True, False]})
>>> merged_df = merge_dataframes([df1, df2])
>>> print(merged_df)
shape: (4, 3)
┌──────┬─────┬───────┐
│ A    ┆ B   ┆ C     │
│ ---  ┆ --- ┆ ---   │
│ i64  ┆ str ┆ bool  │
╞══════╪═════╪═══════╡
│ 1    ┆ a   ┆ null  │
│ 2    ┆ b   ┆ null  │
│ null ┆ c   ┆ true  │
│ null ┆ d   ┆ false │
└──────┴─────┴───────┘

Note

  • This function uses get_merged_schema() with raise_on_conflict=True, so it will raise an error if there are any schema conflicts.

  • The harmonize_schemas() function is used to ensure all DataFrames have the same schema before concatenation.

  • The order of rows in the output DataFrame corresponds to the order of DataFrames in the input list.