Use Python to analyze a CSV file: report how many identical entries the table contains, count the number of occurrences of each option in a given column, and export the results as a new table.

It appears there was an issue executing the Python code due to an internal error. However, I can guide you through the process of splitting a large CSV file into smaller files, each containing 1,000,000 rows, using Python and pandas. Here’s how you can do it:

  1. Import pandas: Import the pandas library at the top of your script. If you don’t have pandas installed, install it with pip: pip install pandas.

  2. Read and Split the CSV File:

    • Use pandas.read_csv with the chunksize parameter set to 1,000,000. This will read the file in chunks of 1,000,000 rows at a time.
    • Iterate over these chunks, and for each chunk, save it as a new CSV file.
  3. Example Code:

    import pandas as pd
    
    def split_csv(file_path, chunk_size=1000000):
        # Read the large CSV file in chunks of chunk_size rows
        chunk_generator = pd.read_csv(file_path, chunksize=chunk_size)
    
        for i, chunk in enumerate(chunk_generator):
            # Define the name for the smaller chunk files
            new_file_name = f'chunk_{i+1}.csv'
            # Save the chunk to a new file
            chunk.to_csv(new_file_name, index=False)
            print(f'Chunk {i+1} saved as {new_file_name}')
    
    # Replace '/path/to/your/large/file.csv' with the path to your large CSV file
    split_csv('/path/to/your/large/file.csv')
    

In this code:

  • Replace '/path/to/your/large/file.csv' with the path to your large CSV file.
  • The function split_csv reads your large CSV file in chunks and saves each chunk as a new file named chunk_1.csv, chunk_2.csv, and so on.

Remember, this process might take some time depending on the size of the CSV file and the performance of your computer.
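
As for the counting task in your original question (tallying how many times each option appears in a column and how many identical rows the table contains), here is a minimal sketch using pandas value_counts. The column name 'category', the file paths, and the helper name count_column_values are placeholders for illustration; replace them with your own.

    import pandas as pd
    
    def count_column_values(file_path, column_name, output_path='value_counts.csv'):
        # Load the CSV file into a DataFrame
        df = pd.read_csv(file_path)
    
        # Number of fully duplicated rows in the whole table
        duplicate_rows = df.duplicated().sum()
        print(f'Fully duplicated rows: {duplicate_rows}')
    
        # Count how many times each distinct value appears in the chosen column
        counts = df[column_name].value_counts().reset_index()
        counts.columns = [column_name, 'count']
    
        # Export the counts as a new table
        counts.to_csv(output_path, index=False)
        print(f'Counts for column {column_name} saved as {output_path}')
    
    # Replace the path and column name with your own
    count_column_values('/path/to/your/file.csv', 'category')

If the file is too large to load in one go, this can be combined with the chunksize mechanism shown above: compute value_counts on each chunk and accumulate the results (for example with Series.add(..., fill_value=0)) before exporting the final table.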