
Five Routine Tasks ChatGPT Can Handle for Data Analysts

A detailed guide to using ChatGPT for data cleaning, exploration, visualization, modeling, and more.

Daily Duties Efficiently Tackled by ChatGPT for Data Specialists

In London, data science is helping transportation services such as Gett, a popular black taxi app, run more efficiently. A recent data project set out to understand why some customers did not successfully get a car, and it has shown promising results using two AI tools: ChatGPT and the open-source Gemini CLI.

The project's approach is twofold. ChatGPT, a powerful AI model, is utilised to create and refine the necessary data science code and logic interactively. Meanwhile, Gemini CLI, an open-source agent, automates the execution of these steps, turning them into efficient workflows and dashboards.

Data Cleaning and Exploration

ChatGPT, with appropriate prompts, can generate code and workflows to clean and organise raw Gett data, significantly reducing the time spent on repetitive manual cleaning. It can also assist in initial exploratory data analysis (EDA) by summarising data distributions, detecting anomalies, and suggesting transformations.
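
The kind of cleaning and first-pass EDA code such a prompt might produce can be sketched as follows. This is a minimal illustration, not the project's actual code: the column names (`order_id`, `order_time`, `eta_minutes`, `status`) and the sample rows are hypothetical and do not reflect the real Gett schema.

```python
import pandas as pd

# Hypothetical raw order records; column names and values are illustrative only.
raw = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4, 5, 6],
    "order_time": [
        "2024-01-01 08:15", "2024-01-01 09:30", "2024-01-01 09:30",
        "not a date", "2024-01-01 18:45", "2024-01-01 19:00",
        "2024-01-01 19:20",
    ],
    "eta_minutes": [5.0, None, None, 7.0, 6.0, 8.0, 120.0],
    "status": ["completed", "cancelled", "cancelled", "completed",
               "completed", "cancelled", "cancelled"],
})


def clean_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows, parse timestamps, and fill missing ETAs."""
    out = df.drop_duplicates().copy()
    # Unparseable timestamps become NaT, and those rows are dropped.
    out["order_time"] = pd.to_datetime(out["order_time"], errors="coerce")
    out = out.dropna(subset=["order_time"])
    # Fill missing ETAs with the median of the observed values.
    out["eta_minutes"] = out["eta_minutes"].fillna(out["eta_minutes"].median())
    return out.reset_index(drop=True)


def flag_outliers(series: pd.Series) -> pd.Series:
    """Mark values more than 1.5 * IQR outside the quartiles."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - 1.5 * iqr) | (series > q3 + 1.5 * iqr)


clean = clean_orders(raw)
print(clean["eta_minutes"].describe())             # summary of the distribution
print(clean[flag_outliers(clean["eta_minutes"])])  # candidate anomalies
```

Median imputation and the 1.5 × IQR rule are common defaults rather than anything the project is known to have used; generated code of this kind still needs review against the real data.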

Data Visualization

ChatGPT can produce code snippets to generate charts and visualizations that help understand patterns in the Gett dataset. Automating insightful visual exploration is crucial for identifying trends and making informed decisions.
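
A chart snippet of the kind described here might look like the sketch below. The data and the column names (`order_time`, `status`) are made up for illustration, and the choice of a bar chart of failed orders per hour is an assumption, not one of the graphs from the project.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical cleaned orders; values are illustrative only.
orders = pd.DataFrame({
    "order_time": pd.to_datetime([
        "2024-01-01 08:15", "2024-01-01 08:40", "2024-01-01 09:30",
        "2024-01-01 18:45", "2024-01-01 18:50", "2024-01-01 18:55",
    ]),
    "status": ["cancelled", "completed", "cancelled",
               "cancelled", "cancelled", "completed"],
})

# Count orders that did not complete, grouped by hour of day.
failed = orders[orders["status"] != "completed"]
failed_by_hour = failed.groupby(failed["order_time"].dt.hour).size()

fig, ax = plt.subplots(figsize=(6, 3))
failed_by_hour.plot(kind="bar", ax=ax)
ax.set_xlabel("Hour of day")
ax.set_ylabel("Failed orders")
ax.set_title("Failed orders by hour (illustrative data)")
fig.tight_layout()
fig.savefig("failed_by_hour.png")
```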

Preparing Dataset for Machine Learning

ChatGPT can help write preprocessing code such as encoding categorical variables, scaling features, handling missing values, and splitting data into training and testing sets, readying the data for modeling.
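
These preprocessing steps can be sketched with scikit-learn as below. The feature names (`eta_minutes`, `hour`, `driver_assigned`), the target `failed`, and the sample values are hypothetical, chosen only to show the imputation, scaling, encoding, and splitting steps together.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical order features and a binary target; values are illustrative.
df = pd.DataFrame({
    "eta_minutes": [5.0, 7.0, None, 12.0, 6.0, 20.0, 8.0, 15.0],
    "hour": [8, 9, 9, 18, 18, 23, 7, 17],
    "driver_assigned": ["yes", "no", "yes", "no", "yes", "no", "yes", "no"],
    "failed": [0, 1, 0, 1, 0, 1, 0, 1],
})

numeric = ["eta_minutes", "hour"]
categorical = ["driver_assigned"]

preprocess = ColumnTransformer([
    # Numeric columns: fill gaps with the median, then standardize.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    # Categorical columns: one-hot encode, ignoring unseen categories.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

X = df.drop(columns="failed")
y = df["failed"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Fit on the training split only, then reuse the fitted transformer on test.
X_train_ready = preprocess.fit_transform(X_train)
X_test_ready = preprocess.transform(X_test)
print(X_train_ready.shape, X_test_ready.shape)
```

Splitting before fitting the transformer keeps statistics from the test rows (medians, means, category lists) out of training, which is the usual way to avoid leakage.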

Machine Learning Modeling

ChatGPT can suggest suitable algorithms, generate model training and evaluation code, and help interpret results, applying machine learning methods on the processed Gett data.
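
A training and evaluation script of this kind might look like the following sketch. The source does not say which algorithm was trained, so logistic regression is a stand-in assumption here, and the data is synthetic (generated with `make_classification`) rather than the Gett dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score, f1_score, precision_score, recall_score,
)
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for the processed dataset.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Logistic regression is an assumed baseline, not the project's model.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
pred = model.predict(X_test)

metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "precision": precision_score(y_test, pred),
    "recall": recall_score(y_test, pred),
    "f1": f1_score(y_test, pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```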

Gemini CLI Integration

Gemini CLI is an interactive command-line tool built on Google’s Gemini large language model, with multimodal and built-in tool capabilities. It can automate the execution of data science workflows created or refined via ChatGPT prompts, such as running cleaning scripts, generating visualizations, and training models, either with a single command or as a streamlined workflow.

Dashboard Automation

In the Gett data project example, Gemini CLI was used to build a Streamlit dashboard that bundles all these steps (cleaning, exploration, visualization, modeling) into an automated interface executed with one click, enhancing usability and reproducibility.

Together, ChatGPT and Gemini CLI can automate routine data science tasks, saving substantial time for data scientists and allowing them to focus on higher-level analysis and decision-making. The Gemini CLI is available at no cost and provides a straightforward command-line interface, making it an accessible tool for data scientists.

The final step in the data project is to prepare the dataset for machine learning: encoding categorical variables, scaling numerical features, and returning a clean DataFrame ready for modeling. For the visualization step, ChatGPT generated six different graphs.

This article explores five routine tasks that ChatGPT can handle in a data project, although it does not specify the model used in the project. The Streamlit app, built using Gemini CLI, asks the user to choose the target variable. It presents each step in its own tab: basic EDA, data cleaning, automatic visualizations, machine learning preparation, and model application. The app reports the evaluation metrics accuracy, precision, recall, and F1-score.
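
For reference, the four reported metrics can all be computed from the counts of a binary confusion matrix. The sketch below uses their standard definitions; the function name `classification_metrics` and the example counts are hypothetical, not values from the project.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives. Undefined ratios (zero denominator) return 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up counts for illustration.
example = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

With these example counts the script prints accuracy 0.70, precision 0.80, recall 0.67, and f1 0.73. When one outcome is much rarer than the other, accuracy alone can look good while recall on the rare class is poor, which is one reason to report all four.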

  1. The power of artificial intelligence (AI), demonstrated through tools like ChatGPT and Gemini CLI, is being leveraged to enhance transportation services in London, as seen in Gett's data project.
  2. ChatGPT aids in generating code and workflows for data cleaning and organization, reducing the time spent on repetitive manual tasks.
  3. ChatGPT can assist in initial exploratory data analysis (EDA) by summarizing data distributions, detecting anomalies, and suggesting transformations.
  4. Gemini CLI, integrated with ChatGPT, automates the execution of these data science steps, creating efficient workflows and dashboards.
  5. Data visualization is crucial for identifying trends, and ChatGPT can generate code snippets to create charts and visualizations that help reveal patterns in datasets.
  6. Preparing datasets for machine learning is a key step, and ChatGPT can help write preprocessing code such as encoding categorical variables, scaling features, handling missing values, and splitting data.
  7. For machine learning modeling, ChatGPT can suggest algorithms, generate training and evaluation code, and help interpret results, applying machine learning methods on processed datasets.
  8. Gemini CLI can automate the execution of data science workflows, and because it is free and command-line based, it is an accessible resource for data scientists.
  9. The article discusses five routine tasks that ChatGPT can handle in a data project, and how the Gemini CLI, in combination with ChatGPT, can save time for data scientists, enabling them to focus on higher-level analysis and decision-making.
  10. AI and data science are also being applied in other sectors, for example to optimize home and garden systems for energy efficiency as part of sustainable living initiatives, which shows how versatile these tools are.
