Data Detectives: Decode the World of Numbers!

Solve fun mysteries about stats, charts, and data in everyday life — sharpen your curiosity and become a data detective!

  1. What does 'data cleaning' mainly remove from a dataset?
    1. Errors
    2. New records
    3. Metadata
    4. Encryption
  2. Which chart is best for showing parts of a whole at once?
    1. Pie chart
    2. Scatter plot
    3. Line graph
    4. Histogram
  3. What term describes data that reveals personal identity?
    1. Anonymous data
    2. Aggregate data
    3. Personal data
    4. Open data
  4. If one value in a dataset lies far from the others, what is that value called?
    1. Mode
    2. Median
    3. Outlier
    4. Trend
  5. Which statistic divides a sorted dataset into two equal halves?
    1. Variance
    2. Mean
    3. Median
    4. Range
  6. What is the practice of checking study results against new data called?
    1. Visualization
    2. Collection
    3. Validation
    4. Sampling
  7. Which plain-text file format is commonly used for tabular data?
    1. PDF
    2. CSV
    3. JPEG
    4. MP3

Answers and explanations

  1. Question: What does 'data cleaning' mainly remove from a dataset?
    Answer: Errors
    Explanation: Data cleaning fixes mistakes like typos, duplicates, or impossible values so analyses are accurate. Fun fact: cleaning often takes more time than modelling in real projects!
  2. Question: Which chart is best for showing parts of a whole at once?
    Answer: Pie chart
    Explanation: Pie charts display proportions as slices of a circle, making it easy to compare parts to the whole. Fun fact: too many slices make pies hard to read!
  3. Question: What term describes data that reveals personal identity?
    Answer: Personal data
    Explanation: Personal data includes names, IDs, or anything that can identify someone directly or indirectly. Fun fact: regulations often require special handling of personal data.
  4. Question: If one value in a dataset lies far from the others, what is that value called?
    Answer: Outlier
    Explanation: An outlier is unusually high or low compared to most values and can indicate errors or interesting cases. Fun fact: outliers can reveal rare events or measurement mistakes.
  5. Question: Which statistic divides a sorted dataset into two equal halves?
    Answer: Median
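    Explanation: The median is the middle value of the sorted data (or the average of the two middle values when the count is even), so half the observations fall on either side of it. Fun fact: extreme values pull the mean toward them but barely move the median, so the median often summarizes skewed data better.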
  6. Question: What is the practice of checking study results against new data called?
    Answer: Validation
    Explanation: Validation tests whether a model or finding holds up on fresh data, which helps catch overfitting. Fun fact: cross-validation is a common validation method in machine learning.
  7. Question: Which plain-text file format is commonly used for tabular data?
    Answer: CSV
    Explanation: CSV (comma-separated values) stores tables in plain text, making it easy to open across programs. Fun fact: in regions where the comma is the decimal separator, CSV files often use semicolons between values instead.
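
Want to test a few of these clues yourself? The short Python sketch below ties together answers 4, 5, and 7: it reads a tiny semicolon-separated CSV, compares the mean with the median, and flags an outlier. The step counts, the column names, and the helper functions load_steps and find_outliers are made up for illustration, and the 1.5 x IQR cutoff is one common rule of thumb for spotting outliers, not the only one.

```python
import csv
import io
import statistics

# Hypothetical sample data: daily step counts with one value far from the rest.
# The semicolon delimiter mimics CSV files exported in some regions.
RAW_CSV = "day;steps\nMon;4200\nTue;3900\nWed;4100\nThu;4000\nFri;40000\n"


def load_steps(text, delimiter=";"):
    """Parse CSV text and return the 'steps' column as a list of integers."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [int(row["steps"]) for row in reader]


def find_outliers(values):
    """Return values lying more than 1.5 * IQR beyond the quartiles."""
    # method="inclusive" treats the data as the whole population,
    # which behaves sensibly for very small samples like this one.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


steps = load_steps(RAW_CSV)
mean_steps = statistics.mean(steps)
median_steps = statistics.median(steps)
outliers = find_outliers(steps)

print("mean:", mean_steps)
print("median:", median_steps)
print("outliers:", outliers)
```

With this sample the mean comes out at 11240 while the median stays at 4100, and the only flagged value is 40000. One unusual Friday drags the mean far above a typical day, which is why the median is often the safer summary when outliers are present.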