top of page

UFO Sightings 1906-2014

Platforms used: Alteryx, Tableau

Alteryx: Data Cleaning, Data Processing

Alteryx Workflow:

image0.jpeg

When pulling this data set, I noticed that the country, city, and duration fields of this data set had strings that were inconsistent in formatting. They had leading white spaces, punctuation, misspellings, and other formatting issues. 

First Step: Clean the overall syntax of the data

  • Data Cleansing Tool: clean the leading & trailing whitespaces and punctuation from country and city columns

  • Data Cleansing Tool: clean the leading & trailing whitespace, replace null or empty spaces with zero in the length of sighting column

  • Text to Columns Tool: City field had City (country) in some rows. To remove the (country) element into it's own column, I split these into 3 columns with ( as the delimiter. Third column was for extra text.

  • Data Cleansing Tool: Cleaned the new city columns from the Text to Column tool. Got rid of leading & trailing white space, and punctuation. 

  • Formula Tool: 

    • Formula: IF Contains([city1], "39" THEN "'" ELSE [city1] ENDIF​

    • This was to replace the '39' text with an apostrophe. If I had data cleansed it earlier, it would have made two words into one word.

  • Select Tool: Change the "length of encounter in seconds" column to an integer. This is so I can clean and calculate the time in minutes.

Second Step: Clean the duration columns

  • Formula Tool: Create a new column labeled "Seconds or Minutes"

    • IF [length of encounter seconds] < 60 THEN 1 ELSE ([length of encounter seconds]/60) ENDIF ​

    • This was so I could assign the length as minutes or seconds. Anything less than 60 seconds will be labeled as "seconds" with a '1' and anything above 60 will be labeled as "minutes".

  • Select Tool: Change Seconds to Minutes column to a V_String. This is so I can add either "minutes" or "seconds" to the end of the integer.

  • Formula Tool: Add seconds or minutes depending on the duration.

    • IF [seconds or minutes] = "1" THEN "seconds" ELSE [seconds or minutes]+" minutes" ENDIF​

    • IF [seconds or minutes] = "seconds" THEN [length of encounter seconds] ELSE [seconds or minutes] ENDIF

    • IF !Contains([seconds or minutes], "minutes") THEN [seconds or minutes]+" seconds" ELSES [seconds or minutes] ENDIF 

Third Step: Clean country column

  • Multi-Field Formula: Assign the country abbreviation. Select City1 and City2 column

    • IF Contains([_CurrentField_], "Australia") THEN "AU" ELSEIF Contains([_CurrentField_], "England") THEN "GB" ELSEIF Contains([_CurrentField_], "Wales") THEN "GB" ELSEIF Contains([_CurrentField_], "New York") THEN "US" ELSEIF Contains([_CurrentField_], "Canada") THEN "CA" ELSEIF Contains([_CurrentField_], "Of") THEN " " ELSEIF Contains([_CurrentField_], "About") THEN " " ELSE [_CurrentField_] ENDIF​

  • Filter Tool: Filter out the Null countries to assign them based on columns [city2] or [city3]

    • Country is Null​

  • Formula Tool: 

    • IF [city3] != Null() THEN Replace([city2], [city2], [city3]) ELSE [city2] ENDIF​

  • Select Tool: Remove excess columns 

  • Join Tool (with Input Tool): Input another file of all country names and abbreviations

    • Join on Country Name and [city2] column

  • Union Tool: Union the joined files and the false value from the earlier filter tool

  • Data Cleansing Tool: Remove punctuation, leading & trailing white space, and numbers.

  • Select Tool: Clean the name of the columns and uncheck unnecessary columns for analysis

Tableau Dashboard: Data Visualization

Please click this link to navigate to Tableau Public, where you can see the correct formatting of the dashboard. The embedded version changes the formatting, but it is here for easy reference.

Key Insights from this data set: 

  • Light is the most common UFO shape, accounting for 20.7% of all encounters.

  • The US by far has the most sightings of any country provided in this data set. It accounts for almost 55% of all sightings.

  • Encounters lasting 1-5 minutes account for 37% of all encounters.

  • For the entire dataset, the average encounters per year is 307.

  • From the year 2000, the average is 1,218 sightings per year.

bottom of page