Assessment of different classification algorithms and their respective results
Author
Oscar Cardec
Published
November 12, 2021
Introduction
On any given day, thousands of flights maneuver throughout the U.S. national airspace. All of them are constantly monitored by one entity, the Federal Aviation Administration (FAA). The FAA’s primary mission is to ensure that flight operations are conducted efficiently and to the highest levels of safety and security. In support of this endeavor, the continuous monitoring and accurate prediction of an aircraft’s position is a vital process across aeronautics and the FAA’s mission. Accurate flight forecasting can have a significant impact on business schedules, transportation logistics, and even environmental protection. In today’s era of big data and technological advances, monitoring en-route flights is an imperative.
Disclaimer: The views and opinions expressed in this report are those of the author and do not necessarily reflect the views or positions of any of the entities referred to herein.
The following assessment builds on a previously conducted analysis (Paglione et al. 2010), which documented a comprehensive evaluation of lateral deviations across numerous aircraft. For context, a lateral deviation measures how far an aircraft’s actual position diverges from its authorized flight route. Here I assess and identify alternative options for sustaining air operations management, using modern machine learning algorithms to detect lateral anomalies in aircraft tracks. The assessment employs up-to-date statistical analyses, compares results with the previous findings, and introduces a more sophisticated approach to improving the tracking of civil and military aviation on a near real-time basis.
Data
To accomplish this, historical data from numerous flights is utilized. The data comprises continuous and categorical observations, including the aircraft’s altitude, measurement times, calculated distances from the targeted route, lateral and vertical statuses, and suggested corrective heading, among others.
The original data encompasses 20 control centers within the Continental United States, averaging around 500,000 observations per center. That aggregates to over 10,000,000 measurements nationwide within less than a 24-hour window. Analyzing such a volume is costly in computational power and time. For that reason, I take a sampled-data approach, assuming that the sample is representative of the entire population and that, statistically speaking, inferences may be generalized to it. The following diagram provides a basic depiction of the variables involved.
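The per-center sampling step itself is not shown in the original code. The following is a minimal sketch of how such a sample could be drawn, where `rawdf` is a small simulated stand-in for the ~10-million-row national data set (the 10% proportion and the seed are illustrative assumptions):

```r
# Hypothetical sketch: draw the same proportion of rows from every
# control center so inferences generalize across centers.
# `rawdf` here is a toy stand-in for the full national data set.
rawdf <- data.frame(
  artcc  = rep(c("ZAU", "ZNY", "ZMA", "ZLA"), each = 1000),
  redLat = rnorm(4000)
)

set.seed(42)  # reproducible sampling
idx <- unlist(lapply(split(seq_len(nrow(rawdf)), rawdf$artcc),
                     function(i) sample(i, round(0.10 * length(i)))))
sampled <- rawdf[idx, ]

table(sampled$artcc)  # 100 observations per center
```

Stratifying by center, rather than sampling the pooled data, preserves each ARTCC's share of observations in the sample.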
Exploratory Data Analysis
As mentioned, the sampled data contains approximately 1.9 million flight observations from 4 specific Air Route Traffic Control Centers (ARTCC), namely Chicago (ZAU), New York (ZNY), Miami (ZMA), and Los Angeles (ZLA). These observations capture the attributes of an aircraft while cruising from one fix point or ARTCC to another, with data recorded in 10-second increments.
Note: During the exploratory steps the data is ingested and analyzed from a descriptive-statistics standpoint. The 14 variables (of different format types) are confirmed, along with the total of 1.9 million observations. Notice that the “buildTime” variable is given as cumulative seconds and that “latAdherStatus” is a character type. The “latAdherStatus” (lateral adherence status) variable describes how distant the aircraft is from its authorized route (threshold recorded in nautical miles); this multi-level categorical attribute is used as the target, or dependent, variable.
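A hedged sketch of this ingestion-and-inspection step follows; the temporary CSV written below is a tiny toy stand-in, not the original source file, and only illustrates the kind of checks applied:

```r
# Toy stand-in for the real source file (2 rows, 3 of the 14 variables)
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(acid = c("SWA123", "AAL456"),
                     buildTime = c(61210, 61220),
                     latAdherStatus = c("innerInConf", "midNonConf")),
          tmp, row.names = FALSE)

rawdf <- read.csv(tmp, stringsAsFactors = FALSE)

str(rawdf)      # confirm variable names, types, and observation count
summary(rawdf)  # descriptive statistics per column
```

On the real data, `str()` confirms the 14 variables and 1.9 million observations, and exposes `buildTime` as numeric cumulative seconds and `latAdherStatus` as character.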
Note: After gaining a basic understanding of the data set’s structure, I selected a particular set of variables, renamed them for easier understanding, converted the categorical labels to numeric factor levels for efficient computation, and defined a transformation function to convert time values from numeric to HMS format.
# data selection
mdata <- rawdf %>%
  as_tibble() %>%
  dplyr::select(acid, cid, buildTime, stCenterTm, endCenterTm, redLat,
                angle2NextFix, latAdherStatus, artcc)

# factorization of lateral adherence status
levels <- c("innerInConf", "midInConf", "midNonConf", "outerNonConf", "endOfRoute")
labels <- c("1", "2", "3", "4", "5")
mdata$latAdherStatus <- factor(mdata$latAdherStatus, levels = levels, labels = labels)

# variable renaming
maindf <- dplyr::rename(.data = mdata,
                        AircraftID = acid,
                        ComputerID = cid,
                        MeasureTaken = buildTime,
                        ControlStartTm = stCenterTm,
                        ControlEndTm = endCenterTm,
                        LateralDeviation = redLat,
                        CorrectionAngle = angle2NextFix,
                        LateralStatus = latAdherStatus,
                        ARTCC = artcc)

# time conversion
pacman::p_load(lubridate, hms)
timeconvert <- function(x) {
  # transform time period from cumulative seconds to hms format
  mt <- seconds_to_period(x)
  mtstring <- sprintf('%02d:%02d:%02d', mt@hour, minute(mt), second(mt))
  hms::as_hms(mtstring)
}

# transformations with dplyr::mutate
df <- maindf %>%
  dplyr::mutate(ARTCC = as.factor(ARTCC),
                UniqueID = paste(AircraftID, ComputerID, sep = "_"),
                Airline = substring(AircraftID, 1, 3),
                Airline = as.factor(Airline),
                MeasureTaken = as.numeric(MeasureTaken),
                CorrectionAngle = as.character(CorrectionAngle),
                LateralDeviation = as.double(LateralDeviation),
                xMeasureTaken = timeconvert(MeasureTaken),
                xControlStartTm = timeconvert(ControlStartTm),
                xControlEndTm = timeconvert(ControlEndTm))

df$CorrectionAngle <- as.double(df$CorrectionAngle)

# replace NAs introduced by coercion with an out-of-range value
df$CorrectionAngle[is.na(df$CorrectionAngle)] <- max(df$CorrectionAngle, na.rm = TRUE) + 1

head(df, 10)
Note: Next, I portray the distributions of selected variables. Considering the following graphics, I anticipate significant variance across the target variable. The original data appears consistent around the mean, with the exception of some outliers.
# lateral deviations across observations
par(mfrow = c(1, 1))
plot(df$LateralDeviation, type = "l", ylab = "Nautical Miles",
     col = "darkblue", main = "Lateral Deviations")
Note: The correction angle attribute refers to the degrees required to regain the proper heading in order to reach the next fix point associated with the flight plan. The following is a granular view of the correction angle data observations.
par(mfrow = c(2, 1))
# before
hist(sel1$CorrectionAngle, main = "Correction Angle Histogram",
     xlab = "Degrees", border = "white", col = "darkred", labels = FALSE)
# after log transform
hist(log2(sel1$CorrectionAngle), main = "Correction Angle Histogram\n(Log2)",
     xlab = "Degrees", border = "white", col = "darkblue", labels = FALSE)
Note: Next I display each ARTCC and the behavior of its aircraft relative to their assigned routes. In short, the distribution of deviations remains uniform when separated by center.
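The code behind the per-center graphic is not shown in the original. A minimal sketch follows, using simulated data in place of the real engineered data frame `df` (ZMA is deliberately given higher spread here to mirror the dispersion pattern discussed below; the values are illustrative, not real measurements):

```r
# Simulated stand-in for the engineered data frame `df`
set.seed(7)
df <- data.frame(
  ARTCC = factor(rep(c("ZAU", "ZNY", "ZMA", "ZLA"), each = 500)),
  LateralDeviation = c(rnorm(500, sd = 1), rnorm(500, sd = 1),
                       rnorm(500, sd = 3), rnorm(500, sd = 1))
)

# side-by-side distributions of lateral deviation, one box per center
boxplot(LateralDeviation ~ ARTCC, data = df,
        col = "darkgrey", border = "darkblue",
        ylab = "Nautical Miles", main = "Lateral Deviation by ARTCC")
```

Grouped boxplots make differences in spread between centers immediately visible, which is what surfaces ZMA's higher variability.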
Note: One key insight: the above graph shows ZMA as the control center with the highest variability, or dispersion. This trait makes sense considering that ZMA (Miami) sits in a significantly different location, more susceptible to varied weather elements and to offshore inbound and transient air traffic.
ssel3 <- sel1

# function to calculate IQR-based bounds
calc_iqr_bounds <- function(data) {
  Q <- quantile(data, probs = c(0.25, 0.75), na.rm = TRUE)
  iqr <- IQR(data, na.rm = TRUE)
  lower <- Q[1] - 1.5 * iqr
  upper <- Q[2] + 1.5 * iqr
  return(list(lower = lower, upper = upper))
}

# calculation of bounds
bounds <- calc_iqr_bounds(ssel3$LateralDeviation)
NOutliers <- subset(ssel3$LateralDeviation,
                    ssel3$LateralDeviation > bounds$lower &
                    ssel3$LateralDeviation < bounds$upper)

# distributions with and without outliers
par(mfrow = c(1, 2))
boxplot(ssel3$LateralDeviation, col = "darkgrey", border = "darkred",
        horizontal = FALSE, main = "Distribution with Outliers")
boxplot(NOutliers, col = "darkgrey", border = "darkblue",
        main = "Distribution without Outliers")
Note: To better characterize these deviations, I calculated additional measures by applying moving-average filters: a weighted moving average with a backward window of two positions, the absolute value of that weighted moving average, and moving averages of order 4 and 7 (via forecast::ma).
pacman::p_load(forecast, pracma)
df1 <- ssel3
df1["WMA"] <- pracma::movavg(df1$LateralDeviation, n = 2, type = "w")
df1["Abs_WMA"] <- abs(df1$WMA)
df1["ma4_lateraldev"] <- forecast::ma(df1$LateralDeviation, order = 4)
df1["ma7_lateraldev"] <- forecast::ma(df1$LateralDeviation, order = 7)
summary(df1)
ARTCC Airline UniqueID MeasureTaken
ZAU:549977 SWA : 190572 Length:1988068 Min. :61210
ZLA:547118 AAL : 158419 Class :character 1st Qu.:68980
ZMA:473121 UAL : 101846 Mode :character Median :74820
ZNY:417852 AWE : 97578 Mean :74696
DAL : 85748 3rd Qu.:80410
COA : 73412 Max. :86390
(Other):1280493
xMeasureTaken xControlStartTm xControlEndTm LateralDeviation
Length:1988068 Length:1988068 Length:1988068 Min. :-748.9390
Class1:hms Class1:hms Class1:hms 1st Qu.: -0.8658
Class2:difftime Class2:difftime Class2:difftime Median : 0.0043
Mode :numeric Mode :numeric Mode :numeric Mean : 0.3468
3rd Qu.: 0.8736
Max. : 596.9131
CorrectionAngle LateralStatus WMA Abs_WMA
Min. : 0.0000 1:802138 Min. :-748.7875 Min. : 0.0000
1st Qu.: 0.8003 2:228148 1st Qu.: -0.8656 1st Qu.: 0.2012
Median : 2.5434 3:161562 Median : 0.0045 Median : 0.8702
Mean : 24.9097 4:711040 Mean : 0.3468 Mean : 7.4450
3rd Qu.: 17.2016 5: 85180 3rd Qu.: 0.8747 3rd Qu.: 3.5313
Max. :180.9997 Max. : 596.6656 Max. :748.7875
ma4_lateraldev ma7_lateraldev
Min. :-747.9141 Min. :-747.3517
1st Qu.: -0.8684 1st Qu.: -0.8775
Median : 0.0046 Median : 0.0051
Mean : 0.3467 Mean : 0.3467
3rd Qu.: 0.8808 3rd Qu.: 0.8919
Max. : 595.2119 Max. : 594.3059
NA's :4 NA's :6
Note: To conclude the EDA, I went from a limited understanding of the variables to a robust, feature-engineered data frame while maintaining the original characteristics of the data. At this point, the tidy data frame is ready to be split into training and test sets for the classification models.
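A minimal sketch of that split follows; the 70/30 ratio and the simulated two-column `df` are illustrative assumptions standing in for the engineered data frame, not the original setup:

```r
# Simulated stand-in for the tidy, feature-engineered data frame
set.seed(123)
df <- data.frame(
  LateralStatus = factor(sample(1:5, 1000, replace = TRUE)),
  Abs_WMA       = abs(rnorm(1000))
)

# random 70/30 train/test partition by row index
train_idx <- sample(seq_len(nrow(df)), size = 0.7 * nrow(df))
train <- df[train_idx, ]
test  <- df[-train_idx, ]

c(train = nrow(train), test = nrow(test))  # 700 / 300
```

With a target this imbalanced in the real data (class 1 alone holds ~800k of 1.9M rows), a stratified split, for example with caret's createDataPartition, would preserve the class proportions in both sets.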