**Question: 12**

Which characteristic applies mainly to Data Science as opposed to Business Intelligence?

A. Advanced analytical methods

B. Robust reporting

C. Focus on structured data

D. Data dashboards

**Answer: A**

**Question: 11**

The web analytics team uses Hadoop to process access logs. They now want to correlate this data with structured user data residing in their massively parallel database. Which tool should they use to export the structured data from Hadoop?

A. Sqoop

B. Pig

C. Chukwa

D. Scribe

**Answer: A**

**Question: 10**

You are attempting to find the Euclidean distance between two centroids:

Centroid A’s coordinates: (X = 2, Y = 4)

Centroid B’s coordinates: (X = 8, Y = 10)

Which formula finds the correct Euclidean distance?

A. SQRT((2-8)^2 + (4-10)^2) or 8.49

B. SQRT(((2-8) x 2) + ((4-10) x 2)) or 12.17

C. ((2-8)^2 + (4-10)^2) or 72

D. ((2-8) x 2 + (4-10) x 2) or 148

**Answer: A**
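The arithmetic behind option A can be checked with a short Python sketch. The function name `euclidean_distance` is an illustrative choice, not something from the question; the coordinates are the two centroids given above.

```python
import math

def euclidean_distance(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

centroid_a = (2, 4)   # Centroid A from the question
centroid_b = (8, 10)  # Centroid B from the question

# (2-8)^2 + (4-10)^2 = 36 + 36 = 72, and sqrt(72) is about 8.49
distance = euclidean_distance(centroid_a, centroid_b)
print(round(distance, 2))  # 8.49
```

Option C stops at the sum of squares (72) without taking the square root, which is why it is not the Euclidean distance.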

**Question: 9**

What describes a true limitation of the Logistic Regression method?

A. It does not handle missing values well.

B. It does not handle redundant variables well.

C. It does not handle correlated variables well.

D. It does not have explanatory values.

**Answer: A**

**Question: 8**

Which functionality do regular expressions provide?

A. text pattern matching

B. underflow prevention

C. increased numerical precision

D. decreased processing complexity

**Answer: A**
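As a minimal illustration of text pattern matching, the sketch below uses Python's standard `re` module to pull an IPv4-looking token out of a log line. The log line and the pattern are made up for this example, and the pattern is deliberately loose (it does not check that each octet is at most 255).

```python
import re

# Hypothetical access-log line, invented for illustration only.
log_line = '192.168.0.7 - - [10/Oct/2023:13:55:36] "GET /index.html HTTP/1.1" 200'

# Three groups of 1-3 digits followed by a dot, then a final group of 1-3 digits.
ip_pattern = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

match = ip_pattern.search(log_line)
ip = match.group(0) if match else None
print(ip)  # 192.168.0.7
```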

**Question: 7**

Data visualization is used in the final presentation of an analytics project. For what else is this technique commonly used?

A. Data exploration

B. Descriptive statistics

C. ETLT

D. Model selection

**Answer: A**

**Question: 6**

You are testing two new weight-gain formulas for puppies. The test gives the results:

Control group: 1% weight gain

Formula A: 3% weight gain

Formula B: 4% weight gain

A one-way ANOVA returns a p-value of 0.027.

What can you conclude?

A. Either Formula A or Formula B is effective at promoting weight gain.

B. Formula B is more effective at promoting weight gain than Formula A.

C. Formula A and Formula B are both effective at promoting weight gain.

D. Formula A and Formula B are about equally effective at promoting weight gain.

**Answer: A**
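One way to see why only the weakest conclusion follows: a one-way ANOVA reduces all groups to a single F statistic, so a small p-value says that at least one group mean differs, not which one. The sketch below computes that statistic by hand in plain Python. The per-puppy measurements are hypothetical, chosen only so the group means match the 1%, 3%, and 4% in the question; they are not the data behind the stated p-value.

```python
# Hypothetical per-puppy weight gains (percent); only the group means
# (1, 3, 4) come from the question, the individual values are invented.
groups = {
    "control":   [0.5, 1.0, 1.5, 1.0, 1.0],
    "formula_a": [2.5, 3.0, 3.5, 3.0, 3.0],
    "formula_b": [3.5, 4.0, 4.5, 4.0, 4.0],
}

def mean(values):
    return sum(values) / len(values)

all_values = [v for g in groups.values() for v in g]
grand_mean = mean(all_values)

# Between-group sum of squares: how far each group mean is from the grand mean.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
# Within-group sum of squares: spread of observations around their own group mean.
ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# One number for all three groups; it does not single out any group.
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f_stat)
```

Identifying which formula differs from the control would take a follow-up comparison, such as pairwise tests with a multiple-comparison correction.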

**Question: 5**

In which phase of the data analytics lifecycle do Data Scientists spend the most time in a project?

A. Discovery

B. Data Preparation

C. Model Building

D. Communicate Results

**Answer: B**

**Question: 4**

What is one modeling or descriptive statistical function in MADlib that is typically not provided in a standard relational database?

A. Linear regression

B. Expected value

C. Variance

D. Quantiles

**Answer: A**

**Question: 3**

Which word or phrase completes the statement? A spreadsheet is to a data island as a centralized database for reporting is to a ________?

A. Data Warehouse

B. Data Repository

C. Analytic Sandbox

D. Data Mart

**Answer: A**

