Hypothesis Testing lecture slides (originally a Chinese-English bilingual PowerPoint deck, 34 slides)

Hypothesis Testing

Populations and Samples

Population: the universe. The data or information that defines the entire set. Its parameters (μ, σ) may or may not be known.
Sample: a subset of data or information that possesses the same characteristics as the population. We can calculate statistics (X̄, s) from it.
We make decisions about the population based on the sample.
- How many samples should be taken?
- Why should we take a sample?
- Should the sample be random?
- Is it possible to have sampling error?

Samples: Why Use Them?

Why use a sample instead of a population?
- Using a sample reduces time and cost.
- Capturing data on the entire population may be very difficult, if not impossible.
When to use a sample:
- We use samples to baseline a process.
- We use samples to evaluate the results of a controlled change to a process.
How should the sample be taken? See Section 5.

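The relationship between population parameters and sample statistics can be sketched in a few lines of Python. This is only an illustration: the population is simulated, and the values 50.0 and 2.0, the sample size of 30, and the seed are assumptions chosen for the example, not figures from the slides.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated part lengths in mm (not real data).
rng = random.Random(42)
population = [rng.gauss(50.0, 2.0) for _ in range(10_000)]

# Population parameters. In practice these are usually unknown.
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

# Simple random sample drawn without replacement.
sample = rng.sample(population, 30)

# Sample statistics: single-value estimates of mu and sigma.
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)

# Sampling error: the gap between the estimate and the true parameter.
sampling_error = x_bar - mu

print(f"population: mu={mu:.3f}, sigma={sigma:.3f}")
print(f"sample:     x_bar={x_bar:.3f}, s={s:.3f}, error={sampling_error:+.3f}")
```

Rerunning with a different seed gives a different sample and a different sampling error, which is the "chance" variation that the later slides try to separate from a real process difference.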

[Figure labels: Sample A, Sample B]

All processes have variation. Samples from a given process may vary.
How can we differentiate between sample-based "chance" variation and a true process difference?
How can we depend on a sample?

Confidence Intervals and Point Estimates

Confidence intervals identify a range of plausible values for a population parameter, calculated from a sample statistic. They can be either one-sided or two-sided.
[Figure: curves centered on X̄ with the risk regions marked.]
Sample means, sample standard deviations, sample variances and other sample statistics are known as point estimators because they are single values used to represent population parameters.

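As a rough sketch of a two-sided confidence interval for a mean, the snippet below uses the normal (z) approximation from Python's standard library. The function name and the measurements are hypothetical. For a small sample a t critical value would normally replace the z value, so treat the result as approximate.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Two-sided z-based confidence interval for the population mean.

    Assumption: the normal approximation is adequate (larger samples).
    """
    n = len(sample)
    x_bar = statistics.fmean(sample)      # point estimate of the mean
    s = statistics.stdev(sample)          # point estimate of the std deviation
    alpha = 1.0 - confidence              # total risk, split across both tails
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical measurements, for illustration only.
sample = [9.8, 10.2, 10.1, 9.9, 10.4, 10.0, 9.7, 10.3, 10.1, 9.9,
          10.2, 10.0, 9.8, 10.1, 10.3, 9.9, 10.0, 10.2, 9.6, 10.5,
          10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 10.0, 9.9, 10.3, 10.0]

low, high = mean_confidence_interval(sample, confidence=0.95)
print(f"point estimate: {statistics.fmean(sample):.3f}")
print(f"95% interval:   ({low:.3f}, {high:.3f})")
```

Raising the confidence level widens the interval: less risk of missing the true mean comes at the cost of a less precise statement.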

Hypothesis Tests

Point estimates of parameters and confidence interval interpretation are both means for making inferences from sample data.
Hypothesis tests are designed to help us make an inference about the true population value at the desired level of confidence.
We will use confidence intervals and tests of sample means, variances and standard deviations to investigate differences and cause/effect relationships using data.
Hypothesis tests help determine whether an apparent difference is real or could be due to chance.
By using data and hypothesis testing, we can quantify our level of confidence that the difference is real.


A Statistical Hypothesis

A statistical hypothesis is an assertion or conjecture about one or more parameters of the population.
To determine with certainty whether it is true or false, we would have to examine the entire population. This is usually impossible!
Instead, use a random sample to provide evidence that either supports or does not support the hypothesis.
The conclusion is then based upon statistical significance.
It is important to remember that this conclusion is an inference about the population determined from the sample data.


Why Do Hypothesis Testing?

1. To improve processes, we need to identify factors which impact the mean or standard deviation.
2. Once we have identified these factors and made adjustments for improvement, we need to validate actual improvements in our processes.
3. Sometimes we cannot decide graphically or by using calculated statistics (sample mean and standard deviation) if there is a statistically significant difference between processes.
4. In such cases the decision will be subjective.
5. We perform a formal statistical hypothesis test to decide objectively whether there is a difference.
Data helps everyone make the same decision.


Nature of Hypotheses

Null Hypothesis (Ho):
- Usually describes a status quo.
- The one you assume unless otherwise shown.
- Sign used in Minitab: =
Alternative Hypothesis (Ha):
- Usually describes a difference.
- The one you accept or reject based upon evidence.
- Signs used in Minitab: not =, < or >
It is either Null (same) or Alternative (different).


Hypothesis Testing: Guilty vs. Innocent Example

The American justice system can be used to illustrate the concept of hypothesis testing.
In America we assume innocence until proven guilty. Innocence corresponds to the null hypothesis.
It requires strong evidence, "beyond a reasonable doubt," to convict the defendant.
Returning a guilty verdict corresponds to rejecting the null hypothesis and accepting the alternative hypothesis. More specifically, we have significant evidence to support that a difference exists.
Ho: Person is innocent.
Ha: Person is guilty.
What are the possible outcomes when the truth is known?


Decision Making in Our Courts and in Business

Courts:
Verdict      Truth: Innocent                        Truth: Guilty
Set Free     Innocent set free (correct decision)   Guilty set free (error)
Jail         Innocent jailed (error)                Guilty jailed (correct decision)

Business:
Decision     Truth: Ho (=)                          Truth: Ha (not =)
Ho (=)       Correct decision                       Type II error (β risk)
Ha (not =)   Type I error (α risk)                  Correct decision

The Type I error (α error) is rejecting Ho when it is true, sometimes called the producer's risk.
The Type II error (β error) is failing to reject Ho when it is false, sometimes called the consumer's risk.

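To make the two risks concrete, here is a minimal sketch that computes the β risk of a one-sided z-test for a given α. It assumes a known population standard deviation and a test of Ho: μ = μ0 against Ha: μ > μ0. The function name, the shift of 1.0 and the sigma of 2.0 are illustrative assumptions, not values from the slides.

```python
import math
from statistics import NormalDist


def beta_risk(delta, sigma, n, alpha=0.05):
    """Type II error probability for a one-sided z-test (Ha: mu > mu0).

    delta: true shift (mu - mu0); sigma: known population std deviation.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha)     # reject Ho when z > z_crit
    shift_in_se = delta * math.sqrt(n) / sigma   # true shift in standard errors
    # Probability the test statistic stays below the critical value.
    return std_normal.cdf(z_crit - shift_in_se)


# Hypothetical scenario: true shift of 1.0 unit, sigma of 2.0 units.
for n in (5, 10, 20, 40):
    b = beta_risk(delta=1.0, sigma=2.0, n=n)
    print(f"n={n:2d}  beta={b:.3f}  power={1.0 - b:.3f}")
```

With α held fixed, a larger sample lowers β. With no true shift (delta = 0) the test fails to reject with probability 1 - α, which is the correct decision in that case.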

The p-Value

Another way to measure the risk in the decision is through the p-value.
The p-value is known as the observed level of significance for a factor.
It is the chance of observing a difference at least this large if the null hypothesis is true, that is, if the sample is consistent with the population.
Rejecting the null hypothesis whenever the p-value falls below a chosen threshold α limits the probability of a Type I error (rejecting a true null hypothesis) to α.
Unless there is an exception based on engineering judgment, we will set the acceptable level of a Type I error at α = 0.05.
Thus, any p-value less than 0.05 means we reject the null hypothesis.

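The decision rule can be sketched with a one-sample z-test, which needs only the Python standard library. The assumption of a known population standard deviation, the function name, and the numbers in the example are all illustrative. The slides themselves use Minitab and t-tests.

```python
import math
from statistics import NormalDist


def z_test_p_value(x_bar, mu0, sigma, n, alternative="two-sided"):
    """Return (z, p) for a one-sample z-test of Ho: mu = mu0.

    Assumption: sigma is the known population standard deviation.
    """
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    cdf = NormalDist().cdf
    if alternative == "two-sided":      # Ha: mu != mu0
        p = 2.0 * (1.0 - cdf(abs(z)))
    elif alternative == "less":         # Ha: mu < mu0
        p = cdf(z)
    elif alternative == "greater":      # Ha: mu > mu0
        p = 1.0 - cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    return z, p


ALPHA = 0.05  # acceptable Type I error risk

# Hypothetical data: 25 observations averaging 10.4 against a target of 10.0.
z, p = z_test_p_value(x_bar=10.4, mu0=10.0, sigma=1.0, n=25)
decision = "reject Ho" if p < ALPHA else "fail to reject Ho"
print(f"z={z:.2f}  p={p:.4f}  decision: {decision}")
```

Here the sample mean sits two standard errors above the target, so the two-sided p-value falls just under 0.05 and Ho is rejected.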

Defining Hypotheses

Null hypotheses:
Ho: X̄1 = Target
Ho: X̄1 = μ
Ho: X̄1 - X̄2 = 0
Ho: μ1 - μ2 = 0
Ho: X̄1 = X̄2 = X̄3 = … = X̄n
Ho: σ1 = σ2
Ho: S1 = S2
Ho: S1 = S2 = S3 = … = Sn

Alternative hypotheses:
Inequalities (≠) are two-sided tests.
Ha: μ1 ≠ μ2
Ha: X̄1 ≠ X̄2
One-sided tests are used for < or > hypotheses.
Ha: μ1 < μ2
Ha: X̄1 < X̄2
Ha: μ1 > μ2
Ha: X̄1 > X̄2

Ha: X̄1 - X̄2 ≠ 0 ___________________
Ha: X̄1 - X̄2 < 0 ___________________
Ha: X̄1 - X̄2 > 0 ___________________
Ha: σ1 ≠ σ2 _______________________
Ha: σ1 < σ2 _______________________
Ha: S1 > S2 _______________________

Scripting hypotheses as equations is useful when stating them.


Hypothesis Testing Protocol

The hypotheses are always statements about the population parameters.
1. State your null hypothesis (Ho) and your alternative hypothesis (Ha).
   Ho: The mean height of citizens in country A is equal to the mean height of citizens in country B (μA = μB).
   Ha: The mean height of citizens in country A is less than the mean height of citizens in country B (μA < μB).
2. Determine the appropriate statistical test based on the hypothesis being tested.
3. Determine the level of acceptable risk.
   α risk: usually 5% (default)
   β risk: usually 10-20% (default)


Hypothesis Testing Protocol (Cont.)

4. Determine the proper sample size for the test (Section 5).
5. Collect a sample of observations from the population.
6. Calculate statistics based on the sample.
7. Use a statistical test to evaluate the null hypothesis against the alternative hypothesis.
8. Based on the test result, reject or fail to reject Ho according to the previously determined criterion.
9. Translate the results: translate the statistical conclusion into a practical one.
   Statistical conclusion: Can we prove a difference statistically?
   Practical conclusion: Do we care about the difference?

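The protocol can be walked through for the country A versus country B height example. The sketch below makes several assumptions: the summary statistics are invented for illustration, the test is a large-sample two-sample z-test (a stand-in for the two-sample t-test named in the slides), and the 1.0 cm practical threshold is hypothetical.

```python
import math
from statistics import NormalDist

# Step 1: Ho: mu_A = mu_B, Ha: mu_A < mu_B (one-sided).
# Step 2: large-sample two-sample z-test (assumed adequate for n = 100 each).
# Step 3: acceptable alpha risk.
ALPHA = 0.05
PRACTICAL_THRESHOLD_CM = 1.0  # hypothetical: smallest difference we care about

# Steps 5-6: hypothetical summary statistics (n, mean in cm, std deviation in cm).
n_a, mean_a, sd_a = 100, 170.0, 7.0
n_b, mean_b, sd_b = 100, 172.5, 7.5

# Step 7: test statistic and one-sided p-value.
diff = mean_a - mean_b
std_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
z = diff / std_error
p_value = NormalDist().cdf(z)  # strongly negative z supports Ha: mu_A < mu_B

# Step 8: statistical conclusion.
reject_h0 = p_value < ALPHA

# Step 9: practical conclusion.
practically_important = abs(diff) >= PRACTICAL_THRESHOLD_CM

print(f"diff={diff:.2f} cm  z={z:.3f}  p={p_value:.4f}")
print("statistical conclusion:", "reject Ho" if reject_h0 else "fail to reject Ho")
print("practical conclusion:  ", "difference matters" if practically_important
      else "difference too small to matter")
```

Keeping the statistical and practical conclusions as separate variables mirrors step 9: a difference can be statistically significant and still too small to act on.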

We'll be using hypothesis testing as a method to prove change.

Question                                       Test
Is the process at a target or desired value?   One-sample t
Are multiple samples the same?
  Mean                                         Two-sample t or ANOVA
  Variation                                    F test or Test for Equal Variances
  Proportion of occurrence                     Chi-square (χ²)
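
For the proportion-of-occurrence case, a chi-square test on a 2x2 table can be sketched with the standard library alone. The defect counts are hypothetical, the function name is an assumption, and no continuity correction is applied. The p-value line relies on the fact that a chi-square variable with one degree of freedom is a squared standard normal.

```python
import math


def chi_square_2x2(table):
    """Chi-square test of independence for a 2x2 table of counts.

    Returns (chi2, p_value) with 1 degree of freedom, no continuity correction.
    """
    (a, b), (c, d) = table
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    total = a + b + c + d

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected

    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: (defective, good) for process A and process B.
table = [[30, 70],
         [15, 85]]

chi2, p_value = chi_square_2x2(table)
print(f"chi-square={chi2:.3f}  p={p_value:.4f}")
print("reject Ho" if p_value < 0.05 else "fail to reject Ho")
```

For tables larger than 2x2 the degrees of freedom change and the shortcut for the p-value no longer applies, so a statistics package would be the usual choice there.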